* Work around Sequoia's "Local Network" permission with a helper process
* README.md: macOS 15 (Sequoia) warning
* Make "orchard dev" unix-specific too, otherwise Release fails
* Fix typo in "localNetworkHerlper"
* Slightly improve the macOS 15 (Sequoia) note
* orchard worker run: better documentation for --user
* Make sure privilege dropping is the first step we do in runWorker()
* support enable tls flag
* modify tls enable control flag
Co-authored-by: Nikolay Edigaryev <edigaryev@gmail.com>
* Optimize message print
* Avoid unrelated changes to the bootstrap message
* Consistent command-line argument order
* Extra spacing
* No need to shadow controllerCert
---------
Co-authored-by: Nikolay Edigaryev <edigaryev@gmail.com>
* Always randomize MAC address
* Worker: check DHCP lease time and print a warning if it's unconfigured
* Further improve the explanation
* Add two leases example to the explanation
* Add an example of the resulting /var/db/dhcpd_leases
* Startup script: implement retries for connection-related operations
* assert.Equal → assert.Contains
* Wait for at least 1,000 lines of logs
* Join slice of strings before calling assert.Contains()
* TestHostDirs: use require.Contains() instead of require.EqualValues()
* TestHostDirs: wait for at least 4 log lines
* Small timout for worker notification
It seems at the moment if a worker re-establishes notify stream (for example, if network flips or proxy breaks the connection) then we can see "no worker registered with this name" errors.
This change makes Notifier to wait for 30 seconds before failing, at the time of calling `Notifier#Notify` we know such worker exists.
PS not sure if we need to make the timeout configurable.
* Wait via context
* Make sure all `context`s for `Notify` is time bounded
* Lint issues
* Allow creating VMs with implicit CPU and memory
* Clarify why cpu/memory can be 0 a bit better
* Controller(API): don't forget to update DefaultCPU and DefaultMemory
* Add an integration test for implicit CPU and memory
* Introduce WebSocket-based RPC v2
* go test: add -ldflags="-B gobuildid"
* No need to change the "controller.workerNotifier.Notify()" error message
* No need to modify Protocol Buffers/gRPC generated code
* rpcWatch(): explain that connection shouldn't be normally be closed
* Avoid "port forwarding failed: " repetition in error messages
* Improve comments and avoid repetition in IP resolution errors
* API: only overwrite specific worker fields when worker already exists
* Don't forget to return when creating new worker
* Return updated worker when updating the worker
* POST /v1/workers: do not list workers in a single update txn
* schedulingLoopIteration(): do not list workers in a single update txn
* .golangci.yml: remove mentions of fully deprecated linters
* Schedule opportunistically and more granularly
To avoid transaction conflicts.
* Measure scheduling loop iteration duration and log it at debugging level
* Use "continue NextWorker" instead of just "continue" for clarity
* Controller: emit lifecycle events when the VM gets restarted or deleted
* vm_{scheduling,run}_time → vm_{scheduling,run}_duration for clarity
* Update VM endpoint: only update VM started time when zero
* proxy.Connections(): require io.ReadWriteCloser instead of net.Conn
* Orchard Controller: implement an SSH server that acts as a jump host
* Issue a warning if the name used will be invalid in the future
* Further restrict uppercase characters in names in the future
The rationale is similar to https://github.com/kubernetes/kubernetes/issues/71140.
We won't want to munge the user's input and introduce subtle bugs doing
lowercase comparisons.
* Client: prevent double slashes at the end of URLs
* orchard context create: let the user know which association flow is used
* Client: rename parsePath() to formatPath()
* Client: grab the ServerName from the trusted certificate
* Always Close() the Worker instance
* orchard list vms: show assigned worker for each of the VMs
* Stop the failed VMs before we schedule new VMs
To avoid violating resource constraints.
* syncOnDiskVMs: don't ignore running VMs
* Worker: show correct remote and local VM counts
* Switch from golang.org/x/net/websocket to nhooyr.io/websocket
* Do not attach errors that we can handle to the Gin's context
* Add missing newline to "no credentials specified or found, ..." message
* Fix potential NPE in ChooseUsernameAndPassword()
* Fix type in PortForward() error message in "orchard ssh vm"
* Fix potential NPE in Connections()
* Use header.Set() for consistency's sake for Authorization header
* Fix typo when passing arguments to tls.LoadX509KeyPair()
* Support TLS 1.2 too
* Do not require a controller to only present a single certificate
* No need to set ServerName since we use InsecureSkipVerify
* Use host's root CA set by default and support normal SNI scenarios
* Change event prefix to preserve order under load
When there are a lot of events streamed from a worker, it's possible to have two batches coming for the same timestamp (which is a timestamp of the event on the worker). This way the existing logic would mess up the order because `index` and the random number doesn't guarantee the order.
To fix this I've changed the format of the prefix for the event to include tro things:
1. Timestamp in nanoseconds of the injection time on the controller so two sequential batches will have guaranteed order unless they are processed within a nanosecond.
2. Made the `index` being fixed length with trailing zeros, so they are properly lexicographically sorted (`000001`, `000002`, ...).
* No need to disable linting
* Implement restart policy for VMs
* Do not update VM.Resource, we only use it as a read-only specification
* Err()/setErr(): use atomic.Pointer instead of sync.Mutex
* Controller API: introduce controller's information endpoint
* Prevent generation of empty events after channel closure
* Allow events to be buffered in the events channel
* Controller API: introduce controller's information endpoint[1]
* IntegrationGuide.md: a couple of Python and Golang examples
* Rephrase a sentence
Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com>
---------
Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com>
* Fail VMs if the worker had crashed/is unhealthy
* OnDiskName: properly handle cases when VM's name contains hyphens
* Worker: introduce Offline() method and check it before scheduling
* tart.List(): use Tart's JSON output
* OnDiskName: remove empty parts check
* Scheduler: move health-checking logic to a separate function
* Only fail "running" VMs
* Only fail orphaned VMs if they're in terminal state
* Integration tests
* Run healthCheckingLoopIteration() before schedulingLoopIteration()
* Worker: sync on-disk VMs only once at start
Before we had two main loops: controller loop to assign VMs and worker loop to start VMs. Each of the loops was performed upon an interval every N seconds.
This change introduces a mechanism for reactively requesting loop execution:
1. Controller loop will be executed upon VM creation to try to immediately schedule.
2. A worker will be notified upon a VM assigment and worker loop will be requested to sync immediately.
Fixes#31
* Resources support
* Ability to provide VM and worker resources via the CLI
* orchard dev: always listen on :6120
* orchard dev: support --resources
* REST API: provide resource defaults when creating VM
* OpenAPI: document "resources" field
* orchard dev: serve Swagger API documentation on /v1/
* Integration guide
* Make sure we list names/VM names as required argument in the --help
* Introduce "orchard logs vm" command
* Make sure each command has a Short field
* Future-proof port-forward, ssh and vnc commands
To support not only the VM resource.
* Simplified bootstrapping of a cluster
Introduced a new convention about a pre-defined `bootstrap-admin` account for `orchard controller run`. Providing `ORCHARD_BOOTSTRAP_ADMIN_TOKEN` will auto-create such user for easier configuration. `bootstrap-admin` can be used for creating other service accounts on the first run and after that can be disposed.
Also change `orchard worker run` to expect controller URL as the only parameter and a bootstrap token passed via an argument instead of using a context that might not be created.
* Missing error check
* proxy.Connections(): handle "use of closed network connection" error
* Controller: less strict timeouts that work nicely for WebSockets
* Worker: only attempt connect to the gRPC once our UID is known
* Introduce "orchard ssh" and "orchard vnc" commands
* Worker: prevent context leak by moving logic into a separate function
* Fix linter errors
* Port forwarding integration test
* Check for "uname -mo" output
* Port forwarding support
* .golangci.yml: remove and replace deprecated and archived linters
* Client: pass credentials when calling WebSocket API methods
* API: require ServiceAccountRoleComputeWrite role for port forwarding
* Use Buf
* Rename Poll() RPC method to Watch()
* Split Rendezvous into two parts: Watcher and Proxy (#32)
* Split Rendezvous into two parts: Watcher and Proxy
* Implement Proxy cancellation
* Use Protocol Buffers structure directly in Watcher
* Fix TestWatcher after switching to Protocol Buffers structure
* portForwardVM(): ensure we also check for gin's context