diff --git a/src/driver/controller-proxy/compatibility.md b/src/driver/controller-proxy/compatibility.md new file mode 100644 index 0000000..6e29227 --- /dev/null +++ b/src/driver/controller-proxy/compatibility.md @@ -0,0 +1,108 @@ + +# Proxy driver compatibility + +There are 2 challenges with the proxy driver: + +1. Proxy needs dynamic state. CSI spec implies that dynamic state must be external, +which isn't ideal for small deployments, and is incompatible with democratic-csi. + +2. Proxy must provide a common set of capabilities for all drivers it represents. + +A great discussion of difficulties per storage class state can be found here: +- https://github.com/container-storage-interface/spec/issues/370 + +## Terminology and structure + +"Proxy" is the driver created via `driver: proxy` in the main config. +Other drivers are referred to as "real driver" and "underlying driver". + +"Connection" is a way to distinguish real drivers in proxy driver calls + +- Connection name is set in storage class parameters +- Connection name is stored in volume handle +- Connection name is used as part of config file path + +All config files must be mounted into democratic-csi filesystem. +They can be added, updated and removed dynamically. + +## CSI features + +Generally most features are supported. + +However, some calls will not work: + +- `ListVolumes`: storage class context is missing +- - https://github.com/container-storage-interface/spec/issues/461 +- `ListSnapshots`: TODO: can be implemented. Would require adding snapshot secret + +`NodeGetInfo` works but it brings additional challenges. +Node info is common for all storage classes. +If different drivers need different output in `NodeGetInfo`, they can't coexist. 
+See [node info support notes](./nodeInfo.md) + +## Driver compatibility + +Proxy driver has the following minimal requirements for real underlying drivers: + +- Node methods should not use config values +- - This restriction can be lifted +- - This is added because drivers use only volume context for mounting, and sometimes secrets +- - - There is one exception to this rule, and I would argue that that driver is just broken +- Driver should not need any exotic capabilities, since capabilities are shared +- Driver should use `CreateVolume`, so that proxy can set proper `volume_id` +- Controller publishing is not supported, see [Controller publish support](#controller-publish-support) + +Proxy advertises that it supports most CSI methods. +If some methods are missing from underlying driver, +proxy will throw `INVALID_ARGUMENT` error. +Some methods are expected to be missing from some of the underlying drivers. In such cases proxy returns a default value: + +- `GetCapacity` returns effectively infinite capacity (`Number.MAX_SAFE_INTEGER`) when the underlying driver does not implement it + +## Volume ID format + +- `volume_id` format: `v:connection-name/original-handle` +- `snapshot_id` format: `s:connection-name/original-handle` + +Where: + +- `v`, `s` - fixed prefix +- - Allows checking that the volume ID was created by the proxy driver +- `connection-name` - identifies connection for all CSI calls +- `original-handle` - `volume_id` handle created by the underlying driver + +## Controller publish support + +`ControllerPublishVolume` is not implemented because currently no driver needs it. +Implementation would need to replace `node_id` just like other methods replace `volume_id`. + +See [node info support notes](./nodeInfo.md) + +## Incompatible drivers + +- `zfs-local-ephemeral-inline`: proxy can't set volume_id in `CreateVolume` to identify underlying driver +- - Are inline-ephemeral and standard drivers even compatible? 
+- `objectivefs`: `NodeStageVolume` uses driver parameters +- - `NodeStageVolume` needs `this.options` in `getDefaultObjectiveFSInstance` +- - Other node methods don't need driver options +- - Possible fix: add support for config values for node methods +- - Possible fix: add public pool data into volume attributes, move private data (if any) into a secret + +## Volume cloning and snapshots + +Cloning works without any adjustments when both volumes use the same connection. +If the connection is different: +- TODO: Same driver, same server +- - It's up to driver to add support +- - Support is easy: just need to get proper source location in the CreateVolume +- TODO: Same driver, different servers +- - It's up to driver to add support +- - Example: zfs send-receive +- - Example: file copy between nfs servers +- Different drivers: block <-> file: unlikely to be practical +- - Users should probably do such things manually, by mounting both volumes into a pod +- Different drivers: same filesystem type +- - Drivers should implement generic export and import functions +- - For example: TrueNas -> generic-zfs can theoretically be possible via zfs send +- - For example: nfs -> nfs can theoretically be possible via file copy +- - How to coordinate different drivers? 
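The `volume_id` and `snapshot_id` formats described in this document can be illustrated with a small standalone sketch. The helper names `encodeHandle` and `decodeHandle` are hypothetical and are not part of the driver; the sketch also assumes that connection names never contain `/`, while the original handle may (so only the first `/` is treated as the separator).

```javascript
// Hypothetical helpers illustrating the `v:connection-name/original-handle` format.
// They are not part of democratic-csi; names and error handling are assumptions.
const VOLUME_PREFIX = "v:";
const SNAPSHOT_PREFIX = "s:";

// Build a proxy handle from a connection name and the underlying driver's handle.
function encodeHandle(connectionName, originalHandle, prefix = VOLUME_PREFIX) {
  if (connectionName.includes("/")) {
    throw new Error(`connection name must not contain '/': ${connectionName}`);
  }
  return `${prefix}${connectionName}/${originalHandle}`;
}

// Split a proxy handle back into its connection name and original handle.
function decodeHandle(handle, prefix = VOLUME_PREFIX) {
  if (!handle.startsWith(prefix)) {
    throw new Error(`expected prefix ${prefix} in handle: ${handle}`);
  }
  const rest = handle.slice(prefix.length);
  // Only the first '/' separates the connection name from the original handle.
  const separator = rest.indexOf("/");
  if (separator <= 0) {
    throw new Error(`missing connection name in handle: ${handle}`);
  }
  return {
    connectionName: rest.slice(0, separator),
    originalHandle: rest.slice(separator + 1),
  };
}

const handle = encodeHandle("nas1", "pool/pvc-123");
console.log(handle);
console.log(decodeHandle(handle));
```

Rejecting a handle without a separator up front keeps a malformed ID from being resolved to an empty connection name; whether the proxy should be that strict is a design choice left to the driver.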
diff --git a/src/driver/controller-proxy/index.js b/src/driver/controller-proxy/index.js new file mode 100644 index 0000000..d9177da --- /dev/null +++ b/src/driver/controller-proxy/index.js @@ -0,0 +1,405 @@ +const _ = require("lodash"); +const semver = require("semver"); +const { CsiBaseDriver } = require("../index"); +const yaml = require("js-yaml"); +const fs = require('fs'); +const os = require('os'); +const { Registry } = require("../../utils/registry"); +const { GrpcError, grpc } = require("../../utils/grpc"); +const path = require('path'); + +const volumeIdPrefix = 'v:'; +const snapshotIdPrefix = 's:'; +const NODE_TOPOLOGY_KEY_NAME = "org.democratic-csi.topology/node"; + +class CsiProxyDriver extends CsiBaseDriver { + constructor(ctx, options) { + super(...arguments); + this.options.proxy.configFolder = path.normalize(this.options.proxy.configFolder); + if (this.options.proxy.configFolder.slice(-1) == '/') { + this.options.proxy.configFolder = this.options.proxy.configFolder.slice(0, -1); + } + + // corresponding storage class could be deleted without notice + // let's delete entry from cache after 1 hour, so it can be cleaned by GC + // one hour seems long enough to avoid recreating frequently used drivers + // creating a new instance after long inactive period shouldn't be a problem + const oneMinuteInMs = 1000 * 60; + this.enableCacheTimeout = this.options.proxy.cacheTimeoutMinutes != -1; + this.cacheTimeout = (this.options.proxy.cacheTimeoutMinutes ?? 
60) * oneMinuteInMs; + if (!this.enableCacheTimeout) { + this.ctx.logger.info("driver cache is permanent"); + } else { + this.ctx.logger.info(`driver cache timeout is ${this.cacheTimeout / oneMinuteInMs} minutes`); + } + + options = options || {}; + options.service = options.service || {}; + options.service.identity = options.service.identity || {}; + options.service.controller = options.service.controller || {}; + options.service.node = options.service.node || {}; + + options.service.identity.capabilities = + options.service.identity.capabilities || {}; + + options.service.controller.capabilities = + options.service.controller.capabilities || {}; + + options.service.node.capabilities = options.service.node.capabilities || {}; + + if (!("service" in options.service.identity.capabilities)) { + this.ctx.logger.debug("setting default identity service caps"); + + options.service.identity.capabilities.service = [ + //"UNKNOWN", + "CONTROLLER_SERVICE", + "VOLUME_ACCESSIBILITY_CONSTRAINTS", + ]; + } + + if (!("volume_expansion" in options.service.identity.capabilities)) { + this.ctx.logger.debug("setting default identity volume_expansion caps"); + + options.service.identity.capabilities.volume_expansion = [ + //"UNKNOWN", + "ONLINE", + //"OFFLINE" + ]; + } + + if (!("rpc" in options.service.controller.capabilities)) { + this.ctx.logger.debug("setting default controller caps"); + + options.service.controller.capabilities.rpc = [ + //"UNKNOWN", + "CREATE_DELETE_VOLUME", + //"PUBLISH_UNPUBLISH_VOLUME", + //"LIST_VOLUMES_PUBLISHED_NODES", + // "LIST_VOLUMES", + "GET_CAPACITY", + "CREATE_DELETE_SNAPSHOT", + // "LIST_SNAPSHOTS", + "CLONE_VOLUME", + //"PUBLISH_READONLY", + "EXPAND_VOLUME", + ]; + + if (semver.satisfies(this.ctx.csiVersion, ">=1.3.0")) { + options.service.controller.capabilities.rpc.push( + //"VOLUME_CONDITION", + // "GET_VOLUME" + ); + } + + if (semver.satisfies(this.ctx.csiVersion, ">=1.5.0")) { + options.service.controller.capabilities.rpc.push( + 
"SINGLE_NODE_MULTI_WRITER" + ); + } + } + + if (!("rpc" in options.service.node.capabilities)) { + this.ctx.logger.debug("setting default node caps"); + options.service.node.capabilities.rpc = [ + //"UNKNOWN", + "STAGE_UNSTAGE_VOLUME", + "GET_VOLUME_STATS", + "EXPAND_VOLUME", + //"VOLUME_CONDITION", + ]; + + if (semver.satisfies(this.ctx.csiVersion, ">=1.3.0")) { + //options.service.node.capabilities.rpc.push("VOLUME_CONDITION"); + } + + if (semver.satisfies(this.ctx.csiVersion, ">=1.5.0")) { + options.service.node.capabilities.rpc.push("SINGLE_NODE_MULTI_WRITER"); + /** + * This is for volumes that support a mount time gid such as smb or fat + */ + //options.service.node.capabilities.rpc.push("VOLUME_MOUNT_GROUP"); // in k8s is sent in as the security context fsgroup + } + } + } + + parseVolumeHandle(handle, prefix = volumeIdPrefix) { + if (!handle.startsWith(prefix)) { + throw new GrpcError( + grpc.status.INVALID_ARGUMENT, + `invalid volume handle: ${handle}: expected prefix ${prefix}` + ); + } + handle = handle.substring(prefix.length); + return { + connectionName: handle.substring(0, handle.indexOf('/')), + realHandle: handle.substring(handle.indexOf('/') + 1), + }; + } + + decorateVolumeHandle(connectionName, handle, prefix = volumeIdPrefix) { + return prefix + connectionName + '/' + handle; + } + + // returns real driver object + // internally drivers are cached and deleted on timeout + lookUpConnection(connectionName) { + const configFolder = this.options.proxy.configFolder; + const configPath = configFolder + '/' + connectionName + '.yaml'; + + if (this.cacheTimeout == 0) { + // when timeout is 0, force creating a new driver on each request + return this.createDriverFromFile(configPath); + } + + const driverPlaceholder = { + connectionName: connectionName, + fileTime: 0, + driver: null, + }; + const cachedDriver = this.ctx.registry.get(`controller:driver/connection=${connectionName}`, driverPlaceholder); + if (cachedDriver.timer !== null) { + 
clearTimeout(cachedDriver.timer); + cachedDriver.timer = null; + } + if (this.enableCacheTimeout) { + cachedDriver.timer = setTimeout(() => { + this.ctx.logger.info("removing inactive connection: %s", connectionName); + this.ctx.registry.delete(`controller:driver/connection=${connectionName}`); + cachedDriver.timer = null; + }, this.cacheTimeout); + } + + const fileTime = this.getFileTime(configPath); + if (cachedDriver.fileTime != fileTime) { + this.ctx.logger.debug("connection version is old: file time %d != %d", cachedDriver.fileTime, fileTime); + cachedDriver.fileTime = fileTime; + this.ctx.logger.info("creating a new connection: %s", connectionName); + cachedDriver.driver = this.createDriverFromFile(configPath); + } + return cachedDriver.driver; + } + + getFileTime(path) { + try { + const configFileStats = fs.statSync(path); + this.ctx.logger.debug("file time for '%s' is: %d", path, configFileStats.mtime); + return configFileStats.mtime.getTime(); + } catch (e) { + this.ctx.logger.error("fs.statSync failed: %s", e.toString()); + throw e; + } + } + + createDriverFromFile(configPath) { + const fileOptions = this.createOptionsFromFile(configPath); + const mergedOptions = structuredClone(this.options); + _.merge(mergedOptions, fileOptions); + return this.createRealDriver(mergedOptions); + } + + createOptionsFromFile(configPath) { + this.ctx.logger.debug("loading config: %s", configPath); + try { + return yaml.load(fs.readFileSync(configPath, "utf8")); + } catch (e) { + this.ctx.logger.error("failed parsing config file: %s", e.toString()); + throw e; + } + } + + validateDriverType(driver) { + const unsupportedDrivers = [ + "zfs-local-", + "local-hostpath", + "objectivefs", + "proxy", + ]; + for (const prefix of unsupportedDrivers) { + if (driver.startsWith(prefix)) { + throw new GrpcError( + grpc.status.INVALID_ARGUMENT, + `proxy is not supported for driver: ${driver}` + ); + } + } + } + + createRealDriver(options) { + 
this.validateDriverType(options.driver); + const realContext = Object.assign({}, this.ctx); + realContext.registry = new Registry(); + const realDriver = this.ctx.factory(realContext, options); + if (realDriver.constructor.name == this.constructor.name) { + throw new GrpcError( + grpc.status.INVALID_ARGUMENT, + `cyclic dependency: proxy on proxy` + ); + } + this.ctx.logger.debug("using driver %s", realDriver.constructor.name); + return realDriver; + } + + async checkAndRun(driver, methodName, call, defaultValue) { + if(typeof driver[methodName] !== 'function') { + if (defaultValue) return defaultValue; + // UNIMPLEMENTED could possibly confuse CSI CO into thinking + // that driver does not support methodName at all. + // INVALID_ARGUMENT should allow CO to use methodName with other storage classes. + throw new GrpcError( + grpc.status.INVALID_ARGUMENT, + `underlying driver does not support ` + methodName + ); + } + return await driver[methodName](call); + } + + async controllerRunWrapper(methodName, call, defaultValue) { + const volumeHandle = this.parseVolumeHandle(call.request.volume_id); + const driver = this.lookUpConnection(volumeHandle.connectionName); + call.request.volume_id = volumeHandle.realHandle; + return await this.checkAndRun(driver, methodName, call, defaultValue); + } + + // =========================================== + // Controller methods below + // =========================================== + + async GetCapacity(call) { + const parameters = call.request.parameters; + if (!parameters.connection) { + throw new GrpcError( + grpc.status.INVALID_ARGUMENT, + `connection missing from parameters` + ); + } + const connectionName = parameters.connection; + const driver = this.lookUpConnection(connectionName); + return await this.checkAndRun(driver, 'GetCapacity', call, { + available_capacity: Number.MAX_SAFE_INTEGER, + }); + } + + async CreateVolume(call) { + const parameters = call.request.parameters; + if (!parameters.connection) { + throw new 
GrpcError( + grpc.status.INVALID_ARGUMENT, + `connection missing from parameters` + ); + } + const connectionName = parameters.connection; + const driver = this.lookUpConnection(connectionName); + + switch (call.request.volume_content_source?.type) { + case "snapshot": { + const snapshotHandle = this.parseVolumeHandle(call.request.volume_content_source.snapshot.snapshot_id, snapshotIdPrefix); + if (snapshotHandle.connectionName != connectionName) { + throw new GrpcError( + grpc.status.INVALID_ARGUMENT, + `can not inflate snapshot from a different connection` + ); + } + call.request.volume_content_source.snapshot.snapshot_id = snapshotHandle.realHandle; + break; + } + case "volume": { + const volumeHandle = this.parseVolumeHandle(call.request.volume_content_source.volume.volume_id); + if (volumeHandle.connectionName != connectionName) { + throw new GrpcError( + grpc.status.INVALID_ARGUMENT, + `can not clone volume from a different connection` + ); + } + call.request.volume_content_source.volume.volume_id = volumeHandle.realHandle; + break; + } + case undefined: + case null: + break; + default: + throw new GrpcError( + grpc.status.INVALID_ARGUMENT, + `unknown volume_content_source type: ${call.request.volume_content_source.type}` + ); + } + const result = await this.checkAndRun(driver, 'CreateVolume', call); + this.ctx.logger.debug("CreateVolume result " + result); + result.volume.volume_id = this.decorateVolumeHandle(connectionName, result.volume.volume_id); + return result; + } + + async DeleteVolume(call) { + return await this.controllerRunWrapper('DeleteVolume', call); + } + + async ControllerGetVolume(call) { + return await this.controllerRunWrapper('ControllerGetVolume', call); + } + + async ControllerExpandVolume(call) { + return await this.controllerRunWrapper('ControllerExpandVolume', call); + } + + async CreateSnapshot(call) { + const volumeHandle = this.parseVolumeHandle(call.request.source_volume_id); + const driver = 
this.lookUpConnection(volumeHandle.connectionName); + call.request.source_volume_id = volumeHandle.realHandle; + const result = await this.checkAndRun(driver, 'CreateSnapshot', call); + result.snapshot.source_volume_id = this.decorateVolumeHandle(volumeHandle.connectionName, result.snapshot.source_volume_id); + result.snapshot.snapshot_id = this.decorateVolumeHandle(volumeHandle.connectionName, result.snapshot.snapshot_id, snapshotIdPrefix); + return result; + } + + async DeleteSnapshot(call) { + const volumeHandle = this.parseVolumeHandle(call.request.snapshot_id, snapshotIdPrefix); + const driver = this.lookUpConnection(volumeHandle.connectionName); + call.request.snapshot_id = volumeHandle.realHandle; + return await this.checkAndRun(driver, 'DeleteSnapshot', call); + } + + async ValidateVolumeCapabilities(call) { + return await this.controllerRunWrapper('ValidateVolumeCapabilities', call); + } + + // =========================================== + // Node methods below + // =========================================== + // + // Theoretically, controller setup with config files could be replicated in node deployment, + // and node could create proper drivers for each call. + // But it doesn't seem like node would benefit from this. + // - CsiBaseDriver.NodeStageVolume calls this.assertCapabilities which should be run in the real driver + // but no driver-specific functions or options are used. 
+ // So we can just create an empty driver with default options + // - Other Node* methods don't use anything driver specific + + lookUpNodeDriver(call) { + const driverType = call.request.volume_context.provisioner_driver; + return this.ctx.registry.get(`node:driver/${driverType}`, () => { + const driverOptions = structuredClone(this.options); + driverOptions.driver = driverType; + return this.createRealDriver(driverOptions); + }); + } + + async NodeStageVolume(call) { + const driver = this.lookUpNodeDriver(call); + return await this.checkAndRun(driver, 'NodeStageVolume', call); + } + + async NodeGetInfo(call) { + const nodeName = process.env.CSI_NODE_ID || os.hostname(); + const result = { + node_id: nodeName, + max_volumes_per_node: 0, + }; + result.accessible_topology = { + segments: { + [NODE_TOPOLOGY_KEY_NAME]: nodeName, + }, + }; + return result; + } +} + +module.exports.CsiProxyDriver = CsiProxyDriver; diff --git a/src/driver/controller-proxy/nodeInfo.md b/src/driver/controller-proxy/nodeInfo.md new file mode 100644 index 0000000..5fb5ef8 --- /dev/null +++ b/src/driver/controller-proxy/nodeInfo.md @@ -0,0 +1,120 @@ + +# Node info + +Node info is common for all storage classes. +Proxy driver must report some values that are compatible with all real drivers. + +There are 2 important values: +- topology +- node ID + +There are only 2 types of topology in democratic-csi: +topology without constraints and node-local volumes. +It's easy to account for with proxy settings. + +Node ID is a bit harder to solve, but this page suggests a solution. +Also, currently no real driver actually needs `node_id` to work, +so all of this is mostly a proof-of-concept. +A proof that you can create a functional proxy driver even with current CSI spec. + +We can replace `node_id` with fixed value, just like we do with `volume_id` field, +before calling the actual real driver method. + +Node ID docs are not a part of user documentation because currently this is very theoretical. 
+Current implementation works fine but doesn't do anything useful for users. + +# Node info: config example + +```yaml +# configured in root proxy config +proxy: + nodeId: + parts: + # when value is true, corresponding node info is included into node_id, + # and can be accessed by proxy driver in controller + # it allows you to cut info from node_id to make it shorter + nodeName: true + hostname: false + iqn: false + nqn: false + # prefix allows you to save shorter values into node_id, so it can fit more than one value + # on node prefix is replaced with short name, on controller the reverse [can] happen + nqnPrefix: + - shortName: '1' + prefix: 'nqn.2000-01.com.example.nvmeof:' + - shortName: '2' + prefix: 'nqn.2014-08.org.nvmexpress:uuid:' + iqnPrefix: + - shortName: '1' + prefix: 'iqn.2000-01.com.example:' + nodeTopology: + # 'cluster': all nodes have the same value + # 'node': each node will get its own topology group + type: cluster +``` + +```yaml +# add to each _real_ driver config +proxy: + perDriver: + # allowed values: nodeName, hostname, iqn, nqn + # proxy will use this to decide how to fill node_id for current driver + nodeIdType: nodeName +``` + +# Reasoning why such complex node_id is required + +`node_name + iqn + nqn` can be very long. + +Each of these values can theoretically exceed 200 symbols in length. +It's unreasonable to expect users to always use short values. + +But it's reasonable to expect that IQNs and NQNs in the cluster will have only a few patterns. +Many clusters likely only use one pattern with only a short variable suffix. +Even if not all nodes follow the same pattern, the amount of patterns is limited. + +Saving short suffix allows you to fit all identifiers into node_id without dynamic state. 
+ +Values example: + +- node name: `node-name.cluster-name.customer-name.suffix` +- iqn: `iqn.2000-01.com.example:qwerty1234` +- nqn: `nqn.2014-08.org.nvmexpress:uuid:68f1d462-633b-4085-a634-899b88e5af74` +- node_id: `n=node-name.cluster-name.customer-name.suffix/i1=qwerty1234/v2=68f1d462-633b-4085-a634-899b88e5af74` +- - Note: even with a fairly long node name and default Debian IQN and NQN values this still comfortably fits into the `node_id` length limit of 256 chars. +- - Maybe we could add prefix and suffix mechanism for node name if very long node name is an issue in real production clusters. + I'm not too familiar with managed k8s node name practices. + +For example, if driver needs iqn, proxy will find field in node id starting with `i`, +search `proxy.nodeId.iqnPrefix` for entry with `shortName = 1`, and then set `node_id` to +`proxy.nodeId.iqnPrefix[shortName=1].prefix` + `qwerty1234` + +## Alternatives to prefixes + +Each driver can override `node_id` based on node name. + +Each driver can use template for `node_id` based on node name and/or hostname. + +Config example: + +```yaml +# add to each _real_ driver config +proxy: + perDriver: + # local means that this driver uses node ID template instead of using values from NodeGetInfo + # Individual nodes can use nodeIdMap instead of template. + # Possibly, even all nodes could use nodeIdMap. + nodeIdType: local + nodeIdMap: + - nodeName: node1 + value: nqn.2000-01.com.example:qwerty + - nodeName: node2 + value: nqn.2000-01.com.example:node2 + nodeIdTemplate: iqn.2000-01.com.example:{{ hostname }}:{{ nodeName }}-suffix +``` + +The obvious disadvantage is that it requires a lot more configuration from the user. +Still, if this were to be useful for some reason, this is fully compatible with the current `node_id` format in proxy. + +Theoretically, more info can be extracted from node to be used in `nodeIdTemplate`, +provided the info is short enough to fit into `node_id` length limit. 
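The prefix scheme described in this document can be sketched in a few lines. This is only an illustration under stated assumptions, not the proxy's actual code: the function names `shorten`, `encodeNodeId` and `expandField` are hypothetical, the field tags (`n`, `i`, `v`) are taken from the `node_id` example, and the sketch assumes that neither the node name nor the suffixes contain `/`.

```javascript
// Hypothetical sketch of the node_id prefix scheme; not the proxy's actual code.
// Field tags follow the example node_id: n = node name, i = iqn, v = nqn.
const nodeIdConfig = {
  iqnPrefix: [{ shortName: "1", prefix: "iqn.2000-01.com.example:" }],
  nqnPrefix: [
    { shortName: "1", prefix: "nqn.2000-01.com.example.nvmeof:" },
    { shortName: "2", prefix: "nqn.2014-08.org.nvmexpress:uuid:" },
  ],
};

// Node side: replace a known long prefix with its short name.
function shorten(tag, value, prefixes) {
  const entry = prefixes.find((p) => value.startsWith(p.prefix));
  if (!entry) {
    throw new Error(`no configured prefix matches: ${value}`);
  }
  return `${tag}${entry.shortName}=${value.slice(entry.prefix.length)}`;
}

function encodeNodeId({ nodeName, iqn, nqn }, config) {
  return [
    `n=${nodeName}`,
    shorten("i", iqn, config.iqnPrefix),
    shorten("v", nqn, config.nqnPrefix),
  ].join("/");
}

// Controller side: find the field by its tag and restore the full value.
function expandField(nodeId, tag, prefixes) {
  const field = nodeId.split("/").find((part) => part.startsWith(tag));
  if (!field) {
    throw new Error(`node_id has no field with tag '${tag}'`);
  }
  const eq = field.indexOf("=");
  if (eq === -1) {
    throw new Error(`malformed node_id field: ${field}`);
  }
  const shortName = field.slice(tag.length, eq);
  const entry = prefixes.find((p) => p.shortName === shortName);
  if (!entry) {
    throw new Error(`unknown short name '${shortName}' for tag '${tag}'`);
  }
  return entry.prefix + field.slice(eq + 1);
}

const nodeId = encodeNodeId(
  {
    nodeName: "node-name.cluster-name.customer-name.suffix",
    iqn: "iqn.2000-01.com.example:qwerty1234",
    nqn: "nqn.2014-08.org.nvmexpress:uuid:68f1d462-633b-4085-a634-899b88e5af74",
  },
  nodeIdConfig
);
console.log(nodeId);
console.log(expandField(nodeId, "i", nodeIdConfig.iqnPrefix));
```

Because both sides only need the shared prefix lists from the config, the round trip works without any dynamic state, which is the point of the scheme.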
diff --git a/src/driver/factory.js b/src/driver/factory.js index 0235758..1822af0 100644 --- a/src/driver/factory.js +++ b/src/driver/factory.js @@ -15,9 +15,13 @@ const { ControllerLustreClientDriver } = require("./controller-lustre-client"); const { ControllerObjectiveFSDriver } = require("./controller-objectivefs"); const { ControllerSynologyDriver } = require("./controller-synology"); const { NodeManualDriver } = require("./node-manual"); +const { CsiProxyDriver } = require("./controller-proxy"); function factory(ctx, options) { + ctx.factory = factory; switch (options.driver) { + case "proxy": + return new CsiProxyDriver(ctx, options); case "freenas-nfs": case "freenas-smb": case "freenas-iscsi":