From bb0e3c341b04430053d92e5de8d90d5e9f11a7f8 Mon Sep 17 00:00:00 2001
From: Danil Uzlov <36223296+d-uzlov@users.noreply.github.com>
Date: Wed, 26 Mar 2025 08:34:37 +0000
Subject: [PATCH] proxy: add topology support

---
 docs/proxy-topology.md                  | 176 ++++++++++++++++++
 examples/proxy.yaml                     |  39 +++-
 src/driver/controller-proxy/index.js    | 225 +++++++++++++++++++++++-
 src/driver/controller-proxy/nodeInfo.md |  27 ++-
 4 files changed, 447 insertions(+), 20 deletions(-)
 create mode 100644 docs/proxy-topology.md

diff --git a/docs/proxy-topology.md b/docs/proxy-topology.md
new file mode 100644
index 0000000..cf37111
--- /dev/null
+++ b/docs/proxy-topology.md
@@ -0,0 +1,176 @@
+
+# Proxy driver with topology support
+
+The proxy driver can support storage connections
+that aren't accessible from every node.
+For example, you can specify that connection C1 is only accessible from zone Z1.
+
+See here for general proxy setup: [proxy-driver.md](./proxy-driver.md)
+
+# Topology support in Helm values
+
+In addition to the general proxy values, you need to add extra args for `externalProvisioner`:
+
+```yaml
+csiDriver:
+  name: org.democratic-csi.proxy-topology
+controller:
+  extraVolumes:
+  - name: connections
+    secret:
+      secretName: connections
+  driver:
+    extraVolumeMounts:
+    - name: connections
+      mountPath: /mnt/connections
+  externalProvisioner:
+    extraArgs:
+    - --feature-gates=Topology=true
+    # strict-topology and immediate-topology can be altered,
+    # see the storage class section below, or this link:
+    # https://github.com/kubernetes-csi/external-provisioner#topology-support
+    - --strict-topology=true
+    - --immediate-topology=false
+```
+
+# Topology support in storage connection
+
+Add the following proxy-specific part to your connection config:
+
+```yaml
+# add to each _real_ driver config
+proxy:
+  perDriver:
+    topology:
+    # keys must correspond to proxy.nodeTopology.fromRegexp[*].topologyKey
+    # values must correspond to values reported by nodes
+    - requirements:
+        # use short keys, proxy will automatically add a prefix
+        region: region1
+        zone: zone1
+      # you can add custom node affinity labels
+      # they will be added on top of node requirements
+      # specify full labels here
+      extra:
+        custom.prefix/custom.name: custom.value
+```
+
+The config above does the following:
+- If a PVC is created with zone requirements, the proxy checks them against `proxy.perDriver.topology.requirements` before creating the volume
+- Volumes created for this connection get node affinity for these labels:
+- - `prefix/region: region1`
+- - `prefix/zone: zone1`
+- - `custom.prefix/custom.name: custom.value`
+- Pods consuming this volume can only be scheduled on nodes that have all of these labels
+
+# Topology support in nodes
+
+The proxy driver needs to report the supported topology zones on each node.
+
+Add `proxy.nodeTopology` to your proxy config file to configure topology.
+Check the corresponding example section for available options: [proxy.yaml](../examples/proxy.yaml).
+
+The driver reports node topology based on the list of rules in the config.
+
+If a rule does not match the input, the rule is ignored.
+So, if needed, you can use rules that are only valid on certain nodes.
+
+Ideas for writing rules:
+
+- Encode the zone name in the node name
+- Wait for k8s Downward API support for node labels
+- - Should be alpha in k8s v1.33: https://github.com/kubernetes/kubernetes/issues/40610
+- Inject node labels into environment variables via a webhook: https://kyverno.io/policies/other/mutate-pod-binding/mutate-pod-binding/
+- Deploy a separate node DaemonSet for each zone, with the zone in an environment variable
+- Configure each node: place zone info into a file on the host
+
+# Topology support in storage class
+
+Volume topology is decided during volume creation.
+K8s (or another container orchestration tool) sets requirements,
+and the driver must decide how to satisfy them or decline the request.
+
+In k8s there are 3 main ways to set requirements.
+They are described in more detail, along with alternative options, here:
+https://github.com/kubernetes-csi/external-provisioner#topology-support
+
+1. No requirements. Topology matching during volume creation is disabled.
+
+- Volume creation never fails because of topology.
+- Volume NodeAffinity is based only on the connection config.
+- Pod affinity requirements are ignored.
+
+Deployment requirements:
+- Requires `--immediate-topology=false`.
+- `--strict-topology` does not matter
+- Requires `volumeBindingMode: Immediate`.
+
+Storage class example:
+
+```yaml
+---
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: s1
+provisioner: org.democratic-csi.proxy-topology
+volumeBindingMode: Immediate
+parameters:
+  connection: c1
+```
+
+2. Topology matching is based on the storage class config.
+
+- Requirements are based ONLY on the storage class.
+- Volume creation will fail if the storage class `allowedTopologies` do not match the connection topology config.
+- Pod affinity requirements are ignored.
+
+Deployment requirements:
+- Requires `--strict-topology=false`.
+- Requires `allowedTopologies` to be present.
+- `--immediate-topology` does not matter
+
+Storage class example:
+
+```yaml
+---
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: s1
+provisioner: org.democratic-csi.proxy-topology
+# volumeBindingMode can be either Immediate or WaitForFirstConsumer
+volumeBindingMode: Immediate
+parameters:
+  connection: c1
+allowedTopologies:
+- matchLabelExpressions:
+  - key: org.democratic-csi.topology/zone
+    values:
+    - zone1
+```
+
+3. Topology matching is based on pod scheduling.
+
+- Requirements are based ONLY on the first consumer pod.
+- The volume is allocated in the zone the first pod is scheduled to.
+- Volume creation will fail if the pod's node does not match the connection config.
+
+Deployment requirements:
+- Requires `--strict-topology=true`.
+- Requires `volumeBindingMode: WaitForFirstConsumer`.
+- `--immediate-topology` does not matter
+
+Storage class example:
+
+```yaml
+---
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: s1
+provisioner: org.democratic-csi.proxy-topology
+volumeBindingMode: WaitForFirstConsumer
+parameters:
+  connection: c1
+```
diff --git a/examples/proxy.yaml b/examples/proxy.yaml
index 1084b13..b546374 100644
--- a/examples/proxy.yaml
+++ b/examples/proxy.yaml
@@ -10,11 +10,40 @@ proxy:
   # when timeout is -1, cache timeout is disabled, drivers are cached forever
   # default: 60
   cacheTimeoutMinutes: 60
-  # Node topology defines isolated groups of nodes
-  # local drivers need node topology
-  # All other drivers do not depend on topology, and will work fine with simpler cluster topology
   nodeTopology:
-    # 'cluster': the whole cluster has unified storage
-    # 'node': each node has its own storage. Required for 'local-' drivers
+    # allowed values:
+    # node - each node has its own storage
+    # cluster - the whole cluster has unified storage (usually the case with external NAS systems)
+    # zone - there are several custom zones, each with its own storage
     # default: cluster
     type: cluster
+    # topology reported by CSI driver is reflected in k8s as node labels.
+    # you may want to set unique prefixes on different drivers to avoid collisions
+    prefix: org.democratic-csi.topology
+    fromRegexp:
+    # resulting topology looks like this:
+    # ${ prefix }/${ fromRegexp[*].topologyKey } == ${ fromRegexp[*].template }
+    - topologyKey: zone
+      # possible sources:
+      # - nodeName
+      # - hostname
+      # - env
+      # - file
+      source: nodeName
+      # envName is used only when "source: env"
+      envName: DEMOCRATIC_CSI_ZONE
+      # file is used only when "source: file"
+      # file must be mounted into container filesystem manually
+      file: /mnt/topology/zone
+      # regex can:
+      # - be exact: "regex: my-node-1.domain" (though technically this is still regex)
+      # - use regex: "regex: .*\.domain"
+      # - use capture groups: "regex: .*\.(zone-.*)\.domain"
+      # Partial matches are not allowed: driver implicitly wraps regex in ^ and $.
+      regex: my-node-1.domain
+      # template can:
+      # - be exact: zone-1
+      # - use values from capture groups: zone-{match:1}
+      # - - {match:0} contains the whole input
+      # - - {match:1} contains the first capture group, and so on
+      template: zone1
diff --git a/src/driver/controller-proxy/index.js b/src/driver/controller-proxy/index.js
index 805212c..2dcec80 100644
--- a/src/driver/controller-proxy/index.js
+++ b/src/driver/controller-proxy/index.js
@@ -23,6 +23,7 @@ class CsiProxyDriver extends CsiBaseDriver {
       configFolder = configFolder.slice(0, -1);
     }
     this.nodeIdSerializer = new NodeIdSerializer(ctx, options.proxy.nodeId || {});
+    this.nodeTopologyGenerator = new NodeTopologyGenerator(ctx, options.proxy.nodeTopology || {});
     const timeoutMinutes = this.options.proxy.cacheTimeoutMinutes ?? 60;
 
     const defaultOptions = this.options;
@@ -164,6 +165,99 @@ class CsiProxyDriver extends CsiBaseDriver {
     return await this.checkAndRun(driver, methodName, call, defaultValue);
   }
 
+  checkTopologyRequirement(segments, driverTopologies) {
+    for (let i = 0; i < driverTopologies.length; i++) {
+      let matches = true;
+
+      const requirements = driverTopologies[i].requirements;
+      if (!requirements) {
+        throw new GrpcError(
+          grpc.status.INVALID_ARGUMENT,
+          `requirements is missing from proxy.perDriver.topology[${i}]`
+        );
+      }
+      for (const reqKey in segments) {
+        const connectionZone = requirements[reqKey];
+        if (!connectionZone) {
+          // If some part of topology is not specified in connection,
+          // then assume this part is not important.
+          // For example, node can report node name and zone as topology.
+          // Then requirements will include both zone and node name.
+          // But connection specifies only zone, and that's okay.
+          continue;
+        }
+        const reqZone = segments[reqKey];
+        if (connectionZone != reqZone) {
+          matches = false;
+          break;
+        }
+      }
+      if (matches) {
+        // this driver topology satisfies req
+        return true;
+      }
+    }
+    // we didn't find any driver topology that would match req
+    return false;
+  }
+
+  // returns (required_topology < driver_topology)
+  // returns true if driver does not specify topology
+  checkTopology(call, driver) {
+    const requirements = call.request.accessibility_requirements?.requisite;
+    if (!requirements) {
+      return true;
+    }
+    const driverTopologies = driver.options.proxy?.perDriver?.topology;
+    if (!driverTopologies) {
+      return true;
+    }
+
+    for (let reqI = 0; reqI < requirements.length; reqI++) {
+      const req = requirements[reqI];
+      const segments = this.nodeTopologyGenerator.stripPrefixFromMap(req.segments);
+      const reqMatches = this.checkTopologyRequirement(segments, driverTopologies);
+
+      // this req does not match any topology from the connection
+      // it doesn't make sense to check any remaining requirements
+      if (!reqMatches) {
+        this.ctx.logger.debug(`failed topology check: ${JSON.stringify(segments)} is not in ${JSON.stringify(driverTopologies)}`);
+        throw new GrpcError(
+          grpc.status.INVALID_ARGUMENT,
+          `topology is not accessible for this connection: ${JSON.stringify(segments)}`
+        );
+      }
+    }
+
+    return true;
+  }
+
+  decorateTopology(volume, driver) {
+    const driverTopologies = driver.options.proxy?.perDriver?.topology;
+    if (!driverTopologies) {
+      return;
+    }
+    volume.accessible_topology = [];
+    for (let i = 0; i < driverTopologies.length; i++) {
+      const requirements = driverTopologies[i].requirements;
+      if (!requirements) {
+        throw new GrpcError(
+          grpc.status.INVALID_ARGUMENT,
+          `requirements is missing from proxy.perDriver.topology[${i}]`
+        );
+      }
+      const segments = {};
+      for (const k in requirements) {
+        const topologyKey = this.nodeTopologyGenerator.addPrefix(k);
+        segments[topologyKey] = requirements[k];
+      }
+      for (const k in driverTopologies[i].extra) {
+        segments[k] = driverTopologies[i].extra[k];
+      }
+      volume.accessible_topology.push({ segments: segments });
+    }
+  }
+
   // ===========================================
   // Controller methods below
   // ===========================================
@@ -194,6 +288,14 @@ class CsiProxyDriver extends CsiBaseDriver {
     const connectionName = parameters.connection;
     const driver = this.driverCache.lookUpConnection(connectionName);
 
+    const topologyOK = this.checkTopology(call, driver);
+    if (!topologyOK) {
+      throw new GrpcError(
+        grpc.status.INVALID_ARGUMENT,
+        `topology is not accessible for this connection`
+      );
+    }
+
     switch (call.request.volume_content_source?.type) {
       case "snapshot": {
         const snapshotHandle = this.parseVolumeHandle(call.request.volume_content_source.snapshot.snapshot_id, snapshotIdPrefix);
@@ -228,6 +330,7 @@ class CsiProxyDriver extends CsiBaseDriver {
     }
     const result = await this.checkAndRun(driver, 'CreateVolume', call);
     result.volume.volume_id = this.decorateVolumeHandle(connectionName, result.volume.volume_id);
+    this.decorateTopology(result.volume, driver);
     return result;
   }
 
@@ -236,7 +339,13 @@ class CsiProxyDriver extends CsiBaseDriver {
   }
 
   async ControllerGetVolume(call) {
-    return await this.controllerRunWrapper('ControllerGetVolume', call);
+    const volumeHandle = this.parseVolumeHandle(call.request.volume_id);
+    const driver = this.driverCache.lookUpConnection(volumeHandle.connectionName);
+    call.request.volume_id = volumeHandle.realHandle;
+    const result = await this.checkAndRun(driver, 'ControllerGetVolume', call);
+    result.volume.volume_id = this.decorateVolumeHandle(volumeHandle.connectionName, result.volume.volume_id);
+    this.decorateTopology(result.volume, driver);
+    return result;
   }
 
   async ControllerExpandVolume(call) {
@@ -314,6 +423,11 @@ class CsiProxyDriver extends CsiBaseDriver {
           },
         };
         break
+      case 'zone':
+        result.accessible_topology = {
+          segments: this.nodeTopologyGenerator.generate(),
+        };
+        break
       default:
         throw new GrpcError(
           grpc.status.INVALID_ARGUMENT,
@@ -627,4 +741,113 @@ class NodeIdSerializer {
   }
 }
 
+class NodeTopologyGenerator {
+  constructor(ctx, config) {
+    this.ctx = ctx;
+    this.config = config || {};
+    this.prefix = this.config.prefix || TOPOLOGY_DEFAULT_PREFIX;
+    this.config.fromRegexp = this.config.fromRegexp || [];
+  }
+
+  generate() {
+    const result = {};
+    for (const e of this.config.fromRegexp) {
+      if (!e.topologyKey) {
+        throw new GrpcError(
+          grpc.status.INVALID_ARGUMENT,
+          `topologyKey is missing`
+        );
+      }
+      const value = this.getValueFromSource(e);
+      const re = '^' + (e.regex ?? "(.*)") + '$';
+      const regex = new RegExp(re);
+      const match = regex.exec(value);
+      if (match === null) {
+        this.ctx.logger.debug(
+          `${e.source}: ${this.getNameFromSource(e)} value ${value} does not match regex ${re}, rule ignored`
+        );
+        continue;
+      }
+      const key = this.prefix + '/' + e.topologyKey;
+      const template = e.template ?? '{match:0}';
+      result[key] = this.fillMatches(template, match);
+    }
+    return result;
+  }
+
+  addPrefix(key) {
+    return this.prefix + '/' + key;
+  }
+
+  stripPrefix(key) {
+    if (!key.startsWith(this.prefix)) {
+      throw new GrpcError(
+        grpc.status.INVALID_ARGUMENT,
+        `topology key ${key} does not match prefix ${this.prefix}`
+      );
+    }
+    // remove prefix and '/'
+    return key.slice(this.prefix.length + 1);
+  }
+
+  // checks that each key in req starts with prefix
+  // returns map with short keys
+  stripPrefixFromMap(segments) {
+    const result = {};
+    for (const key in segments) {
+      if (!key.startsWith(this.prefix)) {
+        // since topology is generated in proxy node with the same config,
+        // we expect that topology prefix will always be the same
+        throw new GrpcError(
+          grpc.status.INVALID_ARGUMENT,
+          `topology key ${key} does not match prefix ${this.prefix}`
+        );
+      }
+      const strippedKey = this.stripPrefix(key);
+      result[strippedKey] = segments[key];
+    }
+    return result;
+  }
+
+  // return string value of resource referenced by e
+  getValueFromSource(e) {
+    const type = e.source;
+    switch (type) {
+      case 'hostname': return os.hostname();
+      case 'nodeName': return process.env.CSI_NODE_ID;
+      case 'env': return process.env[e.envName];
+      case 'file': return fs.readFileSync(e.file, "utf8").trim();
+      default:
+        throw new GrpcError(
+          grpc.status.INVALID_ARGUMENT,
+          `unknown node topology source type: ${type}`
+        );
+    }
+  }
+  // return resource name for error logs
+  getNameFromSource(e) {
+    const type = e.source;
+    switch (type) {
+      case 'hostname': return '';
+      case 'nodeName': return '';
+      case 'env': return e.envName;
+      case 'file': return e.file;
+      default:
+        throw new GrpcError(
+          grpc.status.INVALID_ARGUMENT,
+          `unknown node topology source type: ${type}`
+        );
+    }
+  }
+
+  fillMatches(template, matches) {
+    let result = template;
+    for (let i = 0; i < matches.length; i++) {
+      const ref = `{match:${i}}`;
+      result = result.replaceAll(ref, matches[i] ?? '');
+    }
+    return result;
+  }
+}
+
 module.exports.CsiProxyDriver = CsiProxyDriver;
diff --git a/src/driver/controller-proxy/nodeInfo.md b/src/driver/controller-proxy/nodeInfo.md
index ab65d1b..01146db 100644
--- a/src/driver/controller-proxy/nodeInfo.md
+++ b/src/driver/controller-proxy/nodeInfo.md
@@ -125,20 +125,19 @@ There are 3 cases of cluster topology:
 - Each node has unique topology domain (`local` drivers)
 - All nodes are the same (usually the case for non-local drivers)
 - Several availability zones that can contain several nodes
+- - https://github.com/democratic-csi/democratic-csi/issues/459
 
-Simple cases are currently supported by the proxy.
-Custom availability zones are TBD.
+There are 2 components to this:
+1. The node driver must correctly report its availability zone.
+2. The controller must set the required zone labels on the volume.
 
-Example configuration:
+Since the proxy driver may work with drivers from different availability zones,
+it requires a config to distinguish zones.
 
-```yaml
-proxy:
-  nodeTopology:
-    # allowed values:
-    # node - each node has its own storage
-    # cluster - the whole cluster has unified storage
-    type: node
-    # topology reported by CSI driver is reflected in k8s as node labels.
-    # you may want to set unique prefixes on different drivers to avoid collisions
-    prefix: org.democratic-csi.topology
-```
+## Custom topology: controller driver
+
+The only thing needed from the controller is to set topology requirements when a volume is created.
+
+The proxy sets these constraints from the connection config; no other configuration is required.
+
+Different connections can have different topologies.
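To sanity-check a `fromRegexp` rule and a connection's `requirements` before deploying them, the matching logic from `NodeTopologyGenerator.generate()` and `checkTopologyRequirement()` can be exercised outside the driver. The sketch below is a minimal standalone approximation and is not part of the patch. `evaluateRule` and `requirementMatches` are hypothetical helper names. The input string stands in for whatever the configured `source` would return. A non-matching rule yields `null`, which follows the documented "rule is ignored" behavior. `String.prototype.replaceAll` assumes Node.js 15 or later.

```javascript
// Standalone sketch approximating the patch's node-topology rule evaluation
// and connection matching. evaluateRule/requirementMatches are hypothetical
// helpers, not exports of democratic-csi.

// Replace {match:N} placeholders with the corresponding regex capture groups.
function fillMatches(template, match) {
  let result = template;
  for (let i = 0; i < match.length; i++) {
    result = result.replaceAll(`{match:${i}}`, match[i] ?? "");
  }
  return result;
}

// Evaluate one fromRegexp rule against an input value (e.g. a node name).
// The regex is wrapped in ^...$, so only full matches count.
// Returns null when the rule does not match (the rule is ignored).
function evaluateRule(prefix, rule, input) {
  const re = new RegExp("^" + (rule.regex ?? "(.*)") + "$");
  const match = re.exec(input);
  if (match === null) {
    return null;
  }
  const key = `${prefix}/${rule.topologyKey}`;
  return { [key]: fillMatches(rule.template ?? "{match:0}", match) };
}

// A requested topology (short keys, prefix already stripped) matches a
// connection's requirements if every key the connection specifies has the
// same value; keys the connection does not mention are ignored.
function requirementMatches(segments, connectionRequirements) {
  for (const key of Object.keys(segments)) {
    const expected = connectionRequirements[key];
    if (expected === undefined) {
      continue;
    }
    if (expected !== segments[key]) {
      return false;
    }
  }
  return true;
}

// Usage: derive a zone label from a node name, then check it against a connection.
const prefix = "org.democratic-csi.topology";
const rule = { topologyKey: "zone", regex: ".*\\.(zone-.*)\\.domain", template: "{match:1}" };
console.log(evaluateRule(prefix, rule, "node-a.zone-1.domain"));
// { 'org.democratic-csi.topology/zone': 'zone-1' }
console.log(evaluateRule(prefix, rule, "node-b.other.example")); // null
console.log(requirementMatches({ zone: "zone-1", hostname: "node-a" }, { zone: "zone-1" })); // true
console.log(requirementMatches({ zone: "zone-2" }, { zone: "zone-1" })); // false
```

Anchoring the regex on both ends means a rule such as `zone-\d` will not accidentally match a longer node name. Ignoring keys that the connection omits is what allows nodes to report extra topology keys (for example a node name) without breaking zone-only connections.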