proxy: add topology support
parent cf51e5c0ba
commit bb0e3c341b
				|  | @ -0,0 +1,176 @@ | |||
| 
 | ||||
| # Proxy driver with topology support | ||||
| 
 | ||||
| The proxy driver can support storage connections | ||||
| that aren't accessible from every node. | ||||
| For example, you can specify that connection C1 is only accessible from zone Z1. | ||||
| 
 | ||||
| See here for general proxy setup: [proxy-driver.md](./proxy-driver.md) | ||||
| 
 | ||||
| # Topology support in Helm values | ||||
| 
 | ||||
| In addition to general proxy values you need to add extra args for `externalProvisioner`: | ||||
| 
 | ||||
| ```yaml | ||||
| csiDriver: | ||||
|   name: org.democratic-csi.proxy-topology | ||||
| controller: | ||||
|   extraVolumes: | ||||
|   - name: connections | ||||
|     secret: | ||||
|       secretName: connections | ||||
|   driver: | ||||
|     extraVolumeMounts: | ||||
|     - name: connections | ||||
|       mountPath: /mnt/connections | ||||
|   externalProvisioner: | ||||
|     extraArgs: | ||||
|     - --feature-gates=Topology=true | ||||
|     # strict-topology and immediate-topology can be altered, | ||||
|     # see below in storage class description or in this link | ||||
|     # https://github.com/kubernetes-csi/external-provisioner#topology-support | ||||
|     - --strict-topology=true | ||||
|     - --immediate-topology=false | ||||
| ``` | ||||
| 
 | ||||
| # Topology support in storage connection | ||||
| 
 | ||||
| Add the following proxy-specific part into your connection config: | ||||
| 
 | ||||
| ```yaml | ||||
| # add to each _real_ driver config | ||||
| proxy: | ||||
|   perDriver: | ||||
|     topology: | ||||
|     # keys must correspond to proxy.nodeTopology.fromRegexp[*].topologyKey | ||||
|     # values must correspond to values reported by nodes | ||||
|     - requirements: | ||||
|         # use short keys, proxy will automatically add a prefix | ||||
|         region: region1 | ||||
|         zone: zone1 | ||||
|       # you can add custom node affinity labels | ||||
|       # they will be added on top of node requirements | ||||
|       # specify full labels here | ||||
|       extra: | ||||
|         custom.prefix/custom.name: custom.value | ||||
| ``` | ||||
| 
 | ||||
| The config above does the following: | ||||
| - If a PVC is created with zone requirements, the proxy checks them against `proxy.perDriver.topology[*].requirements` before creating the volume | ||||
| - Volumes created for this connection get node affinity for the labels: | ||||
| - - `prefix/region: region1` | ||||
| - - `prefix/zone: zone1` | ||||
| - - `custom.prefix/custom.name: custom.value` | ||||
| - Pods consuming this volume will be schedulable only on nodes having all of these labels | ||||
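As a rough sketch of how the labels listed above could be derived from the connection config: short keys under `requirements` get the topology prefix, while `extra` labels are copied as-is. The helper name `buildSegments` is hypothetical and only illustrates the idea; the prefix value is the default from the example proxy config.

```javascript
// Sketch only: buildSegments is a hypothetical helper, not part of the driver API.
function buildSegments(prefix, topology) {
  const segments = {};
  // Short requirement keys get the configured topology prefix.
  for (const [key, value] of Object.entries(topology.requirements ?? {})) {
    segments[`${prefix}/${key}`] = value;
  }
  // Extra labels are full label names and are copied verbatim.
  for (const [key, value] of Object.entries(topology.extra ?? {})) {
    segments[key] = value;
  }
  return segments;
}

const segments = buildSegments("org.democratic-csi.topology", {
  requirements: { region: "region1", zone: "zone1" },
  extra: { "custom.prefix/custom.name": "custom.value" },
});
console.log(segments);
```

Each entry of `proxy.perDriver.topology` would produce one such segment map, so a connection reachable from several zones ends up with several alternatives.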
| 
 | ||||
| # Topology support in nodes | ||||
| 
 | ||||
| Proxy driver needs to be able to report supported topology zones on each node. | ||||
| 
 | ||||
| Add `proxy.nodeTopology` to your proxy config file to configure topology. | ||||
| Check corresponding example section for available options: [proxy.yaml](../examples/proxy.yaml). | ||||
| 
 | ||||
| Driver reports node topology based on the list of rules in the config. | ||||
| 
 | ||||
| If some rule does not match the input, the rule is ignored. | ||||
| So, if needed, you can use rules that are only valid on certain nodes. | ||||
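To make the rule matching more concrete, here is a minimal sketch of evaluating a single rule against an input string such as the node name. The function `applyRule` is hypothetical, and the field names (`matchRegexp`, `resultTemplate`) and the `${match:N}` placeholder syntax are assumptions taken from the example config rather than a description of the driver's internals.

```javascript
// Sketch only: applyRule is a hypothetical helper; field names follow the example config.
function applyRule(rule, input) {
  // Anchor the pattern so that partial matches are rejected.
  const match = new RegExp(`^(?:${rule.matchRegexp})$`).exec(input);
  if (match === null) {
    return null; // the rule does not apply to this input
  }
  // Replace ${match:N} with capture group N (group 0 is the whole input).
  return rule.resultTemplate.replace(
    /\$\{match:(\d+)\}/g,
    (placeholder, n) => match[Number(n)] ?? ""
  );
}

const rule = {
  matchRegexp: ".*\\.(zone-.*)\\.domain",
  resultTemplate: "${match:1}",
};
console.log(applyRule(rule, "node-1.zone-a.domain"));
console.log(applyRule(rule, "node-2.other"));
```

Returning `null` for a non-matching input is one way to model "the rule is ignored": a caller can skip such rules and keep only the keys that produced a value.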
| 
 | ||||
| Ideas for writing rules: | ||||
| 
 | ||||
| - Encode zone name in the node name | ||||
| - Wait for k8s DownwardAPI for node labels | ||||
| - - Should be alpha in k8s v1.33: https://github.com/kubernetes/kubernetes/issues/40610 | ||||
| - Inject node labels into environment variables via a webhook: https://kyverno.io/policies/other/mutate-pod-binding/mutate-pod-binding/ | ||||
| - Deploy a separate node DaemonSet for each zone, with zone in an environment variable | ||||
| - Configure each node: place zone info into a file on host | ||||
| 
 | ||||
| # Topology support in storage class | ||||
| 
 | ||||
| Topology of the node is decided during volume creation. | ||||
| K8s (or another container orchestration tool) sets requirements, | ||||
| and driver must decide how to satisfy them or decline the request. | ||||
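The accept-or-decline decision can be illustrated with a small sketch. It assumes the requested segments already use short keys (prefix removed) and that any key the connection does not mention is treated as unimportant. The helper name `satisfies` is hypothetical.

```javascript
// Sketch only: satisfies() is a hypothetical helper illustrating the matching idea.
function satisfies(requested, connectionTopologies) {
  // The request is acceptable if at least one connection topology matches it.
  return connectionTopologies.some((topology) =>
    Object.entries(requested).every(([key, value]) => {
      const allowed = topology.requirements[key];
      // Keys the connection does not specify are treated as "don't care".
      return allowed === undefined || allowed === value;
    })
  );
}

const connectionTopologies = [{ requirements: { zone: "zone1" } }];
console.log(satisfies({ zone: "zone1", node: "node-a" }, connectionTopologies));
console.log(satisfies({ zone: "zone2" }, connectionTopologies));
```

In this sketch a request that names an extra key, such as a node name, still passes as long as every key the connection does constrain agrees with the request.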
| 
 | ||||
| In k8s there are 3 main ways to set requirements. | ||||
| They are described in more detail, along with alternative options, here: | ||||
| https://github.com/kubernetes-csi/external-provisioner#topology-support | ||||
| 
 | ||||
| 1. No requirements. Topology matching during volume creation is disabled. | ||||
| 
 | ||||
| - Volume creation will never fail. | ||||
| - NodeAffinity for volume is based on connection config only. | ||||
| - Pod affinity requirements are ignored. | ||||
| 
 | ||||
| Deployment requirements: | ||||
| - Requires `--immediate-topology=false`. | ||||
| - `--strict-topology` does not matter | ||||
| - Requires `volumeBindingMode: Immediate`. | ||||
| 
 | ||||
| Storage class example: | ||||
| 
 | ||||
| ```yaml | ||||
| --- | ||||
| apiVersion: storage.k8s.io/v1 | ||||
| kind: StorageClass | ||||
| metadata: | ||||
|   name: s1 | ||||
| provisioner: org.democratic-csi.proxy-topology | ||||
| volumeBindingMode: Immediate | ||||
| parameters: | ||||
|   connection: c1 | ||||
| ``` | ||||
| 
 | ||||
| 2. Topology matching is based on storage class config. | ||||
| 
 | ||||
| - Requirements are based ONLY on Storage Class. | ||||
| - Volume creation will fail if: storage class parameters do not match connection config parameters. | ||||
| - Pod affinity requirements are ignored. | ||||
| 
 | ||||
| Deployment requirements: | ||||
| - Requires `--strict-topology=false`. | ||||
| - Requires `allowedTopologies` to be present. | ||||
| - `--immediate-topology` does not matter | ||||
| 
 | ||||
| Storage class example: | ||||
| 
 | ||||
| ```yaml | ||||
| --- | ||||
| apiVersion: storage.k8s.io/v1 | ||||
| kind: StorageClass | ||||
| metadata: | ||||
|   name: s1 | ||||
| provisioner: org.democratic-csi.proxy-topology | ||||
| # volumeBindingMode can be either Immediate or WaitForFirstConsumer | ||||
| volumeBindingMode: Immediate | ||||
| parameters: | ||||
|   connection: c1 | ||||
| allowedTopologies: | ||||
| - matchLabelExpressions: | ||||
|   - key: org.democratic-csi.topology/zone | ||||
|     values: | ||||
|     - zone1 | ||||
| ``` | ||||
| 
 | ||||
| 3. Topology matching is based on pod scheduling. | ||||
| 
 | ||||
| - Requirements are based ONLY on the first consumer pod. | ||||
| - Volume is allocated in the zone that the first pod is scheduled to | ||||
| - Volume creation will fail if: pod node does not match connection config. | ||||
| 
 | ||||
| Deployment requirements: | ||||
| - Requires `--strict-topology=true`. | ||||
| - Requires `volumeBindingMode: WaitForFirstConsumer`. | ||||
| - `--immediate-topology` does not matter | ||||
| 
 | ||||
| Storage class example: | ||||
| 
 | ||||
| ```yaml | ||||
| --- | ||||
| apiVersion: storage.k8s.io/v1 | ||||
| kind: StorageClass | ||||
| metadata: | ||||
|   name: s1 | ||||
| provisioner: org.democratic-csi.proxy-topology | ||||
| volumeBindingMode: WaitForFirstConsumer | ||||
| parameters: | ||||
|   connection: c1 | ||||
| ``` | ||||
|  | @ -10,11 +10,40 @@ proxy: | |||
|   # when timeout is -1, cache timeout is disabled, drivers are cached forever | ||||
|   # default: 60 | ||||
|   cacheTimeoutMinutes: 60 | ||||
|   # Node topology defines isolated groups of nodes | ||||
|   # local drivers need node topology | ||||
|   # All other drivers do not depend on topology, and will work fine with simpler cluster topology | ||||
|   nodeTopology: | ||||
|     # 'cluster': the whole cluster has unified storage | ||||
|     # 'node': each node has its own storage. Required for 'local-' drivers | ||||
|     # allowed values: | ||||
|     # node - each node has its own storage | ||||
|     # cluster - the whole cluster has unified storage (usually the case with external NAS systems) | ||||
|     # custom - there are several custom zones with internal storage | ||||
|     # default: cluster | ||||
|     type: cluster | ||||
|     # topology reported by CSI driver is reflected in k8s as node labels. | ||||
|     # you may want to set unique prefixes on different drivers to avoid collisions | ||||
|     prefix: org.democratic-csi.topology | ||||
|     customRules: | ||||
|     # resulting topology looks like this: | ||||
|     # ${ prefix }/${ customRules[*].keySuffix } == ${ customRules[*].resultTemplate } | ||||
|     - keySuffix: zone | ||||
|       # possible sources: | ||||
|       # - nodeName | ||||
|       # - hostname | ||||
|       # - env | ||||
|       # - file | ||||
|       source: nodeName | ||||
|       # envName is used only when "source: env" | ||||
|       envName: DEMOCRATIC_CSI_REGION | ||||
|       # file is used only when "source: file" | ||||
|       # file must be mounted into container filesystem manually | ||||
|       file: /mnt/topology/region | ||||
|       # match can: | ||||
|       # - be exact: "matchRegexp: my-node-1.domain" (though technically this is still regex) | ||||
|       # - use regex: "matchRegexp: .*\.domain" | ||||
|       # - use capture groups: "matchRegexp: .*\.(zone-.*)\.domain" | ||||
|       # Partial matches are not allowed: the driver implicitly wraps the regex in ^ and $. | ||||
|       matchRegexp: my-node-1.domain | ||||
|       # result template can: | ||||
|       # - be exact: zone-1 | ||||
|       # - use values from capture groups: zone-${match:1} | ||||
|       # - - ${match:0} contains the whole input | ||||
|       # - - ${match:1} contains the first capture group, and so on | ||||
|       resultTemplate: zone1 | ||||
|  |  | |||
|  | @ -23,6 +23,7 @@ class CsiProxyDriver extends CsiBaseDriver { | |||
|       configFolder = configFolder.slice(0, -1); | ||||
|     } | ||||
|     this.nodeIdSerializer = new NodeIdSerializer(ctx, options.proxy.nodeId || {}); | ||||
|     this.nodeTopologyGenerator = new NodeTopologyGenerator(ctx, options.proxy.nodeTopology || {}); | ||||
| 
 | ||||
|     const timeoutMinutes = this.options.proxy.cacheTimeoutMinutes ?? 60; | ||||
|     const defaultOptions = this.options; | ||||
|  | @ -164,6 +165,99 @@ class CsiProxyDriver extends CsiBaseDriver { | |||
|     return await this.checkAndRun(driver, methodName, call, defaultValue); | ||||
|   } | ||||
| 
 | ||||
|   checkTopologyRequirement(segments, driverTopologies) { | ||||
|     for (let i = 0; i < driverTopologies.length; i++) { | ||||
|       let matches = true; | ||||
| 
 | ||||
|       const requirements = driverTopologies[i].requirements; | ||||
|       if (!requirements) { | ||||
|         throw new GrpcError( | ||||
|           grpc.status.INVALID_ARGUMENT, | ||||
|           `requirements is missing from proxy.perDriver.topology[${i}]` | ||||
|         ); | ||||
|       } | ||||
|       for (const reqKey in segments) { | ||||
|         const connectionZone = requirements[reqKey]; | ||||
|         if (!connectionZone) { | ||||
|           // If some part of topology is not specified in connection,
 | ||||
|           // then assume this part is not important.
 | ||||
|           // For example, node can report node name and zone as topology.
 | ||||
|           // Then requirements will include both zone and node name.
 | ||||
|           // But connection specifies only zone, and that's okay.
 | ||||
|           continue; | ||||
|         } | ||||
|         const reqZone = segments[reqKey]; | ||||
|         if (connectionZone != reqZone) { | ||||
|           matches = false; | ||||
|           break; | ||||
|         } | ||||
|       } | ||||
|       if (matches) { | ||||
|         // this driver topology satisfies req
 | ||||
|         return true; | ||||
|       } | ||||
|     } | ||||
|     // we didn't find any driver topology that would match req
 | ||||
|     return false; | ||||
|   } | ||||
| 
 | ||||
|   // returns (required_topology < driver_topology)
 | ||||
|   // returns true if the driver does not specify topology
 | ||||
|   checkTopology(call, driver) { | ||||
|     const requirements = call.request.accessibility_requirements?.requisite; | ||||
|     if (!requirements) { | ||||
|       return true; | ||||
|     } | ||||
|     const driverTopologies = driver.options.proxy?.perDriver?.topology; | ||||
|     if (!driverTopologies) { | ||||
|       return true; | ||||
|     } | ||||
| 
 | ||||
|     for (let reqI = 0; reqI < requirements.length; reqI++) { | ||||
|       const req = requirements[reqI]; | ||||
|       const segments = this.nodeTopologyGenerator.stripPrefixFromMap(req.segments); | ||||
|       const reqMatches = this.checkTopologyRequirement(segments, driverTopologies); | ||||
| 
 | ||||
|       // this req does not match any topology from the connection
 | ||||
|       // it doesn't make sense to check any remaining requirements
 | ||||
|       if (!reqMatches) { | ||||
|         this.ctx.logger.debug(`failed topology check: ${JSON.stringify(segments)} is not in ${JSON.stringify(driverTopologies)}`); | ||||
|         throw new GrpcError( | ||||
|           grpc.status.INVALID_ARGUMENT, | ||||
|           `topology is not accessible for this connection: ${JSON.stringify(segments)}` | ||||
|         ); | ||||
|       } | ||||
|     } | ||||
| 
 | ||||
|     return true; | ||||
|   } | ||||
| 
 | ||||
|   decorateTopology(volume, driver) { | ||||
|     const driverTopologies = driver.options.proxy?.perDriver?.topology; | ||||
|     if (!driverTopologies) { | ||||
|       return; | ||||
|     } | ||||
|     volume.accessible_topology = []; | ||||
|     for (let i = 0; i < driverTopologies.length; i++) { | ||||
|       const requirements = driverTopologies[i].requirements; | ||||
|       if (!requirements) { | ||||
|         throw new GrpcError( | ||||
|           grpc.status.INVALID_ARGUMENT, | ||||
|           `requirements is missing from proxy.perDriver.topology[${i}]` | ||||
|         ); | ||||
|       } | ||||
|       const segments = {}; | ||||
|       for (const k in requirements) { | ||||
|         const topologyKey = this.nodeTopologyGenerator.addPrefix(k); | ||||
|         segments[topologyKey] = requirements[k]; | ||||
|       } | ||||
|       for (const k in driverTopologies[i].extra) { | ||||
|         segments[k] = driverTopologies[i].extra[k]; | ||||
|       } | ||||
|       volume.accessible_topology.push({ segments: segments }); | ||||
|     } | ||||
|   } | ||||
| 
 | ||||
|   // ===========================================
 | ||||
|   //    Controller methods below
 | ||||
|   // ===========================================
 | ||||
|  | @ -194,6 +288,14 @@ class CsiProxyDriver extends CsiBaseDriver { | |||
|     const connectionName = parameters.connection; | ||||
|     const driver = this.driverCache.lookUpConnection(connectionName); | ||||
| 
 | ||||
|     const topologyOK = this.checkTopology(call, driver); | ||||
|     if (!topologyOK) { | ||||
|       throw new GrpcError( | ||||
|         grpc.status.INVALID_ARGUMENT, | ||||
|         `topology is not accessible for this connection` | ||||
|       ); | ||||
|     } | ||||
| 
 | ||||
|     switch (call.request.volume_content_source?.type) { | ||||
|       case "snapshot": { | ||||
|         const snapshotHandle = this.parseVolumeHandle(call.request.volume_content_source.snapshot.snapshot_id, snapshotIdPrefix); | ||||
|  | @ -228,6 +330,7 @@ class CsiProxyDriver extends CsiBaseDriver { | |||
|     } | ||||
|     const result = await this.checkAndRun(driver, 'CreateVolume', call); | ||||
|     result.volume.volume_id = this.decorateVolumeHandle(connectionName, result.volume.volume_id); | ||||
|     this.decorateTopology(result.volume, driver); | ||||
|     return result; | ||||
|   } | ||||
| 
 | ||||
|  | @ -236,7 +339,13 @@ class CsiProxyDriver extends CsiBaseDriver { | |||
|   } | ||||
| 
 | ||||
|   async ControllerGetVolume(call) { | ||||
|     return await this.controllerRunWrapper('ControllerGetVolume', call); | ||||
|     const volumeHandle = this.parseVolumeHandle(call.request.volume_id); | ||||
|     const driver = this.driverCache.lookUpConnection(volumeHandle.connectionName); | ||||
|     call.request.volume_id = volumeHandle.realHandle; | ||||
|     const result = await this.checkAndRun(driver, 'ControllerGetVolume', call); | ||||
|     result.volume.volume_id = this.decorateVolumeHandle(volumeHandle.connectionName, result.volume.volume_id); | ||||
|     this.decorateTopology(result.volume, driver); | ||||
|     return result; | ||||
|   } | ||||
| 
 | ||||
|   async ControllerExpandVolume(call) { | ||||
|  | @ -314,6 +423,11 @@ class CsiProxyDriver extends CsiBaseDriver { | |||
|           }, | ||||
|         }; | ||||
|         break | ||||
|       case 'zone': | ||||
|         result.accessible_topology = { | ||||
|           segments: this.nodeTopologyGenerator.generate(), | ||||
|         }; | ||||
|         break | ||||
|       default: | ||||
|         throw new GrpcError( | ||||
|           grpc.status.INVALID_ARGUMENT, | ||||
|  | @ -627,4 +741,113 @@ class NodeIdSerializer { | |||
|   } | ||||
| } | ||||
| 
 | ||||
| class NodeTopologyGenerator { | ||||
|   constructor(ctx, config) { | ||||
|     this.ctx = ctx; | ||||
|     this.config = config || {}; | ||||
|     this.prefix = this.config.prefix || TOPOLOGY_DEFAULT_PREFIX; | ||||
|     this.config.fromRegexp = this.config.fromRegexp || []; | ||||
|   } | ||||
| 
 | ||||
|   generate() { | ||||
|     const result = {}; | ||||
|     for (const e of this.config.fromRegexp) { | ||||
|       if (!e.topologyKey) { | ||||
|         throw new GrpcError( | ||||
|           grpc.status.INVALID_ARGUMENT, | ||||
|           `topologyKey is missing` | ||||
|         ); | ||||
|       } | ||||
|       const value = this.getValueFromSource(e); | ||||
|       const re = '^' + (e.regex ?? "(.*)") + '$'; | ||||
|       const regex = new RegExp(re); | ||||
|       const match = regex.exec(value); | ||||
|       if (match === null) { | ||||
|         throw new GrpcError( | ||||
|           grpc.status.INVALID_ARGUMENT, | ||||
|           `${e.source}: ${this.getNameFromSource(e)} value ${value} does not match regex ${re}` | ||||
|         ); | ||||
|       } | ||||
|       const key = this.prefix + '/' + e.topologyKey; | ||||
|       const template = e.template ?? '{match:0}'; | ||||
|       result[key] = this.fillMatches(template, match); | ||||
|     } | ||||
|     return result; | ||||
|   } | ||||
| 
 | ||||
|   addPrefix(key) { | ||||
|     return this.prefix + '/' + key; | ||||
|   } | ||||
| 
 | ||||
|   stripPrefix(key) { | ||||
|     if (!key.startsWith(this.prefix)) { | ||||
|       throw new GrpcError( | ||||
|         grpc.status.INVALID_ARGUMENT, | ||||
|         `topology key ${key} does not match prefix ${this.prefix}` | ||||
|       ); | ||||
|     } | ||||
|     // remove prefix and '/'
 | ||||
|     return key.slice(this.prefix.length + 1); | ||||
|   } | ||||
| 
 | ||||
|   // checks that each key in req starts with prefix
 | ||||
|   // returns map<key,zone> with short keys
 | ||||
|   stripPrefixFromMap(segments) { | ||||
|     const result = {}; | ||||
|     for (const key in segments) { | ||||
|       if (!key.startsWith(this.prefix)) { | ||||
|         // since topology is generated in proxy node with the same config,
 | ||||
|         // we expect that topology prefix will always be the same
 | ||||
|         throw new GrpcError( | ||||
|           grpc.status.INVALID_ARGUMENT, | ||||
|           `topology key ${key} does not match prefix ${this.prefix}` | ||||
|         ); | ||||
|       } | ||||
|       const strippedKey = this.stripPrefix(key); | ||||
|       result[strippedKey] = segments[key]; | ||||
|     } | ||||
|     return result; | ||||
|   } | ||||
| 
 | ||||
|   // return string value of resource referenced by e
 | ||||
|   getValueFromSource(e) { | ||||
|     const type = e.source; | ||||
|     switch (type) { | ||||
|       case 'hostname': return os.hostname(); | ||||
|       case 'nodeName': return process.env.CSI_NODE_ID; | ||||
|       case 'env': return process.env[e.envName]; | ||||
|       case 'file': return fs.readFileSync(e.file, "utf8"); | ||||
|       default: | ||||
|         throw new GrpcError( | ||||
|           grpc.status.INVALID_ARGUMENT, | ||||
|           `unknown node topology source type: ${type}` | ||||
|         ); | ||||
|     } | ||||
|   } | ||||
|   // return resource name for error logs
 | ||||
|   getNameFromSource(e) { | ||||
|     const type = e.source; | ||||
|     switch (type) { | ||||
|       case 'hostname': return ''; | ||||
|       case 'nodeName': return ''; | ||||
|       case 'env': return e.envName; | ||||
|       case 'file': return e.file; | ||||
|       default: | ||||
|         throw new GrpcError( | ||||
|           grpc.status.INVALID_ARGUMENT, | ||||
|           `unknown node topology source type: ${type}` | ||||
|         ); | ||||
|     } | ||||
|   } | ||||
| 
 | ||||
|   fillMatches(template, matches) { | ||||
|     let result = template; | ||||
|     for (const i in matches) { | ||||
|       const ref = `{match:${i}}`; | ||||
|       result = result.replaceAll(ref, matches[i]); | ||||
|     } | ||||
|     return result; | ||||
|   } | ||||
| } | ||||
| 
 | ||||
| module.exports.CsiProxyDriver = CsiProxyDriver; | ||||
|  |  | |||
|  | @ -125,20 +125,19 @@ There are 3 cases of cluster topology: | |||
| - Each node has unique topology domain (`local` drivers) | ||||
| - All nodes are the same (usually the case for non-local drivers) | ||||
| - Several availability zones that can contain several nodes | ||||
| - - https://github.com/democratic-csi/democratic-csi/issues/459 | ||||
| 
 | ||||
| Simple cases are currently supported by the proxy. | ||||
| Custom availability zones are TBD. | ||||
| There are 2 components to this: | ||||
| 1. Node driver must correctly report its availability zone | ||||
| 2. Controller must set required zone labels in volume | ||||
| 
 | ||||
| Example configuration: | ||||
| Since proxy driver should work with drivers from potentially different availability zones, | ||||
| it requires a config to distinguish zones. | ||||
| 
 | ||||
| ```yaml | ||||
| proxy: | ||||
|   nodeTopology: | ||||
|     # allowed values: | ||||
|     # node - each node has its own storage | ||||
|     # cluster - the whole cluster has unified storage | ||||
|     type: node | ||||
|     # topology reported by CSI driver is reflected in k8s as node labels. | ||||
|     # you may want to set unique prefixes on different drivers to avoid collisions | ||||
|     prefix: org.democratic-csi.topology | ||||
| ``` | ||||
| ## Custom topology: controller driver | ||||
| 
 | ||||
| The only thing the controller needs to do is set topology requirements when a volume is created. | ||||
| 
 | ||||
| The proxy sets these constraints at volume creation; no other configuration is required. | ||||
| 
 | ||||
| Different connections can have different topology. | ||||
|  |  | |||