proxy: add topology support

Danil Uzlov 2025-03-26 08:34:37 +00:00
parent cf51e5c0ba
commit bb0e3c341b
4 changed files with 447 additions and 20 deletions

docs/proxy-topology.md Normal file

@@ -0,0 +1,176 @@
# Proxy driver with topology support
The proxy driver can support storage connections
that aren't accessible from every node.
For example, you can specify that connection C1 is only accessible from zone Z1.
See here for general proxy setup: [proxy-driver.md](./proxy-driver.md)
# Topology support in Helm values
In addition to the general proxy values, you need to add extra args for `externalProvisioner`:
```yaml
csiDriver:
  name: org.democratic-csi.proxy-topology
controller:
  extraVolumes:
  - name: connections
    secret:
      secretName: connections
  driver:
    extraVolumeMounts:
    - name: connections
      mountPath: /mnt/connections
  externalProvisioner:
    extraArgs:
    - --feature-gates=Topology=true
    # strict-topology and immediate-topology can be altered,
    # see below in the storage class description or in this link:
    # https://github.com/kubernetes-csi/external-provisioner#topology-support
    - --strict-topology=true
    - --immediate-topology=false
```
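For reference, the `connections` volume mounted above holds one config file per storage connection.
The following is a hypothetical sketch only; the secret layout, the `c1.yaml` key, and the `freenas-api-nfs` driver are assumptions here, see [proxy-driver.md](./proxy-driver.md) for the authoritative setup:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: connections
stringData:
  # one file per connection; "c1" is the connection name referenced by storage classes
  c1.yaml: |
    # a regular democratic-csi driver config,
    # plus the proxy.perDriver.topology section described below
    driver: freenas-api-nfs
```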
# Topology support in storage connection
Add the following proxy-specific section to your connection config:
```yaml
# add to each _real_ driver config
proxy:
  perDriver:
    topology:
    # keys must correspond to proxy.nodeTopology.fromRegexp[*].topologyKey
    # values must correspond to values reported by nodes
    - requirements:
        # use short keys, proxy will automatically add a prefix
        region: region1
        zone: zone1
      # you can add custom node affinity labels
      # they will be added on top of node requirements
      # specify full labels here
      extra:
        custom.prefix/custom.name: custom.value
```
The config specified above does the following:
- If a PVC is created with zone requirements, the proxy checks them against `proxy.perDriver.topology.requirements` before creating the volume
- Volumes created for this connection will get node affinity for the following labels (see the sketch below):
- - `prefix/region: region1`
- - `prefix/zone: zone1`
- - `custom.prefix/custom.name: custom.value`
- Pods consuming such a volume will be schedulable only on nodes that have all of these labels
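For illustration, a minimal sketch of the node affinity Kubernetes derives for a PersistentVolume created from the connection above, assuming the default `org.democratic-csi.topology` prefix:
```yaml
# hypothetical resulting PV node affinity (not part of the connection config)
nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: org.democratic-csi.topology/region
        operator: In
        values: ["region1"]
      - key: org.democratic-csi.topology/zone
        operator: In
        values: ["zone1"]
      - key: custom.prefix/custom.name
        operator: In
        values: ["custom.value"]
```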
# Topology support in nodes
The proxy driver needs to be able to report the supported topology zones on each node.
Add `proxy.nodeTopology` to your proxy config file to configure topology.
Check the corresponding example section for the available options: [proxy.yaml](../examples/proxy.yaml).
The driver reports node topology based on the list of rules in the config.
If a rule does not match the input, the rule is ignored.
So, if needed, you can use rules that are only valid on certain nodes.
Ideas for writing rules:
- Encode the zone name in the node name
- Wait for the k8s DownwardAPI for node labels
- - Should be alpha in k8s v1.33: https://github.com/kubernetes/kubernetes/issues/40610
- Inject node labels into environment variables via a webhook: https://kyverno.io/policies/other/mutate-pod-binding/mutate-pod-binding/
- Deploy a separate node DaemonSet for each zone, with the zone in an environment variable (see the sketch below)
- Configure each node: place zone info into a file on the host
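As an illustration of the DaemonSet/environment-variable approach, a minimal sketch assuming a `NODE_ZONE` variable is injected into the node pod (the variable name is an assumption):
```yaml
proxy:
  nodeTopology:
    type: custom
    prefix: org.democratic-csi.topology
    fromRegexp:
    - topologyKey: zone
      source: env
      # NODE_ZONE is a hypothetical variable injected per zone
      envName: NODE_ZONE
      # with the defaults (regex "(.*)", template "{match:0}")
      # the value is passed through unchanged
```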
# Topology support in storage class
The topology of a volume is decided during volume creation.
K8s (or another container orchestration tool) sets requirements,
and the driver must decide how to satisfy them or decline the request.
In k8s there are 3 main ways to set requirements.
They are described in more detail and with alternative options here:
https://github.com/kubernetes-csi/external-provisioner#topology-support
1. No requirements. Topology matching during volume creation is disabled.
- Volume creation will never fail.
- NodeAffinity for the volume is based on the connection config only.
- Pod affinity requirements are ignored.
Deployment requirements:
- Requires `--immediate-topology=false`.
- `--strict-topology` does not matter.
- Requires `volumeBindingMode: Immediate`.
Storage class example:
```yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s1
provisioner: org.democratic-csi.proxy-topology
volumeBindingMode: Immediate
parameters:
  connection: c1
```
2. Topology matching is based on the storage class config.
- Requirements are based ONLY on the storage class.
- Volume creation will fail if the storage class `allowedTopologies` does not match the connection config.
- Pod affinity requirements are ignored.
Deployment requirements:
- Requires `--strict-topology=false`.
- Requires `allowedTopologies` to be present.
- `--immediate-topology` does not matter.
Storage class example:
```yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s1
provisioner: org.democratic-csi.proxy-topology
# volumeBindingMode can be either Immediate or WaitForFirstConsumer
volumeBindingMode: Immediate
parameters:
  connection: c1
allowedTopologies:
- matchLabelExpressions:
  - key: org.democratic-csi.topology/zone
    values:
    - zone1
```
3. Topology matching is based on pod scheduling:
- Requirements are based ONLY on the first consumer pod.
- The volume is allocated in the zone that the first pod is scheduled to (see the sketch after the example).
- Volume creation will fail if the pod's node does not match the connection config.
Deployment requirements:
- Requires `--strict-topology=true`.
- Requires `volumeBindingMode: WaitForFirstConsumer`.
- `--immediate-topology` does not matter.
Storage class example:
```yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s1
provisioner: org.democratic-csi.proxy-topology
volumeBindingMode: WaitForFirstConsumer
parameters:
  connection: c1
```
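To illustrate option 3, a hypothetical PVC and pod (names and image are assumptions): with `WaitForFirstConsumer` the PVC stays Pending until the pod is scheduled, and the volume is then created in that node's zone.
```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: s1
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  # optional: pin the pod to a zone; the volume will follow the pod's node
  nodeSelector:
    org.democratic-csi.topology/zone: zone1
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data
```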


@@ -10,11 +10,40 @@ proxy:
  # when timeout is -1, cache timeout is disabled, drivers are cached forever
  # default: 60
  cacheTimeoutMinutes: 60
  # Node topology defines isolated groups of nodes.
  # Local drivers need node topology.
  # All other drivers do not depend on topology and will work fine with a simpler cluster topology.
  nodeTopology:
    # allowed values:
    #   node - each node has its own storage. Required for 'local-' drivers
    #   cluster - the whole cluster has unified storage (usually the case with external NAS systems)
    #   custom - there are several custom zones with internal storage
    # default: cluster
    type: cluster
    # topology reported by the CSI driver is reflected in k8s as node labels.
    # you may want to set unique prefixes on different drivers to avoid collisions
    prefix: org.democratic-csi.topology
    fromRegexp:
    # resulting topology looks like this:
    # ${ prefix }/${ fromRegexp[*].topologyKey } == ${ fromRegexp[*].template }
    - topologyKey: zone
      # possible sources:
      # - nodeName
      # - hostname
      # - env
      # - file
      source: nodeName
      # envName is used only when "source: env"
      envName: DEMOCRATIC_CSI_REGION
      # file is used only when "source: file"
      # the file must be mounted into the container filesystem manually
      file: /mnt/topology/region
      # regex can:
      # - be exact: "regex: my-node-1.domain" (though technically this is still a regex)
      # - use regex syntax: "regex: .*\.domain"
      # - use capture groups: "regex: .*\.(zone-.*)\.domain"
      # Partial matches are not allowed: the driver implicitly wraps the regex in ^ and $.
      regex: my-node-1.domain
      # the template can:
      # - be exact: zone-1
      # - use values from capture groups: zone-{match:1}
      # - - {match:0} contains the whole input
      # - - {match:1} contains the first capture group, and so on
      template: zone1
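    # hypothetical alternative rule (commented out), shown for illustration only:
    # derive the zone from the node name with a capture group,
    # assuming node names like "worker-1.zone-a.domain"
    # - topologyKey: zone
    #   source: nodeName
    #   regex: .*\.(zone-.*)\.domain
    #   template: "{match:1}"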


@@ -23,6 +23,7 @@ class CsiProxyDriver extends CsiBaseDriver {
      configFolder = configFolder.slice(0, -1);
    }
    this.nodeIdSerializer = new NodeIdSerializer(ctx, options.proxy.nodeId || {});
    this.nodeTopologyGenerator = new NodeTopologyGenerator(ctx, options.proxy.nodeTopology || {});
    const timeoutMinutes = this.options.proxy.cacheTimeoutMinutes ?? 60;
    const defaultOptions = this.options;
@@ -164,6 +165,99 @@ class CsiProxyDriver extends CsiBaseDriver {
    return await this.checkAndRun(driver, methodName, call, defaultValue);
  }
  checkTopologyRequirement(segments, driverTopologies) {
    for (let i = 0; i < driverTopologies.length; i++) {
      let matches = true;
      const requirements = driverTopologies[i].requirements;
      if (!requirements) {
        throw new GrpcError(
          grpc.status.INVALID_ARGUMENT,
          `requirements is missing from proxy.perDriver.topology[${i}]`
        );
      }
      for (const reqKey in segments) {
        const connectionZone = requirements[reqKey];
        if (!connectionZone) {
          // If some part of topology is not specified in connection,
          // then assume this part is not important.
          // For example, node can report node name and zone as topology.
          // Then requirements will include both zone and node name.
          // But connection specifies only zone, and that's okay.
          continue;
        }
        const reqZone = segments[reqKey];
        if (connectionZone != reqZone) {
          matches = false;
          break;
        }
      }
      if (matches) {
        // this driver topology satisfies req
        return true;
      }
    }
    // we didn't find any driver topology that would match req
    return false;
  }
  // returns (required_topology < driver_topology)
  // returns true if the driver does not specify topology
  checkTopology(call, driver) {
    const requirements = call.request.accessibility_requirements?.requisite;
    if (!requirements) {
      return true;
    }
    const driverTopologies = driver.options.proxy?.perDriver?.topology;
    if (!driverTopologies) {
      return true;
    }
    for (let reqI = 0; reqI < requirements.length; reqI++) {
      const req = requirements[reqI];
      const segments = this.nodeTopologyGenerator.stripPrefixFromMap(req.segments);
      const reqMatches = this.checkTopologyRequirement(segments, driverTopologies);
      // this req does not match any topology from the connection
      // it doesn't make sense to check any remaining requirements
      if (!reqMatches) {
        this.ctx.logger.debug(`failed topology check: ${JSON.stringify(segments)} is not in ${JSON.stringify(driverTopologies)}`);
        throw new GrpcError(
          grpc.status.INVALID_ARGUMENT,
          `topology is not accessible for this connection: ${JSON.stringify(segments)}`
        );
      }
    }
    return true;
  }
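
  // converts proxy.perDriver.topology entries into CSI accessible_topology
  // on the volume, adding the node topology prefix to requirement keys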
  decorateTopology(volume, driver) {
    const driverTopologies = driver.options.proxy?.perDriver?.topology;
    if (!driverTopologies) {
      return;
    }
    volume.accessible_topology = [];
    for (let i = 0; i < driverTopologies.length; i++) {
      const requirements = driverTopologies[i].requirements;
      if (!requirements) {
        throw new GrpcError(
          grpc.status.INVALID_ARGUMENT,
          `requirements is missing from proxy.perDriver.topology[${i}]`
        );
      }
      const segments = {};
      for (const k in requirements) {
        const topologyKey = this.nodeTopologyGenerator.addPrefix(k);
        segments[topologyKey] = requirements[k];
      }
      for (const k in driverTopologies[i].extra) {
        segments[k] = driverTopologies[i].extra[k];
      }
      volume.accessible_topology.push({ segments: segments });
    }
  }

  // ===========================================
  // Controller methods below
  // ===========================================
@@ -194,6 +288,14 @@ class CsiProxyDriver extends CsiBaseDriver {
    const connectionName = parameters.connection;
    const driver = this.driverCache.lookUpConnection(connectionName);

    const topologyOK = this.checkTopology(call, driver);
    if (!topologyOK) {
      throw new GrpcError(
        grpc.status.INVALID_ARGUMENT,
        `topology is not accessible for this connection`
      );
    }

    switch (call.request.volume_content_source?.type) {
      case "snapshot": {
        const snapshotHandle = this.parseVolumeHandle(call.request.volume_content_source.snapshot.snapshot_id, snapshotIdPrefix);
@@ -228,6 +330,7 @@ class CsiProxyDriver extends CsiBaseDriver {
    }

    const result = await this.checkAndRun(driver, 'CreateVolume', call);
    result.volume.volume_id = this.decorateVolumeHandle(connectionName, result.volume.volume_id);
    this.decorateTopology(result.volume, driver);
    return result;
  }
@@ -236,7 +339,13 @@ class CsiProxyDriver extends CsiBaseDriver {
  }

  async ControllerGetVolume(call) {
    return await this.controllerRunWrapper('ControllerGetVolume', call);
    const volumeHandle = this.parseVolumeHandle(call.request.volume_id);
    const driver = this.lookUpConnection(volumeHandle.connectionName);
    call.request.volume_id = volumeHandle.realHandle;
    const result = await this.checkAndRun(driver, 'ControllerGetVolume', call);
    result.volume.volume_id = this.decorateVolumeHandle(volumeHandle.connectionName, result.volume.volume_id);
    this.decorateTopology(result.volume, driver);
    return result;
  }

  async ControllerExpandVolume(call) {
@@ -314,6 +423,11 @@ class CsiProxyDriver extends CsiBaseDriver {
          },
        };
        break;
      case 'zone':
        result.accessible_topology = {
          segments: this.nodeTopologyGenerator.generate(),
        };
        break;
      default:
        throw new GrpcError(
          grpc.status.INVALID_ARGUMENT,
@@ -627,4 +741,113 @@ class NodeIdSerializer {
  }
}

class NodeTopologyGenerator {
  constructor(ctx, config) {
    this.ctx = ctx;
    this.config = config || {};
    this.prefix = this.config.prefix || TOPOLOGY_DEFAULT_PREFIX;
    this.config.fromRegexp = this.config.fromRegexp || [];
  }

  // builds the map of topology segments reported by this node,
  // e.g. { "<prefix>/zone": "zone1" }
  generate() {
    const result = {};
    for (const e of this.config.fromRegexp) {
      if (!e.topologyKey) {
        throw new GrpcError(
          grpc.status.INVALID_ARGUMENT,
          `topologyKey is missing`
        );
      }
      const value = this.getValueFromSource(e);
      const re = '^' + (e.regex ?? "(.*)") + '$';
      const regex = new RegExp(re);
      const match = regex.exec(value);
      if (match === null) {
        throw new GrpcError(
          grpc.status.INVALID_ARGUMENT,
          `${e.source}: ${this.getNameFromSource(e)} value ${value} does not match regex ${re}`
        );
      }
      const key = this.prefix + '/' + e.topologyKey;
      const template = e.template ?? '{match:0}';
      result[key] = this.fillMatches(template, match);
    }
    return result;
  }

  addPrefix(key) {
    return this.prefix + '/' + key;
  }

  stripPrefix(key) {
    if (!key.startsWith(this.prefix)) {
      throw new GrpcError(
        grpc.status.INVALID_ARGUMENT,
        `topology key ${key} does not match prefix ${this.prefix}`
      );
    }
    // remove prefix and '/'
    return key.slice(this.prefix.length + 1);
  }

  // checks that each key in req starts with prefix
  // returns map<key,zone> with short keys
  stripPrefixFromMap(segments) {
    const result = {};
    for (const key in segments) {
      if (!key.startsWith(this.prefix)) {
        // since topology is generated in proxy node with the same config,
        // we expect that topology prefix will always be the same
        throw new GrpcError(
          grpc.status.INVALID_ARGUMENT,
          `topology key ${key} does not match prefix ${this.prefix}`
        );
      }
      const strippedKey = this.stripPrefix(key);
      result[strippedKey] = segments[key];
    }
    return result;
  }

  // return string value of resource referenced by e
  getValueFromSource(e) {
    const type = e.source;
    switch (type) {
      case 'hostname': return os.hostname();
      case 'nodeName': return process.env.CSI_NODE_ID;
      case 'env': return process.env[e.envName];
      case 'file': return fs.readFileSync(e.file, "utf8");
      default:
        throw new GrpcError(
          grpc.status.INVALID_ARGUMENT,
          `unknown node topology source type: ${type}`
        );
    }
  }

  // return resource name for error logs
  getNameFromSource(e) {
    const type = e.source;
    switch (type) {
      case 'hostname': return '';
      case 'nodeName': return '';
      case 'env': return e.envName;
      case 'file': return e.file;
      default:
        throw new GrpcError(
          grpc.status.INVALID_ARGUMENT,
          `unknown node topology source type: ${type}`
        );
    }
  }

  // replaces {match:N} placeholders in the template with regex capture groups
  fillMatches(template, matches) {
    let result = template;
    for (const i in matches) {
      const ref = `{match:${i}}`;
      result = result.replaceAll(ref, matches[i]);
    }
    return result;
  }
}
module.exports.CsiProxyDriver = CsiProxyDriver;


@@ -125,20 +125,19 @@ There are 3 cases of cluster topology:
- Each node has a unique topology domain (`local` drivers)
- All nodes are the same (usually the case for non-local drivers)
- Several availability zones that can contain several nodes
- - https://github.com/democratic-csi/democratic-csi/issues/459
Simple cases are currently supported by the proxy.
Custom availability zones are TBD.
There are 2 components to this:
1. The node driver must correctly report its availability zone
2. The controller must set the required zone labels on the volume
Example configuration:
Since the proxy driver should work with drivers from potentially different availability zones,
it requires a config to distinguish zones.
```yaml
proxy:
  nodeTopology:
    # allowed values:
    #   node - each node has its own storage
    #   cluster - the whole cluster has unified storage
    type: node
    # topology reported by the CSI driver is reflected in k8s as node labels.
    # you may want to set unique prefixes on different drivers to avoid collisions
    prefix: org.democratic-csi.topology
```
## Custom topology: controller driver
The only thing needed from the controller is to set topology requirements when a volume is created.
The proxy sets these constraints at volume creation time; no other configuration is required.
Different connections can have different topology.
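For illustration, a sketch of the `accessible_topology` the proxy controller adds to a created volume for a connection whose `proxy.perDriver.topology` lists a single zone requirement (the zone value is an assumption):
```yaml
accessible_topology:
- segments:
    org.democratic-csi.topology/zone: zone1
```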