proxy: add topology support
parent
cf51e5c0ba
commit
bb0e3c341b
|
|
@ -0,0 +1,176 @@
|
|||
|
||||
# Proxy driver with topology support
|
||||
|
||||
Proxy driver can support storage connections
|
||||
that aren't accessible from every node.
|
||||
For example, you can specify that connection C1 is accessible only from zone Z1.
|
||||
|
||||
See here for general proxy setup: [proxy-driver.md](./proxy-driver.md)
|
||||
|
||||
# Topology support in Helm values
|
||||
|
||||
In addition to general proxy values you need to add extra args for `externalProvisioner`:
|
||||
|
||||
```yaml
|
||||
csiDriver:
|
||||
name: org.democratic-csi.proxy-topology
|
||||
controller:
|
||||
extraVolumes:
|
||||
- name: connections
|
||||
secret:
|
||||
secretName: connections
|
||||
driver:
|
||||
extraVolumeMounts:
|
||||
- name: connections
|
||||
mountPath: /mnt/connections
|
||||
externalProvisioner:
|
||||
extraArgs:
|
||||
- --feature-gates=Topology=true
|
||||
# strict-topology and immediate-topology can be altered,
|
||||
# see below in storage class description or in this link
|
||||
# https://github.com/kubernetes-csi/external-provisioner#topology-support
|
||||
- --strict-topology=true
|
||||
- --immediate-topology=false
|
||||
```
|
||||
|
||||
# Topology support in storage connection
|
||||
|
||||
Add the following proxy-specific part into your connection config:
|
||||
|
||||
```yaml
|
||||
# add to each _real_ driver config
|
||||
proxy:
|
||||
perDriver:
|
||||
topology:
|
||||
# keys must correspond to proxy.nodeTopology.fromRegexp[*].topologyKey
|
||||
# values must correspond to values reported by nodes
|
||||
- requirements:
|
||||
# use short keys, proxy will automatically add a prefix
|
||||
region: region1
|
||||
zone: zone1
|
||||
# you can add custom node affinity labels
|
||||
# they will be added on top of node requirements
|
||||
# specify full labels here
|
||||
extra:
|
||||
custom.prefix/custom.name: custom.value
|
||||
```
|
||||
|
||||
The config above does the following:
|
||||
- If a PVC is created with zone requirements, the proxy checks them against `proxy.perDriver.topology[*].requirements` before creating the volume
|
||||
- Volumes created for this connection will get node affinity for labels:
|
||||
- - `prefix/region: region1`
|
||||
- - `prefix/zone: zone1`
|
||||
- - `custom.prefix/custom.name: custom.value`
|
||||
- Pods consuming this volume will be schedulable only on nodes having all of these labels
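
The behavior in the list above can be sketched as follows. This is a simplified illustration, not the driver's actual code: the function name `buildAccessibleTopology` is hypothetical, and the prefix value is assumed to be the `org.democratic-csi.topology` value used elsewhere in these docs.

```javascript
// Illustrative sketch: turn a connection's topology config into the
// node-affinity segments attached to a volume.
// `buildAccessibleTopology` is a hypothetical name used only for this example.
function buildAccessibleTopology(prefix, driverTopologies) {
  return driverTopologies.map((topology) => {
    const segments = {};
    // Short keys from `requirements` get the configured prefix.
    for (const [key, value] of Object.entries(topology.requirements ?? {})) {
      segments[`${prefix}/${key}`] = value;
    }
    // Labels from `extra` are copied as-is, so they must be full label names.
    for (const [key, value] of Object.entries(topology.extra ?? {})) {
      segments[key] = value;
    }
    return { segments };
  });
}

const accessibleTopology = buildAccessibleTopology("org.democratic-csi.topology", [
  {
    requirements: { region: "region1", zone: "zone1" },
    extra: { "custom.prefix/custom.name": "custom.value" },
  },
]);
console.log(JSON.stringify(accessibleTopology, null, 2));
```

Each entry in `proxy.perDriver.topology` yields one set of segments, so a connection that is reachable from several zones would list several entries.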
|
||||
|
||||
# Topology support in nodes
|
||||
|
||||
Proxy driver needs to be able to report supported topology zones on each node.
|
||||
|
||||
Add `proxy.nodeTopology` to your proxy config file to configure topology.
|
||||
Check corresponding example section for available options: [proxy.yaml](../examples/proxy.yaml).
|
||||
|
||||
Driver reports node topology based on the list of rules in the config.
|
||||
|
||||
If some rule does not match the input, the rule is ignored.
|
||||
So, if needed, you can use rules that are only valid on certain nodes.
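
A rough sketch of how a single rule might be evaluated is shown below. It assumes the field names (`keySuffix`, `matchRegexp`, `resultTemplate`) and the `${match:N}` placeholder syntax from the linked example config. The helper `applyRule` is hypothetical and only illustrates the described behavior: the pattern must match the whole input, and a rule that does not match produces no label.

```javascript
// Hypothetical helper: evaluate one node topology rule against an input string.
// Returns [labelKey, labelValue], or null when the rule does not match.
function applyRule(prefix, rule, input) {
  // Anchor the pattern so that partial matches are rejected.
  const match = new RegExp(`^(?:${rule.matchRegexp})$`).exec(input);
  if (match === null) {
    return null;
  }
  // Replace ${match:N} with capture group N; ${match:0} is the whole input.
  const value = rule.resultTemplate.replace(
    /\$\{match:(\d+)\}/g,
    (_, n) => match[Number(n)] ?? ""
  );
  return [`${prefix}/${rule.keySuffix}`, value];
}

const prefix = "org.democratic-csi.topology";
const rule = {
  keySuffix: "zone",
  matchRegexp: ".*\\.(zone-.*)\\.domain",
  resultTemplate: "${match:1}",
};

// The zone name is encoded in the node name.
console.log(applyRule(prefix, rule, "node-7.zone-a.domain"));
// This node name does not follow the convention, so the rule yields nothing.
console.log(applyRule(prefix, rule, "node-7.other"));
```

Because non-matching rules are skipped in this sketch, one shared config can carry rules for several naming conventions, and each node picks up only the labels that apply to it.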
|
||||
|
||||
Ideas for writing rules:
|
||||
|
||||
- Encode zone name in the node name
|
||||
- Wait for k8s DownwardAPI for node labels
|
||||
- - Should be alpha in k8s v1.33: https://github.com/kubernetes/kubernetes/issues/40610
|
||||
- Inject node labels into environment variables via a webhook: https://kyverno.io/policies/other/mutate-pod-binding/mutate-pod-binding/
|
||||
- Deploy a separate node DaemonSet for each zone, with zone in an environment variable
|
||||
- Configure each node: place zone info into a file on host
|
||||
|
||||
# Topology support in storage class
|
||||
|
||||
Topology of the volume is decided during volume creation.
|
||||
K8s (or another container orchestration tool) sets requirements,
|
||||
and driver must decide how to satisfy them or decline the request.
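
As a simplified illustration of that decision, the sketch below accepts a requested topology when at least one topology entry of the connection agrees on every key they share. The function name `isAccessible` is hypothetical, keys are shown in their short form (without the prefix), and the real driver may differ in details such as error reporting.

```javascript
// Hypothetical sketch of the matching rule between a requested topology
// and the topologies configured for a connection.
function isAccessible(requestedSegments, connectionTopologies) {
  return connectionTopologies.some(({ requirements }) =>
    Object.entries(requestedSegments).every(([key, requested]) => {
      const configured = requirements[key];
      // Keys that the connection does not mention are treated as unconstrained.
      return configured === undefined || configured === requested;
    })
  );
}

const connectionTopologies = [
  { requirements: { region: "region1", zone: "zone1" } },
];

// The node reports an extra "node" key that the connection ignores.
console.log(isAccessible({ zone: "zone1", node: "node-a" }, connectionTopologies));
// The requested zone differs from the connection's zone.
console.log(isAccessible({ zone: "zone2" }, connectionTopologies));
```

A request that fails this check would be declined rather than provisioned in an unreachable zone.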
|
||||
|
||||
In k8s there are 3 main ways to set requirements.
|
||||
They are described in more detail, along with alternative options, here:
|
||||
https://github.com/kubernetes-csi/external-provisioner#topology-support
|
||||
|
||||
1. No requirements. Topology matching during volume creation is disabled.
|
||||
|
||||
- Volume creation will never fail.
|
||||
- NodeAffinity for volume is based on connection config only.
|
||||
- Pod affinity requirements are ignored.
|
||||
|
||||
Deployment requirements:
|
||||
- Requires `--immediate-topology=false`.
|
||||
- `--strict-topology` does not matter
|
||||
- Requires `volumeBindingMode: Immediate`.
|
||||
|
||||
Storage class example:
|
||||
|
||||
```yaml
|
||||
---
|
||||
apiVersion: storage.k8s.io/v1
|
||||
kind: StorageClass
|
||||
metadata:
|
||||
name: s1
|
||||
provisioner: org.democratic-csi.proxy-topology
|
||||
volumeBindingMode: Immediate
|
||||
parameters:
|
||||
connection: c1
|
||||
```
|
||||
|
||||
2. Topology matching is based on storage class config.
|
||||
|
||||
- Requirements are based ONLY on Storage Class.
|
||||
- Volume creation will fail if: storage class parameters do not match connection config parameters.
|
||||
- Pod affinity requirements are ignored.
|
||||
|
||||
Deployment requirements:
|
||||
- Requires `--strict-topology=false`.
|
||||
- Requires `allowedTopologies` to be present.
|
||||
- `--immediate-topology` does not matter
|
||||
|
||||
Storage class example:
|
||||
|
||||
```yaml
|
||||
---
|
||||
apiVersion: storage.k8s.io/v1
|
||||
kind: StorageClass
|
||||
metadata:
|
||||
name: s1
|
||||
provisioner: org.democratic-csi.proxy-topology
|
||||
# volumeBindingMode can be either Immediate or WaitForFirstConsumer
|
||||
volumeBindingMode: Immediate
|
||||
parameters:
|
||||
connection: c1
|
||||
allowedTopologies:
|
||||
- matchLabelExpressions:
|
||||
- key: org.democratic-csi.topology/zone
|
||||
values:
|
||||
- zone1
|
||||
```
|
||||
|
||||
3. Topology matching is based on pod scheduling:
|
||||
|
||||
- Requirements are based ONLY on the first consumer pod.
|
||||
- Volume is allocated in the zone that the first pod is scheduled to
|
||||
- Volume creation will fail if: pod node does not match connection config.
|
||||
|
||||
Deployment requirements:
|
||||
- Requires `--strict-topology=true`.
|
||||
- Requires `volumeBindingMode: WaitForFirstConsumer`.
|
||||
- `--immediate-topology` does not matter
|
||||
|
||||
Storage class example:
|
||||
|
||||
```yaml
|
||||
---
|
||||
apiVersion: storage.k8s.io/v1
|
||||
kind: StorageClass
|
||||
metadata:
|
||||
name: s1
|
||||
provisioner: org.democratic-csi.proxy-topology
|
||||
volumeBindingMode: WaitForFirstConsumer
|
||||
parameters:
|
||||
connection: c1
|
||||
```
|
||||
|
|
@ -10,11 +10,40 @@ proxy:
|
|||
# when timeout is -1, cache timeout is disabled, drivers are cached forever
|
||||
# default: 60
|
||||
cacheTimeoutMinutes: 60
|
||||
# Node topology defines isolated groups of nodes
|
||||
# local drivers need node topology
|
||||
# All other drivers do not depend on topology, and will work fine with simpler cluster topology
|
||||
nodeTopology:
|
||||
# 'cluster': the whole cluster has unified storage
|
||||
# 'node': each node has its own storage. Required for 'local-' drivers
|
||||
# allowed values:
|
||||
# node - each node has its own storage
|
||||
# cluster - the whole cluster has unified storage (usually the case with external NAS systems)
|
||||
# custom - there are several custom zones with internal storage
|
||||
# default: cluster
|
||||
type: cluster
|
||||
# topology reported by CSI driver is reflected in k8s as node labels.
|
||||
# you may want to set unique prefixes on different drivers to avoid collisions
|
||||
prefix: org.democratic-csi.topology
|
||||
customRules:
|
||||
# resulting topology looks like this:
|
||||
# ${ prefix }/${ customRules[*].keySuffix } == ${ customRules[*].resultTemplate }
|
||||
- keySuffix: zone
|
||||
# possible sources:
|
||||
# - nodeName
|
||||
# - hostname
|
||||
# - env
|
||||
# - file
|
||||
source: nodeName
|
||||
# envName is used only when "source: env"
|
||||
envName: DEMOCRATIC_CSI_REGION
|
||||
# file is used only when "source: file"
|
||||
# file must be mounted into container filesystem manually
|
||||
file: /mnt/topology/region
|
||||
# match can:
|
||||
# - be exact: "matchRegexp: my-node-1.domain" (though technically this is still regex)
|
||||
# - use regex: "matchRegexp: .*\.domain"
|
||||
# - use capture groups: "matchRegexp: .*\.(zone-.*)\.domain"
|
||||
# Partial matches are not allowed: driver implicitly appends ^ and $ to regex.
|
||||
matchRegexp: my-node-1.domain
|
||||
# result template can:
|
||||
# - be exact: zone-1
|
||||
# - use values from capture groups: zone-${match:1}
|
||||
# - - ${match:0} contains the whole input
|
||||
# - - ${match:1} contains the first capture group, and so on
|
||||
resultTemplate: zone1
|
||||
|
|
|
|||
|
|
@ -23,6 +23,7 @@ class CsiProxyDriver extends CsiBaseDriver {
|
|||
configFolder = configFolder.slice(0, -1);
|
||||
}
|
||||
this.nodeIdSerializer = new NodeIdSerializer(ctx, options.proxy.nodeId || {});
|
||||
this.nodeTopologyGenerator = new NodeTopologyGenerator(ctx, options.proxy.nodeTopology || {});
|
||||
|
||||
const timeoutMinutes = this.options.proxy.cacheTimeoutMinutes ?? 60;
|
||||
const defaultOptions = this.options;
|
||||
|
|
@ -164,6 +165,99 @@ class CsiProxyDriver extends CsiBaseDriver {
|
|||
return await this.checkAndRun(driver, methodName, call, defaultValue);
|
||||
}
|
||||
|
||||
checkTopologyRequirement(segments, driverTopologies) {
|
||||
for (let i = 0; i < driverTopologies.length; i++) {
|
||||
let matches = true;
|
||||
|
||||
const requirements = driverTopologies[i].requirements;
|
||||
if (!requirements) {
|
||||
throw new GrpcError(
|
||||
grpc.status.INVALID_ARGUMENT,
|
||||
`requirements is missing from proxy.perDriver.topology[${i}]`
|
||||
);
|
||||
}
|
||||
for (const reqKey in segments) {
|
||||
const connectionZone = requirements[reqKey];
|
||||
if (!connectionZone) {
|
||||
// If some part of topology is not specified in connection,
|
||||
// then assume this part is not important.
|
||||
// For example, node can report node name and zone as topology.
|
||||
// Then requirements will include both zone and node name.
|
||||
// But connection specifies only zone, and that's okay.
|
||||
continue;
|
||||
}
|
||||
const reqZone = segments[reqKey];
|
||||
if (connectionZone != reqZone) {
|
||||
matches = false;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (matches) {
|
||||
// this driver topology satisfies req
|
||||
return true;
|
||||
}
|
||||
}
|
||||
// we didn't find any driver topology that would match req
|
||||
return false;
|
||||
}
|
||||
|
||||
// returns (required_topology < driver_topology)
|
||||
// returns true if driver does not specify topology
|
||||
checkTopology(call, driver) {
|
||||
const requirements = call.request.accessibility_requirements?.requisite;
|
||||
if (!requirements) {
|
||||
return true;
|
||||
}
|
||||
const driverTopologies = driver.options.proxy?.perDriver?.topology;
|
||||
if (!driverTopologies) {
|
||||
return true;
|
||||
}
|
||||
|
||||
for (let reqI = 0; reqI < requirements.length; reqI++) {
|
||||
const req = requirements[reqI];
|
||||
const segments = this.nodeTopologyGenerator.stripPrefixFromMap(req.segments);
|
||||
const reqMatches = this.checkTopologyRequirement(segments, driverTopologies);
|
||||
|
||||
// this req does not match any topology from the connection
|
||||
// it doesn't make sense to check any remaining requirements
|
||||
if (!reqMatches) {
|
||||
this.ctx.logger.debug(`failed topology check: ${JSON.stringify(segments)} is not in ${JSON.stringify(driverTopologies)}`);
|
||||
throw new GrpcError(
|
||||
grpc.status.INVALID_ARGUMENT,
|
||||
`topology is not accessible for this connection: ${JSON.stringify(segments)}`
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
decorateTopology(volume, driver) {
|
||||
const driverTopologies = driver.options.proxy?.perDriver?.topology;
|
||||
if (!driverTopologies) {
|
||||
return;
|
||||
}
|
||||
volume.accessible_topology = [];
|
||||
for (let i = 0; i < driverTopologies.length; i++) {
|
||||
const requirements = driverTopologies[i].requirements;
|
||||
if (!requirements) {
|
||||
throw new GrpcError(
|
||||
grpc.status.INVALID_ARGUMENT,
|
||||
`requirements is missing from proxy.perDriver.topology[${i}]`
|
||||
);
|
||||
}
|
||||
const segments = {};
|
||||
for (const k in requirements) {
|
||||
const topologyKey = this.nodeTopologyGenerator.addPrefix(k);
|
||||
segments[topologyKey] = requirements[k];
|
||||
}
|
||||
for (const k in driverTopologies[i].extra) {
|
||||
segments[k] = driverTopologies[i].extra[k];
|
||||
}
|
||||
volume.accessible_topology.push({ segments: segments });
|
||||
}
|
||||
}
|
||||
|
||||
// ===========================================
|
||||
// Controller methods below
|
||||
// ===========================================
|
||||
|
|
@ -194,6 +288,14 @@ class CsiProxyDriver extends CsiBaseDriver {
|
|||
const connectionName = parameters.connection;
|
||||
const driver = this.driverCache.lookUpConnection(connectionName);
|
||||
|
||||
const topologyOK = this.checkTopology(call, driver);
|
||||
if (!topologyOK) {
|
||||
throw new GrpcError(
|
||||
grpc.status.INVALID_ARGUMENT,
|
||||
`topology is not accessible for this connection`
|
||||
);
|
||||
}
|
||||
|
||||
switch (call.request.volume_content_source?.type) {
|
||||
case "snapshot": {
|
||||
const snapshotHandle = this.parseVolumeHandle(call.request.volume_content_source.snapshot.snapshot_id, snapshotIdPrefix);
|
||||
|
|
@ -228,6 +330,7 @@ class CsiProxyDriver extends CsiBaseDriver {
|
|||
}
|
||||
const result = await this.checkAndRun(driver, 'CreateVolume', call);
|
||||
result.volume.volume_id = this.decorateVolumeHandle(connectionName, result.volume.volume_id);
|
||||
this.decorateTopology(result.volume, driver);
|
||||
return result;
|
||||
}
|
||||
|
||||
|
|
@ -236,7 +339,13 @@ class CsiProxyDriver extends CsiBaseDriver {
|
|||
}
|
||||
|
||||
async ControllerGetVolume(call) {
|
||||
return await this.controllerRunWrapper('ControllerGetVolume', call);
|
||||
const volumeHandle = this.parseVolumeHandle(call.request.volume_id);
|
||||
const driver = this.lookUpConnection(volumeHandle.connectionName);
|
||||
call.request.volume_id = volumeHandle.realHandle;
|
||||
const result = await this.checkAndRun(driver, 'ControllerGetVolume', call);
|
||||
result.volume.volume_id = this.decorateVolumeHandle(volumeHandle.connectionName, result.volume.volume_id);
|
||||
this.decorateTopology(result.volume, driver);
|
||||
return result;
|
||||
}
|
||||
|
||||
async ControllerExpandVolume(call) {
|
||||
|
|
@ -314,6 +423,11 @@ class CsiProxyDriver extends CsiBaseDriver {
|
|||
},
|
||||
};
|
||||
break
|
||||
case 'zone':
|
||||
result.accessible_topology = {
|
||||
segments: this.nodeTopologyGenerator.generate(),
|
||||
};
|
||||
break
|
||||
default:
|
||||
throw new GrpcError(
|
||||
grpc.status.INVALID_ARGUMENT,
|
||||
|
|
@ -627,4 +741,113 @@ class NodeIdSerializer {
|
|||
}
|
||||
}
|
||||
|
||||
class NodeTopologyGenerator {
|
||||
constructor(ctx, config) {
|
||||
this.ctx = ctx;
|
||||
this.config = config || {};
|
||||
this.prefix = this.config.prefix || TOPOLOGY_DEFAULT_PREFIX;
|
||||
this.config.fromRegexp = this.config.fromRegexp || [];
|
||||
}
|
||||
|
||||
generate() {
|
||||
const result = {};
|
||||
for (const e of this.config.fromRegexp) {
|
||||
if (!e.topologyKey) {
|
||||
throw new GrpcError(
|
||||
grpc.status.INVALID_ARGUMENT,
|
||||
`topologyKey is missing`
|
||||
);
|
||||
}
|
||||
const value = this.getValueFromSource(e);
|
||||
const re = '^' + (e.regex ?? "(.*)") + '$';
|
||||
const regex = new RegExp(re);
|
||||
const match = regex.exec(value);
|
||||
if (match === null) {
|
||||
throw new GrpcError(
|
||||
grpc.status.INVALID_ARGUMENT,
|
||||
`${e.source}: ${this.getNameFromSource(e)} value ${value} does not match regex ${re}`
|
||||
);
|
||||
}
|
||||
const key = this.prefix + '/' + e.topologyKey;
|
||||
const template = e.template ?? '{match:0}';
|
||||
result[key] = this.fillMatches(template, match);
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
addPrefix(key) {
|
||||
return this.prefix + '/' + key;
|
||||
}
|
||||
|
||||
stripPrefix(key) {
|
||||
if (!key.startsWith(this.prefix)) {
|
||||
throw new GrpcError(
|
||||
grpc.status.INVALID_ARGUMENT,
|
||||
`topology key ${key} does not match prefix ${this.prefix}`
|
||||
);
|
||||
}
|
||||
// remove prefix and '/'
|
||||
return key.slice(this.prefix.length + 1);
|
||||
}
|
||||
|
||||
// checks that each key in req starts with prefix
|
||||
// returns map<key,zone> with short keys
|
||||
stripPrefixFromMap(segments) {
|
||||
const result = {};
|
||||
for (const key in segments) {
|
||||
if (!key.startsWith(this.prefix)) {
|
||||
// since topology is generated in proxy node with the same config,
|
||||
// we expect that topology prefix will always be the same
|
||||
throw new GrpcError(
|
||||
grpc.status.INVALID_ARGUMENT,
|
||||
`topology key ${key} does not match prefix ${this.prefix}`
|
||||
);
|
||||
}
|
||||
const strippedKey = this.stripPrefix(key);
|
||||
result[strippedKey] = segments[key];
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
// return string value of resource referenced by e
|
||||
getValueFromSource(e) {
|
||||
const type = e.source;
|
||||
switch (type) {
|
||||
case 'hostname': return os.hostname();
|
||||
case 'nodeName': return process.env.CSI_NODE_ID;
|
||||
case 'env': return process.env[e.envName];
|
||||
case 'file': return fs.readFileSync(e.file, "utf8");
|
||||
default:
|
||||
throw new GrpcError(
|
||||
grpc.status.INVALID_ARGUMENT,
|
||||
`unknown node topology source type: ${type}`
|
||||
);
|
||||
}
|
||||
}
|
||||
// return resource name for error logs
|
||||
getNameFromSource(e) {
|
||||
const type = e.source;
|
||||
switch (type) {
|
||||
case 'hostname': return '';
|
||||
case 'nodeName': return '';
|
||||
case 'env': return e.envName;
|
||||
case 'file': return e.file;
|
||||
default:
|
||||
throw new GrpcError(
|
||||
grpc.status.INVALID_ARGUMENT,
|
||||
`unknown node topology source type: ${type}`
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
fillMatches(template, matches) {
|
||||
let result = template;
|
||||
for (const i in matches) {
|
||||
const ref = `{match:${i}}`;
|
||||
result = result.replaceAll(ref, matches[i]);
|
||||
}
|
||||
return result;
|
||||
}
|
||||
}
|
||||
|
||||
module.exports.CsiProxyDriver = CsiProxyDriver;
|
||||
|
|
|
|||
|
|
@ -125,20 +125,19 @@ There are 3 cases of cluster topology:
|
|||
- Each node has unique topology domain (`local` drivers)
|
||||
- All nodes are the same (usually the case for non-local drivers)
|
||||
- Several availability zones that can contain several nodes
|
||||
- - https://github.com/democratic-csi/democratic-csi/issues/459
|
||||
|
||||
Simple cases are currently supported by the proxy.
|
||||
Custom availability zones are TBD.
|
||||
There are 2 components to this:
|
||||
1. Node driver must correctly report its availability zone
|
||||
2. Controller must set required zone labels in volume
|
||||
|
||||
Example configuration:
|
||||
Since proxy driver should work with drivers from potentially different availability zones,
|
||||
it requires a config to distinguish zones.
|
||||
|
||||
```yaml
|
||||
proxy:
|
||||
nodeTopology:
|
||||
# allowed values:
|
||||
# node - each node has its own storage
|
||||
# cluster - the whole cluster has unified storage
|
||||
type: node
|
||||
# topology reported by CSI driver is reflected in k8s as node labels.
|
||||
# you may want to set unique prefixes on different drivers to avoid collisions
|
||||
prefix: org.democratic-csi.topology
|
||||
```
|
||||
## Custom topology: controller driver
|
||||
|
||||
The only thing needed from controller is to set topology requirements when volume is created.
|
||||
|
||||
The proxy sets these constraints when the volume is created; no other configuration is required.
|
||||
|
||||
Different connections can have different topology.
|
||||
|
|
|
|||