# PoC: Multi-Network / DRA-Driver / Network Status

This PoC is about 3 different things:

1. ResourceClaim Status for Networking
2. Container Runtime as DRA Driver
3. Network DRA Driver as Re-Usable Framework/Pattern

Repositories:

* https://github.com/LionelJouin/multi-network/tree/framework
* https://github.com/LionelJouin/kubernetes/tree/dra-device-status
* https://github.com/LionelJouin/containerd/tree/dra-cni

Summary:

* [Flow](#Flow)
* [PoC](#PoC)
  * [ResourceClaim Status for Networking](#ResourceClaim-Status-for-Networking)
  * [Container Runtime as DRA Driver](#Container-Runtime-as-DRA-Driver)
  * [Network DRA Driver as Re-Usable Framework/Pattern](#Network-DRA-Driver-as-Re-Usable-FrameworkPattern)
* [Build](#Build)
* [Demo](#Demo)
* [Resources](#Resources)

## Flow

![Containerd-PoC](https://gist.github.com/user-attachments/assets/50bb0bef-e2a5-4b42-a356-c076d16a6d9c)

1. [NodePrepareResources](https://github.com/kubernetes/kubernetes/blob/v1.31.0/staging/src/k8s.io/kubelet/pkg/apis/dra/v1alpha4/api.proto#L35) is called from Kubelet to the DRA Driver with the list of claim names/UIDs to prepare.
2. [The claims are retrieved from the Kubernetes API](https://github.com/LionelJouin/multi-network/blob/framework/pkg/dra/driver.go#L115); the devices are prepared (and stored so they can be used when CNI ADD is called later), and the result is returned from the NodePrepareResources call.
3. Kubelet calls RunPodSandbox on the Container Runtime to create the pod.
4. During the RunPodSandbox process, [the claims for the pod currently being handled are retrieved from step 2](https://github.com/LionelJouin/multi-network/blob/framework/pkg/cni/v1/cni.go#L63), and [the CNIs are called](https://github.com/LionelJouin/multi-network/blob/framework/pkg/cni/v1/cni.go#L180) based on the information contained in the claims.
5.
[The status is set and updated via the Kubernetes API](https://github.com/LionelJouin/multi-network/blob/framework/pkg/cni/v1/cni.go#L145), after which the RunPodSandbox call finishes.

## PoC

### ResourceClaim Status for Networking

* https://github.com/LionelJouin/kubernetes/tree/dra-device-status

API: https://github.com/LionelJouin/kubernetes/blob/dra-device-status/pkg/apis/resource/types.go#L1102

The `ResourceClaimStatus` has been extended with a new field:

- `DeviceStatuses`: a list of `AllocatedDeviceStatus` entries, one per allocated device. Each entry offers two ways to report actual data about the device:
  - `DeviceInfo`: a field accepting any kind of data, like the opaque parameters (`.spec.devices.config.opaque.parameters`).
  - `NetworkDeviceInfo`: a field dedicated to network devices.

```golang
// ResourceClaimStatus tracks whether the resource has been allocated and what
// the result of that was.
type ResourceClaimStatus struct {
	...

	// DeviceStatuses contains the status of each device allocated for this
	// claim, as reported by the driver. This can include driver-specific
	// information. Entries are owned by their respective drivers.
	//
	// +optional
	// +listType=map
	// +listMapKey=devicePoolName
	// +listMapKey=deviceName
	DeviceStatuses []AllocatedDeviceStatus `json:"deviceStatuses,omitempty" protobuf:"bytes,4,opt,name=deviceStatuses"`
}

// AllocatedDeviceStatus contains the status of an allocated device, if the
// driver chooses to report it. This may include driver-specific information.
type AllocatedDeviceStatus struct {
	// Request is the name of the request in the claim which caused this
	// device to be allocated. Multiple devices may have been allocated
	// per request.
	//
	// +required
	Request string `json:"request" protobuf:"bytes,1,rep,name=request"`

	// Driver specifies the name of the DRA driver whose kubelet
	// plugin should be invoked to process the allocation once the claim is
	// needed on a node.
	//
	// Must be a DNS subdomain and should end with a DNS domain owned by the
	// vendor of the driver.
	//
	// +required
	Driver string `json:"driver" protobuf:"bytes,2,rep,name=driver"`

	// This name together with the driver name and the device name field
	// identify which device was allocated
	// (`<driver name>/<pool name>/<device name>`).
	//
	// Must not be longer than 253 characters and may contain one or more
	// DNS sub-domains separated by slashes.
	//
	// +required
	Pool string `json:"pool" protobuf:"bytes,3,rep,name=pool"`

	// Device references one device instance via its name in the driver's
	// resource pool. It must be a DNS label.
	//
	// +required
	Device string `json:"device" protobuf:"bytes,4,rep,name=device"`

	// Conditions contains the latest observation of the device's state.
	// If the device has been configured according to the class and claim
	// config references, the `Ready` condition should be True.
	//
	// +optional
	// +listType=atomic
	Conditions []metav1.Condition `json:"conditions" protobuf:"bytes,5,rep,name=conditions"`

	// DeviceInfo contains arbitrary driver-specific data.
	//
	// +optional
	DeviceInfo runtime.RawExtension `json:"deviceInfo,omitempty" protobuf:"bytes,6,rep,name=deviceInfo"`

	// NetworkDeviceInfo contains network-related information specific to the device.
	//
	// +optional
	NetworkDeviceInfo NetworkDeviceInfo `json:"networkDeviceInfo,omitempty" protobuf:"bytes,7,rep,name=networkDeviceInfo"`
}

// NetworkDeviceInfo provides network-related details for the allocated device.
// This information may be filled by drivers or other components to configure
// or identify the device within a network context.
type NetworkDeviceInfo struct {
	// Interface specifies the name of the network interface associated with
	// the allocated device. This might be the name of a physical or virtual
	// network interface.
	//
	// +optional
	Interface string `json:"interface,omitempty" protobuf:"bytes,1,rep,name=interface"`

	// IPs lists the IP addresses assigned to the device's network interface.
	// This can include both IPv4 and IPv6 addresses.
	//
	// +optional
	IPs []string `json:"ips,omitempty" protobuf:"bytes,2,rep,name=ips"`

	// Mac represents the MAC address of the device's network interface.
	//
	// +optional
	Mac string `json:"mac,omitempty" protobuf:"bytes,3,rep,name=mac"`
}
```

Here is an example of the final ResourceClaim for the demo shown in this PoC:

```yaml
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: macvlan-eth0-attachment
spec:
  devices:
    config:
    - opaque:
        driver: poc.dra.networking
        parameters:
          config: '{ "cniVersion": "1.0.0", "name": "macvlan-eth0", "plugins": [ { "type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type": "host-local", "ranges": [ [ { "subnet": "10.10.1.0/24" } ] ] } } ] }'
          interface: net1
      requests:
      - macvlan-eth0
    requests:
    - allocationMode: ExactCount
      count: 1
      deviceClassName: cni-v1
      name: macvlan-eth0
status:
  allocation:
    devices:
      config:
      - opaque:
          driver: poc.dra.networking
          parameters:
            config: '{ "cniVersion": "1.0.0", "name": "macvlan-eth0", "plugins": [ { "type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type": "host-local", "ranges": [ [ { "subnet": "10.10.1.0/24" } ] ] } } ] }'
            interface: net1
        requests:
        - macvlan-eth0
        source: FromClaim
      results:
      - device: cni
        driver: poc.dra.networking
        pool: kind-worker
        request: macvlan-eth0
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - kind-worker
  deviceStatuses:
  - conditions: null
    device: cni
    deviceInfo:
      cniVersion: 1.0.0
      interfaces:
      - mac: 1e:32:6c:b7:c9:66
        name: net1
        sandbox: /var/run/netns/cni-5b7c0846-7995-9450-f441-a177399d08d5
      ips:
      - address: 10.10.1.2/24
        gateway: 10.10.1.1
        interface: 0
    driver: poc.dra.networking
    networkDeviceInfo:
      interface: net1
      ips:
      - 10.10.1.2/24
      mac: 1e:32:6c:b7:c9:66
    pool: kind-worker
    request: macvlan-eth0
  reservedFor:
  - name: demo-a
    resource: pods
    uid: 2bd46adf-b478-4e25-9e37-828539799169
```

### Container Runtime as DRA Driver

*
https://github.com/LionelJouin/containerd/tree/dra-cni

The Networking DRA Driver runs inside Containerd, so the NRI plugin required in previous PoCs ([LionelJouin/network-dra](https://github.com/LionelJouin/network-dra) / [aojea/kubernetes-network-driver](https://github.com/aojea/kubernetes-network-driver)) is no longer required.

However, Containerd now requires Kubernetes API access in order to get the ResourceClaims (on NodePrepareResources, step 1 in the flow picture) and to update the ResourceClaim status (after CNI ADD, step 5 in the flow picture). This PoC uses the kubelet kubeconfig to access the API (status updates should be allowed with kubelet access in that case). In Kind, Containerd starts before kubelet, so this PoC [keeps retrying to get the kubeconfig from a goroutine](https://github.com/LionelJouin/containerd/blob/dra-cni/internal/cri/server/service.go#L325). Once the kubeconfig is retrieved, Containerd also registers itself as a DRA plugin ([Status](https://github.com/kubernetes/cri-api/blob/v0.31.0/pkg/apis/runtime/v1/api.proto#L117) could be improved to advertise the availability of the networking DRA Driver?).

When a pod is created, its default primary network is set up first and [the other networks are set up right after](https://github.com/LionelJouin/containerd/blob/dra-cni/internal/cri/server/sandbox_run.go#L507).

### Network DRA Driver as Re-Usable Framework/Pattern

* https://github.com/LionelJouin/multi-network/tree/framework

As highlighted by the [aojea/kubernetes-network-driver](https://github.com/aojea/kubernetes-network-driver) PoC, a re-usable DRA Driver for Networking could be created.
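The core of such a driver is a store-and-retrieve pattern: cache the claims prepared in NodePrepareResources so that the networks can be attached when the pod sandbox is created, without another API round-trip. Here is a minimal Go sketch of that pattern; all type and function names are illustrative stand-ins, not the PoC's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// PreparedClaim is an illustrative stand-in for the data kept between
// NodePrepareResources and the later CNI ADD call: which pod the claim is
// reserved for and the CNI configuration carried in its opaque parameters.
type PreparedClaim struct {
	ClaimUID  string
	PodUID    string
	CNIConfig string
}

// ClaimStore caches prepared claims so the networks can be attached while
// RunPodSandbox is processed.
type ClaimStore struct {
	mu     sync.Mutex
	claims map[string]PreparedClaim // keyed by claim UID
}

func NewClaimStore() *ClaimStore {
	return &ClaimStore{claims: map[string]PreparedClaim{}}
}

// Prepare mimics NodePrepareResources: the claims have already been fetched
// from the API server and are stored for later use.
func (s *ClaimStore) Prepare(claims ...PreparedClaim) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, c := range claims {
		s.claims[c.ClaimUID] = c
	}
}

// ForPod mimics the lookup done during RunPodSandbox: return the claims
// reserved for the pod so their CNI plugins can be invoked.
func (s *ClaimStore) ForPod(podUID string) []PreparedClaim {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []PreparedClaim
	for _, c := range s.claims {
		if c.PodUID == podUID {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	store := NewClaimStore()
	// Step 2 of the flow: claims are prepared and cached.
	store.Prepare(PreparedClaim{ClaimUID: "uid-1", PodUID: "pod-a", CNIConfig: "macvlan-eth0"})
	// Step 4 of the flow: RunPodSandbox retrieves them and calls the CNI plugins.
	for _, c := range store.ForPod("pod-a") {
		fmt.Printf("CNI ADD for claim %s using config %q\n", c.ClaimUID, c.CNIConfig)
	}
}
```

In the real PoC the equivalent of `ForPod` would also write the resulting interface name, IPs, and MAC back into the ResourceClaim status.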
NodePrepareResources would [retrieve the ResourceClaims](https://github.com/LionelJouin/multi-network/blob/framework/pkg/dra/driver.go#L115) to be used and [store them](https://github.com/LionelJouin/multi-network/blob/framework/pkg/dra/driver.go#L133), so that when the [function to add the networks](https://github.com/LionelJouin/multi-network/blob/framework/pkg/cni/v1/cni.go#L55) is called (on RunPodSandbox), the ResourceClaims are already known and can easily be [retrieved](https://github.com/LionelJouin/multi-network/blob/framework/pkg/cni/v1/cni.go#L63) to [add the networks to the pod](https://github.com/LionelJouin/multi-network/blob/framework/pkg/cni/v1/cni.go#L111) and [update the status](https://github.com/LionelJouin/multi-network/blob/framework/pkg/cni/v1/cni.go#L145).

## Build

Clone Kind:

```bash
git clone git@github.com:kubernetes-sigs/kind.git
```

Build the Kind base image:

```bash
make -C images/base quick EXTRA_BUILD_OPT="--build-arg CONTAINERD_CLONE_URL=https://github.com/LionelJouin/containerd --build-arg CONTAINERD_VERSION=dra-cni --no-cache" TAG=dra-cni
```

Clone the Kubernetes fork:

```bash
git clone git@github.com:kubernetes/kubernetes.git
cd kubernetes
git remote add LionelJouin git@github.com:LionelJouin/kubernetes.git
git fetch LionelJouin
git checkout LionelJouin/dra-device-status
```

Build the Kind node image:

```bash
kind build node-image . \
--image kindest/node:dra-cni-status --base-image gcr.io/k8s-staging-kind/base:dra-cni
```

## Demo

Kind cluster config:

```yaml
---
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  "DynamicResourceAllocation": true
  "DRAControlPlaneController": true
runtimeConfig:
  "resource.k8s.io/v1alpha3": true
kubeadmConfigPatches:
- |
  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  logging:
    verbosity: 10
- |
  kind: ClusterConfiguration
  apiServer:
    extraArgs:
      v: "4"
  scheduler:
    extraArgs:
      v: "4"
  controllerManager:
    extraArgs:
      v: "4"
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri"]
  enable_cdi = true
  [plugins.'io.containerd.grpc.v1.cri'.cni]
  cni_dra = true
nodes:
- role: control-plane
  image: kindest/node:dra-cni-status
- role: worker
  image: kindest/node:dra-cni-status
```

Install CNI plugins:

```bash
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/e2e/templates/cni-install.yml.j2
```

Apply ResourceSlice:

```yaml
cat <