CSI support for hypervisor container runtimes #166

Open
resouer opened this issue Jan 11, 2018 · 13 comments
@resouer

resouer commented Jan 11, 2018

As discussed in the sig-storage meeting, we would like to propose a feature for the CSI spec that lets hypervisor-based container runtimes (e.g. KataContainers, virtlet, KubeVirt) use CSI in the future.

  • The aim is to let runtimes like KataContainers bypass the attach phase and go directly to the mount phase. Kata would then mount a block device (UPDATE: and other cases as well) into the VM-based pod directly, instead of doing a bind mount, which is much slower in the hypervisor case.
  • Currently, we (Mirantis, Hyper, etc.) use flexvolume as a workaround, e.g. https://github.com/kubernetes/frakti/blob/master/pkg/flexvolume/flexvolume.go However, this workaround is not portable and cannot serve as a general-purpose solution, since it is bound to a specific plugin (e.g. Cinder).
  • This feature is also in the scope of the Secure Runtime feature in sig-node's Q1 plan (p0). We have already integrated Kata with CRI and CNI, and CSI will help us a lot in integrating Kata with containerd, cri-o, etc.

To serve this minimal purpose, only a minor change is expected on the CSI side; please refer to these slides for details:

https://docs.google.com/presentation/d/1kPeia7wLqoKQI0oX4pvVdH1UpcPx3lpmFK4P_E6oiIc/edit#slide=id.p

The pseudo code of CSI change is here: https://github.com/bergwolf/spec/tree/detached_volume

We can of course schedule a meeting or talk in the next sync for further discussion, while this issue can be used as the feature tracker.

CC:
Kata maintainers @bergwolf @sameo @gnawux
sig-storage @saad-ali @jingxu97
CSI @jieyu
RH: @rootfs Mirantis: @ivan4th

@rootfs

rootfs commented Jan 11, 2018

Instead of changing the API fingerprint, adding new APIs makes compatibility easier.

In our use case, we also need Node Reserve/Release Volume to ensure that volumes which don't support multi-attach are used by only one node. I believe this would also help Kata Containers.

cc @fabiand

@fabiand

fabiand commented Jan 11, 2018

Please also take into account that delegating the whole volume consumption to the hypervisor runtime is also beneficial, e.g. if we want to let qemu connect directly to the iSCSI target.

Not sure if this issue should serve just the original request, or also the ones from @rootfs and me (all three are different).

@bergwolf

@rootfs @fabiand Thanks for bringing it up. And yes, Reserve/Release Volume is useful for Kata Containers as well. But I think it should be a Controller API instead, because a node can be disconnected from the CO but still have access to the storage network, in which case the CO needs to call Release Volume before Reserving/Publishing the volume to another node. Also, IMO Reserve Volume can take an owner argument so that the CO can decide who (a node or a VM on the node) shall have exclusive access to the volume. WDYT?

Instead of changing the API fingerprint, adding new APIs makes compatibility easier.
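
To make the owner argument concrete, here is a minimal Go sketch of controller-side bookkeeping. Everything in it is an assumption for illustration: the `Reservations` type, `ReserveVolume`, `ReleaseVolume`, and the string owner IDs are hypothetical and are not part of the CSI spec.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrReserved reports that a different owner already holds the volume.
var ErrReserved = errors.New("volume reserved by another owner")

// Reservations is a hypothetical controller-side table of exclusive owners.
// An owner is an opaque ID chosen by the CO: a node, or a VM on a node.
type Reservations struct {
	mu     sync.Mutex
	owners map[string]string // volume ID -> owner ID
}

// NewReservations returns an empty reservation table.
func NewReservations() *Reservations {
	return &Reservations{owners: make(map[string]string)}
}

// ReserveVolume grants owner exclusive access to volumeID.
// Repeating the call with the same owner succeeds (idempotent).
func (r *Reservations) ReserveVolume(volumeID, owner string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if cur, ok := r.owners[volumeID]; ok && cur != owner {
		return fmt.Errorf("%w: %s is held by %s", ErrReserved, volumeID, cur)
	}
	r.owners[volumeID] = owner
	return nil
}

// ReleaseVolume drops any reservation on volumeID. Because it lives on the
// controller side, the CO can call it even when the old node is unreachable.
func (r *Reservations) ReleaseVolume(volumeID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.owners, volumeID)
}

func main() {
	r := NewReservations()
	fmt.Println(r.ReserveVolume("vol-1", "node-a")) // first owner wins
	fmt.Println(r.ReserveVolume("vol-1", "node-b")) // rejected while node-a holds it
	r.ReleaseVolume("vol-1")                        // CO releases on behalf of node-a
	fmt.Println(r.ReserveVolume("vol-1", "node-b")) // now succeeds
}
```

The sketch only models the ownership check; a real plugin would presumably also have to fence the old owner at the storage backend.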

@rootfs, how about introducing a new NodePublishDetachedVolume() API? It keeps most semantics of NodePublishVolume() except presenting/mounting the volume at target_path, which will not be included in NodePublishDetachedVolumeRequest.
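
As a rough illustration of the shape of that request, here is a Go sketch. The structs are simplified stand-ins rather than the generated CSI protobuf types, and `NodePublishDetachedVolumeRequest`, `ToDetached`, and the field names are hypothetical.

```go
package main

import "fmt"

// NodePublishVolumeRequest is a simplified stand-in for the CSI message.
// Only a few illustrative fields are modeled here.
type NodePublishVolumeRequest struct {
	VolumeID   string
	TargetPath string // where the plugin mounts the volume on the host
	Readonly   bool
	Attributes map[string]string
}

// NodePublishDetachedVolumeRequest is hypothetical: it carries the same
// information minus TargetPath, since nothing is mounted on the host.
type NodePublishDetachedVolumeRequest struct {
	VolumeID   string
	Readonly   bool
	Attributes map[string]string
}

// ToDetached copies every field except TargetPath. The attribute map is
// copied so the two requests do not share mutable state.
func ToDetached(req NodePublishVolumeRequest) NodePublishDetachedVolumeRequest {
	attrs := make(map[string]string, len(req.Attributes))
	for k, v := range req.Attributes {
		attrs[k] = v
	}
	return NodePublishDetachedVolumeRequest{
		VolumeID:   req.VolumeID,
		Readonly:   req.Readonly,
		Attributes: attrs,
	}
}

func main() {
	req := NodePublishVolumeRequest{
		VolumeID:   "vol-1",
		TargetPath: "/var/lib/pods/example/volumes/vol-1", // made-up path
		Attributes: map[string]string{"fsType": "ext4"},
	}
	fmt.Printf("%+v\n", ToDetached(req))
}
```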

@rootfs

rootfs commented Jan 11, 2018

@bergwolf +1

@cpuguy83
Contributor

A couple of questions... and apologies if you explained this on the CSI call yesterday; I had to miss the first half of it, and I'm sure I don't understand the issues faced by VM-based runtimes...

  1. Is detached mode an optimization for VM-based runtimes or is it a requirement to work at all?
  2. Is this only for block devices?

For compatibility/extensibility purposes it may be good to give the mode its own message type, or perhaps make it an enum rather than a bool.

The ability to support the detached mode likely should also be a capability returned by the plugin.
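
A minimal Go sketch of the enum-plus-capability idea follows. All names (`PublishMode`, `PluginCapabilities`, `ChooseMode`) are hypothetical and not taken from the CSI spec, and the fallback to attached mode assumes detached mode is an optimization rather than a hard requirement, which is exactly the open question above.

```go
package main

import "fmt"

// PublishMode is a hypothetical enum. Unlike a bool, it leaves room for
// further modes without changing the field's type.
type PublishMode int

const (
	PublishModeUnknown  PublishMode = iota
	PublishModeAttached             // volume is attached/mounted on the host
	PublishModeDetached             // runtime consumes the volume directly
)

// String makes the modes readable in logs and error messages.
func (m PublishMode) String() string {
	switch m {
	case PublishModeAttached:
		return "ATTACHED"
	case PublishModeDetached:
		return "DETACHED"
	default:
		return "UNKNOWN"
	}
}

// PluginCapabilities is a stand-in for what a plugin might advertise.
type PluginCapabilities struct {
	Modes []PublishMode
}

// Supports reports whether the plugin advertised mode m.
func (c PluginCapabilities) Supports(m PublishMode) bool {
	for _, have := range c.Modes {
		if have == m {
			return true
		}
	}
	return false
}

// ChooseMode picks detached mode only when the CO wants it and the plugin
// advertises it; otherwise it falls back to attached mode (an assumption).
func ChooseMode(c PluginCapabilities, wantDetached bool) PublishMode {
	if wantDetached && c.Supports(PublishModeDetached) {
		return PublishModeDetached
	}
	return PublishModeAttached
}

func main() {
	legacy := PluginCapabilities{Modes: []PublishMode{PublishModeAttached}}
	both := PluginCapabilities{Modes: []PublishMode{PublishModeAttached, PublishModeDetached}}
	fmt.Println(ChooseMode(legacy, true)) // plugin cannot do detached
	fmt.Println(ChooseMode(both, true))   // plugin can do detached
}
```

If detached mode turns out to be required for isolation, the fallback branch would have to return an error instead.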

@rootfs

rootfs commented Jan 11, 2018

It is both an optimization (going through QEMU block without having to attach the volume to the host) and a requirement (for isolation) for VMs, and it is limited to block devices.

@bergwolf Detached is probably ambiguous here - the volume may never be attached in the first place.

@bergwolf

@rootfs @cpuguy83 It is not limited to block devices. We have implemented NFS support, and SMB can be added as well. In theory, any remote storage can be added in detached mode. There is an agent program in Kata Containers that can help set up storage directly in the guest.

@rootfs Detached is in contrast with NodePublishVolume(), which IIUC always attaches the volume to the host.

@fabiand

fabiand commented Jan 11, 2018

Yep, as @bergwolf says, this could also be relevant for file mode (specifically NFS).

For KubeVirt, however, we are primarily interested in delegating the block storage attach to qemu (not file).

@saad-ali
Member

saad-ali commented Nov 2, 2018

Will address post v1. There is a related issue by @cpuguy83 about letting the CO control mounts; we should align these designs.

@resouer
Author

resouer commented Nov 4, 2018

@saad-ali That's great. Has the issue been sent out?

@julian-hj
Contributor

@resouer I believe that Saad was referring to #96

@bergwolf

CC @xing-yang @jingxu97

@aarondav

aarondav commented Jun 9, 2021

Any progress on this issue? Would love CSI support for hypervisor runtimes like kata.

8 participants