[k8s] Improve /dev/fuse access on k8s #5028

Merged
merged 66 commits into from
Apr 12, 2025

@aylei aylei commented Mar 25, 2025

close #4108

This PR introduces a privileged Kubernetes DaemonSet that proxies FUSE mount/unmount operations, so that SkyPilot Pods no longer need additional privileges and capabilities. For details, refer to https://github.com/skypilot-org/skypilot/blob/improve-k8s-fuse/addons/fuse-proxy/README.md
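
As a generic illustration of how a privileged helper can do work on behalf of an unprivileged process, the sketch below passes an open file descriptor across a Unix domain socket. This is an assumption-laden toy, not the PR's implementation: the linked README is the authority on the actual design, and the helper names `send_fd`, `recv_fd`, and `demo` are hypothetical. A temporary file stands in for a device such as /dev/fuse.

```python
import os
import socket
import tempfile


def send_fd(sock, fd, note=b"fd"):
    """Send one file descriptor plus a short message (SCM_RIGHTS under the hood)."""
    socket.send_fds(sock, [note], [fd])


def recv_fd(sock):
    """Receive a message and exactly one file descriptor from the peer."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return msg, fds[0]


def demo():
    # Hypothetical stand-ins: "server" plays the privileged side,
    # "client" plays the unprivileged Pod side.
    server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with server, client, tempfile.TemporaryFile() as f:
        f.write(b"hello from the privileged side")
        f.flush()
        send_fd(server, f.fileno())
        _msg, fd = recv_fd(client)
        try:
            # The received descriptor shares the file offset with the sender's,
            # so rewind before reading.
            os.lseek(fd, 0, os.SEEK_SET)
            return os.read(fd, 64)
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

The sketch requires Python 3.9 or later on a Unix-like system, since `socket.send_fds` and `socket.recv_fds` are not available elsewhere.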

Benchmark:

command: fio --name=64kseqwrites --rw=write --direct=1 --bs=1k --numjobs=1 --iodepth=8 --size=100M --group_reporting

| Client | Storage | Bandwidth |
|---|---|---|
| GKE with fuse-proxy | S3 | 10.9MiB/s |
| GKE with fuse-proxy | GCS | 9245KiB/s |
| GKE with smarter-device-plugin | S3 | N/A (Error occurred) |
| GKE with smarter-device-plugin | GCS | 8304KiB/s |
| GCP VM | S3 | 13.7MiB/s |
| GCP VM | GCS | 12.4MiB/s |

There is a performance degradation compared with a plain VM (both using 2c resources); the cause needs to be investigated in the future. Compared with the existing solution (smarter-device-plugin), however, there is no performance regression.
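
To make the comparison concrete, the ratios can be computed from the figures in the benchmark table. The snippet below is only a small arithmetic sketch; the dictionary keys and helper names are made up for illustration, and the numbers are copied from the table.

```python
KIB_PER_MIB = 1024


def to_mib_per_s(value, unit):
    """Normalize a bandwidth figure to MiB/s."""
    if unit == "MiB/s":
        return float(value)
    if unit == "KiB/s":
        return value / KIB_PER_MIB
    raise ValueError(f"unsupported unit: {unit}")


# Figures copied from the benchmark table (the S3 run with
# smarter-device-plugin errored out, so it has no entry).
results = {
    ("fuse-proxy", "S3"): to_mib_per_s(10.9, "MiB/s"),
    ("fuse-proxy", "GCS"): to_mib_per_s(9245, "KiB/s"),
    ("smarter-device-plugin", "GCS"): to_mib_per_s(8304, "KiB/s"),
    ("vm", "S3"): to_mib_per_s(13.7, "MiB/s"),
    ("vm", "GCS"): to_mib_per_s(12.4, "MiB/s"),
}


def ratio(numerator, denominator):
    """Bandwidth of one configuration relative to another."""
    return results[numerator] / results[denominator]


if __name__ == "__main__":
    print(f"fuse-proxy vs VM (S3):  {ratio(('fuse-proxy', 'S3'), ('vm', 'S3')):.3f}")
    print(f"fuse-proxy vs VM (GCS): {ratio(('fuse-proxy', 'GCS'), ('vm', 'GCS')):.3f}")
    print(
        "fuse-proxy vs smarter-device-plugin (GCS): "
        f"{ratio(('fuse-proxy', 'GCS'), ('smarter-device-plugin', 'GCS')):.3f}"
    )
```

On these single-run numbers, fuse-proxy reaches roughly 80% (S3) and 73% (GCS) of the VM bandwidth, and about 11% more than smarter-device-plugin on GCS.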

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
    • GCS fuse mount/unmount
    • S3 fuse mount/unmount
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • /smoke-test --kubernetes -k test_docker_storage_mounts
  • /smoke-test --kubernetes -k test_kubernetes_storage_mounts
  • /smoke-test --aws -k test_docker_storage_mounts (verifies there is no regression for non-k8s FUSE mounts)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)
  • Manually tested GKE autoscaling from 0 with GPU request and GCS mount

Future TODOs

@aylei aylei force-pushed the improve-k8s-fuse branch from 2763f7d to 6756754 Compare March 26, 2025 03:36
aylei added 4 commits March 26, 2025 12:10

aylei commented Mar 26, 2025

/smoke-test --kubernetes -k test_kubernetes_storage_mounts

aylei added 4 commits March 26, 2025 13:42

aylei commented Mar 27, 2025

/smoke-test --kubernetes -k test_kubernetes_storage_mounts

aylei commented Apr 8, 2025

/smoke-test --aws -k test_docker_storage_mounts

https://buildkite.com/skypilot-1/smoke-tests/builds/631 (verifies that there is no regression for non-k8s FUSE mounts)

@SeungjinYang SeungjinYang left a comment

Go code LGTM

@romilbhardwaj romilbhardwaj left a comment

This is amazing work @aylei! Sending some quick comments, will give it a go in a bit. Mostly looks good to me

@romilbhardwaj

@aylei have we stress tested our new fuse solution in some way?

I'm trying to run fio --name=64kseqwrites --rw=write --direct=1 --bs=1k --numjobs=1 --iodepth=8 --size=1M --group_reporting by sshing into a cluster created with this YAML:

num_nodes: 2

file_mounts:
  /mydata:
    name: romiltestbucket9
    store: s3
    source: ~/tmp-workdir

run: |
  ls /mydata
  sudo apt install fio -y 

It gets stuck on:

64kseqwrites: (g=0): rw=write, bs=(R) 1024B-1024B, (W) 1024B-1024B, (T) 1024B-1024B, ioengine=psync, iodepth=8
fio-3.25
Starting 1 process
64kseqwrites: Laying out IO file (1 file / 1MiB)

To be fair, our current smarter-devices-fuse based solution also fails with a transport endpoint failure. However, it works when running on cloud VMs.

aylei commented Apr 11, 2025

@romilbhardwaj Thanks for testing it out! I will take a look

aylei commented Apr 11, 2025

/smoke-test --aws -k test_docker_storage_mounts
https://buildkite.com/skypilot-1/smoke-tests/builds/664

aylei commented Apr 11, 2025

> gets stuck on:
>
> 64kseqwrites: (g=0): rw=write, bs=(R) 1024B-1024B, (W) 1024B-1024B, (T) 1024B-1024B, ioengine=psync, iodepth=8

@romilbhardwaj It turns out that running fio triggers the error below, and if the environment does not have syslog enabled, goofys crashes:

2025/04/11 09:20:07.153474 fuse.ERROR *fuseops.FallocateOp error: function not implemented
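
The log line above reports that the filesystem does not implement the fallocate operation. As a generic illustration only (this thread does not describe what the fix commit changes, and the helper name `preallocate` is hypothetical), an application can tolerate a filesystem without fallocate support by falling back to `ftruncate`:

```python
import errno
import os
import tempfile

# Error codes commonly used to signal "operation not supported/implemented".
_UNSUPPORTED = {errno.ENOSYS, errno.EOPNOTSUPP}


def preallocate(fd, size):
    """Reserve `size` bytes for `fd`; return the name of the method that worked."""
    try:
        os.posix_fallocate(fd, 0, size)
        return "fallocate"
    except OSError as exc:
        if exc.errno not in _UNSUPPORTED:
            raise
    # Fallback: extend the file without reserving blocks.
    os.ftruncate(fd, size)
    return "ftruncate"


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        method = preallocate(f.fileno(), 1 << 20)
        print(method, os.fstat(f.fileno()).st_size)
```

The sketch assumes Linux, where `os.posix_fallocate` is available. Which branch runs depends on the underlying filesystem.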

The latest commit, 5b6fc58, addresses this problem:

fio --name=64kseqwrites --rw=write --direct=1 --bs=1k --numjobs=1 --iodepth=8 --size=1M --group_reporting
64kseqwrites: (g=0): rw=write, bs=(R) 1024B-1024B, (W) 1024B-1024B, (T) 1024B-1024B, ioengine=psync, iodepth=8
fio-3.25
Starting 1 process
64kseqwrites: Laying out IO file (1 file / 1MiB)

64kseqwrites: (groupid=0, jobs=1): err= 0: pid=17212: Fri Apr 11 09:20:07 2025
  write: IOPS=8827, BW=8828KiB/s (9039kB/s)(1024KiB/116msec); 0 zone resets
    clat (usec): min=66, max=919, avg=111.66, stdev=57.73
     lat (usec): min=66, max=919, avg=111.78, stdev=57.78
    clat percentiles (usec):
     |  1.00th=[   71],  5.00th=[   77], 10.00th=[   79], 20.00th=[   84],
     | 30.00th=[   88], 40.00th=[   91], 50.00th=[   94], 60.00th=[   99],
     | 70.00th=[  106], 80.00th=[  122], 90.00th=[  165], 95.00th=[  208],
     | 99.00th=[  338], 99.50th=[  412], 99.90th=[  676], 99.95th=[  922],
     | 99.99th=[  922]
  lat (usec)   : 100=61.33%, 250=35.35%, 500=3.12%, 750=0.10%, 1000=0.10%
  cpu          : usr=0.00%, sys=18.26%, ctx=2049, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1024,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
  WRITE: bw=8828KiB/s (9039kB/s), 8828KiB/s-8828KiB/s (9039kB/s-9039kB/s), io=1024KiB (1049kB), run=116-116msec
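
The summary figures in the fio output can be sanity-checked by hand: 1024 KiB written in 116 msec with a 1 KiB block size. The short sketch below redoes that arithmetic; the function name `fio_summary` is hypothetical, and the inputs are copied from the output above.

```python
def fio_summary(io_kib, runtime_ms, block_kib=1):
    """Derive bandwidth and IOPS from total I/O size and runtime."""
    seconds = runtime_ms / 1000
    bw_kib_s = io_kib / seconds
    return {
        "bw_kib_s": bw_kib_s,                # binary units, as in "KiB/s"
        "bw_kb_s": bw_kib_s * 1024 / 1000,   # decimal units, as in "kB/s"
        "iops": bw_kib_s / block_kib,        # one I/O per block
    }


if __name__ == "__main__":
    summary = fio_summary(io_kib=1024, runtime_ms=116)
    print({name: round(value, 1) for name, value in summary.items()})
```

The results agree with the reported BW=8828KiB/s (9039kB/s) and IOPS=8827 up to rounding. Note that a 1 MiB run lasting 116 msec is a very small sample.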

aylei commented Apr 11, 2025

/smoke-test --kubernetes -k test_kubernetes_storage_mounts
/smoke-test --kubernetes -k test_docker_storage_mounts

https://buildkite.com/skypilot-1/smoke-tests/builds/665

aylei commented Apr 11, 2025

@romilbhardwaj I also updated the stress test results in the PR description. Ready for another round of review!

@romilbhardwaj romilbhardwaj left a comment

Amazing work @aylei! Super excited to get this in! LGTM.

@SeungjinYang SeungjinYang left a comment

Nice! This is an exciting improvement

aylei commented Apr 12, 2025

/smoke-test

https://buildkite.com/skypilot-1/smoke-tests/builds/679

aylei commented Apr 12, 2025

Given the smoke test and stress test results, I am merging this into master and will track follow-ups in separate issues.

4 participants