Conversation

Collaborator

@Aidyn-A Aidyn-A commented Feb 5, 2024

This is a temporary placeholder for the future PyTorch-native GPU Direct Storage support until we upstream it.

cc @eqy @Fuzzkatt

@eqy
Collaborator

eqy commented Feb 5, 2024

CC @crcrpar

void save_data_no_gds(torch::Tensor& tensor, std::string& filename) {
c10::cuda::CUDAGuard gpuGuard(tensor.device());
Collaborator


(Not a question or a suggestion.) Oh, I've never used CUDAGuard directly; I've always used OptionalDeviceGuard.


for size in [128, 1024, 8192]:
    x = torch.empty(size, device="cuda")
    gds.load_data(x, f"{size}.data")
Collaborator


How complex would we expect converting the .data produced by the current API to the .pt format that PyTorch typically uses to be (assuming we do this in host code)?

Collaborator


I'm imagining that it would look something like:
- GDS materializes the tensors on storage (fast),
- the host wraps the storage in .pt files (hopefully this can be done without materializing the tensor in host memory).

Collaborator Author


A tensor.pt file is basically a .zip archive with the following structure:

.
├── byteorder
├── .data
│   └── serialization_id
├── data
│   └── 0
├── data.pkl
└── version

In this structure, gds.load_data and gds.save_data should be responsible for operating on the data/0 file only.

Looking at it from a higher-level perspective, the serializer should handle zipping/unzipping the *.pt file, write/read byteorder, data.pkl, etc., and call gds.load_data/gds.save_data on data/0. I hope that torch.serialization does not materialize it 😇

The main idea is to make it more atomic, so that other serializers like safetensors could use it.
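The archive layout described above can be sketched with the standard-library zipfile module. This is an illustrative sketch only: it reproduces the byteorder, data/0, and version entries from the tree above, but omits data.pkl and .data/serialization_id (which require PyTorch's pickled tensor metadata), so the result is not a real torch.load-compatible file. The function name is made up for illustration.

```python
import io
import struct
import zipfile

def wrap_raw_data_as_pt_like_zip(raw_bytes: bytes) -> bytes:
    """Hypothetical sketch: wrap a raw storage blob in a .pt-style zip layout.

    Mirrors the tree above (byteorder, data/0, version). data.pkl and
    .data/serialization_id are omitted, so this is NOT loadable with
    torch.load; it only illustrates which entry (data/0) a GDS-based
    save_data/load_data pair would need to touch.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("byteorder", "little")  # host byte order, as plain text
        zf.writestr("data/0", raw_bytes)    # the raw tensor storage itself
        zf.writestr("version", "3\n")       # serialization format marker
    return buf.getvalue()

# Example: four little-endian float32 values standing in for the storage
archive = wrap_raw_data_as_pt_like_zip(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(sorted(zf.namelist()))  # → ['byteorder', 'data/0', 'version']
```

In this picture, only the data/0 entry would go through GDS; everything else is small metadata the host-side serializer can write normally.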

@Aidyn-A Aidyn-A merged commit 5b67cd5 into NVIDIA:master Feb 7, 2024
@Aidyn-A Aidyn-A deleted the master branch February 7, 2024 21:31
mikaylagawarecki added a commit to mikaylagawarecki/pytorch that referenced this pull request Feb 23, 2024
This commit upstreams NVIDIA/apex#1774 into
pytorch without api changes

The struct and its methods are pybinded as `torch._C._CudaGdsFileBase`

Something that needs fixing:
- cmake/public/cuda.cmake, CUDA::cuFile does not seem to work despite
  being mentioned here https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html#cufile
  I used /usr/local/cuda/lib64/libcufile.so for now but this needs to be
  properly fixed

There is a simple sanity check in sanity_check_loop_device.py. If you do
not have an ext4 or xfs mount, this is the series of commands I used
to create an ext4 filesystem on a loop device:

```bash
dd if=/dev/zero of=./loopfile bs=1024 count=40000000
losetup -f                        # shows the next free loop device (assumed /dev/loop0 below)
sudo losetup /dev/loop0 ./loopfile
sudo mkfs -t ext4 -v /dev/loop0
mkdir -p ./mnt/loopfs
sudo mount -t ext4 -o data=ordered /dev/loop0 ./mnt/loopfs
sudo chmod 777 ./mnt/loopfs
# when done:
sudo umount ./mnt/loopfs/
sudo losetup -d /dev/loop0
```
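Since GDS expects the file to live on a supported filesystem (ext4 or xfs, as set up above), a pre-flight check can save debugging time. The helper below is a hypothetical utility, not part of this PR: it scans /proc/mounts (Linux only) for the longest mount point containing the given path and reports that mount's filesystem type.

```python
import os

def filesystem_type(path: str) -> str:
    """Hypothetical pre-flight helper (not part of the PR): best-effort
    lookup of the filesystem type backing `path` by scanning /proc/mounts.
    Linux only. GDS generally expects an ext4 or xfs mount."""
    path = os.path.realpath(path)
    best, fstype = "", "unknown"
    with open("/proc/mounts") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue
            mnt, typ = parts[1], parts[2]
            # pick the longest mount point that is a prefix of `path`
            if (path == mnt or path.startswith(mnt.rstrip("/") + "/")) and len(mnt) > len(best):
                best, fstype = mnt, typ
    return fstype

print(filesystem_type("/"))
```

A sanity-check script could call this on the target directory (e.g. ./mnt/loopfs) and warn early if the answer is not ext4 or xfs, instead of failing later inside cuFile.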
mikaylagawarecki added a commit to mikaylagawarecki/pytorch that referenced this pull request Mar 18, 2024
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Jul 22, 2024
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Jul 25, 2024