Skip to content

migrate use of upload_fobj to use transfer #6558

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

skshetry
Copy link
Collaborator

@skshetry skshetry commented Sep 8, 2021

Migrate away from upload_fobj to fs.utils.transfer
Also extend fs.utils.transfer to support file object upload and bytes upload.

Pre-requisite for merging fs.upload and fs.upload_fobj together. fs.upload_fobj won't be able to support callback.get_size as we don't have that information in the fs.upload as there's no src_fs and src_info to query to, so we are achieving that consistency in terms of callback support with the use of fs.utils.transfer.

Also pre-requisite for #6546.

@skshetry skshetry added the refactoring Factoring and re-factoring label Sep 8, 2021
@skshetry skshetry self-assigned this Sep 8, 2021
@skshetry skshetry requested a review from a team as a code owner September 8, 2021 09:55
) -> None:
use_move = isinstance(from_fs, type(to_fs)) and move
try:
if content:
Copy link
Collaborator Author

@skshetry skshetry Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit overloaded though, but moving with it for now.

dvc/fs/utils.py Outdated
else:
fobj = content
size = from_fs.getsize(from_info)
return to_fs.upload_fobj(fobj, to_info, size=size)
Copy link
Collaborator Author

@skshetry skshetry Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fs.utils.transfer is the only place where we use upload_fobj now.

@skshetry skshetry force-pushed the migrate-upload-fobj-to-transfer branch from 8a9551a to 959c16a Compare September 8, 2021 10:42
@skshetry skshetry force-pushed the migrate-upload-fobj-to-transfer branch from f0d4573 to 959c16a Compare September 8, 2021 12:50
Comment on lines +72 to +74
fs.utils.transfer(
from_fs, from_info, self.fs, to_info, move=move, content=content
)
Copy link
Collaborator Author

@skshetry skshetry Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmrowla, what do you think of this? Does transfer fit here, or should it use more primitive functions like pipe_file?
(although I want to use less API as possible)

Thinking about it again, I have doubts about this here, I only did it for simplifying the exception handling. 😅

Copy link
Contributor

@pmrowla pmrowla Sep 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upload_fobj for reference objects isn't really a file transfer, it's just a write() of the serialized reference data from memory.

It seems to me like we should still have a distinction between a file transfer and a direct binary data write/upload, since in this case there really isn't a from_fs and from_info that we are transferring.


One alternative would maybe be adding something like ReferenceHashFile.to_memfile() where it returns a memfs://some_tmp_pathname path to a MemoryFileSystem entry (containing the serialized data). And then we could do an actual "file transfer" using the memfs and memfs_path_info without needing the content field?

(but this seems a bit more convoluted than just having an fs.utils.write method)

Copy link
Collaborator Author

@skshetry skshetry Sep 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, initially I started with two methods, but at the end fs.utils.write is going to be equivalent to upload_fobj, so I don't see much difference there. With transfer, I tried to push the problem of getting size for the callback support inside the transfer itself, rather than putting that burden on the caller.

We do the following when transferring files straight to the remote, which is just a file transfer at the end:

 with fs.open(path_info, mode="rb", chunk_size=fs.CHUNK_SIZE) as stream:
         stream = HashedStreamReader(stream)
         upload_odb.fs.upload_fobj(
             stream, tmp_info, desc=path_info.name, size=fs.getsize(path_info)
         )

But maybe the best thing to do here is put that burden on the caller itself, rather than trying to generalize it in transfer.

@skshetry skshetry force-pushed the migrate-upload-fobj-to-transfer branch from 959c16a to b8e5e50 Compare September 8, 2021 13:35
Also extend `fs.utils.transfer` to support file object upload and
bytes upload.
@skshetry skshetry force-pushed the migrate-upload-fobj-to-transfer branch from b8e5e50 to 03fa83a Compare September 8, 2021 13:42
@skshetry
Copy link
Collaborator Author

skshetry commented Sep 9, 2021

Closing this PR as we'll likely put the burden of doing callback.set_size on the caller itself.

See #6558 (comment).

@skshetry skshetry closed this Sep 9, 2021
@skshetry skshetry deleted the migrate-upload-fobj-to-transfer branch September 9, 2021 04:48
@skshetry skshetry mentioned this pull request Sep 9, 2021
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Factoring and re-factoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants