-
Notifications
You must be signed in to change notification settings - Fork 1.2k
migrate use of upload_fobj to use transfer #6558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
migrate use of upload_fobj to use transfer #6558
Conversation
) -> None: | ||
use_move = isinstance(from_fs, type(to_fs)) and move | ||
try: | ||
if content: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit overloaded though, but moving with it for now.
dvc/fs/utils.py
Outdated
else: | ||
fobj = content | ||
size = from_fs.getsize(from_info) | ||
return to_fs.upload_fobj(fobj, to_info, size=size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fs.utils.transfer
is the only place where we use upload_fobj
now.
8a9551a
to
959c16a
Compare
f0d4573
to
959c16a
Compare
fs.utils.transfer( | ||
from_fs, from_info, self.fs, to_info, move=move, content=content | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The upload_fobj
for reference objects isn't really a file transfer, it's just a write()
of the serialized reference data from memory.
It seems to me like we should still have a distinction between a file transfer and a direct binary data write/upload, since in this case there really isn't a from_fs
and from_info
that we are transferring.
One alternative would maybe be adding something like ReferenceHashFile.to_memfile()
where it returns a memfs://some_tmp_pathname
path to a MemoryFileSystem
entry (containing the serialized data). And then we could do an actual "file transfer" using the memfs and memfs_path_info without needing the content
field?
(but this seems a bit more convoluted than just having an fs.utils.write
method)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, initially I started with two methods, but at the end fs.utils.write
is going to be equivalent to upload_fobj
, so I don't see much difference there. With transfer
, I tried to push the problem of getting size
for the callback support inside the transfer
itself, rather than putting that burden on the caller.
We do the following when transferring files straight to the remote, which is just a file transfer at the end:
with fs.open(path_info, mode="rb", chunk_size=fs.CHUNK_SIZE) as stream:
stream = HashedStreamReader(stream)
upload_odb.fs.upload_fobj(
stream, tmp_info, desc=path_info.name, size=fs.getsize(path_info)
)
But maybe the best thing to do here is put that burden on the caller itself, rather than trying to generalize it in transfer
.
959c16a
to
b8e5e50
Compare
Also extend `fs.utils.transfer` to support file object upload and bytes upload.
b8e5e50
to
03fa83a
Compare
Closing this PR as we'll likely put the burden of doing See #6558 (comment). |
Migrate away from
upload_fobj
tofs.utils.transfer
Also extend
fs.utils.transfer
to support file object upload and bytes upload.Pre-requisite for merging
fs.upload
andfs.upload_fobj
together.fs.upload_fobj
won't be able to supportcallback.get_size
as we don't have that information in thefs.upload
as there's nosrc_fs
andsrc_info
to query to, so we are achieving that consistency in terms of callback support with the use offs.utils.transfer
.Also pre-requisite for #6546.