Serialization of data within a tensor is slow #9168

Closed
mrocklin opened this issue Jul 4, 2018 · 18 comments

@mrocklin
Contributor

mrocklin commented Jul 4, 2018

Issue description

When naively serializing a PyTorch tensor (or model) with pickle, the process takes much longer than it should, compared with NumPy serialization of the same data. It appears that at some point we convert the data into a Python list rather than returning the underlying buffers directly.

Code example

In [1]: import numpy, torch, pickle

In [2]: x = numpy.random.random((10000, 10000))

In [3]: t = torch.tensor(x)

In [4]: %time len(pickle.dumps(x)) / 1e6                    # around 1GB/s
CPU times: user 298 ms, sys: 415 ms, total: 713 ms
Wall time: 711 ms
Out[4]: 800.000162

In [5]: %time len(pickle.dumps(t)) / 1e6                    # around 50MB/s
CPU times: user 14.6 s, sys: 1.03 s, total: 15.7 s
Wall time: 15.7 s
Out[5]: 900.200098

The majority of this time is spent converting the t.storage() object into a list:

In [11]: %time _ = t.storage().tolist()
CPU times: user 12.3 s, sys: 891 ms, total: 13.2 s
Wall time: 13.2 s

Instead, we might consider passing around a NumPy array, buffer, or memoryview, each of which will serialize much more quickly than a long list of Python objects.

In [1]: import torch

In [2]: torch.__version__
Out[2]: '0.4.0'
@soumith
Member

soumith commented Jul 4, 2018

The standard pickler will be slow. If you use torch.save to a file or file-like object, it'll be much faster, as it goes through our custom pickling logic.

Have a look here: https://github.com/pytorch/pytorch/blob/master/torch/serialization.py#L212-L286

@mrocklin
Contributor Author

mrocklin commented Jul 4, 2018

Right, I'm suggesting making Torch's implementation of pickle fast.

@mrocklin
Contributor Author

mrocklin commented Jul 4, 2018

Separately, what is the most efficient way to convert a torch model/tensor into a set of bytes? Pass an io.BytesIO to the save function?

@soumith
Member

soumith commented Jul 4, 2018

@mrocklin the fastest way is to do t.numpy() and use whatever you'd do to numpy arrays. We don't natively provide conversion to the PyBuffer interface.
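
For example, a minimal sketch of that approach (assumes a contiguous CPU tensor that does not require grad):

import pickle
import torch

t = torch.randn(1000, 1000)
payload = pickle.dumps(t.numpy())              # serialize the NumPy view of the buffer
t2 = torch.from_numpy(pickle.loads(payload))   # rebuild without an extra copy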

@soumith
Member

soumith commented Jul 4, 2018

t.numpy() is a free operation: no memcpy, not much going on except setting up some C structs.
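
For instance, the NumPy view and the tensor point at the same memory, so mutating one mutates the other:

In [1]: import torch

In [2]: t = torch.zeros(3)

In [3]: a = t.numpy()

In [4]: a[0] = 42.0

In [5]: t
Out[5]: tensor([42.,  0.,  0.])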

@mrocklin
Contributor Author

mrocklin commented Jul 4, 2018

I understand. Would the PyTorch community accept a PR that uses numpy within the __reduce__ methods in order to improve serialization performance of tensor and model objects with naive use of pickle?
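
A minimal sketch of the idea (hypothetical helper names; a real PR would also need to handle device, dtype, and requires_grad):

import torch

def _rebuild_from_numpy(array):
    # Inverse of the reduction below; torch.from_numpy shares memory
    # with the deserialized array instead of copying it.
    return torch.from_numpy(array)

def _tensor_reduce(t):
    # Hand pickle the NumPy view of the underlying buffer rather than
    # converting the storage to a list of Python floats.
    return (_rebuild_from_numpy, (t.detach().numpy(),))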

@mrocklin
Contributor Author

mrocklin commented Jul 4, 2018

For others: it looks like using io.BytesIO gets up to about 1 GB/s.

In [1]: from torchvision.models.resnet import resnet18
   ...: model = resnet18(pretrained=True)

In [2]: import torch

In [3]: import io

In [4]: bio = io.BytesIO()

In [5]: %%time
   ...: torch.save(model, bio)
   ...: b = bio.getvalue()
CPU times: user 32 ms, sys: 16.4 ms, total: 48.4 ms
Wall time: 47.7 ms

In [6]: len(b) / 0.047 / 1e6 # MB/s
Out[6]: 996.619

And then we can reconstitute it:

bio2 = io.BytesIO(b)   # note: don't rebind the name `io`, which shadows the module
model2 = torch.load(bio2)

@mrocklin
Contributor Author

mrocklin commented Jul 4, 2018

For context, I maintain a parallel computing library, Dask, and users are naively passing around PyTorch objects and getting poor performance. I can special-case PyTorch models to use the trick above, but it might be a good idea to make PyTorch's general serialization solution decently fast for other libraries that run into the same problem. I think that this is probably pretty easy to do.

dask/dask-ml#281 (comment)
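
A minimal sketch of such a special case, shown here with copyreg rather than Dask's own serializer machinery:

import copyreg
import io
import torch
from torchvision.models.resnet import resnet18

def _load_model(data):
    return torch.load(io.BytesIO(data))

def _save_model(model):
    # Route pickling through torch.save's efficient byte format.
    bio = io.BytesIO()
    torch.save(model, bio)
    return (_load_model, (bio.getvalue(),))

model = resnet18(pretrained=True)
# copyreg dispatch is keyed on the exact type, so register the model's
# concrete class rather than a base class like torch.nn.Module.
copyreg.pickle(type(model), _save_model)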

@soumith
Member

soumith commented Jul 4, 2018

> Would the PyTorch community accept a PR that uses numpy within the __reduce__ methods in order to improve serialization performance

I'll discuss with the team and get back to you in a couple of days. We've avoided a dependence on numpy for functionality so far, but it's been a while since we discussed this.

> I can special-case PyTorch models to use the trick above

For the moment, this seems like a good idea.

@soumith
Member

soumith commented Jul 4, 2018

MemoryView requires us to implement the Py_buffer interface, which we haven't done. Implementing the Py_buffer interface across Py2 and Py3 is really complicated (see notes from the last time we tried doing it).

@mrocklin
Contributor Author

mrocklin commented Jul 4, 2018

Alternatively, PyTorch clearly has code to turn storage objects efficiently into bytestreams. It must do this with torch.save. Is there an internal method somewhere to turn a storage object directly into bytes and then back?

For example, a fully usable solution for pickle would be to call the torch.save code in the comment above and just return those bytes. This isn't quite as clean, but would behave well and doesn't require much work.
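
A sketch of the shape of that solution (simplified, with hypothetical function names):

import io
import torch

def _rebuild_storage(data):
    # Inverse of the reduction below.
    return torch.load(io.BytesIO(data))

def _storage_reduce(storage):
    # Return torch.save's efficient byte stream instead of the slow
    # storage.tolist() conversion.
    bio = io.BytesIO()
    torch.save(storage, bio)
    return (_rebuild_storage, (bio.getvalue(),))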

@mrocklin
Contributor Author

mrocklin commented Jul 5, 2018

Short term, here is a possible workaround that reuses torch.save within _StorageBase.__reduce__: #9184

mrocklin added a commit to mrocklin/pytorch that referenced this issue Jul 5, 2018
Previously this used the ``.tolist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle.  For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead.  This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol.

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.

See pytorch#9168 for context
facebook-github-bot pushed a commit that referenced this issue Jul 6, 2018
Summary:
Previously this used the ``.tolist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle.  For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead.  This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with copy.

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.

See #9168 for context
Closes #9184

Differential Revision: D8747794

Pulled By: soumith

fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79
@zou3519
Contributor

zou3519 commented Jul 9, 2018

@mrocklin we'll discuss and get back to you

@fmassa
Member

fmassa commented Jul 9, 2018

@zou3519 actually, I think this can be closed since #9184 was merged.

@soumith soumith closed this as completed Jul 9, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this issue Aug 15, 2018
@tstandley

Not to reanimate this zombie, but this issue really isn't solved. It seems to be solved for pickling models (in the above PR), but not for tensors themselves. If I pickle a tensor's .numpy() array instead of the tensor, I get much smaller pickles and much faster reads of those pickles (at least 12x faster).
