
Add weak pointer and finalizer support directly to THStorage. #9148


Closed

Conversation

@ezyang ezyang (Contributor) commented Jul 3, 2018

The underlying use-case is the file descriptor to storage cache in
torch.multiprocessing.reductions. Previously, this was implemented by wrapping
an existing allocator with a "weak ref" allocator which also knew to null out
the weak reference when the storage died. This is terribly oblique, and
prevents us from refactoring the allocators to get rid of per-storage allocator
state.

So instead of going through this fiasco, we directly implement weak
pointers and finalizers in THStorage. Weak pointers to THStorage retain the
THStorage struct, but not the data_ptr. When all strong references die,
data_ptr dies and the finalizers get invoked.

There is one major hazard in this patch, which is what happens if you
repeatedly call _weak_ref on a storage. For cleanliness, we no longer
shove our grubby fingers into the finalizer struct to see if there is already
a Python object for the weak reference and return it; we just create a new one
(no one is checking these Python objects for identity). This means if you
keep calling it, we'll keep piling on finalizers. That's bad! But I am
not going to fix it until it is actually a problem for someone, because
fixing it would mean adding another caching layer.

Signed-off-by: Edward Z. Yang <[email protected]>
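To make the lifetime rules described above concrete, here is a minimal sketch of the strong/weak release logic. The names (Storage, Finalizer, storage_release_strong, storage_release_weak) are simplified stand-ins for illustration, not the exact types and functions in this patch:

#include <atomic>
#include <cstdlib>
#include <memory>

struct Finalizer {
  virtual void operator()() = 0;
  virtual ~Finalizer() {}
};

struct Storage {
  std::atomic<int> refcount;   // strong refs; they own data_ptr
  std::atomic<int> weakcount;  // weak refs, plus one while refcount > 0
  void* data_ptr;
  std::unique_ptr<Finalizer> finalizer;
};

void storage_release_weak(Storage* s) {
  if (--s->weakcount == 0) {
    // Last weak reference gone: the struct itself can be freed.
    s->~Storage();
    std::free(s);
  }
}

void storage_release_strong(Storage* s) {
  if (--s->refcount == 0) {
    // Last strong reference gone: run finalizers and free the data,
    // but keep the struct alive for any outstanding weak pointers.
    // (Real code would free data_ptr through the storage's allocator.)
    if (s->finalizer) { (*s->finalizer)(); s->finalizer.reset(); }
    std::free(s->data_ptr);
    s->data_ptr = nullptr;
    storage_release_weak(s);  // drop the weak count held on behalf of strong refs
  }
}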

@facebook-github-bot facebook-github-bot (Contributor) left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@colesbury colesbury (Member) left a comment

Generally looks good.

I think destroying the THPObjectPtr in PyObjectFinalizer without holding the GIL is a bug, but I'm not sure.

-    allocator == &THManagedSharedAllocator) {
+    allocator == &THManagedSharedAllocator ||
+    // This is a bit racy...
+    self->cdata->weakcount.load() > 1) {


}
if(self->flag & TH_STORAGE_VIEW) {
THCStorage_free(state, self->view);
if ((storage->flag & TH_STORAGE_REFCOUNTED) && (storage->refcount.load() > 0)) {


@@ -174,7 +176,9 @@ THCStorage* THCStorage_newWithDataAndAllocator(
   storage->scalar_type = scalar_type;
   storage->data_ptr = data;
   storage->size = size;
-  storage->refcount = 1;
+  new (&storage->refcount) std::atomic<int>(1);
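For context on the change above: the storage struct is obtained with raw (malloc-style) allocation, so the std::atomic<int> member is never constructed, and plain assignment to an unconstructed atomic is undefined behavior; placement new constructs it in place. A minimal illustration of the pattern (RawStorage is a made-up type, not the actual THC struct):

#include <atomic>
#include <cstdlib>
#include <new>

struct RawStorage {
  std::atomic<int> refcount;
  // ... other fields ...
};

int main() {
  // Raw allocation: no constructors run, so refcount is uninitialized.
  RawStorage* s = static_cast<RawStorage*>(std::malloc(sizeof(RawStorage)));
  // s->refcount = 1;                        // undefined: the atomic was never constructed
  new (&s->refcount) std::atomic<int>(1);    // OK: construct the atomic in place
  s->refcount.fetch_add(1);                  // normal atomic operations are fine afterwards
  std::free(s);
  return 0;
}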


@@ -22,6 +22,9 @@ class StorageRef(object):
     def __init__(self, ptr):
         self.cdata = ptr

+    def __del__(self):


namespace torch {

struct PyObjectFinalizer : public THFinalizer {
  THPObjectPtr pyobj_;
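The GIL concern raised in this review is that the finalizer's destructor (and hence the Py_DECREF buried inside THPObjectPtr) may run on a thread that does not currently hold the GIL. A minimal sketch of the defensive pattern using the plain CPython API (PyTorch's own GIL helpers look different; PyObjectFinalizerSketch is a made-up name):

#include <Python.h>

// Sketch: a finalizer that owns a Python object and re-acquires the GIL
// before dropping its reference in the destructor.
struct PyObjectFinalizerSketch {
  PyObject* obj_;
  explicit PyObjectFinalizerSketch(PyObject* obj) : obj_(obj) {}  // assumes it is handed a new reference
  ~PyObjectFinalizerSketch() {
    PyGILState_STATE gstate = PyGILState_Ensure();  // safe even if this thread already holds the GIL
    Py_XDECREF(obj_);
    PyGILState_Release(gstate);
  }
};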


@facebook-github-bot facebook-github-bot (Contributor) left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@apaszke apaszke (Contributor) left a comment

Well, the O(n) run time is not too bad, since it's still O(1) amortized; I'm more worried about "space leaks" due to the unbounded length of finalizer chains.
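To make that worry concrete: if every _weak_ref call prepends one more finalizer onto the storage, a long-lived storage that is repeatedly weak-referenced carries an ever-growing chain, even though each individual call is cheap. The chained layout below is an assumption for illustration, not necessarily the exact structure in this patch:

#include <memory>

struct ChainedFinalizer {
  std::unique_ptr<ChainedFinalizer> next;  // each new weak ref adds one node
  virtual void operator()() = 0;
  virtual ~ChainedFinalizer() {}
};

// Prepending is O(1) per call, but nothing ever removes a node, so N calls
// leave N finalizers to store, and to run when the last strong ref dies.
void add_finalizer(std::unique_ptr<ChainedFinalizer>& head,
                   std::unique_ptr<ChainedFinalizer> f) {
  f->next = std::move(head);
  head = std::move(f);
}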

// plus one more if refcount > 0
//
// - THStorage stays live as long as there are any strong
// or weak pointers to it (refcount > 0 || weakcount > 0)


//
// - finalizers are called and data_ptr is deallocated when refcount == 0
//
// - Once refcount == 0, it can never again be > 0 (it's monotonic)


# WARNING! This call to _weak_ref could lead to O(n) deleter
# behavior, if you repeatedly call it on the same Storage (all
# other sites are guarded by shared_cache; maybe this site
# should be too?)


ezyang added 3 commits July 9, 2018 06:53
Add weak pointer and finalizer support directly to THStorage.

- Don't consider storages with weak references as "shared" (previously, they
  were unconditionally considered shared)
- Remove the unnecessary "defensive" check against double-freeing storage
- Delete retainIfLive; it is dead code
- GIL-protect deallocation of THPObjectPtr

Signed-off-by: Edward Z. Yang <[email protected]>
Signed-off-by: Edward Z. Yang <[email protected]>
@ezyang ezyang force-pushed the pr/weak-ptr-and-finalizer-for-th branch from fd03aaa to b43bcef on July 9, 2018 14:12
@ezyang ezyang (Contributor, Author) commented Jul 9, 2018

@pytorchbot retest this please

1 similar comment

namespace torch {

struct PyObjectFinalizer : public THFinalizer {
  THPObjectPtr pyobj_;


@facebook-github-bot facebook-github-bot (Contributor) left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 10, 2018
Pull Request resolved: pytorch/pytorch#9148

Differential Revision: D8729106

Pulled By: ezyang

fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 13, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
@ezyang ezyang added the merged label Jun 26, 2019