
Add weak pointer and finalizer support directly to THStorage. #9148


Closed

Conversation

@ezyang ezyang (Contributor) commented Jul 3, 2018

The underlying use-case is the file descriptor to storage cache in
torch.multiprocessing.reductions. Previously, this was implemented by wrapping
an existing allocator with a "weak ref" allocator which also knew to null out
the weak reference when the storage died. This is terribly oblique, and
prevents us from refactoring the allocators to get rid of per-storage allocator
state.

So instead of going through this fiasco, we directly implement weak
pointers and finalizers in THStorage. Weak pointers to THStorage retain the
THStorage struct, but not the data_ptr. When all strong references die,
data_ptr dies and the finalizers get invoked.

There is one major hazard in this patch, which is what happens if you
repeatedly call _weak_ref on a storage. For cleanliness, we no longer
shove our grubby fingers into the finalizer struct to see if there is already
a Python object for the weak reference and return it; we just create a new one
(no one is checking these Python objects for identity). This means if you
keep calling it, we'll keep piling on finalizers. That's bad! But I am
not going to fix it until it is actually a problem for someone, because
fixing it would mean adding another caching layer.

Signed-off-by: Edward Z. Yang <[email protected]>
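To make the lifetime rules described above concrete, here is a minimal sketch of the strong/weak release logic. The names (Storage, Finalizer, storage_release_strong, storage_release_weak) are simplified stand-ins for illustration, not the exact types and functions in this patch:

#include <atomic>
#include <cstdlib>
#include <memory>

struct Finalizer {
  virtual void operator()() = 0;
  virtual ~Finalizer() {}
};

struct Storage {
  std::atomic<int> refcount;   // strong refs; they own data_ptr
  std::atomic<int> weakcount;  // weak refs, plus one while refcount > 0
  void* data_ptr;
  std::unique_ptr<Finalizer> finalizer;
};

void storage_release_weak(Storage* s) {
  if (--s->weakcount == 0) {
    // Last weak reference gone: the struct itself can be freed.
    s->~Storage();
    std::free(s);
  }
}

void storage_release_strong(Storage* s) {
  if (--s->refcount == 0) {
    // Last strong reference gone: run finalizers and free the data,
    // but keep the struct alive for any outstanding weak pointers.
    // (Real code would free data_ptr through the storage's allocator.)
    if (s->finalizer) { (*s->finalizer)(); s->finalizer.reset(); }
    std::free(s->data_ptr);
    s->data_ptr = nullptr;
    storage_release_weak(s);  // drop the weak count held on behalf of strong refs
  }
}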

@facebook-github-bot facebook-github-bot (Contributor) left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@colesbury colesbury (Member) left a comment

Generally looks good.

I think destroying the THPObjectPtr in PyObjectFinalizer without holding the GIL is a bug, but I'm not sure.

-    allocator == &THManagedSharedAllocator) {
+    allocator == &THManagedSharedAllocator ||
+    // This is a bit racy...
+    self->cdata->weakcount.load() > 1) {


}
if(self->flag & TH_STORAGE_VIEW) {
THCStorage_free(state, self->view);
if ((storage->flag & TH_STORAGE_REFCOUNTED) && (storage->refcount.load() > 0)) {


@@ -174,7 +176,9 @@ THCStorage* THCStorage_newWithDataAndAllocator(
   storage->scalar_type = scalar_type;
   storage->data_ptr = data;
   storage->size = size;
-  storage->refcount = 1;
+  new (&storage->refcount) std::atomic<int>(1);
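For context on the change above: the storage struct is obtained with raw (malloc-style) allocation, so the std::atomic<int> member is never constructed, and plain assignment to an unconstructed atomic is undefined behavior; placement new constructs it in place. A minimal illustration of the pattern (RawStorage is a made-up type, not the actual THC struct):

#include <atomic>
#include <cstdlib>
#include <new>

struct RawStorage {
  std::atomic<int> refcount;
  // ... other fields ...
};

int main() {
  // Raw allocation: no constructors run, so refcount is uninitialized.
  RawStorage* s = static_cast<RawStorage*>(std::malloc(sizeof(RawStorage)));
  // s->refcount = 1;                        // undefined: the atomic was never constructed
  new (&s->refcount) std::atomic<int>(1);    // OK: construct the atomic in place
  s->refcount.fetch_add(1);                  // normal atomic operations are fine afterwards
  std::free(s);
  return 0;
}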


@@ -22,6 +22,9 @@ class StorageRef(object):
     def __init__(self, ptr):
         self.cdata = ptr

+    def __del__(self):


namespace torch {

struct PyObjectFinalizer : public THFinalizer {
  THPObjectPtr pyobj_;
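The GIL concern raised in this review is that the finalizer's destructor (and hence the Py_DECREF buried inside THPObjectPtr) may run on a thread that does not currently hold the GIL. A minimal sketch of the defensive pattern using the plain CPython API (PyTorch's own GIL helpers look different; PyObjectFinalizerSketch is a made-up name):

#include <Python.h>

// Sketch: a finalizer that owns a Python object and re-acquires the GIL
// before dropping its reference in the destructor.
struct PyObjectFinalizerSketch {
  PyObject* obj_;
  explicit PyObjectFinalizerSketch(PyObject* obj) : obj_(obj) {}  // assumes it is handed a new reference
  ~PyObjectFinalizerSketch() {
    PyGILState_STATE gstate = PyGILState_Ensure();  // safe even if this thread already holds the GIL
    Py_XDECREF(obj_);
    PyGILState_Release(gstate);
  }
};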


@facebook-github-bot facebook-github-bot (Contributor) left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@apaszke apaszke (Contributor) left a comment

Well, the O(n) run time is not too bad, since it's still O(1) amortized; I'm more worried about "space leaks" due to the unbounded length of finalizer chains.
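To make that worry concrete: if every _weak_ref call prepends one more finalizer onto the storage, a long-lived storage that is repeatedly weak-referenced carries an ever-growing chain, even though each individual call is cheap. The chained layout below is an assumption for illustration, not necessarily the exact structure in this patch:

#include <memory>

struct ChainedFinalizer {
  std::unique_ptr<ChainedFinalizer> next;  // each new weak ref adds one node
  virtual void operator()() = 0;
  virtual ~ChainedFinalizer() {}
};

// Prepending is O(1) per call, but nothing ever removes a node, so N calls
// leave N finalizers to store, and to run when the last strong ref dies.
void add_finalizer(std::unique_ptr<ChainedFinalizer>& head,
                   std::unique_ptr<ChainedFinalizer> f) {
  f->next = std::move(head);
  head = std::move(f);
}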

// plus one more if refcount > 0
//
// - THStorage stays live as long as there are any strong
// or weak pointers to it (refcount > 0 || weakcount > 0)


//
// - finalizers are called and data_ptr is deallocated when refcount == 0
//
// - Once refcount == 0, it can never again be > 0 (it's monotonic)


# WARNING! This call to _weak_ref could lead to O(n) deleter
# behavior, if you repeatedly call it on the same Storage (all
# other sites are guarded by shared_cache; maybe this site
# should be too?)


ezyang added 3 commits July 9, 2018 06:53
Add weak pointer and finalizer support directly to THStorage.

- Don't consider storages with weak references as "shared" (previously, they
  were unconditionally considered shared)
- Remove the unnecessary "defensive" check against double-freeing storage
- Delete retainIfLive; it is dead code
- GIL-protect deallocation of THPObjectPtr

Signed-off-by: Edward Z. Yang <[email protected]>
Signed-off-by: Edward Z. Yang <[email protected]>
@ezyang ezyang force-pushed the pr/weak-ptr-and-finalizer-for-th branch from fd03aaa to b43bcef on July 9, 2018 14:12
@ezyang ezyang (Contributor, Author) commented Jul 9, 2018

@pytorchbot retest this please

1 similar comment

namespace torch {

struct PyObjectFinalizer : public THFinalizer {
  THPObjectPtr pyobj_;


@facebook-github-bot facebook-github-bot (Contributor) left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 10, 2018
Pull Request resolved: pytorch/pytorch#9148

Differential Revision: D8729106

Pulled By: ezyang

fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 13, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
@ezyang ezyang added the merged label Jun 26, 2019