Skip to content

gh-127750: Fix singledispatchmethod caching #127839

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Dec 11, 2024

We address two issues:

  • The singledispatchmethod would work incorrectly with objects that compare equal (but are not identical)
  • The singledispatchmethod cache would hold on to objects forever

See #127751

@eendebakpt eendebakpt changed the title Draft: gh-127750: Fix singledispatchmethod caching gh-127750: Fix singledispatchmethod caching Dec 11, 2024
@rhettinger rhettinger removed their request for review December 12, 2024 04:23
@vodik
Copy link

vodik commented Dec 12, 2024

I tried this locally and added and it seems to be working correctly. Haven't benchmarked it either though.

The cached methods seem to stick around in the cache until the gc fires. I haven't been able to find a way to make them stick around.

@AlexWaygood AlexWaygood self-requested a review December 12, 2024 07:44
@dg-pb
Copy link
Contributor

dg-pb commented Jan 7, 2025

Does CPython re-use id values after instances are destroyed?

SO says that yes, but it was in 2018. Is it still true?

If so, then can identity be reliably cached at all?

Maybe similar approach to cached_property could be used here? Although it would restrict to classes that have instance.__dict__.

@eendebakpt
Copy link
Contributor Author

Does CPython re-use id values after instances are destroyed?

SO says that yes, but it was in 2018. Is it still true?

If so, then can identity be reliably cached at all?

Maybe similar approach to cached_property could be used here? Although it would restrict to classes that have instance.__dict__.

Yes, cpython re-uses the id (the id is equal to the memory address, and memory is reused). But the id is unique during the lifetime of an object, see https://docs.python.org/3/library/functions.html#id.

The _method in the WeakValueDictionary has a reference to obj, so the key id(obj)is valid and unique as long as _method exists. Once the garbage collector detects a cycle containing the _method it will remove it from the dictionary. But since obj is part of this cycle, it also means obj no longer exist and id(obj) is free to re-use.

But your remark got me thinking: the PR right now is reliable, but maybe not very efficient. Running gc.collect clears the entire cache. The benchmarks I executed showed an improvement, but that might very well be because the garbage collector is not running often.

@dg-pb
Copy link
Contributor

dg-pb commented Jan 7, 2025

Running gc.collect clears the entire cache.

Could you clarify what do you mean by this?

@eendebakpt
Copy link
Contributor Author

Running gc.collect clears the entire cache.

Could you clarify what do you mean by this?
Take for example:

@dataclass
class A:
    value: int

    @singledispatchmethod
    def dispatch(self, x):
        return id(self)
t1 = A(1)
t1.dispatch(10)

Right after running this code the cache (e.g. _method_cache) will contain a single key-value pair: the key is id(t1) and the value is the _method object created in the dispatcher. The _method has a positive reference count (since _method.__self_reference holds a reference to _method).
When the garbage collector runs (see Python internal documentation on GC for details) the _method is part of a cycle of objects (e.g. _method -> _method.__self_reference -> _method) and is removed. Since there the _method object is no longer alive, the WeakValueDictionary cache _method_cache will remove the key-value pair.

We can address the issue by creating a reference from obj to the entry in the cache. Something like:

        try:
            setattr(obj, f'_singledispatchmethod_cache_{id(self)}', _method)  # create strong reference to _method which is not an isolated cycle. see gh-127751
        except Exception as ex:
            # frozen object, disable the cache
            self._method_cache = None

This works, but has the disadvantage that we are adding some (private) attributes in the object instance.

Note: the documentation https://docs.python.org/3/library/weakref.html#weakref.WeakValueDictionary states:

Mapping class that references values weakly. Entries in the dictionary will be discarded when no strong reference to the value exists any more.

That text is a bit misleading: there is a reference to _method, but still it is removed. I will make a separate PR to address this.

@dg-pb
Copy link
Contributor

dg-pb commented Jan 8, 2025

I see now.
I think simplifying this might be worthwhile.

Also, I would like the method to be cached for the lifetime of my instance, doesn't it currently garbage collect perfectly good cached methods?

This works, but has the disadvantage that we are adding some (private) attributes in the object instance.

What about the same way as cached_property works?

class singledispatchmethod:
    def __set_name__(self, owner, name):
        self.attrname = name

    def __get__(self, instance, owner=None):
        try:
            cache = instance.__dict__
        except AttributeError:
            # disable caching going forward
        else:
            method = cache.get(self.attrname)
            if method is not None:
                return method
            else:
                ...

except KeyError:
pass
else:
return _method

dispatch = self.dispatcher.dispatch
funcname = getattr(self.func, '__name__', 'singledispatchmethod method')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe moving this line just above the TypeError would be a bit more readable?
The performance benefit this offers would only matter if there was a code which actively depends on handling this exception.

@eendebakpt
Copy link
Contributor Author

eendebakpt commented Jan 8, 2025

I see now. I think simplifying this might be worthwhile.

Also, I would like the method to be cached for the lifetime of my instance, doesn't it currently garbage collect perfectly good cached methods?

Yes, that is why I did not like the approach in this PR.

What about the same way as cached_property works?

Nice idea. I created an alternative PR for this variation.

@eendebakpt
Copy link
Contributor Author

Issue resolved in other PRs, closing

@eendebakpt eendebakpt closed this Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants