Skip to content

bpo-42972: Fully implement GC protocol for functools keywrapper and partial types #26363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 28, 2021

Conversation

erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented May 25, 2021

@shihai1991 shihai1991 requested a review from methane May 26, 2021 10:19
@erlend-aasland erlend-aasland force-pushed the bpo-42972/functools branch 2 times, most recently from ce657de to f6e6676 Compare May 26, 2021 20:48
@erlend-aasland erlend-aasland marked this pull request as draft May 26, 2021 20:51
@erlend-aasland erlend-aasland marked this pull request as ready for review May 27, 2021 00:01
Copy link
Member

@shihai1991 shihai1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your update :)

@erlend-aasland
Copy link
Contributor Author

Thanks for your update :)

Thank you so much for reviewing and pointing out the weakness in the LRU cache :)

Py_VISIT(pto->args);
Py_VISIT(pto->kw);
Py_VISIT(pto->dict);
Py_VISIT(Py_TYPE(pto));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to always start by visiting the type in all traverse functions. So same remark for other traverse functions.

I like to look into partialobject structure and visit them by their definition order. Technically, ob_type is the first one :-D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll apply to all remaining PR's :)

{Py_tp_methods, partial_methods},
{Py_tp_members, partial_memberlist},
{Py_tp_getset, partial_getsetlist},
{Py_tp_new, partial_new},
{Py_tp_free, PyObject_GC_Del},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to keep it and maybe only change that in 3.11 (in a separated PR)? Type inheritance is more complex than what it looks. See https://bugs.python.org/issue43770 for examples of bad surprised that I got when trying to use the default implementation of tp_getattro and tp_setattro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

PyObject_Free(ko);
PyObject_GC_UnTrack(ko);
(void)keyobject_clear(ko);
PyObject_GC_Del(ko);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO here it's ok to hardcode PyObject_GC_Del(), but in general I would suggest to call tp->tp_free(ko); in case tp_alloc/tp_free is overriden in a subclass. To make the code more consistent, I would suggest to always call tp_free in dealloc functions. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree :) Consistency FTW

@erlend-aasland
Copy link
Contributor Author

@pablogsal, @shihai1991, @methane: I've reverted the LRU cache optimisations as per @vstinner's request.

Py_VISIT(link->result);
Py_VISIT(Py_TYPE(link));
return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lru_list_elem_clear() function would also be needed to implement the GC protocol, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lru_list_elem_clear() function would also be needed to implement the GC protocol, no?

I was under the impression traverse was enough. I may be wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without clear, the ref count cannot each zero, and the GC cannot break cycles: https://devguide.python.org/garbage_collector/#destroying-unreachable-objects

cc @pablogsal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_lru_list_elem is not a standalone type. It is an internal of _lru_cache_wrapper.

  • lru_cache_tp_traverse() visits lru_list_elem.
  • lru_cache_tp_clear() clears lru_list_elem.

That's why we can keep _lru_list_elem non-GC type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, Inada's #26416 is acceptable.

@vstinner
Copy link
Member

If you want to hide the functools._lru_list_elem type, the lru_cache_tp_traverse() function must not traverse its type. Otherwise, there is a risk that the type is discovered using gc.get_objects() or a similar function.

Right now, the functools._lru_list_elem type is hidden in the main branch and it doesn't implement the GC protocol. In this case, the current code for it in lru_cache_tp_traverse() in ok:

    lru_list_elem *link = self->root.next;
    while (link != &self->root) {
        lru_list_elem *next = link->next;
        Py_VISIT(link->key);
        Py_VISIT(link->result);
        link = next;
    }

And the current code for it in lru_cache_cache_clear() is also ok:

    lru_cache_clear_list(list);

But you still need to add Py_VISIT(Py_TYPE(self)); in lru_cache_tp_traverse().

Since the lru_cache is way more complicated, I suggest to restrict this PR to functools.KeyWrapper and partial type, and write a second PR just for the lru_cache type.

@erlend-aasland
Copy link
Contributor Author

erlend-aasland commented May 28, 2021

Since the lru_cache is way more complicated, I suggest to restrict this PR to functools.KeyWrapper and partial type, and write a second PR just for the lru_cache type.

Reverting fc39917 should be sufficient; no need to open a new PR.

UPDATE: The PR now adds GC support to the KeyWrapper and partial types, and modifies lru_cache_tp_traverse to visit the LRU cache type and the LRU cache element type.

@methane
Copy link
Member

methane commented May 28, 2021

If you want to hide the functools._lru_list_elem type, the lru_cache_tp_traverse() function must not traverse its type. Otherwise, there is a risk that the type is discovered using gc.get_objects() or a similar function.

Since _lru_list_elem is not GC-tracked type, gc.get_objects() don't leak _lru_list_elem regardless lru_cache_tp_traverse() visit _lru_list_elem or not.

Right now, the functools._lru_list_elem type is hidden in the main branch and it doesn't implement the GC protocol. In this case, the current code for it in lru_cache_tp_traverse() in ok:

Current PR breaks circular reference correctly. See #26363 (comment)

Since _lru_list_elem is hidden and immutable now, it is OK to remove Py_VISIT(Py_TYPE(link)); if we are sure about _lru_list_elem_type don't create circular reference.
But I am not sure about it. So I feel current PR is the best. It breaks circular reference correctly anyway.

@erlend-aasland
Copy link
Contributor Author

Since _lru_list_elem is hidden and immutable now [...]

It's not immutable; only static types and heap types with the Py_TPFLAGS_IMMUTABLETYPE flag are immutable.

@vstinner
Copy link
Member

@methane: "Since _lru_list_elem is not GC-tracked type, gc.get_objects() don't leak _lru_list_elem regardless lru_cache_tp_traverse() visit _lru_list_elem or not."

It doesn't matter if the type implements the protocol or not. As soon as there is a Py_VISIT() call on it, it is exposed in gc.get_objects().

I tested the current PR (commit 6313247, Revert "Revert LRU cache element optimisation"):

static int
lru_cache_tp_traverse(lru_cache_object *self, visitproc visit, void *arg)
{
    ...
    lru_list_elem *link = self->root.next;
    while (link != &self->root) {
        lru_list_elem *next = link->next;
        Py_VISIT(link->key);
        Py_VISIT(link->result);
        Py_VISIT(Py_TYPE(link));  // <==== HERE
        link = next;
    }
    ...
}

Example:

$ cat x.py 
import functools

class Value:
    pass
value = Value()

@functools.lru_cache
def func():
    return value

func()

$ ./python -i x.py 
>>> import gc
>>> isinstance(int, type)
True
>>> types=[obj for obj in gc.get_objects() if isinstance(obj, type)]
>>> lru=[type for type in types if 'lru' in type.__qualname__]
>>> lru
[<class 'functools._lru_cache_wrapper'>, <class 'functools._lru_list_elem'>]

Here is the hidden functools._lru_list_elem type.

@vstinner
Copy link
Member

@erlend-aasland: Can you please leave LRU unchanged in this PR and create a new PR just for LRU? The LRU problem sounds complicated and IMO it deserves its own change.

@erlend-aasland
Copy link
Contributor Author

erlend-aasland commented May 28, 2021

@erlend-aasland: Can you please leave LRU unchanged in this PR and create a new PR just for LRU? The LRU problem sounds complicated and IMO it deserves its own change.

Of course. Give me 2 minutes. UPDATE: Done.

@erlend-aasland erlend-aasland requested a review from vstinner May 28, 2021 08:20
@erlend-aasland erlend-aasland changed the title bpo-42972: Fully implement GC protocol for functools types bpo-42972: Fully implement GC protocol for functools keywrapper and partial types May 28, 2021
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

@erlend-aasland
Copy link
Contributor Author

@vstinner @methane, see GH-26423

@erlend-aasland
Copy link
Contributor Author

Thanks for your insights and reviews, everyone!

@vstinner vstinner merged commit 8994e9c into python:main May 28, 2021
@miss-islington
Copy link
Contributor

Thanks @erlend-aasland for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-26424 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label May 28, 2021
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 28, 2021
…artial types (pythonGH-26363)

(cherry picked from commit 8994e9c)

Co-authored-by: Erlend Egeberg Aasland <[email protected]>
vstinner pushed a commit that referenced this pull request May 28, 2021
…artial types (GH-26363) (GH-26424)

(cherry picked from commit 8994e9c)

Co-authored-by: Erlend Egeberg Aasland <[email protected]>
@erlend-aasland erlend-aasland deleted the bpo-42972/functools branch May 28, 2021 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants