-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
GH-106485: Dematerialize instance dictionaries when possible #106539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this does the right thing, and does it the right way.
But I don't know if it does it at the right time, or in the right place.
In other words, are we dematerializing in the right instruction, and in such as way as to not degrade specialization?
I guess the measure of that is the stats.
Python/bytecodes.c
Outdated
DEOPT_IF(!_PyDictOrValues_IsValues(dorv), LOAD_ATTR); | ||
res = _PyDictOrValues_GetValues(dorv)->values[index]; | ||
PyDictOrValues *dorv = _PyObject_DictOrValuesPointer(owner); | ||
if (!_PyDictOrValues_IsValues(*dorv)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want this big chunk of code inline here, it's going to mess up the tier 2 optimizer (it won't help the compiler generate code for the tier 1 interpreter either).
How about DEOPT_IF(!_PyDictOrValues_IsValues(dorv), LOAD_ATTR_INSTANCE_DEMATERIALIZE)
and make another instruction LOAD_ATTR_INSTANCE_DEMATERIALIZE
to do the dematerialization?
Python/specialize.c
Outdated
@@ -673,6 +674,8 @@ specialize_dict_access( | |||
PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(owner); | |||
if (_PyDictOrValues_IsValues(dorv)) { | |||
// Virtual dictionary | |||
unmaterializing: | |||
; // Load-bearing semicolon; don't touch! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe // C doesn't allow declarations immediately after a label
?
@@ -695,6 +698,16 @@ specialize_dict_access( | |||
return 0; | |||
} | |||
// We found an instance with a __dict__. | |||
if (dict->ma_values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite a complex condition. The goto makes it even harder to follow.
Could you compute a virtual_dict
flag, then branch on that?
The stats still show "has managed dict" as the primary reason for specialization failure. That suggests to me that we need to dematerialize in If that is the case we should be able to get the specialization rate for |
__dict__
s in LOAD_ATTR_INSTANCE_VALUE
Just kicked off a stats/benchmarking job for the new approach. Should be done soon. |
Stats comparison vs base and benchmark comparison vs base. On my phone, so not able to dig into the results right now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stats are puzzling.
This hugely improves the specialization success rate, but make almost no difference to the number of unspecialized LOAD_ATTR
s executed.
@@ -1827,8 +1827,10 @@ dummy_func( | |||
op(_CHECK_MANAGED_OBJECT_HAS_VALUES, (owner -- owner)) { | |||
assert(Py_TYPE(owner)->tp_dictoffset < 0); | |||
assert(Py_TYPE(owner)->tp_flags & Py_TPFLAGS_MANAGED_DICT); | |||
PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(owner); | |||
DEOPT_IF(!_PyDictOrValues_IsValues(dorv), LOAD_ATTR); | |||
PyDictOrValues *dorv = _PyObject_DictOrValuesPointer(owner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is turning:
test_bit = owner[-4] & 1
jump if test_bit -> slow_path
into
test_bit = owner[-4] & 1
jump if not test_bit -> following
spill-registers
call _PyObject_MakeInstanceAttributesFromDict
restore-registers
jump if return-register ->following
jump -> slow_path
following:
or, which is even worse:
test_bit = owner[-4] & 1
spill-registers
jump if not test_bit -> following
call _PyObject_MakeInstanceAttributesFromDict
jump if return-register ->following
jump -> slow_path
following:
restore-registers
I think this will need to be
DEOPT_IF(!_PyDictOrValues_IsValues(*dorv), LOAD_ATTR_DEMATERIALIZE)
To keep the slow-path from messing up the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really complicates the code (and probably needs ugly special-casing in the resulting uop trace), since LOAD_ATTR_DEMATERIALIZE
would need to jump back into (or repeat) the instruction we happen to be executing after a successful dematerialization.
Can we leave that to another PR, if we determine that the compiler indeed hasn't just moved this cold branch with the register spills out-of-line (like it should, under PGO)?
test_bit = owner[-4] & 1
jump if test_bit -> cold
following:
...
cold:
spill-registers
call _PyObject_MakeInstanceAttributesFromDict
restore-registers
jump if return-register ->following
jump -> slow_path
Python/bytecodes.c
Outdated
@@ -2718,8 +2720,10 @@ dummy_func( | |||
assert(type_version != 0); | |||
DEOPT_IF(self_cls->tp_version_tag != type_version, LOAD_ATTR); | |||
assert(self_cls->tp_flags & Py_TPFLAGS_MANAGED_DICT); | |||
PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(self); | |||
DEOPT_IF(!_PyDictOrValues_IsValues(dorv), LOAD_ATTR); | |||
PyDictOrValues *dorv = _PyObject_DictOrValuesPointer(self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for LOAD_ATTR_INSTANCE_VALUES
Python/bytecodes.c
Outdated
@@ -2748,8 +2752,10 @@ dummy_func( | |||
assert(type_version != 0); | |||
DEOPT_IF(self_cls->tp_version_tag != type_version, LOAD_ATTR); | |||
assert(self_cls->tp_flags & Py_TPFLAGS_MANAGED_DICT); | |||
PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(self); | |||
DEOPT_IF(!_PyDictOrValues_IsValues(dorv), LOAD_ATTR); | |||
PyDictOrValues *dorv = _PyObject_DictOrValuesPointer(self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
We're only dematerializing about 5M dicts, and the unspecialized As a side note, I've learned to mostly ignore the specialization success/failure stats when comparing branches like this, since repeated specialization attempts (and exponential backoff) cause failure rates to grow or shrink disproportionately from the actual number of specialized sites. In this case, the number of successes drops by 236k and the number of failures drop by over a million. But the percentages say we actually have 5% more successes now. It's sort of confusing to derive much meaning from those sorts of numbers. 🙃 |
Confirmed that the weird |
if (_PyType_HasFeature(tp, Py_TPFLAGS_MANAGED_DICT)) { | ||
OBJECT_STAT_INC(dict_materialized_on_request); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markshannon BTW, this was the missing STAT_INC
. Anything that sets a custom tp_new
and has Py_TPFLAGS_MANAGED_DICT
doesn't get its inline values initialized, and has a NULL
dict. We handle it fine, but we may want to consider treating a NULL
dict as "has values that need initializing" rather than "has a dict that needs initializing".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out, this is a major source of dict materializations in the benchmarks (about ~4 million dicts that we weren't counting before). It looks like we're creating all of these just to dematerialize them almost immediately.
} | ||
// It's likely that this dict still shares its keys (if it was materialized | ||
// on request and not heavily modified): | ||
assert(PyDict_CheckExact(dict)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should allow dict subclasses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't
Here's an alternative to #106496, using
LOAD_ATTR_INSTANCE_VALUE
.__dict__
s #106485