bpo-38250: [Enum] single-bit flags are canonical #24215
Flag members are now divided by one-bit versus multi-bit, with multi-bit being treated as aliases. Iterating over a flag only returns the contained single-bit flags. repr() and str() now only show the Flags, not extra integer values; any extra integer values are either discarded (CONFORM), turned into ``int``s (EJECT) or treated as errors (STRICT). Flag classes can specify which of those three behaviors is desired:

    >>> class Test(Flag, boundary=CONFORM):
    ...     ONE = 1
    ...     TWO = 2
    ...
    >>> Test(5)
    <Test.ONE: 1>

Some flag sets, such as ``ssl.Options``, are incomplete/inconsistent; using KEEP allows those flags to exist, and have useful repr()s, etc. Also, add ``_inverted_`` attribute to Flag members to significantly speed up that operation.
repr() has been modified to support as closely as possible its previous output; the big difference is that inverted flags cannot be output as before because the inversion operation now always returns the comparable positive result; i.e. re.A|re.I|re.M|re.S is ~(re.L|re.U|re.X|re.T|re.DEBUG); in both of the above terms, the ``value`` is 282. re's tests have been updated to reflect the modifications to repr().
the documented semantic changes look good 👍
CONFORM functionality looks useful, it would replace this hack I've got:
SafeFlag
    import enum

    class SafeFlag(enum.Flag):
        """Replacement for enum.Flag which ignores unknown bits.

        * for Enum, unknown value should be an error
        * but for Flag, usually unknown bits can be safely ignored--
          this is a simple way to achieve backwards compatibility
        * use SafeFlag when flag values come from external input
          (another service, firmware, etc.)
        """

        @classmethod
        def _missing_(cls, value):
            # constrain to known bits
            # NOTE: enum subclass cannot define members, so using getattr
            all_value = getattr(cls, '_all_value', None)
            if all_value is None:
                # sum() assumes member values do not share bits
                all_value = sum(v.value for v in cls)
                setattr(cls, '_all_value', all_value)
            return super()._missing_(value & all_value)
Addition of the boundary options increases complexity however, and the implementation is growing rather than shrinking.
I'd like to help get the implementation into shape towards these ideals:
- all operations should be O(1) or O(number of set bits). Use bit operations rather than iteration wherever possible.
- iterators should generate values on the fly rather than pre-assemble in memory (implies no sorting or reversing)
- consider eliminating caching like _value2member_map_, assuming previous efficiency points can be achieved
- eliminate _decompose 🙏
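To make the first two points concrete, here is a sketch of decomposing a value with bit operations, generating each bit on the fly in O(number of set bits). `iter_set_bits` is a hypothetical helper written for this comment, not something taken from enum.py.

```python
def iter_set_bits(value):
    """Yield each set bit of a non-negative int, lowest first, one at a time."""
    if value < 0:
        raise ValueError("expected a non-negative integer")
    while value:
        lowest = value & -value   # isolate the lowest set bit
        yield lowest
        value ^= lowest           # clear that bit and continue

bits = list(iter_set_bits(0b10110))
```

Because it is a generator, nothing is pre-assembled in memory; the trade-off is that the bits always come out in increasing value order, with no sorting or reversing step.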
I reviewed some of the enum.py changes, but there is a lot to cover. Just raising the larger points to start.
@belm0 Comments left, changes made. Feel free to
🤖 New build scheduled with the buildbot fleet by @ethanfurman for commit 668c9a9 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
For whatever reason, "resolve conversation" buttons aren't appearing. I haven't seen this problem in other repos. I suspect it's a permissions issue.
Here's why I can't resolve conversations:
Iteration is now in member definition order. If member definition order matches increasing value order, then a more efficient method of flag decomposition is used; otherwise, sort() is called on the results of that method to get definition order.
it's a complicated change to a complicated implementation (and sprinkled with noise)... I need to try another review pass later
new composite members aren't always added to _value2member_map_ -- this ensures the operation succeeds
@@ -153,15 +199,29 @@ def __set_name__(self, enum_class, member_name):
        enum_member._name_ = member_name
        enum_member.__objclass__ = enum_class
        enum_member.__init__(*args)
        enum_member._sort_order_ = len(enum_class._member_names_)
this doesn't look right-- _sort_order_ is an int?
Yes, because sorting ints is easy.
This code threw me off because it's not clear how len(enum_class._member_names_) provides the sort order, so it may warrant a comment.
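One way to read that line, sketched outside of enum.py with plain lists and dicts (all names here are illustrative, and this is my interpretation rather than the author's explanation): the length of the name list, taken just before a name is appended, is the index that name will occupy, so it doubles as a definition-order key.

```python
member_names = []   # canonical names, appended in definition order
sort_order = {}     # name -> position in definition order

for name in ("BLUE", "RED", "GREEN"):
    # Before the append, len() equals the index this name is about to occupy.
    sort_order[name] = len(member_names)
    member_names.append(name)

# Members produced in some other order (say, increasing value) ...
by_value = ["RED", "GREEN", "BLUE"]
# ... can be put back into definition order with a cheap integer key.
by_definition = sorted(by_value, key=sort_order.__getitem__)
```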
(Thank you for removing the formatting changes!) Unfortunately, I can't really say I'm supporting this PR and design-- I'll try to summarize my differences, some of which are lost in resolved comments.
The
While efficiency is definitely a virtue, CPython doesn't make speed guarantees.
For me at least, visually comparing the single-bit flags present in a multi-bit flag is far easier if both are in the same order, and comparing the Rest assured that your efforts have helped make
Previous behavior was already effectively
Noted, but see above about visual comparisons.
It is much more performant now than it was before, even with the (slightly) slower of the two methods. I expect most I looked at removing While we disagree about some of the design choices, I truly appreciate your help in getting this version of |
Regarding
Regarding
option 1:
option 2 (my suggestion):
This applies equally to Enum. |
auto() for flags returns the first power of two not used
_order_ has any names that are aliases removed before checking against _member_names_
also fix _order_ tests
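A minimal sketch of the simplest auto() case for a Flag, where every value comes from auto(); `Permission` is a made-up class. It does not cover mixing explicit values with auto(), where the result may not match the wording of the commit message.

```python
from enum import Flag, auto

class Permission(Flag):
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()

# Each auto() picks the next unused power of two.
values = [p.value for p in Permission]
```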
On 1/25/21 5:57 AM, John Belmonte wrote:
Making the implementation simpler has never been a primary goal. My primary goal is to Sometimes a certain combination of flags has a distinct meaning, and so a distinct name; As you noted in the bpo issue, a line must be drawn somewhere, so Flag no longer echos
Because multibit flags (aka aliases) are not listed in iteration, and repr() and
While that is desirable behavior, it is not a hard and fast rule, and Enum has never
This loses the flag names.
And this loses the value. Neither appeals to me. On the bright side: the stdlib Enum/Flag objects will probably get changed to option 2 above, which means there will be a readily accessible |
Sometimes that needs to be so. But other times, for the user's sake it's better to have a simple thing that they can understand fully and adapt to suit their needs, rather than something with more complex API and behavior which is trying to anticipate every need (and ultimately cannot succeed). Take KEEP and EJECT:
I was trying to argue that perhaps IntFlag should be simpler, and for cases where one would desire KEEP, it would be better to change the flag usage to be at the boundaries of the API only. It's suspect to add something to an API like KEEP with a disclaimer that it should almost never be used.
Another way I'll try to argue it: multi-bit aliases can be thought of as implementation details. They can be added and removed, their names can be changed, all without affecting a flag value's representation. So they don't really belong in a value's str or repr output. The user can rest assured that as long as he doesn't change the single-bit definitions, any logging or serialized textual representation of a flag value will remain correct and applicable in the future.

closing

I understand we accomplished some movement: invert of IntFlag is no longer surprising by default, some implementation inefficiencies are addressed, Flag gets iteration and length of set bits (fixed various bugs with the initial implementation), Flag supports a safe mode (CONFORM) without hacking into it.

My disappointment is that through careful API and behavior decisions it would have been possible to do all this while at the same time contracting both the API and implementation. Rather, we ended up with a more complex API and larger implementation. That means it's harder for users to digest Enum/Flag docs, it will be harder for future devs to maintain the code, and it will be harder to make future enhancements to the library without breaking the implementation or compatibility.

Future user A - "Why are IntFlag values behaving so bizarrely, I've never seen it!". (Doesn't know that the flag is defined with

Something as simple as a definition of compose-able bits should not have 4 modes...
On 1/25/21 8:55 PM, John Belmonte wrote:
> My disappointment is that through careful API and behavior decisions it would have been possible to do all this while at
> the same time contracting both the API and implementation.
Please do not conflate a difference in ideals and objectives as a lack of careful thought.
Flag members are now divided by one-bit versus multi-bit, with multi-bit being treated as aliases. Iterating over a flag only returns the contained single-bit flags.

Iterating, repr(), and str() show members in definition order.

When constructing combined-member flags, any extra integer values are either discarded (CONFORM), turned into ints (EJECT) or treated as errors (STRICT). Flag classes can specify which of those three behaviors is desired:

    >>> class Test(Flag, boundary=CONFORM):
    ...     ONE = 1
    ...     TWO = 2
    ...
    >>> Test(5)
    <Test.ONE: 1>

Besides the three above behaviors, there is also KEEP, which should not be used unless necessary -- for example, _convert_ specifies KEEP as there are flag sets in the stdlib that are incomplete and/or inconsistent (e.g. ssl.Options). KEEP will, as the name suggests, keep all bits; however, iterating over a flag with extra bits will only return the canonical flags contained, not the extra bits.

Iteration is now in member definition order. If member definition order matches increasing value order, then a more efficient method of flag decomposition is used; otherwise, sort() is called on the results of that method to get definition order.

``re`` module: repr() has been modified to support as closely as possible its previous output; the big difference is that inverted flags cannot be output as before because the inversion operation now always returns the comparable positive result; i.e. re.A|re.I|re.M|re.S is ~(re.L|re.U|re.X|re.T|re.DEBUG); in both of the above terms, the ``value`` is 282. re's tests have been updated to reflect the modifications to repr().
Python 3.10 makes a number of changes to Enum and its sub-classes. Under the new default behavior of IntFlag the fact that we have `0b101` as a flag, but not `0b100` is considered a class definition time error (preventing import of ophyd). This opts back into allowing our (unorthodox) definition to import again and hides the new features behind a version gate. python/cpython#24215
Flag members are now divided by one-bit versus multi-bit, with multi-bit being treated as aliases. Iterating over a flag only returns the contained single-bit flags.

Iterating, repr(), and str() show members in definition order.

When constructing combined-member flags, any extra integer values are either discarded (CONFORM), turned into ints (EJECT) or treated as errors (STRICT). Flag classes can specify which of those three behaviors is desired:

    >>> class Test(Flag, boundary=CONFORM):
    ...     ONE = 1
    ...     TWO = 2
    ...
    >>> Test(5)
    <Test.ONE: 1>

Besides the three above behaviors, there is also KEEP, which should not be used unless necessary -- for example, _convert_ specifies KEEP as there are flag sets in the stdlib that are incomplete and/or inconsistent (e.g. ssl.Options). KEEP will, as the name suggests, keep all bits; however, iterating over a flag with extra bits will only return the canonical flags contained, not the extra bits.

https://bugs.python.org/issue38250