bpo-38250: [Enum] single-bit flags are canonical #24215

ethanfurman · 2021-01-14T01:32:52Z

Flag members are now divided by one-bit verses multi-bit, with multi-bit
being treated as aliases. Iterating over a flag only returns the
contained single-bit flags.

Iterating, repr(), and str() show members in definition order.

When constructing combined-member flags, any extra integer values are either discarded (CONFORM), turned into ints (EJECT) or treated as errors (STRICT). Flag classes can specify which of those three behaviors is desired:

>>> class Test(Flag, boundary=CONFORM):
...     ONE = 1
...     TWO = 2
...
>>> Test(5)
<Test.ONE: 1>

Besides the three above behaviors, there is also KEEP, which should not be used unless necessary -- for example, _convert_ specifies KEEP as there are flag sets in the stdlib that are incomplete and/or inconsistent (e.g. ssl.Options). KEEP will, as the name suggests, keep all bits; however, iterating over a flag with extra bits will only return the canonical flags contained, not the extra bits.

https://bugs.python.org/issue38250

Flag members are now divided by one-bit verses multi-bit, with multi-bit being treated as aliases. Iterating over a flag only returns the contained single-bit flags. repr() and str() now only show the Flags, not extra integer values; any extra integer values are either discarded (CONFORM), turned into ``int``s (EJECT) or treated as errors (STRICT). Flag classes can specify which of those three behaviors is desired: >>> class Test(Flag, boundary=CONFORM): ... ONE = 1 ... TWO = 2 ... >>> Test(5) <Test.ONE: 1>

some flag sets, such as ``ssl.Options`` are incomplete/inconsistent; using KEEP allows those flags to exist, and have useful repr()s, etc. also, add ``_inverted_`` attribute to Flag members to significantly speed up that operation.

repr() has been modified to support as closely as possible its previous output; the big difference is that inverted flags cannot be output as before because the inversion operation now always returns the comparable positive result; i.e. re.A|re.I|re.M|re.S is ~(re.L|re.U|re.S|re.T|re.DEBUG) in both of the above terms, the ``value`` is 282. re's tests have been updated to reflect the modifications to repr().

belm0

the documented semantic changes look good 👍

CONFORM functionality looks useful, it would replace this hack I've got:

SafeFlag

class SafeFlag(enum.Flag):
    """Replacement for enum.Flag which ignores unknown bits.

      * for Enum, unknown value should be an error
      * but for Flag, usually unknown bits can be safely ignored--
        this is a simple way to achieve backwards compatibility
      * use SafeFlag when flag values come from external input
        (another service, firmware, etc.)
    """

    @classmethod
    def _missing_(cls, value):
        # constrain to known bits
        # NOTE: enum subclass cannot define members, so using getattr
        all_value = getattr(cls, '_all_value', None)
        if all_value is None:
            all_value = sum(v.value for v in cls)
            setattr(cls, '_all_value', all_value)
        return super()._missing_(value & all_value)

Addition of the boundary options increases complexity however, and the implementation is growing rather than shrinking.

I'd like to help get the implementation into shape towards these ideals:

all operations should be O(1) or O(number of set bits). Use bit operations rather than iteration wherever possible.
iterators should generate values on the fly rather than pre-assemble in memory (implies no sorting or reversing)
consider eliminating caching like _value2member_map_, assuming previous efficiency points can be achieved
eliminate _decompose 🙏

I reviewed some of the enum.py changes, but there is a lot to cover. Just raising the larger points to start.

Lib/enum.py

ethanfurman · 2021-01-14T23:11:25Z

@belm0 Comments left, changes made. Feel free to resolve any conversations you feel are complete.

bedevere-bot · 2021-01-15T04:52:58Z

🤖 New build scheduled with the buildbot fleet by @ethanfurman for commit 668c9a9 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

Lib/enum.py

belm0 · 2021-01-16T13:37:00Z

@belm0 Comments left, changes made. Feel free to resolve any conversations you feel are complete.

For whatever reason, "resolve conversation" buttons aren't appearing. I haven't seen this problem in other repos. I suspect it's a permissions issue.

Lib/enum.py

belm0 · 2021-01-18T12:36:35Z

Here's why I can't resolve conversations:

You can resolve a conversation in a pull request if you opened the pull request or if you have write access to the repository where the pull request was opened.

https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/commenting-on-a-pull-request#resolving-conversations

Iteration is now in member definition order. If member definition order matches increasing value order, then a more efficient method of flag decomposition is used; otherwise, sort() is called on the results of that method to get definition order.

belm0

it's a complicated change to a complicated implementation (and sprinkled with noise)... I need to try another review pass later

Lib/enum.py

new composite members aren't always added to _value2member_map_ -- this ensures the operation succeeds

belm0 · 2021-01-23T13:11:48Z

Lib/enum.py

@@ -153,15 +199,29 @@ def __set_name__(self, enum_class, member_name):
        enum_member._name_ = member_name
        enum_member.__objclass__ = enum_class
        enum_member.__init__(*args)
+        enum_member._sort_order_ = len(enum_class._member_names_)


this doesn't look right-- _sort_order_ is an int?

Yes, because sorting ints is easy.

This code through me off because it's not clear how len(enum_class._member_names_) provides the sort order, so it may warrant a comment.

Lib/enum.py

belm0 · 2021-01-23T14:30:15Z

(Thank you for removing the formatting changes!)

Unfortunately, I can't really say I'm supporting this PR and design-- I'll try to summarize my differences, some of which are lost in resolved comments.

implementation and API complexity for the "boundary" option - My opinion is that Flag and IntFlag should be effectively "CONFORM"-- since it is simple, efficient, and safe-- and that user code should bear the complexity if they need other behavior (e.g. checking for unknown bits to implement something like "STRICT"). It can be argued that would be a breaking change-- but the PR at hand is itself likely to break existing code due to behavior changes and repr changes.
str() and repr() performance is important - these should be fast (no temporary list or sorting) without caching. Rather than 2nd class methods, they are essential for serialization to text, logging, etc. There is opportunity to do so by having the representation in LSB to MSB (ideal) or MSB to LSB (sacrifices CPU or requires a temporary list, but still no sorting) order. Performance shouldn't depend on definition order.
instance repr() should be in some order by bit value, not definition order
instance iteration should be performant - shouldn't depend on definition order

ethanfurman · 2021-01-23T16:52:50Z

re. iteration ordering

I see it as less complication for both: the "correct" iterator is chosen at class definition, then
never changes; user's always get definition order. How is that more complication?

The code has to have two iteration implementations and choose the correct one. There should be
unit tests covering both cases, and somehow confirming which iteration implementation is actually
used-- but I don't think that's currently covered.

The Perm enumeration is defined in msb order, and tested.

Moreover, this is a complication for users. Flag definitions in existing code which happened to be
defined in "not the ideal order" will have slower iteration. Users will be expected to somehow know
about this, decide if they need the CPU savings or not, and be careful to use the correct ordering.
(And document it on every Flag definition so that some future code maintainer that doesn't know
about it doesn't violate the careful ordering.)

While efficiency is definitely a virtue, CPython doesn't make speed guarantees.

All for what? I don't understand why it's good that instance iteration is in definition order.

For me at least, visually comparing the single-bit flags present in a multi-bit flag is far easier if both are in the same order, and comparing the Flag against its definition order is easier of Flag iteration is in definition order. I don't think I'm alone in this.

Rest assured that your efforts have helped make Flag magnitudes faster than it was.

implementation and API complexity for the "boundary" option - My opinion is that Flag and
IntFlag should be effectively "CONFORM"-- since it is simple, efficient, and safe-- and that user
code should bear the complexity if they need other behavior (e.g. checking for unknown bits to
implement something like "STRICT"). It can be argued that would be a breaking change-- but the
PR at hand is itself likely to break existing code due to behavior changes and repr changes.

Previous behavior was already effectively STRICT and EJECT -- adding boundary just makes it easier for users to actually select the behaviors they need -- such as needing to use IntEnum, but wanting the safety of STRICT (so errors do not pass silently). Further, moving to single-bit flags being canonical made transforming existing integer flags in the stdlib difficult if not impossible, which is why I ended up adding KEEP.

str() and repr() performance is important - these should be fast (no temporary list or sorting)
without caching. Rather than 2nd class methods, they are essential for serialization to text,
logging, etc. There is opportunity to do so by having the representation in LSB to MSB (ideal) or
MSB to LSB (sacrifices CPU or requires a temporary list, but still no sorting) order. Performance
shouldn't depend on definition order.

str() and repr() use the name of the flag, and the name is calculated once at flag creation. (The old code used to iterate every time for multi-bit flags, and that was definitely one of the behaviors to get fixed in this PR.)

instance repr() should be in some order by bit value, not definition order

Noted, but see above about visual comparisons.

instance iteration should be performant - shouldn't depend on definition order

It is much more performant now than it was before, even with the (slightly) slower of the two methods. I expect most Flag enumerations will just get the speed increase (defined with auto(), or converted with _convert_(), and for those that absolutely need every last cpu cycle it is easy enough to use a decorator to reset the _member_iter_ to _member_iter_by_value_. Either you or I are sure to add an SO answer for that (If you, please make sure to stress that these are implementation details and are subject to change without notice).

I looked at removing _value2member_map_, but that was a serious hit on performance. After this PR lands I can look at swapping that for a WeakDict instead (more complexity, but a space savings).

While we disagree about some of the design choices, I truly appreciate your help in getting this version of Flag to be simpler and faster. If you have ideas in the future about making Flag even faster (without changing semantics) I would love to hear them.

belm0 · 2021-01-25T13:57:19Z

Regarding str() output, I find the following arbitrary. I'd rather have str output always use single-bit flags, never multi-bit. Besides avoiding strange user-facing rules like "output a multi-bit alias if the value happens to be covered by a single such alias", it would make the implementation simpler.

>>> class Bar(Flag):
...     A = auto()
...     B = auto()
...     C = auto()
...     AB = A|B
...     BC = B|C
...     AC = A|C
...     CA = A|C
...
>>> ~Bar.B
<Bar.AC: 5>  # why is AC canonical over A|C or CA?
>>> ~Bar(0)
<Bar.A|B|C: 7> # why is A|B|C canonical over A|BC or AB|C or AC|B?

Regarding repr() output, the best practice is for it to be eval-able back to the original object. Now:

>>> repr(Bar.A)
'<Bar.A: 1>'
>>> repr(~Bar(0))
'<Bar.A|B|C: 7>'

option 1:

>>> repr(Bar.A)
'Bar(1)'
>>> repr(~Bar(0))
'Bar(7)'

option 2 (my suggestion):

>>> repr(Bar.A)
'Bar.A'
>>> repr(~Bar(0))
'(Bar.A|Bar.B|Bar.C)'

This applies equally to Enum. repr(MyEnum.BAZ) should be 'MyEnum.BAZ'.

auto() for flags returns the first power of two not used _order_ has any names that are aliases removed before checking against _member_names_

also fix _order_ tests

ethanfurman · 2021-01-25T19:37:39Z

On 1/25/21 5:57 AM, John Belmonte wrote:

Regarding |str()| output, I find the following arbitrary. I'd rather have str output
always use single-bit flags, never multi-bit. Besides avoiding strange rules like
"output a multi-bit alias if the value happens to be covered by a single such alias",
it would make the implementation simpler.

Making the implementation simpler has never been a primary goal. My primary goal is to
make users' code simpler, which means the implementations for the libraries they use are
usually less simple.

Sometimes a certain combination of flags has a distinct meaning, and so a distinct name;
allowing the use of that distinct name in repr()s makes it easier to reason about what
is going on in the code. If the user bothers to give a distinct name to a certain flag
combination, then Flag will echo it back when appropriate.

As you noted in the bpo issue, a line must be drawn somewhere, so Flag no longer echos
back every possible alias present in a value -- either you get one alias that matches
every flag set, or you get the canonical, aka single-bit, flags.

>>> class Bar(Flag):
...     A = auto()
...     B = auto()
...     C = auto()
...     AB = A|B
...     BC = B|C
...     AC = A|C
...     CA = A|C
...
>>> ~Bar.B <Bar.AC: 5>
# why is AC canonical over CA?

AC is not canonical, but it was listed first and follows standard aliasing rules
(first one listed, wins).

>>> ~Bar(0) <Bar.A|B|C: 7>
# why is A|B|C canonical over A|BC or AB|C or AC|B?

Because multibit flags (aka aliases) are not listed in iteration, and repr() and
str() are produced by iteration.

Regarding |repr()| output, the best practice is for it to be eval-able back to
the original object.

While that is desirable behavior, it is not a hard and fast rule, and Enum has never
followed it. There are other areas of the stdlib whose repr()s are also not round-
trippable.

Now:

>>> repr(~Bar(0))
<Bar.A|B|C: 7>

option 1:

>>> repr(~Bar(0))
Bar(7)

This loses the flag names.

option 2:

>>> repr(~Bar(0))
(Bar.A|Bar.B|Bar.C)

And this loses the value. Neither appeals to me.

On the bright side: the stdlib Enum/Flag objects will probably get changed to option 2 above, which means there will be a readily accessible __repr__ that users are welcome to use in their own enumerations.

belm0 · 2021-01-26T04:55:31Z

My primary goal is to make users' code simpler, which means the implementations for the libraries they use are usually less simple.

Sometimes that needs to be so. But other times, for the user's sake it's better to have a simple thing that they can understand fully and adapt to suit their needs, rather than something with more complex API and behavior which is trying to anticipate every need (and ultimately cannot succeed).

Take KEEP and EJECT:

Previous behavior was already effectively STRICT and EJECT -- adding boundary just makes it easier for users to actually select the behaviors they need -- such as needing to use IntEnum, but wanting the safety of STRICT (so errors do not pass silently). Further, moving to single-bit flags being canonical made transforming existing integer flags in the stdlib difficult if not impossible, which is why I ended up adding KEEP.

I was trying to argue that perhaps IntFlag should be simpler, and for cases where one would desire KEEP, it would be better to change the flag usage to be at the boundaries of the API only. It's suspect to add something to an API like KEEP with a disclaimer that it should almost never be used.

Sometimes a certain combination of flags has a distinct meaning, and so a distinct name;
allowing the use of that distinct name in repr()s makes it easier to reason about what
is going on in the code. If the user bothers to give a distinct name to a certain flag
combination, then Flag will echo it back when appropriate.

Another way I'll try to argue it: multi-bit aliases can be thought of as implementation details. They can be added and removed, their names can be changed, all without affecting a flag value's representation. So they don't really belong in a value's str or repr output. The user can rest assured that as long as he doesn't change the single-bit definitions, any logging or serialized textual representation of a flag value will remain correct and applicable in the future.

closing

I understand we accomplished some movement: invert of IntFlag is no longer surprising by default, some implementation inefficiencies are addressed, Flag gets iteration and length of set bits (fixed various bugs with the initial implementation), Flag supports a safe mode (CONFORM) without hacking into it.

My disappointment is that through careful API and behavior decisions it would have been possible to do all this while at the same time contracting both the API and implementation. Rather, we ended up with a more complex API and larger implementation. That means it's harder for users to digest Enum/Flag docs, it will be harder for future devs to maintain the code, and it will be harder to make future enhancements to the library without breaking the implementation or compatibility.

Future user A - "Why are IntFlag values behaving so bizarrely, I've never seen it!". (Doesn't know that the flag is defined with boundary=KEEP.)

Something as simple as a definition of compose-able bits should not have 4 modes...

ethanfurman · 2021-01-26T17:27:01Z

On 1/25/21 8:55 PM, John Belmonte wrote: My disappointment is that through careful API and behavior decisions it would have been possible to do all this while at the same time contracting both the API and implementation.

Please do not conflate a difference in ideals and objectives as a lack of careful thought.

Flag members are now divided by one-bit verses multi-bit, with multi-bit being treated as aliases. Iterating over a flag only returns the contained single-bit flags. Iterating, repr(), and str() show members in definition order. When constructing combined-member flags, any extra integer values are either discarded (CONFORM), turned into ints (EJECT) or treated as errors (STRICT). Flag classes can specify which of those three behaviors is desired: >>> class Test(Flag, boundary=CONFORM): ... ONE = 1 ... TWO = 2 ... >>> Test(5) <Test.ONE: 1> Besides the three above behaviors, there is also KEEP, which should not be used unless necessary -- for example, _convert_ specifies KEEP as there are flag sets in the stdlib that are incomplete and/or inconsistent (e.g. ssl.Options). KEEP will, as the name suggests, keep all bits; however, iterating over a flag with extra bits will only return the canonical flags contained, not the extra bits. Iteration is now in member definition order. If member definition order matches increasing value order, then a more efficient method of flag decomposition is used; otherwise, sort() is called on the results of that method to get definition order. ``re`` module: repr() has been modified to support as closely as possible its previous output; the big difference is that inverted flags cannot be output as before because the inversion operation now always returns the comparable positive result; i.e. re.A|re.I|re.M|re.S is ~(re.L|re.U|re.S|re.T|re.DEBUG) in both of the above terms, the ``value`` is 282. re's tests have been updated to reflect the modifications to repr().

Python 3.10 makes a number of changes to Enum and its sub-classes. Under the new default behavior of IntFlag the fact that we have `0b1b1` as a flag, but not `0b100` is considered a class definition time error (preventing import of ophyd). This opts back into allowing our (unorthodox) definition to import again and hides the new features behind a version gate. python/cpython#24215

the-knights-who-say-ni added the CLA signed label Jan 14, 2021

bedevere-bot added the awaiting core review label Jan 14, 2021

ethanfurman mentioned this pull request Jan 14, 2021

bpo-38250: minor Flag refactor #22734

Closed

ethanfurman added 7 commits January 13, 2021 20:19

add boundary KEEP for Flags (default for _convert_)

f7f9e72

some flag sets, such as ``ssl.Options`` are incomplete/inconsistent; using KEEP allows those flags to exist, and have useful repr()s, etc. also, add ``_inverted_`` attribute to Flag members to significantly speed up that operation.

remove extra white space

72dbdd7

test that zero-valued members are empty

210fae7

update tests to confirm CONFORM with negate

48bcd07

update aenum.rst; add doctest to test_enum

1fd7471

ethanfurman force-pushed the enum-single_bit_flags branch from 2cde48b to 1fd7471 Compare January 14, 2021 04:24

belm0 reviewed Jan 14, 2021

View reviewed changes

ethanfurman added 4 commits January 14, 2021 09:30

fix doc test

aa425e6

optimizations

4786942

formatting

45565b2

add news entry

806c8c6

fix formatting of news entry

668c9a9

ethanfurman added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 15, 2021

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 15, 2021

belm0 reviewed Jan 16, 2021

View reviewed changes

Lib/enum.py Outdated Show resolved Hide resolved

Lib/enum.py Outdated Show resolved Hide resolved

Lib/enum.py Outdated Show resolved Hide resolved

Lib/enum.py Outdated Show resolved Hide resolved

Lib/enum.py Outdated Show resolved Hide resolved

Lib/enum.py Outdated Show resolved Hide resolved

hongweipeng reviewed Jan 18, 2021

View reviewed changes

Lib/enum.py Show resolved Hide resolved

ethanfurman added 2 commits January 19, 2021 23:01

update iteration method and order

9bd9e97

Iteration is now in member definition order. If member definition order matches increasing value order, then a more efficient method of flag decomposition is used; otherwise, sort() is called on the results of that method to get definition order.

add John Belmonte

18bcbac

belm0 reviewed Jan 20, 2021

View reviewed changes

Lib/enum.py Outdated Show resolved Hide resolved

Lib/enum.py Outdated Show resolved Hide resolved

Lib/enum.py Show resolved Hide resolved

Lib/enum.py Show resolved Hide resolved

Lib/enum.py Outdated Show resolved Hide resolved

ethanfurman added 2 commits January 20, 2021 11:47

more bit-fiddling improvements

f1c4584

use pop() instead of "del"

e3713aa

new composite members aren't always added to _value2member_map_ -- this ensures the operation succeeds

belm0 reviewed Jan 23, 2021

View reviewed changes

ethanfurman added 6 commits January 25, 2021 10:27

add note about next auto() value for Enum and Flag

95bf9c8

local name optimizations

41ac1ce

remove commented-out code

3ea814e

add test for next auto() and _order_

4983558

auto() for flags returns the first power of two not used _order_ has any names that are aliases removed before checking against _member_names_

raise TypeError if _value_ not added in custom new

651da18

also fix _order_ tests

enable doc tests, update formatting

6e99d48

ethanfurman added 2 commits January 25, 2021 11:49

fix note

b52c5a2

Update 2021-01-14-15-07-16.bpo-38250.1fvhOk.rst

8d7b272

ethanfurman merged commit 7aaeb2a into python:master Jan 25, 2021

bedevere-bot removed the awaiting core review label Jan 25, 2021

ethanfurman deleted the enum-single_bit_flags branch January 26, 2021 07:34

tirkarthi mentioned this pull request Mar 12, 2021

Test fail with python-3.10 dalibo/pg_activity#194

Closed

tacaswell mentioned this pull request May 4, 2021

Fix enum py310 bluesky/ophyd#985

Closed

mattclay mentioned this pull request May 6, 2021

Import fails on Python 3.10.0b1 jborean93/pyspnego#10

Closed

AlexWaygood mentioned this pull request May 18, 2022

enum: More changes for 3.11 python/typeshed#7862

Merged

The-Compiler mentioned this pull request May 26, 2022

Default enum.FlagBoundary should be KEEP? #93250

Closed

AdamWill mentioned this pull request Jun 22, 2022

Enum: revert to 3.9 #88725

Closed

sunmy2019 mentioned this pull request Apr 8, 2023

enum.CONFORM behavior breaks backwards compatibility #103365

Closed

CharlieZhao95 mentioned this pull request Sep 21, 2023

Iterating over enum.Flag types ignores zero-valued in Python 3.11 #109633

Closed

Uh oh!

bpo-38250: [Enum] single-bit flags are canonical #24215

bpo-38250: [Enum] single-bit flags are canonical #24215

Uh oh!

Conversation

ethanfurman commented Jan 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

belm0 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ethanfurman commented Jan 14, 2021

Uh oh!

bedevere-bot commented Jan 15, 2021

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

belm0 commented Jan 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

belm0 commented Jan 18, 2021

Uh oh!

belm0 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

belm0 Jan 23, 2021

Choose a reason for hiding this comment

Uh oh!

ethanfurman Jan 23, 2021

Choose a reason for hiding this comment

Uh oh!

belm0 Jan 26, 2021

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

belm0 commented Jan 23, 2021

Uh oh!

ethanfurman commented Jan 23, 2021

Uh oh!

belm0 commented Jan 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ethanfurman commented Jan 25, 2021

Uh oh!

belm0 commented Jan 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

closing

Uh oh!

ethanfurman commented Jan 26, 2021 via email

Uh oh!

Uh oh!

ethanfurman commented Jan 14, 2021 •

edited

Loading

belm0 commented Jan 16, 2021 •

edited

Loading

belm0 commented Jan 25, 2021 •

edited

Loading

belm0 commented Jan 26, 2021 •

edited

Loading