Skip to content

Inconsistent behavior between _io.text_encoding and _pyio.text_encoding #127611

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
donBarbos opened this issue Dec 4, 2024 · 20 comments
Closed
Labels
extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error

Comments

@donBarbos
Copy link
Contributor

donBarbos commented Dec 4, 2024

Bug report

Bug description:

>>> import _io, _pyio
>>> _pyio.text_encoding(None, stacklevel=2)
'locale'
>>> _io.text_encoding(None, stacklevel=2)
Traceback (most recent call last):
  File "<python-input-54>", line 1, in <module>
    _io.text_encoding(None, stacklevel=2)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
TypeError: _io.text_encoding() takes no keyword arguments

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

@donBarbos donBarbos added the type-bug An unexpected behavior, bug, or error label Dec 4, 2024
@tomasr8
Copy link
Member

tomasr8 commented Dec 4, 2024

Indeed, the C function signature is marked as positional-only:

_io.text_encoding
encoding: object
stacklevel: int = 2
/

This was originally added when PEP 597 was implemented:
https://github.com/python/cpython/pull/19481/files#diff-0501b143a2f566e8a760abc20a88fd2533b1f230e5322831f400a0263b8c0253R511-R514

However, I didn't find any comments in the PR explaining why this is so. In addition, the example implementation in the PEP also allows keyword arguments.

cc @methane as the original author

@methane
Copy link
Member

methane commented Dec 5, 2024

I just didn't care it well.

The doc uses positional only argument. It is official specification than the PEP.
https://docs.python.org/3.13/library/io.html#io.text_encoding

Unless clear use case for keyword argument here, how about just prohibit keyword argument on _pyio too?

@donBarbos
Copy link
Contributor Author

Firstly, the documentation doesn't say anything about how to use the stacklevel argument. Yes, the positional argument is used in the example, but only the encoding argument is passed there.
Also, the C signature of the _io.open function is marked exactly the same like stacklevel argument (but these are keyword arguments).

_io.open
file: object
mode: str = "r"
buffering: int = -1
encoding: str(accept={str, NoneType}) = None
errors: str(accept={str, NoneType}) = None
newline: str(accept={str, NoneType}) = None
closefd: bool = True
opener: object = None

Secondly, if we remove stacklevel argument as keyword argumnet in _pyio implementation and leave it as positional argument, it will break backward compatibility.

I just suggest to consider stacklevel argument as a keyword argument.

cc @methane, @tomasr8

@methane
Copy link
Member

methane commented Dec 5, 2024

Firstly, the documentation doesn't say anything about how to use the stacklevel argument. Yes, the positional argument is used in the example, but only the encoding argument is passed there.

image

The / means all arguments are positional-only.

@donBarbos
Copy link
Contributor Author

@methane, oh sorry my bad.
so in the end we need to change the behavior of the python implementation?

@methane
Copy link
Member

methane commented Dec 5, 2024

Secondly, if we remove stacklevel argument as keyword argumnet in _pyio implementation and leave it as positional argument, it will break backward compatibility.

_pyio is very minor module. Almost no one use it except testing purpose.

  • Removing keyword support from _pyio needs only changelog.
  • Adding keyword support to _io needs document update (e.g. versionchanged) and what's new etry.

This is why I am proposing just remove it unless there is clear benefits.

@donBarbos
Copy link
Contributor Author

@methane, sounds good enough to me. I have no more objections to fixing the python implementation.
I'm ready to send PR.

@donBarbos donBarbos changed the title Missing keyword argument in _io.text_encoding function Inconsistent behavior between _io.text_encoding and _pyio.text_encoding Dec 5, 2024
@tomasr8
Copy link
Member

tomasr8 commented Dec 5, 2024

Requiring positional-only arguments in pyio is definitely easier but I lean more towards allowing kw arguments in the C implementation.

For one, it's not a breaking change, unlike changing pyio even if very few people are using it. There's also no good reason for enforcing positional-only arguments and I'd even say it promotes bad habits because it forces you to write code like text_encoding(encoding, 2) with a magic constant, rather than text_encoding(encoding, stacklevel=2).

@methane
Copy link
Member

methane commented Dec 5, 2024

FWIW, this function is not for regular users.
Good habit is do not use this function at all. Use encoding="utf-8" as default encoding instead.

This function is for almost only stdlib. 3rd party libraries may need it too, but it is rare.
Additionally, using stacklevel other than 2 is more rare. It is really rare.

That is why I am reluctant to make API improvements that would require a what's new entry and versionchanged note.
But If you still wish to make improvements, I am not opposed.

@picnixz picnixz added topic-IO stdlib Python modules in the Lib dir extension-modules C modules in the Modules dir labels Dec 5, 2024
@picnixz
Copy link
Member

picnixz commented Dec 5, 2024

Does PyPy uses the pure-python implementation io or not? if it does, we should not make a breaking change =/

@methane
Copy link
Member

methane commented Dec 5, 2024

No. PyPy has own builtin _io module.

@donBarbos
Copy link
Contributor Author

donBarbos commented Dec 5, 2024

I think it's worth mentioning that encoding argument also cannot be passed as keyword.

And I agree that the practice of using keyword arguments may be a better (readable) option.

So what will the consensus be?

cc @methane, @tomasr8, @picnixz

@methane
Copy link
Member

methane commented Dec 6, 2024

No consensus yet. I don't want to change this function. This function is almost only for stdlib. No need to improve usability for many users.

@picnixz
Copy link
Member

picnixz commented Dec 6, 2024

We have a similar case for functools.reduce #121676 so whatever decision we make here should be made out there for consistency IMO (and vice-versa). They both suffer the same issue, namely the Python implementation accepts keyword arguments while the C implementation does not and the docs say "positional-only".

@methane
Copy link
Member

methane commented Dec 6, 2024

They are very different functions. It seems a foolish consistency.

  • reduce() doc didn't use positional-only argument until 3.13. On the other hand, text_encoding() is documented as positional-only since it is introduced in 3.10.
  • reduce() is general purpose function. It may have many users. On the other hand, text_encoding() is really rare outside of stdlib. I searched its usage with sourcegraph but I can not find any usage out of stdlib.

@picnixz
Copy link
Member

picnixz commented Dec 6, 2024

They may be different but IMO the issue is the same. A publicly documented function that behaves differently with the C and the Python implementation. But if you don't want the parallel, I'm ok (and would therefore rather not change text_encoding at all).

@tomasr8
Copy link
Member

tomasr8 commented Dec 6, 2024

I agree with @picnixz. If we don't want to change the C implementation, I'd also prefer to not do anything since there seems to be very little benefit to restricting the Python implementation for the sake of consistency.

@donBarbos
Copy link
Contributor Author

donBarbos commented Dec 6, 2024

I found another issue #89979 on a related topic, maybe it's worth moving this whole discussion here (but it's not progressing there)

This problem is common but no one takes the initiative to solve it.
Maybe we need to ask someone to give us the go-ahead to make more consistency in both implementations :-)

cc @tomasr8, @picnixz, @methane

@picnixz
Copy link
Member

picnixz commented Dec 7, 2024

In general, the Python implementation is aligned with the C implementation when the Python implementation is used by other implementations (e.g., functools for pypy). If it's not, we tend not to bother too much 1) if existing code that works with the C implementation works, or 2) if it is half public API or can be replaced by a more modern approach. So, indeed, the functools issue is different (because there is a downstream user using the Python implementation as its sole alternative); sorry for that, this was my bad.

Note that in both cases, we document the API with positional-only, so whatever implementation we choose, it's actually ok if users follow the docs. Now, in the datetime case, the documentation does not put restrictions so people could assume that keyword arguments are allowed while they are not in some implementations (so following the docs could break their code on some Python implementations). It's again a slightly different issue compared to this one and the functools one.

Maybe we need to ask someone to give us the go-ahead to make more consistency in both implementations :-)

Inada is a core dev, and likely the code owner of unicode-related stuff. So that "someone" would be them.

@erlend-aasland
Copy link
Contributor

Closing, as Inada-san clearly prefer status quo.

@erlend-aasland erlend-aasland closed this as not planned Won't fix, can't repro, duplicate, stale Jan 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants