gh-105201: Add PyIter_NextItem to replace PyIter_Next which has an ambiguous output #105202

iritkatriel · 2023-06-01T18:14:04Z

PyIter_Next returns NULL for both normal exit and error. The caller needs to call PyErr_Occurred() to disambiguate. This PR adds PyIter_NextItem, which is not ambiguous about errors.

Issue: PyIter_Next has ambiguous return value #105201

Include/abstract.h

Modules/_testcapimodule.c

Doc/c-api/iter.rst

iritkatriel · 2023-06-01T20:34:57Z

Doc/c-api/iter.rst

-   while ((item = PyIter_Next(iterator))) {
+   PyObject *item;
+   int res;
+   while ((res = PyIter_NextItem(iterator, &item)) == 0 && item != NULL) {


I think it might be better if PyIter_NextItem returned -1 for error, 0 for exhausted iterator and 1 if there is a 'next' value. Then we could just continue as long as it returns 1 (without checking for NULL).
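For illustration only, here is a rough sketch of the three-valued convention suggested above. It uses a toy iterator over an int array in plain C rather than the CPython API; `ToyIter`, `toy_next_item` and `toy_sum` are hypothetical names that do not appear in the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Toy iterator over an int array; a stand-in, NOT the CPython API. */
typedef struct {
    const int *items;
    size_t len;
    size_t pos;
    size_t fail_at;  /* position at which the toy iterator fails; >= len means never */
} ToyIter;

/* Hypothetical three-valued "next":
 * 1 = an item was stored in *item, 0 = exhausted, -1 = error. */
static int
toy_next_item(ToyIter *it, int *item)
{
    if (it->pos >= it->len) {
        return 0;
    }
    if (it->pos == it->fail_at) {
        return -1;
    }
    *item = it->items[it->pos++];
    return 1;
}

/* Sum all items; returns 0 on success, -1 on error. */
static int
toy_sum(ToyIter *it, long *total)
{
    int item;
    int res;
    *total = 0;
    /* Continue only while a new item was produced; no NULL check needed. */
    while ((res = toy_next_item(it, &item)) == 1) {
        *total += item;
    }
    return res;  /* 0 (exhausted) or -1 (error) */
}
```

With this shape the loop condition is a single comparison, and the value left in `res` after the loop already says whether the iteration ended normally.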

My personal preference would be to stick with the C idiom of returning -1 or 0. I also think that the API design should not be discussed in a conversation on a PR.

After trying to migrate a few places, I agree. Working with 3 return values is not simpler.

But I do think it should be PyObject* PyIter_NextItem(PyObject *o, int *err), otherwise we need two checks in every loop of iteration and that's just wasteful.

You still need two checks, no?

Not on every iteration. Only when item is NULL and you exit the loop do you need to check what err is.
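As a sketch of the point being made here, the pointer-returning variant with an `err` out-parameter could be used as follows. This is a toy model in plain C, not the CPython API; `ToyIter`, `toy_next_item` and `toy_sum` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy iterator over an int array; a stand-in, NOT the CPython API. */
typedef struct {
    const int *items;
    size_t len;
    size_t pos;
    size_t fail_at;  /* position at which the toy iterator fails; >= len means never */
} ToyIter;

/* Hypothetical variant: returns the next item, or NULL on exhaustion
 * or error; *err is 1 on error and 0 otherwise. */
static const int *
toy_next_item(ToyIter *it, int *err)
{
    *err = 0;
    if (it->pos >= it->len) {
        return NULL;             /* exhausted: NULL with *err == 0 */
    }
    if (it->pos == it->fail_at) {
        *err = 1;                /* error: NULL with *err == 1 */
        return NULL;
    }
    return &it->items[it->pos++];
}

/* Sum all items; returns 0 on success, -1 on error. */
static int
toy_sum(ToyIter *it, long *total)
{
    const int *item;
    int err;
    *total = 0;
    while ((item = toy_next_item(it, &err)) != NULL) {
        *total += *item;         /* one check per iteration */
    }
    return err ? -1 : 0;         /* err is consulted once, after the loop */
}
```

The returned pointer alone is still NULL in two different situations, which is the objection raised in the next reply; the pair (pointer, err) is what disambiguates.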

So the return value is still ambiguous. I thought the aim of the new API was an unambiguous return value.

The aim is that you can find out whether an error occurred without calling PyErr_Occurred().

Quoting from the issue:

We will try to move away from those APIs to alternative ones whose return values non-ambiguously indicate whether there has been an error.

rhettinger · 2023-06-02T23:44:06Z

Personally, I don't find the new API to be an improvement. While the overall caller logic would have the same structure, it takes an additional line to create the variable, adds pointer logic in place of the cleaner looking function call, and it creates an opportunity to lose track of where the error came from. I really don't think we need this.

Also, it is somewhat aggressive to label the long-standing API as "legacy" as if it is wrong or hard to read in some way. Likewise it is aggressive to label the new API as "preferred". That will only induce some new core dev to sweep through and replace every existing call even though the current code works just fine and the existing API will never go away (at least not without creating a ton of unnecessary work for the Python ecosystem).

iritkatriel · 2023-06-03T05:24:01Z

> it creates an opportunity to lose track of where the error came from.

The opposite: currently you need to check PyErr_Occurred() so you can’t be sure where the error came from. With this change you know.

erlend-aasland · 2023-06-03T13:03:30Z

I'm still -1, as explained in the issue. Also, the PR title is misleading; you're replacing one API with an ambiguous return value with another API with an ambiguous return value. It ends up adding yet another problematic API to the list of ambiguous APIs.

iritkatriel · 2023-06-03T13:14:45Z

Where is the ambiguity? The return value is the pair (PyObject*, int). Everything you need is provided by the function, so you don’t need to access global state.

iritkatriel · 2023-06-03T17:04:56Z

> I'm still -1, as explained in the issue. Also, the PR title is misleading; you're replacing one API with an ambiguous return value with another API with an ambiguous return value. It ends up adding yet another problematic API to the list of ambiguous APIs.

I reworded the title to ‘ambiguous output’ rather than ‘ambiguous return value’.

rhettinger · 2023-06-04T05:10:03Z

The existing API has been used successfully for two decades and during that time I've not seen a single user request for an alternative API. There isn't a real user issue being solved here. ISTM that this is an invented problem and that the solution is in some ways worse than what we have now. In addition, having more than one way to do it will create more problems than it solves. Since this is part of the stable ABI, the new and old functions will live in perpetuity and be a recurring source of confusion. Also, the new API will not be usable by any package wanting to be compatible with versions of Python before 3.12. Having a mix of techniques is an undesirable outcome.

iritkatriel · 2023-06-04T07:33:52Z

The issue of c api functions with ambiguous return values came up in the discussions that started at the recent language summit, about the problems with the c api (see the blog post about that). The problem is not only related to PyIter_Next usage in cpython, but to how the c api works for alternative python implementations (that don’t use cpython’s internal error handling mechanism.)

There was some discussion about whether we can fix problems like this incrementally, or we just need to give up on the current api and redesign a new one from scratch. I believe it’s always better to fix things incrementally when you can, and redesign from scratch only when you have to (the redesign can then focus on the problems that actually require it). Other core devs said it is not possible to fix the current c api, and I see now (almost at the first hurdle) what they mean. This probably means that we will have a very inflated list of problems “requiring” a new c api. Not because that is the right way to go about this, but because we won’t be able to get through discussions like this to make incremental fixes.

It probably won’t make much of a difference to the new c api (there will likely be a new one either way because not all problems can be fixed incrementally in the current one). But I do think it’s sad if the current c api (which will be with us for a long time) can’t evolve because most core devs have given up on that. And I certainly understand now why they have.

iritkatriel · 2023-06-05T18:18:00Z

Reopening following discussion on the issue.

markshannon · 2023-06-06T10:07:20Z

I strongly prefer int PyIter_NextItem(PyObject *iter, PyObject **next) for reasons explained on the issue.

markshannon · 2023-06-06T10:41:26Z

@rhettinger @erlend-aasland
The reason this is a problem is that it makes the error handling in the VM and C API stateful. And that state causes a lot of problems.

The existence of this state means we need to check that the VM is in a valid state at C API boundaries.
This is slow when we do it correctly, e.g. when calling builtin functions, and dangerous when we don't, e.g. when calling slot wrappers.

Currently there are four possible states after calling an API function that may fail (which is all but a few):

Return a valid value, and don't set an exception
Return an invalid value, and set an exception
Return a valid value, and set an exception
Return an invalid value, and don't set an exception

The first two of those are correct, but even then the second case requires the caller to handle the exception, or it leaves the VM in an invalid state.
The third is always wrong; it is dangerous if not checked for, so we need to always check for it.
The fourth is wrong for some functions, but for functions like PyIter_Next it is a legitimate result that needs an additional check, which is another problem.
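The four states listed above can be written down as a small classification. This is only an illustrative sketch in plain C; `ToyCallState` and `toy_classify` are hypothetical names, not part of CPython.

```c
#include <assert.h>
#include <stdbool.h>

/* The four possible outcomes of a C API call that may fail. */
typedef enum {
    CALL_OK,              /* valid value, no exception: correct */
    CALL_FAILED,          /* invalid value, exception set: correct, caller must handle it */
    CALL_VALUE_WITH_EXC,  /* valid value, exception set: always wrong */
    CALL_INVALID_NO_EXC   /* invalid value, no exception: wrong for some functions,
                           * but means "exhausted" for functions like PyIter_Next */
} ToyCallState;

/* Classify an outcome from the returned value and the global error state. */
static ToyCallState
toy_classify(bool value_is_valid, bool exception_set)
{
    if (value_is_valid) {
        return exception_set ? CALL_VALUE_WITH_EXC : CALL_OK;
    }
    return exception_set ? CALL_FAILED : CALL_INVALID_NO_EXC;
}
```

Note that the classification needs two inputs: the return value and the exception state. That second input is the statefulness this comment is about.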

We have to constantly check that the exception is not set, because if it were and we call PyIter_Next on an empty iterator, and it returns NULL without setting an exception, we would think that it failed, not that it terminated.

If the return values of all C API functions (and C extension code) were unambiguous, then error handling would become (mostly) stateless, making the VM more robust and calling into C extensions potentially faster, as none of the resulting states can leave the VM in an invalid state.

Return a valid value, and don't set an exception. No problem
Return an invalid value, and set an exception
Return a valid value, and set an exception
Return an invalid value, and don't set an exception

The goal is that PyErr_Occurred() is only called to fetch the error after a function returns NULL or -1. That way it never needs to be cleared or monitored, saving a lot of code that would no longer need to worry about whether there was a "current" exception, as there would be no such thing. The "current" exception should only be meaningful immediately after a failed C API call.

Here's an example. Suppose we are iterating over two iterators at the same time (like in zip):

    PyObject *a = PyIter_Next(iter1);
    PyObject *b = PyIter_Next(iter2);

But we forget to check the error case of iter1.

    PyObject *a = PyIter_Next(iter1);
    if (a == NULL) {
        a_exhausted = true;
    }

but we do check iter2

    PyObject *b = PyIter_Next(iter2);
    if (b == NULL && PyErr_Occurred()) {
         /* Bug here
          * We think iter2 has failed, but iter2 may have
          * terminated and iter1 raised an exception */
    }
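The misattribution in the example above can be reproduced with a toy model in plain C. Everything here is a hypothetical stand-in (a global flag instead of the interpreter's current exception, `ToyIter` instead of a real iterator, and a status-only three-valued `toy_next_item`); it is meant as a sketch of the argument, not as CPython code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy global error indicator, standing in for the "current exception". */
static bool toy_error_set = false;

typedef struct {
    int remaining;     /* items left to produce */
    bool fail_at_end;  /* raise instead of terminating cleanly */
} ToyIter;

/* Legacy style: returns false for both exhaustion and error. */
static bool
toy_next_legacy(ToyIter *it)
{
    if (it->remaining > 0) {
        it->remaining--;
        return true;
    }
    if (it->fail_at_end) {
        toy_error_set = true;
    }
    return false;
}

/* Unambiguous style (assumed shape): 1 = item, 0 = exhausted, -1 = error. */
static int
toy_next_item(ToyIter *it)
{
    if (it->remaining > 0) {
        it->remaining--;
        return 1;
    }
    if (it->fail_at_end) {
        toy_error_set = true;
        return -1;
    }
    return 0;
}

/* Returns 2 if the caller concludes that iter2 failed, else 0. */
static int
blame_legacy(ToyIter *it1, ToyIter *it2)
{
    toy_error_set = false;
    (void)toy_next_legacy(it1);          /* error case not checked, as in the example */
    bool b = toy_next_legacy(it2);
    if (!b && toy_error_set) {
        return 2;                        /* may blame iter2 for iter1's error */
    }
    return 0;
}

/* Same mistake with iter1, but iter2's status comes from its own return value. */
static int
blame_unambiguous(ToyIter *it1, ToyIter *it2)
{
    toy_error_set = false;
    (void)toy_next_item(it1);            /* still unchecked */
    if (toy_next_item(it2) < 0) {
        return 2;
    }
    return 0;
}
```

When iter1 raises and iter2 merely terminates, `blame_legacy` reports a failure of iter2, while `blame_unambiguous` does not. The error from iter1 is still unhandled in both versions; the difference is only that it is no longer attributed to the wrong call.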

markshannon · 2023-06-23T12:26:33Z

There is an efficiency issue here, as well.
Switching to PyIter_NextItem adds quite a lot of overhead, because of the impedance mismatch with the underlying tp_iternext function pointer. We still need to do the additional check of PyErr_Occurred(), as tp_iternext returns NULL for either exhaustion or error.
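A toy model of that mismatch, in plain C and with hypothetical names throughout (`toy_error_set` stands in for the current exception, `toy_iternext_fn` for a tp_iternext-style slot): the wrapper offers an unambiguous interface, but because the slot underneath is ambiguous it still has to consult global state whenever the slot returns NULL. The return convention assumed here is the one from the documentation diff in this PR: 0 on success, with a NULL item meaning exhaustion, and -1 on error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy global error indicator, standing in for the "current exception". */
static bool toy_error_set = false;

typedef struct ToyIter ToyIter;

/* Legacy-style slot: returns NULL for both exhaustion and error. */
typedef const int *(*toy_iternext_fn)(ToyIter *);

struct ToyIter {
    toy_iternext_fn iternext;
    const int *items;
    size_t len;
    size_t pos;
    bool fail_at_end;  /* set the error indicator instead of ending cleanly */
};

/* A slot implementation for an array-backed toy iterator. */
static const int *
toy_array_iternext(ToyIter *it)
{
    if (it->pos < it->len) {
        return &it->items[it->pos++];
    }
    if (it->fail_at_end) {
        toy_error_set = true;
    }
    return NULL;
}

/* Wrapper: returns 0 with *item set (NULL when exhausted), -1 on error. */
static int
toy_next_item(ToyIter *it, const int **item)
{
    *item = it->iternext(it);
    if (*item == NULL && toy_error_set) {
        return -1;  /* the extra global-state check that the slot forces on us */
    }
    return 0;
}
```

If the slot itself reported exhaustion and error separately, the wrapper would reduce to a single indirect call with no look at the global indicator.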

If we change the underlying protocol to match that of PyIter_NextItem things become a lot more efficient.

int
PyIter_NextItem(PyObject *iter, PyObject **next)
{
    return (*Py_TYPE(iter)->tp_iternextitem)(iter, next);
}

Note that if tp_iternextitem raises StopIteration, then so does PyIter_NextItem, so implementations of tp_iternextitem will need to be aware of this. This is a non-issue for almost any iterators other than generators, or wrappers around __next__ methods.

pythongh-105201: Add PyIter_NextItem to replace PyIter_Next which has…

5e91524

… an ambiguous return value

bedevere-bot added the awaiting core review label Jun 1, 2023

iritkatriel requested a review from vstinner June 1, 2023 18:14

bedevere-bot mentioned this pull request Jun 1, 2023

PyIter_Next has ambiguous return value #105201

Closed

iritkatriel requested a review from encukou June 1, 2023 18:14

iritkatriel and others added 2 commits June 1, 2023 19:14

Merge branch 'main' into iter_nextitem

6987604

add new function to the stable ABI

7905125

JelleZijlstra reviewed Jun 1, 2023

Include/abstract.h

Modules/_testcapimodule.c

iritkatriel added 2 commits June 1, 2023 20:41

code review comments from Jella nd Erlend

cc29375

update doc

9eb3a79

iritkatriel commented Jun 1, 2023

Doc/c-api/iter.rst

typo

4dcb5d8

iritkatriel commented Jun 1, 2023

regen

5aaf37e

iritkatriel requested a review from a team as a code owner June 1, 2023 20:37

iritkatriel added 7 commits June 1, 2023 22:08

make PyIter_NextItem return 0 for exhausted iterator and 1 when it pr…

edc54c8

…ovides a new value

fix doc

8d93533

Revert "fix doc"

02b64d4

This reverts commit 8d93533.

Revert "make PyIter_NextItem return 0 for exhausted iterator and 1 wh…

1ee8618

…en it provides a new value" This reverts commit edc54c8.

flip signature around

028cef4

if(err)

2c14604

Revert "if(err)"

a6d58ab

This reverts commit 2c14604.

iritkatriel changed the title ~~gh-105201: Add PyIter_NextItem to replace PyIter_Next which has an ambiguous return value~~ gh-105201: Add PyIter_NextItem to replace PyIter_Next which has an ambiguous output Jun 3, 2023

iritkatriel closed this Jun 4, 2023

iritkatriel reopened this Jun 5, 2023

iritkatriel marked this pull request as draft June 23, 2023 13:01

bedevere-bot removed the awaiting core review label Jun 23, 2023

iritkatriel closed this Aug 2, 2023

erlend-aasland mentioned this pull request Jul 27, 2024

gh-105201: Add PyIter_NextItem() #122331

Merged
