Intermittent "unrecognized configuration name" failure on iOS and Android #118201

Closed
freakboy3742 opened this issue Apr 23, 2024 · 10 comments · Fixed by #126089
Labels
OS-android OS-ios type-bug An unexpected behavior, bug, or error

Comments

@freakboy3742
Contributor

freakboy3742 commented Apr 23, 2024

Bug report

Bug description:

The iOS buildbot is seeing an intermittent testing failure in `test_posix.PosixTester.test_confstr`:

======================================================================
ERROR: test_confstr (test.test_posix.PosixTester.test_confstr)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/buildbot/Library/Developer/XCTestDevices/4FED991E-7280-4D2F-B63A-C7FECDE66EAD/data/Containers/Bundle/Application/0809D9E4-34B6-4223-A3AD-079D76671D9D/iOSTestbed.app/python/lib/python3.13/test/test_posix.py", line 569, in test_confstr
    self.assertEqual(len(posix.confstr("CS_PATH")) > 0, True)
                         ~~~~~~~~~~~~~^^^^^^^^^^^
ValueError: unrecognized configuration name
----------------------------------------------------------------------

See this PR for the buildbot report; this build is the resulting failure.

The failure appears to be completely transient, affecting ~1 in 10 builds; the next buildbot run almost always passes, with no changes addressing this issue.

I've been unsuccessful reproducing the test locally.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Other

Linked PRs

@freakboy3742 freakboy3742 added the type-bug and OS-ios labels Apr 23, 2024
@freakboy3742 freakboy3742 self-assigned this Apr 23, 2024
@mhsmith
Member

mhsmith commented Apr 30, 2024

Until we have time to work out why this is happening, should we skip this test? It's posting iOS buildbot failures on GitHub about once a day, and we don't want to train everyone to ignore them.

@freakboy3742
Contributor Author

I've pushed up #118452 to skip the test on iOS (for now, at least).
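For readers unfamiliar with how such a skip looks, here is a minimal sketch. It is not the content of #118452; the class name is hypothetical, and the `sys.platform == "ios"` check is only one plausible way to express the condition.

```python
import os
import sys
import unittest


class ConfstrTests(unittest.TestCase):
    # Hypothetical stand-in for the real test; skipped only when running on iOS.
    @unittest.skipIf(sys.platform == "ios",
                     "intermittent failure, see gh-118201")
    def test_confstr(self):
        # Same check as the failing test: CS_PATH should be a non-empty string.
        self.assertGreater(len(os.confstr("CS_PATH")), 0)
```

Saved to a file, this can be run with `python -m unittest`; on any platform other than iOS the test executes normally.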

@freakboy3742 freakboy3742 changed the title: Intermittent test failure on iOS in test_posix.PosixTester.test_confstr → Intermittent test failure on iOS when accessing system config variables (May 1, 2024)
@freakboy3742
Contributor Author

Doing a full audit of the failures the buildbot has seen, it appears there's another cluster of tests that fail intermittently for a reason that appears similar to the test_posix test.

Out of 118 builds to date, we've seen:

  • 4 instances of:
======================================================================
ERROR: test_fpathconf (test.test_os.TestInvalidFD.test_fpathconf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/buildbot/Library/Developer/XCTestDevices/85DA13F3-B7FB-4094-B3E2-5936BFB3D984/data/Containers/Bundle/Application/B95FE44D-AA6C-427B-B3B5-68EB2D062669/iOSTestbed.app/python/lib/python3.13/test/test_os.py", line 2369, in test_fpathconf
    self.check(os.pathconf, "PC_NAME_MAX")
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/buildbot/Library/Developer/XCTestDevices/85DA13F3-B7FB-4094-B3E2-5936BFB3D984/data/Containers/Bundle/Application/B95FE44D-AA6C-427B-B3B5-68EB2D062669/iOSTestbed.app/python/lib/python3.13/test/test_os.py", line 2293, in check
    f(os_helper.make_bad_fd(), *args, **kwargs)
    ~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: unrecognized configuration name
  • 2 instances of:
======================================================================
ERROR: test_fd_count (test.test_support.TestSupport.test_fd_count)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/buildbot/Library/Developer/XCTestDevices/4A852A8A-CCCB-441C-8D72-E2EEA20B0EDD/data/Containers/Bundle/Application/9A737A54-B46D-42BE-AACA-F5780789C07E/iOSTestbed.app/python/lib/python3.13/test/test_support.py", line 558, in test_fd_count
    start = os_helper.fd_count()
            ~~~~~~~~~~~~~~~~~~^^
  File "/Users/buildbot/Library/Developer/XCTestDevices/4A852A8A-CCCB-441C-8D72-E2EEA20B0EDD/data/Containers/Bundle/Application/9A737A54-B46D-42BE-AACA-F5780789C07E/iOSTestbed.app/python/lib/python3.13/test/support/os_helper.py", line 634, in fd_count
    MAXFD = os.sysconf("SC_OPEN_MAX")
            ~~~~~~~~~~^^^^^^^^^^^^^^^
ValueError: unrecognized configuration name
  • 2 instances of:
======================================================================
ERROR: test_many_opens (test.test_zipfile.test_core.TestsWithMultipleOpens.test_many_opens)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/buildbot/Library/Developer/XCTestDevices/82B8EC8A-A9A7-4E96-9E26-0D22DE1C6647/data/Containers/Bundle/Application/2BF4ED76-6DB4-44F1-931E-A2CC4F6603B5/iOSTestbed.app/python/lib/python3.13/test/test_zipfile/test_core.py", line 2901, in test_many_opens
    startcount = fd_count()
                 ~~~~~~~~^^
  File "/Users/buildbot/Library/Developer/XCTestDevices/82B8EC8A-A9A7-4E96-9E26-0D22DE1C6647/data/Containers/Bundle/Application/2BF4ED76-6DB4-44F1-931E-A2CC4F6603B5/iOSTestbed.app/python/lib/python3.13/test/support/os_helper.py", line 634, in fd_count
    MAXFD = os.sysconf("SC_OPEN_MAX")
            ~~~~~~~~~~^^^^^^^^^^^^^^^
ValueError: unrecognized configuration name

The common thread here is the call in os_helper.fd_count() to os.sysconf(), which is raising ValueError rather than OSError. As with the test_posix failure, the issue appears to be that a POSIX-specific system configuration variable isn't (reliably) available at runtime.
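The ValueError/OSError distinction can be seen with a small sketch, assuming a POSIX platform where os.sysconf() exists. The helper name and the bogus configuration name are made up for illustration.

```python
import os


def describe_sysconf(name):
    """Report how os.sysconf() handles `name` (hypothetical helper)."""
    try:
        value = os.sysconf(name)
    except ValueError as exc:
        # The string was not found in Python's own name table.
        return f"ValueError: {exc}"
    except OSError as exc:
        # The name was recognized, but the underlying call reported an error.
        return f"OSError: {exc}"
    return f"ok: {value}"


print(describe_sysconf("SC_OPEN_MAX"))
print(describe_sysconf("SC_NO_SUCH_NAME_HYPOTHETICAL"))
```

A ValueError here means the name never resolved to a constant at all, which is what the buildbot tracebacks show for a name that should be defined.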

@hugovk
Member

hugovk commented Jun 15, 2024

Triage: the linked PRs are merged, is this ready to be closed or is there more to do?

@freakboy3742
Contributor Author

Yes - this is fully resolved.

freakboy3742 added a commit to freakboy3742/cpython that referenced this issue Aug 5, 2024
freakboy3742 added a commit to freakboy3742/cpython that referenced this issue Sep 6, 2024
freakboy3742 added a commit to freakboy3742/cpython that referenced this issue Sep 9, 2024
freakboy3742 added a commit to freakboy3742/cpython that referenced this issue Oct 9, 2024
@mhsmith
Member

mhsmith commented Oct 24, 2024

ERROR: test_fpathconf (test.test_os.TestInvalidFD.test_fpathconf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/buildbot/Library/Developer/XCTestDevices/85DA13F3-B7FB-4094-B3E2-5936BFB3D984/data/Containers/Bundle/Application/B95FE44D-AA6C-427B-B3B5-68EB2D062669/iOSTestbed.app/python/lib/python3.13/test/test_os.py", line 2369, in test_fpathconf
    self.check(os.pathconf, "PC_NAME_MAX")
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/buildbot/Library/Developer/XCTestDevices/85DA13F3-B7FB-4094-B3E2-5936BFB3D984/data/Containers/Bundle/Application/B95FE44D-AA6C-427B-B3B5-68EB2D062669/iOSTestbed.app/python/lib/python3.13/test/test_os.py", line 2293, in check
    f(os_helper.make_bad_fd(), *args, **kwargs)
    ~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: unrecognized configuration name

I've now seen this happen a couple of times on Android:

But this isn't an unreliable system call: it's a lookup in a static string-to-integer table which is created at compile time:

PyErr_SetString(PyExc_ValueError, "unrecognized configuration name");

cpython/Modules/posixmodule.c

Lines 13572 to 13574 in fed501d

#ifdef _PC_NAME_MAX
{"PC_NAME_MAX", _PC_NAME_MAX},
#endif

So this is probably a bug in Python, but I can't even begin to guess what it is, or why it would happen on both Android and iOS. I can't find any mention on GitHub of it happening on any other platform.
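As a rough Python model of the table described above (a sketch only, not the C code in posixmodule.c): entries exist only when the platform defines the matching macro, and a string that is not in the table produces the ValueError. The macro values and the `resolve_name` helper are hypothetical.

```python
# Hypothetical stand-in for the macros a platform's headers define.
PLATFORM_DEFINES = {"_PC_LINK_MAX": 0, "_PC_NAME_MAX": 3}

# Candidate (public name, macro) pairs; models the #ifdef-guarded entries.
CANDIDATES = (
    ("PC_LINK_MAX", "_PC_LINK_MAX"),
    ("PC_NAME_MAX", "_PC_NAME_MAX"),
    ("PC_SYNC_IO", "_PC_SYNC_IO"),  # not defined in this model
)

# Built once, like a table fixed at compile time.
TABLE = tuple(
    (name, PLATFORM_DEFINES[macro])
    for name, macro in CANDIDATES
    if macro in PLATFORM_DEFINES
)


def resolve_name(arg, table=TABLE):
    """Map a configuration name to its integer constant."""
    if isinstance(arg, int):
        return arg  # integers are passed through unchanged
    if not isinstance(arg, str):
        raise TypeError("configuration names must be strings or integers")
    for name, value in table:
        if name == arg:
            return value
    raise ValueError("unrecognized configuration name")


print(resolve_name("PC_NAME_MAX"))
```

In this model a name either is or is not in the table for the whole life of the program, which is why an intermittent failure is so surprising.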

@mhsmith
Member

mhsmith commented Oct 28, 2024

It's happened twice more on Android in the last day, but:

  • Only on the main branch, not 3.13
  • Only on ARM64, not x86_64

Timeline:

So this is probably a bug in Python, but I can't even begin to guess what it is, or why it would happen on both Android and iOS. I can't find any mention on GitHub of it happening on any other platform.

There are a few things that set iOS and Android apart from most other platforms:

It's possible that this combination makes it more likely to hit some undefined behavior or compiler bug which affects conv_confname or some other code in this area. Though it's unclear why that wouldn't affect macOS as well.

I'll reopen this issue until we've had a chance to investigate further.

@mhsmith mhsmith reopened this Oct 28, 2024
@mhsmith mhsmith changed the title: Intermittent test failure on iOS when accessing system config variables → Intermittent "unrecognized configuration name" failure on iOS and Android (Oct 28, 2024)
@mhsmith
Member

mhsmith commented Oct 28, 2024

In all the failures so far, it's failed on the automatic rerun, so it's probably something to do with the way the module initializes itself in setup_confname_table, which will persist through the life of that process. I don't see anything wrong with that function, but I notice that it's the only place in CPython's production code that calls qsort, although it's also used in a couple of places in the ctypes tests.
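To illustrate why a problem at initialization would persist for the whole process, here is a sketch that assumes the table is sorted once and then searched with a binary search. The thread does not state the search strategy, so that part is an assumption, and the entries and values are made up.

```python
from bisect import bisect_left


def lookup(table, name):
    """Binary-search a (name, value) table that is assumed sorted by name."""
    names = [entry[0] for entry in table]
    i = bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return table[i][1]
    raise ValueError("unrecognized configuration name")


# Hypothetical entries; the integer values are placeholders.
entries = [
    ("PC_PATH_MAX", 5),
    ("PC_NAME_MAX", 4),
    ("PC_LINK_MAX", 1),
    ("PC_PIPE_BUF", 6),
    ("PC_MAX_INPUT", 3),
]

good = sorted(entries)  # one-time sort at "module init"

# Simulate a sort that went wrong by leaving two entries out of place.
broken = list(good)
broken[0], broken[2] = broken[2], broken[0]

print(lookup(good, "PC_NAME_MAX"))    # found in the correctly sorted table
print(lookup(broken, "PC_PATH_MAX"))  # still found despite the bad order
try:
    lookup(broken, "PC_NAME_MAX")
except ValueError as exc:
    print("broken table:", exc)
```

With a mis-sorted table some names still resolve while others fail on every call until the process exits, which would match a failure that repeats on the in-process rerun but not on a fresh build.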

I reran the most recent failure on the same commit, and it passed.

I also couldn't reproduce the failure locally, despite running the failing test repeatedly, in a new process each time.

@JelleZijlstra
Member

I looked at the relevant code for a bit and couldn't find anything obviously wrong either.

One thing that seemed a little suspicious was the size of the array members, which are struct constdef objects. They contain a pointer and an int, so 12 bytes total. That means that sorting the arrays may make some writes to a 4-byte aligned address. Maybe iOS and Android don't like that.
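The size and alignment of such a struct can be checked locally with ctypes. This is a sketch using a hypothetical mirror of a struct with a name pointer and an int value; results depend on the ABI of the machine running it. On common 64-bit ABIs the field sizes sum to 12 bytes, but the compiler typically pads the struct to 16 so that array elements stay pointer-aligned.

```python
import ctypes


class ConstDef(ctypes.Structure):
    # Hypothetical mirror of a struct holding a name pointer and an int value.
    _fields_ = [
        ("name", ctypes.c_char_p),
        ("value", ctypes.c_int),
    ]


# Sum of the raw field sizes, ignoring any padding.
field_bytes = ctypes.sizeof(ctypes.c_char_p) + ctypes.sizeof(ctypes.c_int)

print("sum of field sizes:", field_bytes)
print("sizeof(struct):    ", ctypes.sizeof(ConstDef))
print("alignment:         ", ctypes.alignment(ConstDef))
```

If the reported struct size is a multiple of the pointer alignment, element-wise swaps during sorting stay aligned on that ABI; this says nothing about what a particular qsort implementation does internally.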

The two config names that have caused issues (PC_NAME_MAX and SC_OPEN_MAX) both are sort of in the middle of their tables, though I'm not sure how many of these configs are defined on Android and iOS.

@mhsmith
Member

mhsmith commented Nov 4, 2024

Thanks, it's plausible that this could be related to alignment, since it's only happened on ARM64, not x86_64. Though I don't understand why repeating the buildbot run on the same commit doesn't repeat the failure. There shouldn't be any non-determinism in the build, and non-determinism at runtime such as ASLR shouldn't change alignment on such a small scale.

freakboy3742 added a commit to freakboy3742/cpython that referenced this issue Dec 13, 2024
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
mhsmith added a commit to mhsmith/cpython that referenced this issue Mar 17, 2025
freakboy3742 pushed a commit that referenced this issue Mar 18, 2025
gh-118201: Simplify conv_confname (#126089)

(cherry picked from commit c5c9286)