Issues with UCX #9654

Closed
dalcinl opened this issue Nov 11, 2021 · 4 comments
@dalcinl
Contributor

dalcinl commented Nov 11, 2021

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

master

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

./configure --with-ofi --with-ucx --with-pmix=internal --enable-debug --enable-mem-debug

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

 9984334f91eadd17ebf3618b95d74fbca9c708fc 3rd-party/openpmix (v1.1.3-3208-g9984334f)
 b2d1226461f9be2689779a2f3b8503987559f69e 3rd-party/prrte (psrvr-v2.0.0rc1-4051-gb2d1226461)

Please describe the system on which you are running

  • Operating system/version: Fedora 34
  • Computer hardware: x86_64
  • Network type: isolated

Details of the problem

A fresh run of the mpi4py test suite prints the following malloc zero warnings and then aborts after a failed assertion:

...
malloc debug: Request for 0 bytes (osc_ucx_comm.c, 661)
malloc debug: Request for 0 bytes (osc_ucx_comm.c, 972)
malloc debug: Request for 0 bytes (osc_ucx_active_target.c, 228)
malloc debug: Request for 0 bytes (osc_ucx_active_target.c, 104)
...
osc_ucx_component.c:610: ompi_osc_ucx_win_attach: Assertion `insert_index >= 0 && (uint64_t)insert_index < module->state.dynamic_win_count' failed.
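
For triage, the zero-byte malloc warnings above can be grouped by call site. This is a minimal sketch that assumes the log format shown above; `zero_malloc_sites` is a hypothetical helper and not part of Open MPI or mpi4py:

```python
import re
from collections import Counter

# Matches lines such as:
#   malloc debug: Request for 0 bytes (osc_ucx_comm.c, 661)
ZERO_MALLOC = re.compile(
    r"malloc debug: Request for 0 bytes \((?P<file>[^,]+), (?P<line>\d+)\)"
)


def zero_malloc_sites(log_text):
    """Count zero-byte malloc warnings per (file, line) call site."""
    sites = Counter()
    for match in ZERO_MALLOC.finditer(log_text):
        sites[(match.group("file"), int(match.group("line")))] += 1
    return sites


# Sample input copied from the warnings quoted in this report.
sample = """\
malloc debug: Request for 0 bytes (osc_ucx_comm.c, 661)
malloc debug: Request for 0 bytes (osc_ucx_comm.c, 972)
malloc debug: Request for 0 bytes (osc_ucx_active_target.c, 228)
malloc debug: Request for 0 bytes (osc_ucx_active_target.c, 104)
"""

for (path, line), count in sorted(zero_malloc_sites(sample).items()):
    print(f"{path}:{line} x{count}")
```

All four call sites are in the osc/ucx component, which is consistent with the failing assertion also being in that component.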

Running with OMPI_MCA_osc=sm, the assertion abort goes away, and my OSC tests complete successfully.
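
The OMPI_MCA_osc=sm workaround can be applied from a test driver by setting the variable in the child environment. This is a minimal sketch; `env_with_osc` is a hypothetical helper name, and the launch command is deliberately left out because it depends on the local setup:

```python
import os


def env_with_osc(component, base_env=None):
    """Return a copy of the environment with OMPI_MCA_osc set to `component`.

    The caller's environment is not modified; the returned dict is meant to
    be passed as `env=` when launching the MPI job.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_osc"] = component
    return env


# Select the "sm" one-sided component, as in the workaround described above.
env = env_with_osc("sm", base_env={"PATH": "/usr/bin"})
print(env["OMPI_MCA_osc"])
```

The resulting dictionary can be handed to `subprocess.run(..., env=env)` together with whatever command normally launches the tests.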

@devreal
Contributor

devreal commented Nov 11, 2021

@janjust this seems to come from the dynamic windows fixes?

@janjust
Contributor

janjust commented Nov 12, 2021

@devreal Thank you, I'll take a look at this asap. I also discovered other issues in post/wait.

@janjust
Contributor

janjust commented Mar 16, 2022

Fixed with #10126

@janjust janjust closed this as completed Mar 29, 2022
@gpaulsen
Member

Fixed in v5.0.x with #10138
