
Retire the FAQ section of the docs, moving its content to other locations #11486


Closed
wants to merge 54 commits into from

Conversation

qkoziol
Contributor

@qkoziol qkoziol commented Mar 13, 2023

Log of changes for moving FAQ questions to other locations

Non-FAQ section changes:

  • Corrected MPI sub-version rendering bug in section 3.5.1 (MPI Standard
    Compliance), by creating and using new 'mpi_standard_full_version'
    substitution in conf.py
  • Added strikethru_start and strikethru_end substitutions in conf.py
  • Broke out updating/upgrading an Open MPI installation from within section
    4.11.2 (Installing over a prior Open MPI installation) into a new section
    4.12 (Updating or Upgrading an Open MPI installation)
  • s/ackwards/ackward/g in section 7 (Version numbers and compatibility)
  • Added link to section 8.3 (Setting MCA parameter values) to section 3.4
    (General Run-Time Support Notes)
  • Added new section 10.4 (Scheduling processes across hosts)
  • Added new section 10.11 (Unusual jobs) to section 10 (Launching MPI
    applications)
  • Added new section 10.12 (Troubleshooting) to section 10 (Launching MPI
    applications)
  • Changed title of section 11 from (Run-time tuning MPI applications) to
    (Run-time operation and tuning MPI applications)
  • Added new subsection 11.4 (Fault tolerance) to section 11 (Run-time
    operation and tuning MPI applications)
  • Added new subsection 11.5 (Large clusters) to section 11 (Run-time
    operation and tuning MPI applications)
  • Added new subsection 11.6 (Processor and memory affinity) to section 11
    (Run-time operation and tuning MPI applications)

FAQ section changes:

Supported Systems: (13.1)

  • Moved 13.1.1 (What operating systems does Open MPI support?), 13.1.2 (What
    hardware platforms does Open MPI support?), and 13.1.3 (What network
    interconnects does Open MPI support?) into Section 4 (Building and
    Installing Open MPI) as new section 4.2 (Supported Systems), between
    previous sections 4.1 and 4.2.
  • Moved 13.1.4 (How does Open MPI interface to back-end run-time systems?) to
    the top of section 10.3 (The role of PMIx and PRRTE).
  • Moved 13.1.5 (What run-time environments does Open MPI support?) to the top
    of section 3.2 (Platform Notes)
  • Deleted 13.1.6 (How much MPI does Open MPI support?), as it duplicates
    information in section 3.5.1 (MPI Standard Compliance)
  • Moved 13.1.7 (Is Open MPI thread safe?) to section 9 (Building MPI
    Applications) as new section 9.7.
  • Moved 13.1.8 (Does Open MPI support 64 bit environments?) to section 9
    (Building MPI Applications) as new section 9.8.
  • Moved 13.1.9 (Does Open MPI support execution in heterogeneous environments?)
    to section 9 (Building MPI Applications) as new section 9.9.
  • Moved 13.1.10 (Does Open MPI support parallel debuggers?) to section 12
    (Debugging Open MPI Parallel Applications) as new section 12.4, between
    previous sections 12.3 and 12.4.

System administrator-level technical information: (13.2)

  • Moved 13.2.1 (I’m a sysadmin; what do I care about Open MPI?) to section
    4 (Building and installing Open MPI) as new section 4.14 (Advice for
    System Administrators)
  • Moved 13.2.2 (Do I need multiple Open MPI installations?) to the end of
    section 4.11 (Installation Location), as new subsection 4.11.4 (Installing
    Multiple Copies of Open MPI).
  • Moved 13.2.3 (What are MCA Parameters? Why would I set them?) to section
    4 (Building and installing Open MPI) as 4.14.1 (Setting Global MCA
    Parameters), within new section 4.14 (Advice for System Administrators)
  • Moved 13.2.4 (Do my users need to have their own installation of Open MPI?)
    to section 4 (Building and installing Open MPI) as 4.14.5 (User
    customization of a global Open MPI installation), within section 4.14
    (Advice for System Administrators)
  • Deleted 13.2.5 (I have power users who will want to override my global MCA
    parameters; is this possible?), as the information is already incorporated
    into new section 4.14.5 (User customization of a global Open MPI
    installation).
  • Moved 13.2.6 (What MCA parameters should I, the system administrator, set?)
    to section 4 (Building and installing Open MPI) as 4.14.2 (Setting MCA
    Parameters for a Global Open MPI installation), within section 4.14
    (Advice for System Administrators)
  • Moved 13.2.7 (I just added a new plugin to my Open MPI installation; do I
    need to recompile all my MPI apps?) to section 4 (Building and installing
    Open MPI) as 4.14.3 (Adding a new plugin to a global Open MPI installation),
    within section 4.14 (Advice for System Administrators)
  • Moved 13.2.8 (I just upgraded my InfiniBand network; do I need to recompile
    all my MPI apps?) to section 4 (Building and installing Open MPI) as 4.14.4
    (Upgrading network hardware with a global Open MPI installation), within
    section 4.14 (Advice for System Administrators)
  • Moved 13.2.9 (We just upgraded our version of Open MPI; do I need to
    recompile all my MPI apps?) into new section 4.12 (Updating or Upgrading an
    Open MPI installation)
  • Moved 13.2.10 (I have an MPI application compiled for another MPI; will it
    work with Open MPI?) to be a warning at the top page of section 9 (Building
    MPI applications)

Building Open MPI: (13.3)

  • Moved 13.3.1 (How do I statically link to the libraries of Intel compiler
    suite?) to section 4 (Building and installing Open MPI) as new section
    4.6.1 (Statically linking to the libraries of Intel compiler suite), within
    section 4.6 (Specifying Compilers and flags)
  • Moved 13.3.2 (Why do I get errors about hwloc or libevent not found?) to
    section 4 (Building and installing Open MPI) as new section 4.7.5
    (Difficulties with C and Fortran), within section 4.7 (Required support
    libraries)

Running MPI Applications: (13.4)

  • Moved / integrated content from 13.4.1 (What prerequisites are necessary for
    running an Open MPI job?) into section 10.2 (Prerequisites)
  • Moved 13.4.2 (What ABI guarantees does Open MPI provide?) into section 7
    (Version numbers and backward compatibility) as new section 7.2
    (Application Binary Interface (ABI) Compatibility)
  • Moved / integrated content from 13.4.3 (Do I need a common filesystem on
    all my nodes?) into the first few paragraphs of section 4.11 (Installation
    location)
  • Moved 13.4.4 (How do I add Open MPI to my PATH and LD_LIBRARY_PATH?) into
    section 10.2 (Prerequisites) as new section 10.2.1 (Adding Open MPI to
    PATH and LD_LIBRARY_PATH)
  • Moved 13.4.5 (What if I can’t modify my PATH and/or LD_LIBRARY_PATH?) into
    section 10.2 (Prerequisites) as new section 10.2.2 (Using the --prefix
    option with mpirun)
  • Integrated 13.4.6 (How do I launch Open MPI parallel jobs?) into
    the first few paragraphs of section 10 (Launching MPI applications)
  • Integrated 13.4.7 (How do I run a simple SPMD MPI job?) into sections 10.1.2
    (Launching on a single host) and 10.1.3 (Launching in a non-scheduled
    environment (via ssh))
  • Moved 13.4.8 (How do I run an MPMD MPI job?) to new subsection 10.11.4
    (Launching an MPMD MPI job) in section 10.11 (Unusual jobs)
  • Moved 13.4.9 (How do I specify the hosts on which my MPI job runs?) to new
    subsection 10.6.1 (Specifying the hosts for an MPI job), in section 10.6
    (Launching with SSH)
  • Moved 13.4.10 (How can I diagnose problems when running across multiple
    hosts?) to new subsection 10.12.3 (Problems when running across
    multiple hosts) in section 10.12 (Troubleshooting)
  • Moved 13.4.11 (I get errors about missing libraries. What should I do?) to new
    subsection 10.12.2 (Errors about missing libraries) in section 10.12
    (Troubleshooting)
  • Moved 13.4.12 (Can I run non-MPI programs with mpirun / mpiexec?) to new
    subsection 10.11.1 (Running non-MPI programs) in section 10.11 (Unusual
    jobs)
  • Moved 13.4.13 (Can I run GUI applications with Open MPI?) to new
    subsection 10.11.2 (Running GUI applications) in section 10.11 (Unusual
    jobs)
  • Moved 13.4.14 (Can I run ncurses-based / curses-based / applications with
    funky input schemes with Open MPI?) to new subsection 10.11.3 (Running
    curses-based applications) in section 10.11 (Unusual jobs)
  • Moved 13.4.15 (What other options are available to mpirun?) to new subsection
    10.1.1.1 (Other mpirun options) in section 10.1 (Quick start: Launching MPI
    applications)
  • Moved 13.4.16 (How do I use the --hostfile option to mpirun?) to new
    subsection 10.4.2 (Scheduling with the --hostfile option) in section 10.4
    (Scheduling processes across hosts)
  • Moved 13.4.17 (How do I use the --host option to mpirun?) to new
    subsection 10.4.3 (Scheduling with the --host option) in section 10.4
    (Scheduling processes across hosts)
  • Moved 13.4.18 (What are “slots”?) to new subsection 10.4.4 (Process slots)
    in section 10.4 (Scheduling processes across hosts)
  • Moved 13.4.19 (How are the number of slots calculated?) to new subsection
    10.4.4.1 (Calculating the number of slots) in section 10.4 (Scheduling
    processes across hosts)
  • Moved 13.4.20 (How do I control how my processes are scheduled across hosts?)
    to new subsection 10.4.1 (Scheduling overview) in section 10.4 (Scheduling
    processes across hosts)
  • Moved 13.4.21 (Can I oversubscribe nodes (run more processes than
    processors)?) to new subsection 10.4.5 (Oversubscribing nodes) in section
    10.4 (Scheduling processes across hosts)
  • Moved 13.4.22 (Can I force Aggressive or Degraded performance modes?) to
    new subsection 10.4.5.1 (Forcing aggressive or degraded performance mode)
    in section 10.4 (Scheduling processes across hosts)
  • Moved 13.4.23 (How do I run with the TotalView parallel debugger?) to new
    section 12.4 (Using Parallel Debuggers to Debug Open MPI Applications)
  • Moved 13.4.24 (How do I run with the DDT parallel debugger?) to new
    section 12.4 (Using Parallel Debuggers to Debug Open MPI Applications)
  • Moved 13.4.25 (How do I dynamically load libmpi at runtime?) to new
    subsection 9.10 (Dynamically loading libmpi at runtime) in section 9
    (Building MPI applications); see the sketch after this list
  • Moved 13.4.26 (What MPI environment variables exist?) to new subsection
    11.1 (Environment variables set for MPI applications) in section 11
    (Run-time operation and tuning MPI applications)
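
As background for the libmpi dynamic-loading entry above (13.4.25), here is a
minimal C sketch of one way to do it: opening libmpi with `RTLD_GLOBAL` so that
Open MPI's dynamically opened components can resolve MPI symbols. The library
name/path and the by-hand symbol lookups are illustrative assumptions, not the
docs' exact example.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    /* RTLD_GLOBAL is the important part: Open MPI dlopen()s its own
     * components, and they must be able to resolve symbols from libmpi.
     * The library name is platform-dependent (e.g., libmpi.so.40). */
    void *handle = dlopen("libmpi.so", RTLD_NOW | RTLD_GLOBAL);
    if (NULL == handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return EXIT_FAILURE;
    }

    /* Resolve and call MPI_Init / MPI_Finalize through the handle. */
    int (*init)(int *, char ***) =
        (int (*)(int *, char ***)) dlsym(handle, "MPI_Init");
    int (*finalize)(void) = (int (*)(void)) dlsym(handle, "MPI_Finalize");
    if (NULL == init || NULL == finalize) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return EXIT_FAILURE;
    }

    init(&argc, &argv);
    /* ... MPI work via further dlsym() lookups ... */
    finalize();

    dlclose(handle);
    return EXIT_SUCCESS;
}
```

(Compile with `-ldl` on Linux; whether the program itself is launched via
mpirun depends on the use case.)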

Fault Tolerance: (13.5)

  • Moved 13.5.1 (What is “fault tolerance”?) to the opening of new subsection
    11.4 (Fault tolerance) in section 11 (Run-time operation and tuning MPI
    applications)
  • Moved 13.5.2 (What fault tolerance techniques has / does / will Open MPI
    support?) to new subsection 11.4.1 (Supported fault tolerance techniques)
    in section 11.4 (Fault Tolerance)
  • Moved 13.5.3 (Does Open MPI support checkpoint and restart of parallel
    jobs (similar to LAM/MPI)?) to new subsection 11.4.2 (Checkpoint and
    restart of parallel jobs) in section 11.4 (Fault Tolerance)
  • Moved 13.5.4 (Where can I find the fault tolerance development work?)
    to new subsection 11.4.1.1 (Current fault tolerance development) in
    section 11.4 (Fault tolerance)
  • Moved 13.5.5 (Does Open MPI support end-to-end data reliability in MPI
    message passing?) to new subsection 11.4.3 (End-to-end data reliability
    for MPI messages) in section 11.4 (Fault Tolerance)

Troubleshooting: (13.6)

  • Moved 13.6.1 (Messages about missing symbols) to new subsection 10.12.1
    (Messages about missing symbols when running my application) in section
    10.12 (Troubleshooting)
  • Deleted 13.6.2 (How do I attach a parallel debugger to my MPI job?), as
    it's covered in section 12 (Debugging Open MPI Parallel Applications)
  • Moved 13.6.3 (How do I find out what MCA parameters are being seen/used by my
    job?) into Section 8 (The Modular Component Architecture), as new
    section 8.3, between previous sections 8.2 and 8.3.

Large Clusters: (13.7)

  • Moved 13.7.1 (How do I reduce startup time for jobs on large clusters?) to
    new subsection 11.5.1 (Reducing startup time for jobs on large clusters)
    in section 11.5 (Large clusters)
  • Moved 13.7.2 (Where should I put my libraries: Network vs. local filesystems?)
    to new subsection 11.5.2 (Library location: network vs. local filesystems)
    in section 11.5 (Large clusters)
  • Moved 13.7.3 (Static vs. shared libraries?) to new subsection 11.5.2.1
    (Static vs. shared libraries) in section 11.5 (Large clusters)
  • Moved 13.7.4 (How do I reduce the time to wireup OMPI’s out-of-band
    communication system?) to new subsection 11.5.3 (Reducing wireup time) in
    section 11.5 (Large clusters)
  • Moved 13.7.5 (I know my cluster’s configuration - how can I take advantage of
    that knowledge?) to new subsection 11.5.4 (Static cluster configurations)
    in section 11.5 (Large clusters)

General Tuning: (13.8)

  • Moved 13.8.1 (How do I install my own components into an Open MPI
    installation?) to new subsection 11.2 (Installing custom components)
    in section 11 (Run-time operation and tuning MPI applications)
  • Moved 13.8.2 (What is processor affinity? Does Open MPI support it?) to
    new subsection 11.6.1 (Processor affinity) in section 11.6 (Processor and
    memory affinity)
  • Moved 13.8.3 (What is memory affinity? Does Open MPI support it?) to
    new subsection 11.6.2 (Memory affinity) in section 11.6 (Processor and
    memory affinity)
  • Moved 13.8.4 (How do I tell Open MPI to use processor and/or memory
    affinity?) to new subsection 11.6.3 (Enabling processor and memory
    affinity) in section 11.6 (Processor and memory affinity)
  • Moved 13.8.5 (Does Open MPI support calling fork(), system(), or popen() in
    MPI processes?) to new subsection 9.11 (Calling fork(), system(), or popen()
    in MPI processes?) in section 9 (Building MPI applications)
  • Moved 13.8.6 (I want to run some performance benchmarks with Open MPI. How do
    I do that?) to new subsection 11.8 (Benchmarking Open MPI applications)
    in section 11 (Run-time operation and tuning MPI applications)
  • Deleted 13.8.7 (I am getting a MPI_WIN_FREE error from IMB-EXT — what do I
    do?), as it's about a buggy version of the Intel MPI benchmarks from 2009.

Signed-off-by: Quincey Koziol [email protected]

@github-actions

Hello! The Git Commit Checker CI bot found a few problems with this PR:

9ea2ada: Retire the FAQ section of the docs, moving its con...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@qkoziol
Contributor Author

qkoziol commented Mar 13, 2023

This is the whole set of changes needed to retire the FAQ, comments welcome.

@github-actions

Hello! The Git Commit Checker CI bot found a few problems with this PR:

9ea2ada: Retire the FAQ section of the docs, moving its con...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@qkoziol
Contributor Author

qkoziol commented Mar 13, 2023

Force pushed new commit, with signed-off-by

Contributor

@wckzhang wckzhang left a comment

Review for section - System administrator-level technical information: (13.2)

@wckzhang
Contributor

I'm not going to ask for any resolution of the TODO's I see here, I think that can come as a separate PR and we should probably grep for TODO in the docs

@wckzhang
Contributor

wckzhang commented Mar 13, 2023

Moved / integrated content from 13.4.3 (What ABI guarantees does Open MPI
provide?) into the first few paragraphs of section 4.11 (Installation
location)

This section name is wrong, not related to ABI guarantees

Moved 13.3.2 (Why do I get errors about hwloc or libevent not found?) to
section 4 (Building and installing Open MPI) as new section 4.7.5
(Difficulties with C and Fortran), within section 4.6 (Required support
libraries)

4.6 -> 4.7

Moved 13.8.2 (What is processor affinity? Does Open MPI support it?) to
to new subsection 11.5.1 (Processor affinity) in section 11.5 (Processor and
memory affinity)
Moved 13.8.3 (What is memory affinity? Does Open MPI support it?) to
to new subsection 11.5.2 (Memory affinity) in section 11.5 (Processor and
memory affinity)

This moved to 11.6.1/2 in section 11.6

@wckzhang
Contributor

What is that merge commit?

@qkoziol
Contributor Author

qkoziol commented Mar 23, 2023

What is that merge commit?

Just updating against the mainline changes.

@github-actions

Hello! The Git Commit Checker CI bot found a few problems with this PR:

9ea2ada: Retire the FAQ section of the docs, moving its con...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

1 similar comment

@qkoziol
Contributor Author

qkoziol commented Mar 23, 2023

I'm not going to ask for any resolution of the TODO's I see here, I think that can come as a separate PR and we should probably grep for TODO in the docs

Yes, that's where I was also.

@github-actions

Hello! The Git Commit Checker CI bot found a few problems with this PR:

9ea2ada: Retire the FAQ section of the docs, moving its con...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

3 similar comments

@qkoziol
Contributor Author

qkoziol commented Mar 23, 2023

Moved / integrated content from 13.4.3 (What ABI guarantees does Open MPI
provide?) into the first few paragraphs of section 4.11 (Installation
location)

This section name is wrong, not related to ABI guarantees

Moved 13.3.2 (Why do I get errors about hwloc or libevent not found?) to
section 4 (Building and installing Open MPI) as new section 4.7.5
(Difficulties with C and Fortran), within section 4.6 (Required support
libraries)

4.6 -> 4.7

Moved 13.8.2 (What is processor affinity? Does Open MPI support it?) to
to new subsection 11.5.1 (Processor affinity) in section 11.5 (Processor and
memory affinity)
Moved 13.8.3 (What is memory affinity? Does Open MPI support it?) to
to new subsection 11.5.2 (Memory affinity) in section 11.5 (Processor and
memory affinity)

This moved to 11.6.1/2 in section 11.6

Fixed the PR description, thanks

@github-actions

Hello! The Git Commit Checker CI bot found a few problems with this PR:

9ea2ada: Retire the FAQ section of the docs, moving its con...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

1 similar comment

abouteiller and others added 5 commits March 23, 2023 17:46
Add error management to ack_failed_internal

Use comm->num_acked to decide if any_source should be re-enabled

More compatibility with mixed use of failure_ack and ack_failed

Additional shortcut: bypass computing num_acked when no new faults

Recompute num_acked when the v1 interface was used in between

Signed-off-by: Aurelien Bouteiller <[email protected]>

Copyrights

The group needs to be initialized before the first goto.

Signed-off-by: George Bosilca <[email protected]>

Add #defines for the new API.

Signed-off-by: George Bosilca <[email protected]>
ulfm/ishrink: Some Fortran bindings had errors, some were missing,
some needed to use PMPI f2c conversions from within the Fortran bindings

ulfm/ishrink: _f functions must not be static

Signed-off-by: Aurelien Bouteiller <[email protected]>
abouteiller and others added 28 commits March 23, 2023 17:46
This patch is to address:
    #11448

When Open MPI is compiled with CUDA support,
comm->c_coll->coll_xxx_module is coll_cuda_module and
HAN_LOAD_FALLBACK_COLLECTIVE is a no-op.

As a result, HAN's collective functions can be called
even if HAN has been disabled, which resulted in an
infinite recursive calling loop.

To address this issue, this patch makes HAN's collective
functions call the fallback function when the HAN module is
disabled.

Signed-off-by: Wei Zhang <[email protected]>
Instead of calling the communicator collective module after
disqualifying HAN, call the fallback collective directly. This avoids
circular dependencies between modules in some cases. However, while this
solution works, it delivers suboptimal performance, as the withdrawn
module remains in the call chain and behaves as a passthrough.

Signed-off-by: George Bosilca <[email protected]>
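
A hypothetical, heavily simplified sketch of the dispatch pattern the two
commit messages above describe: calling a saved fallback function directly
instead of re-dispatching through the communicator's current collective
module. All names here are invented for illustration and do not match the
real coll/han code.

```c
#include <stddef.h>

/* Invented names for illustration only; not Open MPI's actual coll/han API. */
typedef int (*allreduce_fn_t)(const void *sbuf, void *rbuf, size_t count,
                              void *comm, void *module);

struct han_like_module {
    int            enabled;             /* cleared when the module disqualifies itself */
    allreduce_fn_t fallback_allreduce;  /* previous module's function, saved at init */
    void          *fallback_module;
};

static int han_like_allreduce(const void *sbuf, void *rbuf, size_t count,
                              void *comm, struct han_like_module *m)
{
    if (!m->enabled) {
        /* Call the saved fallback directly.  Re-dispatching through the
         * communicator's *current* collective module could hand control to a
         * wrapper (e.g. a CUDA-aware shim) that calls right back into this
         * function, producing infinite recursion. */
        return m->fallback_allreduce(sbuf, rbuf, count, comm,
                                     m->fallback_module);
    }

    /* ... the module's own hierarchical algorithm would go here ... */
    return 0;
}
```
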
Silence the warnings generated when compiling the rocm component
that stem from the hip_runtime_api.h header file. There is nothing
we can do about them here, and they just make it harder to identify
actual issues in the component code itself.

Signed-off-by: Edgar Gabriel <[email protected]>
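
For context, one common way to keep warnings that originate in a third-party
header from drowning out real issues is to suppress them only around the
offending include. This is just an illustration of that technique (the warning
name is a placeholder), not necessarily what this commit does; it may instead
adjust compiler flags in the build system.

```c
/* Suppress diagnostics only for the vendor header, not for our own code.
 * "-Wunused-parameter" is a placeholder; substitute the warnings actually
 * emitted by the header. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"
#include <hip/hip_runtime_api.h>
#pragma GCC diagnostic pop
```
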
Includes fix for launch at scale

Signed-off-by: Ralph Castain <[email protected]>
…request

Using atomic swap operations we make sure that a thread completing
a request will atomically mark it for the thread registering a callback.
Similarly, the thread registering a callback will register it atomically
and check for whether the request has completed.


Signed-off-by: Joseph Schuchart <[email protected]>
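
The idea in the commit message above can be illustrated with a small,
self-contained C11 sketch (names and layout are invented, not Open MPI's
ompi_request_t): a single atomic slot is swapped by both the completing thread
and the callback-registering thread, and whichever thread arrives second runs
the callback exactly once.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

typedef void (*cb_fn_t)(void *ctx);

/* Sentinel stored in the slot once the request has completed. */
#define REQ_COMPLETED ((void *) (intptr_t) 1)

typedef struct {
    _Atomic(void *) slot;   /* NULL, a callback pointer, or REQ_COMPLETED */
    void           *ctx;
} request_t;

/* Called once, by the thread that completes the request. */
static void request_complete(request_t *req)
{
    void *prev = atomic_exchange(&req->slot, REQ_COMPLETED);
    if (prev != NULL) {
        /* A callback was registered before completion: run it here.
         * (Function-pointer/void* conversion is a POSIX-ism, fine for
         * this sketch.) */
        ((cb_fn_t) prev)(req->ctx);
    }
}

/* Called once, by the thread that registers the completion callback. */
static void request_set_callback(request_t *req, cb_fn_t fn, void *ctx)
{
    req->ctx = ctx;
    void *prev = atomic_exchange(&req->slot, (void *) fn);
    if (prev == REQ_COMPLETED) {
        /* The request completed first (or concurrently): run the callback
         * ourselves, since request_complete() saw an empty slot. */
        fn(ctx);
    }
}
```
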
…eting a request"

This reverts commit b6467d0. The change
breaks some persistent op implementations, which do not reinitialize the
request object.
Similar to
open-mpi/ompi-scripts@7dc912e,
add "--bind-to none" to the mpirun command.

The CI tests on the AWS Jenkins instance all run on instances with 1
core.  There's a special case in OMPI to bind to core for a 2-process
run for benchmarking reasons, and this was causing failures in mapping
because there aren't 2 cores on the instance.  So instead turn off
binding for these small tests.

Signed-off-by: Jeff Squyres <[email protected]>
The new Jenkins CI appears to be ready.  This commit effectively
reverts 6406e00, and re-enables all the Jenkinsfile-based tests.

Signed-off-by: Jeff Squyres <[email protected]>
The way that the color string was initialized caused
problems, as the pointer to opal_var_dump_color_string
became (nil) after re-entering this path after
performing mca_base_var_register.

Signed-off-by: William Zhang <[email protected]>
Setting the default right before registration is the well
known behavior and prevents issues as the pointer can be
set to NULL during param deregistration.

Signed-off-by: William Zhang <wilzhang.amazon.com>
Setting the default right before registration is the well
known behavior and prevents issues as the pointer can be
set to NULL during param deregistration.

Signed-off-by: William Zhang <wilzhang.amazon.com>
Setting the default right before registration is the well
known behavior and prevents issues as the pointer can be
set to NULL during param deregistration.

Signed-off-by: William Zhang <wilzhang.amazon.com>
Clean the workspace after every stage (i.e., test) to avoid filling
the disk.  The downside of this change is that we can't reuse a checkout
of OMPI between stages that run on the same build node.  The upside
is that we are much less likely to run out of disk space during a
test.  We ran into some issues today when there were many builds,
because the workspace name is different between pull requests, and
when a build node had enough checkouts (one for each pull request),
we filled the disk.

Signed-off-by: Brian Barrett <[email protected]>
This patch fixes the incorrect handling of MPI_IN_PLACE in the t1, t2,
and t3 tasks when calling reduce and ireduce.

Signed-off-by: Wei Zhang <[email protected]>
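
For reference, this is the standard application-level MPI_IN_PLACE pattern
that the reduce/ireduce paths being fixed must honor. It uses only the
standard MPI API and is not code from the patch itself.

```c
#include <mpi.h>
#include <stddef.h>

/* Sum `count` doubles onto the root rank, in place at the root: only the
 * root passes MPI_IN_PLACE as the send buffer, and its contribution is then
 * taken from (and the result written to) its receive buffer. */
int sum_to_root(double *buf, int count, int root, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    if (rank == root) {
        return MPI_Reduce(MPI_IN_PLACE, buf, count, MPI_DOUBLE, MPI_SUM,
                          root, comm);
    }
    /* recvbuf is significant only at the root, so non-roots may pass NULL. */
    return MPI_Reduce(buf, NULL, count, MPI_DOUBLE, MPI_SUM, root, comm);
}
```
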
accelerators.

related to issue #11246

Signed-off-by: Howard Pritchard <[email protected]>
Co-authored-by: William Zhang <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Co-authored-by: William Zhang <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
This patch tries to use the 1.18 API when it is available, because the
1.18 API clearly defines the provider's CUDA API behavior: a provider
may call CUDA APIs by default if the application uses the 1.18 API.

When using an older version of the API, some libfabric providers will
not claim support for FI_HMEM even if they are capable of supporting it,
because the provider does not know whether CUDA calls are permitted.

Signed-off-by: Wei Zhang <[email protected]>
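
A hedged sketch of what "using the 1.18 API" means at the fi_getinfo() level:
advertising FI_VERSION(1, 18) while requesting FI_HMEM. The function and field
names come from the public libfabric API, but the surrounding logic is
illustrative and not the actual Open MPI component code.

```c
#include <rdma/fabric.h>

/* Ask libfabric for a provider that supports device (e.g. CUDA) memory.
 * Returns 0 on success and fills *prov_out; error handling is minimal. */
int probe_hmem_provider(struct fi_info **prov_out)
{
    struct fi_info *hints = fi_allocinfo();
    if (NULL == hints) {
        return -1;
    }

    hints->caps = FI_MSG | FI_HMEM;   /* request device-memory support */

    /* Passing FI_VERSION(1, 18) tells providers the caller follows the
     * 1.18 semantics, under which a provider may call CUDA APIs itself. */
    int ret = fi_getinfo(FI_VERSION(1, 18), NULL, NULL, 0, hints, prov_out);

    fi_freeinfo(hints);
    return ret;
}
```
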
@qkoziol
Contributor Author

qkoziol commented Mar 23, 2023

Re-opening this, to clean up the log and signed-off-by for commits

@qkoziol qkoziol closed this Mar 23, 2023