Retire the FAQ section of the docs, moving its content to other locations #11486
Conversation
Hello! The Git Commit Checker CI bot found a few problems with this PR: 9ea2ada: Retire the FAQ section of the docs, moving its con...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!
This is the whole set of changes needed to retire the FAQ; comments welcome.
Force-pushed a new commit, with Signed-off-by.
Review for section: System administrator-level technical information (13.2)
I'm not going to ask for any resolution of the TODOs I see here; I think that can come as a separate PR, and we should probably grep for TODO in the docs.
This section name is wrong; it is not related to ABI guarantees.
4.6 -> 4.7
This moved to 11.6.1/11.6.2 in section 11.6.
What is that merge commit?
Just updating against the mainline changes.
Yes, that's where I was also.
Fixed the PR description, thanks.
Add error management to ack_failed_internal. Use comm->num_acked to decide if any_source should be re-enabled. More compatibility with mixed use of failure_ack and ack_failed. Additional shortcut: bypass computing num_acked when there are no new faults. Recompute num_acked when the v1 interface was used in between. Signed-off-by: Aurelien Bouteiller <[email protected]>
Copyrights. The group needs to be initialized before the first goto. Signed-off-by: George Bosilca <[email protected]>
Add #defines for the new API. Signed-off-by: George Bosilca <[email protected]>
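For context, the v1 failure-acknowledgment interface that these ULFM commits interoperate with works roughly as below. This is a minimal sketch using the documented MPIX_Comm_failure_ack/MPIX_Comm_failure_get_acked extension calls; the new ack_failed API added by this work is not shown.

```c
#include <mpi.h>
#include <mpi-ext.h>   /* ULFM extensions shipped with Open MPI */

/* Acknowledge all failures currently known on 'comm' and return how many
 * processes are in the acknowledged-failed group.  After the ack,
 * MPI_ANY_SOURCE receives can be re-enabled on this communicator. */
static int ack_known_failures(MPI_Comm comm)
{
    MPI_Group failed;
    int nfailed;

    MPIX_Comm_failure_ack(comm);                /* acknowledge current faults */
    MPIX_Comm_failure_get_acked(comm, &failed); /* group of acked failures */
    MPI_Group_size(failed, &nfailed);
    MPI_Group_free(&failed);
    return nfailed;
}
```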
Signed-off-by: Aurelien Bouteiller <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
ulfm/ishrink: Some Fortran bindings had errors, some were missing, and some needed to use PMPI f2c conversions from within the Fortran bindings. ulfm/ishrink: _f functions must not be static. Signed-off-by: Aurelien Bouteiller <[email protected]>
Signed-off-by: Aurelien Bouteiller <[email protected]>
examples. Signed-off-by: Aurélien Bouteiller <[email protected]>
This patch addresses #11448. When Open MPI is compiled with CUDA support, comm->c_coll->coll_xxx_module is coll_cuda_module and HAN_LOAD_FALLBACK_COLLECTIVE is a no-op. As a result, HAN's collective functions can be called even if HAN has been disabled, which resulted in an infinite recursive call loop. To address this issue, this patch makes HAN's collective functions call the fallback function when the HAN module has been disabled. Signed-off-by: Wei Zhang <[email protected]>
Instead of calling the communicator collective module after disqualifying HAN, call the fallback collective directly. This avoids circular dependencies between modules in some cases. However, while this solution works, it delivers suboptimal performance, as the withdrawn module will remain in the call chain but will behave as a passthrough. Signed-off-by: George Bosilca <[email protected]>
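Reduced to a sketch, the fallback pattern the two HAN commits describe looks like this; the structure and field names below are hypothetical stand-ins, not the actual Open MPI coll-framework symbols:

```c
#include <stddef.h>

/* Hypothetical stand-in for a collective module's state. */
typedef struct coll_module {
    int disabled;  /* set when the module disqualifies itself at runtime */
    int (*fallback_allreduce)(void *buf, size_t count); /* captured at enable time */
    int (*own_allreduce)(void *buf, size_t count);
} coll_module_t;

/* Dispatch directly to the previously captured fallback when disabled,
 * instead of going back through the communicator's module table, which
 * may still point at this module and recurse forever. */
int module_allreduce(coll_module_t *mod, void *buf, size_t count)
{
    if (mod->disabled) {
        return mod->fallback_allreduce(buf, count);
    }
    return mod->own_allreduce(buf, count);
}
```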
Silence the warnings generated when compiling the rocm component that stem from the hip_runtime_api.h header file. There is nothing we can do about them here, and they just make it harder to identify actual issues in the component code itself. Signed-off-by: Edgar Gabriel <[email protected]>
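A common way to implement that kind of suppression is to bracket just the third-party include with diagnostic pragmas. This is only a sketch: the actual commit may use different -W options or build flags, and the specific warning ignored below is a placeholder.

```c
/* Disable diagnostics only around the offending vendor header, so that
 * warnings in the component's own code remain visible (GCC/Clang). */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#include <hip/hip_runtime_api.h>
#pragma GCC diagnostic pop
```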
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Jan Fecht <[email protected]>
Includes fix for launch at scale. Signed-off-by: Ralph Castain <[email protected]>
…request: Using atomic swap operations, we make sure that a thread completing a request will atomically mark it for the thread registering a callback. Similarly, the thread registering a callback will register it atomically and check whether the request has completed. Signed-off-by: Joseph Schuchart <[email protected]>
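The idea can be illustrated with C11 atomics: one atomic slot is swapped by both the completing thread and the registering thread, so exactly one of them observes the other and fires the callback. A hedged sketch with hypothetical names, not the actual ompi_request_t code:

```c
#include <stdatomic.h>

typedef void (*req_cb_t)(void *ctx);

/* Sentinel stored in the slot once the request has completed.
 * (Casting a small integer to a function pointer is a sketch-level
 * shortcut; real code would use a tagged pointer or separate flag.) */
#define REQ_COMPLETED ((req_cb_t)1)

typedef struct {
    _Atomic(req_cb_t) cb; /* NULL, a registered callback, or REQ_COMPLETED */
    void *ctx;
} request_t;

/* Completing thread: swap in the sentinel; if a callback was already
 * registered, this thread owns it and must invoke it. */
void request_complete(request_t *req)
{
    req_cb_t prev = atomic_exchange(&req->cb, REQ_COMPLETED);
    if (prev != NULL && prev != REQ_COMPLETED) {
        prev(req->ctx);
    }
}

/* Registering thread: swap in the callback; if the request had already
 * completed, invoke the callback immediately. */
void request_set_callback(request_t *req, req_cb_t cb)
{
    req_cb_t prev = atomic_exchange(&req->cb, cb);
    if (prev == REQ_COMPLETED) {
        cb(req->ctx);
    }
}
```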
…eting a request". This reverts commit b6467d0. The change breaks some persistent op implementations, which do not reinitialize the request object.
Similar to open-mpi/ompi-scripts@7dc912e, add "--bind-to none" to the mpirun command. The CI tests on the AWS Jenkins instance all run on instances with 1 core. There is a special case in OMPI that binds to core for a 2-process run for benchmarking reasons, and this was causing mapping failures because there aren't 2 cores on the instance. So instead, turn off binding for these small tests. Signed-off-by: Jeff Squyres <[email protected]>
The new Jenkins CI appears to be ready. This commit effectively reverts 6406e00 and re-enables all the Jenkinsfile-based tests. Signed-off-by: Jeff Squyres <[email protected]>
…home at https://github.com/hpc/mpir-to-pmix-guide. Signed-off-by: Howard Pritchard <[email protected]>
The way the color string was initialized caused problems, as the pointer to opal_var_dump_color_string became (nil) after re-entering this path after performing mca_base_var_register. Signed-off-by: William Zhang <[email protected]>
Setting the default right before registration is the well-known behavior and prevents issues, as the pointer can be set to NULL during param deregistration. Signed-off-by: William Zhang <wilzhang.amazon.com>
Setting the default right before registration is the well-known behavior and prevents issues, as the pointer can be set to NULL during param deregistration. Signed-off-by: William Zhang <wilzhang.amazon.com>
Setting the default right before registration is the well-known behavior and prevents issues, as the pointer can be set to NULL during param deregistration. Signed-off-by: William Zhang <wilzhang.amazon.com>
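The registration ordering those last commits describe, as a generic sketch: var_register and the parameter name here are hypothetical stand-ins; the real code goes through mca_base_var_register, whose storage pointer can be reset to NULL when the variable is deregistered.

```c
/* Hypothetical stand-in for MCA variable registration: the framework
 * keeps 'storage' and may set *storage to NULL on deregistration. */
int var_register(const char *name, char **storage);

static char *color_string;

int register_color_param(void)
{
    /* Assign the default immediately before every registration: a static
     * initializer runs only once, so after a deregister/re-register cycle
     * the pointer would otherwise still be NULL. */
    color_string = "0;36"; /* hypothetical default color escape */
    return var_register("var_dump_color", &color_string);
}
```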
Clean the workspace after every stage (i.e., test) to avoid filling the disk. The downside of this change is that we can't reuse a checkout of OMPI between stages that run on the same build node. The upside is that we are much less likely to run out of disk space during a test. We ran into some issues today when there were many builds: because the workspace name is different between pull requests, once a build node had enough checkouts (one for each pull request), we filled the disk. Signed-off-by: Brian Barrett <[email protected]>
This patch fixes the incorrect handling of MPI_IN_PLACE in the t1, t2, and t3 tasks when calling reduce and ireduce. Signed-off-by: Wei Zhang <[email protected]>
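For reference, the MPI_IN_PLACE convention that the reduce/ireduce fix has to honor: at the root, MPI_IN_PLACE replaces the send buffer and the root's contribution is read from the receive buffer. A minimal standalone example, not the HAN test code itself:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, val, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    val = rank + 1;

    if (rank == 0) {
        sum = val; /* root's contribution lives in the receive buffer */
        MPI_Reduce(MPI_IN_PLACE, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        printf("sum = %d\n", sum);
    } else {
        /* non-root ranks pass their send buffer as usual */
        MPI_Reduce(&val, NULL, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```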
…accelerators. Related to issue #11246. Signed-off-by: Howard Pritchard <[email protected]>
Co-authored-by: William Zhang <[email protected]> Signed-off-by: Quincey Koziol <[email protected]>
Co-authored-by: William Zhang <[email protected]> Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
This patch tries to use the 1.18 API when it is available. This is because the 1.18 API clearly defines the provider's CUDA API behavior: a provider may call CUDA APIs by default if the application uses the 1.18 API. When using an older version of the API, some libfabric providers will not claim support for FI_HMEM, even if they are capable of supporting it, because the provider does not know whether CUDA calls are permitted. Signed-off-by: Wei Zhang <[email protected]>
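What requesting the newer API version looks like from an application's point of view, sketched with standard libfabric calls (whether Open MPI negotiates the version exactly this way is not shown in the commit message):

```c
#include <rdma/fabric.h>
#include <stdio.h>

int main(void)
{
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL;
    int ret;

    /* Request FI_HMEM: under API >= 1.18 the provider knows it may make
     * CUDA calls by default, so it can safely advertise this capability. */
    hints->caps = FI_MSG | FI_HMEM;

    ret = fi_getinfo(FI_VERSION(1, 18), NULL, NULL, 0, hints, &info);
    if (ret == 0 && info != NULL) {
        printf("FI_HMEM-capable provider: %s\n", info->fabric_attr->prov_name);
        fi_freeinfo(info);
    }
    fi_freeinfo(hints);
    return ret;
}
```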
Re-opening this to clean up the log and the Signed-off-by lines for the commits.
Log of changes for moving FAQ questions to other locations
Non-FAQ section changes:
- …Compliance), by creating and using new 'mpi_standard_full_version' substitution in conf.py
- …4.11.2 (Installing over a prior Open MPI installation) into a new section 4.12 (Updating or Upgrading an Open MPI installation)
- …(General Run-Time Support Notes)
- …applications)
- …applications)
- …(Run-time operation and tuning MPI applications)
- …operation and tuning MPI applications)
- …operation and tuning MPI applications)
- …(Run-time operation and tuning MPI applications)
FAQ section changes:
Supported Systems: (13.1)
- …hardware platforms does Open MPI support?), and 13.1.3 (What network interconnects does Open MPI support?) into Section 4 (Building and Installing Open MPI) as new section 4.2 (Supported Systems), between previous sections 4.1 and 4.2.
- …the top of section 10.3 (The role of PMIx and PRRTE).
- …of section 3.2 (Platform Notes)
- …information in section 3.5.1 (MPI Standard Compliance)
- …Applications) as new section 9.7.
- …(Building MPI Applications) as new section 9.8.
- …to section 9 (Building MPI Applications) as new section 9.9.
- …(Debugging Open MPI Parallel Applications) as new section 12.4, between previous sections 12.3 and 12.4.
System administrator-level technical information: (13.2)
- …4 (Building and installing Open MPI) as new section 4.14 (Advice for System Administrators)
- …section 4.11 (Installation Location), as new subsection 4.11.4 (Installing Multiple Copies of Open MPI).
- …4 (Building and installing Open MPI) as 4.14.1 (Setting Global MCA Parameters), within new section 4.14 (Advice for System Administrators)
- …to section 4 (Building and installing Open MPI) as 4.14.5 (User customization of a global Open MPI installation), within section 4.14 (Advice for System Administrators)
- …parameters; is this possible?), as the information is already incorporated into new section 4.14.5 (User customization of a global Open MPI installation).
- …to section 4 (Building and installing Open MPI) as 4.14.2 (Setting MCA Parameters for a Global Open MPI installation), within section 4.14 (Advice for System Administrators)
- …need to recompile all my MPI apps?) to section 4 (Building and installing Open MPI) as 4.14.3 (Adding a new plugin to a global Open MPI installation), within section 4.14 (Advice for System Administrators)
- …all my MPI apps?) to section 4 (Building and installing Open MPI) as 4.14.4 (Upgrading network hardware with a global Open MPI installation), within section 4.14 (Advice for System Administrators)
- …recompile all my MPI apps?) into new section 4.12 (Updating or Upgrading an Open MPI installation)
- …work with Open MPI?) to be a warning at the top page of section 9 (Building MPI applications)
Building Open MPI: (13.3)
- …suite?) to section 4 (Building and installing Open MPI) as new section 4.6.1 (Statically linking to the libraries of Intel compiler suite), within section 4.6 (Specifying Compilers and flags)
- …section 4 (Building and installing Open MPI) as new section 4.7.5 (Difficulties with C and Fortran), within section 4.7 (Required support libraries)
Running MPI Applications: (13.4)
- …running an Open MPI job?) into section 10.2 (Prerequisites)
- …(Version numbers and backward compatibility)) as new section 7.2 (Application Binary Interface (ABI) Compatibility)
- …location)
- …section 10.2 (Prerequisites) as new section 10.2.1 (Adding Open MPI to PATH and LD_LIBRARY_PATH)
- …section 10.2 (Prerequisites) as new section 10.2.2 (Using the --prefix option with mpirun)
- …the first few paragraphs of section 10 (Launching MPI applications)
- …(Launching on a single host) and 10.1.3 (Launching in non-scheduled environments (via ssh))
- …(Launching an MPMD MPI job) in section 10.11 (Unusual jobs)
- …subsection 10.6.1 (Specifying the hosts for an MPI job), in section 10.6 (Launching with SSH)
- …hosts?) to new subsection 10.12.3 (Problems when running across multiple hosts) in section 10.12 (Troubleshooting)
- …subsection 10.12.2 (Errors about missing libraries) in section 10.12 (Troubleshooting)
- …subsection 10.11.1 (Running non-MPI programs) in section 10.11 (Unusual jobs)
- …subsection 10.11.2 (Running GUI applications) in section 10.11 (Unusual jobs)
- …funky input schemes with Open MPI?) to new subsection 10.11.3 (Running curses-based applications) in section 10.11 (Unusual jobs)
- …10.1.1.1 (Other mpirun options) in section 10.1 (Quick start: Launching MPI applications)
- …subsection 10.4.2 (Scheduling with the --hostfile option) in section 10.4 (Scheduling processes across hosts)
- …subsection 10.4.3 (Scheduling with the --host option) in section 10.4 (Scheduling processes across hosts)
- …in section 10.4 (Scheduling processes across hosts)
- …10.4.4.1 (Calculating the number of slots) in section 10.4 (Scheduling processes across hosts)
- …to new subsection 10.4.1 (Scheduling overview) in section 10.4 (Scheduling processes across hosts)
- …processors)?) to new subsection 10.4.5 (Oversubscribing nodes) in section 10.4 (Scheduling processes across hosts)
- …new subsection 10.4.5.1 (Forcing aggressive or degraded performance mode) in section 10.4 (Scheduling processes across hosts)
- …section 12.4 (Using Parallel Debuggers to Debug Open MPI Applications)
- …section 12.4 (Using Parallel Debuggers to Debug Open MPI Applications)
- …subsection 9.10 (Dynamically loading libmpi at runtime) in section 9 (Building MPI applications)
- …11.1 (Environment variables set for MPI applications) in section 11 (Run-time operation and tuning MPI applications)
Fault Tolerance: (13.5)
- …11.4 (Fault tolerance) in section 11 (Run-time operation and tuning MPI applications)
- …support?) to new subsection 11.4.1 (Supported fault tolerance techniques) in section 11.4 (Fault Tolerance)
- …jobs (similar to LAM/MPI)?) to new subsection 11.4.2 (Checkpoint and restart of parallel jobs) in section 11.4 (Fault Tolerance)
- …to new subsection 11.4.1.1 (Current fault tolerance development) in section 11.4 (Fault tolerance)
- …message passing?) to new subsection 11.4.3 (End-to-end data reliability for MPI messages) in section 11.4 (Fault Tolerance)
Troubleshooting: (13.6)
- …(Messages about missing symbols when running my application) in section 10.12 (Troubleshooting)
- …covered in section 12 (Debugging Open MPI Parallel Applications)
- …job?) into Section 8 (The Modular Component Architecture), as new section 8.3, between previous sections 8.2 and 8.3.
Large Clusters: (13.7)
- …to new subsection 11.5.1 (Reducing startup time for jobs on large clusters) in section 11.5 (Large clusters)
- …to new subsection 11.5.2 (Library location: network vs. local filesystems) in section 11.5 (Large clusters)
- …(Static vs. shared libraries) in section 11.5 (Large clusters)
- …communication system?) to new subsection 11.5.3 (Reducing wireup time) in section 11.5 (Large clusters)
- …that knowledge?) to new subsection 11.5.4 (Static cluster configurations) in section 11.5 (Large clusters)
General Tuning: (13.8)
- …installation?) to new subsection 11.2 (Installing custom components) in section 11 (Run-time operation and tuning MPI applications)
- …to new subsection 11.6.1 (Processor affinity) in section 11.6 (Processor and memory affinity)
- …to new subsection 11.6.2 (Memory affinity) in section 11.6 (Processor and memory affinity)
- …to new subsection 11.6.3 (Enabling processor and memory affinity) in section 11.6 (Processor and memory affinity)
- …MPI processes?) to new subsection 9.11 (Calling fork(), system(), or popen() in MPI processes?) in section 9 (Building MPI applications)
- …I do that?) to new subsection 11.8 (Benchmarking Open MPI applications) in section 11 (Run-time operation and tuning MPI applications)
- …do?), as it's about a buggy version of the Intel MPI benchmarks from 2009.
Signed-off-by: Quincey Koziol <[email protected]>