ops: embiggen the ops framework #13015

hppritcha · 2025-01-03T17:35:24Z

Embiggen the ops framework to take a count argument of type size_t.

related to #12226

related to #9194


devreal · 2025-01-08T20:49:45Z

ompi/op/op.h

@@ -588,15 +573,23 @@ static inline void ompi_op_reduce(ompi_op_t * op, const void *source,
                                          op->o_func.java_data.object);
        return;
    }
-    op->o_func.c_fn(source, target, &count, &dtype);
+    if (0 == (op->o_flags & OMPI_OP_FLAGS_BIGCOUNT)) {
+        op->o_func.c_fn(source, target, &count, &dtype);


Should this code be moved here? We still need to handle larger counts even if the user hasn't provided a big-count callback.

I'm not sure I understand the comment. What is the "this code" link referring to? My understanding of the standard is that user-defined reduction functions that expect to handle big-count elements need to be defined as big-count-aware reduction ops.

You removed the blob of code that deals with more than INT_MAX elements here: https://github.com/open-mpi/ompi/pull/13015/files#diff-dfd595860dc4d0c9c73114bd8563c0786d063327f6e30955a3e361c3f6098e3dL510-L539

Should this be moved into the code path where we deal with user reduction operators that are not big-count ready? Otherwise we might get overflows if we want to reduce 4M elements with a "legacy" user reduction operator.

No, that was code Josh added to handle the fact that ompi_op_reduce was already using a size_t full_count argument for some reason before this PR, while our internal op table methods could only handle int.

devreal · 2025-01-08T21:26:52Z

ompi/op/op.h

@@ -507,36 +518,6 @@ static inline void ompi_op_reduce(ompi_op_t * op, const void *source,
    MPI_Fint f_dtype, f_count;
    int count = full_count;


This smells like UB, and I'm worried the compiler might someday use it to remove any checks against INT_MAX. Maybe count needs to be size_t as well?

I'll move count into just the section of code that invokes user op_reduce methods using the little-count interface.

Hmm, actually the wording in section 6.9.5 of the MPI 4.1 standard makes me think we do need to keep that loop, but only for user "small count" ops. So you're right! Good catch.

@devreal check now. If full_count is bigger than INT_MAX and the user-defined operator is of the small-count variety, we loop.


devreal · 2025-01-08T23:37:42Z

ompi/op/op.h

-        }
-        return;
-    }
+    int count = (int)full_count;


I think this is still flaky. If the compiler sees this assignment, it can assume (because of UB) that it does not overflow (and thus that full_count <= INT_MAX), which jeopardizes the check below. I'm not even sure we still need count here. When we do need to pass an int instead of a size_t, we should declare a local variable count and do the assignment there, only once we know for sure that it doesn't overflow.

bosilca

There are two ways to "address" the big count issue in the op. One is the industrious, partially correct effort proposed in this PR. The other is to take advantage of the fact that the developers of the op framework were forward-looking and decided to handle everything not with a direct call into the op module, but through an intermediary call, ompi_op_reduce (and its 3-arg variant), which already takes a size_t.

We can indeed have the same level of functionality with minimal changes to the op creation (to add the size_t user-defined op), and then a simple change to ompi_op_reduce to handle it. No need to change the version of the component, no need to touch a large portion of other people's code, no negative performance impact, and a solution in less than 100 lines of code.

hppritcha · 2025-01-09T16:20:33Z

Hmmm... okay, I will refactor this to (hopefully) not make any changes to the ops framework.

hppritcha · 2025-01-09T22:59:46Z

Closing this and replacing it with the much simpler PR #13030.

ops: embiggen the ops framework

b047013

to take a count argument of type size_t. related to open-mpi#12226 related to open-mpi#9194 Signed-off-by: Howard Pritchard <[email protected]>

github-actions bot added the Target: main label Jan 3, 2025

hppritcha marked this pull request as draft January 3, 2025 17:35

hppritcha added the MPI-4.0 label Jan 3, 2025

hppritcha marked this pull request as ready for review January 5, 2025 20:54

hppritcha requested review from ggouaillardet and devreal and removed request for ggouaillardet January 5, 2025 20:54

devreal requested changes Jan 8, 2025

devreal reviewed Jan 8, 2025

pr feedback

7664e23

Signed-off-by: Howard Pritchard <[email protected]>

devreal reviewed Jan 8, 2025

bosilca requested changes Jan 9, 2025

hppritcha closed this Jan 9, 2025
