-
Notifications
You must be signed in to change notification settings - Fork 0
[DRAFT][flang] Multi-Image Fortran (MIF) Dialect #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JDPailleux I'm not sure why this was moved from PR #3 without explanation, but the only difference seems to be the branch names. I've copied over my initial high-level comments from #3.
I have plenty of other feedback on the actual content of the dialect, but before diving into that I was hoping for some clarity on the intended scope and name of the dialect.
Is this dialect intended to capture all of Fortran's multi-image functionality (i.e. everything that gets lowered into a PRIF call)? Or is it really intended to just capture coarrays (in which case large sections of the current draft seem out-of-scope)?
} | ||
|
||
//===----------------------------------------------------------------------===// | ||
// Collective Operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do Collective Subroutines really belong in this dialect?
These are often conflated during informal conversations, but Collective Subroutines have absolutely nothing to do with coarrays. They are a completely orthogonal multi-image feature. In fact the argument to collective subroutines is even prohibited from being a coindexed object.
I'm not sure these belong in ANY dialect - for most static purposes, collective subroutines are just a regular intrinsic subroutine. IMHO they don't belong lumped into a coarray dialect any more than say, the SQRT intrinsic.
} | ||
|
||
//===----------------------------------------------------------------------===// | ||
// Image Queries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The STOP and FAIL IMAGE statements and intrinsics NUM_IMAGES, STOPPED_IMAGES, FAILED_IMAGES and IMAGE_STATUS also have nothing to do with coarrays. Neither does SYNC ALL, nor (arguably) all the TEAM support routines.
Perhaps this dialect should be renamed to the multi-image dialect? That name could encompass all the features related to multi-image parallel execution, of which coarrays are an important subset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved in 3fef235
537be5d
to
3fef235
Compare
Hi @bonachea , following your feedback and what was mentioned during the committee meeting, I renamed the dialect to Multi-Image (MIF) dialect because it make more sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great start on this dialect!
I made many comments/requests. These include may simple suggestions that can probably be applied immediately (I highly recommend using the "commit suggestion" button in the GitHub UI for this purpose, it saves alot of typing and is less error-prone than manual application). There are also a few comments that may require more sweeping changes and possibly further discussion.
I will also likely have more feedback once we start to address some of these issues I've raised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several operations seem to be notably missing from this dialect:
- non-coarray
ALLOCATE
MOVE_ALLOC
with coarray arguments
I assume these pre-existing serial features already exists as operations elsewhere in HLFIR/MLIR, but they will need to be lowered to PRIF calls when compiling in multi-image mode.
How do we propose to handle such operations?
} | ||
|
||
//===----------------------------------------------------------------------===// | ||
// Image Queries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved in 3fef235
Hi @bonachea, sorry for the delay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @JDPailleux - Thanks for all the improvements!
I've made a few additional comments on open threads and added a few more suggestions.
I again highly recommend using the "commit suggestion" button in the GitHub UI for suggestions you agree with: this wonderful button saves alot of typing and is less error-prone than manual application. There is also an "Add suggestion to batch" button that can be used to batch suggestions into a single commit.
Also a few comments from the first review remain unresolved.
- `atom`: Shall be a coarray of type integer with kind | ||
ATOMIC_INT_KIND or ATOMIC_LOGICAL_KIND from module ISO_FORTRAN_ENV | ||
- `old`, `new` and `compared`: Shall be a scalar integer with kind ATOMIC_LOGICAL_KIND. | ||
They need to have the same type and kind as `atom`. | ||
}]; | ||
|
||
let arguments = (ins AnyIntegerType:$image_num, | ||
AnyIntegerType:$atom, | ||
Arg<AnyIntegerType, "", [MemWrite]>:$old, | ||
Arg<AnyIntegerType, "", [MemRead]>:$compare, | ||
Arg<AnyIntegerType, "", [MemRead]>:$new_val, | ||
Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As previously mentioned, ATOMIC_LOGICAL_KIND is not an integer kind. Strictly speaking the two atomic types in Fortran are:
- integer(ATOMIC_INT_KIND) and
- logical(ATOMIC_LOGICAL_KIND)
It may be that flang internally conflates integer
and logical
(not sure?), but I think at least in descriptions we should try to express the types correctly.
In the suggestion below I'm unsure whether AnyIntegerType
will encompass logical(ATOMIC_LOGICAL_KIND)
, so this might have to be something more generic.
- `atom`: Shall be a coarray of type integer with kind | |
ATOMIC_INT_KIND or ATOMIC_LOGICAL_KIND from module ISO_FORTRAN_ENV | |
- `old`, `new` and `compared`: Shall be a scalar integer with kind ATOMIC_LOGICAL_KIND. | |
They need to have the same type and kind as `atom`. | |
}]; | |
let arguments = (ins AnyIntegerType:$image_num, | |
AnyIntegerType:$atom, | |
Arg<AnyIntegerType, "", [MemWrite]>:$old, | |
Arg<AnyIntegerType, "", [MemRead]>:$compare, | |
Arg<AnyIntegerType, "", [MemRead]>:$new_val, | |
Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg); | |
- `atom`: Shall be a coarray of type integer with kind | |
ATOMIC_INT_KIND or of type logical with kind ATOMIC_LOGICAL_KIND | |
from module ISO_FORTRAN_ENV | |
- `old`, `new` and `compared`: Shall be a scalars with the same type and kind as `atom`. | |
}]; | |
let arguments = (ins AnyIntegerType:$image_num, | |
AnyIntegerType:$atom, | |
Arg<AnyIntegerType, "", [MemWrite]>:$old, | |
Arg<AnyIntegerType, "", [MemRead]>:$compare, | |
Arg<AnyIntegerType, "", [MemRead]>:$new_val, | |
Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg); |
The same comment also applies to mif_AtomicDefineOp
and mif_AtomicRefOp
, the two other operations that accept both integer
or logical
arguments.
The co_broadcast operation performs the computation of the sum | ||
across images. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
co_broadcast is a data movement collective, not a computational collective. It does not perform a "sum" (or any other ALU computation).
The co_broadcast operation performs the computation of the sum | |
across images. | |
The co_broadcast operation broadcasts a value across images. |
def mif_CoReduceOp : mif_Op<"co_reduce", [AttrSizedOperandSegments]> { | ||
let summary = "Generalized reduction across images."; | ||
let description = [{ | ||
The co_sum operation performs the reduction across images. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The co_sum operation performs the reduction across images. | |
The co_reduce operation performs the reduction across images. |
"Performs a collective synchronization of all images in the current team"; | ||
|
||
let arguments = (ins Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg); | ||
let results = (outs AnyIntegerType:$stat); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I fully understand your answer.
My main concern is to ensure we faithfully represent the presence of absence of the optional STAT=
argument from the user's original code in the dialect, so that when it comes time to lower this operation to a PRIF call you can match the user's choice to provide STAT or not in the corresponding PRIF call.
I should note this detail only matters when an error occurs, and maybe we initially don't care about error behavior. But if we expect to some day provide spec-compliant error behavior, then we probably need to preserve that information somehow in the IR.
let summary = "Performs a synchronization of image with each of the other " | ||
"images in the `image_set`"; | ||
|
||
let arguments = (ins fir_SequenceType:$image_set, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The F23 standard does not actually require the image_set to be non-empty:

but of course you can pass an empty rank-one array of integers, e.g.:
PROGRAM SYNCIMGS
integer, dimension(1:0) :: empty_array
SYNC IMAGES(*)
SYNC IMAGES(THIS_IMAGE())
SYNC IMAGES([INTEGER ::])
SYNC IMAGES(empty_array)
print *, THIS_IMAGE(), size(empty_array)
END
I've just verified this program compiles and works with Cray Fortran. Both flang/trunk and the SiPearl fork accept this program without compile-time errors (and in general one cannot statically prove an array variable is non-empty).
However an empty input array (SYNC IMAGES([INTEGER ::])
) means "synchronize with no images" whereas SYNC IMAGES(*)
means the opposite "synchronize with all the images". For this reason PRIF makes the image_set optional and uses the absence of the argument to mean *
, a state which is syntactically and semantically distinct from a provided empty array.
- `image_num`: Shall be an integer that correspond to the image index of | ||
the lock variable to be locked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so in that case for completeness I think we need to introduce a dialect operation to perform that image translation.
The semantics are identical to prif_initial_team_index
, but of course we don't have to call it that.
- `lock_var` : Shall be a coarray of type LOCK_TYPE from the intrinsic | ||
module ISO_FORTRAN_ENV. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely understand we are trying to match Fortran and not PRIF here, but in most cases the design of PRIF was determined by what's possible to express in Fortran.
My specific concern here is that the user code might not always match the simple case:
LOCK(coarray[image_num])
This is definitely a common/important case, but in general I believe Fortran also allows more complicated statements like:
LOCK(some_var%some_component(2)%coarray[image_num]%some_component(1,4,7)%lock_var(2))
In other words, the argument to LOCK
can be any valid coindexed expression that leads to a LOCK_TYPE
variable. The same is true for UNLOCK
, also EVENT POST
(with EVENT_TYPE
) and ATOMIC_*
(with appropriate atomic type).
You can even see an example of this in F23 sec C.7.9:
EVENT POST (worker[1]%free(THIS_IMAGE()))
Does fir_BoxType
preserve sufficient information about the full coindexed expression used to reach the appropriate variable? If not then we may eventually have to address the general case more explicitly in the dialect.
let arguments = (ins Optional<BoolLike>:$quiet, | ||
Optional<AnyIntegerType>:$stop_code_int, | ||
Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$stop_code_char, | ||
UnitAttr:$is_error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I didn't realize there was an LLVM precedent for conflating these two separate statements.
I just wanted to note that while STOP
and ERROR STOP
have very similar runtime behavior in single-image execution, they have very different runtime behavior in multi-image execution. So we may eventually want this difference to be more visible in the dialect (to control analysis of surrounding code), but I guess it's fine this way for now provided we document it.
## Problem When the new setting ``` set target.parallel-module-load true ``` was added, lldb began fetching modules from the devices from multiple threads simultaneously. This caused crashes of lldb when debugging on android devices. The top of the stack in the crash look something like this: ``` #0 0x0000555aaf2b27fe llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm/bin/lldb-dap+0xb87fe) #1 0x0000555aaf2b0a99 llvm::sys::RunSignalHandlers() (/opt/llvm/bin/lldb-dap+0xb6a99) #2 0x0000555aaf2b2fda SignalHandler(int, siginfo_t*, void*) (/opt/llvm/bin/lldb-dap+0xb8fda) #3 0x00007f9c02444560 __restore_rt /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:13:0 #4 0x00007f9c04ea7707 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) (usr/bin/../lib/liblldb.so.15+0x22a7707) #5 0x00007f9c04ea5b41 lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5b41) #6 0x00007f9c04ea5c1e lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5c1e) llvm#7 0x00007f9c052916ff lldb_private::platform_android::AdbClient::SyncService::Stat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) (usr/bin/../lib/liblldb.so.15+0x26916ff) llvm#8 0x00007f9c0528b9dc lldb_private::platform_android::PlatformAndroid::GetFile(lldb_private::FileSpec const&, lldb_private::FileSpec const&) (usr/bin/../lib/liblldb.so.15+0x268b9dc) ``` Our workaround was to set `set target.parallel-module-load ` to `false` to avoid the crash. ## Background PlatformAndroid creates two different classes with one stateful adb connection shared between the two -- one through AdbClient and another through AdbClient::SyncService. The connection management and state is complex, and seems to be responsible for the segfault we are seeing. The AdbClient code resets these connections at times, and re-establishes connections if they are not active. Similarly, PlatformAndroid caches its SyncService, which uses an AdbClient class, but the SyncService puts its connection into a different 'sync' state that is incompatible with a standard connection. ## Changes in this diff * This diff refactors the code to (hopefully) have clearer ownership of the connection, clearer separation of AdbClient and SyncService by making a new class for clearer separations of concerns, called AdbSyncService. * New unit tests are added * Additional logs were added (see llvm#145382 (comment) for details)
…namic (llvm#153420) Canonicalizing the following IR: ``` func.func @mul_zero_dynamic_nofold(%arg0: tensor<?x17xf32>) -> tensor<?x17xf32> { %0 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x1xf32>}> : () -> tensor<1x1xf32> %1 = "tosa.const"() <{values = dense<0> : tensor<1xi8>}> : () -> tensor<1xi8> %2 = tosa.mul %arg0, %0, %1 : (tensor<?x17xf32>, tensor<1x1xf32>, tensor<1xi8>) -> tensor<?x17xf32> return %2 : tensor<?x17xf32> } ``` resulted in a crash ``` #0 0x000056513187e8db backtrace (./build-release/bin/mlir-opt+0x9d698db) #1 0x0000565131b17737 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:838:8 #2 0x0000565131b187f3 PrintStackTraceSignalHandler(void*) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:918:1 #3 0x0000565131b18c30 llvm::sys::RunSignalHandlers() /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Signals.cpp:105:18 #4 0x0000565131b18c30 SignalHandler(int, siginfo_t*, void*) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:409:3 #5 0x00007f2e4165b050 (/lib/x86_64-linux-gnu/libc.so.6+0x3c050) #6 0x00007f2e416a9eec __pthread_kill_implementation ./nptl/pthread_kill.c:44:76 llvm#7 0x00007f2e4165afb2 raise ./signal/../sysdeps/posix/raise.c:27:6 llvm#8 0x00007f2e41645472 abort ./stdlib/abort.c:81:7 llvm#9 0x00007f2e41645395 _nl_load_domain ./intl/loadmsgcat.c:1177:9 llvm#10 0x00007f2e41653ec2 (/lib/x86_64-linux-gnu/libc.so.6+0x34ec2) llvm#11 0x00005651443ec4ba mlir::DenseIntOrFPElementsAttr::getRaw(mlir::ShapedType, llvm::ArrayRef<char>) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/IR/BuiltinAttributes.cpp:1361:3 llvm#12 0x00005651443f1209 mlir::DenseElementsAttr::resizeSplat(mlir::ShapedType) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/IR/BuiltinAttributes.cpp:0:10 llvm#13 0x000056513f76f2b6 mlir::tosa::MulOp::fold(mlir::tosa::MulOpGenericAdaptor<llvm::ArrayRef<mlir::Attribute>>) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/Dialect/Tosa/IR/TosaCanonicalizations.cpp:0:0 ``` from the folder for `tosa::mul` since the zero value was being reshaped to `?x17` size which isn't supported. AFAIK, `tosa.const` requires all dimensions to be static. So in this case, the fix is to not to fold the op.
This can happen when JIT code is run, and we can't symbolize those frames, but they should remain numbered in the stack. An example spidermonkey trace: ``` #0 0x564ac90fb80f (/builds/worker/dist/bin/js+0x240e80f) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58) #1 0x564ac9223a64 (/builds/worker/dist/bin/js+0x2536a64) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58) #2 0x564ac922316f (/builds/worker/dist/bin/js+0x253616f) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58) #3 0x564ac9eac032 (/builds/worker/dist/bin/js+0x31bf032) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58) #4 0x0dec477ca22e (<unknown module>) ``` Without this change, the following symbolization is output: ``` #0 0x55a6d72f980f in MOZ_CrashSequence /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:248:3 #1 0x55a6d72f980f in Crash(JSContext*, unsigned int, JS::Value*) /builds/worker/checkouts/gecko/js/src/shell/js.cpp:4223:5 #2 0x55a6d7421a64 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:501:13 #3 0x55a6d742116f in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:597:12 #4 0x55a6d80aa032 in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) /builds/worker/checkouts/gecko/js/src/jit/BaselineIC.cpp:1705:10 #4 0x2c803bd8f22e (<unknown module>) ``` The last frame has a duplicate number. With this change the numbering is correct: ``` #0 0x5620c58ec80f in MOZ_CrashSequence /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:248:3 #1 0x5620c58ec80f in Crash(JSContext*, unsigned int, JS::Value*) /builds/worker/checkouts/gecko/js/src/shell/js.cpp:4223:5 #2 0x5620c5a14a64 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:501:13 #3 0x5620c5a1416f in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:597:12 #4 0x5620c669d032 in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) /builds/worker/checkouts/gecko/js/src/jit/BaselineIC.cpp:1705:10 #5 0x349f24c7022e (<unknown module>) ```
Co-authored-by: Dan Bonachea <[email protected]>
Co-authored-by: Dan Bonachea <[email protected]>
Hello,
This PR is a discussion around a dialect to define “coarrray” operations in Flang. This is a draft and a few operations have been proposed to start with and see if they might be suitable.