Skip to content

reduction on complex numbers, added volatile copy and assignment #2453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 28, 2023

Conversation

liqiangxl
Copy link
Collaborator

copy constructor must be declared within class. So I need to directly patch aten/src/ATen/cuda/llvm_complex.cpp to allow the use of volatile in reduction.

Fixes #2369

@naoyam
Copy link
Collaborator

naoyam commented Feb 14, 2023

While not ideal, it should be possible to make the reductions work without modifying the LLVM header file. For serial reductions, they are dynamically generated, so we could tweak our codegen to emit different code for volatile complex types. For block/grid reductions, we are using device functions provided as part of the nvfuser "runtime". For those cases, we could provide a copy functor to the device functions so that the overload cases for volatile types aren't necessary.

It's certainly easier from the codegen side if we could just modify the header file as done in the PR. However, if we don't want to make a modification of the external code, there's an alternative option. Another option would be inserting the overload definitions programmatically into the strings returned from at::cuda::get_complex_body_string(), etc.

@liqiangxl
Copy link
Collaborator Author

aten/src/ATen/cuda/llvm_complex.cpp in Pytorch is copy-pasted (with modification) from the original llvm file. If we don't want to modify that file, can we just make a copy-past to our own runtime folder and combine everything we added to that file for the support of complex number?

@liqiangxl liqiangxl force-pushed the llu/complex_number_reduction branch 2 times, most recently from cc457bf to 3032f57 Compare February 23, 2023 13:58
@liqiangxl liqiangxl requested a review from zasdfgbnm February 23, 2023 16:50
Copy link
Collaborator

@zasdfgbnm zasdfgbnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you skim through our tests in third_party/nvfuser/test to find the tests that check reductions with different dtypes, and add complex to the dtype list? I do this by searching DataType::Double and read all matches manually to determine if a test is interesting to me. Here is a list of tests I found: FusionGroupAllreduce5_CUDA, FusionReductionSchedulerNoODimShmoo_CUDA, FusionReductionSchedulerDimShmoo_CUDA, FusionWelfordShmoo_CUDA

@liqiangxl liqiangxl force-pushed the llu/complex_number_reduction branch from 3032f57 to 59cbbb0 Compare February 24, 2023 18:54
@csarofeen
Copy link
Owner

Is this ready for another review?

@liqiangxl liqiangxl force-pushed the llu/complex_number_reduction branch from 59cbbb0 to e4a9055 Compare February 27, 2023 18:42
@@ -0,0 +1,172 @@
namespace std {
Copy link
Collaborator

@zasdfgbnm zasdfgbnm Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we protect this file with

#ifdef __NVCC__
#include <type_traits>
#else
....
#endif

Please try a PYTORCH_NVFUSER_DUMP="cuda_to_file" run on something, and use nvcc to compile the dumped file to check if it compiles.

Copy link
Collaborator Author

@liqiangxl liqiangxl Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch! we need this protection. However, the generated code still can't compile with nvcc due to the lack of non-volatile to volatile copy of complex numbers in <complex>. error msg:

__tmp_kernel1.cu(3426): error: no operator "=" matches these operands
            operand types are: volatile std::complex<float> = const std::complex<float>
          detected during:
            instantiation of "void CudaCodeGen::TupleCopy<DstType, SrcType, num_vals>::copy(DstType &, CudaCodeGen::nvfuser_index_t, const SrcType &, CudaCodeGen::nvfuser_index_t) [with DstType=CudaCodeGen::PtrTupleBase<true, float, double, CudaCodeGen::int64_t, std::complex<float>>, SrcType=CudaCodeGen::LocalTuple<float, double, CudaCodeGen::int64_t, std::complex<float>>, num_vals=4]" 

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that, if the kernel is complex, then we are still unable to compile with nvcc. But if it is not complex then it is working? If so, then it is fine for now. I believe @mmigdal-nv is using nvcc with matmul quite often, and we need to make sure this PR doesn't break non-complex matmul use case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only influence reduction of complex numbers.


auto outFDI = add(
add(castOp(DataType::Double, tv3), tv7), castOp(DataType::Double, tv11));
auto out = add(outFDI, castOp(DataType::Double, tv15));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we cast to complex double here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just pick the real part and cast to double. All results from other data types are casted to double, so does the reference results from torch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we cast all results from other data types and reference results to complex double as well? Casting to double discard imag so we are not checking the correctness of imag.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated this test with all results converted to complex

Copy link
Collaborator

@zasdfgbnm zasdfgbnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding reduction support for complex numbers!

@liqiangxl liqiangxl merged commit 657080b into devel Feb 28, 2023
@zasdfgbnm zasdfgbnm deleted the llu/complex_number_reduction branch February 28, 2023 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nvFuser doesn't support complex reductions
4 participants