Tree shaking for modules #676
Add cycle statements after a line has been parsed successfully to avoid reparsing it as a different kind of statement.
Source files are only designated as FPM_UNIT_MODULE if they contain only modules. Non-program sources that contain subprograms not in modules are designated as FPM_UNIT_SUBPROGRAM.
If we can't detect the end of a module, then we can't assume that there are no non-module subprograms present; hence the unit type becomes FPM_UNIT_SUBPROGRAM.
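As an illustration of this classification rule, here is a minimal Python sketch. It assumes a simplified line-based scanner (naive comment stripping, no continuation lines, no submodules); the function name `classify_source` and the regular expressions are hypothetical and are not fpm's actual Fortran implementation. The `continue` after a successful match plays the role of the `cycle` statements, so a matched line is never re-tested as a different kind of statement.

```python
import re

# Hypothetical, simplified patterns; fpm's real scanner is written in Fortran
# and handles many more cases (continuation lines, submodules, etc.).
MODULE_START = re.compile(
    r"^\s*module\s+(?!procedure\b|function\b|subroutine\b)(\w+)\s*$", re.I)
MODULE_END = re.compile(r"^\s*end\s*module\b", re.I)
PROGRAM_START = re.compile(r"^\s*program\s+\w+", re.I)


def classify_source(lines):
    """Return 'PROGRAM', 'MODULE' or 'SUBPROGRAM' for a list of source lines."""
    in_module = False
    code_outside_module = False
    for raw in lines:
        line = raw.split("!", 1)[0].strip()  # drop comments (naively)
        if not line:
            continue
        if PROGRAM_START.match(line):
            return "PROGRAM"
        if MODULE_END.match(line):
            in_module = False
            continue  # parsed: don't re-test this line as another statement
        if MODULE_START.match(line):
            in_module = True
            continue
        if not in_module:
            code_outside_module = True
    # An unterminated module (e.g. closed by a bare `end`) means we cannot
    # rule out non-module code, so the source is treated as a subprogram.
    if in_module or code_outside_module:
        return "SUBPROGRAM"
    return "MODULE"
```

A source that closes every module with `end module` and has nothing else outside them is the only case labelled `MODULE`, which is what makes it a candidate for pruning.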
This is a very exciting update, thank you for working on it. I will test it this week. For the skip pruning option, I suggest a |
when there are no top-level executables to prune from.
I played with this locally and everything worked as expected. It's a huge UX upgrade, I love it.
What was especially nice was that, to build all stdlib modules (with --no-prune or prior to this PR), you need to pass -fno-range-check when using GFortran to build the hash module. Now, if you don't use the hash module, you don't need to build it, and you don't need to pass a flag as a workaround to make it build.
I only skimmed through the code changes, they look fine. I don't have enough knowledge to propose more or better testing for this. So, I'm happy with this PR as is.
Thank you!
The initial preparation step for the tree-shaking is quite costly: for a project of mine (tblite/tblite: 8 projects, 135 files, 37k LOC) it took ~1 min before starting the build. Worse, the build eventually failed because a couple of modules in jacobwilliams/json-fortran were pruned.
Thanks @awvwgk.
On my 2014 low-end laptop it took 26 s, which is nevertheless quite long. Granted, this is quite a large project; with dependencies it's over 100K LOC. I believe all dependency source files need to be parsed before the build can begin (correct me if I'm wrong). I think this project is a good test case for getting pruning to work the way we want. How do you suggest improving this? At the very least, we should print a message to stdout that pruning is in progress and may take a while; otherwise the user will think that fpm hung. One possibility is to develop a heuristic that automatically disables pruning if the size of the root project (e.g. measured in LOC) is large relative to individual dependencies. This seems quite involved to do well and could perhaps be a future GSoC project.
I can reproduce this with a minimal project that doesn't use json-fortran but lists it as a dependency. With pruning enabled, it attempts to build json-fortran (which it shouldn't), but then it either prunes out a module that's needed or messes up the build sequence. I can't tell yet which part of the introduced logic causes this behavior. I agree this should be resolved before moving forward.
Could the long time be explained by the use of a recursive subroutine? Some profiling may show where the majority of the time is spent and what the bottleneck is.
Certainly not a large one, maybe medium-sized. Large projects in the Fortran world have several million lines of code. Another good test project is John's general-purpose-fortran. Regarding the bottleneck, I recommend doing some profiling. I doubt the source scanner will be a time-consuming step, but I spotted a couple of quadratically scaling search loops when looking through the patch. If it turns out that we cannot reduce the startup time for this feature, we will have to defer the time-consuming step to runtime and create the build graph dynamically. This strategy is quite successful in meson/CMake-generated ninja files, which call back into their parent build system to determine dependencies dynamically; startup time is instantaneous and the dependency resolution never fails.
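To illustrate the kind of quadratic search loop referred to here, the sketch below contrasts a nested-loop lookup of which source provides each used module with a dictionary index built once. Both functions and the `(module_name, source_path)` data layout are hypothetical, not taken from the patch; the point is only the scaling difference.

```python
def find_providers_quadratic(used, provided_by):
    """For each used module, scan every (name, source) pair.

    Cost is O(len(used) * len(provided_by)): quadratic as a project grows.
    """
    result = {}
    for name in used:
        for mod_name, source in provided_by:
            if mod_name == name:
                result[name] = source
                break  # keep the first provider found
    return result


def find_providers_indexed(used, provided_by):
    """Build a dict index once, then do O(1) average lookups per used module."""
    index = {}
    for mod_name, source in provided_by:
        index.setdefault(mod_name, source)  # keep the first provider, as above
    return {name: index[name] for name in used if name in index}
```

Both return the same mapping; only the indexed version stays roughly linear in the total number of modules.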
Thanks all for testing and the feedback! I didn't test on a large project, so that's been useful; I now realise that I missed a redundancy check in the recursive enumeration of modules, which is probably the cause of the poor scaling, so I'll add that in.
Add redundancy check to recursive exploration of used modules to avoid redundant re-processing and the consequent poor scaling for large projects.
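A minimal Python sketch of why the redundancy check matters, assuming module dependencies are given as a plain dict of `use` relationships; the names `enumerate_used_modules` and `diamond_chain` are hypothetical and this is not fpm's code. On a graph where each module uses both modules of the next layer, the unchecked recursion re-processes shared modules exponentially often, while the checked version processes each module once.

```python
def enumerate_used_modules(start, uses, check_redundancy=True):
    """Collect every module reachable from `start` via `use` statements.

    `uses` maps a module name to the names it uses. Returns (found, visits),
    where `visits` counts how many times a module was (re)processed.
    """
    found = set()
    visits = 0

    def explore(name):
        nonlocal visits
        if check_redundancy and name in found:
            return  # already explored: skip the whole subtree
        visits += 1
        found.add(name)
        for dep in uses.get(name, ()):
            explore(dep)

    explore(start)
    return found, visits


def diamond_chain(depth):
    """Hypothetical test graph: layer i modules each use both layer i+1 modules."""
    uses = {"root": ["a0", "b0"]}
    for i in range(depth - 1):
        for m in (f"a{i}", f"b{i}"):
            uses[m] = [f"a{i + 1}", f"b{i + 1}"]
    return uses
```

For `diamond_chain(10)` both variants find the same 21 modules, but the unchecked version performs 2047 visits against 21 with the check. Python's recursion limit would make an explicit stack preferable for very deep graphs.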
Make sure to always enumerate used modules from non-module sources because these can't be pruned. Adds unit test for this case.
Avoids misidentifying pure module sources as non-module (subprogram) sources due to presence of code outside of modules.
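A minimal sketch of this fix, assuming preprocessor directives are recognised simply by a leading `#` and modules by plain `module <name>` / `end module` lines; the helper names are hypothetical. Without the `#` check, `#ifdef`/`#endif` lines wrapping a module would count as code outside it, turning a pure module source into a subprogram source.

```python
import re

# Hypothetical, simplified patterns (no submodules, no continuation lines).
MODULE_START = re.compile(r"^module\s+\w+$", re.I)
MODULE_END = re.compile(r"^end\s*module\b", re.I)


def is_fortran_code(line):
    """True if `line` should count as code when looking outside modules."""
    stripped = line.split("!", 1)[0].strip()
    if not stripped:
        return False  # blank or comment-only line
    if stripped.startswith("#"):
        return False  # preprocessor directive such as #ifdef, #endif, #define
    return True


def code_outside_modules(lines):
    """Return the code lines that appear outside any module ... end module block."""
    outside, in_module = [], False
    for line in lines:
        if not is_fortran_code(line):
            continue
        stmt = line.split("!", 1)[0].strip()
        if MODULE_END.match(stmt):
            in_module = False
        elif MODULE_START.match(stmt):
            in_module = True
        elif not in_module:
            outside.append(stmt)
    return outside
```

An empty result means the source can still be treated as a pure module source.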
Adding the missing redundancy check to the recursive exploration has reduced the startup time for large projects. Pruning now adds about 5% to the startup time for tblite compared to the main branch on my machine. Please can you test again for yourselves, since I didn't encounter startup times of the same order as you two for tblite (about 10 s on a 2014 i7 before adding the redundancy checks, and now about 2.3 s). The issues with json-fortran were twofold: first, I wasn't enumerating used modules in non-module sources (which can't be pruned); second, the json-fortran sources were being misidentified as non-module sources because of preprocessor lines. I've pushed fixes and tests for these as well.
Hmm, with the recent updates, fpm doesn't seem to prune for me. To reproduce:
Add to fpm.toml:
then run If I run it with Then I compared the two and found that with My question is, if I don't use a dependency at all in my programs and modules, but it's listed in fpm.toml as a dependency, is fpm with pruning expected to build some of that dependency? I didn't think so, but maybe I'm wrong.
Thanks for checking @milancurcic - I can confirm that this is expected behavior because dftd4 (a dependency of tblite) contains a non-module source file
So to answer your question directly: yes, if you don't use a dependency via a |
@LKedward Ah! I forgot about that. All clear. I will continue testing with a few packages of varying sizes.
Thanks for the fix, I can depend on
Another edge case for the pruning, this time a smaller project (~95k LOC in 205 files in 5 projects,
Apply this patch on top of the current head:
diff --git a/fpm.toml b/fpm.toml
index 249d6db..265eeff 100644
--- a/fpm.toml
+++ b/fpm.toml
@@ -24,3 +24,8 @@ dependencies.toml-f.git = "https://github.com/toml-f/toml-f"
 [[test]]
 name = "tester"
 source-dir = "test/unit"
+
+[[test]]
+name = "api-tester"
+source-dir = "test/api"
+main = "api-test.c"
Try to run a fresh build for the tests
The same holds for most of the projects I'm developing; they have an
Thanks @awvwgk, yes this is another scenario I missed. I think it shouldn't be too difficult to detect
Aside: It seems a shame for a package to effectively lose pruning support because it provides a C API; possible solutions could be to provide the C API as a separate package or to make the C API an optional feature (#609).
Splitting bindings into separate projects means they will eventually become stale and out of sync; this is not an option for long-term maintainability. Having the possibility to mark parts of the code as optional features would be a solution. For the other build systems in my projects (CMake and meson), the C API is indeed optional and usually disabled when the project is included as a subproject and statically linked, to avoid symbols leaking from shared libraries declared in the superproject.
Sources containing module subroutines and functions with bind(C) are labelled as SUBPROGRAM to disable pruning.
Adds a parse_sequence helper utility to parse sequences of tokens separated by zero or more spaces
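A possible shape for such a helper, sketched in Python under the assumption that it matches tokens case-insensitively and returns the index just past the match (or -1); this signature and the `has_bind_c` caller are hypothetical, not fpm's Fortran API. Allowing zero or more spaces before each token lets one call accept `endmodule`, `end module` and `bind ( C )` alike.

```python
def parse_sequence(text, *tokens, start=0):
    """Match `tokens` in order at text[start:], allowing zero or more spaces
    before each token (case-insensitive). Return the end index, or -1."""
    lower = text.lower()
    pos = start
    for tok in tokens:
        while pos < len(lower) and lower[pos] == " ":
            pos += 1  # skip optional spaces between tokens
        if not lower.startswith(tok.lower(), pos):
            return -1
        pos += len(tok)
    return pos


def has_bind_c(statement):
    """True if the statement text contains a bind(C) / bind(C, name=...) attribute.

    Only the attribute text is checked, not which kind of statement carries it.
    """
    lower = statement.lower()
    i = lower.find("bind")
    while i != -1:
        end = parse_sequence(lower, "bind", "(", "c", start=i)
        if end != -1 and lower[end:].lstrip()[:1] in (",", ")"):
            return True
        i = lower.find("bind", i + 1)
    return False
```

Under the rule in the preceding commit, a module containing a procedure for which `has_bind_c` is true would be labelled SUBPROGRAM and therefore never pruned.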
Startup times look okay now
Thanks a lot for addressing the corner cases. I think this looks good to merge now.
I haven't played with it further, but given @awvwgk's review I suggest we go ahead and merge.
Thanks both for reviewing; I'll now merge.
@LKedward, I found that char_repeat_read (an h5fortran example) failed to link:
[ 5%]Compiling...
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/example_char_repeat_read.f90.o: in function `MAIN__':
h5fortran/example/char_repeat_read.f90:22: undefined reference to `__h5fortran_MOD_h5write_scalar'
/usr/bin/ld: h5fortran/example/char_repeat_read.f90:31: undefined reference to `__h5fortran_MOD_h5read_scalar'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0x60): undefined reference to `__h5fortran_MOD_h5read_1d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0x70): undefined reference to `__h5fortran_MOD_h5read_3d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0x78): undefined reference to `__h5fortran_MOD_h5read_2d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0x88): undefined reference to `__h5fortran_MOD_h5read_4d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0x98): undefined reference to `__h5fortran_MOD_h5read_6d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0xa0): undefined reference to `__h5fortran_MOD_h5read_scalar'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0xa8): undefined reference to `__h5fortran_MOD_h5write_1d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0xb0): undefined reference to `__h5fortran_MOD_h5write_3d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0xb8): undefined reference to `__h5fortran_MOD_h5write_4d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0xc0): undefined reference to `__h5fortran_MOD_h5write_2d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0xc8): undefined reference to `__h5fortran_MOD_h5read_7d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0xd0): undefined reference to `__h5fortran_MOD_h5read_5d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0xe0): undefined reference to `__h5fortran_MOD_h5write_6d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0xf0): undefined reference to `__h5fortran_MOD_h5write_scalar'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0x168): undefined reference to `__h5fortran_MOD_h5write_7d'
/usr/bin/ld: build/gfortran_ECCF645F41985A44/h5fortran/libh5fortran.a(src_interface.f90.o):(.data.rel.ro+0x170): undefined reference to `__h5fortran_MOD_h5write_5d'
collect2: error: ld returned 1 exit status
<ERROR> Compilation failed for object " char_repeat_read "
<ERROR>stopping due to failed compilation
STOP 1
Environment: Ubuntu, fpm-0.6.0.
@zoziha thanks for checking. Can you open a new issue for visibility?
Thanks for reporting @zoziha, I'll look into this |
Experimental changes to implement limited dependency tree-shaking/pruning just for modules (#635)
Source files are only designated as modules (FPM_UNIT_MODULE) if they contain only modules.
The end module statement is now parsed, since this is used to detect if anything is outside a module. Modules without an end module statement will always be built and linked.
A prune_build_targets subroutine performs tree-shaking/pruning of supported modules.
Non-module sources (FPM_UNIT_SUBPROGRAM) are never pruned.
Please test and give feedback; there are possibly more edge cases that I have not considered.
TODO:
--all
--no-prune flag to fpm build to skip pruning
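The pruning rule summarised above can be sketched as a reachability pass. This is a minimal Python sketch under stated assumptions: the `Source` record, `prune_sources` and its `no_prune` parameter (standing in for `--no-prune`) are hypothetical, and fpm's real `prune_build_targets` operates on build targets rather than on this simplified list. Programs and non-module sources act as roots and are never pruned; module sources are kept only if a root reaches them through `use` statements.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    # Hypothetical, simplified stand-in for an fpm source record.
    path: str
    unit_type: str  # "PROGRAM", "MODULE" or "SUBPROGRAM"
    modules_provided: list = field(default_factory=list)
    modules_used: list = field(default_factory=list)


def prune_sources(sources, no_prune=False):
    """Return the sources to build, dropping module sources nothing uses."""
    if no_prune:
        return list(sources)
    provider = {}
    for src in sources:
        for name in src.modules_provided:
            provider[name] = src
    # Roots: programs and non-module sources, which are never pruned.
    roots = [s for s in sources if s.unit_type != "MODULE"]
    kept_paths = {s.path for s in roots}
    stack = list(roots)
    while stack:
        src = stack.pop()
        for name in src.modules_used:
            dep = provider.get(name)  # None for external/intrinsic modules
            if dep is not None and dep.path not in kept_paths:
                kept_paths.add(dep.path)  # mark before pushing: visit once
                stack.append(dep)
    return [s for s in sources if s.path in kept_paths]
```

With a program using module `a` (which uses `b`), a non-module source using `c`, and an unused `hash` module, only `hash` is dropped; the module used by the non-module source survives even though no program uses it.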