[Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink'
Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.
While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to use the non-standard `.cubin` extension, and rejects
link jobs that other linkers would accept (e.g. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.
The main reason I want to reintroduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
<install>/lib/nvptx64-nvidia-cuda/libc.a
<install>/lib/nvptx64-nvidia-cuda/libc++.a
<install>/lib/nvptx64-nvidia-cuda/libomp.a
<install>/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.
In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
of this wrapper is to provide an interface similar to the ``ld.lld`` linker
while still relying on NVIDIA's proprietary linker to produce the final output.
Features include static archive (.a) linking, LTO, and accepting files ending
in ``.o`` without error.

Usage
=====

This tool can be used with the following options. Any arguments not intended
only for the linker wrapper will be forwarded to ``nvlink``.

.. code-block:: console

  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
            This enables static linking and LTO handling for NVPTX targets.

  USAGE: clang-nvlink-wrapper [options] <options to be passed to nvlink>

  OPTIONS:
    --arch <value>        Specify the 'sm_' name of the target architecture.
    --cuda-path=<dir>     Set the system CUDA path
    --dry-run             Print generated commands without running.
    --feature <value>     Specify the '+ptx' feature to use for LTO.
    -g                    Specify that this was a debug compile.
    -help-hidden          Display all available options
    -help                 Display available options (--help-hidden for more)
    -L <dir>              Add <dir> to the library search path
    -l <libname>          Search for library <libname>
    -mllvm <arg>          Arguments passed to LLVM, including Clang invocations, for which the '-mllvm' prefix is preserved. Use '-mllvm --help' for a list of options.
    -o <path>             Path to file to write output
    --plugin-opt=jobs=<value>
                          Number of LTO codegen partitions
    --plugin-opt=lto-partitions=<value>
                          Number of LTO codegen partitions
    --plugin-opt=O<O0, O1, O2, or O3>
                          Optimization level for LTO
    --plugin-opt=thinlto<value>
                          Enable the thin-lto backend
    --plugin-opt=<value>  Options passed to LLVM, not including the Clang invocation. Use '--plugin-opt=--help' for a list of options.
    --save-temps          Save intermediate results
    --version             Display the version number and exit
    -v                    Print verbose information

Example
=======

This tool is intended to be invoked when targeting the NVPTX toolchain directly.
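
A direct invocation might look like the following minimal sketch. It uses only options from the listing above. The architecture ``sm_70``, the search directory ``./lib``, the input ``main.o``, and the output name ``a.out`` are hypothetical placeholders rather than values taken from this patch. The script prints the command line and runs the wrapper only if it is found on ``PATH``.

```shell
#!/bin/sh
# Sketch only: every path and name below is a hypothetical placeholder.
#   --arch sm_70   'sm_' name of the target architecture
#   --dry-run      print the generated commands without running them
#   -L ./lib       add ./lib to the library search path
#   -l c           search for library "c"
#   -o a.out       path of the output file
#   main.o         input object file
set -- --arch sm_70 --dry-run -L ./lib -l c -o a.out main.o

# Show the full command line that would be executed.
cmd="clang-nvlink-wrapper $*"
echo "$cmd"

# Invoke the wrapper only when it is actually installed.
if command -v clang-nvlink-wrapper >/dev/null 2>&1; then
  clang-nvlink-wrapper "$@" || echo "clang-nvlink-wrapper reported an error" >&2
fi
```

Keeping ``--dry-run`` in the argument list is a convenient way to inspect what the wrapper would hand to ``nvlink`` before committing to a real link step.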