-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[DWARFVerifier] Allow overlapping ranges for ICF-merged functions #117952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This patch modifies the DWARF verifier to handle a valid case where two or more functions have identical address ranges due to being merged by ICF (Identical Code Folding). Previously, the verifier would incorrectly report these as errors, but functions merged via ICF (such as when using LLD's --keep-icf-stabs option) can legitimately share the same address range. A new test case has been added to verify this behavior using YAML-based DWARF data that simulates two DW_TAG_subprogram entries with identical address ranges. The test ensures that the verifier correctly identifies this as a valid case and doesn't emit any errors, while still maintaining the existing verification for truly invalid overlapping ranges in other scenarios.
@llvm/pr-subscribers-debuginfo Author: None (alx32) ChangesThis patch modifies the DWARF verifier to handle a valid case where two or more functions have identical address ranges due to being merged by ICF (Identical Code Folding). Previously, the verifier would incorrectly report these as errors, but functions merged via ICF (such as when using LLD's --keep-icf-stabs option) can legitimately share the same address range. A new test case has been added to verify this behavior using YAML-based DWARF data that simulates two DW_TAG_subprogram entries with identical address ranges. The test ensures that the verifier correctly identifies this as a valid case and doesn't emit any errors, while still maintaining the existing verification for truly invalid overlapping ranges in other scenarios. Full diff: https://github.com/llvm/llvm-project/pull/117952.diff 2 Files Affected:
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
index 1fe3eb1e90fe65..58b965892ecaf3 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
@@ -624,12 +624,21 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die,
// Verify that children don't intersect.
const auto IntersectingChild = ParentRI.insert(RI);
if (IntersectingChild != ParentRI.Children.end()) {
- ++NumErrors;
- ErrorCategory.Report("DIEs have overlapping address ranges", [&]() {
- error() << "DIEs have overlapping address ranges:";
- dump(Die);
- dump(IntersectingChild->Die) << '\n';
- });
+ auto &IR = IntersectingChild->Ranges;
+ // Overlapping DW_TAG_subprogram can happen and are valid when multiple
+ // functions are merged via ICF. See --keep-icf-stabs in LLD.
+ bool isMergedFunc = (Die.getTag() == DW_TAG_subprogram) &&
+ (IR.size() == 1) && (IR == RI.Ranges);
+ if (!isMergedFunc) {
+ if (IntersectingChild->Ranges != ParentRI.Children.end()->Ranges) {
+ ++NumErrors;
+ ErrorCategory.Report("DIEs have overlapping address ranges", [&]() {
+ error() << "DIEs have overlapping address ranges:";
+ dump(Die);
+ dump(IntersectingChild->Die) << '\n';
+ });
+ }
+ }
}
// Verify that ranges are contained within their parent.
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/verify_no_overlap_error_icf.yaml b/llvm/test/tools/llvm-dwarfdump/X86/verify_no_overlap_error_icf.yaml
new file mode 100644
index 00000000000000..6f3e6de28169dc
--- /dev/null
+++ b/llvm/test/tools/llvm-dwarfdump/X86/verify_no_overlap_error_icf.yaml
@@ -0,0 +1,531 @@
+# This test verifies that if two DW_TAG_subprogram's have a single identical
+# address range they are not flagged by dwarfdump as being invalid (having
+# overlapping address ranges).
+# This is a valid case that can happen when ICF merges multiple functions
+# together. See --keep-icf-stabs in LLD.
+#
+# The DWARF looks like:
+# 0x0000002e: DW_TAG_subprogram
+# DW_AT_low_pc (0x00000001000002f0)
+# DW_AT_high_pc (0x00000001000002fc)
+# DW_AT_name ("function1")
+# DW_AT_decl_file ("/tmp/out/my_code.cpp")
+# [...]
+#
+# 0x00000071: DW_TAG_subprogram
+# DW_AT_low_pc (0x00000001000002f0)
+# DW_AT_high_pc (0x00000001000002fc)
+# DW_AT_name ("function2")
+# DW_AT_decl_file ("/tmp/out/my_code.cpp")
+# [...]
+#
+
+# RUN: yaml2obj %s | llvm-dwarfdump --error-display=details --verify - | FileCheck %s
+# CHECK: No errors.
+
+
+## YAML below was generating using this shell script:
+# #!/bin/bash
+# set -ex
+#
+# TOOLCHAIN_DIR="Debug/bin"
+# SCRIPT_DIR="$(cd "$(dirname "$0")"; pwd)"
+# OUT_DIR="$SCRIPT_DIR/out"
+# mkdir -p $OUT_DIR
+#
+# cd $SCRIPT_DIR
+#
+# cat > "$OUT_DIR/my_code.cpp" << EOF
+# #define ATTRIB extern "C" __attribute__((noinline))
+#
+# ATTRIB int function1(int a) {
+# int b = a * 4;
+# int result = b + 4;
+# return result;
+# }
+#
+# ATTRIB int function2(int a) {
+# int b = a * 4;
+# int result = b + 4;
+# return result;
+# }
+#
+# int main() {
+# return function1(1) + function2(2);
+# }
+# EOF
+#
+# "$TOOLCHAIN_DIR/clang++" --target=arm64-apple-macos11 -c -g -O3 "$OUT_DIR/my_code.cpp" -o "$OUT_DIR/my_code.o" \
+# -gmlt -fno-unwind-tables -fno-exceptions
+# "$TOOLCHAIN_DIR/ld64.lld" -arch arm64 -platform_version macos 11.0.0 11.0.0 -o "$OUT_DIR/my_bin_icf.exe" \
+# "$OUT_DIR/my_code.o" -dead_strip --icf=all --keep-icf-stabs
+#
+# $TOOLCHAIN_DIR/dsymutil --flat "$OUT_DIR/my_bin_icf.exe" -o "$OUT_DIR/my_bin_icf.dSYM" --verify-dwarf=none
+# $TOOLCHAIN_DIR/obj2yaml "$OUT_DIR/my_bin_icf.dSYM" > $OUT_DIR/my_bin_icf.dSYM.yaml
+
+
+
+--- !mach-o
+FileHeader:
+ magic: 0xFEEDFACF
+ cputype: 0x100000C
+ cpusubtype: 0x0
+ filetype: 0xA
+ ncmds: 7
+ sizeofcmds: 1240
+ flags: 0x0
+ reserved: 0x0
+LoadCommands:
+ - cmd: LC_UUID
+ cmdsize: 24
+ uuid: 4C4C449C-5555-3144-A108-0F6123B3CB8A
+ - cmd: LC_BUILD_VERSION
+ cmdsize: 24
+ platform: 1
+ minos: 720896
+ sdk: 720896
+ ntools: 0
+ - cmd: LC_SYMTAB
+ cmdsize: 24
+ symoff: 4096
+ nsyms: 4
+ stroff: 4160
+ strsize: 50
+ - cmd: LC_SEGMENT_64
+ cmdsize: 72
+ segname: __PAGEZERO
+ vmaddr: 0
+ vmsize: 4294967296
+ fileoff: 0
+ filesize: 0
+ maxprot: 0
+ initprot: 0
+ nsects: 0
+ flags: 0
+ - cmd: LC_SEGMENT_64
+ cmdsize: 152
+ segname: __TEXT
+ vmaddr: 4294967296
+ vmsize: 16384
+ fileoff: 0
+ filesize: 0
+ maxprot: 5
+ initprot: 5
+ nsects: 1
+ flags: 0
+ Sections:
+ - sectname: __text
+ segname: __TEXT
+ addr: 0x1000002A0
+ size: 60
+ offset: 0x0
+ align: 2
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x80000400
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: CFFAEDFE0C000001000000000A00000007000000D804000000000000000000001B000000180000004C4C449C55553144A1080F6123B3CB8A32000000
+ - cmd: LC_SEGMENT_64
+ cmdsize: 72
+ segname: __LINKEDIT
+ vmaddr: 4294983680
+ vmsize: 4096
+ fileoff: 4096
+ filesize: 114
+ maxprot: 1
+ initprot: 1
+ nsects: 0
+ flags: 0
+ - cmd: LC_SEGMENT_64
+ cmdsize: 872
+ segname: __DWARF
+ vmaddr: 4294987776
+ vmsize: 4096
+ fileoff: 8192
+ filesize: 871
+ maxprot: 7
+ initprot: 3
+ nsects: 10
+ flags: 0
+ Sections:
+ - sectname: __debug_line
+ segname: __DWARF
+ addr: 0x100005000
+ size: 123
+ offset: 0x2000
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x0
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ - sectname: __debug_aranges
+ segname: __DWARF
+ addr: 0x10000507B
+ size: 48
+ offset: 0x207B
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x0
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ - sectname: __debug_info
+ segname: __DWARF
+ addr: 0x1000050AB
+ size: 125
+ offset: 0x20AB
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x0
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ - sectname: __debug_frame
+ segname: __DWARF
+ addr: 0x100005128
+ size: 112
+ offset: 0x2128
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x0
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 14000000FFFFFFFF0400080001781E0C1F000000000000001400000000000000A0020000010000000C000000000000001400000000000000A0020000010000000C000000000000002400000000000000AC0200000100000030000000000000004C0C1D109E019D029303940400000000
+ - sectname: __debug_abbrev
+ segname: __DWARF
+ addr: 0x100005198
+ size: 64
+ offset: 0x2198
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x0
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ - sectname: __debug_str
+ segname: __DWARF
+ addr: 0x1000051D8
+ size: 163
+ offset: 0x21D8
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x0
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ - sectname: __apple_namespac
+ segname: __DWARF
+ addr: 0x10000527B
+ size: 36
+ offset: 0x227B
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x0
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 485341480100000001000000000000000C000000000000000100000001000600FFFFFFFF
+ - sectname: __apple_names
+ segname: __DWARF
+ addr: 0x10000529F
+ size: 116
+ offset: 0x229F
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x0
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 485341480100000003000000030000000C000000000000000100000001000600FFFFFFFF0000000002000000DC41AB586A7F9A7CDD41AB584400000054000000640000008A000000010000002E000000000000009E00000001000000500000000000000094000000010000003F00000000000000
+ - sectname: __apple_types
+ segname: __DWARF
+ addr: 0x100005313
+ size: 48
+ offset: 0x2313
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x0
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 48534148010000000100000000000000180000000000000004000000010006000300050005000B0006000600FFFFFFFF
+ - sectname: __apple_objc
+ segname: __DWARF
+ addr: 0x100005343
+ size: 36
+ offset: 0x2343
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x0
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 485341480100000001000000000000000C000000000000000100000001000600FFFFFFFF
+LinkEditData:
+ NameList:
+ - n_strx: 2
+ n_type: 0xF
+ n_sect: 1
+ n_desc: 0
+ n_value: 4294967980
+ - n_strx: 8
+ n_type: 0xF
+ n_sect: 1
+ n_desc: 0
+ n_value: 4294967968
+ - n_strx: 19
+ n_type: 0xF
+ n_sect: 1
+ n_desc: 0
+ n_value: 4294967968
+ - n_strx: 30
+ n_type: 0xF
+ n_sect: 1
+ n_desc: 16
+ n_value: 4294967296
+ StringTable:
+ - ''
+ - ''
+ - _main
+ - _function1
+ - _function2
+ - __mh_execute_header
+DWARF:
+ debug_str:
+ - ''
+ - 'clang version 20.0.0git (https://github.com/alx32/llvm-project.git 92a15dd7482ff4e1fae7a07f888564e5b1d53eee)'
+ - '/tmp/out/my_code.cpp'
+ - '/'
+ - '/tmp'
+ - function1
+ - function2
+ - main
+ debug_abbrev:
+ - ID: 0
+ Table:
+ - Code: 0x1
+ Tag: DW_TAG_compile_unit
+ Children: DW_CHILDREN_yes
+ Attributes:
+ - Attribute: DW_AT_producer
+ Form: DW_FORM_strp
+ - Attribute: DW_AT_language
+ Form: DW_FORM_data2
+ - Attribute: DW_AT_name
+ Form: DW_FORM_strp
+ - Attribute: DW_AT_LLVM_sysroot
+ Form: DW_FORM_strp
+ - Attribute: DW_AT_stmt_list
+ Form: DW_FORM_sec_offset
+ - Attribute: DW_AT_comp_dir
+ Form: DW_FORM_strp
+ - Attribute: DW_AT_APPLE_optimized
+ Form: DW_FORM_flag_present
+ - Attribute: DW_AT_low_pc
+ Form: DW_FORM_addr
+ - Attribute: DW_AT_high_pc
+ Form: DW_FORM_data4
+ - Code: 0x2
+ Tag: DW_TAG_subprogram
+ Children: DW_CHILDREN_no
+ Attributes:
+ - Attribute: DW_AT_low_pc
+ Form: DW_FORM_addr
+ - Attribute: DW_AT_high_pc
+ Form: DW_FORM_data4
+ - Attribute: DW_AT_APPLE_omit_frame_ptr
+ Form: DW_FORM_flag_present
+ - Attribute: DW_AT_call_all_calls
+ Form: DW_FORM_flag_present
+ - Attribute: DW_AT_name
+ Form: DW_FORM_strp
+ - Code: 0x3
+ Tag: DW_TAG_subprogram
+ Children: DW_CHILDREN_yes
+ Attributes:
+ - Attribute: DW_AT_low_pc
+ Form: DW_FORM_addr
+ - Attribute: DW_AT_high_pc
+ Form: DW_FORM_data4
+ - Attribute: DW_AT_call_all_calls
+ Form: DW_FORM_flag_present
+ - Attribute: DW_AT_name
+ Form: DW_FORM_strp
+ - Code: 0x4
+ Tag: DW_TAG_call_site
+ Children: DW_CHILDREN_no
+ Attributes:
+ - Attribute: DW_AT_call_origin
+ Form: DW_FORM_ref4
+ - Attribute: DW_AT_call_return_pc
+ Form: DW_FORM_addr
+ debug_aranges:
+ - Length: 0x2C
+ Version: 2
+ CuOffset: 0x0
+ AddressSize: 0x8
+ Descriptors:
+ - Address: 0x1000002A0
+ Length: 0x3C
+ debug_info:
+ - Length: 0x79
+ Version: 4
+ AbbrevTableID: 0
+ AbbrOffset: 0x0
+ AddrSize: 8
+ Entries:
+ - AbbrCode: 0x1
+ Values:
+ - Value: 0x1
+ - Value: 0x21
+ - Value: 0x6E
+ - Value: 0x83
+ - Value: 0x0
+ - Value: 0x85
+ - Value: 0x1
+ - Value: 0x1000002A0
+ - Value: 0x3C
+ - AbbrCode: 0x2
+ Values:
+ - Value: 0x1000002A0
+ - Value: 0xC
+ - Value: 0x1
+ - Value: 0x1
+ - Value: 0x8A
+ - AbbrCode: 0x2
+ Values:
+ - Value: 0x1000002A0
+ - Value: 0xC
+ - Value: 0x1
+ - Value: 0x1
+ - Value: 0x94
+ - AbbrCode: 0x3
+ Values:
+ - Value: 0x1000002AC
+ - Value: 0x30
+ - Value: 0x1
+ - Value: 0x9E
+ - AbbrCode: 0x4
+ Values:
+ - Value: 0x2E
+ - Value: 0x1000002C0
+ - AbbrCode: 0x4
+ Values:
+ - Value: 0x3F
+ - Value: 0x1000002CC
+ - AbbrCode: 0x0
+ - AbbrCode: 0x0
+ debug_line:
+ - Length: 119
+ Version: 4
+ PrologueLength: 39
+ MinInstLength: 1
+ MaxOpsPerInst: 1
+ DefaultIsStmt: 1
+ LineBase: 251
+ LineRange: 14
+ OpcodeBase: 13
+ StandardOpcodeLengths: [ 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 ]
+ IncludeDirs:
+ - out
+ Files:
+ - Name: my_code.cpp
+ DirIdx: 1
+ ModTime: 0
+ Length: 0
+ Opcodes:
+ - Opcode: DW_LNS_extended_op
+ ExtLen: 9
+ SubOpcode: DW_LNE_set_address
+ Data: 4294967968
+ - Opcode: DW_LNS_set_column
+ Data: 15
+ - Opcode: DW_LNS_set_prologue_end
+ Data: 0
+ - Opcode: DW_LNS_advance_line
+ SData: 9
+ Data: 0
+ - Opcode: DW_LNS_copy
+ Data: 0
+ - Opcode: DW_LNS_set_column
+ Data: 20
+ - Opcode: 0x4B
+ Data: 0
+ - Opcode: DW_LNS_set_column
+ Data: 5
+ - Opcode: 0x4B
+ Data: 0
+ - Opcode: DW_LNS_advance_pc
+ Data: 4
+ - Opcode: DW_LNS_extended_op
+ ExtLen: 1
+ SubOpcode: DW_LNE_end_sequence
+ Data: 0
+ - Opcode: DW_LNS_extended_op
+ ExtLen: 9
+ SubOpcode: DW_LNE_set_address
+ Data: 4294967968
+ - Opcode: DW_LNS_set_column
+ Data: 15
+ - Opcode: DW_LNS_set_prologue_end
+ Data: 0
+ - Opcode: 0x15
+ Data: 0
+ - Opcode: DW_LNS_set_column
+ Data: 20
+ - Opcode: 0x4B
+ Data: 0
+ - Opcode: DW_LNS_set_column
+ Data: 5
+ - Opcode: 0x4B
+ Data: 0
+ - Opcode: DW_LNS_set_column
+ Data: 0
+ - Opcode: DW_LNS_advance_line
+ SData: 9
+ Data: 0
+ - Opcode: 0x4A
+ Data: 0
+ - Opcode: DW_LNS_set_column
+ Data: 12
+ - Opcode: DW_LNS_set_prologue_end
+ Data: 0
+ - Opcode: 0xBB
+ Data: 0
+ - Opcode: DW_LNS_set_column
+ Data: 27
+ - Opcode: DW_LNS_negate_stmt
+ Data: 0
+ - Opcode: 0xBA
+ Data: 0
+ - Opcode: DW_LNS_set_column
+ Data: 25
+ - Opcode: 0x82
+ Data: 0
+ - Opcode: DW_LNS_set_column
+ Data: 5
+ - Opcode: DW_LNS_set_epilogue_begin
+ Data: 0
+ - Opcode: 0x4A
+ Data: 0
+ - Opcode: DW_LNS_advance_pc
+ Data: 12
+ - Opcode: DW_LNS_extended_op
+ ExtLen: 1
+ SubOpcode: DW_LNE_end_sequence
+ Data: 0
+...
|
What about subranges that overlap? If someone was using something like Propeller where a function ends up split into multiple sections, then those sections are ICF'd? (I'm not 100% sure that's possible, but it seems quite plausible to me... ) |
You are correct that theoretically it's possible to have overlapping address ranges merged by something like ICF. But I think the default behavior for tooling has been to not have overlapping address ranges in dSYM's. We changed this with adding I could add the behavior to check for overlapping subranges, but I wouldn't be able to generate a test case for it as I don't think it's currently possible to generate overlapping subranges (we would need something like Any advice on how to generate a valid scenario for the overlapping subranges case or ideas how to proceed ? |
Not sure I follow the discussion about dsyms - this is a generic feature that impacts/appears in MachO as much as it does on ELF, etc. Propeller-like debug info can be generated with There's some differences in linker behavior - lld ICF tombstones all other copies of a function/block, resolving only one copy of the references. Whereas the So here's an example that shows functions with some parts that overlap and some that don't:
|
@dwblaikie - thanks for the detailed example. I can confirm that with the example provided by you we generate the same verification error. I think we should also consider perfectly overlapping subranges as not emitting a verification warning and add a test for this scenario. Does this sound OK or did you have something else in mind ? |
Sounds good to me, thanks! |
I ran into this myself a while ago, and I remember looking at the DWARF spec and noticing that there's no exception for function merging. I think I came to the same conclusion when I implemented this check originally. I agree that it makes sense to allow this , but (unless I'm wrong and don't remember or didn't look hard enough) we should raise an issue with the DWARF committee (which @dwblaikie, @pogo59 and I are part of) to allow this in the standard. |
I'll send a proposal to dwarf-discuss mailing list. |
In the current DWARF 5 spec I can't find anything saying that overlapping address ranges are not allowed between The only thing I can find is on page 53, line 15 where it says:
I read this as saying that if a Am I missing something - or should the proposal be a request to clarify that identical overlapping address ranges are allowed between |
If you have overlapping subranges between subprograms, then you'll probably also end up with overlapping subranges within a single CU range list (if two functions in a single CU got ICF'd), and across CUs (if two functions across CUs got ICF'd). It seems possible for identical overlapping subranges within a subprogram too (if two basic blocks in a single function are identical/get merged - chances are an optimizer would deduplicate those, but not guaranteed/no need to include that restriction I think - probably could come up at -O0 with basic block sections (bit of a weird feature combo, but possible)) |
Just a small comment from the peanut gallery: I recently ran into some (most likely, obsolete) code in lldb which tries to handle the case where the ranges of a DW_TAG_lexical_block are not included in the ranges of its parent (block, subprogram, etc.). I was looking at the DWARF spec to see if it allows anything like this, but I couldn't find anything to support either interpretation. If you're going to be looking at strengthening the language around range relationships between various DIEs, it might be a good idea to include parent-child relationships as well. :) |
The parent-child aspect would be orthogonal to the OP's complaint, which is about a non-parent-child overlap, so it would want to be a separate proposal. It makes sense to me that the parent's ranges should be a superset of the child ranges, fwiw. |
Indeed the guidelines recommend keeping proposals focused on a single matter:
So looks like the parent-child aspect should be separate. Here is what I am thinking of proposing, feedback is appreciated:
Clarification of Address Range Overlap Rules for Functions and Basic BlocksBackgroundModern compiler optimizations and linker features can result in functions or basic blocks sharing the same address ranges in the final binary. Two common cases are:
The current DWARF 5 specification's statement in Section 2.17.3 (p. 53, line 15) that "Bounded range entries in a range list may not overlap" is overly restrictive and doesn't reflect valid cases where identical code sequences are merged. The specification needs to differentiate between valid identical overlaps and invalid partial overlaps, both within range lists and between different OverviewThis proposal seeks to:
Proposed ChangesIn Section 2.17 (Location Descriptions), replace the text on p. 53, line 15: From:
To:
DependenciesNone References
|
I have sent the proposal to the dwarf-discuss mailing list as indicated by docs. Can we proceed with this change as of now or should we wait for the outcome of the proposal ? |
One side question - this relates to both this change, and the change you folks made in the last few months related to the ICF situation with the line table? (where you wanted to make sure the line table/subprogram were used correctly together, yeah? That came from ICF and the resulting ambiguity between the .debug_info subprogram description and tnhe line table that applies - with the intent to ensure that a matched pair of line table + subprogram DIE tree were used for symbolizing, yeah?) But lld doesn't produce these overlapping ranges, when it ICFs, it resolves only one duplicate to point to the final code, and the rest get zero'd/tombstoned, yeah? So what situation are you in that you are getting these duplicates? Are you using gold (why that rather than lld?)? Some other linker? Is Bolt doing something esoteric? same thing as gold? Might be worth some design discussion about that in general. But, yes, in the presence of gold's behavior, despite it not really being maintained so much anymore (as far as I understand it), we should probably support it... |
This is what happens by default, yes - but the recently added
We're only being blocked by |
Ah, so you're specifically seeking out this behavior, and looking to support it more.
Fair enough. |
@@ -622,7 +627,8 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die, | |||
} | |||
|
|||
// Verify that children don't intersect. | |||
const auto IntersectingChild = ParentRI.insert(RI); | |||
bool AllowDuplicates = Die.getTag() == DW_TAG_subprogram; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why restrict this to subprograms having duplicates? Presumably anything could have duplicates - if we can have subprograms with duplicates (is this duplicates within a subprogram, or between subprograms, or both?) then we can probably have lexical scopes with duplicates too? (& certainly CUs with duplicates - both within a CU and across multiple CUs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wanting to restrict this change to scenarios that I am familiar with / I'm trying to address. I am not sure about the implications for allowing duplicates for anything else. But if is clear that the right choice is to allow full overlaps for all tags - then I can make that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a few questions:
Is this only allowing duplicates on DW_AT_ranges of DW_AT_subprograms? Or only allowing overlap /of/ subprograms /on/ CUs? I guess the latter - so that, sufficiently generalizes, handles the case of subprograms with ranges (basic block sections) that overlap with /other/ subprograms
But presumably that /doesn't/ cover the case where a subprogram's sections might overlap with itself, due to BB sections? (or perhaps it does, if there's no "self" special case - if you're allowing a subprogram to overlap with other subprograms, including itself)
But shouldn't this apply to CUs too? a CU ranges could overlap with another CU due to a subprogram in one CU overlapping with a subprogram in a different CU? (is that tested? does that somehow come out of the subprogram-only-allowed case here?)
But, equally, this should apply to lexical scopes too - should be able to make a scope in a function compiled with basic-block sections have self-duplicate ranges, or collide/duplicate with athore lexical scope's ranges in the same subprogram (by putting the same code in different places in a function, and wrapping it in lexical scopes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only allowing duplicates on DW_AT_ranges of DW_AT_subprograms
So this is only relating to DW_AT_subprogram
's - allowing them to overlap like this:
- Allows
DW_AT_ranges
's to overlap in between differentDW_AT_subprogram
's - Allows identical
DW_AT_low_pc
/DW_AT_high_pc
in betweenDW_AT_subprogram
's
doesn't cover the case where a subprogram's sections might overlap with itself
- No, if multiple
DW_AT_ranges
's in aDW_AT_subprogram
overlap - this will still raise an error. Should we cover this case also ?
But shouldn't this apply to CUs too?
llvm-dwarfdump --error-display=details --verify
does currently not raise any errors for overlapping CU's.
But, equally, this should apply to lexical scopes too
- Same as above,
llvm-dwarfdump --error-display=details --verify
does currently not raise any errors for overlapping lexical blocks.
I tested using this script which generates DWARF dumped here.
The output with this patch shows no errors.
The output without this patch shows the errors for overlapping DW_AT_subprogram
's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, looks like we don't even try to apply any verification constraints to ranges between CUs shrug they should probably have the same constraints, though - that exact overlap of any range is acceptable, but partial overlap is not. But ah well, I'm not pushing for improving that.
Let's look at lexical scope overlap, then...
I think this example demonstrates almost all the overlapping cases (well, doesn't include distinct subprograms overlapping with each other, but otherwise I think it covers things):
int f1(bool a, bool b) {
{
int i = 3;
{
int j = 7;
if (a)
return 5;
}
{
int j = 8;
if (b)
return 5;
}
}
return 3;
}
int main() {
}
clang++-tot test.cpp -g -ffunction-sections -fbasic-block-sections=all -fuse-ld=gold -Wl,--icf=all
error: DIE has overlapping ranges in DW_AT_ranges attribute: [0x00000000000006a2, 0x00000000000006ae) and [0x00000000000006a2, 0x00000000000006ae)
0x0000000c: DW_TAG_compile_unit [1] *
DW_AT_producer [DW_FORM_strx1] (indexed (00000000) string = "clang version 20.0.0git ([email protected]:llvm/llvm-project.git 8dd9f206b518a97132f3e2489ccc93704e638353)")
DW_AT_language [DW_FORM_data2] (DW_LANG_C_plus_plus_14)
DW_AT_name [DW_FORM_strx1] (indexed (00000001) string = "test.cpp")
DW_AT_str_offsets_base [DW_FORM_sec_offset] (0x00000008)
DW_AT_stmt_list [DW_FORM_sec_offset] (0x00000000)
DW_AT_comp_dir [DW_FORM_strx1] (indexed (00000002) string = "/usr/local/google/home/blaikie/dev/scratch")
DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
DW_AT_ranges [DW_FORM_rnglistx] (indexed (0x4) rangelist = 0x00000052
[0x0000000000000670, 0x00000000000006a2)
[0x00000000000006a2, 0x00000000000006ae)
[0x00000000000006ae, 0x00000000000006c4)
[0x00000000000006a2, 0x00000000000006ae)
[0x00000000000006c4, 0x00000000000006d0)
[0x00000000000006d0, 0x00000000000006d5)
[0x00000000000006e0, 0x00000000000006e8))
DW_AT_addr_base [DW_FORM_sec_offset] (0x00000008)
DW_AT_rnglists_base [DW_FORM_sec_offset] (0x0000000c)
error: DIE has overlapping ranges in DW_AT_ranges attribute: [0x00000000000006a2, 0x00000000000006ae) and [0x00000000000006a2, 0x00000000000006ae)
0x0000002b: DW_TAG_subprogram [2] * (0x0000000c)
DW_AT_ranges [DW_FORM_rnglistx] (indexed (0x0) rangelist = 0x00000020
[0x0000000000000670, 0x00000000000006a2)
[0x00000000000006a2, 0x00000000000006ae)
[0x00000000000006ae, 0x00000000000006c4)
[0x00000000000006a2, 0x00000000000006ae)
[0x00000000000006c4, 0x00000000000006d0)
[0x00000000000006d0, 0x00000000000006d5))
DW_AT_frame_base [DW_FORM_exprloc] (DW_OP_reg6)
DW_AT_linkage_name [DW_FORM_strx1] (indexed (00000003) string = "_Z2f1bb")
DW_AT_name [DW_FORM_strx1] (indexed (00000004) string = "f1")
DW_AT_decl_file [DW_FORM_data1] ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
DW_AT_decl_line [DW_FORM_data1] (1)
DW_AT_type [DW_FORM_ref4] (cu + 0x0087 => {0x00000087} "int")
DW_AT_external [DW_FORM_flag_present] (true)
error: DIE has overlapping ranges in DW_AT_ranges attribute: [0x00000000000006a2, 0x00000000000006ae) and [0x00000000000006a2, 0x00000000000006ae)
0x0000004d: DW_TAG_lexical_block [4] * (0x0000002b)
DW_AT_ranges [DW_FORM_rnglistx] (indexed (0x1) rangelist = 0x00000033
[0x0000000000000685, 0x00000000000006a2)
[0x00000000000006a2, 0x00000000000006ae)
[0x00000000000006ae, 0x00000000000006c4)
[0x00000000000006a2, 0x00000000000006ae))
error: DIEs have overlapping address ranges:
0x00000068: DW_TAG_lexical_block [4] * (0x0000004d)
DW_AT_ranges [DW_FORM_rnglistx] (indexed (0x3) rangelist = 0x0000004b
[0x00000000000006ae, 0x00000000000006c4)
[0x00000000000006a2, 0x00000000000006ae))
0x0000005a: DW_TAG_lexical_block [4] * (0x0000004d)
DW_AT_ranges [DW_FORM_rnglistx] (indexed (0x2) rangelist = 0x00000042
[0x000000000000068c, 0x00000000000006a2)
[0x00000000000006a2, 0x00000000000006ae))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the example - the latest version addresses this.
@@ -73,7 +73,8 @@ class DWARFVerifier { | |||
/// This is used for finding overlapping ranges in the DW_AT_ranges | |||
/// attribute of a DIE. It is also used as a set of address ranges that | |||
/// children address ranges must all be contained in. | |||
std::optional<DWARFAddressRange> insert(const DWARFAddressRange &R); | |||
std::optional<DWARFAddressRange> insert(const DWARFAddressRange &R, | |||
bool AllowDuplicates = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is any caller passing false, or using the default false value here? It looks like there's only two callers and they're both now passing true, so we should just change the behavior categorically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - that simplifies the change quite a bit.
This seems to break tests on macOS: http://45.33.8.238/macm1/97720/step_10.txt (looks like this uses Please take a look, and revert for now if it takes a while to fix. |
Will look into and follow up with next steps in ~2 hours. |
We are seeing the test failure on 'LLVM :: tools/llvm-dwarfdump/X86/verify_no_overlap_error_icf.yaml' in our mac-x64 builder as well. Error message:
Linked to failed build: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-host-mac-x64/b8728276402386409841/overview |
Another broken mac build bot: https://lab.llvm.org/buildbot/#/builders/23/builds/5911 |
OK, will disable the test for now. |
Disabling via: #120322 |
Actual fix: #120330 |
…vm#117952) This patch modifies the DWARF verifier to handle a valid case where two or more functions have identical address ranges due to being merged by ICF (Identical Code Folding). Previously, the verifier would incorrectly report these as errors, but functions merged via ICF (such as when using LLD's --keep-icf-stabs option) can legitimately share the same address range. A new test case has been added to verify this behavior using YAML-based DWARF data that simulates two DW_TAG_subprogram entries with identical address ranges. The test ensures that the verifier correctly identifies this as a valid case and doesn't emit any errors, while still maintaining the existing verification for truly invalid overlapping ranges in other scenarios. Before this change, the newly added test case would have failed, with `llvm-dwarfdump` marking the overlapping address ranges in the DWARF as an error. We also modify the existing tests `llvm-dwarfutil/ELF/X86/verify.test` and `llvm/test/tools/llvm-dwarfdump/X86/verify_parent_zero_length.yaml` which rely on the existence of the error that we're trying to suppress. We slightly change one offset so that the ranges don't perfectly overlap and an error is still generated. (cherry picked from commit ad32576)
…32576cffc8-bb4007e56274-1808255a44e6 [🍒 stable/20240723] [DWARFVerifier] Allow overlapping ranges for ICF-merged functions (llvm#117952)
This patch modifies the DWARF verifier to handle a valid case where two or more functions have identical address ranges due to being merged by ICF (Identical Code Folding). Previously, the verifier would incorrectly report these as errors, but functions merged via ICF (such as when using LLD's --keep-icf-stabs option) can legitimately share the same address range.
A new test case has been added to verify this behavior using YAML-based DWARF data that simulates two DW_TAG_subprogram entries with identical address ranges. The test ensures that the verifier correctly identifies this as a valid case and doesn't emit any errors, while still maintaining the existing verification for truly invalid overlapping ranges in other scenarios. Before this change, the newly added test case would have failed, with
llvm-dwarfdump
marking the overlapping address ranges in the DWARF as an error.We also modify the existing test
llvm-dwarfutil/ELF/X86/verify.test
which relies on the existence of the error that we're trying to suppress. Here we slightly change one offset so that the ranges don't perfectly overlap and an error is still generated.