Merge branch 'master' into '6.0/stage' #18

github-actions · 2020-08-05T03:28:06Z

No description provided.

Every few builds or so, a vmtest VM crashes after printing "x86: Booting SMP configuration:". After some difficult debugging, I determined that the crash happens in arch/x86/realmode/rm/trampoline_64.S (the code that initializes secondary CPUs) at the ljmp from startup_32 to startup_64. The real problem happens earlier in startup_32:

    movl $pa_trampoline_pgd, %eax
    movl %eax, %cr3

Sometimes, the store to CR3 "fails" and CR3 remains zero, which causes the later ljmp to triple fault. This can be reproduced by the following script:

    #!/bin/sh

    curl -L 'https://www.dropbox.com/sh/2mcf2xvg319qdaw/AABFKsISWRpndNZ1gz60O-qSa/x86_64/vmlinuz-5.8.0-rc7-vmtest1?dl=1' -o vmlinuz

    cat > commands.gdb << "EOF"
    set confirm off
    target remote :1234

    # arch/x86/realmode/rm/trampoline_64.S:startup_32 after CR3 store.
    hbreak *0x9ae09 if $cr3 == 0
    command
    info registers eax cr3
    quit 1
    end

    # kernel/smp.c:smp_init() after all CPUs have been brought up. If we get here,
    # the bug wasn't triggered.
    hbreak *0xffffffff81ed4484
    command
    kill
    quit 0
    end

    continue
    EOF

    while true; do
            qemu-system-x86_64 -cpu host -enable-kvm -smp 64 -m 128M \
                    -nodefaults -display none -serial file:/dev/stdout -no-reboot \
                    -kernel vmlinuz -append 'console=0,115200 panic=-1 nokaslr' \
                    -s -S &
            gdb -batch -x commands.gdb || exit 1
    done

This seems to be a problem with nested virtualization that was fixed by Linux kernel commit b4d185175bc1 ("KVM: VMX: give unrestricted guest full control of CR3") (in v4.17). Apparently, the Google Cloud hosts that Travis runs on are missing this fix. We obviously can't patch those hosts, but we can work around it. Disabling unrestricted guest support in the Travis VM causes CR3 stores in the nested vmtest VM to be emulated, bypassing the bug. Signed-off-by: Omar Sandoval <[email protected]>
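Whether the workaround is in effect can be checked from inside the outer (Travis) VM. The sketch below is an illustration under stated assumptions, not part of the commit: it reads the kvm_intel module parameter normally exposed at /sys/module/kvm_intel/parameters/unrestricted_guest (which reads "Y" or "N"), and the helper name unrestricted_guest_enabled() is hypothetical.

```python
from pathlib import Path

# Assumed location of the kvm_intel "unrestricted_guest" module parameter.
UNRESTRICTED_GUEST_PARAM = Path("/sys/module/kvm_intel/parameters/unrestricted_guest")


def unrestricted_guest_enabled(param: Path = UNRESTRICTED_GUEST_PARAM) -> bool:
    """Hypothetical helper: report whether unrestricted guest mode is on.

    kvm_intel prints boolean parameters as "Y" or "N". If the module is not
    loaded (for example, on an AMD host), the file does not exist and this
    returns False.
    """
    try:
        value = param.read_text().strip()
    except FileNotFoundError:
        return False
    return value == "Y"
```

If it returns True, one way to turn the feature off is `rmmod kvm_intel` followed by `modprobe kvm_intel unrestricted_guest=0`; the commit message does not say which mechanism the Travis configuration uses.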

Signed-off-by: Omar Sandoval <[email protected]>

The __init_subclass__ and __class_getitem__ methods are always class methods even if not decorated as such, so format them accordingly. Signed-off-by: Omar Sandoval <[email protected]>
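A minimal Python sketch of the behavior this commit accounts for: neither method below is decorated with @classmethod, yet Python passes the class as the first argument. The helper is_classmethod() and the IMPLICIT_CLASSMETHODS set are hypothetical illustrations of how a documentation generator could choose a classmethod directive; they are not drgndoc's code.

```python
from typing import List

# Dunder methods that Python implicitly treats as class methods.
IMPLICIT_CLASSMETHODS = frozenset({"__init_subclass__", "__class_getitem__"})


def is_classmethod(name: str, decorators: List[str]) -> bool:
    """Hypothetical check a doc generator could use to pick a directive."""
    return "classmethod" in decorators or name in IMPLICIT_CLASSMETHODS


class Base:
    registry: List[type] = []

    # No @classmethod decorator, yet Python passes the new subclass as `cls`.
    def __init_subclass__(cls, **kwargs: object) -> None:
        super().__init_subclass__(**kwargs)
        Base.registry.append(cls)

    # Also implicitly a class method; this enables Base[...] subscription.
    def __class_getitem__(cls, item: object) -> str:
        return f"{cls.__name__}[{item!r}]"


class Child(Base):
    pass
```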

Lots of interfaces in drgn transparently turn an integer Object into an int by using __index__(), so add an IntegerLike protocol for this and use it everywhere applicable. Signed-off-by: Omar Sandoval <[email protected]>
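A sketch of what such a protocol can look like, assuming Python 3.8+ for typing.Protocol. FakeObject stands in for an integer drgn Object, and page_frame_number() (with an assumed 4 KiB page size) is a hypothetical helper showing how operator.index() accepts both plain ints and objects implementing __index__() while rejecting floats.

```python
import operator
from typing import Protocol

PAGE_SHIFT = 12  # assumption: 4 KiB pages, for illustration only


class IntegerLike(Protocol):
    """Any value that can be converted losslessly to int via __index__()."""

    def __index__(self) -> int: ...


class FakeObject:
    """Hypothetical stand-in for an integer drgn Object."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __index__(self) -> int:
        return self.value


def page_frame_number(address: IntegerLike) -> int:
    """Hypothetical helper that accepts an int or anything with __index__()."""
    # operator.index() calls __index__() and raises TypeError for floats, str, etc.
    return operator.index(address) >> PAGE_SHIFT
```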

Rather than duplicating Union[str, bytes, os.PathLike] everywhere, add an alias. Also make it explicitly os.PathLike[str] or os.PathLike[bytes] to get rid of some mypy --strict errors. Signed-off-by: Omar Sandoval <[email protected]>
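A minimal sketch of such an alias, using the name Path from the commit title; path_to_str() is a hypothetical consumer that relies on os.fsdecode() to normalize every accepted form to str.

```python
import os
from typing import Union

# One alias instead of repeating the Union everywhere. The os.PathLike
# parameters are quoted so the alias also evaluates on Python versions where
# os.PathLike is not subscriptable at runtime.
Path = Union[str, bytes, "os.PathLike[str]", "os.PathLike[bytes]"]


def path_to_str(path: Path) -> str:
    """Hypothetical helper: normalize any accepted path type to str."""
    # os.fsdecode() accepts str, bytes, and os.PathLike objects.
    return os.fsdecode(path)
```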

Signed-off-by: Omar Sandoval <[email protected]>

This will be used to support relative imports. Signed-off-by: Omar Sandoval <[email protected]>

Mainly for completeness, as I don't really like using them in my own projects. Signed-off-by: Omar Sandoval <[email protected]>
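As an illustrative sketch (not drgndoc's implementation), resolving a relative `from ... import` needs the importing module's name, whether that module is a package, and the `level` that ast records for the leading dots; importlib.util.resolve_name() handles the rest. The helper names and the module names in the test are hypothetical.

```python
import ast
import importlib.util
from typing import Optional


def resolve_import_from(
    current_module: str, is_package: bool, module: Optional[str], level: int
) -> str:
    """Return the absolute module name that a `from ... import` refers to."""
    if level == 0:
        if module is None:
            raise ValueError("absolute import without a module name")
        return module
    # Relative imports resolve against the containing package; a package's
    # __init__ module is its own containing package.
    package = current_module if is_package else current_module.rpartition(".")[0]
    return importlib.util.resolve_name("." * level + (module or ""), package)


def resolve_statement(current_module: str, is_package: bool, source: str) -> str:
    """Parse a single `from ... import` statement and resolve its module."""
    node = ast.parse(source).body[0]
    if not isinstance(node, ast.ImportFrom):
        raise ValueError("expected a from-import statement")
    return resolve_import_from(current_module, is_package, node.module, node.level)
```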

The helpers implemented in C have Python wrappers only for the purpose of documentation. This is because drgndoc ignores all imports when recursively documenting attributes. However, mypy uses the convention that aliased imports (i.e., import ... as ... or from ... import ... as ...) are considered re-exported, so we can follow that convention and include aliased imports. (mypy also considers attributes in __all__ to be re-exported, so we should probably follow that in the future, too, but for now aliased imports are enough.) This lets us get rid of the Python wrappers. Signed-off-by: Omar Sandoval <[email protected]>
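A minimal sketch of the aliased-import convention, using only the ast module: plain imports are skipped, and aliased ones are recorded as re-exports. The function reexported_names() and the module _helper_impl in the test are hypothetical.

```python
import ast
from typing import Dict


def reexported_names(source: str) -> Dict[str, str]:
    """Map each re-exported name in a module to what it was imported from.

    Only aliased imports (`import a as b`, `from a import b as c`) count as
    re-exports; plain imports are treated as private and skipped.
    """
    result: Dict[str, str] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname is not None:
                    result[alias.asname] = alias.name
        elif isinstance(node, ast.ImportFrom):
            # Keep leading dots so relative imports remain recognizable.
            base = "." * node.level + (node.module or "")
            sep = "." if node.module else ""
            for alias in node.names:
                if alias.asname is not None:
                    result[alias.asname] = base + sep + alias.name
    return result
```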

We can get rid of the :include: and :exclude: options by deciding solely based on whether a node has a docstring. Empty docstrings can be used to indicate nodes that should be included with no additional content. The __init__() method must now also have a docstring in order to be documented. Additionally, the directives are now fully formatted by the Formatter rather than being split between the Formatter and DrgnDocDirective. Signed-off-by: Omar Sandoval <[email protected]>
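The rule is simple enough to sketch with ast: ast.get_docstring() returns None when there is no docstring and "" for an empty one, so an `is not None` check captures "include, with no additional content". documented_names() is a hypothetical helper, and it only descends into classes that are themselves documented (an assumption).

```python
import ast
from typing import List

_DOCUMENTABLE = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def documented_names(source: str) -> List[str]:
    """List dotted names of functions, classes, and methods to document.

    A node is included if and only if it has a docstring; an empty docstring
    still counts. This applies to __init__() like any other method.
    """
    names: List[str] = []

    def visit(body: List[ast.stmt], prefix: str) -> None:
        for node in body:
            if isinstance(node, _DOCUMENTABLE) and ast.get_docstring(node) is not None:
                names.append(prefix + node.name)
                if isinstance(node, ast.ClassDef):
                    visit(node.body, prefix + node.name + ".")

    visit(ast.parse(source).body, "")
    return names
```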

One of the blockers for adding type annotations to helpers is that some helpers need to be overloaded, but drgndoc doesn't support that. This adds support. Each function now tracks all of its overloaded signatures, each of which may be documented separately. The formatted output (for functions/methods and classes with __init__()) combines all of the documented overloads. Signed-off-by: Omar Sandoval <[email protected]>

Now that drgndoc can handle overloads and we have the IntegerLike and Path aliases, we can add type annotations to all helpers. There are also a couple of functional changes that snuck in here to make annotating easier. Signed-off-by: Omar Sandoval <[email protected]>

The remaining warnings are all no-any-return, which is hard to avoid in drgn. Signed-off-by: Omar Sandoval <[email protected]>

Signed-off-by: Omar Sandoval <[email protected]>

As of QEMU commit a5804fcf7b22 ("9pfs: local: ignore O_NOATIME if we don't have permissions") (in v5.1.0), QEMU handles O_NOATIME sanely, so we don't need the LD_PRELOAD hack. Since we're adding a version check, make the multidevs check based on the version, too. Signed-off-by: Omar Sandoval <[email protected]>

This picks up a newer version of QEMU and lets us use udevadm trigger -w. Let's also explicitly add "os: linux" to silence the config validation. Signed-off-by: Omar Sandoval <[email protected]>

drgn_object_init() is available in the drgn.h header and seems to be a required call before calling drgn_program_find_object(). Without this, trying to call drgn_object_init() from an external C application results in an undefined reference error. Signed-off-by: Aditya Sarwade <[email protected]>

Signed-off-by: Omar Sandoval <[email protected]>

It's still useful to have an escape hatch for names we don't want documented. Signed-off-by: Omar Sandoval <[email protected]>

E.g., drgndoc:: foo.bar() should emit py:method:: foo.bar() regardless of a previous py:module directive. Signed-off-by: Omar Sandoval <[email protected]>

Program::objects is used to store references to objects that must stay alive while the Program is alive. It is currently a PyDict where the keys are the object addresses as PyLong and the values are the objects themselves. This has two problems:

1. Allocating the key as a full object is obviously wasteful.
2. PyDict doesn't have an API for reserving capacity ahead of time, which we want for an upcoming change.

Both of these are easily fixed by using our own hash table. Signed-off-by: Omar Sandoval <[email protected]>

…_bit}() Most places that call these check has_platform and return an error, and those that don't can live with the extra check. Signed-off-by: Omar Sandoval <[email protected]>

struct drgn_program has a bunch of state scattered around. Group it together more logically, even if it means sacrificing some padding here and there. Signed-off-by: Omar Sandoval <[email protected]>

This is preparation for associating types with a program. Signed-off-by: Omar Sandoval <[email protected]>

…eader If we fail to read an include directory in read_file_name_table(), we need to free the directory hashes. Signed-off-by: Omar Sandoval <[email protected]>

This is clearer: although these flags happen to be encoded with the DWARF tag, they are flags regarding the DIE. Signed-off-by: Omar Sandoval <[email protected]>

The CU unit length and DIE offset are both limited by the size of the mapped debugging information, i.e., size_t. Signed-off-by: Omar Sandoval <[email protected]>

As a small simplification, we can take commit 9bb2cce ("Enable DWARF indexing to work with partial units") further and not look at the tag of the top-level DIE at all. Signed-off-by: Omar Sandoval <[email protected]>

We currently assume that if DW_AT_declaration is present, it is true. This seems to be true in practice, and I see no reason to ever use DW_FORM_flag with a value of zero. There's no performance hit to handle it, though, so we might as well. Signed-off-by: Omar Sandoval <[email protected]>
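Handling both encodings is cheap. In the sketch below (a Python model, not libdrgn's C code), DW_FORM_flag_present carries no data and always means true, while DW_FORM_flag carries one byte that may be zero. The form codes 0x0c and 0x19 are the values from the DWARF specification; read_flag() is a hypothetical name.

```python
from typing import Tuple

# Form codes from the DWARF specification (DW_FORM_flag_present is DWARF 4+).
DW_FORM_flag = 0x0C          # one byte of data; zero means false
DW_FORM_flag_present = 0x19  # no data; the attribute's presence means true


def read_flag(form: int, data: bytes, offset: int) -> Tuple[bool, int]:
    """Decode a flag attribute such as DW_AT_declaration.

    Returns (value, offset just past the attribute's data).
    """
    if form == DW_FORM_flag_present:
        return True, offset
    if form == DW_FORM_flag:
        if offset >= len(data):
            raise ValueError("DW_FORM_flag is out of bounds")
        return data[offset] != 0, offset + 1
    raise ValueError(f"invalid form 0x{form:x} for flag attribute")
```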

In read_cus(), the master thread can use the final CUs vector directly and the rest of the threads can merge their private vectors in. This consistently shaves a few milliseconds off of startup. Signed-off-by: Omar Sandoval <[email protected]>

I originally copied the sections into each compilation unit to avoid a pointer indirection, but performance-wise it's a wash, so we might as well save the memory. This will be more important when we keep the CUs after indexing. Signed-off-by: Omar Sandoval <[email protected]>

It's very unlikely that we'll ever index more than 4 billion DIEs in a single shard, so we can shrink the index a bit by using uint32_t indices (and uint8_t tag). Signed-off-by: Omar Sandoval <[email protected]>

This is preparation for the next change where we'll need to do two passes over the CUs. Signed-off-by: Omar Sandoval <[email protected]>

We currently handle DIEs with a DW_AT_specification attribute by parsing the corresponding declaration to get the name and inserting the DIE as usual. This has a couple of problems:

1. It only works if DW_AT_specification refers to the same compilation unit, which is true for DW_FORM_ref{1,2,4,8,_udata}, but not DW_FORM_ref_addr. As a result, drgn doesn't support the latter.
2. It assumes that the DIE with DW_AT_specification is in the correct "scope". Unfortunately, this is not true for g++: for a variable definition in a C++ namespace, it generates a DIE with DW_AT_declaration as a child of the DW_TAG_namespace DIE and a DIE which refers to the declaration with DW_AT_specification _outside_ of the DW_TAG_namespace as a child of the DW_TAG_compilation_unit DIE.

Supporting both of these cases requires reworking how we handle DW_AT_specification. This commit takes the approach of parsing the DWARF data in two passes: the first pass reads the abbreviation and file name tables and builds a map of instances of DW_AT_specification; the second pass indexes DIEs as before, but ignores DIEs with DW_AT_specification and handles DIEs with DW_AT_declaration by looking them up in the map built by the first pass.

This approach is a 10-20% regression in indexing time in the benchmarks I ran. Thankfully, it is not 100% slower, for a couple of reasons. The first is that the two passes are simpler than the original combined pass. The second is that a decent part of the indexing time is spent faulting in the mapped debugging information, which only needs to happen once (even if the file is cached, minor page faults add non-negligible overhead).

This doesn't handle DW_AT_specification "chains" yet, but neither did the original code. If it is necessary, it shouldn't be too difficult to add. Signed-off-by: Omar Sandoval <[email protected]>
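The two-pass scheme can be modeled in a few lines of Python. The Die dataclass is a hypothetical stand-in for parsed DIEs (real code reads them from .debug_info), namespaces are modeled as a string prefix, and declarations without a matching definition are simply skipped, which is an assumption. Because the map is keyed by global offsets, the definition may sit in a different CU, which is what the follow-up DW_FORM_ref_addr support relies on.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Die:
    offset: int                          # offset in .debug_info (global)
    name: Optional[str] = None
    scope: str = ""                      # enclosing namespace, e.g. "ns::"
    declaration: bool = False            # has DW_AT_declaration
    specification: Optional[int] = None  # target of DW_AT_specification


def index_dies(dies: List[Die]) -> Dict[str, int]:
    """Map qualified names to the offset of the defining DIE."""
    # Pass 1: remember which DIE completes each declaration. Offsets are
    # global, so the definition may live in a different CU.
    specifications: Dict[int, int] = {}
    for die in dies:
        if die.specification is not None:
            specifications[die.specification] = die.offset

    # Pass 2: skip DIEs with DW_AT_specification; for declarations, take the
    # name and scope from the declaration but point at the definition.
    index: Dict[str, int] = {}
    for die in dies:
        if die.specification is not None or die.name is None:
            continue
        if die.declaration:
            definition = specifications.get(die.offset)
            if definition is not None:
                index[die.scope + die.name] = definition
        else:
            index[die.scope + die.name] = die.offset
    return index
```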

Now that we can handle a DW_AT_specification that references another compilation unit, add support for DW_FORM_ref_addr. Signed-off-by: Omar Sandoval <[email protected]>

There are a couple of related ways that we can cause undefined behavior when parsing a malformed DWARF or depmod index file:

1. There are several places where we increment the cursor to skip past some data. It is undefined behavior if the result points out of bounds of the data, even if we don't attempt to dereference it.
2. read_in_bounds() checks that ptr <= end. This pointer comparison is only defined if ptr and end both point to elements of the same array object or one past the last element. If ptr has gone past end, then this comparison is likely undefined anyway.

Fix it by adding a helper to skip past data with bounds checking. Then, all of the helpers can assume that ptr <= end and maintain that invariant. While we're here and auditing all of the call sites, let's clean up the API and rename it from read_foo() to the less generic mread_foo(). Signed-off-by: Omar Sandoval <[email protected]>
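The invariant can be illustrated with a small cursor model. Python has no undefined behavior here, so this hypothetical Cursor class only demonstrates the ordering of the checks (compare against the remaining length first, then advance); it is not libdrgn's mread API.

```python
from typing import Optional


class Cursor:
    """Hypothetical bounds-checked reader over a byte buffer.

    Invariant: 0 <= ptr <= end at all times. Each method compares the
    requested size against the remaining bytes (end - ptr) before moving
    ptr. In C this ordering matters: computing ptr + n first could create an
    out-of-bounds pointer, which is undefined behavior even if it is never
    dereferenced.
    """

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.ptr = 0
        self.end = len(data)

    def skip(self, n: int) -> bool:
        """Advance by n bytes; on failure, leave ptr unchanged."""
        if n < 0 or n > self.end - self.ptr:
            return False
        self.ptr += n
        return True

    def read_u8(self) -> Optional[int]:
        """Read one byte, or return None if the buffer is exhausted."""
        if self.end - self.ptr < 1:
            return None
        value = self.data[self.ptr]
        self.ptr += 1
        return value

    def skip_string(self) -> bool:
        """Skip a NUL-terminated string, including the terminator."""
        nul = self.data.find(b"\0", self.ptr, self.end)
        if nul == -1:
            return False
        self.ptr = nul + 1
        return True
```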

This is needed for a future change where we'll want to save an error and return it multiple times. Signed-off-by: Omar Sandoval <[email protected]>

…iterator_next() For namespace support, we will want to access the struct drgn_dwarf_index_die for namespaces instead of the Dwarf_Die. Split drgn_dwarf_index_get_die() out of drgn_dwarf_index_iterator_next(). Signed-off-by: Omar Sandoval <[email protected]>

In order to index namespaces lazily, we need the CU structures. Rename struct compilation_unit to the less generic struct drgn_dwarf_index_cu and keep the CUs in a vector in the dindex. Signed-off-by: Jay Kamat <[email protected]>

DWARF represents namespaces with DW_TAG_namespace DIEs. Add these to the DWARF index, with each namespace being its own sub-index. We only index the namespace itself when it is first accessed, which should help with startup time and simplifies tracking. Signed-off-by: Jay Kamat <[email protected]>
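A minimal sketch of lazy per-namespace indexing, under the assumption that a loader callback can enumerate the DIEs directly inside a given namespace. NamespaceIndex and the loader are hypothetical; the point is that looking up a child namespace is cheap and its contents are indexed only on the first find().

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical loader: given a scope such as "" or "std::", return the
# (name, DIE offset) pairs declared directly in that scope.
Loader = Callable[[str], List[Tuple[str, int]]]


class NamespaceIndex:
    """Hypothetical model of a namespace sub-index built on first access."""

    def __init__(self, load: Loader, scope: str = "") -> None:
        self._load = load
        self.scope = scope
        self._entries: Optional[Dict[str, int]] = None
        self._children: Dict[str, "NamespaceIndex"] = {}

    def namespace(self, name: str) -> "NamespaceIndex":
        # Creating a child is cheap; its DIEs are not indexed until used.
        child = self._children.get(name)
        if child is None:
            child = NamespaceIndex(self._load, self.scope + name + "::")
            self._children[name] = child
        return child

    def find(self, name: str) -> Optional[int]:
        if self._entries is None:
            # First access: index this namespace once and cache the result.
            self._entries = dict(self._load(self.scope))
        return self._entries.get(name)
```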

The current name is too verbose. Let's go with a shorter, more generic name. Signed-off-by: Omar Sandoval <[email protected]>

Debugging information tracking is currently in two places: drgn_program finds debugging information, and drgn_dwarf_index stores it. Both of these responsibilities make more sense as part of drgn_debug_info, so let's move them there. This prepares us to track extra debugging information that isn't pertinent to indexing.

This also reworks a couple of details of loading debugging information:

- drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into a single structure, drgn_debug_info_module.
- The first pass of DWARF indexing now happens in parallel with reading compilation units (by using OpenMP tasks).

Signed-off-by: Omar Sandoval <[email protected]>

If we create a pending CU for a namespace and then add more CUs to the index, the vector of CUs might get reallocated, leaving the pending CU with a dangling pointer and resulting in a use after free. Fix it by storing the index of the CU instead of the pointer. Signed-off-by: Omar Sandoval <[email protected]>

Signed-off-by: Omar Sandoval <[email protected]>

The elfutils header files should be treated as if they were in the standard location, so use -isystem instead of -I. Signed-off-by: Omar Sandoval <[email protected]>

I recently hit a couple of CI failures caused by relying on transitive includes that weren't always present. include-what-you-use is a Clang-based tool that helps with this. It's a bit finicky and noisy, so this adds scripts/iwyu.py to make running it more convenient (but not reliable enough to automate it in Travis). This cleans up all reasonable include-what-you-use warnings and reorganizes a few header files. Signed-off-by: Omar Sandoval <[email protected]>

The fix was backported to QEMU's 5.0 stable branch and released in 5.0.1. Signed-off-by: Omar Sandoval <[email protected]>
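Taken together with the earlier QEMU 5.1.0 change, the check reduces to a single version comparison. This is a hypothetical sketch: the "QEMU emulator version X.Y.Z" output format is assumed, development and release-candidate version numbers are ignored, and the function names are illustrative rather than vmtest's actual code.

```python
import re
from typing import Tuple

Version = Tuple[int, int, int]


def parse_qemu_version(version_output: str) -> Version:
    """Parse output like "QEMU emulator version 5.0.1 (...)" (format assumed)."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"could not parse QEMU version: {version_output!r}")
    major, minor, micro = (int(group) for group in match.groups())
    return major, minor, micro


def needs_onoatimehack(version: Version) -> bool:
    # The 9pfs O_NOATIME fix landed in 5.1.0 and was backported to 5.0.1.
    # Assuming later 5.0.x releases keep it, every release >= 5.0.1 has it.
    return version < (5, 0, 1)
```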

osandov and others added 30 commits August 4, 2020 16:36

Merge branch 'refs/heads/upstream-HEAD' into repo-HEAD

81c30c5

drgndoc: format None, True, and False as roles

b8aa2dc

Signed-off-by: Omar Sandoval <[email protected]>

drgndoc: handle implicit classmethods

2345325

The __init_subclass__ and __class_getitem__ methods are always class methods even if not decorated as such, so format them accordingly. Signed-off-by: Omar Sandoval <[email protected]>

Merge branch 'refs/heads/upstream-HEAD' into repo-HEAD

09c4893

Add IntegerLike type annotation

66c5cc8

Lots of interfaces in drgn transparently turn an integer Object into an int by using __index__(), so add an IntegerLike protocol for this and use it everywhere applicable. Signed-off-by: Omar Sandoval <[email protected]>

Add Path type alias

2d49ef6

Rather than duplicating Union[str, bytes, os.PathLike] everywhere, add an alias. Also make it explicitly os.PathLike[str] or os.PathLike[bytes] to get rid of some mypy --strict errors. Signed-off-by: Omar Sandoval <[email protected]>

drgndoc: fix mypy --strict errors

4a3b8fb

Signed-off-by: Omar Sandoval <[email protected]>

drgndoc: save all modules and classes traversed to resolve name

a270525

This will be used to support relative imports. Signed-off-by: Omar Sandoval <[email protected]>

drgndoc: support relative imports

192d35c

Mainly for completeness, as I don't really like using them in my own projects. Signed-off-by: Omar Sandoval <[email protected]>

Fix mypy --strict warnings

8c7c80e

The remaining warnings are all no-any-return, which is hard to avoid in drgn. Signed-off-by: Omar Sandoval <[email protected]>

Format imports with isort

4e770fb

Signed-off-by: Omar Sandoval <[email protected]>

travis: check Python code with black, isort, and mypy

656d85f

Signed-off-by: Omar Sandoval <[email protected]>

travis: upgrade to Ubuntu 20.04

903a44d

This picks up a newer version of QEMU and lets us use udevadm trigger -w. Let's also explicitly add "os: linux" to silence the config validation. Signed-off-by: Omar Sandoval <[email protected]>

Merge branch 'refs/heads/upstream-HEAD' into repo-HEAD

c2ceec4

Merge branch 'refs/heads/upstream-HEAD' into repo-HEAD

62d92aa

scripts: add Python include header path to cscope

d40526d

Signed-off-by: Omar Sandoval <[email protected]>

Merge branch 'refs/heads/upstream-HEAD' into repo-HEAD

d0d7d6c

drgndoc: bring back :exclude:

93e3351

It's still useful to have an escape hatch for names we don't want documented. Signed-off-by: Omar Sandoval <[email protected]>

drgndoc: directly use name passed as argument to drgndoc directive

b0f9403

E.g., drgndoc:: foo.bar() should emit py:method:: foo.bar() regardless of a previous py:module directive. Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: return error from drgn_program_{is_little_endian,bswap,is_64…

d4e0771

…_bit}() Most places that call these check has_platform and return an error, and those that don't can live with the extra check. Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: rearrange struct drgn_program members

1c8181e

struct drgn_program has a bunch of state scattered around. Group it together more logically, even if it means sacrificing some padding here and there. Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: fold drgn_type_index into drgn_program

c31208f

This is preparation for associating types with a program. Signed-off-by: Omar Sandoval <[email protected]>

Merge branch 'refs/heads/upstream-HEAD' into repo-HEAD

b0a7a7d

sdimitro approved these changes Aug 31, 2020


osandov and others added 25 commits September 2, 2020 17:13

libdrgn: dwarf_index: fix leak when parsing bad line number program h…

85c4b36

…eader If we fail to read an include directory in read_file_name_table(), we need to free the directory hashes. Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: dwarf_index: rename TAG_FLAG_* and TAG_MASK to DIE_FLAG_*

2252bef

This is clearer: although these flags happen to be encoded with the DWARF tag, they are flags regarding the DIE. Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: dwarf_index: use size_t instead of uint64_t where appropriate

c8f84c5

The CU unit length and DIE offset are both limited by the size of the mapped debugging information, i.e., size_t. Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: dwarf_index: don't worry about tag of CU DIE

ea9f3f3

As a small simplification, we can take commit 9bb2cce ("Enable DWARF indexing to work with partial units") further and not look at the tag of the top-level DIE at all. Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: dwarf_index: store DIE indices as uint32_t

0b4ab17

It's very unlikely that we'll ever index more than 4 billion DIEs in a single shard, so we can shrink the index a bit by using uint32_t indices (and uint8_t tag). Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: dwarf_index: store abbreviation and file name tables in CU

5079776

This is preparation for the next change where we'll need to do two passes over the CUs. Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: dwarf_index: handle DW_AT_specification with DW_FORM_ref_addr

c053c2b

Now that we can handle a DW_AT_specification that references another compilation unit, add support for DW_FORM_ref_addr. Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: add drgn_error_copy()

d512964

This is needed for a future change where we'll want to save an error and return it multiple times. Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: dwarf_index: keep CUs after indexing

a51abfc

In order to index namespaces lazily, we need the CU structures. Rename struct compilation_unit to the less generic struct drgn_dwarf_index_cu and keep the CUs in a vector in the dindex. Signed-off-by: Jay Kamat <[email protected]>

Merge branch 'refs/heads/upstream-HEAD' into repo-HEAD

d875729

libdrgn: rename drgn_dwarf_info_cache to drgn_debug_info

3ac9ae3

The current name is too verbose. Let's go with a shorter, more generic name. Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: dwarf_index: free namespaces when rolling back

89b5da2

Signed-off-by: Omar Sandoval <[email protected]>

libdrgn: use -isystem for elfutils headers

fdbe336

The elfutils header files should be treated as if they were in the standard location, so use -isystem instead of -I. Signed-off-by: Omar Sandoval <[email protected]>

vmtest: also disable onoatimehack on QEMU 5.0.1

d829401

The fix was backported to QEMU's 5.0 stable branch and released in 5.0.1. Signed-off-by: Omar Sandoval <[email protected]>

Merge branch 'refs/heads/upstream-HEAD' into repo-HEAD

1af30a3

sdimitro requested a review from prakashsurya September 24, 2020 14:41

prakashsurya approved these changes Sep 24, 2020


prakashsurya merged commit 266b7f4 into 6.0/stage Sep 24, 2020
