add release.conf for PINE64 #2
Status: Open

bsd-hacker wants to merge 35 commits into shigeru_work from issue/1/add_release_conf_for_pine64
Conversation
bsd-hacker pushed a commit that referenced this pull request on Oct 6, 2017
…erns

-w and -v flag matching was mostly functional but had some minor problems:

1. -w flag processing only allowed one iteration through pattern matching on a line. This was problematic if one pattern could match more than once, or if there were multiple patterns and the earliest/longest match was not the most ideal, and

2. Previous work "fixed" things to not further process a line if the first iteration through patterns produced no matches. This is clearly wrong if we're dealing with the more restrictive -w matching.

#2 breakage could have also occurred before recent broad rewrites, but it would be more arbitrary, based on input patterns, as to whether or not it actually affected things.

Fix both of these by forcing a retry of the patterns after advancing just past the start of the first match if we're doing more restrictive -w matching and we didn't get any hits to start with. Also move -v flag processing outside of the loop so that we have a greater chance to match in the more restrictive cases. This wasn't strictly wrong, but it could be a little more error prone.

While here, introduce some regression tests for this behavior and fix some excessive wrapping nearby that hindered readability. GNU grep passes these new tests.

PR: 218467, 218811
Approved by: emaste (mentor, blanket MFC)
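The retry loop the message describes is easier to see in code. Below is a minimal sketch, not the bsdgrep source: `match_line_w()` and `is_word_match()` are invented names and the single-`regmatch_t` usage is a simplification, but the control flow — try every pattern, and if none survives the -w word-boundary check, advance just past the start of the earliest raw match and rescan — is the behavior the commit describes.

```c
#include <ctype.h>
#include <regex.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reject matches whose edges touch word characters (the -w rule). */
static bool
is_word_match(const char *line, size_t so, size_t eo)
{
	if (so > 0 && (isalnum((unsigned char)line[so - 1]) ||
	    line[so - 1] == '_'))
		return (false);
	if (line[eo] != '\0' && (isalnum((unsigned char)line[eo]) ||
	    line[eo] == '_'))
		return (false);
	return (true);
}

/* Does any pattern match this line under -w semantics? */
static bool
match_line_w(const regex_t *pats, size_t npats, const char *line)
{
	regmatch_t pm;
	size_t first, off;
	bool any;

	for (off = 0;;) {
		any = false;
		first = SIZE_MAX;
		for (size_t i = 0; i < npats; i++) {
			if (regexec(&pats[i], line + off, 1, &pm,
			    off == 0 ? 0 : REG_NOTBOL) != 0)
				continue;
			any = true;
			if ((size_t)pm.rm_so < first)
				first = (size_t)pm.rm_so;
			if (is_word_match(line, off + pm.rm_so,
			    off + pm.rm_eo))
				return (true);
		}
		/* No raw match at all: the line cannot match. */
		if (!any)
			return (false);
		/* Retry just past the start of the earliest raw match. */
		off += first + 1;
		if (off > strlen(line))
			return (false);
	}
}
```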
bsd-hacker pushed a commit that referenced this pull request on Nov 1, 2017
whack an unused variable

git-svn-id: svn+ssh://svn.freebsd.org/base/head@325267 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
bsd-hacker pushed a commit that referenced this pull request on Nov 6, 2017
Use our ifm_list of supported media types rather than nested switch statements to find the current media type. Find a supported type that matches the current speed. Remove all workarounds while updating ifmr->ifm_active. For BNXT_IFMEDIA_ADD, added three more speeds: IFM_10G_T, IFM_2500_T & IFM_2500_KX.

Submitted by: Bhargava Chenna Marreddy <[email protected]>
Reviewed by: shurd, sbruno
Approved by: sbruno (mentor)
Sponsored by: Broadcom Limited
Differential Revision: https://reviews.freebsd.org/D12896
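As a sketch of the list-walk idea (not the bnxt code itself, and assuming the caller has already translated the link speed into an IFM_* subtype), the lookup reduces to scanning the entries registered with ifmedia_add() at attach time instead of recomputing the media word in nested switches:

```c
#include <sys/param.h>
#include <net/if_media.h>

/*
 * Return the registered media word whose subtype matches the current
 * link speed's subtype, or IFM_NONE if no supported type matches.
 */
static int
media_word_for_subtype(struct ifmedia *ifm, int subtype)
{
	struct ifmedia_entry *entry;

	LIST_FOREACH(entry, &ifm->ifm_list, ifm_next) {
		if (IFM_SUBTYPE(entry->ifm_media) == subtype)
			return (entry->ifm_media);
	}
	return (IFM_NONE);
}
```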
bsd-hacker pushed a commit that referenced this pull request on Jan 21, 2018
amd64: remove unused variable from pmap_delayed_invl_genp

This is take #2.
bsd-hacker pushed a commit that referenced this pull request on Jan 21, 2018
Attaching syscon_generic earlier than BUS_PASS_DEFAULT makes it more difficult for specific syscon drivers to attach to the syscon node and to get ordering right. Further discussion yielded the following set of decisions:

- Move syscon_generic to BUS_PASS_DEFAULT
- If a platform needs a syscon with different attach order or probe behavior, it should subclass syscon_generic and match on the SoC-specific compat string
- When we come across a need for a syscon that attaches earlier but only specifies compatible = "syscon", we should create a syscon_exclusive driver that provides generic access but probes earlier and only matches if "syscon" is the only compatible. Such fdt nodes do exist in the wild right now, but we don't really use them at the moment.

Additionally:

- Any syscon provider that has needs more complex than a spinlock solely for syscon access and a single memory resource should subclass syscon directly rather than attempting to subclass syscon_generic or add complexity to it. syscon_generic's attach/detach methods may be made public should the need arise to subclass it with additional attach/detach behavior.

We introduce aw_syscon(4) that just subclasses syscon_generic but probes earlier to meet our requirements for if_awg and implements #2 above for this specific situation. It currently only matches a64/a83t/h3 since these are the only platforms that really need it at the time being.

Discussed with: ian
Reviewed by: manu, andrew, bcr (manpages, content unchanged since review)
Differential Revision: https://reviews.freebsd.org/D13793
git-svn-id: svn+ssh://svn.freebsd.org/base/head@327936 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
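A sketch of the subclassing pattern the message settles on, not the committed aw_syscon(4) source: the compat string, the bus pass, and the assumption that syscon_generic.h exports the generic driver class and softc are all illustrative.

```c
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>
#include <sys/kernel.h>
#include <sys/module.h>

#include <dev/ofw/ofw_bus_subr.h>
/* Assumed to export syscon_generic_driver and its softc layout. */
#include <dev/extres/syscon/syscon_generic.h>

static int
aw_syscon_probe(device_t dev)
{
	if (!ofw_bus_status_okay(dev))
		return (ENXIO);
	/* Match only the SoC-specific string, never plain "syscon". */
	if (!ofw_bus_is_compatible(dev,
	    "allwinner,sun50i-a64-system-controller"))
		return (ENXIO);
	device_set_desc(dev, "Allwinner syscon");
	return (BUS_PROBE_DEFAULT);
}

static device_method_t aw_syscon_methods[] = {
	DEVMETHOD(device_probe, aw_syscon_probe),
	/* attach/detach and the syscon interface are inherited. */
	DEVMETHOD_END
};

/* Subclass the generic driver, reusing its softc. */
DEFINE_CLASS_1(aw_syscon, aw_syscon_driver, aw_syscon_methods,
    sizeof(struct syscon_generic_softc), syscon_generic_driver);

static devclass_t aw_syscon_devclass;

/* Probe earlier than BUS_PASS_DEFAULT so if_awg finds its syscon. */
EARLY_DRIVER_MODULE(aw_syscon, simplebus, aw_syscon_driver,
    aw_syscon_devclass, 0, 0,
    BUS_PASS_SCHEDULER + BUS_PASS_ORDER_MIDDLE);
```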
bsd-hacker pushed a commit that referenced this pull request on Feb 2, 2018
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 6.0.0 (branches/release_60 r324090).

This introduces retpoline support, with the -mretpoline flag. The upstream initial commit message (r323155 by Chandler Carruth) contains quite a bit of explanation. Quoting:

Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.

Summary:
First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processor's cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 the thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction.

There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller.

When manually applying similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel.

When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline.

We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723

MFC after: 3 months
X-MFC-With: r327952
PR: 224669
git-svn-id: svn+ssh://svn.freebsd.org/base/head@328817 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
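For readers who want to see the construct concretely, here is a sketch of the x86-64 thunk sequence the message describes, written as C inline assembly. It follows the call/capture/smash/ret shape from the Google write-up linked above; the symbol name is illustrative, not one of the `__llvm_external_retpoline_*` names the compiler expects.

```c
/*
 * Sketch of a retpoline thunk for x86-64 (illustrative symbol name).
 * The caller loads the real branch target into %r11 and then uses
 * "call retpoline_thunk_r11" instead of "call *%r11".
 */
__asm__(
    "	.text\n"
    "	.globl	retpoline_thunk_r11\n"
    "retpoline_thunk_r11:\n"
    "	call	1f\n"		/* pushes the address of 2: and jumps */
    "2:	pause\n"		/* speculative execution is trapped here */
    "	lfence\n"
    "	jmp	2b\n"
    "1:	mov	%r11, (%rsp)\n"	/* smash return address with the target */
    "	ret\n"			/* predicted to 2:, actually goes to %r11 */
);
```

The return is predicted (via the return stack) to land in the `pause`/`lfence` loop, so a poisoned indirect-branch prediction only ever reaches a harmless spin, while the architectural `ret` consumes the overwritten stack slot and transfers control to the intended target.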
bsd-hacker pushed a commit that referenced this pull request on Feb 20, 2018
The suspension counter needs synchronisation through slock, but we don't need it to check if inspecting the counter is necessary to begin with. In the common case it is not, thus avoid the lock if possible.

Reviewed by: kib
Tested by: pho
git-svn-id: svn+ssh://svn.freebsd.org/base/head@329531 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
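The pattern is the usual unlocked-peek-then-locked-recheck. A minimal sketch with invented names, not the committed code:

```c
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

struct obj {
	struct mtx	slock;		/* protects susp_count updates */
	int		susp_count;
};

static bool
obj_suspended(struct obj *o)
{
	bool susp;

	/*
	 * Unlocked peek: in the common case nothing is suspended, and
	 * a stale read only costs us the locked recheck below.
	 */
	if (o->susp_count == 0)
		return (false);

	/* Slow path: act on the counter only under the lock. */
	mtx_lock_spin(&o->slock);
	susp = (o->susp_count != 0);
	mtx_unlock_spin(&o->slock);
	return (susp);
}
```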
bsd-hacker pushed a commit that referenced this pull request on Mar 25, 2018
branches.

Upstream merge revisions:

r324007: merging r323155 for llvm
r324009: merging r323915 for llvm
r324012: merging r323155 for clang
r324025: merging r323155 for lld, with modifications to handle int3 fill
r324026: merging r323288 for lld
r325088: merging r324449 for llvm
r325089: merging r324645 for llvm
r325090: merging r325049 for llvm
r325091: merging r325085 for llvm

Original commit messages:

r323155 (by Chandler Carruth):

Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.

Summary:
First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processor's cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 the thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction.

There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller.

When manually applying similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel.

When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline.

We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723

r323915 (by Chandler Carruth):

[x86] Make the retpoline thunk insertion a machine function pass.

Summary:
This removes the need for a machine module pass using some deeply questionable hacks. This should address PR36123 which is a case where in full LTO the memory usage of a machine module pass actually ended up being significant.

We should revert this on trunk as soon as we understand and fix the memory usage issue, but we should include this in any backports of retpolines themselves.

Reviewers: echristo, MatzeB
Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D42726

r323288 (by Rui Ueyama):

Fix retpoline PLT header size for i386.

Differential Revision: https://reviews.llvm.org/D42397

r324449 (by Chandler Carruth):

[x86/retpoline] Make the external thunk names exactly match the names that happened to end up in GCC.

This is really unfortunate, as the names don't have much rhyme or reason to them. Originally in the discussions it seemed fine to rely on aliases to map different names to whatever external thunk code developers wished to use but there are practical problems with that in the kernel it turns out. And since we're discovering these practical problems late and since GCC has already shipped a release with one set of names, we are forced, yet again, to blindly match what is there.

Somewhat rushing this patch out for the Linux kernel folks to test and so we can get it patched into our releases.

Differential Revision: https://reviews.llvm.org/D42998

r324645 (by David Woodhouse):

[X86] Support 'V' register operand modifier

This allows the register name to be printed without the leading '%'. This can be used for emitting calls to the retpoline thunks from inline asm.

r325049 (by Reid Kleckner):

[X86] Use EDI for retpoline when no scratch regs are left

Summary:
Instead of solving the hard problem of how to pass the callee to the indirect jump thunk without a register, just use a CSR. At a call boundary, there's nothing stopping us from using a CSR to hold the callee as long as we save and restore it in the prologue.

Also, add tests for this mregparm=3 case. I wrote execution tests for __llvm_retpoline_push, but they never got committed as lit tests, either because I never rewrote them or because they got lost in merge conflicts.

Reviewers: chandlerc, dwmw2
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D43214

r325085 (by Reid Kleckner):

[X86] Remove dead code from retpoline thunk generation

Differential Revision: https://reviews.freebsd.org/D14720
bsd-hacker pushed a commit that referenced this pull request on Mar 29, 2018
Add a new "interleave" allocation policy which stripes pages across domains with a stride or width keeping contiguity within a multi-page region. Move the kernel to the dedicated numbered cpuset #2 making it possible to assign kernel threads and memory policy separately from user. This also eliminates the need for the complicated interrupt binding code. Add a sysctl API for viewing and manipulating domainsets. Refactor some of the cpuset_t manipulation code using the generic bitset type so that it can be used for both. This probably belongs in a dedicated subr file. Attempt to improve the include situation. Reviewed by: kib Discussed with: jhb (cpuset parts) Tested by: pho (before review feedback) Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14839
bsd-hacker pushed a commit that referenced this pull request on Apr 12, 2018
several follow-up fixes.

MFC r327952:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 6.0.0 (branches/release_60 r321788). Upstream has branched for the 6.0.0 release, which should be in about 6 weeks. Please report bugs and regressions, so we can get them into the release.

Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11 support to build; see UPDATING for more information.

MFC r328010:
Pull in r322473 from upstream llvm trunk (by Andrei Elovikov):

[LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from a trunc.

Summary:
This method is supposed to be called for IVs that have casts in their use-def chains that are completely ignored after vectorization under PSE. However, for truncates of such IVs the same InductionDescriptor is used during creation/widening of both original IV based on PHINode and new IV based on TruncInst. This leads to unintended second call to recordVectorLoopValueForInductionCast with a VectorLoopVal set to the newly created IV for a trunc and causes an assert due to attempt to store new information for already existing entry in the map. This is wrong and should not be done.

Fixes PR35773.

Reviewers: dorit, Ayal, mssimpso
Reviewed By: dorit
Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D41913

This should fix "Vector value already set for part" assertions when building the net/iodine and sysutils/daa2iso ports.

Reported by: jbeich
PR: 224867, 224868

MFC r328090:
Pull in r322623 from upstream llvm trunk (by Andrew V. Tischenko):

Allow usage of X86-prefixes as separate instrs.
Differential Revision: https://reviews.llvm.org/D42102

This should fix parse errors when x86 prefixes (such as 'lock' and 'rep') are followed by various non-mnemonic tokens, e.g. comments, .byte directives and labels.

PR: 224669, 225054

MFC r328091:
Revert r327340, as the workaround for rep prefixes followed by .byte directives is no longer needed after r328090.

MFC r328141 (by emaste):
lld: Fix for ld.lld does not accept "AT" syntax for declaring LMA region

AT> lma_region expression allows specifying the memory region for section load address. Should fix [upstream LLVM] PR35684.

LLVM review: https://reviews.llvm.org/D41397

Obtained from: LLVM r322359 by George Rimar

MFC r328143 (by emaste):
lld: Handle parsing AT(ADDR(.foo-bar)).

The problem we had with it is that anything inside an AT is an expression, so we failed to parse the section name because of the - in it.

Requested by: royger
Obtained from: LLVM r322801 by Rafael Espindola

MFC r328144 (by emaste):
lld: Fix incorrect physical address on self-referencing AT command.

When a section placement (AT) command references the section itself, the physical address of the section in the ELF header was calculated incorrectly due to alignment happening right after the location pointer's value was captured.

The problem was diagnosed and the first version of the patch written by Erick Reyes.

Obtained from: LLVM r322421 by Rafael Espindola

MFC r328145:
Pull in r322016 from upstream llvm trunk (by Sanjay Patel):

[ValueTracking] remove overzealous assert

The test is derived from a failing fuzz test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=5008

Credit to @RKSimon for pointing out the problem.

This should fix "Bad flavor while matching min/max" errors when building the graphics/libsixel and science/kst2 ports.

Reported by: jbeich
PR: 225268, 225269

MFC r328146:
Pull in r322106 from upstream llvm trunk (by Alexey Bataev):

[COST]Fix PR35865: Fix cost model evaluation for shuffle on X86.

Summary:
If the vector type is transformed to non-vector single type, the compiler may crash trying to get vector information about non-vector type.

Reviewers: RKSimon, spatel, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41862

This should fix "Not a vector MVT!" errors when building the games/dhewm3 port.

Reported by: jbeich
PR: 225271

MFC r328286 (by emaste):
lld: Don't mark a shared library as needed because of a lazy symbol.

Obtained from: LLVM r323221 by Rafael Espíndola

MFC r328381:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 6.0.0 (branches/release_60 r323338).

PR: 224669

MFC r328513:
Pull in r322245 from upstream clang trunk (by Craig Topper):

[X86] Make -mavx512f imply -mfma and -mf16c in the frontend like it does in the backend. Similarly, make -mno-fma and -mno-f16c imply -mno-avx512f.

Without this "-mno-sse -mavx512f" ends up with avx512f being enabled in the frontend but disabled in the backend.

Reported by: pawel
PR: 225488

MFC r328542 (by emaste):
lld: Use lookup instead of find. NFC, just simpler.

Obtained from: LLVM r323395 by Rafael Espindola

MFC r328543 (by emaste):
lld: Only lookup LMARegion once. NFC.

This is similar to how we handle MemRegion.

Obtained from: LLVM r323396 by Rafael Espindola

MFC r328544 (by emaste):
lld: Remove MemRegionOffset. NFC.

We can just use a member variable in MemoryRegion.

Obtained from: LLVM r323399 by Rafael Espindola

MFC r328545 (by emaste):
lld: Simplify. NFC.

Obtained from: LLVM r323440 by Rafael Espindola

MFC r328546 (by emaste):
lld: Improve LMARegion handling.

This fixes the crash reported at [LLVM] PR36083. The issue is that we were trying to put all the sections in the same PT_LOAD and crashing trying to write past the end of the file.

This also adds accounting for used space in LMARegion; without it, all 3 PT_LOADs would have the same physical address.

Obtained from: LLVM r323449 by Rafael Espindola

MFC r328547 (by emaste):
lld: Move LMAOffset from the OutputSection to the PhdrEntry. NFC.

If two sections are in the same PT_LOAD, their relative offsets, virtual address and physical addresses are all the same.

[Rafael] initially wanted to have a single global LMAOffset, on the assumption that every ELF file was in practice loaded contiguously in both physical and virtual memory. Unfortunately that is not the case. The linux kernel has:

LOAD 0x200000 0xffffffff81000000 0x0000000001000000 0xced000 0xced000 R E 0x200000
LOAD 0x1000000 0xffffffff81e00000 0x0000000001e00000 0x15f000 0x15f000 RW 0x200000
LOAD 0x1200000 0x0000000000000000 0x0000000001f5f000 0x01b198 0x01b198 RW 0x200000
LOAD 0x137b000 0xffffffff81f7b000 0x0000000001f7b000 0x116000 0x1ec000 RWE 0x200000

The delta for all but the third PT_LOAD is the same: 0xffffffff80000000. [Rafael] thinks the 3rd one is a hack for implementing per cpu data, but we can't break that.

Obtained from: LLVM r323456 by Rafael Espindola

MFC r328548 (by emaste):
lld: Put the header in the first PT_LOAD even if that PT_LOAD has a LMAExpr

The root problem is that we were creating a PT_LOAD just for the header. That was technically valid, but inconvenient: we should not be making the ELF discontinuous.

The solution is to allow a section with LMAExpr to be added to a PT_LOAD if that PT_LOAD doesn't already have a LMAExpr.

LLVM PR: 36017
Obtained from: LLVM r323625 by Rafael Espindola

MFC r328594 (by emaste):
Pull in r322108 from upstream llvm trunk (by Rafael Espíndola):

Make one of the emitFill methods non virtual. NFC.

This is just preparatory work to fix [LLVM] PR35858.

MFC r328595 (by emaste):
Pull in r322123 from upstream llvm trunk (by Rafael Espíndola):

Don't create MCFillFragment directly. Instead use higher level APIs that take care of most bookkeeping.

MFC r328596 (by emaste):
Pull in r322131 from upstream llvm trunk (by Rafael Espíndola):

Use a MCExpr for the size of MCFillFragment.

This allows the size to be found during relaxation. This fixes [LLVM] pr35858.

Requested by: royger

MFC r328753:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 6.0.0 (branches/release_60 r323948).

PR: 224669

MFC r328817:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 6.0.0 (branches/release_60 r324090).

This introduces retpoline support, with the -mretpoline flag. The upstream initial commit message (r323155 by Chandler Carruth) contains quite a bit of explanation. Quoting:

Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.

Summary:
First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processor's cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call.

The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 the thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction.

There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller.

When manually applying similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel.

When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches.

If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline.

We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723

PR: 224669

MFC r329033:
Pull in r324594 from upstream clang trunk (by Alexander Ivchenko):

Fix for #31362 - ms_abi is implemented incorrectly for values >= 16 bytes.

Summary:
This patch is a fix for the following issue: https://bugs.llvm.org/show_bug.cgi?id=31362 The problem was caused by the front end lowering C calling conventions without taking into account calling conventions enforced by attribute. In this case win64cc was not correctly lowered on targets other than Windows.

Reviewed By: rnk (Reid Kleckner)
Differential Revision: https://reviews.llvm.org/D43016
Author: belickim <[email protected]>

This fixes clang 6.0.0 assertions when building the emulators/wine and emulators/wine-devel ports, and should also make it use the correct Windows calling conventions. Bump __FreeBSD_version to make the fix easy to detect.

PR: 224863

MFC r329223:
Pull in r323998 from upstream clang trunk (by Richard Smith):

PR36157: When injecting an implicit function declaration in C89, find the right DeclContext rather than injecting it wherever we happen to be.

This avoids creating functions whose DeclContext is a struct or similar.

This fixes assertion failures when parsing certain not-completely-valid struct declarations.

Reported by: ae
PR: 225862

MFC r329410:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 6.0.0 (branches/release_60 r325330).

PR: 224669

MFC r329983:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 6.0.0 (branches/release_60 r325932). This corresponds to 6.0.0 rc3.

PR: 224669

MFC r330384:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 6.0.0 release (upstream r326565).

Release notes for llvm, clang and lld will be available here soon:
<http://releases.llvm.org/6.0.0/docs/ReleaseNotes.html>
<http://releases.llvm.org/6.0.0/tools/clang/docs/ReleaseNotes.html>
<http://releases.llvm.org/6.0.0/tools/lld/docs/ReleaseNotes.html>

Relnotes: yes
PR: 224669

MFC r330686:
Pull in r326882 from upstream llvm trunk (by Sjoerd Meijer):

[ARM] Fix for PR36577

Don't PerformSHLSimplify if the given node is used by a node that also uses a constant because we may get stuck in an infinite combine loop.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36577

Patch by Sam Parker.

Differential Revision: https://reviews.llvm.org/D44097

This fixes a hang when compiling one particular file in java/openjdk8 for armv6 and armv7.

Reported by: swills
PR: 226388

MFC r331065:
Pull in r327638 from upstream llvm trunk (by Matthew Simpson):

[ConstantFolding, InstSimplify] Handle more vector GEPs

This patch addresses some additional cases where the compiler crashes upon encountering vector GEPs. This should fix PR36116.

Differential Revision: https://reviews.llvm.org/D44219
Reference: https://bugs.llvm.org/show_bug.cgi?id=36116

This fixes an assertion when building the emulators/snes9x port.

Reported by: jbeich
PR: 225471

MFC r331066:
Pull in r321999 from upstream clang trunk (by Ivan A. Kosarev):

[CodeGen] Fix TBAA info for accesses to members of base classes

Resolves:
Bug 35724 - regression (r315984): fatal error: error in backend: Broken function found (Did not see access type in access path!)
https://bugs.llvm.org/show_bug.cgi?id=35724

Differential Revision: https://reviews.llvm.org/D41547

This fixes "Did not see access type in access path" fatal errors when building the devel/gdb port (version 8.1).

Reported by: jbeich
PR: 226658

MFC r331366:
Pull in r327101 from upstream llvm trunk (by Rafael Espindola):

Don't treat .symver as a regular alias definition.

This patch starts simplifying the handling of .symver. For now it just moves the responsibility for creating an alias down to the streamer. With that the asm streamer can pass a .symver unchanged, which is nice since gas cannot parse "foo@bar = zed".

In a followup I hope to move the handling down to the writer so that we don't need special hacks for avoiding breaking names with @@@ on windows.

Pull in r327160 from upstream llvm trunk (by Rafael Espindola):

Delay creating an alias for @@@.

With this we only create an alias for @@@ once we know if it should use @ or @@. This avoids last-minute renames and hacks to handle MS names. This only handles the ELF writer. LTO still has issues with @@@ aliases.

Pull in r327928 from upstream llvm trunk (by Vitaly Buka):

Object: Move attribute calculation into RecordStreamer. NFC

Summary: Preparation for D44274

Reviewers: pcc, espindola
Subscribers: hiraditya
Differential Revision: https://reviews.llvm.org/D44276

Pull in r327930 from upstream llvm trunk (by Vitaly Buka):

Object: Fix handling of @@@ in .symver directive

Summary:
name@@@nodename is going to be replaced with name@@nodename if the symbol is defined in the assembled file, or name@nodename if undefined. https://sourceware.org/binutils/docs/as/Symver.html Fixes PR36623

Reviewers: pcc, espindola
Subscribers: mehdi_amini, hiraditya
Differential Revision: https://reviews.llvm.org/D44274

Together, these changes fix handling of @@@ in .symver directives when doing Link Time Optimization.

Reported by: Shawn Webb <[email protected]>

MFC r331731:
Pull in r328738 from upstream lld trunk (by Rafael Espindola):

Strip @ver suffices from the LTO output. This fixes pr36623.

The problem is that we have to parse versions out of names before LTO so that LTO can use that information. When we get the LTO produced .o files, we replace the previous symbols with the LTO produced ones, but they still have @ in their names.

We could just trim the name directly, but calling parseSymbolVersion to do it is simpler.

This is a follow-up to r331366, since we discovered that lld could append version strings to symbols twice, when using Link Time Optimization.
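To illustrate the `@@@` semantics the .symver fixes above are about (hypothetical symbol and version names; the version node must also appear in a linker version script at link time):

```c
/*
 * foo_impl is defined in this translation unit, so foo@@@VERS_1.0
 * collapses to the default-version definition foo@@VERS_1.0. Had
 * foo_impl been undefined here, the directive would instead produce
 * a reference to the non-default version foo@VERS_1.0.
 */
int
foo_impl(void)
{
	return (1);
}
__asm__(".symver foo_impl, foo@@@VERS_1.0");
```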
bsd-hacker pushed a commit that referenced this pull request on Apr 26, 2018
Compression was removed, so #2 goes away and everything else needs renumbering to match; the usage string was also updated due to removed options.

X-MFC-With: r332995
git-svn-id: svn+ssh://svn.freebsd.org/base/head@333000 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
bsd-hacker
pushed a commit
that referenced
this pull request
May 24, 2018
There's no need to perform the interrupt unbind while holding the blkback lock, and doing so leads to the following LOR:

lock order reversal: (sleepable after non-sleepable)
 1st 0xfffff8000802fe90 xbbd1 (xbbd1) @ /usr/src/sys/dev/xen/blkback/blkback.c:3423
 2nd 0xffffffff81fdf890 intrsrc (intrsrc) @ /usr/src/sys/x86/x86/intr_machdep.c:224
stack backtrace:
#0 0xffffffff80bdd993 at witness_debugger+0x73
#1 0xffffffff80bdd814 at witness_checkorder+0xe34
#2 0xffffffff80b7d798 at _sx_xlock+0x68
#3 0xffffffff811b3913 at intr_remove_handler+0x43
#4 0xffffffff811c63ef at xen_intr_unbind+0x10f
#5 0xffffffff80a12ecf at xbb_disconnect+0x2f
#6 0xffffffff80a12e54 at xbb_shutdown+0x1e4
#7 0xffffffff80a10be4 at xbb_frontend_changed+0x54
#8 0xffffffff80ed66a4 at xenbusb_back_otherend_changed+0x14
#9 0xffffffff80a2a382 at xenwatch_thread+0x182
#10 0xffffffff80b34164 at fork_exit+0x84
#11 0xffffffff8101ec9e at fork_trampoline+0xe

Reported by: Nathan Friess <[email protected]>
Sponsored by: Citrix Systems R&D

git-svn-id: svn+ssh://svn.freebsd.org/base/head@334145 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
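The shape of the fix the LOR above suggests is to drop the blkback mutex before calling xen_intr_unbind(). A minimal sketch, assuming a softc with a mutex and a stored interrupt handle (the field names here are illustrative, not the actual blkback layout):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    #include <xen/xen_intr.h>

    /* Illustrative softc; the real struct xbb_softc has far more state. */
    struct xbb_softc {
            struct mtx              lock;
            xen_intr_handle_t       xen_intr_handle;
    };

    static void
    xbb_disconnect(struct xbb_softc *xbb)
    {
            xen_intr_handle_t handle;

            /* Detach the handle from the softc under the lock... */
            mtx_lock(&xbb->lock);
            handle = xbb->xen_intr_handle;
            xbb->xen_intr_handle = NULL;
            mtx_unlock(&xbb->lock);

            /*
             * ...but unbind outside of it: as the backtrace shows,
             * intr_remove_handler() acquires the sleepable intrsrc lock.
             */
            if (handle != NULL)
                    xen_intr_unbind(&handle);
    }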
bsd-hacker
pushed a commit
that referenced
this pull request
Jun 8, 2018
(or peel off the band-aid, whatever floats your boat)

This addresses two separate issues:

1.) Nothing within bsdgrep actually knew whether it cared about line numbers or not.
2.) The file layer knew nothing about the context in which it was being called.

#1 is only important when we're *not* processing line-by-line. #2 is debatably a good idea; the parsing context is only handy because that's where we store current offset information and, as of this commit, whether or not it needs to be line-aware.
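A rough C illustration of that idea (an invented sketch, not bsdgrep's actual structures): the parsing context carries the current offset plus a flag recording whether line bookkeeping is wanted, so the file layer can consult the context instead of guessing:

    #include <sys/types.h>
    #include <stdbool.h>

    /*
     * Invented sketch of a parsing context; the real bsdgrep type
     * differs.  The file layer receives a pointer to this, so it can
     * tell whether the caller needs line-oriented bookkeeping at all.
     */
    struct parse_ctx {
            off_t   off;            /* current offset into the input */
            bool    line_aware;     /* caller processes line-by-line */
            long    lineno;         /* maintained only when line_aware */
    };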
bsd-hacker
pushed a commit
that referenced
this pull request
May 28, 2019
It appeared that using NET_EPOCH_WAIT() while holding the global BPF lock can lead to another panic:

spin lock 0xfffff800183c9840 (turnstile lock) held by 0xfffff80018e2c5a0 (tid 100325) too long
panic: spin lock held too long
...
#0 sched_switch (td=0xfffff80018e2c5a0, newtd=0xfffff8000389e000, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2133
#1 0xffffffff80bf9912 in mi_switch (flags=256, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:439
#2 0xffffffff80c21db7 in sched_bind (td=<optimized out>, cpu=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2704
#3 0xffffffff80c34c33 in epoch_block_handler_preempt (global=<optimized out>, cr=0xfffffe00005a1a00, arg=<optimized out>) at /usr/src/sys/kern/subr_epoch.c:394
#4 0xffffffff803c741b in epoch_block (global=<optimized out>, cr=<optimized out>, cb=<optimized out>, ct=<optimized out>) at /usr/src/sys/contrib/ck/src/ck_epoch.c:416
#5 ck_epoch_synchronize_wait (global=0xfffff8000380cd80, cb=<optimized out>, ct=<optimized out>) at /usr/src/sys/contrib/ck/src/ck_epoch.c:465
#6 0xffffffff80c3475e in epoch_wait_preempt (epoch=0xfffff8000380cd80) at /usr/src/sys/kern/subr_epoch.c:513
#7 0xffffffff80ce970b in bpf_detachd_locked (d=0xfffff801d309cc00, detached_ifp=<optimized out>) at /usr/src/sys/net/bpf.c:856
#8 0xffffffff80ced166 in bpf_detachd (d=<optimized out>) at /usr/src/sys/net/bpf.c:836
#9 bpf_dtor (data=0xfffff801d309cc00) at /usr/src/sys/net/bpf.c:914

To fix this, add a check to catchpacket() that the BPF descriptor was not detached just before we acquired BPFD_LOCK().

Reported by: slavash
Tested by: slavash
MFC after: 1 week

git-svn-id: svn+ssh://svn.freebsd.org/base/head@348324 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
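The fix described above amounts to re-checking the descriptor's attachment after the lock is taken. A hedged sketch of that check (bd_bif is the field marking the attached interface; everything else in catchpacket() is elided):

    /*
     * Sketch of the re-check described above; catchpacket() is reduced
     * to just the relevant locking and the new test.
     */
    static void
    catchpacket(struct bpf_d *d /* , packet arguments elided */)
    {
            BPFD_LOCK(d);
            /*
             * The descriptor may have been detached while we waited for
             * the lock; bd_bif is cleared once bpf_detachd() has run.
             */
            if (d->bd_bif == NULL) {
                    BPFD_UNLOCK(d);
                    return;
            }
            /* ... copy the packet into the descriptor's buffers ... */
            BPFD_UNLOCK(d);
    }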
bsd-hacker
pushed a commit
that referenced
this pull request
Aug 8, 2019
For some inexplicable reason, C++ compilers reject the -Wno- flag, and also (ab)use CWARNFLAGS.

Reported by: imp

git-svn-id: svn+ssh://svn.freebsd.org/base/head@350740 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
bsd-hacker
pushed a commit
that referenced
this pull request
Nov 12, 2019
Currently NMIs are sent over event channels, but that defeats the purpose of NMIs, since event channels can be masked. Fix this by issuing NMIs using a hypercall, which injects an NMI (vector #2) into the desired vCPU.

Note that NMIs could also be triggered using the emulated local APIC, but using a hypercall is better from a performance point of view, since it doesn't involve instruction decoding when not using x2APIC mode.

Reported and Tested by: avg
Sponsored by: Citrix Systems R&D
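A sketch of the hypercall path described above, assuming FreeBSD's Xen headers; VCPUOP_send_nmi is the Xen vcpu operation for injecting an NMI into a vCPU, and the wrapper name here is invented:

    #include <xen/hypervisor.h>        /* HYPERVISOR_vcpu_op() */
    #include <xen/interface/vcpu.h>    /* VCPUOP_send_nmi */

    /*
     * Ask the hypervisor to inject an NMI (vector #2) into the given
     * vCPU.  Error handling is left to the caller.
     */
    static int
    xen_send_nmi(int vcpu_id)
    {

            return (HYPERVISOR_vcpu_op(VCPUOP_send_nmi, vcpu_id, NULL));
    }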
bsd-hacker
pushed a commit
that referenced
this pull request
Dec 29, 2019
With WITNESS enabled we see the following warning:

lock order reversal: (sleepable after non-sleepable)
 1st 0xffffffd0847c7210 fu540spi0 (fu540spi0) @ /usr/home/kp/axiado/hornet-freebsd/src/sys/riscv/sifive/fu540_spi.c:297
 2nd 0xffffffc00372bb30 Clock topology lock (Clock topology lock) @ /usr/home/kp/axiado/hornet-freebsd/src/sys/dev/extres/clk/clk.c:1137
stack backtrace:
#0 0xffffffc0002a579e at witness_checkorder+0xb72
#1 0xffffffc0002a5556 at witness_checkorder+0x92a
#2 0xffffffc000254c7a at _sx_slock_int+0x66
#3 0xffffffc00025537a at _sx_slock+0x8
#4 0xffffffc000123022 at clk_get_freq+0x38
#5 0xffffffc0005463e4 at __clzdi2+0x2bb8
#6 0xffffffc00014af58 at randomdev_getkey+0x76e
#7 0xffffffc0001278b0 at simplebus_add_device+0x7ee
#8 0xffffffc00027c9a8 at device_attach+0x2e6
#9 0xffffffc00027c634 at device_probe_and_attach+0x7a
#10 0xffffffc00027d76a at bus_generic_attach+0x10
#11 0xffffffc00014aab0 at randomdev_getkey+0x2c6
#12 0xffffffc00027c9a8 at device_attach+0x2e6
#13 0xffffffc00027c634 at device_probe_and_attach+0x7a
#14 0xffffffc00027d76a at bus_generic_attach+0x10
#15 0xffffffc000278bd2 at config_intrhook_oneshot+0x52
#16 0xffffffc000278b3e at config_intrhook_establish+0x146
#17 0xffffffc000278cf2 at config_intrhook_disestablish+0xfe

The clock topology lock can sleep, which means we cannot attempt to acquire it while holding the non-sleepable mutex. Fix that by retrieving the clock speed once, during attach, rather than during every SPI transaction setup.

Submitted by: kp
Sponsored by: Axiado
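The fix pattern is to query the clock framework once in attach and cache the result in the softc, so the transfer path never touches the topology lock. A minimal sketch, assuming sc->clk was resolved earlier in attach and sc->freq is an illustrative caching field:

    /* In the attach routine, after the clock has been set up: */
    uint64_t freq;
    int error;

    error = clk_get_freq(sc->clk, &freq);
    if (error != 0) {
            device_printf(dev, "cannot query clock frequency\n");
            return (error);
    }
    sc->freq = freq;    /* transfer setup reads sc->freq, lock-free */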
bsd-hacker
pushed a commit
that referenced
this pull request
May 25, 2020
This reapplies logical r360944 and r360946 (reverting r360955), with a fixed copystr() stand-in replacement macro. Eventually the goal is to convert consumers and kill the macro, but for a first step it helps if the macro is correct.

Prior commit message:

Unlike the other copy*() functions, copystr() does not serve to copy from one address space to another or protect against potential faults. It's just an older incarnation of the now-more-common strlcpy().

Add a coccinelle script to tools/ which can be used to mechanically convert existing instances where replacement with strlcpy is trivial. In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the code was further refactored manually to simplify.

Replace the declaration of copystr() in systm.h with a small macro wrapper around strlcpy (with correction from brooks@ -- thanks).

Remove N redundant MI implementations of copystr. For MIPS, this entailed inlining the assembler copystr into the only consumer, copyinstr, and making the latter a leaf function.

Reviewed by: jhb (earlier version)
Discussed with: brooks (thanks!)
Differential Revision: https://reviews.freebsd.org/D24672
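A hedged sketch of what such a stand-in macro can look like (an illustration of the idea, not necessarily the exact systm.h macro): strlcpy() returns the full source length, from which both the out-parameter and the ENAMETOOLONG case can be derived:

    /*
     * Sketch of a copystr() stand-in on top of strlcpy(); meant to live
     * in sys/systm.h, so strlcpy() and ENAMETOOLONG are already in
     * scope.  On truncation it reports len bytes copied and fails.
     */
    #define copystr(src, dst, len, done) __extension__ ({          \
            size_t __r, __len = (len);                              \
            size_t *__done = (done);                                \
                                                                    \
            __r = strlcpy((dst), (src), __len);                     \
            if (__done != NULL)                                     \
                    *__done = (__r >= __len) ? __len : __r + 1;     \
            (__r >= __len) ? ENAMETOOLONG : 0;                      \
    })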
bsd-hacker
pushed a commit
that referenced
this pull request
Jul 6, 2020
Add a more compact display format for kern.tty_info_kstacks, inspired by procstat -kk, and set it as the default.

# sysctl kern.tty_info_kstacks=1
kern.tty_info_kstacks: 0 -> 1
# sleep 2
^T
load: 0.17 cmd: sleep 623 [nanslp] 0.72r 0.00u 0.00s 0% 2124k
#0 0xffffffff80c4443e at mi_switch+0xbe
#1 0xffffffff80c98044 at sleepq_catch_signals+0x494
#2 0xffffffff80c982c2 at sleepq_timedwait_sig+0x12
#3 0xffffffff80c43af3 at _sleep+0x193
#4 0xffffffff80c50e31 at kern_clock_nanosleep+0x1a1
#5 0xffffffff80c5119b at sys_nanosleep+0x3b
#6 0xffffffff810ffc69 at amd64_syscall+0x119
#7 0xffffffff810d5520 at fast_syscall_common+0x101
sleep: about 1 second(s) left out of the original 2
^C
# sysctl kern.tty_info_kstacks=2
kern.tty_info_kstacks: 1 -> 2
# sleep 2
^T
load: 0.24 cmd: sleep 625 [nanslp] 0.81r 0.00u 0.00s 0% 2124k
mi_switch+0xbe sleepq_catch_signals+0x494 sleepq_timedwait_sig+0x12
sleep+0x193 kern_clock_nanosleep+0x1a1 sys_nanosleep+0x3b
amd64_syscall+0x119 fast_syscall_common+0x101
sleep: about 1 second(s) left out of the original 2
^C

Suggested by: avg
Reviewed by: mjg
Relnotes: yes
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D25487
bsd-hacker
pushed a commit
that referenced
this pull request
Aug 22, 2020
Instantiate Error in Target::GetEntryPointAddress() only when necessary

When Target::GetEntryPointAddress() calls exe_module->GetObjectFile()->GetEntryPointAddress(), and the returned entry_addr is valid, it can immediately be returned. However, just before that, an llvm::Error value has been set up, but in this case it is not consumed before returning, like is done further below in the function.

In https://bugs.freebsd.org/248745 we got a bug report for this, where a very simple test case aborts and dumps core:

* thread #1, name = 'testcase', stop reason = breakpoint 1.1
    frame #0: 0x00000000002018d4 testcase`main(argc=1, argv=0x00007fffffffea18) at testcase.c:3:5
   1    int main(int argc, char *argv[])
   2    {
-> 3        return 0;
   4    }
(lldb) p argc
Program aborted due to an unhandled Error:
Error value was Success. (Note: Success values must still be checked prior to being destroyed).

Thread 1 received signal SIGABRT, Aborted.
thr_kill () at thr_kill.S:3
3       thr_kill.S: No such file or directory.
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x00000008049a0004 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x0000000804916229 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x000000000451b5f5 in fatalUncheckedError () at /usr/src/contrib/llvm-project/llvm/lib/Support/Error.cpp:112
#4  0x00000000019cf008 in GetEntryPointAddress () at /usr/src/contrib/llvm-project/llvm/include/llvm/Support/Error.h:267
#5  0x0000000001bccbd8 in ConstructorSetup () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:67
#6  0x0000000001bcd2c0 in ThreadPlanCallFunction () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:114
#7  0x00000000020076d4 in InferiorCallMmap () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp:97
#8  0x0000000001f4be33 in DoAllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/FreeBSD/ProcessFreeBSD.cpp:604
#9  0x0000000001fe51b9 in AllocatePage () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:347
#10 0x0000000001fe5385 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:383
#11 0x0000000001974da2 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2301
#12 CanJIT () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2331
#13 0x0000000001a1bf3d in Evaluate () at /usr/src/contrib/llvm-project/lldb/source/Expression/UserExpression.cpp:190
#14 0x00000000019ce7a2 in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Target/Target.cpp:2372
#15 0x0000000001ad784c in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:414
#16 0x0000000001ad86ae in DoExecute () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:646
#17 0x0000000001a5e3ed in Execute () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandObject.cpp:1003
#18 0x0000000001a6c4a3 in HandleCommand () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:1762
#19 0x0000000001a6f98c in IOHandlerInputComplete () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2760
#20 0x0000000001a90b08 in Run () at /usr/src/contrib/llvm-project/lldb/source/Core/IOHandler.cpp:548
#21 0x00000000019a6c6a in ExecuteIOHandlers () at /usr/src/contrib/llvm-project/lldb/source/Core/Debugger.cpp:903
#22 0x0000000001a70337 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2946
#23 0x0000000001d9d812 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/API/SBDebugger.cpp:1169
#24 0x0000000001918be8 in MainLoop () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:675
#25 0x000000000191a114 in main () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:890

Fix the incorrect error catch by only instantiating an Error object if it is necessary.

Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D86355

This should fix lldb aborting as described in the scenario above.

Reported by: dmgk
PR: 248745
bsd-hacker
pushed a commit
that referenced
this pull request
Aug 25, 2020
r358097 introduced a problem for i386, where kernel builds will intermittently get hung, typically with many processes sleeping on "btalloc". I know nothing about VM, but received assistance from rlibby@ and markj@.

rlibby@ stated the following:

  It looks like the problem is that for systems that do not have
  UMA_MD_SMALL_ALLOC, we do

    uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc);

  but we haven't set an appropriate free function. This is probably why
  UMA_ZONE_NOFREE was originally there. When NOFREE was removed, it was
  appropriate for systems with uma_small_alloc. So by default we get
  page_free as our free function. That calls kmem_free, which calls
  vmem_free ... but we do our allocs with vmem_xalloc. I'm not positive,
  but I think the problem is that in effect we vmem_xalloc -> vmem_free,
  not vmem_xfree.

  Three possible fixes:

  1: The one you tested, but this is not best for systems with
     uma_small_alloc.
  2: Pass UMA_ZONE_NOFREE conditional on UMA_MD_SMALL_ALLOC.
  3: Actually provide an appropriate vmem_bt_free function.

  I think we should just do option 2 with a comment, it's simple and
  it's what we used to do. I'm not sure how much benefit we would see
  from option 3, but it's more work.

This patch implements #2. I haven't done a comment, since I don't know what the problem is.

markj@ noted the following:

  I think the suggested patch is ok, but not for the reason stated. On
  platforms without a direct map the problem is: to allocate btags we
  need a slab, and to allocate a slab we need to map a page, and to map
  a page we need to allocate btags.

  We handle this recursion using a custom slab allocator which specifies
  M_USE_RESERVE, allowing it to dip into a reserve of free btags.
  Because the returned slab can be used to keep the reserve populated,
  this ensures that there are always enough free btags available to
  handle the recursion.

  UMA_ZONE_NOFREE ensures that we never reclaim free slabs from the
  zone. However, when it was removed, an apparent bug in UMA was
  exposed: keg_drain() ignores the reservation set by uma_zone_reserve()
  in vmem_startup(). So under memory pressure we reclaim the free btags
  that are needed to break the recursion.

  That's why adding _NOFREE back fixes the problem: it disables the
  reclamation. We could perhaps fix it more cleverly, by modifying
  keg_drain() to always leave uk_reserve slabs available.

markj@'s initial patch failed testing, so committing this patch was agreed upon as the interim solution. Either rlibby@ or markj@ might choose to add a comment to it.

PR: 248008
Reviewed by: rlibby, markj
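Option 2 from the list above boils down to making the zone flag conditional on the platform. A sketch of the shape of the change (VMEM_BT_ZFLAGS is an invented helper name; the zone creation mirrors the one in subr_vmem.c):

    /*
     * Sketch of option 2: request UMA_ZONE_NOFREE only on platforms
     * without uma_small_alloc, so btag slabs are never reclaimed there.
     */
    #ifndef UMA_MD_SMALL_ALLOC
    #define VMEM_BT_ZFLAGS  (UMA_ZONE_VM | UMA_ZONE_NOFREE)
    #else
    #define VMEM_BT_ZFLAGS  UMA_ZONE_VM
    #endif

    vmem_bt_zone = uma_zcreate("vmem btag", sizeof(struct vmem_btag),
        NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, VMEM_BT_ZFLAGS);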
Request to pull for #1.