summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-25powerpc/interrupt: Remove prep_irq_for_user_exit()Christophe Leroy
prep_irq_for_user_exit() has only one caller, squash it inside that caller. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-18-npiggin@gmail.com
2021-06-25powerpc/interrupt: Refactor prep_irq_for_{user/kernel_enabled}_exit()Christophe Leroy
prep_irq_for_user_exit() is a superset of prep_irq_for_kernel_enabled_exit(). Rename prep_irq_for_kernel_enabled_exit() as prep_irq_for_enabled_exit() and have prep_irq_for_user_exit() use it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-17-npiggin@gmail.com
2021-06-25powerpc/interrupt: Interchange prep_irq_for_{kernel_enabled/user}_exit()Christophe Leroy
prep_irq_for_user_exit() is a superset of prep_irq_for_kernel_enabled_exit(). In order to allow refactoring in following patch, interchange the two. This will allow prep_irq_for_user_exit() to call a renamed version of prep_irq_for_kernel_enabled_exit(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-16-npiggin@gmail.com
2021-06-25powerpc/interrupt: Refactor interrupt_exit_user_prepare()Christophe Leroy
interrupt_exit_user_prepare() is a superset of interrupt_exit_user_prepare_main(). Refactor to avoid code duplication. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-15-npiggin@gmail.com
2021-06-25powerpc/interrupt: Rename and lightly change syscall_exit_prepare_main()Christophe Leroy
Rename syscall_exit_prepare_main() into interrupt_exit_prepare_main() Pass it the 'ret' so that it can 'or' it directly instead of oring twice, once inside the function and once outside. And remove 'r3' parameter which is not used. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [np: split out some changes into other patches] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-14-npiggin@gmail.com
2021-06-25powerpc/64: use interrupt restart table to speed up return from interruptNicholas Piggin
Use the restart table facility to return from interrupt or system calls without disabling MSR[EE] or MSR[RI]. Interrupt return asm is put into the low soft-masked region, to prevent interrupts being processed here, although they are still taken as masked interrupts which causes SRRs to be clobbered, and a pending soft-masked interrupt to require replaying. The return code uses restart table regions to redirct to a fixup handler rather than continue with the exit, if such an interrupt happens. In this case the interrupt return is redirected to a fixup handler which reloads r1 for the interrupt stack and reloads registers and sets state up to replay the soft-masked interrupt and try the exit again. Some types of security exit fallback flushes and barriers are currently unable to cope with reentrant interrupts, e.g., because they store some state in the scratch SPR which would be clobbered even by masked interrupts. For now the interrupts-enabled exits are disabled when these flushes are used. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Guard unused exit_must_hard_disable() as reported by lkp] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-13-npiggin@gmail.com
2021-06-25powerpc/64: treat low kernel text as irqs soft-maskedNicholas Piggin
Treat code below __end_soft_masked as soft-masked for the purpose of alternate return. 64s already mostly does this for scv entry. This will be used to exit from interrupts without disabling MSR[EE]. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-12-npiggin@gmail.com
2021-06-25powerpc/64: interrupt soft-enable race fixNicholas Piggin
Prevent interrupt restore from allowing racing hard interrupts going ahead of previous soft-pending ones, by using the soft-masked restart handler to allow a store to clear the soft-mask while knowing nothing is soft-pending. This probably doesn't matter much in practice, but it's a simple demonstrator / test case to exercise the restart table logic. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-11-npiggin@gmail.com
2021-06-25powerpc/64: allow alternate return locations for soft-masked interruptsNicholas Piggin
The exception table fixup adjusts a failed page fault's interrupt return location if it was taken at an address specified in the exception table, to a corresponding fixup handler address. Introduce a variation of that idea which adds a fixup table for NMIs and soft-masked asynchronous interrupts. This will be used to protect certain critical sections that are sensitive to being clobbered by interrupts coming in (due to using the same SPRs and/or irq soft-mask state). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-10-npiggin@gmail.com
2021-06-25powerpc/64s: save one more register in the masked interrupt handlerNicholas Piggin
This frees up one more register (and takes advantage of that to clean things up a little bit). This register will be used in the following patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-9-npiggin@gmail.com
2021-06-25powerpc/64s: system call avoid setting MSR[RI] until we set MSR[EE]Nicholas Piggin
This extends the MSR[RI]=0 window a little further into the system call in order to pair RI and EE enabling with a single mtmsrd. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-8-npiggin@gmail.com
2021-06-25powerpc/64: move interrupt return asm to interrupt_64.SNicholas Piggin
The next patch would like to move interrupt return assembly code to a low location before general text, so move it into its own file and include via head_64.S Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-7-npiggin@gmail.com
2021-06-25powerpc/64s: avoid reloading (H)SRR registers if they are still validNicholas Piggin
When an interrupt is taken, the SRR registers are set to return to where it left off. Unless they are modified in the meantime, or the return address or MSR are modified, there is no need to reload these registers when returning from interrupt. Introduce per-CPU flags that track the validity of SRR and HSRR registers. These are cleared when returning from interrupt, when using the registers for something else (e.g., OPAL calls), when adjusting the return address or MSR of a context, and when context switching (which changes the return address and MSR). This improves the performance of interrupt returns. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fold in fixup patch from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-5-npiggin@gmail.com
2021-06-25powerpc/64s: introduce different functions to return from SRR vs HSRR interruptsNicholas Piggin
This makes no real difference yet except that HSRR type interrupts will use hrfid to return. This is important for the next patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-4-npiggin@gmail.com
2021-06-25powerpc: remove interrupt exit helpers unused argumentNicholas Piggin
The msr argument is not used, remove it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-3-npiggin@gmail.com
2021-06-25powerpc/interrupt: Fix CONFIG ifdef typoChristophe Leroy
CONFIG_PPC_BOOK3S should be CONFIG_PPC_BOOK3S_64. restore_math is a no-op for other configurations. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [np: split from another patch] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-2-npiggin@gmail.com
2021-06-25powerpc/prom_init: Pass linux_banner to firmware via option vector 7Michael Ellerman
Pass the value of linux_banner to firmware via option vector 7. Option vector 7 is described in "LoPAR" Linux on Power Architecture Reference v2.9, in table B.7 on page 824: An ASCII character formatted null terminated string that describes the client operating system. The string shall be human readable and may be displayed on the console. The string can be up to 256 bytes total, including the nul terminator. linux_banner contains lots of information, and should make it possible to identify the exact kernel version that is running: const char linux_banner[] = "Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n"; For example: Linux version 4.15.0-144-generic (buildd@bos02-ppc64el-018) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #148-Ubuntu SMP Sat May 8 02:32:13 UTC 2021 (Ubuntu 4.15.0-144.148-generic 4.15.18) It's also printed at boot to the console/dmesg, which should make it possible to correlate what firmware receives with the console/dmesg on the machine. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621064938.2021419-2-mpe@ellerman.id.au
2021-06-25powerpc/prom_init: Convert prom_strcpy() into prom_strscpy_pad()Michael Ellerman
In a subsequent patch we'd like to have something like a strscpy_pad() implementation usable in prom_init.c. Currently we have a strcpy() implementation with only one caller, so convert it into strscpy_pad() and update the caller. Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621064938.2021419-1-mpe@ellerman.id.au
2021-06-25powerpc/64s: Fix boot failure with 4K RadixMichael Ellerman
When using the Radix MMU our PGD is always 64K, and must be naturally aligned. For a 4K page size kernel that means page alignment of swapper_pg_dir is not sufficient, leading to failure to boot. Use the existing MAX_PTRS_PER_PGD which has the correct value, and avoids us hard-coding 64K here. Fixes: e72421a085a8 ("powerpc: Define swapper_pg_dir[] in C") Reported-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210624123420.2784187-1-mpe@ellerman.id.au
2021-06-24bpf: Support all gso types in bpf_skb_change_proto()Maciej Żenczykowski
Since we no longer modify gso_size, it is now theoretically safe to not set SKB_GSO_DODGY and reset gso_segs to zero. This also means the skb_is_gso_tcp() check should no longer be necessary. Unfortunately we cannot remove the skb_{decrease,increase}_gso_size() helpers, as they are still used elsewhere: bpf_skb_net_grow() without BPF_F_ADJ_ROOM_FIXED_GSO bpf_skb_net_shrink() without BPF_F_ADJ_ROOM_FIXED_GSO net/core/lwt_bpf.c's handle_gso_type() Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Dongseok Yi <dseok.yi@samsung.com> Cc: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/bpf/20210617000953.2787453-3-zenczykowski@gmail.com
2021-06-24mcb: Use DEFINE_RES_MEM() helper macro and fix the end addressZhen Lei
Use DEFINE_RES_MEM() to save a couple of lines of code, which makes the code a bit shorter and easier to read. The start address does not need to appear twice. By the way, the value of '.end' should be "start + size - 1". So the previous writing should have omitted subtracted 1. Fixes: acf5e051ac44 ("MCB: add support for SC31 to mcb-lpc") Fixes: 73edc8f7ccef ("mcb: Added support for LPC or non PCI based MCB carrier") Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20210616073030.834-2-thunder.leizhen@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24PNP: moved EXPORT_SYMBOL so that it immediately followed its function/variableJinchao Wang
change made to resolve following checkpatch message: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable Signed-off-by: Jinchao Wang <wjc@cdjrlc.com> Link: https://lore.kernel.org/r/20210624020929.49968-1-wjc@cdjrlc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24KVM: arm64: Set the MTE tag bit before releasing the pageMarc Zyngier
Setting a page flag without holding a reference to the page is living dangerously. In the tag-writing path, we drop the reference to the page by calling kvm_release_pfn_dirty(), and only then set the PG_mte_tagged bit. It would be safer to do it the other way round. Fixes: f0376edb1ddca ("KVM: arm64: Add ioctl to fetch/store tags in a guest") Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/87k0mjidwb.wl-maz@kernel.org
2021-06-24bus: mhi: pci-generic: Add missing 'pci_disable_pcie_error_reporting()' callsChristophe JAILLET
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call Add the missing call in the error handling path of the probe and in the remove function. Cc: <stable@vger.kernel.org> Fixes: b012ee6bfe2a ("mhi: pci_generic: Add PCI error handlers") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/f70c14701f4922d67e717633c91b6c481b59f298.1623445348.git.christophe.jaillet@wanadoo.fr Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20210621161616.77524-6-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24bus: mhi: Wait for M2 state during system resumeBaochen Qiang
During system resume, MHI host triggers M3->M0 transition and then waits for target device to enter M0 state. Once done, the device queues a state change event into ctrl event ring and notifies MHI host by raising an interrupt, where a tasklet is scheduled to process this event. In most cases, the tasklet is served timely and wait operation succeeds. However, there are cases where CPU is busy and cannot serve this tasklet for some time. Once delay goes long enough, the device moves itself to M1 state and also interrupts MHI host after inserting a new state change event to ctrl ring. Later when CPU finally has time to process the ring, there will be two events: 1. For M3->M0 event, which is the first event to be processed queued first. The tasklet handler serves the event, updates device state to M0 and wakes up the task. 2. For M0->M1 event, which is processed later, the tasklet handler triggers M1->M2 transition and updates device state to M2 directly, then wakes up the MHI host (if it is still sleeping on this wait queue). Note that although MHI host has been woken up while processing the first event, it may still has no chance to run before the second event is processed. In other words, MHI host has to keep waiting till timeout causing the M0 state to be missed. kernel log here: ... Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.911251] mhi 0000:06:00.0: Entered with PM state: M3, MHI state: M3 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917762] mhi 0000:06:00.0: State change event to state: M0 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917767] mhi 0000:06:00.0: State change event to state: M1 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4338.788231] mhi 0000:06:00.0: Did not enter M0 state, MHI state: M2, PM state: M2 ... Fix this issue by simply adding M2 as a valid state for resume. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 Cc: stable@vger.kernel.org Fixes: 0c6b20a1d720 ("bus: mhi: core: Add support for MHI suspend and resume") Signed-off-by: Baochen Qiang <bqiang@codeaurora.org> Reviewed-by: Hemant Kumar <hemantk@codeaurora.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20210524040312.14409-1-bqiang@codeaurora.org [mani: slightly massaged the commit message] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20210621161616.77524-4-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24bus: mhi: core: Fix power down latencyLoic Poulain
On graceful power-down/disable transition, when an MHI reset is performed, the MHI device loses its context, including interrupt configuration. However, the current implementation is waiting for event(irq) driven state change to confirm reset has been completed, which never happens, and causes reset timeout, leading to unexpected high latency of the mhi_power_down procedure (up to 45 seconds). Fix that by moving to the recently introduced poll_reg_field method, waiting for the reset bit to be cleared, in the same way as the power_on procedure. Cc: stable@vger.kernel.org Fixes: a6e2e3522f29 ("bus: mhi: core: Add support for PM state transitions") Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Reviewed-by: Bhaumik Bhatt <bbhatt@codeaurora.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Hemant Kumar <hemantk@codeaurora.org> Link: https://lore.kernel.org/r/1620029090-8975-1-git-send-email-loic.poulain@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20210621161616.77524-3-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24intel_th: Wait until port is in reset before programming itAlexander Shishkin
Some devices don't drain their pipelines if we don't make sure that the corresponding output port is in reset before programming it for a new trace capture, resulting in bits of old trace appearing in the new trace capture. Fix that by explicitly making sure the reset is asserted before programming new trace capture. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20210621151246.31891-5-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24intel_th: msu: Make contiguous buffers uncachedAlexander Shishkin
We already keep the multiblock mode buffers uncached, but forget the single mode. Address this. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20210621151246.31891-4-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24intel_th: Remove an unused exit point from intel_th_remove()Uwe Kleine-König
As described in the added comment device_for_each_child() never returns a non-zero value. So remove the corresponding error check. This simplifies the quest to make struct bus_type::remove() return void. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20210621151246.31891-3-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24stm class: Spelling fixRandy Dunlap
Drop the repeated word "the" in a comment. Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [alexander.shishkin: fixed the commit message] Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20210621151246.31891-2-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24ext4: fsmap: fix the block/inode bitmap commentRitesh Harjani
While debugging fstest ext4/027 failure, found below comment to be wrong and confusing. Hence fix it while we are at it. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/e79134132db7ea42f15747b5c669ee91cc1aacdf.1622432690.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24nitro_enclaves: Set Bus Master for the NE PCI deviceLongpeng(Mike)
Enable Bus Master for the NE PCI device, according to the PCI spec for submitting memory or I/O requests: Master Enable – Controls the ability of a PCI Express Endpoint to issue Memory and I/O Read/Write Requests, and the ability of a Root or Switch Port to forward Memory and I/O Read/Write Requests in the Upstream direction Cc: Andra Paraschiv <andraprs@amazon.com> Cc: Alexandru Vasile <lexnv@amazon.com> Cc: Alexandru Ciobotaru <alcioa@amazon.com> Reviewed-by: Andra Paraschiv <andraprs@amazon.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20210621004046.1419-1-longpeng2@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24bpf: Do not change gso_size during bpf_skb_change_proto()Maciej Żenczykowski
This is technically a backwards incompatible change in behaviour, but I'm going to argue that it is very unlikely to break things, and likely to fix *far* more then it breaks. In no particular order, various reasons follow: (a) I've long had a bug assigned to myself to debug a super rare kernel crash on Android Pixel phones which can (per stacktrace) be traced back to BPF clat IPv6 to IPv4 protocol conversion causing some sort of ugly failure much later on during transmit deep in the GSO engine, AFAICT precisely because of this change to gso_size, though I've never been able to manually reproduce it. I believe it may be related to the particular network offload support of attached USB ethernet dongle being used for tethering off of an IPv6-only cellular connection. The reason might be we end up with more segments than max permitted, or with a GSO packet with only one segment... (either way we break some assumption and hit a BUG_ON) (b) There is no check that the gso_size is > 20 when reducing it by 20, so we might end up with a negative (or underflowing) gso_size or a gso_size of 0. This can't possibly be good. Indeed this is probably somehow exploitable (or at least can result in a kernel crash) by delivering crafted packets and perhaps triggering an infinite loop or a divide by zero... As a reminder: gso_size (MSS) is related to MTU, but not directly derived from it: gso_size/MSS may be significantly smaller then one would get by deriving from local MTU. And on some NICs (which do loose MTU checking on receive, it may even potentially be larger, for example my work pc with 1500 MTU can receive 1520 byte frames [and sometimes does due to bugs in a vendor plat46 implementation]). Indeed even just going from 21 to 1 is potentially problematic because it increases the number of segments by a factor of 21 (think DoS, or some other crash due to too many segments). (c) It's always safe to not increase the gso_size, because it doesn't result in the max packet size increasing. So the skb_increase_gso_size() call was always unnecessary for correctness (and outright undesirable, see later). As such the only part which is potentially dangerous (ie. could cause backwards compatibility issues) is the removal of the skb_decrease_gso_size() call. (d) If the packets are ultimately destined to the local device, then there is absolutely no benefit to playing around with gso_size. It only matters if the packets will egress the device. ie. we're either forwarding, or transmitting from the device. (e) This logic only triggers for packets which are GSO. It does not trigger for skbs which are not GSO. It will not convert a non-GSO MTU sized packet into a GSO packet (and you don't even know what the MTU is, so you can't even fix it). As such your transmit path must *already* be able to handle an MTU 20 bytes larger then your receive path (for IPv4 to IPv6 translation) - and indeed 28 bytes larger due to IPv4 fragments. Thus removing the skb_decrease_gso_size() call doesn't actually increase the size of the packets your transmit side must be able to handle. ie. to handle non-GSO max-MTU packets, the IPv4/IPv6 device/ route MTUs must already be set correctly. Since for example with an IPv4 egress MTU of 1500, IPv4 to IPv6 translation will already build 1520 byte IPv6 frames, so you need a 1520 byte device MTU. This means if your IPv6 device's egress MTU is 1280, your IPv4 route must be 1260 (and actually 1252, because of the need to handle fragments). This is to handle normal non-GSO packets. Thus the reduction is simply not needed for GSO packets, because when they're correctly built, they will already be the right size. (f) TSO/GSO should be able to exactly undo GRO: the number of packets (TCP segments) should not be modified, so that TCP's MSS counting works correctly (this matters for congestion control). If protocol conversion changes the gso_size, then the number of TCP segments may increase or decrease. Packet loss after protocol conversion can result in partial loss of MSS segments that the sender sent. How's the sending TCP stack going to react to receiving ACKs/SACKs in the middle of the segments it sent? (g) skb_{decrease,increase}_gso_size() are already no-ops for GSO_BY_FRAGS case (besides triggering WARN_ON_ONCE). This means you already cannot guarantee that gso_size (and thus resulting packet MTU) is changed. ie. you must assume it won't be changed. (h) changing gso_size is outright buggy for UDP GSO packets, where framing matters (I believe that's also the case for SCTP, but it's already excluded by [g]). So the only remaining case is TCP, which also doesn't want it (see [f]). (i) see also the reasoning on the previous attempt at fixing this (commit fa7b83bf3b156c767f3e4a25bbf3817b08f3ff8e) which shows that the current behaviour causes TCP packet loss: In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the coalesced packet payload can be > MSS, but < MSS + 20. bpf_skb_proto_6_to_4() will upgrade the MSS and it can be > the payload length. After then tcp_gso_segment checks for the payload length if it is <= MSS. The condition is causing the packet to be dropped. tcp_gso_segment(): [...] mss = skb_shinfo(skb)->gso_size; if (unlikely(skb->len <= mss)) goto out; [...] Thus changing the gso_size is simply a very bad idea. Increasing is unnecessary and buggy, and decreasing can go negative. Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Dongseok Yi <dseok.yi@samsung.com> Cc: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/bpf/CANP3RGfjLikQ6dg=YpBU0OeHvyv7JOki7CyOUS9modaXAi-9vQ@mail.gmail.com Link: https://lore.kernel.org/bpf/20210617000953.2787453-2-zenczykowski@gmail.com
2021-06-24misc: ibmasm: Modify matricies to matricesGuoqing Chi
The plural of "matrix" is "matrices". Signed-off-by: Guoqing Chi <chiguoqing@yulong.com> Link: https://lore.kernel.org/r/20210621031100.13093-1-chi962464zy@163.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24misc: vmw_vmci: return the correct errno codeJunlin Yang
When kzalloc failed, should return -ENOMEM rather than -EINVAL. Signed-off-by: Junlin Yang <yangjunlin@yulong.com> Link: https://lore.kernel.org/r/20210619112854.1720-1-angkery@163.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24siox: Simplify error handling via dev_err_probe()Thorsten Scherer
commit a787e5400a1c ("driver core: add device probe log helper") introduced a helper for a common error checking pattern. Use it. Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thorsten Scherer <t.scherer@eckelmann.de> Link: https://lore.kernel.org/r/20210616061736.3786173-2-t.scherer@eckelmann.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24fpga: machxo2-spi: Address warning about unused variableMoritz Fischer
Address warning about unused variable in case CONFIG_OF is not set. warning: unused variable 'of_match' [-Wunused-const-variable] static const struct of_device_id of_match[] = { Fixes: 88fb3a002330 ("fpga: lattice machxo2: Add Lattice MachXO2 support") Cc: Arnd Bergmann <arnd@arndb.de> Cc: Tom Rix <trix@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Moritz Fischer <mdf@kernel.org> Link: https://lore.kernel.org/r/20210618224618.1487323-1-mdf@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24ext4: fix comment for s_hash_unsignedEric Biggers
Fix the comment for s_hash_unsigned to not be the opposite of what it actually is. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20210527235557.2377525-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24HID: input: Add support for Programmable ButtonsThomas Weißschuh
Map them to KEY_MACRO# event codes. These buttons are defined by HID as follows: "The user defines the function of these buttons to control software applications or GUI objects." This matches the semantics of the KEY_MACRO# input event codes that Linux supports. Also add support for HID "Named Array" collections. Also add hid-debug support for KEY_MACRO#. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2021-06-24drm/nouveau: fix dma_address check for CPU/GPU syncChristian König
AGP for example doesn't have a dma_address array. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210614110517.1624-1-christian.koenig@amd.com
2021-06-24Revert "bpf: Check for BPF_F_ADJ_ROOM_FIXED_GSO when bpf_skb_change_proto"Maciej Żenczykowski
This reverts commit fa7b83bf3b156c767f3e4a25bbf3817b08f3ff8e. See the followup commit for the reasoning why I believe the appropriate approach is to simply make this change without a flag, but it can basically be summarized as using this helper without the flag is bug-prone or outright buggy, and thus the default should be this new behaviour. As this commit has only made it into net-next/master, but not into any real release, such a backwards incompatible change is still ok. Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Dongseok Yi <dseok.yi@samsung.com> Cc: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/bpf/20210617000953.2787453-1-zenczykowski@gmail.com
2021-06-24lkdtm/heap: Add init_on_alloc testsKees Cook
Add SLAB and page allocator tests for init_on_alloc. Testing for init_on_free was already happening via the poisoning tests. Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210623203936.3151093-10-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24selftests/lkdtm: Enable various testable CONFIGsKees Cook
Add a handful of LKDTM-testable features that depend on certain CONFIGs so that they are visible in logs for CI systems that run the selftests. Others could be added, but may be seen as having too high a trade-off for general testing. Cc: kernelci@groups.io Suggested-by: Guillaume Tucker <guillaume.tucker@collabora.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210623203936.3151093-9-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24lkdtm: Add CONFIG hints in errors where possibleKees Cook
For various failure conditions, try to include some details about where to look for reasons about the failure. Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210623203936.3151093-8-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24lkdtm: Enable DOUBLE_FAULT on all architecturesKees Cook
Where feasible, I prefer to have all tests visible on all architectures, but to have them wired to XFAIL. DOUBLE_FAIL was set up to XFAIL, but wasn't actually being added to the test list. Fixes: cea23efb4de2 ("lkdtm/bugs: Make double-fault test always available") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210623203936.3151093-7-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24lkdtm/heap: Add vmalloc linear overflow testKees Cook
Similar to the existing slab overflow and stack exhaustion tests, add VMALLOC_LINEAR_OVERFLOW (and rename the slab test SLAB_LINEAR_OVERFLOW). Additionally unmarks the test as destructive. (It should be safe in the face of misbehavior.) Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210623203936.3151093-6-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24lkdtm/bugs: XFAIL UNALIGNED_LOAD_STORE_WRITEKees Cook
When built under CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, this test is expected to fail (i.e. not trip an exception). Fixes: 46d1a0f03d66 ("selftests/lkdtm: Add tests for LKDTM targets") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210623203936.3151093-5-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24selftests/lkdtm: Fix expected text for free poisonKees Cook
Freed memory poisoning can be tested a few ways, so update the expected text to reflect the non-Oopsing alternative. Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210623203936.3151093-4-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24selftests/lkdtm: Fix expected text for CR4 pinningKees Cook
The error text for CR4 pinning changed. Update the test to match. Fixes: a13b9d0b9721 ("x86/cpu: Use pinning mask for CR4 bits needing to be 0") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210623203936.3151093-3-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24selftests/lkdtm: Avoid needing explicit sub-shellKees Cook
Some environments do not set $SHELL when running tests. There's no need to use $SHELL here anyway, since "cat" can be used to receive any delivered signals from the kernel. Additionally avoid using bash-isms in the command, and record stderr for posterity. Fixes: 46d1a0f03d66 ("selftests/lkdtm: Add tests for LKDTM targets") Cc: stable@vger.kernel.org Suggested-by: Guillaume Tucker <guillaume.tucker@collabora.com> Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210623203936.3151093-2-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>