summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-04-20ELF: document some de-facto PT_* ABI quirksAlexey Dobriyan
Turns out rules about PT_INTERP, PT_GNU_STACK and PT_GNU_PROPERTY program headers are slightly different. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Link: https://lore.kernel.org/r/88d3f1bb-f4e0-4c40-9304-3843513a1262@p183 Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-04-20Documentation: arm: remove stih415/stih416 related entriesAlain Volmat
ST's STiH415 and STiH416 platforms support have been removed since a long time already. This commit updates the sti related documentation overview to remove related entries and update the sti part to add STiH407/STiH410 and STiH418 platforms which are still actively supported. Signed-off-by: Alain Volmat <avolmat@me.com> Link: https://lore.kernel.org/r/20230416185349.18156-1-avolmat@me.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-04-20docs: turn off "smart quotes" in the HTML buildJonathan Corbet
We have long disabled the "html_use_smartypants" option to prevent Sphinx from mangling "--" sequences (among others). Unfortunately, Sphinx changed that option to "smartquotes" in the 1.6.6 release, and seemingly didn't see fit to warn about the use of the obsolete option, resulting in the aforementioned mangling returning. Disable this behavior again and hope that the option name stays stable for a while. Reported-by: Zipeng Zhang <zhangzipeng0@foxmail.com> Link: https://lore.kernel.org/lkml/tencent_CB1A298D31FD221496FF657CD7EF406E6605@qq.com Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-04-20Merge tag 'pci-v6.3-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fix from Bjorn Helgaas: - Previously we ignored PCI devices if the DT "status" property or the ACPI _STA method said it was not present. Per spec, _STA cannot be used for that purpose, and using it that way caused regressions, so skip the _STA check (Rob Herring) * tag 'pci-v6.3-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: Restrict device disabled status check to DT
2023-04-21btrfs: reinterpret async discard iops_limit=0 as no delayBoris Burkov
Currently, a limit of 0 results in a hard coded metering over 6 hours. Since the default is a set limit, I suspect no one truly depends on this rather arbitrary setting. Repurpose it for an arguably more useful "unlimited" mode, where the delay is 0. Note that if block groups are too new, or go fully empty, there is still a delay associated with those conditions. Those delays implement heuristics for not trimming a region we are relatively likely to fully overwrite soon. CC: stable@vger.kernel.org # 6.2+ Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-21btrfs: set default discard iops_limit to 1000Boris Burkov
Previously, the default was a relatively conservative 10. This results in a 100ms delay, so with ~300 discards in a commit, it takes the full 30s till the next commit to finish the discards. On a workstation, this results in the disk never going idle, wasting power/battery, etc. Set the default to 1000, which results in using the smallest possible delay, currently, which is 1ms. This has shown to not pathologically keep the disk busy by the original reporter. Link: https://lore.kernel.org/linux-btrfs/Y%2F+n1wS%2F4XAH7X1p@nz/ Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182228 CC: stable@vger.kernel.org # 6.2+ Reviewed-by: Neal Gompa <neal@gompa.dev Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-20wifi: ath9k: Don't mark channelmap stack variable read-only in ↵Toke Høiland-Jørgensen
ath9k_mci_update_wlan_channels() This partially reverts commit e161d4b60ae3a5356e07202e0bfedb5fad82c6aa. Turns out the channelmap variable is not actually read-only, it's modified through the MCI_GPM_CLR_CHANNEL_BIT() macro further down in the function, so making it read-only causes page faults when that code is hit. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217183 Link: https://lore.kernel.org/r/20230413214118.153781-1-toke@toke.dk Fixes: e161d4b60ae3 ("wifi: ath9k: Make arrays prof_prio and channelmap static const") Cc: stable@vger.kernel.org Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-20Merge tag 'rust-fixes-6.3' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull Rust fixes from Miguel Ojeda: "Most of these are straightforward. The last one is more complex, but it only touches Rust + GCC builds which are for the moment best-effort. - Code: Missing 'extern "C"' fix. - Scripts: 'is_rust_module.sh' and 'generate_rust_analyzer.py' fixes. - A couple trivial fixes - Build: Rust + GCC build fix and 'grep' warning fix" * tag 'rust-fixes-6.3' of https://github.com/Rust-for-Linux/linux: rust: allow to use INIT_STACK_ALL_ZERO rust: fix regexp in scripts/is_rust_module.sh rust: build: Fix grep warning scripts: generate_rust_analyzer: Handle sub-modules with no Makefile rust: kernel: Mark rust_fmt_argument as extern "C" rust: sort uml documentation arch support table rust: str: fix requierments->requirements typo
2023-04-20PCI: Restrict device disabled status check to DTRob Herring
Commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") checked the firmware device status for both DT and ACPI devices. That caused a regression in some ACPI systems. The exact reason isn't clear. It's possibly a firmware bug. For now, at least, refactor the check to be for DT based systems only. Note that the original implementation leaked a refcount which is now correctly handled. [bhelgaas: Per ACPI r6.5, sec 6.3.7, for devices on an enumerable bus, _STA must return with bit[0] ("device is present") set] Link: https://lore.kernel.org/all/m2fs9lgndw.fsf@gmail.com/ Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") Link: https://lore.kernel.org/r/20230419193513.708818-1-robh@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=217317 Reported-by: Donald Hunter <donald.hunter@gmail.com> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Donald Hunter <donald.hunter@gmail.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Binbin Zhou <zhoubinbin@loongson.cn> Cc: Liu Peibao <liupeibao@loongson.cn> Cc: Huacai Chen <chenhuacai@loongson.cn>
2023-04-20Merge tag 'net-6.3-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. There are a few fixes for new code bugs, including the Mellanox one noted in the last networking pull. No known regressions outstanding. Current release - regressions: - sched: clear actions pointer in miss cookie init fail - mptcp: fix accept vs worker race - bpf: fix bpf_arch_text_poke() with new_addr == NULL on s390 - eth: bnxt_en: fix a possible NULL pointer dereference in unload path - eth: veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag Current release - new code bugs: - eth: revert "net/mlx5: Enable management PF initialization" Previous releases - regressions: - netfilter: fix recent physdev match breakage - bpf: fix incorrect verifier pruning due to missing register precision taints - eth: virtio_net: fix overflow inside xdp_linearize_page() - eth: cxgb4: fix use after free bugs caused by circular dependency problem - eth: mlxsw: pci: fix possible crash during initialization Previous releases - always broken: - sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg - netfilter: validate catch-all set elements - bridge: don't notify FDB entries with "master dynamic" - eth: bonding: fix memory leak when changing bond type to ethernet - eth: i40e: fix accessing vsi->active_filters without holding lock Misc: - Mat is back as MPTCP co-maintainer" * tag 'net-6.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits) net: bridge: switchdev: don't notify FDB entries with "master dynamic" Revert "net/mlx5: Enable management PF initialization" MAINTAINERS: Resume MPTCP co-maintainer role mailmap: add entries for Mat Martineau e1000e: Disable TSO on i219-LM card to increase speed bnxt_en: fix free-runnig PHC mode net: dsa: microchip: ksz8795: Correctly handle huge frame configuration bpf: Fix incorrect verifier pruning due to missing register precision taints hamradio: drop ISA_DMA_API dependency mlxsw: pci: Fix possible crash during initialization mptcp: fix accept vs worker race mptcp: stops worker on unaccepted sockets at listener close net: rpl: fix rpl header size calculation net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete() bonding: Fix memory leak when changing bond type to Ethernet veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next() bnxt_en: Fix a possible NULL pointer dereference in unload path bnxt_en: Do not initialize PTP on older P3/P4 chips netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements ...
2023-04-20Merge branch 'for-next/sysreg' into for-next/coreWill Deacon
* for-next/sysreg: arm64/sysreg: Convert HFGITR_EL2 to automatic generation arm64/idreg: Don't disable SME when disabling SVE arm64/sysreg: Update ID_AA64PFR1_EL1 for DDI0601 2022-12 arm64/sysreg: Convert HFG[RW]TR_EL2 to automatic generation arm64/sysreg: allow *Enum blocks in SysregFields blocks
2023-04-20Merge branch 'for-next/stacktrace' into for-next/coreWill Deacon
* for-next/stacktrace: arm64: move PAC masks to <asm/pointer_auth.h> arm64: use XPACLRI to strip PAC arm64: avoid redundant PAC stripping in __builtin_return_address() arm64: stacktrace: always inline core stacktrace functions arm64: stacktrace: move dump functions to end of file arm64: stacktrace: recover return address for first entry
2023-04-20Merge branch 'for-next/perf' into for-next/coreWill Deacon
* for-next/perf: (24 commits) KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege drivers/perf: hisi: add NULL check for name drivers/perf: hisi: Remove redundant initialized of pmu->name perf/arm-cmn: Fix port detection for CMN-700 arm64: pmuv3: dynamically map PERF_COUNT_HW_BRANCH_INSTRUCTIONS perf/arm-cmn: Validate cycles events fully Revert "ARM: mach-virt: Select PMUv3 driver by default" drivers/perf: apple_m1: Add Apple M2 support dt-bindings: arm-pmu: Add PMU compatible strings for Apple M2 cores perf: arm_cspmu: Fix variable dereference warning perf/amlogic: Fix config1/config2 parsing issue drivers/perf: Use devm_platform_get_and_ioremap_resource() kbuild, drivers/perf: remove MODULE_LICENSE in non-modules perf: qcom: Use devm_platform_get_and_ioremap_resource() perf: arm: Use devm_platform_get_and_ioremap_resource() perf/arm-cmn: Move overlapping wp_combine field ARM: mach-virt: Select PMUv3 driver by default ARM: perf: Allow the use of the PMUv3 driver on 32bit ARM ARM: Make CONFIG_CPU_V7 valid for 32bit ARMv8 implementations perf: pmuv3: Change GENMASK to GENMASK_ULL ...
2023-04-20KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilegeWill Deacon
Although pKVM supports CPU PMU emulation for non-protected guests since 722625c6f4c5 ("KVM: arm64: Reenable pmu in Protected Mode"), this relies on the PMU driver probing before the host has de-privileged so that the 'kvm_arm_pmu_available' static key can still be enabled by patching the hypervisor text. As it happens, both of these events hang off device_initcall() but the PMU consistently won the race until 7755cec63ade ("arm64: perf: Move PMUv3 driver to drivers/perf"). Since then, the host will fail to boot when pKVM is enabled: | hw perfevents: enabled with armv8_pmuv3_0 PMU driver, 7 counters available | kvm [1]: nVHE hyp BUG at: [<ffff8000090366e0>] __kvm_nvhe_handle_host_mem_abort+0x270/0x284! | kvm [1]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE | kvm [1]: Hyp Offset: 0xfffea41fbdf70000 | Kernel panic - not syncing: HYP panic: | PS:a00003c9 PC:0000dbe04b0c66e0 ESR:00000000f2000800 | FAR:fffffbfffddfcf00 HPFAR:00000000010b0bf0 PAR:0000000000000000 | VCPU:0000000000000000 | CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc7-00083-g0bce6746d154 #1 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x13c/0x33c | nvhe_hyp_panic_handler+0x10c/0x190 | aarch64_insn_patch_text_nosync+0x64/0xc8 | arch_jump_label_transform+0x4c/0x5c | __jump_label_update+0x84/0xfc | jump_label_update+0x100/0x134 | static_key_enable_cpuslocked+0x68/0xac | static_key_enable+0x20/0x34 | kvm_host_pmu_init+0x88/0xa4 | armpmu_register+0xf0/0xf4 | arm_pmu_acpi_probe+0x2ec/0x368 | armv8_pmu_driver_init+0x38/0x44 | do_one_initcall+0xcc/0x240 Fix the race properly by deferring the de-privilege step to device_initcall_sync(). This will also be needed in future when probing IOMMU devices and allows us to separate the pKVM de-privilege logic from the core hypervisor initialisation path. Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Fixes: 7755cec63ade ("arm64: perf: Move PMUv3 driver to drivers/perf") Tested-by: Fuad Tabba <tabba@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230420123356.2708-1-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-04-20Revert "block: Merge bio before checking ->cached_rq"Ming Lei
This reverts commit 23f3e3272e7a4d9fb870485cd6df1e4f9539282c. blk-mq sched bio merge still needs request to grab queue usage counter, so we can't simply call blk_mq_attempt_bio_merge() when queue usage counter isn't held. Fixes: 23f3e3272e7a ("block: Merge bio before checking ->cached_rq") Cc: Xiao Ni <xni@redhat.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230420112018.1108058-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-20Merge branch 'for-next/mm' into for-next/coreWill Deacon
* for-next/mm: arm64: mm: always map fixmap at page granularity arm64: mm: move fixmap code to its own file arm64: add FIXADDR_TOT_{START,SIZE} Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"" arm: uaccess: Remove memcpy_page_flushcache() mm,kfence: decouple kfence from page granularity mapping judgement
2023-04-20Merge branch 'for-next/misc' into for-next/coreWill Deacon
* for-next/misc: arm64: kexec: include reboot.h arm64: delete dead code in this_cpu_set_vectors() arm64: kernel: Fix kernel warning when nokaslr is passed to commandline arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step arm64/sme: Fix some comments of ARM SME arm64/signal: Alloc tpidr2 sigframe after checking system_supports_tpidr2() arm64/signal: Use system_supports_tpidr2() to check TPIDR2 arm64: compat: Remove defines now in asm-generic arm64: kexec: remove unnecessary (void*) conversions arm64: armv8_deprecated: remove unnecessary (void*) conversions firmware: arm_sdei: Fix sleep from invalid context BUG
2023-04-20Merge branch 'for-next/kdump' into for-next/coreWill Deacon
* for-next/kdump: arm64: kdump: defer the crashkernel reservation for platforms with no DMA memory zones arm64: kdump: do not map crashkernel region specifically arm64: kdump : take off the protection on crashkernel memory region
2023-04-20Merge branch 'for-next/ftrace' into for-next/coreWill Deacon
* for-next/ftrace: arm64: ftrace: Simplify get_ftrace_plt arm64: ftrace: Add direct call support ftrace: selftest: remove broken trace_direct_tramp ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS ftrace: Store direct called addresses in their ops ftrace: Rename _ftrace_direct_multi APIs to _ftrace_direct APIs ftrace: Remove the legacy _ftrace_direct API ftrace: Replace uses of _ftrace_direct APIs with _ftrace_direct_multi ftrace: Let unregister_ftrace_direct_multi() call ftrace_free_filter()
2023-04-20Merge branch 'for-next/cpufeature' into for-next/coreWill Deacon
* for-next/cpufeature: arm64/cpufeature: Use helper macro to specify ID register for capabilites arm64/cpufeature: Consistently use symbolic constants for min_field_value arm64/cpufeature: Pull out helper for CPUID register definitions
2023-04-20Merge branch 'for-next/asm' into for-next/coreWill Deacon
* for-next/asm: arm64: uaccess: remove unnecessary earlyclobber arm64: uaccess: permit put_{user,kernel} to use zero register arm64: uaccess: permit __smp_store_release() to use zero register arm64: atomics: lse: improve cmpxchg implementation
2023-04-20Merge branch 'for-next/acpi' into for-next/coreWill Deacon
* for-next/acpi: ACPI: AGDI: Improve error reporting for problems during .remove()
2023-04-20arm64: kexec: include reboot.hSimon Horman
Include reboot.h in machine_kexec.c for declaration of machine_crash_shutdown. gcc-12 with W=1 reports: arch/arm64/kernel/machine_kexec.c:257:6: warning: no previous prototype for 'machine_crash_shutdown' [-Wmissing-prototypes] 257 | void machine_crash_shutdown(struct pt_regs *regs) No functional changes intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230418-arm64-kexec-include-reboot-v1-1-8453fd4fb3fb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-04-20arm64: delete dead code in this_cpu_set_vectors()Dan Carpenter
The "slot" variable is an enum, and in this context it is an unsigned int. So the type means it can never be negative and also we never pass invalid data to this function. If something did pass invalid data then this check would be insufficient protection. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/73859c9e-dea0-4764-bf01-7ae694fa2e37@kili.mountain Signed-off-by: Will Deacon <will@kernel.org>
2023-04-20net: bridge: switchdev: don't notify FDB entries with "master dynamic"Vladimir Oltean
There is a structural problem in switchdev, where the flag bits in struct switchdev_notifier_fdb_info (added_by_user, is_local etc) only represent a simplified / denatured view of what's in struct net_bridge_fdb_entry :: flags (BR_FDB_ADDED_BY_USER, BR_FDB_LOCAL etc). Each time we want to pass more information about struct net_bridge_fdb_entry :: flags to struct switchdev_notifier_fdb_info (here, BR_FDB_STATIC), we find that FDB entries were already notified to switchdev with no regard to this flag, and thus, switchdev drivers had no indication whether the notified entries were static or not. For example, this command: ip link add br0 type bridge && ip link set swp0 master br0 bridge fdb add dev swp0 00:01:02:03:04:05 master dynamic has never worked as intended with switchdev. It causes a struct net_bridge_fdb_entry to be passed to br_switchdev_fdb_notify() which has a single flag set: BR_FDB_ADDED_BY_USER. This is further passed to the switchdev notifier chain, where interested drivers have no choice but to assume this is a static (does not age) and sticky (does not migrate) FDB entry. So currently, all drivers offload it to hardware as such, as can be seen below ("offload" is set). bridge fdb get 00:01:02:03:04:05 dev swp0 master 00:01:02:03:04:05 dev swp0 offload master br0 The software FDB entry expires $ageing_time centiseconds after the kernel last sees a packet with this MAC SA, and the bridge notifies its deletion as well, so it eventually disappears from hardware too. This is a problem, because it is actually desirable to start offloading "master dynamic" FDB entries correctly - they should expire $ageing_time centiseconds after the *hardware* port last sees a packet with this MAC SA - and this is how the current incorrect behavior was discovered. With an offloaded data plane, it can be expected that software only sees exception path packets, so an otherwise active dynamic FDB entry would be aged out by software sooner than it should. With the change in place, these FDB entries are no longer offloaded: bridge fdb get 00:01:02:03:04:05 dev swp0 master 00:01:02:03:04:05 dev swp0 master br0 and this also constitutes a better way (assuming a backport to stable kernels) for user space to determine whether the kernel has the capability of doing something sane with these or not. As opposed to "master dynamic" FDB entries, on the current behavior of which no one currently depends on (which can be deduced from the lack of kselftests), Ido Schimmel explains that entries with the "extern_learn" flag (BR_FDB_ADDED_BY_EXT_LEARN) should still be notified to switchdev, since the spectrum driver listens to them (and this is kind of okay, because although they are treated identically to "static", they are expected to not age, and to roam). Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del") Link: https://lore.kernel.org/netdev/20230327115206.jk5q5l753aoelwus@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230418155902.898627-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-20ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBookAndy Chi
There is a HP ProBook 455 G10 which using ALC236 codec and need the ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF quirk to make mute LED and micmute LED work. Signed-off-by: Andy Chi <andy.chi@canonical.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20230420035942.66817-1-andy.chi@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-04-20Merge tag 'asoc-fix-v6.3-rc7' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.3 A few remaining small fixes for v6.3, all small driver specific ones.
2023-04-19x86: remove 'zerorest' argument from __copy_user_nocache()Linus Torvalds
Every caller passes in zero, meaning they don't want any partial copy to zero the remainder of the destination buffer. Which is just as well, because the implementation of that function didn't actually even look at that argument, and wasn't even aware it existed, although some misleading comments did mention it still. The 'zerorest' thing is a historical artifact of how "copy_from_user()" worked, in that it would zero the rest of the kernel buffer that it copied into. That zeroing still exists, but it's long since been moved to generic code, and the raw architecture-specific code doesn't do it. See _copy_from_user() in lib/usercopy.c for this all. However, while __copy_user_nocache() shares some history and superficial other similarities with copy_from_user(), it is in many ways also very different. In particular, while the code makes it *look* similar to the generic user copy functions that can copy both to and from user space, and take faults on both reads and writes as a result, __copy_user_nocache() does no such thing at all. __copy_user_nocache() always copies to kernel space, and will never take a page fault on the destination. What *can* happen, though, is that the non-temporal stores take a machine check because one of the use cases is for writing to stable memory, and any memory errors would then take synchronous faults. So __copy_user_nocache() does look a lot like copy_from_user(), but has faulting behavior that is more akin to our old copy_in_user() (which no longer exists, but copied from user space to user space and could fault on both source and destination). And it very much does not have the "zero the end of the destination buffer", since a problem with the destination buffer is very possibly the very source of the partial copy. So this whole thing was just a confusing historical artifact from having shared some code with a completely different function with completely different use cases. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-19Revert "net/mlx5: Enable management PF initialization"Jakub Kicinski
This reverts commit fe998a3c77b9f989a30a2a01fb00d3729a6d53a4. Paul reports that it causes a regression with IB on CX4 and FW 12.18.1000. In addition I think that the concept of "management PF" is not fully accepted and requires a discussion. Fixes: fe998a3c77b9 ("net/mlx5: Enable management PF initialization") Reported-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/all/CAHC9VhQ7A4+msL38WpbOMYjAqLp0EtOjeLh4Dc6SQtD6OUvCQg@mail.gmail.com/ Link: https://lore.kernel.org/r/20230413222547.56901-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2023-04-19 We've added 3 non-merge commits during the last 6 day(s) which contain a total of 3 files changed, 34 insertions(+), 9 deletions(-). The main changes are: 1) Fix a crash on s390's bpf_arch_text_poke() under a NULL new_addr, from Ilya Leoshkevich. 2) Fix a bug in BPF verifier's precision tracker, from Daniel Borkmann and Andrii Nakryiko. 3) Fix a regression in veth's xdp_features which led to a broken BPF CI selftest, from Lorenzo Bianconi. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix incorrect verifier pruning due to missing register precision taints veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL ==================== Link: https://lore.kernel.org/r/20230419195847.27060-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19MAINTAINERS: Resume MPTCP co-maintainer roleMat Martineau
I'm returning to the MPTCP maintainer role I held for most of the subsytem's history. This time I'm using my kernel.org email address. Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/mptcp/af85e467-8d0a-4eba-b5f8-e2f2c5d24984@tessares.net/ Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230418231318.115331-1-martineau@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19mailmap: add entries for Mat MartineauMatthieu Baerts
Map Mat's old corporate addresses to his kernel.org one. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20230418-upstream-net-20230418-mailmap-mat-v1-1-13ca5dc83037@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-04-17 (i40e) This series contains updates to i40e only. Alex moves setting of active filters to occur under lock and checks/takes error path in rebuild if re-initializing the misc interrupt vector failed. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: fix i40e_setup_misc_vector() error handling i40e: fix accessing vsi->active_filters without holding lock ==================== Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230417205245.1030733-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19Merge tag 'mm-hotfixes-stable-2023-04-19-16-36' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "22 hotfixes. 19 are cc:stable and the remainder address issues which were introduced during this merge cycle, or aren't considered suitable for -stable backporting. 19 are for MM and the remainder are for other subsystems" * tag 'mm-hotfixes-stable-2023-04-19-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) nilfs2: initialize unused bytes in segment summary blocks mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages mm/mmap: regression fix for unmapped_area{_topdown} maple_tree: fix mas_empty_area() search maple_tree: make maple state reusable after mas_empty_area_rev() mm: kmsan: handle alloc failures in kmsan_ioremap_page_range() mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush() tools/Makefile: do missed s/vm/mm/ mm: fix memory leak on mm_init error handling mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() Revert "userfaultfd: don't fail on unrecognized features" writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used mailmap: update jtoppins' entry to reference correct email mm/mempolicy: fix use-after-free of VMA iterator mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO mm/mprotect: fix do_mprotect_pkey() return on error mm/khugepaged: check again on anon uffd-wp during isolation ...
2023-04-19e1000e: Disable TSO on i219-LM card to increase speedSebastian Basierski
While using i219-LM card currently it was only possible to achieve about 60% of maximum speed due to regression introduced in Linux 5.8. This was caused by TSO not being disabled by default despite commit f29801030ac6 ("e1000e: Disable TSO for buffer overrun workaround"). Fix that by disabling TSO during driver probe. Fixes: f29801030ac6 ("e1000e: Disable TSO for buffer overrun workaround") Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230417205345.1030801-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19bnxt_en: fix free-runnig PHC modeVadim Fedorenko
The patch in fixes changed the way real-time mode is chosen for PHC on the NIC. Apparently there is one more use case of the check outside of ptp part of the driver which was not converted to the new macro and is making a lot of noise in free-running mode. Fixes: 131db4991622 ("bnxt_en: reset PHC frequency in free-running mode") Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230418202511.1544735-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19Merge tag 'spi-fix-v6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "A small fix in the error handling for the rockchip driver, ensuring we don't leak clock enables if we fail to request the interrupt for the device" * tag 'spi-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-rockchip: Fix missing unwind goto in rockchip_sfc_probe()
2023-04-19Merge tag 'regulator-fix-v6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A few driver specific fixes, one build coverage issue and a couple of 'someone typed in the wrong number' style errors in describing devices to the subsystem" * tag 'regulator-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: sm5703: Fix missing n_voltages for fixed regulators regulator: fan53555: Fix wrong TCS_SLEW_MASK regulator: fan53555: Explicitly include bits header
2023-04-19rpmsg: glink: Consolidate TX_DATA and TX_DATA_CONTBjorn Andersson
Rather than duplicating most of the code for constructing the initial TX_DATA and subsequent TX_DATA_CONT packets, roll them into a single loop. Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Reviewed-by: Chris Lew <quic_clew@quicinc.com> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230418163018.785524-3-quic_bjorande@quicinc.com
2023-04-19rpmsg: glink: Propagate TX failures in intentless mode as wellBjorn Andersson
As support for splitting transmission over several messages using TX_DATA_CONT was introduced it does not immediately return the return value of qcom_glink_tx(). The result is that in the intentless case (i.e. intent == NULL), the code will continue to send all additional chunks. This is wasteful, and it's possible that the send operation could incorrectly indicate success, if the last chunk fits in the TX fifo. Fix the condition. Fixes: 8956927faed3 ("rpmsg: glink: Add TX_DATA_CONT command while sending") Reviewed-by: Chris Lew <quic_clew@quicinc.com> Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230418163018.785524-2-quic_bjorande@quicinc.com
2023-04-19rpmsg: glink: Wait for intent, not just request ackBjorn Andersson
In some implementations of the remote firmware, an intent request acknowledgment is sent when it's determined if the intent allocation will be fulfilled, but then the intent is queued after the acknowledgment. The result is that upon receiving a granted allocation request, the search for the newly allocated intent will fail and an additional request will be made. This will at best waste memory, but if the second request is rejected the transaction will be incorrectly rejected. Take the incoming intent into account in the wait to mitigate this problem. The above scenario can still happen, in the case that, on that same channel, an unrelated intent is delivered prior to the request acknowledgment and a separate process enters the send path and picks up the intent. Given that there's no relationship between the acknowledgment and the delivered (or to be delivered intent), there doesn't seem to be a solution to this problem. Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Reviewed-by: Chris Lew <quic_clew@quicinc.com> [bjorn: Fixed spelling issues pointed out by checkpatch in commit message] Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230327144617.3134175-3-quic_bjorande@quicinc.com
2023-04-19rpmsg: glink: Transition intent request signaling to wait queueBjorn Andersson
Transition the intent request acknowledgement to use a wait queue so that it's possible, in the next commit, to extend the wait to also wait for an incoming intent. Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Reviewed-by: Chris Lew <quic_clew@quicinc.com> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230327144617.3134175-2-quic_bjorande@quicinc.com
2023-04-19Merge tag 'nand/for-6.4' into mtd/nextMiquel Raynal
Raw NAND core changes: * Convert to platform remove callback returning void * Fix spelling mistake waifunc() -> waitfunc() Raw NAND controller driver changes: * imx: Remove unused is_imx51_nfc and imx53_nfc functions * omap2: Drop obsolete dependency on COMPILE_TEST * orion: Use devm_platform_ioremap_resource() * qcom: - Use of_property_present() for testing DT property presence - Use devm_platform_get_and_ioremap_resource() * stm32_fmc2: Depends on ARCH_STM32 instead of MACH_STM32MP157 * tmio: Remove reference to config MTD_NAND_TMIO in the parsers Raw NAND manufacturer driver changes: * hynix: Fix up bit 0 of sdr_timing_mode SPI-NAND changes: * Add support for ESMT F50x1G41LB Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2023-04-19Merge tag 'spi-nor/for-6.4' into mtd/nextMiquel Raynal
SPI NOR core changes: * introduce Read While Write support for flashes featuring several banks * set the 4-Byte Address Mode method based on SFDP data * allow post_sfdp hook to return errors * parse SCCR MC table and introduce support for multi-chip devices SPI NOR manufacturer drivers changes: * macronix: add support for mx25uw51245g with RWW * spansion: - determine current address mode at runtime as it can be changed in a non-volatile way and differ from factory defaults or from what SFDP advertises. - enable JFFS2 write buffer mode for few ECC'd NOR flashes: S25FS256T, s25hx and s28hx - add support for s25hl02gt and s25hs02gt Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2023-04-19net: dsa: microchip: ksz8795: Correctly handle huge frame configurationChristophe JAILLET
Because of the logic in place, SW_HUGE_PACKET can never be set. (If the first condition is true, then the 2nd one is also true, but is not executed) Change the logic and update each bit individually. Fixes: 29d1e85f45e0 ("net: dsa: microchip: ksz8: add MTU configuration support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/43107d9e8b5b8b05f0cbd4e1f47a2bb88c8747b2.1681755535.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19rust: allow to use INIT_STACK_ALL_ZEROAndrea Righi
With CONFIG_INIT_STACK_ALL_ZERO enabled, bindgen passes -ftrivial-auto-var-init=zero to clang, that triggers the following error: error: '-ftrivial-auto-var-init=zero' hasn't been enabled; enable it at your own peril for benchmarking purpose only with '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang' However, this additional option that is currently required by clang is deprecated since clang-16 and going to be removed in the future, likely with clang-18. So, make sure bindgen is using this extra option if the major version of the libclang used by bindgen is < 16. In this way we can enable CONFIG_INIT_STACK_ALL_ZERO with CONFIG_RUST without triggering any build error. Link: https://github.com/llvm/llvm-project/issues/44842 Link: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.0-rc2/clang/docs/ReleaseNotes.rst#deprecated-compiler-flags Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Reviewed-by: Kees Cook <keescook@chromium.org> [Changed to < 16, added link and reworded] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-19rust: fix regexp in scripts/is_rust_module.shAndrea Righi
nm can use "R" or "r" to show read-only data sections, but scripts/is_rust_module.sh can only recognize "r", so with some versions of binutils it can fail to detect if a module is a Rust module or not. Right now we're using this script only to determine if we need to skip BTF generation (that is disabled globally if CONFIG_RUST is enabled), but it's still nice to fix this script to do the proper job. Moreover, with this patch applied I can also relax the constraint of "RUST depends on !DEBUG_INFO_BTF" and build a kernel with Rust and BTF enabled at the same time (of course BTF generation is still skipped for Rust modules). [ Miguel: The actual reason is likely to be a change on the Rust compiler between 1.61.0 and 1.62.0: echo '#[used] static S: () = ();' | rustup run 1.61.0 rustc --emit=obj --crate-type=lib - && nm rust_out.o echo '#[used] static S: () = ();' | rustup run 1.62.0 rustc --emit=obj --crate-type=lib - && nm rust_out.o Gives: 0000000000000000 r _ZN8rust_out1S17h48027ce0da975467E 0000000000000000 R _ZN8rust_out1S17h58e1f3d9c0e97cefE See https://godbolt.org/z/KE6jneoo4. ] Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-19bpf: Fix incorrect verifier pruning due to missing register precision taintsDaniel Borkmann
Juan Jose et al reported an issue found via fuzzing where the verifier's pruning logic prematurely marks a program path as safe. Consider the following program: 0: (b7) r6 = 1024 1: (b7) r7 = 0 2: (b7) r8 = 0 3: (b7) r9 = -2147483648 4: (97) r6 %= 1025 5: (05) goto pc+0 6: (bd) if r6 <= r9 goto pc+2 7: (97) r6 %= 1 8: (b7) r9 = 0 9: (bd) if r6 <= r9 goto pc+1 10: (b7) r6 = 0 11: (b7) r0 = 0 12: (63) *(u32 *)(r10 -4) = r0 13: (18) r4 = 0xffff888103693400 // map_ptr(ks=4,vs=48) 15: (bf) r1 = r4 16: (bf) r2 = r10 17: (07) r2 += -4 18: (85) call bpf_map_lookup_elem#1 19: (55) if r0 != 0x0 goto pc+1 20: (95) exit 21: (77) r6 >>= 10 22: (27) r6 *= 8192 23: (bf) r1 = r0 24: (0f) r0 += r6 25: (79) r3 = *(u64 *)(r0 +0) 26: (7b) *(u64 *)(r1 +0) = r3 27: (95) exit The verifier treats this as safe, leading to oob read/write access due to an incorrect verifier conclusion: func#0 @0 0: R1=ctx(off=0,imm=0) R10=fp0 0: (b7) r6 = 1024 ; R6_w=1024 1: (b7) r7 = 0 ; R7_w=0 2: (b7) r8 = 0 ; R8_w=0 3: (b7) r9 = -2147483648 ; R9_w=-2147483648 4: (97) r6 %= 1025 ; R6_w=scalar() 5: (05) goto pc+0 6: (bd) if r6 <= r9 goto pc+2 ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff00000000; 0xffffffff)) R9_w=-2147483648 7: (97) r6 %= 1 ; R6_w=scalar() 8: (b7) r9 = 0 ; R9=0 9: (bd) if r6 <= r9 goto pc+1 ; R6=scalar(umin=1) R9=0 10: (b7) r6 = 0 ; R6_w=0 11: (b7) r0 = 0 ; R0_w=0 12: (63) *(u32 *)(r10 -4) = r0 last_idx 12 first_idx 9 regs=1 stack=0 before 11: (b7) r0 = 0 13: R0_w=0 R10=fp0 fp-8=0000???? 13: (18) r4 = 0xffff8ad3886c2a00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 17: (07) r2 += -4 ; R2_w=fp-4 18: (85) call bpf_map_lookup_elem#1 ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) 19: (55) if r0 != 0x0 goto pc+1 ; R0=0 20: (95) exit from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm???? 21: (77) r6 >>= 10 ; R6_w=0 22: (27) r6 *= 8192 ; R6_w=0 23: (bf) r1 = r0 ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0) 24: (0f) r0 += r6 last_idx 24 first_idx 19 regs=40 stack=0 before 23: (bf) r1 = r0 regs=40 stack=0 before 22: (27) r6 *= 8192 regs=40 stack=0 before 21: (77) r6 >>= 10 regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1 parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm???? last_idx 18 first_idx 9 regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1 regs=40 stack=0 before 17: (07) r2 += -4 regs=40 stack=0 before 16: (bf) r2 = r10 regs=40 stack=0 before 15: (bf) r1 = r4 regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00 regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0 regs=40 stack=0 before 11: (b7) r0 = 0 regs=40 stack=0 before 10: (b7) r6 = 0 25: (79) r3 = *(u64 *)(r0 +0) ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar() 26: (7b) *(u64 *)(r1 +0) = r3 ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar() 27: (95) exit from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 11: (b7) r0 = 0 ; R0_w=0 12: (63) *(u32 *)(r10 -4) = r0 last_idx 12 first_idx 11 regs=1 stack=0 before 11: (b7) r0 = 0 13: R0_w=0 R10=fp0 fp-8=0000???? 13: (18) r4 = 0xffff8ad3886c2a00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 17: (07) r2 += -4 ; R2_w=fp-4 18: (85) call bpf_map_lookup_elem#1 frame 0: propagating r6 last_idx 19 first_idx 11 regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1 regs=40 stack=0 before 17: (07) r2 += -4 regs=40 stack=0 before 16: (bf) r2 = r10 regs=40 stack=0 before 15: (bf) r1 = r4 regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00 regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0 regs=40 stack=0 before 11: (b7) r0 = 0 parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0 last_idx 9 first_idx 9 regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1 parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=0 R10=fp0 last_idx 8 first_idx 0 regs=40 stack=0 before 8: (b7) r9 = 0 regs=40 stack=0 before 7: (97) r6 %= 1 regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=40 stack=0 before 5: (05) goto pc+0 regs=40 stack=0 before 4: (97) r6 %= 1025 regs=40 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 19: safe frame 0: propagating r6 last_idx 9 first_idx 0 regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=40 stack=0 before 5: (05) goto pc+0 regs=40 stack=0 before 4: (97) r6 %= 1025 regs=40 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 from 6 to 9: safe verification time 110 usec stack depth 4 processed 36 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2 The verifier considers this program as safe by mistakenly pruning unsafe code paths. In the above func#0, code lines 0-10 are of interest. In line 0-3 registers r6 to r9 are initialized with known scalar values. In line 4 the register r6 is reset to an unknown scalar given the verifier does not track modulo operations. Due to this, the verifier can also not determine precisely which branches in line 6 and 9 are taken, therefore it needs to explore them both. As can be seen, the verifier starts with exploring the false/fall-through paths first. The 'from 19 to 21' path has both r6=0 and r9=0 and the pointer arithmetic on r0 += r6 is therefore considered safe. Given the arithmetic, r6 is correctly marked for precision tracking where backtracking kicks in where it walks back the current path all the way where r6 was set to 0 in the fall-through branch. Next, the pruning logics pops the path 'from 9 to 11' from the stack. Also here, the state of the registers is the same, that is, r6=0 and r9=0, so that at line 19 the path can be pruned as it is considered safe. It is interesting to note that the conditional in line 9 turned r6 into a more precise state, that is, in the fall-through path at the beginning of line 10, it is R6=scalar(umin=1), and in the branch-taken path (which is analyzed here) at the beginning of line 11, r6 turned into a known const r6=0 as r9=0 prior to that and therefore (unsigned) r6 <= 0 concludes that r6 must be 0 (**): [...] ; R6_w=scalar() 9: (bd) if r6 <= r9 goto pc+1 ; R6=scalar(umin=1) R9=0 [...] from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 [...] The next path is 'from 6 to 9'. The verifier considers the old and current state equivalent, and therefore prunes the search incorrectly. Looking into the two states which are being compared by the pruning logic at line 9, the old state consists of R6_rwD=Pscalar() R9_rwD=0 R10=fp0 and the new state consists of R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0. While r6 had the reg->precise flag correctly set in the old state, r9 did not. Both r6'es are considered as equivalent given the old one is a superset of the current, more precise one, however, r9's actual values (0 vs 0x80000000) mismatch. Given the old r9 did not have reg->precise flag set, the verifier does not consider the register as contributing to the precision state of r6, and therefore it considered both r9 states as equivalent. However, for this specific pruned path (which is also the actual path taken at runtime), register r6 will be 0x400 and r9 0x80000000 when reaching line 21, thus oob-accessing the map. The purpose of precision tracking is to initially mark registers (including spilled ones) as imprecise to help verifier's pruning logic finding equivalent states it can then prune if they don't contribute to the program's safety aspects. For example, if registers are used for pointer arithmetic or to pass constant length to a helper, then the verifier sets reg->precise flag and backtracks the BPF program instruction sequence and chain of verifier states to ensure that the given register or stack slot including their dependencies are marked as precisely tracked scalar. This also includes any other registers and slots that contribute to a tracked state of given registers/stack slot. This backtracking relies on recorded jmp_history and is able to traverse entire chain of parent states. This process ends only when all the necessary registers/slots and their transitive dependencies are marked as precise. The backtrack_insn() is called from the current instruction up to the first instruction, and its purpose is to compute a bitmask of registers and stack slots that need precision tracking in the parent's verifier state. For example, if a current instruction is r6 = r7, then r6 needs precision after this instruction and r7 needs precision before this instruction, that is, in the parent state. Hence for the latter r7 is marked and r6 unmarked. For the class of jmp/jmp32 instructions, backtrack_insn() today only looks at call and exit instructions and for all other conditionals the masks remain as-is. However, in the given situation register r6 has a dependency on r9 (as described above in **), so also that one needs to be marked for precision tracking. In other words, if an imprecise register influences a precise one, then the imprecise register should also be marked precise. Meaning, in the parent state both dest and src register need to be tracked for precision and therefore the marking must be more conservative by setting reg->precise flag for both. The precision propagation needs to cover both for the conditional: if the src reg was marked but not the dst reg and vice versa. After the fix the program is correctly rejected: func#0 @0 0: R1=ctx(off=0,imm=0) R10=fp0 0: (b7) r6 = 1024 ; R6_w=1024 1: (b7) r7 = 0 ; R7_w=0 2: (b7) r8 = 0 ; R8_w=0 3: (b7) r9 = -2147483648 ; R9_w=-2147483648 4: (97) r6 %= 1025 ; R6_w=scalar() 5: (05) goto pc+0 6: (bd) if r6 <= r9 goto pc+2 ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff80000000; 0x7fffffff),u32_min=-2147483648) R9_w=-2147483648 7: (97) r6 %= 1 ; R6_w=scalar() 8: (b7) r9 = 0 ; R9=0 9: (bd) if r6 <= r9 goto pc+1 ; R6=scalar(umin=1) R9=0 10: (b7) r6 = 0 ; R6_w=0 11: (b7) r0 = 0 ; R0_w=0 12: (63) *(u32 *)(r10 -4) = r0 last_idx 12 first_idx 9 regs=1 stack=0 before 11: (b7) r0 = 0 13: R0_w=0 R10=fp0 fp-8=0000???? 13: (18) r4 = 0xffff9290dc5bfe00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 17: (07) r2 += -4 ; R2_w=fp-4 18: (85) call bpf_map_lookup_elem#1 ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) 19: (55) if r0 != 0x0 goto pc+1 ; R0=0 20: (95) exit from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm???? 21: (77) r6 >>= 10 ; R6_w=0 22: (27) r6 *= 8192 ; R6_w=0 23: (bf) r1 = r0 ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0) 24: (0f) r0 += r6 last_idx 24 first_idx 19 regs=40 stack=0 before 23: (bf) r1 = r0 regs=40 stack=0 before 22: (27) r6 *= 8192 regs=40 stack=0 before 21: (77) r6 >>= 10 regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1 parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm???? last_idx 18 first_idx 9 regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1 regs=40 stack=0 before 17: (07) r2 += -4 regs=40 stack=0 before 16: (bf) r2 = r10 regs=40 stack=0 before 15: (bf) r1 = r4 regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00 regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0 regs=40 stack=0 before 11: (b7) r0 = 0 regs=40 stack=0 before 10: (b7) r6 = 0 25: (79) r3 = *(u64 *)(r0 +0) ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar() 26: (7b) *(u64 *)(r1 +0) = r3 ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar() 27: (95) exit from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 11: (b7) r0 = 0 ; R0_w=0 12: (63) *(u32 *)(r10 -4) = r0 last_idx 12 first_idx 11 regs=1 stack=0 before 11: (b7) r0 = 0 13: R0_w=0 R10=fp0 fp-8=0000???? 13: (18) r4 = 0xffff9290dc5bfe00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 17: (07) r2 += -4 ; R2_w=fp-4 18: (85) call bpf_map_lookup_elem#1 frame 0: propagating r6 last_idx 19 first_idx 11 regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1 regs=40 stack=0 before 17: (07) r2 += -4 regs=40 stack=0 before 16: (bf) r2 = r10 regs=40 stack=0 before 15: (bf) r1 = r4 regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00 regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0 regs=40 stack=0 before 11: (b7) r0 = 0 parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0 last_idx 9 first_idx 9 regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1 parent didn't have regs=240 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=P0 R10=fp0 last_idx 8 first_idx 0 regs=240 stack=0 before 8: (b7) r9 = 0 regs=40 stack=0 before 7: (97) r6 %= 1 regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=240 stack=0 before 5: (05) goto pc+0 regs=240 stack=0 before 4: (97) r6 %= 1025 regs=240 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 19: safe from 6 to 9: R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0 9: (bd) if r6 <= r9 goto pc+1 last_idx 9 first_idx 0 regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=240 stack=0 before 5: (05) goto pc+0 regs=240 stack=0 before 4: (97) r6 %= 1025 regs=240 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 last_idx 9 first_idx 0 regs=200 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=240 stack=0 before 5: (05) goto pc+0 regs=240 stack=0 before 4: (97) r6 %= 1025 regs=240 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 11: R6=scalar(umax=18446744071562067968) R9=-2147483648 11: (b7) r0 = 0 ; R0_w=0 12: (63) *(u32 *)(r10 -4) = r0 last_idx 12 first_idx 11 regs=1 stack=0 before 11: (b7) r0 = 0 13: R0_w=0 R10=fp0 fp-8=0000???? 13: (18) r4 = 0xffff9290dc5bfe00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 17: (07) r2 += -4 ; R2_w=fp-4 18: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null(id=3,off=0,ks=4,vs=48,imm=0) 19: (55) if r0 != 0x0 goto pc+1 ; R0_w=0 20: (95) exit from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=scalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm???? 21: (77) r6 >>= 10 ; R6_w=scalar(umax=18014398507384832,var_off=(0x0; 0x3fffffffffffff)) 22: (27) r6 *= 8192 ; R6_w=scalar(smax=9223372036854767616,umax=18446744073709543424,var_off=(0x0; 0xffffffffffffe000),s32_max=2147475456,u32_max=-8192) 23: (bf) r1 = r0 ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0) 24: (0f) r0 += r6 last_idx 24 first_idx 21 regs=40 stack=0 before 23: (bf) r1 = r0 regs=40 stack=0 before 22: (27) r6 *= 8192 regs=40 stack=0 before 21: (77) r6 >>= 10 parent didn't have regs=40 stack=0 marks: R0_rw=map_value(off=0,ks=4,vs=48,imm=0) R6_r=Pscalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm???? last_idx 19 first_idx 11 regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1 regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1 regs=40 stack=0 before 17: (07) r2 += -4 regs=40 stack=0 before 16: (bf) r2 = r10 regs=40 stack=0 before 15: (bf) r1 = r4 regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00 regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0 regs=40 stack=0 before 11: (b7) r0 = 0 parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0 last_idx 9 first_idx 0 regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1 regs=240 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=240 stack=0 before 5: (05) goto pc+0 regs=240 stack=0 before 4: (97) r6 %= 1025 regs=240 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 math between map_value pointer and register with unbounded min value is not allowed verification time 886 usec stack depth 4 processed 49 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 2 Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking") Reported-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com> Reported-by: Meador Inge <meadori@google.com> Reported-by: Simon Scannell <simonscannell@google.com> Reported-by: Nenad Stojanovski <thenenadx@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com> Reviewed-by: Meador Inge <meadori@google.com> Reviewed-by: Simon Scannell <simonscannell@google.com>
2023-04-19KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()Dan Carpenter
The KVM_REG_SIZE() comes from the ioctl and it can be a power of two between 0-32768 but if it is more than sizeof(long) this will corrupt memory. Fixes: 99adb567632b ("KVM: arm/arm64: Add save/restore support for firmware workaround state") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/4efbab8c-640f-43b2-8ac6-6d68e08280fe@kili.mountain Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-04-19Merge tag 'nfsd-6.3-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Address two issues with the new GSS krb5 Kunit tests * tag 'nfsd-6.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix failures of checksum Kunit tests sunrpc: Fix RFC6803 encryption test