summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-04-19net/handshake: Create a NETLINK service for handling handshake requestsChuck Lever
When a kernel consumer needs a transport layer security session, it first needs a handshake to negotiate and establish a session. This negotiation can be done in user space via one of the several existing library implementations, or it can be done in the kernel. No in-kernel handshake implementations yet exist. In their absence, we add a netlink service that can: a. Notify a user space daemon that a handshake is needed. b. Once notified, the daemon calls the kernel back via this netlink service to get the handshake parameters, including an open socket on which to establish the session. c. Once the handshake is complete, the daemon reports the session status and other information via a second netlink operation. This operation marks that it is safe for the kernel to use the open socket and the security session established there. The notification service uses a multicast group. Each handshake mechanism (eg, tlshd) adopts its own group number so that the handshake services are completely independent of one another. The kernel can then tell via netlink_has_listeners() whether a handshake service is active and prepared to handle a handshake request. A new netlink operation, ACCEPT, acts like accept(2) in that it instantiates a file descriptor in the user space daemon's fd table. If this operation is successful, the reply carries the fd number, which can be treated as an open and ready file descriptor. While user space is performing the handshake, the kernel keeps its muddy paws off the open socket. A second new netlink operation, DONE, indicates that the user space daemon is finished with the socket and it is safe for the kernel to use again. The operation also indicates whether a session was established successfully. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19.gitignore: Do not ignore .kunitconfig filesChuck Lever
Circumvent the .gitignore wildcard to avoid warnings about ignored .kunitconfig files. As far as I can tell, the warnings are harmless and these files are not actually ignored. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304142337.jc4oUrov-lkp@intel.com/ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19Merge tag 'ipsec-next-2023-04-19' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next 2023-04-19 1) Remove inner/outer modes from input/output path. These are not needed anymore. From Herbert Xu. * tag 'ipsec-next-2023-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Remove inner/outer modes from output path xfrm: Remove inner/outer modes from input path ==================== Link: https://lore.kernel.org/r/20230419075300.452227-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19dt-bindings: net: ethernet: Fix JSON pointer referencesRob Herring
A JSON pointer reference (the part after the "#") must start with a "/". Conversely, references to the entire document must not have a trailing "/" and should be just a "#". The existing jsonschema package allows these, but coming changes make allowed "$ref" URIs stricter and throw errors on these references. Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230418150628.1528480-1-robh@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19net: micrel: Update the list of supported physHoratiu Vultur
At the beginning of the file micrel.c there is list of supported PHYs. Extend this list with the following PHYs lan8841, lan8814 and lan8804, as these PHYs were added but the list was not updated. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230418124713.2221451-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19net: stmmac: dwmac-meson8b: Avoid cast to incompatible function typeSimon Horman
Rather than casting clk_disable_unprepare to an incompatible function type provide a trivial wrapper with the correct signature for the use-case. Reported by clang-16 with W=1: drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:276:6: error: cast from 'void (*)(struct clk *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] (void(*)(void *))clk_disable_unprepare, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20230418-dwmac-meson8b-clk-cb-cast-v1-1-e892b670cbbb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2023-04-19 We've added 3 non-merge commits during the last 6 day(s) which contain a total of 3 files changed, 34 insertions(+), 9 deletions(-). The main changes are: 1) Fix a crash on s390's bpf_arch_text_poke() under a NULL new_addr, from Ilya Leoshkevich. 2) Fix a bug in BPF verifier's precision tracker, from Daniel Borkmann and Andrii Nakryiko. 3) Fix a regression in veth's xdp_features which led to a broken BPF CI selftest, from Lorenzo Bianconi. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix incorrect verifier pruning due to missing register precision taints veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL ==================== Link: https://lore.kernel.org/r/20230419195847.27060-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19MAINTAINERS: Resume MPTCP co-maintainer roleMat Martineau
I'm returning to the MPTCP maintainer role I held for most of the subsytem's history. This time I'm using my kernel.org email address. Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/mptcp/af85e467-8d0a-4eba-b5f8-e2f2c5d24984@tessares.net/ Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230418231318.115331-1-martineau@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19mailmap: add entries for Mat MartineauMatthieu Baerts
Map Mat's old corporate addresses to his kernel.org one. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20230418-upstream-net-20230418-mailmap-mat-v1-1-13ca5dc83037@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-04-17 (i40e) This series contains updates to i40e only. Alex moves setting of active filters to occur under lock and checks/takes error path in rebuild if re-initializing the misc interrupt vector failed. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: fix i40e_setup_misc_vector() error handling i40e: fix accessing vsi->active_filters without holding lock ==================== Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230417205245.1030733-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19Merge tag 'mm-hotfixes-stable-2023-04-19-16-36' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "22 hotfixes. 19 are cc:stable and the remainder address issues which were introduced during this merge cycle, or aren't considered suitable for -stable backporting. 19 are for MM and the remainder are for other subsystems" * tag 'mm-hotfixes-stable-2023-04-19-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) nilfs2: initialize unused bytes in segment summary blocks mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages mm/mmap: regression fix for unmapped_area{_topdown} maple_tree: fix mas_empty_area() search maple_tree: make maple state reusable after mas_empty_area_rev() mm: kmsan: handle alloc failures in kmsan_ioremap_page_range() mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush() tools/Makefile: do missed s/vm/mm/ mm: fix memory leak on mm_init error handling mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() Revert "userfaultfd: don't fail on unrecognized features" writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used mailmap: update jtoppins' entry to reference correct email mm/mempolicy: fix use-after-free of VMA iterator mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO mm/mprotect: fix do_mprotect_pkey() return on error mm/khugepaged: check again on anon uffd-wp during isolation ...
2023-04-19e1000e: Disable TSO on i219-LM card to increase speedSebastian Basierski
While using i219-LM card currently it was only possible to achieve about 60% of maximum speed due to regression introduced in Linux 5.8. This was caused by TSO not being disabled by default despite commit f29801030ac6 ("e1000e: Disable TSO for buffer overrun workaround"). Fix that by disabling TSO during driver probe. Fixes: f29801030ac6 ("e1000e: Disable TSO for buffer overrun workaround") Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230417205345.1030801-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19bnxt_en: fix free-runnig PHC modeVadim Fedorenko
The patch in fixes changed the way real-time mode is chosen for PHC on the NIC. Apparently there is one more use case of the check outside of ptp part of the driver which was not converted to the new macro and is making a lot of noise in free-running mode. Fixes: 131db4991622 ("bnxt_en: reset PHC frequency in free-running mode") Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230418202511.1544735-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19net: dsa: mt7530: fix support for MT7531BEDaniel Golle
There are two variants of the MT7531 switch IC which got different features (and pins) regarding port 5: * MT7531AE: SGMII/1000Base-X/2500Base-X SerDes PCS * MT7531BE: RGMII Moving the creation of the SerDes PCS from mt753x_setup to mt7530_probe with commit 6de285229773 ("net: dsa: mt7530: move SGMII PCS creation to mt7530_probe function") works fine for MT7531AE which got two instances of mtk-pcs-lynxi, however, MT7531BE requires mt7531_pll_setup to setup clocks before the single PCS on port 6 (usually used as CPU port) starts to work and hence the PCS creation failed on MT7531BE. Fix this by introducing a pointer to mt7531_create_sgmii function in struct mt7530_priv and call it again at the end of mt753x_setup like it was before commit 6de285229773 ("net: dsa: mt7530: move SGMII PCS creation to mt7530_probe function"). Fixes: 6de285229773 ("net: dsa: mt7530: move SGMII PCS creation to mt7530_probe function") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/ZDvlLhhqheobUvOK@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19module: add debugging auto-load duplicate module supportLuis Chamberlain
The finit_module() system call can in the worst case use up to more than twice of a module's size in virtual memory. Duplicate finit_module() system calls are non fatal, however they unnecessarily strain virtual memory during bootup and in the worst case can cause a system to fail to boot. This is only known to currently be an issue on systems with larger number of CPUs. To help debug this situation we need to consider the different sources for finit_module(). Requests from the kernel that rely on module auto-loading, ie, the kernel's *request_module() API, are one source of calls. Although modprobe checks to see if a module is already loaded prior to calling finit_module() there is a small race possible allowing userspace to trigger multiple modprobe calls racing against modprobe and this not seeing the module yet loaded. This adds debugging support to the kernel module auto-loader (*request_module() calls) to easily detect duplicate module requests. To aid with possible bootup failure issues incurred by this, it will converge duplicates requests to a single request. This avoids any possible strain on virtual memory during bootup which could be incurred by duplicate module autoloading requests. Folks debugging virtual memory abuse on bootup can and should enable this to see what pr_warn()s come on, to see if module auto-loading is to blame for their wores. If they see duplicates they can further debug this by enabling the module.enable_dups_trace kernel parameter or by enabling CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE. Current evidence seems to point to only a few duplicates for module auto-loading. And so the source for other duplicates creating heavy virtual memory pressure due to larger number of CPUs should becoming from another place (likely udev). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-04-20powerpc/85xx: p2020: Move all P2020 RDB machine descriptions to p2020.cPali Rohár
This moves P2020 RDB machine descriptions into new p2020.c source file. This is preparation for code de-duplication and providing one unified machine description for all P2020 boards. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408140122.25293-9-pali@kernel.org
2023-04-20powerpc/85xx: p2020: Move all P2020 DS machine descriptions to p2020.cPali Rohár
This moves P2020 DS machine descriptions into new p2020.c source file. This is preparation for code de-duplication and providing one unified machine description for all P2020 boards. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408140122.25293-8-pali@kernel.org
2023-04-20powerpc/85xx: Remove #ifdef CONFIG_QUICC_ENGINE in mpc85xx_rdbChristophe Leroy
mpc85xx_qe_par_io_init() is a stub when CONFIG_QUICC_ENGINE is not set. CONFIG_UCC_GETH and CONFIG_SERIAL_QE depend on CONFIG_QUICC_ENGINE. Remove #ifdef CONFIG_QUICC_ENGINE Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408140122.25293-7-pali@kernel.org
2023-04-20powerpc/85xx: Remove #ifdefs CONFIG_PPC_I8259 in mpc85xx_dsChristophe Leroy
All necessary items are declared all the time, no need to use a #ifdef CONFIG_PPC_I8259. Refactor CONFIG_PPC_I8259 actions into a dedicated init function. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408140122.25293-6-pali@kernel.org
2023-04-20powerpc/85xx: mpc85xx_{ds/rdb} replace prink by pr_xxx macroChristophe Leroy
Use pr_debug() instead of printk(KERN_DEBUG Use pr_err() instead of printk(KERN_ERR Use pr_info() instead of printk(KERN_INFO or printk(" Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408140122.25293-5-pali@kernel.org
2023-04-20powerpc/85xx: mpc85xx_{ds/rdb} replace BUG_ON() by WARN_ON()Christophe Leroy
No need to BUG() in case mpic_alloc() fails. Use WARN_ON(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408140122.25293-4-pali@kernel.org
2023-04-20powerpc/85xx: mpc85xx_{ds/rdb} compact the call to mpic_alloc()Christophe Leroy
Reduce number of lines in the call to mpic_alloc(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408140122.25293-3-pali@kernel.org
2023-04-20powerpc/85xx: Remove DBG() macroChristophe Leroy
DBG() macro is defined at three places while used only one time at one place. Replace its only use by a pr_debug() and remove the macro. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408140122.25293-2-pali@kernel.org
2023-04-20powerpc/fsl_uli1575: Mark uli_exclude_device() as staticPali Rohár
Function uli_exclude_device() is not used outside of the fsl_uli1575.c source file anymore. So mark it as static and remove public prototype. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230409000812.18904-9-pali@kernel.org
2023-04-20powerpc/86xx: mpc86xx_hpcn: Call uli_init() instead of explicit ppc_md ↵Pali Rohár
assignment After calling fsl_pci_assign_primary(), it is possible to use uli_init() to conditionally initialize ppc_md.pci_exclude_device callback based on the uli1575 detection. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230409000812.18904-8-pali@kernel.org
2023-04-20powerpc/fsl_uli1575: Allow to disable FSL_ULI1575 supportPali Rohár
ULI1575 PCIe south bridge exists only on some Freescale boards. Allow to disable CONFIG_FSL_ULI1575 symbol when it is not explicitly selected and only implied. This is achieved by marking symbol as visible by providing short description. Also adds dependency for this symbol to prevent enabling it on platforms on which driver does not compile. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230409000812.18904-7-pali@kernel.org
2023-04-20powerpc/85xx: mpc85xx_rdb: Do not automatically select FSL_ULI1575Pali Rohár
Boards provided by CONFIG_MPC85xx_RDB option do not initialize fsl_uli1575.c driver. So remove explicit select dependency on it. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230409000812.18904-6-pali@kernel.org
2023-04-20powerpc/85xx: mpc85xx_ds: Move uli_init() code into its own driver filePali Rohár
Move uli_init() function into existing driver fsl_uli1575.c file in order to share its code between more platforms and board files. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230409000812.18904-5-pali@kernel.org
2023-04-20powerpc/fsl_uli1575: Simplify uli_exclude_device() usagePali Rohár
Function uli_exclude_device() is called only from mpc86xx_exclude_device() and mpc85xx_exclude_device() functions. Both those functions are same, so merge its logic directly into the uli_exclude_device() function. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230409000812.18904-4-pali@kernel.org
2023-04-20powerpc/85xx: mpc85xx_ds: Simplify mpc85xx_exclude_device() functionPali Rohár
Function mpc85xx_exclude_device() is installed and used only when pci_with_uli is fsl_pci_primary. So replace check for pci_with_uli by fsl_pci_primary in mpc85xx_exclude_device() and move pci_with_uli variable declaration into function mpc85xx_ds_uli_init() where it is used. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230409000812.18904-3-pali@kernel.org
2023-04-20powerpc/fsl_uli1575: Misc cleanupChristophe Leroy
Use a single line for uli_exclude_device(). Add uli_exclude_device() prototype in ppc-pci.h and guard it. Remove that prototype from mpc85xx_ds.c and mpc86xx_hpcn.c files. Make uli_pirq_to_irq[] static as it is used only in that file. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230409000812.18904-2-pali@kernel.org
2023-04-20powerpc/boot: Fix boot wrapper code generation with CONFIG_POWER10_CPUNicholas Piggin
-mcpu=power10 will generate prefixed and pcrel code by default, which we do not support. The general kernel disables these with cflags, but those were missed for the boot wrapper. Fixes: 4b2a9315f20d ("powerpc/64s: POWER10 CPU Kconfig build option") Cc: stable@vger.kernel.org # v6.1+ Reported-by: Danny Tsen <dtsen@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230407040909.230998-1-npiggin@gmail.com
2023-04-20xfs: Extend table marker on deprecated mount options tableBagas Sanjaya
Sphinx reports htmldocs warning on deprecated mount options table: /home/bagas/repo/linux-kernel/Documentation/admin-guide/xfs.rst:243: WARNING: Malformed table. Text in column margin in table line 5. =========================== ================ Name Removal Schedule =========================== ================ Mounting with V4 filesystem September 2030 Mounting ascii-ci filesystem September 2030 ikeep/noikeep September 2025 attr2/noattr2 September 2025 =========================== ================ Extend the table markers to take account of the second name entry ("Mounting ascii-ci filesystem"), which is now the widest and to fix the above warning. Fixes: 7ba83850ca2691 ("xfs: deprecate the ascii-ci feature") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-04-20xfs: fix duplicate includesDave Chinner
Header files were already included, just not in the normal order. Remove the duplicates, preserving normal order. Also move xfs_ag.h include to before the scrub internal includes which are normally last in the include list. Fixes: d5c88131dbf0 ("xfs: allow queued AG intents to drain before scrubbing") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-04-19SUNRPC: remove the maximum number of retries in call_bind_statusDai Ngo
Currently call_bind_status places a hard limit of 3 to the number of retries on EACCES error. This limit was done to prevent NLM unlock requests from being hang forever when the server keeps returning garbage. However this change causes problem for cases when NLM service takes longer than 9 seconds to register with the port mapper after a restart. This patch removes this hard coded limit and let the RPC handles the retry based on the standard hard/soft task semantics. Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests") Reported-by: Helen Chao <helen.chao@oracle.com> Tested-by: Helen Chao <helen.chao@oracle.com> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-04-19Merge tag 'spi-fix-v6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "A small fix in the error handling for the rockchip driver, ensuring we don't leak clock enables if we fail to request the interrupt for the device" * tag 'spi-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-rockchip: Fix missing unwind goto in rockchip_sfc_probe()
2023-04-19Merge tag 'regulator-fix-v6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A few driver specific fixes, one build coverage issue and a couple of 'someone typed in the wrong number' style errors in describing devices to the subsystem" * tag 'regulator-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: sm5703: Fix missing n_voltages for fixed regulators regulator: fan53555: Fix wrong TCS_SLEW_MASK regulator: fan53555: Explicitly include bits header
2023-04-19sed-opal: geometry feature reporting commandOndrej Kozina
Locking range start and locking range length attributes may be require to satisfy restrictions exposed by OPAL2 geometry feature reporting. Geometry reporting feature is described in TCG OPAL SSC, section 3.1.1.4 (ALIGN, LogicalBlockSize, AlignmentGranularity and LowestAlignedLBA). 4.3.5.2.1.1 RangeStart Behavior: [ StartAlignment = (RangeStart modulo AlignmentGranularity) - LowestAlignedLBA ] When processing a Set method or CreateRow method on the Locking table for a non-Global Range row, if: a) the AlignmentRequired (ALIGN above) column in the LockingInfo table is TRUE; b) RangeStart is non-zero; and c) StartAlignment is non-zero, then the method SHALL fail and return an error status code INVALID_PARAMETER. 4.3.5.2.1.2 RangeLength Behavior: If RangeStart is zero, then [ LengthAlignment = (RangeLength modulo AlignmentGranularity) - LowestAlignedLBA ] If RangeStart is non-zero, then [ LengthAlignment = (RangeLength modulo AlignmentGranularity) ] When processing a Set method or CreateRow method on the Locking table for a non-Global Range row, if: a) the AlignmentRequired (ALIGN above) column in the LockingInfo table is TRUE; b) RangeLength is non-zero; and c) LengthAlignment is non-zero, then the method SHALL fail and return an error status code INVALID_PARAMETER In userspace we stuck to logical block size reported by general block device (via sysfs or ioctl), but we can not read 'AlignmentGranularity' or 'LowestAlignedLBA' anywhere else and we need to get those values from sed-opal interface otherwise we will not be able to report or avoid locking range setup INVALID_PARAMETER errors above. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Tested-by: Milan Broz <gmazyland@gmail.com> Link: https://lore.kernel.org/r/20230411090931.9193-2-okozina@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-19rpmsg: glink: Consolidate TX_DATA and TX_DATA_CONTBjorn Andersson
Rather than duplicating most of the code for constructing the initial TX_DATA and subsequent TX_DATA_CONT packets, roll them into a single loop. Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Reviewed-by: Chris Lew <quic_clew@quicinc.com> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230418163018.785524-3-quic_bjorande@quicinc.com
2023-04-19rpmsg: glink: Propagate TX failures in intentless mode as wellBjorn Andersson
As support for splitting transmission over several messages using TX_DATA_CONT was introduced it does not immediately return the return value of qcom_glink_tx(). The result is that in the intentless case (i.e. intent == NULL), the code will continue to send all additional chunks. This is wasteful, and it's possible that the send operation could incorrectly indicate success, if the last chunk fits in the TX fifo. Fix the condition. Fixes: 8956927faed3 ("rpmsg: glink: Add TX_DATA_CONT command while sending") Reviewed-by: Chris Lew <quic_clew@quicinc.com> Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230418163018.785524-2-quic_bjorande@quicinc.com
2023-04-19rpmsg: glink: Wait for intent, not just request ackBjorn Andersson
In some implementations of the remote firmware, an intent request acknowledgment is sent when it's determined if the intent allocation will be fulfilled, but then the intent is queued after the acknowledgment. The result is that upon receiving a granted allocation request, the search for the newly allocated intent will fail and an additional request will be made. This will at best waste memory, but if the second request is rejected the transaction will be incorrectly rejected. Take the incoming intent into account in the wait to mitigate this problem. The above scenario can still happen, in the case that, on that same channel, an unrelated intent is delivered prior to the request acknowledgment and a separate process enters the send path and picks up the intent. Given that there's no relationship between the acknowledgment and the delivered (or to be delivered intent), there doesn't seem to be a solution to this problem. Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Reviewed-by: Chris Lew <quic_clew@quicinc.com> [bjorn: Fixed spelling issues pointed out by checkpatch in commit message] Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230327144617.3134175-3-quic_bjorande@quicinc.com
2023-04-19rpmsg: glink: Transition intent request signaling to wait queueBjorn Andersson
Transition the intent request acknowledgement to use a wait queue so that it's possible, in the next commit, to extend the wait to also wait for an incoming intent. Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Reviewed-by: Chris Lew <quic_clew@quicinc.com> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230327144617.3134175-2-quic_bjorande@quicinc.com
2023-04-19perf probe: Add missing 0x prefix for addresses printed in hexadecimalArnaldo Carvalho de Melo
To fix this confusing warning: # perf probe -l Failed to find debug information for address 798240 probe_main:prometheus_new_counter__return (on github.com/prometheus/client_golang/prometheus.NewCounter%return in /home/acme/git/prometheus-uprobes/main with counter) # As that 798240 is printed with PRIx64 but has no letters, better print the 0x prefix to disambiguate. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZEBCyFu2GjTw6qOi@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-19Merge tag 'nand/for-6.4' into mtd/nextMiquel Raynal
Raw NAND core changes: * Convert to platform remove callback returning void * Fix spelling mistake waifunc() -> waitfunc() Raw NAND controller driver changes: * imx: Remove unused is_imx51_nfc and imx53_nfc functions * omap2: Drop obsolete dependency on COMPILE_TEST * orion: Use devm_platform_ioremap_resource() * qcom: - Use of_property_present() for testing DT property presence - Use devm_platform_get_and_ioremap_resource() * stm32_fmc2: Depends on ARCH_STM32 instead of MACH_STM32MP157 * tmio: Remove reference to config MTD_NAND_TMIO in the parsers Raw NAND manufacturer driver changes: * hynix: Fix up bit 0 of sdr_timing_mode SPI-NAND changes: * Add support for ESMT F50x1G41LB Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2023-04-19Merge tag 'spi-nor/for-6.4' into mtd/nextMiquel Raynal
SPI NOR core changes: * introduce Read While Write support for flashes featuring several banks * set the 4-Byte Address Mode method based on SFDP data * allow post_sfdp hook to return errors * parse SCCR MC table and introduce support for multi-chip devices SPI NOR manufacturer drivers changes: * macronix: add support for mx25uw51245g with RWW * spansion: - determine current address mode at runtime as it can be changed in a non-volatile way and differ from factory defaults or from what SFDP advertises. - enable JFFS2 write buffer mode for few ECC'd NOR flashes: S25FS256T, s25hx and s28hx - add support for s25hl02gt and s25hs02gt Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2023-04-19net: dsa: microchip: ksz8795: Correctly handle huge frame configurationChristophe JAILLET
Because of the logic in place, SW_HUGE_PACKET can never be set. (If the first condition is true, then the 2nd one is also true, but is not executed) Change the logic and update each bit individually. Fixes: 29d1e85f45e0 ("net: dsa: microchip: ksz8: add MTU configuration support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/43107d9e8b5b8b05f0cbd4e1f47a2bb88c8747b2.1681755535.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19page_pool: add DMA_ATTR_WEAK_ORDERING on all mappingsJakub Kicinski
Commit c519fe9a4f0d ("bnxt: add dma mapping attributes") added DMA_ATTR_WEAK_ORDERING to DMA attrs on bnxt. It has since spread to a few more drivers (possibly as a copy'n'paste). DMA_ATTR_WEAK_ORDERING only seems to matter on Sparc and PowerPC/cell, the rarity of these platforms is likely why we never bothered adding the attribute in the page pool, even though it should be safe to add. To make the page pool migration in drivers which set this flag less of a risk (of regressing the precious sparc database workloads or whatever needed this) let's add DMA_ATTR_WEAK_ORDERING on all page pool DMA mappings. We could make this a driver opt-in but frankly I don't think it's worth complicating the API. I can't think of a reason why device accesses to packet memory would have to be ordered. Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/20230417152805.331865-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19rust: allow to use INIT_STACK_ALL_ZEROAndrea Righi
With CONFIG_INIT_STACK_ALL_ZERO enabled, bindgen passes -ftrivial-auto-var-init=zero to clang, that triggers the following error: error: '-ftrivial-auto-var-init=zero' hasn't been enabled; enable it at your own peril for benchmarking purpose only with '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang' However, this additional option that is currently required by clang is deprecated since clang-16 and going to be removed in the future, likely with clang-18. So, make sure bindgen is using this extra option if the major version of the libclang used by bindgen is < 16. In this way we can enable CONFIG_INIT_STACK_ALL_ZERO with CONFIG_RUST without triggering any build error. Link: https://github.com/llvm/llvm-project/issues/44842 Link: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.0-rc2/clang/docs/ReleaseNotes.rst#deprecated-compiler-flags Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Reviewed-by: Kees Cook <keescook@chromium.org> [Changed to < 16, added link and reworded] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-19rust: fix regexp in scripts/is_rust_module.shAndrea Righi
nm can use "R" or "r" to show read-only data sections, but scripts/is_rust_module.sh can only recognize "r", so with some versions of binutils it can fail to detect if a module is a Rust module or not. Right now we're using this script only to determine if we need to skip BTF generation (that is disabled globally if CONFIG_RUST is enabled), but it's still nice to fix this script to do the proper job. Moreover, with this patch applied I can also relax the constraint of "RUST depends on !DEBUG_INFO_BTF" and build a kernel with Rust and BTF enabled at the same time (of course BTF generation is still skipped for Rust modules). [ Miguel: The actual reason is likely to be a change on the Rust compiler between 1.61.0 and 1.62.0: echo '#[used] static S: () = ();' | rustup run 1.61.0 rustc --emit=obj --crate-type=lib - && nm rust_out.o echo '#[used] static S: () = ();' | rustup run 1.62.0 rustc --emit=obj --crate-type=lib - && nm rust_out.o Gives: 0000000000000000 r _ZN8rust_out1S17h48027ce0da975467E 0000000000000000 R _ZN8rust_out1S17h58e1f3d9c0e97cefE See https://godbolt.org/z/KE6jneoo4. ] Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-19bpf: Fix incorrect verifier pruning due to missing register precision taintsDaniel Borkmann
Juan Jose et al reported an issue found via fuzzing where the verifier's pruning logic prematurely marks a program path as safe. Consider the following program: 0: (b7) r6 = 1024 1: (b7) r7 = 0 2: (b7) r8 = 0 3: (b7) r9 = -2147483648 4: (97) r6 %= 1025 5: (05) goto pc+0 6: (bd) if r6 <= r9 goto pc+2 7: (97) r6 %= 1 8: (b7) r9 = 0 9: (bd) if r6 <= r9 goto pc+1 10: (b7) r6 = 0 11: (b7) r0 = 0 12: (63) *(u32 *)(r10 -4) = r0 13: (18) r4 = 0xffff888103693400 // map_ptr(ks=4,vs=48) 15: (bf) r1 = r4 16: (bf) r2 = r10 17: (07) r2 += -4 18: (85) call bpf_map_lookup_elem#1 19: (55) if r0 != 0x0 goto pc+1 20: (95) exit 21: (77) r6 >>= 10 22: (27) r6 *= 8192 23: (bf) r1 = r0 24: (0f) r0 += r6 25: (79) r3 = *(u64 *)(r0 +0) 26: (7b) *(u64 *)(r1 +0) = r3 27: (95) exit The verifier treats this as safe, leading to oob read/write access due to an incorrect verifier conclusion: func#0 @0 0: R1=ctx(off=0,imm=0) R10=fp0 0: (b7) r6 = 1024 ; R6_w=1024 1: (b7) r7 = 0 ; R7_w=0 2: (b7) r8 = 0 ; R8_w=0 3: (b7) r9 = -2147483648 ; R9_w=-2147483648 4: (97) r6 %= 1025 ; R6_w=scalar() 5: (05) goto pc+0 6: (bd) if r6 <= r9 goto pc+2 ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff00000000; 0xffffffff)) R9_w=-2147483648 7: (97) r6 %= 1 ; R6_w=scalar() 8: (b7) r9 = 0 ; R9=0 9: (bd) if r6 <= r9 goto pc+1 ; R6=scalar(umin=1) R9=0 10: (b7) r6 = 0 ; R6_w=0 11: (b7) r0 = 0 ; R0_w=0 12: (63) *(u32 *)(r10 -4) = r0 last_idx 12 first_idx 9 regs=1 stack=0 before 11: (b7) r0 = 0 13: R0_w=0 R10=fp0 fp-8=0000???? 13: (18) r4 = 0xffff8ad3886c2a00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 17: (07) r2 += -4 ; R2_w=fp-4 18: (85) call bpf_map_lookup_elem#1 ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) 19: (55) if r0 != 0x0 goto pc+1 ; R0=0 20: (95) exit from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm???? 21: (77) r6 >>= 10 ; R6_w=0 22: (27) r6 *= 8192 ; R6_w=0 23: (bf) r1 = r0 ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0) 24: (0f) r0 += r6 last_idx 24 first_idx 19 regs=40 stack=0 before 23: (bf) r1 = r0 regs=40 stack=0 before 22: (27) r6 *= 8192 regs=40 stack=0 before 21: (77) r6 >>= 10 regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1 parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm???? last_idx 18 first_idx 9 regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1 regs=40 stack=0 before 17: (07) r2 += -4 regs=40 stack=0 before 16: (bf) r2 = r10 regs=40 stack=0 before 15: (bf) r1 = r4 regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00 regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0 regs=40 stack=0 before 11: (b7) r0 = 0 regs=40 stack=0 before 10: (b7) r6 = 0 25: (79) r3 = *(u64 *)(r0 +0) ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar() 26: (7b) *(u64 *)(r1 +0) = r3 ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar() 27: (95) exit from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 11: (b7) r0 = 0 ; R0_w=0 12: (63) *(u32 *)(r10 -4) = r0 last_idx 12 first_idx 11 regs=1 stack=0 before 11: (b7) r0 = 0 13: R0_w=0 R10=fp0 fp-8=0000???? 13: (18) r4 = 0xffff8ad3886c2a00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 17: (07) r2 += -4 ; R2_w=fp-4 18: (85) call bpf_map_lookup_elem#1 frame 0: propagating r6 last_idx 19 first_idx 11 regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1 regs=40 stack=0 before 17: (07) r2 += -4 regs=40 stack=0 before 16: (bf) r2 = r10 regs=40 stack=0 before 15: (bf) r1 = r4 regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00 regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0 regs=40 stack=0 before 11: (b7) r0 = 0 parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0 last_idx 9 first_idx 9 regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1 parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=0 R10=fp0 last_idx 8 first_idx 0 regs=40 stack=0 before 8: (b7) r9 = 0 regs=40 stack=0 before 7: (97) r6 %= 1 regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=40 stack=0 before 5: (05) goto pc+0 regs=40 stack=0 before 4: (97) r6 %= 1025 regs=40 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 19: safe frame 0: propagating r6 last_idx 9 first_idx 0 regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=40 stack=0 before 5: (05) goto pc+0 regs=40 stack=0 before 4: (97) r6 %= 1025 regs=40 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 from 6 to 9: safe verification time 110 usec stack depth 4 processed 36 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2 The verifier considers this program as safe by mistakenly pruning unsafe code paths. In the above func#0, code lines 0-10 are of interest. In line 0-3 registers r6 to r9 are initialized with known scalar values. In line 4 the register r6 is reset to an unknown scalar given the verifier does not track modulo operations. Due to this, the verifier can also not determine precisely which branches in line 6 and 9 are taken, therefore it needs to explore them both. As can be seen, the verifier starts with exploring the false/fall-through paths first. The 'from 19 to 21' path has both r6=0 and r9=0 and the pointer arithmetic on r0 += r6 is therefore considered safe. Given the arithmetic, r6 is correctly marked for precision tracking where backtracking kicks in where it walks back the current path all the way where r6 was set to 0 in the fall-through branch. Next, the pruning logics pops the path 'from 9 to 11' from the stack. Also here, the state of the registers is the same, that is, r6=0 and r9=0, so that at line 19 the path can be pruned as it is considered safe. It is interesting to note that the conditional in line 9 turned r6 into a more precise state, that is, in the fall-through path at the beginning of line 10, it is R6=scalar(umin=1), and in the branch-taken path (which is analyzed here) at the beginning of line 11, r6 turned into a known const r6=0 as r9=0 prior to that and therefore (unsigned) r6 <= 0 concludes that r6 must be 0 (**): [...] ; R6_w=scalar() 9: (bd) if r6 <= r9 goto pc+1 ; R6=scalar(umin=1) R9=0 [...] from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 [...] The next path is 'from 6 to 9'. The verifier considers the old and current state equivalent, and therefore prunes the search incorrectly. Looking into the two states which are being compared by the pruning logic at line 9, the old state consists of R6_rwD=Pscalar() R9_rwD=0 R10=fp0 and the new state consists of R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0. While r6 had the reg->precise flag correctly set in the old state, r9 did not. Both r6'es are considered as equivalent given the old one is a superset of the current, more precise one, however, r9's actual values (0 vs 0x80000000) mismatch. Given the old r9 did not have reg->precise flag set, the verifier does not consider the register as contributing to the precision state of r6, and therefore it considered both r9 states as equivalent. However, for this specific pruned path (which is also the actual path taken at runtime), register r6 will be 0x400 and r9 0x80000000 when reaching line 21, thus oob-accessing the map. The purpose of precision tracking is to initially mark registers (including spilled ones) as imprecise to help verifier's pruning logic finding equivalent states it can then prune if they don't contribute to the program's safety aspects. For example, if registers are used for pointer arithmetic or to pass constant length to a helper, then the verifier sets reg->precise flag and backtracks the BPF program instruction sequence and chain of verifier states to ensure that the given register or stack slot including their dependencies are marked as precisely tracked scalar. This also includes any other registers and slots that contribute to a tracked state of given registers/stack slot. This backtracking relies on recorded jmp_history and is able to traverse entire chain of parent states. This process ends only when all the necessary registers/slots and their transitive dependencies are marked as precise. The backtrack_insn() is called from the current instruction up to the first instruction, and its purpose is to compute a bitmask of registers and stack slots that need precision tracking in the parent's verifier state. For example, if a current instruction is r6 = r7, then r6 needs precision after this instruction and r7 needs precision before this instruction, that is, in the parent state. Hence for the latter r7 is marked and r6 unmarked. For the class of jmp/jmp32 instructions, backtrack_insn() today only looks at call and exit instructions and for all other conditionals the masks remain as-is. However, in the given situation register r6 has a dependency on r9 (as described above in **), so also that one needs to be marked for precision tracking. In other words, if an imprecise register influences a precise one, then the imprecise register should also be marked precise. Meaning, in the parent state both dest and src register need to be tracked for precision and therefore the marking must be more conservative by setting reg->precise flag for both. The precision propagation needs to cover both for the conditional: if the src reg was marked but not the dst reg and vice versa. After the fix the program is correctly rejected: func#0 @0 0: R1=ctx(off=0,imm=0) R10=fp0 0: (b7) r6 = 1024 ; R6_w=1024 1: (b7) r7 = 0 ; R7_w=0 2: (b7) r8 = 0 ; R8_w=0 3: (b7) r9 = -2147483648 ; R9_w=-2147483648 4: (97) r6 %= 1025 ; R6_w=scalar() 5: (05) goto pc+0 6: (bd) if r6 <= r9 goto pc+2 ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff80000000; 0x7fffffff),u32_min=-2147483648) R9_w=-2147483648 7: (97) r6 %= 1 ; R6_w=scalar() 8: (b7) r9 = 0 ; R9=0 9: (bd) if r6 <= r9 goto pc+1 ; R6=scalar(umin=1) R9=0 10: (b7) r6 = 0 ; R6_w=0 11: (b7) r0 = 0 ; R0_w=0 12: (63) *(u32 *)(r10 -4) = r0 last_idx 12 first_idx 9 regs=1 stack=0 before 11: (b7) r0 = 0 13: R0_w=0 R10=fp0 fp-8=0000???? 13: (18) r4 = 0xffff9290dc5bfe00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 17: (07) r2 += -4 ; R2_w=fp-4 18: (85) call bpf_map_lookup_elem#1 ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) 19: (55) if r0 != 0x0 goto pc+1 ; R0=0 20: (95) exit from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm???? 21: (77) r6 >>= 10 ; R6_w=0 22: (27) r6 *= 8192 ; R6_w=0 23: (bf) r1 = r0 ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0) 24: (0f) r0 += r6 last_idx 24 first_idx 19 regs=40 stack=0 before 23: (bf) r1 = r0 regs=40 stack=0 before 22: (27) r6 *= 8192 regs=40 stack=0 before 21: (77) r6 >>= 10 regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1 parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm???? last_idx 18 first_idx 9 regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1 regs=40 stack=0 before 17: (07) r2 += -4 regs=40 stack=0 before 16: (bf) r2 = r10 regs=40 stack=0 before 15: (bf) r1 = r4 regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00 regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0 regs=40 stack=0 before 11: (b7) r0 = 0 regs=40 stack=0 before 10: (b7) r6 = 0 25: (79) r3 = *(u64 *)(r0 +0) ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar() 26: (7b) *(u64 *)(r1 +0) = r3 ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar() 27: (95) exit from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 11: (b7) r0 = 0 ; R0_w=0 12: (63) *(u32 *)(r10 -4) = r0 last_idx 12 first_idx 11 regs=1 stack=0 before 11: (b7) r0 = 0 13: R0_w=0 R10=fp0 fp-8=0000???? 13: (18) r4 = 0xffff9290dc5bfe00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 17: (07) r2 += -4 ; R2_w=fp-4 18: (85) call bpf_map_lookup_elem#1 frame 0: propagating r6 last_idx 19 first_idx 11 regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1 regs=40 stack=0 before 17: (07) r2 += -4 regs=40 stack=0 before 16: (bf) r2 = r10 regs=40 stack=0 before 15: (bf) r1 = r4 regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00 regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0 regs=40 stack=0 before 11: (b7) r0 = 0 parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0 last_idx 9 first_idx 9 regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1 parent didn't have regs=240 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=P0 R10=fp0 last_idx 8 first_idx 0 regs=240 stack=0 before 8: (b7) r9 = 0 regs=40 stack=0 before 7: (97) r6 %= 1 regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=240 stack=0 before 5: (05) goto pc+0 regs=240 stack=0 before 4: (97) r6 %= 1025 regs=240 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 19: safe from 6 to 9: R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0 9: (bd) if r6 <= r9 goto pc+1 last_idx 9 first_idx 0 regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=240 stack=0 before 5: (05) goto pc+0 regs=240 stack=0 before 4: (97) r6 %= 1025 regs=240 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 last_idx 9 first_idx 0 regs=200 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=240 stack=0 before 5: (05) goto pc+0 regs=240 stack=0 before 4: (97) r6 %= 1025 regs=240 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 11: R6=scalar(umax=18446744071562067968) R9=-2147483648 11: (b7) r0 = 0 ; R0_w=0 12: (63) *(u32 *)(r10 -4) = r0 last_idx 12 first_idx 11 regs=1 stack=0 before 11: (b7) r0 = 0 13: R0_w=0 R10=fp0 fp-8=0000???? 13: (18) r4 = 0xffff9290dc5bfe00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0) 16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 17: (07) r2 += -4 ; R2_w=fp-4 18: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null(id=3,off=0,ks=4,vs=48,imm=0) 19: (55) if r0 != 0x0 goto pc+1 ; R0_w=0 20: (95) exit from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=scalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm???? 21: (77) r6 >>= 10 ; R6_w=scalar(umax=18014398507384832,var_off=(0x0; 0x3fffffffffffff)) 22: (27) r6 *= 8192 ; R6_w=scalar(smax=9223372036854767616,umax=18446744073709543424,var_off=(0x0; 0xffffffffffffe000),s32_max=2147475456,u32_max=-8192) 23: (bf) r1 = r0 ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0) 24: (0f) r0 += r6 last_idx 24 first_idx 21 regs=40 stack=0 before 23: (bf) r1 = r0 regs=40 stack=0 before 22: (27) r6 *= 8192 regs=40 stack=0 before 21: (77) r6 >>= 10 parent didn't have regs=40 stack=0 marks: R0_rw=map_value(off=0,ks=4,vs=48,imm=0) R6_r=Pscalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm???? last_idx 19 first_idx 11 regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1 regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1 regs=40 stack=0 before 17: (07) r2 += -4 regs=40 stack=0 before 16: (bf) r2 = r10 regs=40 stack=0 before 15: (bf) r1 = r4 regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00 regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0 regs=40 stack=0 before 11: (b7) r0 = 0 parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0 last_idx 9 first_idx 0 regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1 regs=240 stack=0 before 6: (bd) if r6 <= r9 goto pc+2 regs=240 stack=0 before 5: (05) goto pc+0 regs=240 stack=0 before 4: (97) r6 %= 1025 regs=240 stack=0 before 3: (b7) r9 = -2147483648 regs=40 stack=0 before 2: (b7) r8 = 0 regs=40 stack=0 before 1: (b7) r7 = 0 regs=40 stack=0 before 0: (b7) r6 = 1024 math between map_value pointer and register with unbounded min value is not allowed verification time 886 usec stack depth 4 processed 49 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 2 Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking") Reported-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com> Reported-by: Meador Inge <meadori@google.com> Reported-by: Simon Scannell <simonscannell@google.com> Reported-by: Nenad Stojanovski <thenenadx@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com> Reviewed-by: Meador Inge <meadori@google.com> Reviewed-by: Simon Scannell <simonscannell@google.com>