summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2022-10-31rtnetlink: pass netlink message header and portid to rtnl_configure_link()Hangbin Liu
This patch pass netlink message header and portid to rtnl_configure_link() All the functions in this call chain need to add the parameters so we can use them in the last call rtnl_notify(), and notify the userspace about the new link info if NLM_F_ECHO flag is set. - rtnl_configure_link() - __dev_notify_flags() - rtmsg_ifinfo() - rtmsg_ifinfo_event() - rtmsg_ifinfo_build_skb() - rtmsg_ifinfo_send() - rtnl_notify() Also move __dev_notify_flags() declaration to net/core/dev.h, as Jakub suggested. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31ptp: introduce helpers to adjust by scaled parts per millionJacob Keller
Many drivers implement the .adjfreq or .adjfine PTP op function with the same basic logic: 1. Determine a base frequency value 2. Multiply this by the abs() of the requested adjustment, then divide by the appropriate divisor (1 billion, or 65,536 billion). 3. Add or subtract this difference from the base frequency to calculate a new adjustment. A few drivers need the difference and direction rather than the combined new increment value. I recently converted the Intel drivers to .adjfine and the scaled parts per million (65.536 parts per billion) logic. To avoid overflow with minimal loss of precision, mul_u64_u64_div_u64 was used. The basic logic used by all of these drivers is very similar, and leads to a lot of duplicate code to perform the same task. Rather than keep this duplicate code, introduce diff_by_scaled_ppm and adjust_by_scaled_ppm. These helper functions calculate the difference or adjustment necessary based on the scaled parts per million input. The diff_by_scaled_ppm function returns true if the difference should be subtracted, and false otherwise. Update the Intel drivers to use the new helper functions. Other vendor drivers will be converted to .adjfine and this helper function in the following changes. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31ptp: add missing documentation for parametersJacob Keller
The ptp_find_pin_unlocked function and the ptp_system_timestamp structure didn't document their parameters and fields. Fix this. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-30net: remove unused netdev_unregistering()Juhee Kang
Currently, use dev->reg_state == NETREG_UNREGISTERING to check the status which is NETREG_UNREGISTERING, rather than using netdev_unregistering. Also, A helper function which is netdev_unregistering on nedevice.h is no longer used. Thus, netdev_unregistering removes from netdevice.h. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28net: sfp: move field definitions along side register indexRussell King (Oracle)
Just as we do for the A2h enum, arrange the A0h enum to have the field definitions next to their corresponding register index. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: sfp: convert register indexes from hex to decimalRussell King (Oracle)
The register indexes in the standards are in decimal rather than hex, so lets specify them in decimal in the header file so we can easily cross-reference without converting between hex and decimal. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: phylink: add phylink_get_link_timer_ns() helperRussell King (Oracle)
Add a helper to convert the PHY interface mode to the required link timer setting as stated by the appropriate standard. Inappropriate interface modes return an error. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28Kalle Valo says:Jakub Kicinski
==================== pull-request: wireless-next-2022-10-28 First set of patches v6.2. mac80211 refactoring continues for Wi-Fi 7. All mac80211 driver are now converted to use internal TX queues, this might cause some regressions so we wanted to do this early in the cycle. Note: wireless tree was merged[1] to wireless-next to avoid some conflicts with mac80211 patches between the trees. Unfortunately there are still two smaller conflicts in net/mac80211/util.c which Stephen also reported[2]. In the first conflict initialise scratch_len to "params->scratch_len ?: 3 * params->len" (note number 3, not 2!) and in the second conflict take the version which uses elems->scratch_pos. [1] https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git/commit/?id=dfd2d876b3fda1790bc0239ba4c6967e25d16e91 [2] https://lore.kernel.org/all/20221020032340.5cf101c0@canb.auug.org.au/ mac80211 - preparation for Wi-Fi 7 Multi-Link Operation (MLO) continues - add API to show the link STAs in debugfs - all mac80211 drivers are now using mac80211 internal TX queues (iTXQs) rtw89 - support 8852BE rtl8xxxu - support RTL8188FU brmfmac - support two station interfaces concurrently bcma - support SPROM rev 11 ==================== Link: https://lore.kernel.org/r/20221028132943.304ECC433B5@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28tcp: add u32 counter in tcp_sock and an SNMP counter for PLBMubashir Adnan Qureshi
A u32 counter is added to tcp_sock for counting the number of PLB triggered rehashes for a TCP connection. An SNMP counter is also added to count overall PLB triggered rehash events for a host. These counters are hooked up to PLB implementation for DCTCP. TCP_NLA_REHASH is added to SCM_TIMESTAMPING_OPT_STATS that reports the rehash attempts triggered due to PLB or timeouts. This gives a historical view of sustained congestion or timeouts experienced by the TCP connection. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c 2871edb32f46 ("can: kvaser_usb: Fix possible completions during init_completion") abb8670938b2 ("can: kvaser_usb_leaf: Ignore stale bus-off after start") 8d21f5927ae6 ("can: kvaser_usb_leaf: Fix improved state not being reported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27Merge tag 'net-6.1-rc3-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from 802.15.4 (Zigbee et al). Current release - regressions: - ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1 Current release - new code bugs: - mptcp: fix abba deadlock on fastopen - eth: stmmac: rk3588: allow multiple gmac controllers in one system Previous releases - regressions: - ip: rework the fix for dflt addr selection for connected nexthop - net: couple more fixes for misinterpreting bits in struct page after the signature was added Previous releases - always broken: - ipv6: ensure sane device mtu in tunnels - openvswitch: switch from WARN to pr_warn on a user-triggerable path - ethtool: eeprom: fix null-deref on genl_info in dump - ieee802154: more return code fixes for corner cases in dgram_sendmsg - mac802154: fix link-quality-indicator recording - eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack offload - eth: fec: limit register access on i.MX6UL - eth: bcm4908_enet: update TX stats after actual transmission - can: rcar_canfd: improve IRQ handling for RZ/G2L Misc: - genetlink: piggy back on the newly added resv_op_start to enforce more sanity checks on new commands" * tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) net: enetc: survive memory pressure without crashing kcm: do not sense pfmemalloc status in kcm_sendpage() net: do not sense pfmemalloc status in skb_append_pagefrags() net/mlx5e: Fix macsec sci endianness at rx sa update net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function net/mlx5e: Fix macsec rx security association (SA) update/delete net/mlx5e: Fix macsec coverity issue at rx sa update net/mlx5: Fix crash during sync firmware reset net/mlx5: Update fw fatal reporter state on PCI handlers successful recover net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed net/mlx5e: TC, Reject forwarding from internal port to internal port net/mlx5: Fix possible use-after-free in async command interface net/mlx5: ASO, Create the ASO SQ with the correct timestamp format net/mlx5e: Update restore chain id for slow path packets net/mlx5e: Extend SKB room check to include PTP-SQ net/mlx5: DR, Fix matcher disconnect error flow net/mlx5: Wait for firmware to enable CRS before pci_restore_state net/mlx5e: Do not increment ESN when updating IPsec ESN state netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed ...
2022-10-27Merge tag 'hardening-v6.1-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Fix older Clang vs recent overflow KUnit test additions (Nick Desaulniers, Kees Cook) - Fix kern-doc visibility for overflow helpers (Kees Cook) * tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: overflow: Refactor test skips for Clang-specific issues overflow: disable failing tests for older clang versions overflow: Fix kern-doc markup for functions
2022-10-27Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds
Pull fscrypt fix from Eric Biggers: "Fix a memory leak that was introduced by a change that went into -rc1" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: fix keyring memory leak on mount failure
2022-10-27net/mlx5: Fix possible use-after-free in async command interfaceTariq Toukan
mlx5_cmd_cleanup_async_ctx should return only after all its callback handlers were completed. Before this patch, the below race between mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible and lead to a use-after-free: 1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e. elevated by 1, a single inflight callback). 2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1. 3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and is about to call wake_up(). 4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns immediately as the condition (num_inflight == 0) holds. 5. mlx5_cmd_cleanup_async_ctx returns. 6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx object. 7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed object. Fix it by syncing using a completion object. Mark it completed when num_inflight reaches 0. Trace: BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270 Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0 CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x57/0x7d print_report.cold+0x2d5/0x684 ? do_raw_spin_lock+0x23d/0x270 kasan_report+0xb1/0x1a0 ? do_raw_spin_lock+0x23d/0x270 do_raw_spin_lock+0x23d/0x270 ? rwlock_bug.part.0+0x90/0x90 ? __delete_object+0xb8/0x100 ? lock_downgrade+0x6e0/0x6e0 _raw_spin_lock_irqsave+0x43/0x60 ? __wake_up_common_lock+0xb9/0x140 __wake_up_common_lock+0xb9/0x140 ? __wake_up_common+0x650/0x650 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kasan_set_track+0x21/0x30 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kfree+0x1ba/0x520 ? do_raw_spin_unlock+0x54/0x220 mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core] ? dump_command+0xcc0/0xcc0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? cmd_comp_notifier+0x7e/0xb0 [mlx5_core] cmd_comp_notifier+0x7e/0xb0 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 ? irq_release+0x140/0x140 [mlx5_core] irq_int_handler+0x19/0x30 [mlx5_core] __handle_irq_event_percpu+0x1f2/0x620 handle_irq_event+0xb2/0x1d0 handle_edge_irq+0x21e/0xb00 __common_interrupt+0x79/0x1a0 common_interrupt+0x78/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:default_idle+0x42/0x60 Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 <c3> 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00 RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242 RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110 RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3 R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005 R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000 ? default_idle_call+0xcc/0x450 default_idle_call+0xec/0x450 do_idle+0x394/0x450 ? arch_cpu_idle_exit+0x40/0x40 ? do_idle+0x17/0x450 cpu_startup_entry+0x19/0x20 start_secondary+0x221/0x2b0 ? set_cpu_sibling_map+0x2070/0x2070 secondary_startup_64_no_verify+0xcd/0xdb </TASK> Allocated by task 49502: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 kvmalloc_node+0x48/0xe0 mlx5e_bulk_async_init+0x35/0x110 [mlx5_core] mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 49502: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x11d/0x1b0 kfree+0x1ba/0x520 mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: e355477ed9e4 ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27ice: Add support Flex RXDMichal Jaron
Add new VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC flag, opcode VIRTCHNL_OP_GET_SUPPORTED_RXDIDS and add member rxdid in struct virtchnl_rxq_info to support AVF Flex RXD extension. Add support to allow VF to query flexible descriptor RXDIDs supported by DDP package and configure Rx queues with selected RXDID for IAVF. Add code to allow VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message to be processed. Add necessary macros for registers. Signed-off-by: Leyi Rong <leyi.rong@intel.com> Signed-off-by: Xu Ting <ting.xu@intel.com> Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221025161252.1952939-1-jacob.e.keller@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-26Merge tag 'spi-fix-v6.1-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A collection of mostly unremarkable fixes for SPI that have built up since the merge window, all driver specific. The change to the qup adding support for GPIO chip selects is fixing a regression due to the removal of legacy GPIO handling, the driver had previously been silently relying on the legacy GPIO support in a slightly broken way which worked well enough on some systems. Fixing it is simply a case of setting a couple of bits of information in the driver description" * tag 'spi-fix-v6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: aspeed: Fix window offset of CE1 spi: qup: support using GPIO as chip select line spi: intel: Fix the offset to get the 64K erase opcode spi: aspeed: Fix typo in mode_bits field for AST2600 platform spi: mpc52xx: Replace NO_IRQ by 0 spi: spi-mem: Fix typo (of -> or) spi: spi-gxp: fix typo in SPDX identifier line spi: tegra210-quad: Fix combined sequence
2022-10-26Merge tag 'ieee802154-for-net-next-2022-10-25' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== == One of the biggest cycles for ieee802154 in a long time. We are landing the first pieces of a big enhancements in managing PAN's. We might have another pull request ready for this cycle later on, but I want to get this one out first. Miquel Raynal added support for sending frames synchronously as a dependency to handle MLME commands. Also introducing more filtering levels to match with the needs of a device when scanning or operating as a pan coordinator. To support development and testing the hwsim driver for ieee802154 was also enhanced for the new filtering levels and to update the PIB attributes. Alexander Aring fixed quite a few bugs spotted during reviewing changes. He also added support for TRAC in the atusb driver to have better failure handling if the firmware provides the needed information. Jilin Yuan fixed a comment with a repeated word in it. ================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-25overflow: Fix kern-doc markup for functionsKees Cook
Fix the kern-doc markings for several of the overflow helpers and move their location into the core kernel API documentation, where it belongs (it's not driver-specific). Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: linux-hardening@vger.kernel.org Reviewed-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2022-10-25net: dev: Convert sa_data to flexible array in struct sockaddrKees Cook
One of the worst offenders of "fake flexible arrays" is struct sockaddr, as it is the classic example of why GCC and Clang have been traditionally forced to treat all trailing arrays as fake flexible arrays: in the distant misty past, sa_data became too small, and code started just treating it as a flexible array, even though it was fixed-size. The special case by the compiler is specifically that sizeof(sa->sa_data) and FORTIFY_SOURCE (which uses __builtin_object_size(sa->sa_data, 1)) do not agree (14 and -1 respectively), which makes FORTIFY_SOURCE treat it as a flexible array. However, the coming -fstrict-flex-arrays compiler flag will remove these special cases so that FORTIFY_SOURCE can gain coverage over all the trailing arrays in the kernel that are _not_ supposed to be treated as a flexible array. To deal with this change, convert sa_data to a true flexible array. To keep the structure size the same, move sa_data into a union with a newly introduced sa_data_min with the original size. The result is that FORTIFY_SOURCE can continue to have no idea how large sa_data may actually be, but anything using sizeof(sa->sa_data) must switch to sizeof(sa->sa_data_min). Cc: Jens Axboe <axboe@kernel.dk> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: Dylan Yudaken <dylany@fb.com> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Petr Machata <petrm@nvidia.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: syzbot <syzkaller@googlegroups.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221018095503.never.671-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-24net: sfp: provide a definition for the power level select bitRussell King (Oracle)
Provide a named definition for the power level select bit in the extended status register, rather than using BIT(0) in the code. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/linux/net.h a5ef058dc4d9 ("net: introduce and use custom sockopt socket flag") e993ffe3da4b ("net: flag sockets supporting msghdr originated zerocopy") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-24Merge tag 'net-6.1-rc3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf. The net-memcg fix stands out, the rest is very run-off-the-mill. Maybe I'm biased. Current release - regressions: - eth: fman: re-expose location of the MAC address to userspace, apparently some udev scripts depended on the exact value Current release - new code bugs: - bpf: - wait for busy refill_work when destroying bpf memory allocator - allow bpf_user_ringbuf_drain() callbacks to return 1 - fix dispatcher patchable function entry to 5 bytes nop Previous releases - regressions: - net-memcg: avoid stalls when under memory pressure - tcp: fix indefinite deferral of RTO with SACK reneging - tipc: fix a null-ptr-deref in tipc_topsrv_accept - eth: macb: specify PHY PM management done by MAC - tcp: fix a signed-integer-overflow bug in tcp_add_backlog() Previous releases - always broken: - eth: amd-xgbe: SFP fixes and compatibility improvements Misc: - docs: netdev: offer performance feedback to contributors" * tag 'net-6.1-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits) net-memcg: avoid stalls when under memory pressure tcp: fix indefinite deferral of RTO with SACK reneging tcp: fix a signed-integer-overflow bug in tcp_add_backlog() net: lantiq_etop: don't free skb when returning NETDEV_TX_BUSY net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed docs: netdev: offer performance feedback to contributors kcm: annotate data-races around kcm->rx_wait kcm: annotate data-races around kcm->rx_psock net: fman: Use physical address for userspace interfaces net/mlx5e: Cleanup MACsec uninitialization routine atlantic: fix deadlock at aq_nic_stop nfp: only clean `sp_indiff` when application firmware is unloaded amd-xgbe: add the bit rate quirk for Molex cables amd-xgbe: fix the SFP compliance codes check for DAC cables amd-xgbe: enable PLL_CTL for fixed PHY modes only amd-xgbe: use enums for mailbox cmd and sub_cmds amd-xgbe: Yellow carp devices do not need rrc bpf: Use __llist_del_all() whenever possbile during memory draining bpf: Wait for busy refill_work when destroying bpf memory allocator MAINTAINERS: add keyword match on PTP ...
2022-10-24Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2022-10-23 We've added 7 non-merge commits during the last 18 day(s) which contain a total of 8 files changed, 69 insertions(+), 5 deletions(-). The main changes are: 1) Wait for busy refill_work when destroying bpf memory allocator, from Hou. 2) Allow bpf_user_ringbuf_drain() callbacks to return 1, from David. 3) Fix dispatcher patchable function entry to 5 bytes nop, from Jiri. 4) Prevent decl_tag from being referenced in func_proto, from Stanislav. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Use __llist_del_all() whenever possbile during memory draining bpf: Wait for busy refill_work when destroying bpf memory allocator bpf: Fix dispatcher patchable function entry to 5 bytes nop bpf: prevent decl_tag from being referenced in func_proto selftests/bpf: Add reproducer for decl_tag in func_proto return type selftests/bpf: Make bpf_user_ringbuf_drain() selftest callback return 1 bpf: Allow bpf_user_ringbuf_drain() callbacks to return 1 ==================== Link: https://lore.kernel.org/r/20221023192244.81137-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-24net: skb: move skb_pp_recycle() to skbuff.cYunsheng Lin
skb_pp_recycle() is only used by skb_free_head() in skbuff.c, so move it to skbuff.c. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24udp: track the forward memory release threshold in an hot cachelinePaolo Abeni
When the receiver process and the BH runs on different cores, udp_rmem_release() experience a cache miss while accessing sk_rcvbuf, as the latter shares the same cacheline with sk_forward_alloc, written by the BH. With this patch, UDP tracks the rcvbuf value and its update via custom SOL_SOCKET socket options, and copies the forward memory threshold value used by udp_rmem_release() in a different cacheline, already accessed by the above function and uncontended. Since the UDP socket init operation grown a bit, factor out the common code between v4 and v6 in a shared helper. Overall the above give a 10% peek throughput increase under UDP flood. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24net: introduce and use custom sockopt socket flagPaolo Abeni
We will soon introduce custom setsockopt for UDP sockets, too. Instead of doing even more complex arbitrary checks inside sock_use_custom_sol_socket(), add a new socket flag and set it for the relevant socket types (currently only MPTCP). Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "RISC-V: - Fix compilation without RISCV_ISA_ZICBOM - Fix kvm_riscv_vcpu_timer_pending() for Sstc ARM: - Fix a bug preventing restoring an ITS containing mappings for very large and very sparse device topology - Work around a relocation handling error when compiling the nVHE object with profile optimisation - Fix for stage-2 invalidation holding the VM MMU lock for too long by limiting the walk to the largest block mapping size - Enable stack protection and branch profiling for VHE - Two selftest fixes x86: - add compat implementation for KVM_X86_SET_MSR_FILTER ioctl selftests: - synchronize includes between include/uapi and tools/include/uapi" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: tools: include: sync include/api/linux/kvm.h KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter() kvm: Add support for arch compat vm ioctls RISC-V: KVM: Fix kvm_riscv_vcpu_timer_pending() for Sstc RISC-V: Fix compilation without RISCV_ISA_ZICBOM KVM: arm64: vgic: Fix exit condition in scan_its_table() KVM: arm64: nvhe: Fix build with profile optimization KVM: selftests: Fix number of pages for memory slot in memslot_modification_stress_test KVM: arm64: selftests: Fix multiple versions of GIC creation KVM: arm64: Enable stack protection and branch profiling for VHE KVM: arm64: Limit stage2_apply_range() batch size to largest block KVM: arm64: Work out supported block level at compile time
2022-10-23kernel/utsname_sysctl.c: Fix hostname pollingLinus Torvalds
Commit bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch") added a new entry to the uts_kern_table[] array, but didn't update the UTS_PROC_xyz enumerators of older entries, breaking anything that used them. Which is admittedly not many cases: it's really just the two uses of uts_proc_notify() in kernel/sys.c. But apparently journald-systemd actually uses this to detect hostname changes. Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Fixes: bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch") Link: https://lore.kernel.org/lkml/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/ Link: https://linux-regtracking.leemhuis.info/regzbot/regression/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/ Cc: Petr Vorel <pvorel@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-23Merge tag 'perf_urgent_for_v6.1_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Fix raw data handling when perf events are used in bpf - Rework how SIGTRAPs get delivered to events to address a bunch of problems with it. Add a selftest for that too * tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: bpf: Fix sample_flags for bpf_perf_event_output selftests/perf_events: Add a SIGTRAP stress test with disables perf: Fix missing SIGTRAPs
2022-10-23Merge tag 'io_uring-6.1-2022-10-22' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring follow-up from Jens Axboe: "Currently the zero-copy has automatic fallback to normal transmit, and it was decided that it'd be cleaner to return an error instead if the socket type doesn't support it. Zero-copy does work with UDP and TCP, it's more of a future proofing kind of thing (eg for samba)" * tag 'io_uring-6.1-2022-10-22' of git://git.kernel.dk/linux: io_uring/net: fail zc sendmsg when unsupported by socket io_uring/net: fail zc send when unsupported by socket net: flag sockets supporting msghdr originated zerocopy
2022-10-22net: flag sockets supporting msghdr originated zerocopyPavel Begunkov
We need an efficient way in io_uring to check whether a socket supports zerocopy with msghdr provided ubuf_info. Add a new flag into the struct socket flags fields. Cc: <stable@vger.kernel.org> # 6.0 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-22kvm: Add support for arch compat vm ioctlsAlexander Graf
We will introduce the first architecture specific compat vm ioctl in the next patch. Add all necessary boilerplate to allow architectures to override compat vm ioctls when necessary. Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20221017184541.2658-2-graf@amazon.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-21Merge tag 'efi-fixes-for-v6.1-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - fixes for the EFI variable store refactor that landed in v6.0 - fixes for issues that were introduced during the merge window - back out some changes related to EFI zboot signing - we'll add a better solution for this during the next cycle * tag 'efi-fixes-for-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0 efi: libstub: Fix incorrect payload size in zboot header efi: libstub: Give efi_main() asmlinkage qualification efi: efivars: Fix variable writes without query_variable_store() efi: ssdt: Don't free memory if ACPI table was loaded successfully efi: libstub: Remove zboot signing from build options
2022-10-21Merge tag 'iommu-fixes-v6.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: "Intel VT-d fixes: - Fix a lockdep splat issue in intel_iommu_init() - Allow NVS regions to pass RMRR check - Domain cleanup in error path" * tag 'iommu-fixes-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Clean up si_domain in the init_dmars() error path iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check() iommu/vt-d: Use rcu_lock in get_resv_regions iommu: Add gfp parameter to iommu_alloc_resv_region
2022-10-21efi: efivars: Fix variable writes without query_variable_store()Ard Biesheuvel
Commit bbc6d2c6ef22 ("efi: vars: Switch to new wrapper layer") refactored the efivars layer so that the 'business logic' related to which UEFI variables affect the boot flow in which way could be moved out of it, and into the efivarfs driver. This inadvertently broke setting variables on firmware implementations that lack the QueryVariableInfo() boot service, because we no longer tolerate a EFI_UNSUPPORTED result from check_var_size() when calling efivar_entry_set_get_size(), which now ends up calling check_var_size() a second time inadvertently. If QueryVariableInfo() is missing, we support writes of up to 64k - let's move that logic into check_var_size(), and drop the redundant call. Cc: <stable@vger.kernel.org> # v6.0 Fixes: bbc6d2c6ef22 ("efi: vars: Switch to new wrapper layer") Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-10-21iommu: Add gfp parameter to iommu_alloc_resv_regionLu Baolu
Add gfp parameter to iommu_alloc_resv_region() for the callers to specify the memory allocation behavior. Thus iommu_alloc_resv_region() could also be available in critical contexts. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220927053109.4053662-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-20bpf: Fix dispatcher patchable function entry to 5 bytes nopJiri Olsa
The patchable_function_entry(5) might output 5 single nop instructions (depends on toolchain), which will clash with bpf_arch_text_poke check for 5 bytes nop instruction. Adding early init call for dispatcher that checks and change the patchable entry into expected 5 nop instruction if needed. There's no need to take text_mutex, because we are using it in early init call which is called at pre-smp time. Fixes: ceea991a019c ("bpf: Move bpf_dispatcher function out of ftrace locations") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221018075934.574415-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-20Merge tag 'net-6.1-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. Current release - regressions: - revert "net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}" - revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()" - dsa: uninitialized variable in dsa_slave_netdevice_event() - eth: sunhme: uninitialized variable in happy_meal_init() Current release - new code bugs: - eth: octeontx2: fix resource not freed after malloc Previous releases - regressions: - sched: fix return value of qdisc ingress handling on success - sched: fix race condition in qdisc_graft() - udp: update reuse->has_conns under reuseport_lock. - tls: strp: make sure the TCP skbs do not have overlapping data - hsr: avoid possible NULL deref in skb_clone() - tipc: fix an information leak in tipc_topsrv_kern_subscr - phylink: add mac_managed_pm in phylink_config structure - eth: i40e: fix DMA mappings leak - eth: hyperv: fix a RX-path warning - eth: mtk: fix memory leaks Previous releases - always broken: - sched: cake: fix null pointer access issue when cake_init() fails" * tag 'net-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits) net: phy: dp83822: disable MDI crossover status change interrupt net: sched: fix race condition in qdisc_graft() net: hns: fix possible memory leak in hnae_ae_register() wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new() sfc: include vport_id in filter spec hash and equal() genetlink: fix kdoc warnings selftests: add selftest for chaining of tc ingress handling to egress net: Fix return value of qdisc ingress handling on success net: sched: sfb: fix null pointer access issue when sfb_init() fails Revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()" net: sched: cake: fix null pointer access issue when cake_init() fails ethernet: marvell: octeontx2 Fix resource not freed after malloc netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces. ionic: catch NULL pointer issue on reconfig net: hsr: avoid possible NULL deref in skb_clone() bnxt_en: fix memory leak in bnxt_nvm_test() ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed udp: Update reuse->has_conns under reuseport_lock. net: ethernet: mediatek: ppe: Remove the unused function mtk_foe_entry_usable() ...
2022-10-19fscrypt: fix keyring memory leak on mount failureEric Biggers
Commit d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key") moved the keyring destruction from __put_super() to generic_shutdown_super() so that the filesystem's block device(s) are still available. Unfortunately, this causes a memory leak in the case where a mount is attempted with the test_dummy_encryption mount option, but the mount fails after the option has already been processed. To fix this, attempt the keyring destruction in both places. Reported-by: syzbot+104c2a89561289cec13e@syzkaller.appspotmail.com Fixes: d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Link: https://lore.kernel.org/r/20221011213838.209879-1-ebiggers@kernel.org
2022-10-19netlink: add support for formatted extack messagesEdward Cree
Include an 80-byte buffer in struct netlink_ext_ack that can be used for scnprintf()ed messages. This does mean that the resulting string can't be enumerated, translated etc. in the way NL_SET_ERR_MSG() was designed to allow. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-19net: phylink: provide phylink_validate_mask_caps() helperRussell King (Oracle)
Provide a helper that restricts the link modes according to the phylink capabilities. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [rebased on net-next/master and added documentation] Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-18net: remove smc911x driverArnd Bergmann
This driver was used on Arm and SH machines until 2009, when the last platforms moved to the smsc911x driver for the same hardware. Time to retire this version. Link: https://lore.kernel.org/netdev/1232010482-3744-1-git-send-email-steve.glendinning@smsc.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20221017121900.3520108-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-18Merge tag 'for-netdev' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-10-18 We've added 33 non-merge commits during the last 14 day(s) which contain a total of 31 files changed, 874 insertions(+), 538 deletions(-). The main changes are: 1) Add RCU grace period chaining to BPF to wait for the completion of access from both sleepable and non-sleepable BPF programs, from Hou Tao & Paul E. McKenney. 2) Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer values. In the wild we have seen OS vendors doing buggy backports where helper call numbers mismatched. This is an attempt to make backports more foolproof, from Andrii Nakryiko. 3) Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions, from Roberto Sassu. 4) Fix libbpf's BTF dumper for structs with padding-only fields, from Eduard Zingerman. 5) Fix various libbpf bugs which have been found from fuzzing with malformed BPF object files, from Shung-Hsi Yu. 6) Clean up an unneeded check on existence of SSE2 in BPF x86-64 JIT, from Jie Meng. 7) Fix various ASAN bugs in both libbpf and selftests when running the BPF selftest suite on arm64, from Xu Kuohai. 8) Fix missing bpf_iter_vma_offset__destroy() call in BPF iter selftest and use in-skeleton link pointer to remove an explicit bpf_link__destroy(), from Jiri Olsa. 9) Fix BPF CI breakage by pointing to iptables-legacy instead of relying on symlinked iptables which got upgraded to iptables-nft, from Martin KaFai Lau. 10) Minor BPF selftest improvements all over the place, from various others. * tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (33 commits) bpf/docs: Update README for most recent vmtest.sh bpf: Use rcu_trace_implies_rcu_gp() for program array freeing bpf: Use rcu_trace_implies_rcu_gp() in local storage map bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator rcu-tasks: Provide rcu_trace_implies_rcu_gp() selftests/bpf: Use sys_pidfd_open() helper when possible libbpf: Fix null-pointer dereference in find_prog_by_sec_insn() libbpf: Deal with section with no data gracefully libbpf: Use elf_getshdrnum() instead of e_shnum selftest/bpf: Fix error usage of ASSERT_OK in xdp_adjust_tail.c selftests/bpf: Fix error failure of case test_xdp_adjust_tail_grow selftest/bpf: Fix memory leak in kprobe_multi_test selftests/bpf: Fix memory leak caused by not destroying skeleton libbpf: Fix memory leak in parse_usdt_arg() libbpf: Fix use-after-free in btf_dump_name_dups selftests/bpf: S/iptables/iptables-legacy/ in the bpf_nf and xdp_synproxy test selftests/bpf: Alphabetize DENYLISTs selftests/bpf: Add tests for _opts variants of bpf_*_get_fd_by_id() libbpf: Introduce bpf_link_get_fd_by_id_opts() libbpf: Introduce bpf_btf_get_fd_by_id_opts() ... ==================== Link: https://lore.kernel.org/r/20221018210631.11211-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-18rcu-tasks: Provide rcu_trace_implies_rcu_gp()Paul E. McKenney
As an accident of implementation, an RCU Tasks Trace grace period also acts as an RCU grace period. However, this could change at any time. This commit therefore creates an rcu_trace_implies_rcu_gp() that currently returns true to codify this accident. Code relying on this accident must call this function to verify that this accident is still happening. Reported-by: Hou Tao <houtao@huaweicloud.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Link: https://lore.kernel.org/r/20221014113946.965131-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-17Merge tag 'cgroup-for-6.1-rc1-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix a recent regression where a sleeping kernfs function is called with css_set_lock (spinlock) held - Revert the commit to enable cgroup1 support for cgroup_get_from_fd/file() Multiple users assume that the lookup only works for cgroup2 and breaks when fed a cgroup1 file. Instead, introduce a separate set of functions to lookup both v1 and v2 and use them where the user explicitly wants to support both versions. - Compat update for tools/perf/util/bpf_skel/bperf_cgroup.bpf.c. - Add Josef Bacik as a blkcg maintainer. * tag 'cgroup-for-6.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: blkcg: Update MAINTAINERS entry mm: cgroup: fix comments for get from fd/file helpers perf stat: Support old kernels for bperf cgroup counting bpf: cgroup_iter: support cgroup1 using cgroup fd cgroup: add cgroup_v1v2_get_from_[fd/file]() Revert "cgroup: enable cgroup_get_from_file() on cgroup1" cgroup: Reorganize css_set_lock and kernfs path processing
2022-10-17perf: Fix missing SIGTRAPsPeter Zijlstra
Marco reported: Due to the implementation of how SIGTRAP are delivered if perf_event_attr::sigtrap is set, we've noticed 3 issues: 1. Missing SIGTRAP due to a race with event_sched_out() (more details below). 2. Hardware PMU events being disabled due to returning 1 from perf_event_overflow(). The only way to re-enable the event is for user space to first "properly" disable the event and then re-enable it. 3. The inability to automatically disable an event after a specified number of overflows via PERF_EVENT_IOC_REFRESH. The worst of the 3 issues is problem (1), which occurs when a pending_disable is "consumed" by a racing event_sched_out(), observed as follows: CPU0 | CPU1 --------------------------------+--------------------------- __perf_event_overflow() | perf_event_disable_inatomic() | pending_disable = CPU0 | ... | _perf_event_enable() | event_function_call() | task_function_call() | /* sends IPI to CPU0 */ <IPI> | ... __perf_event_enable() +--------------------------- ctx_resched() task_ctx_sched_out() ctx_sched_out() group_sched_out() event_sched_out() pending_disable = -1 </IPI> <IRQ-work> perf_pending_event() perf_pending_event_disable() /* Fails to send SIGTRAP because no pending_disable! */ </IRQ-work> In the above case, not only is that particular SIGTRAP missed, but also all future SIGTRAPs because 'event_limit' is not reset back to 1. To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction of a separate 'pending_sigtrap', no longer using 'event_limit' and 'pending_disable' for its delivery. Additionally; and different to Marco's proposed patch: - recognise that pending_disable effectively duplicates oncpu for the case where it is set. As such, change the irq_work handler to use ->oncpu to target the event and use pending_* as boolean toggles. - observe that SIGTRAP targets the ctx->task, so the context switch optimization that carries contexts between tasks is invalid. If the irq_work were delayed enough to hit after a context switch the SIGTRAP would be delivered to the wrong task. - observe that if the event gets scheduled out (rotation/migration/context-switch/...) the irq-work would be insufficient to deliver the SIGTRAP when the event gets scheduled back in (the irq-work might still be pending on the old CPU). Therefore have event_sched_out() convert the pending sigtrap into a task_work which will deliver the signal at return_to_user. Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Debugged-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Marco Elver <elver@google.com> Debugged-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com>
2022-10-16Merge tag 'random-6.1-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1
2022-10-16Merge tag 'clk-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull more clk updates from Stephen Boyd: "This is the final part of the clk patches for this merge window. The clk rate range series needed another week to fully bake. Maxime fixed the bug that broke clk notifiers and prevented this from being included in the first pull request. He also added a unit test on top to make sure it doesn't break so easily again. The majority of the series fixes up how the clk_set_rate_*() APIs work, particularly around when the rate constraints are dropped and how they move around when reparenting clks. Overall it's a much needed improvement to the clk rate range APIs that used to be pretty broken if you looked sideways. Beyond the core changes there are a few driver fixes for a compilation issue or improper data causing clks to fail to register or have the wrong parents. These are good to get in before the first -rc so that the system actually boots on the affected devices" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (31 commits) clk: tegra: Fix Tegra PWM parent clock clk: at91: fix the build with binutils 2.27 clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks clk: mediatek: clk-mux: Add .determine_rate() callback clk: tests: Add tests for notifiers clk: Update req_rate on __clk_recalc_rates() clk: tests: Add missing test case for ranges clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d clk: Introduce the clk_hw_get_rate_range function clk: Zero the clk_rate_request structure clk: Stop forwarding clk_rate_requests to the parent clk: Constify clk_has_parent() clk: Introduce clk_core_has_parent() clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock clk: Add our request boundaries in clk_core_init_rate_req clk: Introduce clk_hw_init_rate_request() clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller clk: Change clk_core_init_rate_req prototype clk: Set req_rate on reparenting clk: Take into account uncached clocks in clk_set_rate_range() ...
2022-10-16Revert "cpumask: fix checking valid cpu range".Tetsuo Handa
This reverts commit 78e5a3399421 ("cpumask: fix checking valid cpu range"). syzbot is hitting WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning at cpu_max_bits_warn() [1], for commit 78e5a3399421 ("cpumask: fix checking valid cpu range") is broken. Obviously that patch hits WARN_ON_ONCE() when e.g. reading /proc/cpuinfo because passing "cpu + 1" instead of "cpu" will trivially hit cpu == nr_cpumask_bits condition. Although syzbot found this problem in linux-next.git on 2022/09/27 [2], this problem was not fixed immediately. As a result, that patch was sent to linux.git before the patch author recognizes this problem, and syzbot started failing to test changes in linux.git since 2022/10/10 [3]. Andrew Jones proposed a fix for x86 and riscv architectures [4]. But [2] and [5] indicate that affected locations are not limited to arch code. More delay before we find and fix affected locations, less tested kernel (and more difficult to bisect and fix) before release. We should have inspected and fixed basically all cpumask users before applying that patch. We should not crash kernels in order to ask existing cpumask users to update their code, even if limited to CONFIG_DEBUG_PER_CPU_MAPS=y case. Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd [1] Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150 [2] Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734 [3] Link: https://lkml.kernel.org/r/20221014155845.1986223-1-ajones@ventanamicro.com [4] Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060 [5] Reported-by: Andrew Jones <ajones@ventanamicro.com> Reported-by: syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Yury Norov <yury.norov@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>