linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-03-06	wifi: rtw88: mac: Return the original error from rtw_mac_power_switch()	Martin Blumenstingl
	rtw_mac_power_switch() calls rtw_pwr_seq_parser() which can return -EINVAL, -EBUSY or 0. Propagate the original error code instead of unconditionally returning -EINVAL in case of an error. Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230226221004.138331-3-martin.blumenstingl@googlemail.com
2023-03-06	wifi: rtw88: mac: Return the original error from rtw_pwr_seq_parser()	Martin Blumenstingl
	rtw_pwr_seq_parser() calls rtw_sub_pwr_seq_parser() which can either return -EBUSY, -EINVAL or 0. Propagate the original error code instead of unconditionally returning -EBUSY in case of an error. Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230226221004.138331-2-martin.blumenstingl@googlemail.com
2023-03-06	wifi: brcmfmac: pcie: Add 4359C0 firmware definition	Konrad Dybcio
	Some phones from around 2016, as well as other random devices have this chip called 43956 or 4359C0 or 43596A0, which is more or less just a rev bump (v9) of the already-supported 4359. Add a corresponding firmware definition to allow for choosing the correct blob. Suggested-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230224-topic-brcm_tone-v1-1-333b0ac67934@linaro.org
2023-03-06	wifi: rtw89: fix SER L1 might stop entering LPS issue	Chih-Kang Chang
	When SER L1 triggered, driver need to stop Rx and clear RTW89_FLAG_RUNNING flag. If track_work check RTW89_FLAG_RUNNING simultaneously, it will check failed and never queue track_work again, and LPS won't enter either. Therefore, we cancel delayed work when enter SER L1, and queue delayed work when leave SER L1. Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230224082117.21241-1-pkshih@realtek.com
2023-03-06	bnxt_en: Fix the double free during device removal	Selvin Xavier
	Following warning reported by KASAN during driver unload ================================================================== BUG: KASAN: double-free in bnxt_remove_one+0x103/0x200 [bnxt_en] Free of addr ffff88814e8dd4c0 by task rmmod/17469 CPU: 47 PID: 17469 Comm: rmmod Kdump: loaded Tainted: G S 6.2.0-rc7+ #2 Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.3.10 08/15/2019 Call Trace: <TASK> dump_stack_lvl+0x33/0x46 print_report+0x17b/0x4b3 ? __call_rcu_common.constprop.79+0x27e/0x8c0 ? __pfx_free_object_rcu+0x10/0x10 ? __virt_addr_valid+0xe3/0x160 ? bnxt_remove_one+0x103/0x200 [bnxt_en] kasan_report_invalid_free+0x64/0xd0 ? bnxt_remove_one+0x103/0x200 [bnxt_en] ? bnxt_remove_one+0x103/0x200 [bnxt_en] __kasan_slab_free+0x179/0x1c0 ? bnxt_remove_one+0x103/0x200 [bnxt_en] __kmem_cache_free+0x194/0x350 bnxt_remove_one+0x103/0x200 [bnxt_en] pci_device_remove+0x62/0x110 device_release_driver_internal+0xf6/0x1c0 driver_detach+0x76/0xe0 bus_remove_driver+0x89/0x160 pci_unregister_driver+0x26/0x110 ? strncpy_from_user+0x188/0x1c0 bnxt_exit+0xc/0x24 [bnxt_en] __x64_sys_delete_module+0x21f/0x390 ? __pfx___x64_sys_delete_module+0x10/0x10 ? __pfx_mem_cgroup_handle_over_high+0x10/0x10 ? _raw_spin_lock+0x87/0xe0 ? __pfx__raw_spin_lock+0x10/0x10 ? __audit_syscall_entry+0x185/0x210 ? ktime_get_coarse_real_ts64+0x51/0x80 ? syscall_trace_enter.isra.18+0x126/0x1a0 do_syscall_64+0x37/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7effcb6fd71b Code: 73 01 c3 48 8b 0d 6d 17 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 17 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffeada270b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 00005623660e0750 RCX: 00007effcb6fd71b RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005623660e07b8 RBP: 0000000000000000 R08: 00007ffeada26031 R09: 0000000000000000 R10: 00007effcb771280 R11: 0000000000000206 R12: 00007ffeada272e0 R13: 00007ffeada28bc4 R14: 00005623660e02a0 R15: 00005623660e0750 </TASK> Auxiliary device structures are freed in bnxt_aux_dev_release. So avoid calling kfree from bnxt_remove_one. Also, set bp->edev to NULL before freeing the auxilary private structure. Fixes: d80d88b0dfff ("bnxt_en: Add auxiliary driver support") Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-06	bnxt_en: Avoid order-5 memory allocation for TPA data	Michael Chan
	The driver needs to keep track of all the possible concurrent TPA (GRO/LRO) completions on the aggregation ring. On P5 chips, the maximum number of concurrent TPA is 256 and the amount of memory we allocate is order-5 on systems using 4K pages. Memory allocation failure has been reported: NetworkManager: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL\|__GFP_COMP\|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1 CPU: 15 PID: 2995 Comm: NetworkManager Kdump: loaded Not tainted 5.10.156 #1 Hardware name: Dell Inc. PowerEdge R660/0M1CC5, BIOS 0.2.25 08/12/2022 Call Trace: dump_stack+0x57/0x6e warn_alloc.cold.120+0x7b/0xdd ? _cond_resched+0x15/0x30 ? __alloc_pages_direct_compact+0x15f/0x170 __alloc_pages_slowpath.constprop.108+0xc58/0xc70 __alloc_pages_nodemask+0x2d0/0x300 kmalloc_order+0x24/0xe0 kmalloc_order_trace+0x19/0x80 bnxt_alloc_mem+0x1150/0x15c0 [bnxt_en] ? bnxt_get_func_stat_ctxs+0x13/0x60 [bnxt_en] __bnxt_open_nic+0x12e/0x780 [bnxt_en] bnxt_open+0x10b/0x240 [bnxt_en] __dev_open+0xe9/0x180 __dev_change_flags+0x1af/0x220 dev_change_flags+0x21/0x60 do_setlink+0x35c/0x1100 Instead of allocating this big chunk of memory and dividing it up for the concurrent TPA instances, allocate each small chunk separately for each TPA instance. This will reduce it to order-0 allocations. Fixes: 79632e9ba386 ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-06	net: phylib: get rid of unnecessary locking	Russell King (Oracle)
	The locking in phy_probe() and phy_remove() does very little to prevent any races with e.g. phy_attach_direct(), but instead causes lockdep ABBA warnings. Remove it. ====================================================== WARNING: possible circular locking dependency detected 6.2.0-dirty #1108 Tainted: G W E ------------------------------------------------------ ip/415 is trying to acquire lock: ffff5c268f81ef50 (&dev->lock){+.+.}-{3:3}, at: phy_attach_direct+0x17c/0x3a0 [libphy] but task is already holding lock: ffffaef6496cb518 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x154/0x560 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}-{3:3}: __lock_acquire+0x35c/0x6c0 lock_acquire.part.0+0xcc/0x220 lock_acquire+0x68/0x84 __mutex_lock+0x8c/0x414 mutex_lock_nested+0x34/0x40 rtnl_lock+0x24/0x30 sfp_bus_add_upstream+0x34/0x150 phy_sfp_probe+0x4c/0x94 [libphy] mv3310_probe+0x148/0x184 [marvell10g] phy_probe+0x8c/0x200 [libphy] call_driver_probe+0xbc/0x15c really_probe+0xc0/0x320 __driver_probe_device+0x84/0x120 driver_probe_device+0x44/0x120 __device_attach_driver+0xc4/0x160 bus_for_each_drv+0x80/0xe0 __device_attach+0xb0/0x1f0 device_initial_probe+0x1c/0x2c bus_probe_device+0xa4/0xb0 device_add+0x360/0x53c phy_device_register+0x60/0xa4 [libphy] fwnode_mdiobus_phy_device_register+0xc0/0x190 [fwnode_mdio] fwnode_mdiobus_register_phy+0x160/0xd80 [fwnode_mdio] of_mdiobus_register+0x140/0x340 [of_mdio] orion_mdio_probe+0x298/0x3c0 [mvmdio] platform_probe+0x70/0xe0 call_driver_probe+0x34/0x15c really_probe+0xc0/0x320 __driver_probe_device+0x84/0x120 driver_probe_device+0x44/0x120 __driver_attach+0x104/0x210 bus_for_each_dev+0x78/0xdc driver_attach+0x2c/0x3c bus_add_driver+0x184/0x240 driver_register+0x80/0x13c __platform_driver_register+0x30/0x3c xt_compat_calc_jump+0x28/0xa4 [x_tables] do_one_initcall+0x50/0x1b0 do_init_module+0x50/0x1fc load_module+0x684/0x744 __do_sys_finit_module+0xc4/0x140 __arm64_sys_finit_module+0x28/0x34 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x6c/0x1b0 do_el0_svc+0x34/0x44 el0_svc+0x48/0xf0 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x1a0/0x1a4 -> #0 (&dev->lock){+.+.}-{3:3}: check_prev_add+0xb4/0xc80 validate_chain+0x414/0x47c __lock_acquire+0x35c/0x6c0 lock_acquire.part.0+0xcc/0x220 lock_acquire+0x68/0x84 __mutex_lock+0x8c/0x414 mutex_lock_nested+0x34/0x40 phy_attach_direct+0x17c/0x3a0 [libphy] phylink_fwnode_phy_connect.part.0+0x70/0xe4 [phylink] phylink_fwnode_phy_connect+0x48/0x60 [phylink] mvpp2_open+0xec/0x2e0 [mvpp2] __dev_open+0x104/0x214 __dev_change_flags+0x1d4/0x254 dev_change_flags+0x2c/0x7c do_setlink+0x254/0xa50 __rtnl_newlink+0x430/0x514 rtnl_newlink+0x58/0x8c rtnetlink_rcv_msg+0x17c/0x560 netlink_rcv_skb+0x64/0x150 rtnetlink_rcv+0x20/0x30 netlink_unicast+0x1d4/0x2b4 netlink_sendmsg+0x1a4/0x400 ____sys_sendmsg+0x228/0x290 ___sys_sendmsg+0x88/0xec __sys_sendmsg+0x70/0xd0 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x6c/0x1b0 do_el0_svc+0x34/0x44 el0_svc+0x48/0xf0 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x1a0/0x1a4 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(&dev->lock); lock(rtnl_mutex); lock(&dev->lock); * DEADLOCK * Fixes: 298e54fa810e ("net: phy: add core phylib sfp support") Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-06	drm/meson: fix 1px pink line on GXM when scaling video overlay	Christian Hewitt
	Playing media with a resolution smaller than the crtc size requires the video overlay to be scaled for output and GXM boards display a 1px pink line on the bottom of the scaled overlay. Comparing with the downstream vendor driver revealed VPP_DUMMY_DATA not being set [0]. Setting VPP_DUMMY_DATA prevents the 1px pink line from being seen. [0] https://github.com/endlessm/linux-s905x/blob/master/drivers/amlogic/amports/video.c#L7869 Fixes: bbbe775ec5b5 ("drm: Add support for Amlogic Meson Graphic Controller") Suggested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230303123312.155164-1-christianshewitt@gmail.com
2023-03-06	net: stmmac: add to set device wake up flag when stmmac init phy	Rongguang Wei
	When MAC is not support PMT, driver will check PHY's WoL capability and set device wakeup capability in stmmac_init_phy(). We can enable the WoL through ethtool, the driver would enable the device wake up flag. Now the device_may_wakeup() return true. But if there is a way which enable the PHY's WoL capability derectly, like in BIOS. The driver would not know the enable thing and would not set the device wake up flag. The phy_suspend may failed like this: [ 32.409063] PM: dpm_run_callback(): mdio_bus_phy_suspend+0x0/0x50 returns -16 [ 32.409065] PM: Device stmmac-1:00 failed to suspend: error -16 [ 32.409067] PM: Some devices failed to suspend, or early wake event detected Add to set the device wakeup enable flag according to the get_wol function result in PHY can fix the error in this scene. v2: add a Fixes tag. Fixes: 1d8e5b0f3f2c ("net: stmmac: Support WOL with phy") Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-05	cifs: Move the in_send statistic to __smb_send_rqst()	Zhang Xiaoxu
	When send SMB_COM_NT_CANCEL and RFC1002_SESSION_REQUEST, the in_send statistic was lost. Let's move the in_send statistic to the send function to avoid this scenario. Fixes: 7ee1af765dfa ("[CIFS]") Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-05	ASoC: codecs: tx-macro: Fix for KASAN: slab-out-of-bounds	Ravulapati Vishnu Vardhan Rao
	When we run syzkaller we get below Out of Bound. "KASAN: slab-out-of-bounds Read in regcache_flat_read" Below is the backtrace of the issue: dump_backtrace+0x0/0x4c8 show_stack+0x34/0x44 dump_stack_lvl+0xd8/0x118 print_address_description+0x30/0x2d8 kasan_report+0x158/0x198 __asan_report_load4_noabort+0x44/0x50 regcache_flat_read+0x10c/0x110 regcache_read+0xf4/0x180 _regmap_read+0xc4/0x278 _regmap_update_bits+0x130/0x290 regmap_update_bits_base+0xc0/0x15c snd_soc_component_update_bits+0xa8/0x22c snd_soc_component_write_field+0x68/0xd4 tx_macro_digital_mute+0xec/0x140 Actually There is no need to have decimator with 32 bits. By limiting the variable with short type u8 issue is resolved. Signed-off-by: Ravulapati Vishnu Vardhan Rao <quic_visr@quicinc.com> Link: https://lore.kernel.org/r/20230304080702.609-1-quic_visr@quicinc.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-03-05	ASoC: qcom: q6prm: fix incorrect clk_root passed to ADSP	Krzysztof Kozlowski
	The second to last argument is clk_root (root of the clock), however the code called q6prm_request_lpass_clock() with clk_attr instead (copy-paste error). This effectively was passing value of 1 as root clock which worked on some of the SoCs (e.g. SM8450) but fails on others, depending on the ADSP. For example on SM8550 this "1" as root clock is not accepted and results in errors coming from ADSP. Fixes: 2f20640491ed ("ASoC: qdsp6: qdsp6: q6prm: handle clk disable correctly") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20230302122908.221398-1-krzysztof.kozlowski@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2023-03-05	ASoC: clarify that SND_SOC_IMX_SGTL5000 is the old driver	Luca Ceresoli
	Both SND_SOC_IMX_SGTL5000 and SND_SOC_FSL_ASOC_CARD implement the fsl,imx-audio-sgtl5000 compatible string, which is confusing. It took a little research to find out that the latter is much newer and it is supposed to be the preferred choice since several years. Add a clarification note to avoid wasting time for future readers. Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Link: https://lore.kernel.org/r/20230303093410.357621-1-luca.ceresoli@bootlin.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-03-05	xfs: fix off-by-one-block in xfs_discard_folio()	Dave Chinner
	The recent writeback corruption fixes changed the code in xfs_discard_folio() to calculate a byte range to for punching delalloc extents. A mistake was made in using round_up(pos) for the end offset, because when pos points at the first byte of a block, it does not get rounded up to point to the end byte of the block. hence the punch range is short, and this leads to unexpected behaviour in certain cases in xfs_bmap_punch_delalloc_range. e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so there is no previous extent and it rounds up the punch to the end of the delalloc extent it found at offset 0, not the end of the range given to xfs_bmap_punch_delalloc_range(). Fix this by handling the zero block offset case correctly. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217030 Link: https://lore.kernel.org/linux-xfs/Y+vOfaxIWX1c%2Fyy9@bfoster/ Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Found-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-05	xfs: quotacheck failure can race with background inode inactivation	Dave Chinner
	The background inode inactivation can attached dquots to inodes, but this can race with a foreground quotacheck failure that leads to disabling quotas and freeing the mp->m_quotainfo structure. The background inode inactivation then tries to allocate a quota, tries to dereference mp->m_quotainfo, and crashes like so: XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas. xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff) BUG: kernel NULL pointer dereference, address: 00000000000002a8 .... CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker RIP: 0010:xfs_dquot_alloc+0x95/0x1e0 .... Call Trace: <TASK> xfs_qm_dqread+0x46/0x440 xfs_qm_dqget_inode+0x154/0x500 xfs_qm_dqattach_one+0x142/0x3c0 xfs_qm_dqattach_locked+0x14a/0x170 xfs_qm_dqattach+0x52/0x80 xfs_inactive+0x186/0x340 xfs_inodegc_worker+0xd3/0x430 process_one_work+0x3b1/0x960 worker_thread+0x52/0x660 kthread+0x161/0x1a0 ret_from_fork+0x29/0x50 </TASK> .... Prevent this race by flushing all the queued background inode inactivations pending before purging all the cached dquots when quotacheck fails. Reported-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-05	Linux 6.3-rc1v6.3-rc1	Linus Torvalds

2023-03-05	cpumask: re-introduce constant-sized cpumask optimizations	Linus Torvalds
	Commit aa47a7c215e7 ("lib/cpumask: deprecate nr_cpumask_bits") resulted in the cpumask operations potentially becoming hugely less efficient, because suddenly the cpumask was always considered to be variable-sized. The optimization was then later added back in a limited form by commit 6f9c07be9d02 ("lib/cpumask: add FORCE_NR_CPUS config option"), but that FORCE_NR_CPUS option is not useful in a generic kernel and more of a special case for embedded situations with fixed hardware. Instead, just re-introduce the optimization, with some changes. Instead of depending on CPUMASK_OFFSTACK being false, and then always using the full constant cpumask width, this introduces three different cpumask "sizes": - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids. This is used for situations where we should use the exact size. - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it fits in a single word and the bitmap operations thus end up able to trigger the "small_const_nbits()" optimizations. This is used for the operations that have optimized single-word cases that get inlined, notably the bit find and scanning functions. - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it is an sufficiently small constant that makes simple "copy" and "clear" operations more efficient. This is arbitrarily set at four words or less. As a an example of this situation, without this fixed size optimization, cpumask_clear() will generate code like movl nr_cpu_ids(%rip), %edx addq $63, %rdx shrq $3, %rdx andl $-8, %edx callq memset@PLT on x86-64, because it would calculate the "exact" number of longwords that need to be cleared. In contrast, with this patch, using a MAX_CPU of 64 (which is quite a reasonable value to use), the above becomes a single movq $0,cpumask instruction instead, because instead of caring to figure out exactly how many CPU's the system has, it just knows that the cpumask will be a single word and can just clear it all. Note that this does end up tightening the rules a bit from the original version in another way: operations that set bits in the cpumask are now limited to the actual nr_cpu_ids limit, whereas we used to do the nr_cpumask_bits thing almost everywhere in the cpumask code. But if you just clear bits, or scan for bits, we can use the simpler compile-time constants. In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()' which were not useful, and which fundamentally have to be limited to 'nr_cpu_ids'. Better remove them now than have somebody introduce use of them later. Of course, on x86-64 with MAXSMP there is no sane small compile-time constant for the cpumask sizes, and we end up using the actual CPU bits, and will generate the above kind of horrors regardless. Please don't use MAXSMP unless you really expect to have machines with thousands of cores. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-05	Merge tag 'v6.3-p2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a regression in the caam driver" * tag 'v6.3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: caam - Fix edesc/iv ordering mixup
2023-03-05	Merge tag 'x86-urgent-2023-03-05' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Thomas Gleixner: "A small set of updates for x86: - Return -EIO instead of success when the certificate buffer for SEV guests is not large enough - Allow STIPB to be enabled with legacy IBSR. Legacy IBRS is cleared on return to userspace for performance reasons, but the leaves user space vulnerable to cross-thread attacks which STIBP prevents. Update the documentation accordingly" * tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: virt/sev-guest: Return -EIO if certificate buffer is not large enough Documentation/hw-vuln: Document the interaction between IBRS and STIBP x86/speculation: Allow enabling STIBP with legacy IBRS
2023-03-05	Merge tag 'irq-urgent-2023-03-05' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "A set of updates for the interrupt susbsystem: - Prevent possible NULL pointer derefences in irq_data_get_affinity_mask() and irq_domain_create_hierarchy() - Take the per device MSI lock before invoking code which relies on it being hold - Make sure that MSI descriptors are unreferenced before freeing them. This was overlooked when the platform MSI code was converted to use core infrastructure and results in a fals positive warning - Remove dead code in the MSI subsystem - Clarify the documentation for pci_msix_free_irq() - More kobj_type constification" * tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/msi, platform-msi: Ensure that MSI descriptors are unreferenced genirq/msi: Drop dead domain name assignment irqdomain: Add missing NULL pointer check in irq_domain_create_hierarchy() genirq/irqdesc: Make kobj_type structures constant PCI/MSI: Clarify usage of pci_msix_free_irq() genirq/msi: Take the per-device MSI lock before validating the control structure genirq/ipi: Fix NULL pointer deref in irq_data_get_affinity_mask()
2023-03-05	Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull vfs update from Al Viro: "Adding Christian Brauner as VFS co-maintainer" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Adding VFS co-maintainer
2023-03-05	Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull VM_FAULT_RETRY fixes from Al Viro: "Some of the page fault handlers do not deal with the following case correctly: - handle_mm_fault() has returned VM_FAULT_RETRY - there is a pending fatal signal - fault had happened in kernel mode Correct action in such case is not "return unconditionally" - fatal signals are handled only upon return to userland and something like copy_to_user() would end up retrying the faulting instruction and triggering the same fault again and again. What we need to do in such case is to make the caller to treat that as failed uaccess attempt - handle exception if there is an exception handler for faulting instruction or oops if there isn't one. Over the years some architectures had been fixed and now are handling that case properly; some still do not. This series should fix the remaining ones. Status: - m68k, riscv, hexagon, parisc: tested/acked by maintainers. - alpha, sparc32, sparc64: tested locally - bug has been reproduced on the unpatched kernel and verified to be fixed by this series. - ia64, microblaze, nios2, openrisc: build, but otherwise completely untested" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: openrisc: fix livelock in uaccess nios2: fix livelock in uaccess microblaze: fix livelock in uaccess ia64: fix livelock in uaccess sparc: fix livelock in uaccess alpha: fix livelock in uaccess parisc: fix livelock in uaccess hexagon: fix livelock in uaccess riscv: fix livelock in uaccess m68k: fix livelock in uaccess
2023-03-05	Remove Intel compiler support	Masahiro Yamada
	include/linux/compiler-intel.h had no update in the past 3 years. We often forget about the third C compiler to build the kernel. For example, commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO") only mentioned GCC and Clang. init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC, and nobody has reported any issue. I guess the Intel Compiler support is broken, and nobody is caring about it. Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is deprecated: $ icc -v icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message. icc version 2021.7.0 (gcc version 12.1.0 compatibility) Arnd Bergmann provided a link to the article, "Intel C/C++ compilers complete adoption of LLVM". lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept untouched for better sync with https://github.com/facebook/zstd Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-05	Adding VFS co-maintainer	Al Viro
	Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-03-04	Merge tag 'i2c-for-6.3-rc1-part2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull more i2c updates from Wolfram Sang: "Some improvements/fixes for the newly added GXP driver and a Kconfig dependency fix" * tag 'i2c-for-6.3-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: gxp: fix an error code in probe i2c: gxp: return proper error on address NACK i2c: gxp: remove "empty" switch statement i2c: Disable I2C_APPLE when I2C_PASEMI is a builtin
2023-03-04	mm: avoid gcc complaint about pointer casting	Linus Torvalds
	The migration code ends up temporarily stashing information of the wrong type in unused fields of the newly allocated destination folio. That all works fine, but gcc does complain about the pointer type mis-use: mm/migrate.c: In function ‘__migrate_folio_extract’: mm/migrate.c:1050:20: note: randstruct: casting between randomized structure pointer types (ssa): ‘struct anon_vma’ and ‘struct address_space’ 1050 \| anon_vmap = (void )dst->mapping; \| ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~ and gcc is actually right to complain since it really doesn't understand that this is a very temporary special case where this is ok. This could be fixed in different ways by just obfuscating the assignment sufficiently that gcc doesn't see what is going on, but the truly "proper C" way to do this is by explicitly using a union. Using unions for type conversions like this is normally hugely ugly and syntactically nasty, but this really is one of the few cases where we want to make it clear that we're not doing type conversion, we're really re-using the value bit-for-bit just using another type. IOW, this should not become a common pattern, but in this one case using that odd union is probably the best way to document to the compiler what is conceptually going on here. [ Side note: there are valid cases where we convert pointers to other pointer types, notably the whole "folio vs page" situation, where the types actually have fundamental commonalities. The fact that the gcc note is limited to just randomized structures means that we don't see equivalent warnings for those cases, but it migth also mean that we miss other cases where we do play these kinds of dodgy games, and this kind of explicit conversion might be a good idea. ] I verified that at least for an allmodconfig build on x86-64, this generates the exact same code, apart from line numbers and assembler comment changes. Fixes: 64c8902ed441 ("migrate_pages: split unmap_and_move() to _unmap() and _move()") Cc: Huang, Ying <ying.huang@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-04	Merge tag 'mm-hotfixes-stable-2023-03-04-13-12' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes. Eight are for MM and seven are for other parts of the kernel. Seven are cc:stable and eight address post-6.3 issues or were judged unsuitable for -stable backporting" * tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: map Dikshita Agarwal's old address to his current one mailmap: map Vikash Garodia's old address to his current one fs/cramfs/inode.c: initialize file_ra_state fs: hfsplus: fix UAF issue in hfsplus_put_super panic: fix the panic_print NMI backtrace setting lib: parser: update documentation for match_NUMBER functions kasan, x86: don't rename memintrinsics in uninstrumented files kasan: test: fix test for new meminstrinsic instrumentation kasan: treat meminstrinsic as builtins in uninstrumented files kasan: emit different calls for instrumentable memintrinsics ocfs2: fix non-auto defrag path not working issue ocfs2: fix defrag path triggering jbd2 ASSERT mailmap: map Georgi Djakov's old Linaro address to his current one mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH mm/damon/paddr: fix missing folio_put() mm/mremap: fix dup_anon_vma() in vma_merge() case 4
2023-03-04	Merge tag 'powerpc-6.3-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Drop orphaned VAS MAINTAINERS entry - Fix build errors with clang and KCSAN - Avoid build errors seen with LD_DEAD_CODE_DATA_ELIMINATION together with recordmcount Thanks to Nathan Chancellor. * tag 'powerpc-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Avoid dead code/data elimination when using recordmcount powerpc/vmlinux.lds: Add .text.asan/tsan sections powerpc: Drop orphaned VAS MAINTAINERS entry
2023-03-04	bpf: add support for fixed-size memory pointer returns for kfuncs	Andrii Nakryiko
	Support direct fixed-size (and for now, read-only) memory access when kfunc's return type is a pointer to non-struct type. Calculate type size and let BPF program access that many bytes directly. This is crucial for numbers iterator. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-13-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	bpf: generalize dynptr_get_spi to be usable for iters	Andrii Nakryiko
	Generalize the logic of fetching special stack slot object state using spi (stack slot index). This will be used by STACK_ITER logic next. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-12-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	bpf: mark PTR_TO_MEM as non-null register type	Andrii Nakryiko
	PTR_TO_MEM register without PTR_MAYBE_NULL is indeed non-null. This is important for BPF verifier to be able to prune guaranteed not to be taken branches. This is always the case with open-coded iterators. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-11-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	bpf: move kfunc_call_arg_meta higher in the file	Andrii Nakryiko
	Move struct bpf_kfunc_call_arg_meta higher in the file and put it next to struct bpf_call_arg_meta, so it can be used from more functions. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-10-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	bpf: ensure that r0 is marked scratched after any function call	Andrii Nakryiko
	r0 is important (unless called function is void-returning, but that's taken care of by print_verifier_state() anyways) in verifier logs. Currently for helpers we seem to print it in verifier log, but for kfuncs we don't. Instead of figuring out where in the maze of code we accidentally set r0 as scratched for helpers and why we don't do that for kfuncs, just enforce that after any function call r0 is marked as scratched. Also, perhaps, we should reconsider "scratched" terminology, as it's mightily confusing. "Touched" would seem more appropriate. But I left that for follow ups for now. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-9-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper	Andrii Nakryiko
	It's not correct to assume that any BPF_CALL instruction is a helper call. Fix visit_insn()'s detection of bpf_timer_set_callback() helper by also checking insn->code == 0. For kfuncs insn->code would be set to BPF_PSEUDO_KFUNC_CALL, and for subprog calls it will be BPF_PSEUDO_CALL. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	bpf: clean up visit_insn()'s instruction processing	Andrii Nakryiko
	Instead of referencing processed instruction repeatedly as insns[t] throughout entire visit_insn() function, take a local insn pointer and work with it in a cleaner way. It makes enhancing this function further a bit easier as well. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	selftests/bpf: adjust log_fixup's buffer size for proper truncation	Andrii Nakryiko
	Adjust log_fixup's expected buffer length to fix the test. It's pretty finicky in its length expectation, but it doesn't break often. So just adjust the length to work on current kernel and with follow up iterator changes as well. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	bpf: honor env->test_state_freq flag in is_state_visited()	Andrii Nakryiko
	env->test_state_freq flag can be set by user by passing BPF_F_TEST_STATE_FREQ program flag. This is used in a bunch of selftests to have predictable state checkpoints at every jump and so on. Currently, bounded loop handling heuristic ignores this flag if number of processed jumps and/or number of processed instructions is below some thresholds, which throws off that reliable state checkpointing. Honor this flag in all circumstances by disabling heuristic if env->test_state_freq is set. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	selftests/bpf: enhance align selftest's expected log matching	Andrii Nakryiko
	Allow to search for expected register state in all the verifier log output that's related to specified instruction number. See added comment for an example of possible situation that is happening due to a simple enhancement done in the next patch, which fixes handling of env->test_state_freq flag in state checkpointing logic. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}	Andrii Nakryiko
	Teach regsafe() logic to handle PTR_TO_MEM, PTR_TO_BUF, and PTR_TO_TP_BUFFER similarly to PTR_TO_MAP_{KEY,VALUE}. That is, instead of exact match for var_off and range, use tnum_in() and range_within() checks, allowing more general verified state to subsume more specific current state. This allows to match wider range of valid and safe states, speeding up verification and detecting wider range of equivalent states for upcoming open-coded iteration looping logic. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	bpf: improve stack slot state printing	Andrii Nakryiko
	Improve stack slot state printing to provide more useful and relevant information, especially for dynptrs. While previously we'd see something like: 8: (85) call bpf_ringbuf_reserve_dynptr#198 ; R0_w=scalar() fp-8_w=dddddddd fp-16_w=dddddddd refs=2 Now we'll see way more useful: 8: (85) call bpf_ringbuf_reserve_dynptr#198 ; R0_w=scalar() fp-16_w=dynptr_ringbuf(ref_id=2) refs=2 I experimented with printing the range of slots taken by dynptr, something like: fp-16..8_w=dynptr_ringbuf(ref_id=2) But it felt very awkward and pretty useless. So we print the lowest address (most negative offset) only. The general structure of this code is now also set up for easier extension and will accommodate ITER slots naturally. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04	Merge tag 'sound-fix-6.3-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of various small fixes that have been gathered since the last PR. The majority of changes are for ASoC, and there is a small change in ASoC PCM core, but the rest are all for driver- specific fixes / quirks / updates" * tag 'sound-fix-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits) ALSA: ice1712: Delete unreachable code in aureon_add_controls() ALSA: ice1712: Do not left ice->gpio_mutex locked in aureon_add_controls() ALSA: hda/realtek: Add quirk for HP EliteDesk 800 G6 Tower PC ALSA: hda/realtek: Improve support for Dell Precision 3260 ASoC: mediatek: mt8195: add missing initialization ASoC: mediatek: mt8188: add missing initialization ASoC: amd: yc: Add DMI entries to support HP OMEN 16-n0xxx (8A43) ASoC: zl38060 add gpiolib dependency ASoC: sam9g20ek: Disable capture unless building with microphone input ASoC: mt8192: Fix range for sidetone positive gain ASoC: mt8192: Report an error if when an invalid sidetone gain is written ASoC: mt8192: Fix event generation for controls ASoC: mt8192: Remove spammy log messages ASoC: mchp-pdmc: fix poc noise at capture startup ASoC: dt-bindings: sama7g5-pdmc: add microchip,startup-delay-us binding ASoC: soc-pcm: add option to start DMA after DAI ASoC: mt8183: Fix event generation for I2S DAI operations ASoC: mt8183: Remove spammy logging from I2S DAI driver ASoC: mt6358: Remove undefined HPx Mux enumeration values ASoC: mt6358: Validate Wake on Voice 2 writes ...
2023-03-03	Merge branch 'bpf: allow ctx writes using BPF_ST_MEM instruction'	Alexei Starovoitov
	Eduard Zingerman says: ==================== Changes v1 -> v2, suggested by Alexei: - Resolved conflict with recent commit: 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier"); - Variable `ctx_access` removed in function `convert_ctx_accesses()`; - Macro `BPF_COPY_STORE` renamed to `BPF_EMIT_STORE` and fixed to correctly extract original store instruction class from code. Original message follows: The function verifier.c:convert_ctx_access() applies some rewrites to BPF instructions that read from or write to the BPF program context. For example, the write instruction for the `struct bpf_sockopt::retval` field: (u32 )(r1 + offsetof(struct bpf_sockopt, retval)) = r2 Is transformed to: (u64 )(r1 + offsetof(struct bpf_sockopt_kern, tmp_reg)) = r9 r9 = (u64 )(r1 + offsetof(struct bpf_sockopt_kern, current_task)) r9 = (u64 )(r9 + offsetof(struct task_struct, bpf_ctx)) (u32 )(r9 + offsetof(struct bpf_cg_run_ctx, retval)) = r2 r9 = (u64 )(r1 + offsetof(struct bpf_sockopt_kern, tmp_reg)) Currently, the verifier only supports such transformations for LDX (memory-to-register read) and STX (register-to-memory write) instructions. Error is reported for ST instructions (immediate-to-memory write). This is fine because clang does not currently emit ST instructions. However, new `-mcpu=v4` clang flag is planned, which would allow to emit ST instructions (discussed in [1]). This patch-set adjusts the verifier to support ST instructions in `verifier.c:convert_ctx_access()`. The patches #1 and #2 were previously shared as part of RFC [2]. The changes compared to that RFC are: - In patch #1, a bug in the handling of the `struct __sk_buff::queue_mapping` field was fixed. - Patch #3 is added, which is a set of disassembler-based test cases for context access rewrites. The test cases cover all fields for which the handling code is modified in patch #1. [1] Propose some new instructions for -mcpu=v4 https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/ [2] RFC Support for BPF_ST instruction in LLVM C compiler https://lore.kernel.org/bpf/20221231163122.1360813-1-eddyz87@gmail.com/ [3] v1 https://lore.kernel.org/bpf/20230302225507.3413720-1-eddyz87@gmail.com/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03	selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()	Eduard Zingerman
	Function verifier.c:convert_ctx_access() applies some rewrites to BPF instructions that read or write BPF program context. This commit adds machinery to allow test cases that inspect BPF program after these rewrites are applied. An example of a test case: { // Shorthand for field offset and size specification N(CGROUP_SOCKOPT, struct bpf_sockopt, retval), // Pattern generated for field read .read = "$dst = (u64 )($ctx + bpf_sockopt_kern::current_task);" "$dst = (u64 )($dst + task_struct::bpf_ctx);" "$dst = (u32 )($dst + bpf_cg_run_ctx::retval);", // Pattern generated for field write .write = "(u64 )($ctx + bpf_sockopt_kern::tmp_reg) = r9;" "r9 = (u64 )($ctx + bpf_sockopt_kern::current_task);" "r9 = (u64 )(r9 + task_struct::bpf_ctx);" "(u32 )(r9 + bpf_cg_run_ctx::retval) = $src;" "r9 = (u64 )($ctx + bpf_sockopt_kern::tmp_reg);" , }, For each test case, up to three programs are created: - One that uses BPF_LDX_MEM to read the context field. - One that uses BPF_STX_MEM to write to the context field. - One that uses BPF_ST_MEM to write to the context field. The disassembly of each program is compared with the pattern specified in the test case. Kernel code for disassembly is reused (as is in the bpftool). To keep Makefile changes to the minimum, symbolic links to `kernel/bpf/disasm.c` and `kernel/bpf/disasm.h ` are added. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230304011247.566040-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03	selftests/bpf: test if pointer type is tracked for BPF_ST_MEM	Eduard Zingerman
	Check that verifier tracks pointer types for BPF_ST_MEM instructions and reports error if pointer types do not match for different execution branches. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230304011247.566040-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03	bpf: allow ctx writes using BPF_ST_MEM instruction	Eduard Zingerman
	Lift verifier restriction to use BPF_ST_MEM instructions to write to context data structures. This requires the following changes: - verifier.c:do_check() for BPF_ST updated to: - no longer forbid writes to registers of type PTR_TO_CTX; - track dst_reg type in the env->insn_aux_data[...].ptr_type field (same way it is done for BPF_STX and BPF_LDX instructions). - verifier.c:convert_ctx_access() and various callbacks invoked by it are updated to handled BPF_ST instruction alongside BPF_STX. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230304011247.566040-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03	tools arch x86: Sync the msr-index.h copy with the kernel sources	Arnaldo Carvalho de Melo
	To pick up the changes in: e7862eda309ecfcc ("x86/cpu: Support AMD Automatic IBRS") 0125acda7d76b943 ("x86/bugs: Reset speculation control settings on init") 38aaf921e92dc5cf ("perf/x86: Add Meteor Lake support") 5b6fac3fa44bafee ("x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation") dc2a3e857981f859 ("x86/resctrl: Add interface to read mbm_total_bytes_config") Addressing these tools/perf build warnings: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' That makes the beautification scripts to pick some new entries: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2023-03-03 18:26:51.766923522 -0300 +++ after 2023-03-03 18:27:09.987415481 -0300 @@ -267,9 +267,11 @@ [0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT", [0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG", [0xc0000200 - x86_64_specific_MSRs_offset] = "IA32_MBA_BW_BASE", + [0xc0000280 - x86_64_specific_MSRs_offset] = "IA32_SMBA_BW_BASE", [0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS", [0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL", [0xc0000302 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS_CLR", + [0xc0000400 - x86_64_specific_MSRs_offset] = "IA32_EVT_CFG_BASE", }; #define x86_AMD_V_KVM_MSRs_offset 0xc0010000 $ Now one can trace systemwide asking to see backtraces to where that MSR is being read/written, see this example with a previous update: # perf trace -e msr:_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" ^C# If we use -v (verbose mode) we can see what it does behind the scenes: # perf trace -v -e msr:_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" Using CPUID AuthenticAMD-25-21-0 0x6a0 0x6a8 New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) 0x6a0 0x6a8 New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) mmap size 528384B ^C# Example with a frequent msr: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2 Using CPUID AuthenticAMD-25-21-0 0x48 New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) 0x48 New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) mmap size 528384B Looking at the vmlinux_path (8 entries long) symsrc__init: build id mismatch for vmlinux. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols 0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule ([kernel.kallsyms]) futex_wait_queue_me ([kernel.kallsyms]) futex_wait ([kernel.kallsyms]) do_futex ([kernel.kallsyms]) __x64_sys_futex ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so) 0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule_idle ([kernel.kallsyms]) do_idle ([kernel.kallsyms]) cpu_startup_entry ([kernel.kallsyms]) secondary_startup_64_no_verify ([kernel.kallsyms]) # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Babu Moger <babu.moger@amd.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Breno Leitao <leitao@debian.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/lkml/ZAJoaZ41+rU5H0vL@kernel.org [ I had published the perf-tools branch before with the sync with ] [ 8c29f01654053258 ("x86/sev: Add SEV-SNP guest feature negotiation support") ] [ I removed it from this new sync ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-03	tools headers kvm: Sync uapi/{asm/linux} kvm.h headers with the kernel sources	Arnaldo Carvalho de Melo
	To pick up the changes in: 89b0e7de3451a17f ("KVM: arm64: nv: Introduce nested virtualization VCPU feature") 14329b825ffb7f27 ("KVM: x86/pmu: Introduce masked events to the pmu event filter") 6213b701a9df0472 ("KVM: x86: Replace 0-length arrays with flexible arrays") 3fd49805d19d1c56 ("KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg") 14329b825ffb7f27 ("KVM: x86/pmu: Introduce masked events to the pmu event filter") That don't change functionality in tools/perf, as no new ioctl is added for the 'perf trace' scripts to harvest. This addresses these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h' diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h' diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h' diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h Cc: Aaron Lewis <aaronlewis@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Kook <keescook@chromium.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/lkml/ZAJlg7%2FfWDVGX0F3@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-03	tools include UAPI: Synchronize linux/fcntl.h with the kernel sources	Arnaldo Carvalho de Melo
	To pick up the changes in: 6fd7353829cafc40 ("mm/memfd: add F_SEAL_EXEC") That doesn't add or change any perf tools functionality, only addresses these build warnings: Warning: Kernel ABI header at 'tools/include/uapi/linux/fcntl.h' differs from latest version at 'include/uapi/linux/fcntl.h' diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-03	tools headers: Synchronize {linux,vdso}/bits.h with the kernel sources	Arnaldo Carvalho de Melo
	To pick up the changes in this cset: cbdb1f163af2bb90 ("vdso/bits.h: Add BIT_ULL() for the sake of consistency") That just causes perf to rebuild, the macro included doesn't clash with anything in tools/{perf,objtool,bpf}. This addresses this perf build warning: Warning: Kernel ABI header at 'tools/include/linux/bits.h' differs from latest version at 'include/linux/bits.h' diff -u tools/include/linux/bits.h include/linux/bits.h Warning: Kernel ABI header at 'tools/include/vdso/bits.h' differs from latest version at 'include/vdso/bits.h' diff -u tools/include/vdso/bits.h include/vdso/bits.h Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-03	tools headers UAPI: Sync linux/prctl.h with the kernel sources	Arnaldo Carvalho de Melo
	To pick new prctl options introduced in: b507808ebce23561 ("mm: implement memory-deny-write-execute as a prctl") That results in: $ diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h --- tools/include/uapi/linux/prctl.h 2022-06-20 17:54:43.884515663 -0300 +++ include/uapi/linux/prctl.h 2023-03-03 11:18:51.090923569 -0300 @@ -281,6 +281,12 @@ # define PR_SME_VL_LEN_MASK 0xffff # define PR_SME_VL_INHERIT (1 << 17) /* inherit across exec / +/ Memory deny write / execute / +#define PR_SET_MDWE 65 +# define PR_MDWE_REFUSE_EXEC_GAIN 1 + +#define PR_GET_MDWE 66 + #define PR_SET_VMA 0x53564d41 # define PR_SET_VMA_ANON_NAME 0 $ tools/perf/trace/beauty/prctl_option.sh > before $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h $ tools/perf/trace/beauty/prctl_option.sh > after $ diff -u before after --- before 2023-03-03 11:47:43.320013146 -0300 +++ after 2023-03-03 11:47:50.937216229 -0300 @@ -59,6 +59,8 @@ [62] = "SCHED_CORE", [63] = "SME_SET_VL", [64] = "SME_GET_VL", + [65] = "SET_MDWE", + [66] = "GET_MDWE", }; static const char prctl_set_mm_options[] = { [1] = "START_CODE", $ Now users can do: # perf trace -e syscalls:sys_enter_prctl --filter "option==SET_MDWE\|\|option==GET_MDWE" ^C# # trace -v -e syscalls:sys_enter_prctl --filter "option==SET_MDWE\|\|option==GET_MDWE" New filter for syscalls:sys_enter_prctl: (option==65\|\|option==66) && (common_pid != 5519 && common_pid != 3404) ^C# And when these prctl options appears in a session, they will be translated to the corresponding string. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZAI%2FAoPXb%2Fsxz1%2Fm@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>