summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-25mlxsw: core_hwmon: Adjust module label names based on MTCAP sensor counterVadim Pasternak
Transceiver module temperature sensors are indexed after ASIC and platform sensors. The current label printing method does not take this into account and simply prints the index of the transceiver module sensor. On new systems that have platform sensors this results in incorrect (shifted) transceiver module labels being printed: $ sensors [...] front panel 002: +37.0°C (crit = +70.0°C, emerg = +75.0°C) front panel 003: +47.0°C (crit = +70.0°C, emerg = +75.0°C) [...] Fix by taking the sensor count into account. After the fix: $ sensors [...] front panel 001: +37.0°C (crit = +70.0°C, emerg = +75.0°C) front panel 002: +47.0°C (crit = +70.0°C, emerg = +75.0°C) [...] Fixes: a53779de6a0e ("mlxsw: core: Add QSFP module temperature label attribute to hwmon") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25mlxsw: i2c: Limit single transaction buffer sizeVadim Pasternak
Maximum size of buffer is obtained from underlying I2C adapter and in case adapter allows I2C transaction buffer size greater than 100 bytes, transaction will fail due to firmware limitation. As a result driver will fail initialization. Limit the maximum size of transaction buffer by 100 bytes to fit to firmware. Remove unnecessary calculation: max_t(u16, MLXSW_I2C_BLK_DEF, quirk_size). This condition can not happened. Fixes: 3029a693beda ("mlxsw: i2c: Allow flexible setting of I2C transactions size") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25mlxsw: i2c: Fix chunk size setting in output mailbox bufferVadim Pasternak
The driver reads commands output from the output mailbox. If the size of the output mailbox is not a multiple of the transaction / block size, then the driver will not issue enough read transactions to read the entire output, which can result in driver initialization errors. Fix by determining the number of transactions using DIV_ROUND_UP(). Fixes: 3029a693beda ("mlxsw: i2c: Allow flexible setting of I2C transactions size") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25mmc: core: Add host specific tuning support for SD HS modeWenchao Chen
To support the need for host specific tuning for SD high-speed mode, let's add two new optional callbacks, ->prepare|execute_sd_hs_tuning() and let's call them when switching into the SD high-speed mode. Note that, during the tuning process it's also needed for host drivers to send commands to the SD card to verify that the tuning process succeeds. Therefore, let's also share the corresponding functions from the core to allow this. Signed-off-by: Wenchao Chen <wenchao.chen@unisoc.com> Link: https://lore.kernel.org/r/20230825091743.15613-2-wenchao.chen@unisoc.com [Ulf: Dropped unnecessary function declarations and updated the commit msg] Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-08-25ALSA: hda: Add missing dependency on CONFIG_EFI for Cirrus/TI sub-codecsTakashi Iwai
The CS35L41 and TAS2781 sub-codecs depend on CONFIG_EFI, as they have the code accessing efi variable directly. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308250621.1lwt7PtZ-lkp@intel.com/ Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") Link: https://lore.kernel.org/r/20230825092819.12340-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-08-25ALSA: doc: Fix missing backquote in midi-2.0.rstTakashi Iwai
Fix the missing missing backquote that caused a sphinx warning: Documentation/sound/designs/midi-2.0.rst:517: WARNING: Inline interpreted text or phrase reference start-string without end-string. Fixes: e240cff9e6e9 ("ALSA: documentation: Add description for USB MIDI 2.0 gadget driver") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20230825152957.18c54ae2@canb.auug.org.au Link: https://lore.kernel.org/r/20230825092351.11780-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-08-25kunit: Fix checksum tests on big endian CPUsChristophe Leroy
On powerpc64le checksum kunit tests work: [ 2.011457][ T1] KTAP version 1 [ 2.011662][ T1] # Subtest: checksum [ 2.011848][ T1] 1..3 [ 2.034710][ T1] ok 1 test_csum_fixed_random_inputs [ 2.079325][ T1] ok 2 test_csum_all_carry_inputs [ 2.127102][ T1] ok 3 test_csum_no_carry_inputs [ 2.127202][ T1] # checksum: pass:3 fail:0 skip:0 total:3 [ 2.127533][ T1] # Totals: pass:3 fail:0 skip:0 total:3 [ 2.127956][ T1] ok 1 checksum But on powerpc64 and powerpc32 they fail: [ 1.859890][ T1] KTAP version 1 [ 1.860041][ T1] # Subtest: checksum [ 1.860201][ T1] 1..3 [ 1.861927][ T58] # test_csum_fixed_random_inputs: ASSERTION FAILED at lib/checksum_kunit.c:243 [ 1.861927][ T58] Expected result == expec, but [ 1.861927][ T58] result == 54991 (0xd6cf) [ 1.861927][ T58] expec == 33316 (0x8224) [ 1.863742][ T1] not ok 1 test_csum_fixed_random_inputs [ 1.864520][ T60] # test_csum_all_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:267 [ 1.864520][ T60] Expected result == expec, but [ 1.864520][ T60] result == 255 (0xff) [ 1.864520][ T60] expec == 65280 (0xff00) [ 1.868820][ T1] not ok 2 test_csum_all_carry_inputs [ 1.869977][ T62] # test_csum_no_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:306 [ 1.869977][ T62] Expected result == expec, but [ 1.869977][ T62] result == 64515 (0xfc03) [ 1.869977][ T62] expec == 0 (0x0) [ 1.872060][ T1] not ok 3 test_csum_no_carry_inputs [ 1.872102][ T1] # checksum: pass:0 fail:3 skip:0 total:3 [ 1.872458][ T1] # Totals: pass:0 fail:3 skip:0 total:3 [ 1.872791][ T1] not ok 3 checksum This is because all expected values were calculated for X86 which is little endian. On big endian systems all precalculated 16 bits halves must be byte swapped. And this is confirmed by a huge amount of sparse errors when building with C=2 So fix all sparse errors and it will naturally work on all endianness. Fixes: 688eb8191b47 ("x86/csum: Improve performance of `csum_partial`") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: arcnet: Do not call kfree_skb() under local_irq_disable()Jinjie Ruan
It is not allowed to call kfree_skb() from hardware interrupt context or with hardware interrupts being disabled. So replace kfree_skb() with dev_kfree_skb_irq() under local_irq_disable(). Compile tested only. Fixes: 05fcd31cc472 ("arcnet: add err_skb package for package status feedback") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25octeontx2-pf: fix page_pool creation fail for rings > 32kRatheesh Kannoth
octeontx2 driver calls page_pool_create() during driver probe() and fails if queue size > 32k. Page pool infra uses these buffers as shock absorbers for burst traffic. These pages are pinned down over time as working sets varies, due to the recycling nature of page pool, given page pool (currently) don't have a shrinker mechanism, the pages remain pinned down in ptr_ring. Instead of clamping page_pool size to 32k at most, limit it even more to 2k to avoid wasting memory. This have been tested on octeontx2 CN10KA hardware. TCP and UDP tests using iperf shows no performance regressions. Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool") Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: fec: add statistics for XDP_TXWei Fang
The FEC driver supports the statistics for XDP actions except for XDP_TX before, because the XDP_TX was not supported when adding the statistics for XDP. Now the FEC driver has supported XDP_TX since commit f601899e4321 ("net: fec: add XDP_TX feature support"). So it's reasonable and necessary to add statistics for XDP_TX. Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25ice: avoid executing commands on other ports when driving syncJacob Keller
The ice hardware has a synchronization mechanism used to drive the simultaneous application of commands on both PHY ports and the source timer in the MAC. When issuing a sync via ice_ptp_exec_tmr_cmd(), the hardware will simultaneously apply the commands programmed for the main timer and each PHY port. Neither the main timer command register, nor the PHY port command registers auto clear on command execution. During the execution of a timer command intended for a single port on E822 devices, such as those used to configure a PHY during link up, the driver is not correctly clearing the previous commands. This results in unintentionally executing the last programmed command on the main timer and other PHY ports whenever performing reconfiguration on E822 ports after link up. This results in unintended side effects on other timers, depending on what command was previously programmed. To fix this, the driver must ensure that the main timer and all other PHY ports are properly initialized to perform no action. The enumeration for timer commands does not include an enumeration value for doing nothing. Introduce ICE_PTP_NOP for this purpose. When writing a timer command to hardware, leave the command bits set to zero which indicates that no operation should be performed on that port. Modify ice_ptp_one_port_cmd() to always initialize all ports. For all ports other than the one being configured, write their timer command register to ICE_PTP_NOP. This ensures that no side effect happens on the timer command. To fix this for the PHY ports, modify ice_ptp_one_port_cmd() to always initialize all other ports to ICE_PTP_NOP. This ensures that no side effects happen on the other ports. Call ice_ptp_src_cmd() with a command value if ICE_PTP_NOP in ice_sync_phy_timer_e822() and ice_start_phy_timer_e822(). With both of these changes, the driver should no longer execute a stale command on the main timer or another PHY port when reconfiguring one of the PHY ports after link up. Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support") Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25ALSA: hda/realtek: Add quirk for mute LEDs on HP ENVY x360 15-eu0xxxFabian Vogt
The LED for the mic mute button is controlled by GPIO2. The mute button LED is slightly more complex, it's controlled by two bits in coeff 0x0b. Signed-off-by: Fabian Vogt <fabian@ritter-vogt.de> Link: https://lore.kernel.org/r/2693091.mvXUDI8C0e@fabians-envy Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-08-25ALSA: hda/tas2781: Switch back to use struct i2c_driver's .probe()Uwe Kleine-König
struct i2c_driver::probe_new is about to go away. Switch the driver to use the probe callback with the same prototype. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") Link: https://lore.kernel.org/r/20230824200219.9569-1-u.kleine-koenig@pengutronix.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-08-25Merge tag 'asoc-fix-v6.5-rc7-2' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Quirk for v6.5 One additional fix for v6.5, an additional quirk. As with the other fixes this could wait for the merge window.
2023-08-25wifi: ath: Use is_multicast_ether_addr() to check multicast Ether addressRuan Jinjie
Use is_multicast_ether_addr() to perform the Checking. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230814124212.302738-2-ruanjinjie@huawei.com
2023-08-25wifi: ath12k: Remove unused declarationsYue Haibing
Commit d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices") declared but never implemented these, remove it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230816130550.50896-1-yuehaibing@huawei.com
2023-08-25wifi: ath12k: add check max message length while scanning with extraieWen Gong
Currently the extraie length is directly used to allocate skb buffer. When the length of skb is greater than the max message length which firmware supports, error will happen in firmware side. Hence add check for the skb length and drop extraie when overflow and print a message. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230809081657.13858-1-quic_wgong@quicinc.com
2023-08-25wifi: ath9k: use IS_ERR() with debugfs_create_dir()Wang Ming
The debugfs_create_dir() function returns error pointers, it never returns NULL. Most incorrect error checks were fixed, but the one in ath9k_htc_init_debug() was forgotten. Fix the remaining error check. Fixes: e5facc75fa91 ("ath9k_htc: Cleanup HTC debugfs") Signed-off-by: Wang Ming <machel@vivo.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230713030358.12379-1-machel@vivo.com
2023-08-25net: handle ARPHRD_PPP in dev_is_mac_header_xmit()Nicolas Dichtel
The goal is to support a bpf_redirect() from an ethernet device (ingress) to a ppp device (egress). The l2 header is added automatically by the ppp driver, thus the ethernet header should be removed. CC: stable@vger.kernel.org Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Siwar Zitouni <siwar.zitouni@6wind.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25Merge branch 'txgbe-link-modes'David S. Miller
Jiawen Wu says: ==================== support more link mode for TXGBE There are three new interface mode support for Wangxun 10Gb NICs: 1000BASE-X, SGMII and XAUI. Specific configurations are added to XPCS. And external PHY attaching is added for copper NICs. v2 -> v3: - add device identifier read - restrict pcs soft reset - add firmware version warning v1 -> v2: - use the string "txgbe_pcs_mdio_bus" directly - use dev_err() instead of pr_err() - add device quirk flag - add more macro definitions to explain PMA registers - move txgbe_enable_sec_tx_path() to mac_finish() - implement phylink for copper NICs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: ngbe: move mdio access registers to libwxJiawen Wu
Registers of mdio accessing are common defined in libwx, remove the redundant macro definitions in ngbe driver. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: txgbe: support copper NIC with external PHYJiawen Wu
Wangxun SP chip supports to connect with external PHY (marvell 88x3310), which links to 10GBASE-T/1000BASE-T/100BASE-T. Add the identification of media types from subsystem device IDs. For sp_media_copper, register mdio bus for the external PHY. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: txgbe: support switching mode to 1000BASE-X and SGMIIJiawen Wu
Disable data path before PCS VR reset while switching PCS mode, to prevent the blocking of data path. Enable AN interrupt for CL37 auto-negotiation. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: txgbe: add FW version warningJiawen Wu
Since XPCS device identifier is implemented in the firmware version 0x20010 and above, so add a warning to prompt the users to upgrade the firmware to make sure the driver works. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: pcs: xpcs: adapt Wangxun NICs for SGMII modeJiawen Wu
Wangxun NICs support the connection with SFP to RJ45 module. In this case, PCS need to be configured in SGMII mode. According to chapter 6.11.1 "SGMII Auto-Negitiation" of DesignWare Cores Ethernet PCS (version 3.20a) and custom design manual, do the following configuration when the interface mode is SGMII. 1. program VR_MII_AN_CTRL bit(3) [TX_CONFIG] = 1b (PHY side SGMII) 2. program VR_MII_AN_CTRL bit(8) [MII_CTRL] = 1b (8-bit MII) 3. program VR_MII_DIG_CTRL1 bit(0) [PHY_MODE_CTRL] = 1b Also CL37 AN in backplane configurations need to be enabled because of the special hardware design. Another thing to note is that PMA needs to be reconfigured before each CL37 AN configuration for SGMII, otherwise AN will fail, although we don't know why. On this device, CL37_ANSGM_STS (bit[4:1] of VR_MII_AN_INTR_STS) indicates the status received from remote link during the auto-negotiation, and self-clear after the auto-negotiation is complete. Meanwhile, CL37_ANCMPLT_INTR will be set to 1, to indicate CL37 AN is complete. So add another way to get the state for CL37 SGMII. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: pcs: xpcs: add 1000BASE-X AN interrupt supportJiawen Wu
Enable CL37 AN complete interrupt for DW XPCS. It requires to clear the bit(0) [CL37_ANCMPLT_INTR] of VR_MII_AN_INTR_STS after AN completed. And there is a quirk for Wangxun devices to enable CL37 AN in backplane configurations because of the special hardware design. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: pcs: xpcs: support to switch mode for Wangxun NICsJiawen Wu
According to chapter 6 of DesignWare Cores Ethernet PCS (version 3.20a) and custom design manual, add a configuration flow for switching interface mode. If the interface changes, the following setting is required: 1. wait VR_XS_PCS_DIG_STS bit(4, 2) [PSEQ_STATE] = 100b (Power-Good) 2. write SR_XS_PCS_CTRL2 to select various PCS type 3. write SR_PMA_CTRL1 and/or SR_XS_PCS_CTRL1 for link speed 4. program PMA registers 5. write VR_XS_PCS_DIG_CTRL1 bit(15) [VR_RST] = 1b (Vendor-Specific Soft Reset) 6. wait for VR_XS_PCS_DIG_CTRL1 bit(15) [VR_RST] to get cleared Only 10GBASE-R/SGMII/1000BASE-X modes are planned for the current Wangxun devices. And there is a quirk for Wangxun devices to switch mode although the interface in phylink state has not changed, since PCS will change to default 10GBASE-R when the ethernet driver(txgbe) do LAN reset. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: pcs: xpcs: add specific vendor supoprt for Wangxun 10Gb NICsJiawen Wu
Since Wangxun 10Gb NICs require some special configuration on the IP of Synopsys Designware XPCS, introduce dev_flag for different vendors. Read OUI from device identifier registers, to detect Wangxun devices. And xpcs_soft_reset() is skipped to avoid the reset of device identifier registers. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-24[SMB3] send channel sequence number in SMB3 requests after reconnectsSteve French
The ChannelSequence field in the SMB3 header is supposed to be increased after reconnect to allow the server to distinguish requests from before and after the reconnect. We had always been setting it to zero. There are cases where incrementing ChannelSequence on requests after network reconnects can reduce the chance of data corruptions. See MS-SMB2 3.2.4.1 and 3.2.7.1 Signed-off-by: Steve French <stfrench@microsoft.com> Cc: stable@vger.kernel.org # 5.16+
2023-08-25Merge tag 'drm-intel-next-fixes-2023-08-24' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next - Fix TLB invalidation (Alan) - Fix Display HPD polling (Imre) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZOdOP31OE/Cf1ojo@intel.com
2023-08-24Merge tag 'trace-v6.5-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix ring buffer being permanently disabled due to missed record_disabled() Changing the trace cpu mask will disable the ring buffers for the CPUs no longer in the mask. But it fails to update the snapshot buffer. If a snapshot takes place, the accounting for the ring buffer being disabled is corrupted and this can lead to the ring buffer being permanently disabled. - Add test case for snapshot and cpu mask working together - Fix memleak by the function graph tracer not getting closed properly. The iterator is used to read the ring buffer. When it opens, it calls the open function of a tracer, and when it is closed, it calls the close iteration. While a trace is being read, it is still possible to change the tracer. If this happens between the function graph tracer and the wakeup tracer (which uses function graph tracing), the tracers are not closed properly during when the iterator sees the switch, and the wakeup function did not initialize its private pointer to NULL, which is used to know if the function graph tracer was the last tracer. It could be fooled in thinking it is, but then on exit it does not call the close function of the function graph tracer to clean up its data. - Fix synthetic events on big endian machines, by introducing a union that does the conversions properly. - Fix synthetic events from printing out the number of elements in the stacktrace when it shouldn't. - Fix synthetic events stacktrace to not print a bogus value at the end. - Introduce a pipe_cpumask that prevents the trace_pipe files from being opened by more than one task (file descriptor). There was a race found where if splice is called, the iter->ent could become stale and events could be missed. There's no point reading a producer/consumer file by more than one task as they will corrupt each other anyway. Add a cpumask that keeps track of the per_cpu trace_pipe files as well as the global trace_pipe file that prevents more than one open of a trace_pipe file that represents the same ring buffer. This prevents the race from happening. - Fix ftrace samples for arm64 to work with older compilers. * tag 'trace-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: samples: ftrace: Replace bti assembly with hint for older compiler tracing: Introduce pipe_cpumask to avoid race on trace_pipes tracing: Fix memleak due to race between current_tracer and trace tracing/synthetic: Allocate one additional element for size tracing/synthetic: Skip first entry for stack traces tracing/synthetic: Use union instead of casts selftests/ftrace: Add a basic testcase for snapshot tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
2023-08-24scsi: snic: Fix double free in snic_tgt_create()Zhu Wang
Commit 41320b18a0e0 ("scsi: snic: Fix possible memory leak if device_add() fails") fixed the memory leak caused by dev_set_name() when device_add() failed. However, it did not consider that 'tgt' has already been released when put_device(&tgt->dev) is called. Remove kfree(tgt) in the error path to avoid double free of 'tgt' and move put_device(&tgt->dev) after the removed kfree(tgt) to avoid a use-after-free. Fixes: 41320b18a0e0 ("scsi: snic: Fix possible memory leak if device_add() fails") Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Link: https://lore.kernel.org/r/20230819083941.164365-1-wangzhu9@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-24Merge tag 'media/v6.5-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fix from Mauro Carvalho Chehab: "Fix a potential array out-of-bounds in the mediatek vcodec driver" * tag 'media/v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
2023-08-24Merge branch 'tools-ynl-handful-of-forward-looking-updates'Jakub Kicinski
Jakub Kicinski says: ==================== tools: ynl: handful of forward looking updates Small YNL improvements, mostly for work-in-progress families. ==================== Link: https://lore.kernel.org/r/20230824003056.1436637-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24netlink: specs: fix indent in fouJakub Kicinski
Fix up the indentation. This has no functional effect, AFAICT. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230824003056.1436637-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24tools: ynl-gen: support empty attribute listsJakub Kicinski
Differentiate between empty list and None for member lists. New families may want to create request responses with no attribute. If we treat those the same as None we end up rendering a full parsing policy in user space, instead of an empty one. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230824003056.1436637-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24tools: ynl-gen: fix collecting global policy attrsJakub Kicinski
We look for attributes inside do.request, but there's another layer of nesting in the spec, look inside do.request.attributes. This bug had no effect as all global policies we generate (fou) seem to be full, anyway, and we treat full and empty the same. Next patch will change the treatment of empty policies. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230824003056.1436637-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24tools: ynl-gen: set length of binary fieldsJakub Kicinski
Remember to set the length field in the request setters. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230824003056.1436637-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24tools: ynl: allow passing binary dataJakub Kicinski
Recent changes made us assume that input for binary data is in hex. When using YNL as a Python library it's possible to pass in raw bytes. Bring the ability to do that back. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230824003056.1436637-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24net/mlx5e: fix up for "net/mlx5e: Move MACsec flow steering operations to be ↵Stephen Rothwell
used as core library" Recent merge had a conflict in: drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c between commit: aeb660171b06 ("net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups") from Linus' tree and commit: cb5ebe4896f9 ("net/mlx5e: Move MACsec flow steering operations to be used as core library") from the mlx5-next tree. This was missed and the former commit got lost, bring it back. Fixes: 3c5066c6b0a5 ("Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20230815123725.4ef5b7b9@canb.auug.org.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24scsi: core: raid_class: Remove raid_component_add()Zhu Wang
The raid_component_add() function was added to the kernel tree via patch "[SCSI] embryonic RAID class" (2005). Remove this function since it never has had any callers in the Linux kernel. And also raid_component_release() is only used in raid_component_add(), so it is also removed. Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Link: https://lore.kernel.org/r/20230822015254.184270-1-wangzhu9@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Fixes: 04b5b5cb0136 ("scsi: core: Fix possible memory leak if device_add() fails") Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-24net/mlx5: Dynamic cyclecounter shift calculation for PTP free running clockRahul Rameshbabu
Use a dynamic calculation to determine the shift value for the internal timer cyclecounter that will lead to the highest precision frequency adjustments. Previously used a constant for the shift value assuming all devices supported by the driver had a nominal frequency of 1GHz. However, there are devices that operate at different frequencies. The previous shift value constant would break the PHC functionality for those devices. Reported-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Closes: https://lore.kernel.org/netdev/20230815151507.3028503-1-vadfed@meta.com/ Fixes: 6a4010927562 ("net/mlx5: Update cyclecounter shift value to improve ptp free running mode precision") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Tested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230821230554.236210-1-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24document while_each_thread(), change first_tid() to use for_each_thread()Oleg Nesterov
Add the comment to explain that while_each_thread(g,t) is not rcu-safe unless g is stable (e.g. current). Even if g is a group leader and thus can't exit before t, t or another sub-thread can exec and remove g from the thread_group list. The only lockless user of while_each_thread() is first_tid() and it is fine in that it can't loop forever, yet for_each_thread() looks better and I am going to change while_each_thread/next_thread. Link: https://lkml.kernel.org/r/20230823170806.GA11724@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24drivers/char/mem.c: shrink character device's devlist[] arrayAlexey Dobriyan
Merge padding, shrinking "struct memdev" from 32 bytes to 24 bytes on 64-bit. Link: https://lkml.kernel.org/r/fe4d62ab-2427-4635-b9f4-467853fb63e3@p183 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24x86/crash: optimize CPU changesEric DeVolder
crash_prepare_elf64_headers() writes into the elfcorehdr an ELF PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs (ie. hot un/plug, online/offline) do not need to rewrite the elfcorehdr. The kimage->file_mode term covers kdump images loaded via the kexec_file_load() syscall. Since crash_prepare_elf64_headers() wrote the initial elfcorehdr, no update to the elfcorehdr is needed for CPU changes. The kimage->elfcorehdr_updated term covers kdump images loaded via the kexec_load() syscall. At least one memory or CPU change must occur to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr. Afterwards, no update to the elfcorehdr is needed for CPU changes. This code is intentionally *NOT* hoisted into crash_handle_hotplug_event() as it would prevent the arch-specific handler from running for CPU changes. This would break PPC, for example, which needs to update other information besides the elfcorehdr, on CPU changes. Link: https://lkml.kernel.org/r/20230814214446.6659-9-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()Eric DeVolder
The function crash_prepare_elf64_headers() generates the elfcorehdr which describes the CPUs and memory in the system for the crash kernel. In particular, it writes out ELF PT_NOTEs for memory regions and the CPUs in the system. With respect to the CPUs, the current implementation utilizes for_each_present_cpu() which means that as CPUs are added and removed, the elfcorehdr must again be updated to reflect the new set of CPUs. The reasoning behind the move to use for_each_possible_cpu(), is: - At kernel boot time, all percpu crash_notes are allocated for all possible CPUs; that is, crash_notes are not allocated dynamically when CPUs are plugged/unplugged. Thus the crash_notes for each possible CPU are always available. - The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU. Changing to for_each_possible_cpu() is valid as the crash_notes pointed to by each CPU PT_NOTE are present and always valid. Furthermore, examining a common crash processing path of: kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer elfcorehdr /proc/vmcore vmcore reveals how the ELF CPU PT_NOTEs are utilized: - Upon panic, each CPU is sent an IPI and shuts itself down, recording its state in its crash_notes. When all CPUs are shutdown, the crash kernel is launched with a pointer to the elfcorehdr. - The crash kernel via linux/fs/proc/vmcore.c does not examine or use the contents of the PT_NOTEs, it exposes them via /proc/vmcore. - The makedumpfile utility uses /proc/vmcore and reads the CPU PT_NOTEs to craft a nr_cpus variable, which is reported in a header but otherwise generally unused. Makedumpfile creates the vmcore. - The 'crash' dump analyzer does not appear to reference the CPU PT_NOTEs. Instead it looks-up the cpu_[possible|present|onlin]_mask symbols and directly examines those structure contents from vmcore memory. From that information it is able to determine which CPUs are present and online, and locate the corresponding crash_notes. Said differently, it appears that 'crash' analyzer does not rely on the ELF PT_NOTEs for CPUs; rather it obtains the information directly via kernel symbols and the memory within the vmcore. (There maybe other vmcore generating and analysis tools that do use these PT_NOTEs, but 'makedumpfile' and 'crash' seems to be the most common solution.) This results in the benefit of having all CPUs described in the elfcorehdr, and therefore reducing the need to re-generate the elfcorehdr on CPU changes, at the small expense of an additional 56 bytes per PT_NOTE for not-present-but-possible CPUs. On systems where kexec_file_load() syscall is utilized, all the above is valid. On systems where kexec_load() syscall is utilized, there may be the need for the elfcorehdr to be regenerated once. The reason being that some archs only populate the 'present' CPUs from the /sys/devices/system/cpus entries, which the userspace 'kexec' utility uses to generate the userspace-supplied elfcorehdr. In this situation, one memory or CPU change will rewrite the elfcorehdr via the crash_prepare_elf64_headers() function and now all possible CPUs will be described, just as with kexec_file_load() syscall. Link: https://lkml.kernel.org/r/20230814214446.6659-8-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Suggested-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24crash: hotplug support for kexec_load()Eric DeVolder
The hotplug support for kexec_load() requires changes to the userspace kexec-tools and a little extra help from the kernel. Given a kdump capture kernel loaded via kexec_load(), and a subsequent hotplug event, the crash hotplug handler finds the elfcorehdr and rewrites it to reflect the hotplug change. That is the desired outcome, however, at kernel panic time, the purgatory integrity check fails (because the elfcorehdr changed), and the capture kernel does not boot and no vmcore is generated. Therefore, the userspace kexec-tools/kexec must indicate to the kernel that the elfcorehdr can be modified (because the kexec excluded the elfcorehdr from the digest, and sized the elfcorehdr memory buffer appropriately). To facilitate hotplug support with kexec_load(): - a new kexec flag KEXEC_UPATE_ELFCOREHDR indicates that it is safe for the kernel to modify the kexec_load()'d elfcorehdr - the /sys/kernel/crash_elfcorehdr_size node communicates the preferred size of the elfcorehdr memory buffer - The sysfs crash_hotplug nodes (ie. /sys/devices/system/[cpu|memory]/crash_hotplug) dynamically take into account kexec_file_load() vs kexec_load() and KEXEC_UPDATE_ELFCOREHDR. This is critical so that the udev rule processing of crash_hotplug is all that is needed to determine if the userspace unload-then-load of the kdump image is to be skipped, or not. The proposed udev rule change looks like: # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" The table below indicates the behavior of kexec_load()'d kdump image updates (with the new udev crash_hotplug rule in place): Kernel |Kexec -------+-----+---- Old |Old |New | a | a -------+-----+---- New | a | b -------+-----+---- where kexec 'old' and 'new' delineate kexec-tools has the needed modifications for the crash hotplug feature, and kernel 'old' and 'new' delineate the kernel supports this crash hotplug feature. Behavior 'a' indicates the unload-then-reload of the entire kdump image. For the kexec 'old' column, the unload-then-reload occurs due to the missing flag KEXEC_UPDATE_ELFCOREHDR. An 'old' kernel (with 'new' kexec) does not present the crash_hotplug sysfs node, which leads to the unload-then-reload of the kdump image. Behavior 'b' indicates the desired optimized behavior of the kernel directly modifying the elfcorehdr and avoiding the unload-then-reload of the kdump image. If the udev rule is not updated with crash_hotplug node check, then no matter any combination of kernel or kexec is new or old, the kdump image continues to be unload-then-reload on hotplug changes. To fully support crash hotplug feature, there needs to be a rollout of kernel, kexec-tools and udev rule changes. However, the order of the rollout of these pieces does not matter; kexec_load()'d kdump images still function for hotplug as-is. Link: https://lkml.kernel.org/r/20230814214446.6659-7-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Suggested-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24x86/crash: add x86 crash hotplug supportEric DeVolder
When CPU or memory is hot un/plugged, or off/onlined, the crash elfcorehdr, which describes the CPUs and memory in the system, must also be updated. A new elfcorehdr is generated from the available CPUs and memory and replaces the existing elfcorehdr. The segment containing the elfcorehdr is identified at run-time in crash_core:crash_handle_hotplug_event(). No modifications to purgatory (see 'kexec: exclude elfcorehdr from the segment digest') or boot_params (as the elfcorehdr= capture kernel command line parameter pointer remains unchanged and correct) are needed, just elfcorehdr. For kexec_file_load(), the elfcorehdr segment size is based on NR_CPUS and CRASH_MAX_MEMORY_RANGES in order to accommodate a growing number of CPU and memory resources. For kexec_load(), the userspace kexec utility needs to size the elfcorehdr segment in the same/similar manner. To accommodate kexec_load() syscall in the absence of kexec_file_load() syscall support, prepare_elf_headers() and dependents are moved outside of CONFIG_KEXEC_FILE. [eric.devolder@oracle.com: correct unused function build error] Link: https://lkml.kernel.org/r/20230821182644.2143-1-eric.devolder@oracle.com Link: https://lkml.kernel.org/r/20230814214446.6659-6-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24crash: memory and CPU hotplug sysfs attributesEric DeVolder
Introduce the crash_hotplug attribute for memory and CPUs for use by userspace. These attributes directly facilitate the udev rule for managing userspace re-loading of the crash kernel upon hot un/plug changes. For memory, expose the crash_hotplug attribute to the /sys/devices/system/memory directory. For example: # udevadm info --attribute-walk /sys/devices/system/memory/memory81 looking at device '/devices/system/memory/memory81': KERNEL=="memory81" SUBSYSTEM=="memory" DRIVER=="" ATTR{online}=="1" ATTR{phys_device}=="0" ATTR{phys_index}=="00000051" ATTR{removable}=="1" ATTR{state}=="online" ATTR{valid_zones}=="Movable" looking at parent device '/devices/system/memory': KERNELS=="memory" SUBSYSTEMS=="" DRIVERS=="" ATTRS{auto_online_blocks}=="offline" ATTRS{block_size_bytes}=="8000000" ATTRS{crash_hotplug}=="1" For CPUs, expose the crash_hotplug attribute to the /sys/devices/system/cpu directory. For example: # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0 looking at device '/devices/system/cpu/cpu0': KERNEL=="cpu0" SUBSYSTEM=="cpu" DRIVER=="processor" ATTR{crash_notes}=="277c38600" ATTR{crash_notes_size}=="368" ATTR{online}=="1" looking at parent device '/devices/system/cpu': KERNELS=="cpu" SUBSYSTEMS=="" DRIVERS=="" ATTRS{crash_hotplug}=="1" ATTRS{isolated}=="" ATTRS{kernel_max}=="8191" ATTRS{nohz_full}==" (null)" ATTRS{offline}=="4-7" ATTRS{online}=="0-3" ATTRS{possible}=="0-7" ATTRS{present}=="0-3" With these sysfs attributes in place, it is possible to efficiently instruct the udev rule to skip crash kernel reloading for kernels configured with crash hotplug support. For example, the following is the proposed udev rule change for RHEL system 98-kexec.rules (as the first lines of the rule file): # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" When examined in the context of 98-kexec.rules, the above rules test if crash_hotplug is set, and if so, the userspace initiated unload-then-reload of the crash kernel is skipped. CPU and memory checks are separated in accordance with CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options. If an architecture supports, for example, memory hotplug but not CPU hotplug, then the /sys/devices/system/memory/crash_hotplug attribute file is present, but the /sys/devices/system/cpu/crash_hotplug attribute file will NOT be present. Thus the udev rule skips userspace processing of memory hot un/plug events, but the udev rule will evaluate false for CPU events, thus allowing userspace to process CPU hot un/plug events (ie the unload-then-reload of the kdump capture kernel). Link: https://lkml.kernel.org/r/20230814214446.6659-5-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24kexec: exclude elfcorehdr from the segment digestEric DeVolder
When a crash kernel is loaded via the kexec_file_load() syscall, the kernel places the various segments (ie crash kernel, crash initrd, boot_params, elfcorehdr, purgatory, etc) in memory. For those architectures that utilize purgatory, a hash digest of the segments is calculated for integrity checking. The digest is embedded into the purgatory image prior to placing in memory. Updates to the elfcorehdr in response to CPU and memory changes would cause the purgatory integrity checking to fail (at crash time, and no vmcore created). Therefore, the elfcorehdr segment is explicitly excluded from the purgatory digest, enabling updates to the elfcorehdr while also avoiding the need to recompute the hash digest and reload purgatory. Link: https://lkml.kernel.org/r/20230814214446.6659-4-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Suggested-by: Baoquan He <bhe@redhat.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>