linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-11-18	Merge tag 'nfsd-6.7-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix several long-standing bugs in the duplicate reply cache - Fix a memory leak * tag 'nfsd-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Fix checksum mismatches in the duplicate reply cache NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update() NFSD: Update nfsd_cache_append() to use xdr_stream nfsd: fix file memleak on client_opens_release
2023-11-18	Merge tag '6.7-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6	Linus Torvalds
	Pull smb client fixes from Steve French: - multichannel fixes (including a lock ordering fix and an important refcounting fix) - spnego fix * tag '6.7-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix lock ordering while disabling multichannel cifs: fix leak of iface for primary channel cifs: fix check of rc in function generate_smb3signingkey cifs: spnego: add ';' in HOST_KEY_LEN
2023-11-18	prctl: Disable prctl(PR_SET_MDWE) on parisc	Helge Deller
	systemd-254 tries to use prctl(PR_SET_MDWE) for it's MemoryDenyWriteExecute functionality, but fails on parisc which still needs executable stacks in certain combinations of gcc/glibc/kernel. Disable prctl(PR_SET_MDWE) by returning -EINVAL for now on parisc, until userspace has catched up. Signed-off-by: Helge Deller <deller@gmx.de> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Sam James <sam@gentoo.org> Closes: https://github.com/systemd/systemd/issues/29775 Tested-by: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/all/875y2jro9a.fsf@gentoo.org/ Cc: <stable@vger.kernel.org> # v6.3+
2023-11-18	Merge tag 'for-6.7/dm-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Various fixes for the DM delay target to address regressions introduced during the 6.7 merge window - Fixes to both DM bufio and the verity target for no-sleep mode, to address sleeping while atomic issues - Update DM crypt target in response to the treewide change that made MAX_ORDER inclusive * tag 'for-6.7/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm-crypt: start allocating with MAX_ORDER dm-verity: don't use blocking calls from tasklets dm-bufio: fix no-sleep mode dm-delay: avoid duplicate logic dm-delay: fix bugs introduced by kthread mode dm-delay: fix a race between delay_presuspend and delay_bio
2023-11-18	parisc/power: Fix power soft-off when running on qemu	Helge Deller
	Firmware returns the physical address of the power switch, so need to use gsc_writel() instead of direct memory access. Fixes: d0c219472980 ("parisc/power: Add power soft-off when running on qemu") Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v6.0+
2023-11-18	parisc: Replace strlcpy() with strscpy()	Kees Cook
	strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated[1]. Additionally, it returns the size of the source string, not the resulting size of the destination string. In an effort to remove strlcpy() completely[2], replace strlcpy() here with strscpy(). Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [1] Link: https://github.com/KSPP/linux/issues/89 [2] Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Azeem Shaikh <azeemshaikh38@gmail.com> Cc: linux-parisc@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Helge Deller <deller@gmx.de>
2023-11-18	Merge tag 'i2c-for-6.7-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Revert a not-working conversion to generic recovery for PXA, use proper IO accessors for designware, and use proper PM level for ocores to allow accessing interrupt providers late" * tag 'i2c-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: ocores: Move system PM hooks to the NOIRQ phase i2c: designware: Fix corrupted memory seen in the ISR Revert "i2c: pxa: move to generic GPIO recovery"
2023-11-18	Merge tag 'mlx5-updates-2023-11-13' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2023-11-13 1) Cleanup patches, leftovers from previous cycle 2) Allow sync reset flow when BF MGT interface device is present 3) Trivial ptp refactorings and improvements 4) Add local loopback counter to vport rep stats Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	Merge tag 'batadv-next-pullrequest-20231115' of ↵	David S. Miller
	git://git.open-mesh.org/linux-merge This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Implement new multicast packet type, including its transmission, forwarding and parsing, by Linus Lüssing (3 patches) - Switch to new headers for sprintf and array size, by Sven Eckelmann (2 patches) Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	Merge branch 'mlxsw-new-reset-flow'	David S. Miller
	Petr Machata says: ==================== mlxsw: Add support for new reset flow Ido Schimmel writes: This patchset changes mlxsw to issue a PCI reset during probe and devlink reload so that the PCI firmware could be upgraded without a reboot. Unlike the old version of this patchset [1], in this version the driver no longer tries to issue a PCI reset by triggering a PCI link toggle on its own, but instead calls the PCI core to issue the reset. The PCI APIs require the device lock to be held which is why patches Patches #7 adds reset method quirk for NVIDIA Spectrum devices. Patch #8 adds a debug level print in PCI core so that device ready delay will be printed even if it is shorter than one second. Patches #9-#11 are straightforward preparations in mlxsw. Patch #12 finally implements the new reset flow in mlxsw. Patch #13 adds PCI reset handlers in mlxsw to avoid user space from resetting the device from underneath an unaware driver. Instead, the driver is gracefully de-initialized before the PCI reset and then initialized again after it. Patch #14 adds a PCI reset selftest to make sure this code path does not regress. [1] https://lore.kernel.org/netdev/cover.1679502371.git.petrm@nvidia.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	selftests: mlxsw: Add PCI reset test	Ido Schimmel
	Test that PCI reset works correctly by verifying that only the expected reset methods are supported and that after issuing the reset the ifindex of the port changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	mlxsw: pci: Implement PCI reset handlers	Ido Schimmel
	Implement reset_prepare() and reset_done() handlers that are invoked by the PCI core before and after issuing a PCI reset, respectively. Specifically, implement reset_prepare() by calling mlxsw_core_bus_device_unregister() and reset_done() by calling mlxsw_core_bus_device_register(). This is the same implementation as the reload_{down,up}() devlink operations with the following differences: 1. The devlink instance is unregistered and then registered again after the reset. 2. A reset via the device's command interface (using MRSR register) is not issued during reset_done() as PCI core already issued a PCI reset. Tested: # for i in $(seq 1 10); do echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset; done Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	mlxsw: pci: Add support for new reset flow	Ido Schimmel
	The driver resets the device during probe and during a devlink reload. The current reset method reloads the current firmware version or a pending one, if one was previously flashed using devlink. However, the current reset method does not result in a PCI hot reset, preventing the PCI firmware from being upgraded, unless the system is rebooted. To solve this problem, a new reset command (6) was implemented in the firmware. Unlike the current command (1), after issuing the new command the device will not start the reset immediately, but only after a PCI hot reset. Implement the new reset method by first verifying that it is supported by the current firmware version by querying the Management Capabilities Mask (MCAM) register. If supported, issue the new reset command (6) via MRSR register followed by a PCI reset by calling __pci_reset_function_locked(). Once the PCI firmware is operational, go back to the regular reset flow and wait for the entire device to become ready. That is, repeatedly read the "system_status" register from the BAR until a value of "FW_READY" (0x5E) appears. Tested: # for i in $(seq 1 10); do devlink dev reload pci/0000:01:00.0; done Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	mlxsw: pci: Move software reset code to a separate function	Amit Cohen
	In general, the existing flow of software reset in the driver is: 1. Wait for system ready status. 2. Send MRSR command, to start the reset. 3. Wait for system ready status. This flow will be extended once a new reset command is supported. As a preparation, move step #2 to a separate function. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	mlxsw: pci: Rename mlxsw_pci_sw_reset()	Amit Cohen
	In the next patches, mlxsw_pci_sw_reset() will be extended to support more reset types and will not necessarily issue a software reset. Rename the function to reflect that. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	mlxsw: Extend MRSR pack() function to support new commands	Amit Cohen
	Currently mlxsw_reg_mrsr_pack() always sets 'command=1'. As preparation for support of new reset flow, pass the command as an argument to the function and add an enum for this field. For now, always pass 'command=1' to the pack() function. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	PCI: Add debug print for device ready delay	Ido Schimmel
	Currently, the time it took a PCI device to become ready after reset is only printed if it was longer than 1000ms ('PCI_RESET_WAIT'). However, for debugging purposes it is useful to know this time even if it was shorter. For example, with the device I am working on, hardware engineers asked to verify that it becomes ready on the first try (no delay). To that end, add a debug level print that can be enabled using dynamic debug. Example: # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset # dmesg -c \| grep ready # echo "file drivers/pci/pci.c +p" > /sys/kernel/debug/dynamic_debug/control # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset # dmesg -c \| grep ready [ 396.060335] mlxsw_spectrum4 0000:01:00.0: ready 0ms after bus reset # echo "file drivers/pci/pci.c -p" > /sys/kernel/debug/dynamic_debug/control # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset # dmesg -c \| grep ready Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	PCI: Add no PM reset quirk for NVIDIA Spectrum devices	Ido Schimmel
	Spectrum-{1,2,3,4} devices report that a D3hot->D0 transition causes a reset (i.e., they advertise NoSoftRst-). However, this transition does not have any effect on the device: It continues to be operational and network ports remain up. Advertising this support makes it seem as if a PM reset is viable for these devices. Mark it as unavailable to skip it when testing reset methods. Before: # cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method pm bus After: # cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method bus Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	devlink: Add device lock assert in reload operation	Ido Schimmel
	Add an assert to verify that the device lock is always held throughout reload operations. Tested the following flows with netdevsim and mlxsw while lockdep is enabled: netdevsim: # echo "10 1" > /sys/bus/netdevsim/new_device # devlink dev reload netdevsim/netdevsim10 # ip netns add bla # devlink dev reload netdevsim/netdevsim10 netns bla # ip netns del bla # echo 10 > /sys/bus/netdevsim/del_device mlxsw: # devlink dev reload pci/0000:01:00.0 # ip netns add bla # devlink dev reload pci/0000:01:00.0 netns bla # ip netns del bla # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove # echo 1 > /sys/bus/pci/rescan Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	devlink: Acquire device lock during reload command	Ido Schimmel
	Device drivers register with devlink from their probe routines (under the device lock) by acquiring the devlink instance lock and calling devl_register(). Drivers that support a devlink reload usually implement the reload_{down, up}() operations in a similar fashion to their remove and probe routines, respectively. However, while the remove and probe routines are invoked with the device lock held, the reload operations are only invoked with the devlink instance lock held. It is therefore impossible for drivers to acquire the device lock from their reload operations, as this would result in lock inversion. The motivating use case for invoking the reload operations with the device lock held is in mlxsw which needs to trigger a PCI reset as part of the reload. The driver cannot call pci_reset_function() as this function acquires the device lock. Instead, it needs to call __pci_reset_function_locked which expects the device lock to be held. To that end, adjust devlink to always acquire the device lock before the devlink instance lock when performing a reload. Do that when reload is explicitly triggered by user space by specifying the 'DEVLINK_NL_FLAG_NEED_DEV_LOCK' flag in the pre_doit and post_doit operations of the reload command. A previous patch already handled the case where reload is invoked as part of netns dismantle. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	devlink: Allow taking device lock in pre_doit operations	Ido Schimmel
	Introduce a new private flag ('DEVLINK_NL_FLAG_NEED_DEV_LOCK') to allow netlink commands to specify that they need to acquire the device lock in their pre_doit operation and release it in their post_doit operation. The reload command will use this flag in the subsequent patch. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	devlink: Enable the use of private flags in post_doit operations	Ido Schimmel
	Currently, private flags (e.g., 'DEVLINK_NL_FLAG_NEED_PORT') are only used in pre_doit operations, but a subsequent patch will need to conditionally lock and unlock the device lock in pre and post doit operations, respectively. As a preparation, enable the use of private flags in post_doit operations in a similar fashion to how it is done for pre_doit operations. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	devlink: Acquire device lock during netns dismantle	Ido Schimmel
	Device drivers register with devlink from their probe routines (under the device lock) by acquiring the devlink instance lock and calling devl_register(). Drivers that support a devlink reload usually implement the reload_{down, up}() operations in a similar fashion to their remove and probe routines, respectively. However, while the remove and probe routines are invoked with the device lock held, the reload operations are only invoked with the devlink instance lock held. It is therefore impossible for drivers to acquire the device lock from their reload operations, as this would result in lock inversion. The motivating use case for invoking the reload operations with the device lock held is in mlxsw which needs to trigger a PCI reset as part of the reload. The driver cannot call pci_reset_function() as this function acquires the device lock. Instead, it needs to call __pci_reset_function_locked which expects the device lock to be held. To that end, adjust devlink to always acquire the device lock before the devlink instance lock when performing a reload. For now, only do that when reload is triggered as part of netns dismantle. Subsequent patches will handle the case where reload is explicitly triggered by user space. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	devlink: Move private netlink flags to C file	Ido Schimmel
	The flags are not used outside of the C file so move them there. Suggested-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	Merge tag 'turbostat-2023.11.07' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: - Turbostat features are now table-driven (Rui Zhang) - Add support for some new platforms (Sumeet Pawnikar, Rui Zhang) - Gracefully run in configs when CPUs are limited (Rui Zhang, Srinivas Pandruvada) - misc minor fixes [ This came in during the merge window, but sorting out the signed tag took a while, so thus the late merge - Linus ] * tag 'turbostat-2023.11.07' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (86 commits) tools/power turbostat: version 2023.11.07 tools/power/turbostat: bugfix "--show IPC" tools/power/turbostat: Add initial support for LunarLake tools/power/turbostat: Add initial support for ArrowLake tools/power/turbostat: Add initial support for GrandRidge tools/power/turbostat: Add initial support for SierraForest tools/power/turbostat: Add initial support for GraniteRapids tools/power/turbostat: Add MSR_CORE_C1_RES support for spr_features tools/power/turbostat: Move process to root cgroup tools/power/turbostat: Handle cgroup v2 cpu limitation tools/power/turbostat: Abstrct function for parsing cpu string tools/power/turbostat: Handle offlined CPUs in cpu_subset tools/power/turbostat: Obey allowed CPUs for system summary tools/power/turbostat: Obey allowed CPUs for primary thread/core detection tools/power/turbostat: Abstract several functions tools/power/turbostat: Obey allowed CPUs during startup tools/power/turbostat: Obey allowed CPUs when accessing CPU counters tools/power/turbostat: Introduce cpu_allowed_set tools/power/turbostat: Remove PC7/PC9 support on ADL/RPL tools/power/turbostat: Enable MSR_CORE_C1_RES on recent Intel client platforms ...
2023-11-18	Merge branch 'ncsi-mac-address-command'	David S. Miller
	Patrick Williams says: ==================== net/ncsi: Add NC-SI 1.2 Get MC MAC Address command NC-SI 1.2 has now been published[1] and adds a new command for "Get MC MAC Address". This is often used by BMCs to get the assigned MAC address for the channel used by the BMC. This change set has been tested on a Broadcomm 200G NIC with updated firmware for NC-SI 1.2 and at least one other non-public NIC design. 1. https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.2.0.pdf ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net/ncsi: Add NC-SI 1.2 Get MC MAC Address command	Peter Delevoryas
	This change adds support for the NC-SI 1.2 Get MC MAC Address command, specified here: https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.2.0.pdf It serves the exact same function as the existing OEM Get MAC Address commands, so if a channel reports that it supports NC-SI 1.2, we prefer to use the standard command rather than the OEM command. Verified with an invalid MAC address and 2 valid ones: [ 55.137072] ftgmac100 1e690000.ftgmac eth0: NCSI: Received 3 provisioned MAC addresses [ 55.137614] ftgmac100 1e690000.ftgmac eth0: NCSI: MAC address 0: 00:00:00:00:00:00 [ 55.138026] ftgmac100 1e690000.ftgmac eth0: NCSI: MAC address 1: fa:ce:b0:0c:20:22 [ 55.138528] ftgmac100 1e690000.ftgmac eth0: NCSI: MAC address 2: fa:ce:b0:0c:20:23 [ 55.139241] ftgmac100 1e690000.ftgmac eth0: NCSI: Unable to assign 00:00:00:00:00:00 to device [ 55.140098] ftgmac100 1e690000.ftgmac eth0: NCSI: Set MAC address to fa:ce:b0:0c:20:22 Signed-off-by: Peter Delevoryas <peter@pjd.dev> Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net/ncsi: Fix netlink major/minor version numbers	Peter Delevoryas
	The netlink interface for major and minor version numbers doesn't actually return the major and minor version numbers. It reports a u32 that contains the (major, minor, update, alpha1) components as the major version number, and then alpha2 as the minor version number. For whatever reason, the u32 byte order was reversed (ntohl): maybe it was assumed that the encoded value was a single big-endian u32, and alpha2 was the minor version. The correct way to get the supported NC-SI version from the network controller is to parse the Get Version ID response as described in 8.4.44 of the NC-SI spec[1]. Get Version ID Response Packet Format Bits +--------+--------+--------+--------+ Bytes \| 31..24 \| 23..16 \| 15..8 \| 7..0 \| +-------+--------+--------+--------+--------+ \| 0..15 \| NC-SI Header \| +-------+--------+--------+--------+--------+ \| 16..19\| Response code \| Reason code \| +-------+--------+--------+--------+--------+ \|20..23 \| Major \| Minor \| Update \| Alpha1 \| +-------+--------+--------+--------+--------+ \|24..27 \| reserved \| Alpha2 \| +-------+--------+--------+--------+--------+ \| .... other stuff .... \| The major, minor, and update fields are all binary-coded decimal (BCD) encoded [2]. The spec provides examples below the Get Version ID response format in section 8.4.44.1, but for practical purposes, this is an example from a live network card: root@bmc:~# ncsi-util 0x15 NC-SI Command Response: cmd: GET_VERSION_ID(0x15) Response: COMMAND_COMPLETED(0x0000) Reason: NO_ERROR(0x0000) Payload length = 40 20: 0xf1 0xf1 0xf0 0x00 <<<<<<<<< (major, minor, update, alpha1) 24: 0x00 0x00 0x00 0x00 <<<<<<<<< (_, _, _, alpha2) 28: 0x6d 0x6c 0x78 0x30 32: 0x2e 0x31 0x00 0x00 36: 0x00 0x00 0x00 0x00 40: 0x16 0x1d 0x07 0xd2 44: 0x10 0x1d 0x15 0xb3 48: 0x00 0x17 0x15 0xb3 52: 0x00 0x00 0x81 0x19 This should be parsed as "1.1.0". "f" in the upper-nibble means to ignore it, contributing zero. If both nibbles are "f", I think the whole field is supposed to be ignored. Major and minor are "required", meaning they're not supposed to be "ff", but the update field is "optional" so I think it can be ff. I think the simplest thing to do is just set the major and minor to zero instead of juggling some conditional logic or something. bcd2bin() from "include/linux/bcd.h" seems to assume both nibbles are 0-9, so I've provided a custom BCD decoding function. Alpha1 and alpha2 are ISO/IEC 8859-1 encoded, which just means ASCII characters as far as I can tell, although the full encoding table for non-alphabetic characters is slightly different (I think). I imagine the alpha fields are just supposed to be alphabetic characters, but I haven't seen any network cards actually report a non-zero value for either. If people wrote software against this netlink behavior, and were parsing the major and minor versions themselves from the u32, then this would definitely break their code. [1] https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.0.0.pdf [2] https://en.wikipedia.org/wiki/Binary-coded_decimal [2] https://en.wikipedia.org/wiki/ISO/IEC_8859-1 Signed-off-by: Peter Delevoryas <peter@pjd.dev> Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler") Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net/ncsi: Simplify Kconfig/dts control flow	Peter Delevoryas
	Background: 1. CONFIG_NCSI_OEM_CMD_KEEP_PHY If this is enabled, we send an extra OEM Intel command in the probe sequence immediately after discovering a channel (e.g. after "Clear Initial State"). 2. CONFIG_NCSI_OEM_CMD_GET_MAC If this is enabled, we send one of 3 OEM "Get MAC Address" commands from Broadcom, Mellanox (Nvidida), and Intel in the configuration sequence for a channel. 3. mellanox,multi-host (or mlx,multi-host) Introduced by this patch: https://lore.kernel.org/all/20200108234341.2590674-1-vijaykhemka@fb.com/ Which was actually originally from cosmo.chou@quantatw.com: https://github.com/facebook/openbmc-linux/commit/9f132a10ec48db84613519258cd8a317fb9c8f1b Cosmo claimed that the Nvidia ConnectX-4 and ConnectX-6 NIC's don't respond to Get Version ID, et. al in the probe sequence unless you send the Set MC Affinity command first. Problem Statement: We've been using a combination of #ifdef code blocks and IS_ENABLED() conditions to conditionally send these OEM commands. It makes adding any new code around these commands hard to understand. Solution: In this patch, I just want to remove the conditionally compiled blocks of code, and always use IS_ENABLED(...) to do dynamic control flow. I don't think the small amount of code this adds to non-users of the OEM Kconfigs is a big deal. Signed-off-by: Peter Delevoryas <peter@pjd.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	Merge branch 'net-make-timestamping-selectable'	David S. Miller
	Kory Maincent says: ==================== net: Make timestamping selectable Up until now, there was no way to let the user select the layer at which time stamping occurs. The stack assumed that PHY time stamping is always preferred, but some MAC/PHY combinations were buggy. This series updates the default MAC/PHY default timestamping and aims to allow the user to select the desired layer administratively. Changes in v2: - Move selected_timestamping_layer variable of the concerned patch. - Use sysfs_streq instead of strmcmp. - Use the PHY timestamp only if available. Changes in v3: - Expose the PTP choice to ethtool instead of sysfs. You can test it with the ethtool source on branch feature_ptp of: https://github.com/kmaincent/ethtool - Added a devicetree binding to select the preferred timestamp. Changes in v4: - Move on to ethtool netlink instead of ioctl. - Add a netdev notifier to allow packet trapping by the MAC in case of PHY time stamping. - Add a PHY whitelist to not break the old PHY default time-stamping preference API. Changes in v5: - Update to ndo_hwstamp_get/set. This bring several new patches. - Add few patches to make the glue. - Convert macb to ndo_hwstamp_get/set. - Add netlink specs description of new ethtool commands. - Removed netdev notifier. - Split the patches that expose the timestamping to userspace to separate the core and ethtool development. - Add description of software timestamping. - Convert PHYs hwtstamp callback to use kernel_hwtstamp_config. Changes in v6: - Few fixes from the reviews. - Replace the allowlist to default_timestamp flag to know which phy is using old API behavior. - Rename the timestamping layer enum values. - Move to a simple enum instead of the mix between enum and bitfield. - Update ts_info and ts-set in software timestamping case. Changes in v7: - Fix a temporary build error. - Link to v6: https://lore.kernel.org/r/20231019-feature_ptp_netnext-v6-0-71affc27b0e5@bootlin.com ==================== Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	netlink: specs: Introduce time stamping set command	Kory Maincent
	Add a new commands allowing to set the time stamping. Example usage : ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema \ --do ts-list-get \ --json '{"header":{"dev-name":"eth0"}}' {'header': {'dev-index': 3, 'dev-name': 'eth0'}, 'ts-list-layer': b'\x02\x00\x00\x00\x01\x00\x00\x00\x05\x00\x00\x00'} ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do ts-set \ --json '{"header":{"dev-name":"eth0"}, "ts-layer":5}' none ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do ts-get \ --json '{"header":{"dev-name":"eth0"}}' {'header': {'dev-index': 3, 'dev-name': 'eth0'}, 'ts-layer': 5} Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: ethtool: ts: Let the active time stamping layer be selectable	Kory Maincent
	Now that the current timestamp is saved in a variable lets add the ETHTOOL_MSG_TS_SET ethtool netlink socket to make it selectable. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: ethtool: ts: Update GET_TS to reply the current selected timestamp	Kory Maincent
	As the default selected timestamp API change we have to change also the timestamp return by ethtool. This patch return now the current selected timestamp. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: Change the API of PHY default timestamp to MAC	Kory Maincent
	Change the API to select MAC default time stamping instead of the PHY. Indeed the PHY is closer to the wire therefore theoretically it has less delay than the MAC timestamping but the reality is different. Due to lower time stamping clock frequency, latency in the MDIO bus and no PHC hardware synchronization between different PHY, the PHY PTP is often less precise than the MAC. The exception is for PHY designed specially for PTP case but these devices are not very widespread. For not breaking the compatibility I introduce a default_timestamp flag in phy_device that is set by the phy driver to know we are using the old API behavior. The phy_set_timestamp function is called at each call of phy_attach_direct. In case of MAC driver using phylink this function is called when the interface is turned up. Then if the interface goes down and up again the last choice of timestamp will be overwritten by the default choice. A solution could be to cache the timestamp status but it can bring other issues. In case of SFP, if we change the module, it doesn't make sense to blindly re-set the timestamp back to PHY, if the new module has a PHY with mediocre timestamping capabilities. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: Replace hwtstamp_source by timestamping layer	Kory Maincent
	Replace hwtstamp_source which is only used by the kernel_hwtstamp_config structure by the more widely use timestamp_layer structure. This is done to prepare the support of selectable timestamping source. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	netlink: specs: Introduce new netlink command to list available time ↵	Kory Maincent
	stamping layers Add a new commands allowing to list available time stamping layers on a netdevice's link. Example usage : ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema \ --do ts-list-get \ --json '{"header":{"dev-name":"eth0"}}' {'header': {'dev-index': 3, 'dev-name': 'eth0'}, 'ts-list-layer': b'\x01\x00\x00\x00\x05\x00\x00\x00'} Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: ethtool: Add a command to list available time stamping layers	Kory Maincent
	Introduce a new netlink message that lists all available time stamping layers on a given interface. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	netlink: specs: Introduce new netlink command to get current timestamp	Kory Maincent
	Add a new commands allowing to get the current time stamping on a netdevice's link. Example usage : ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do ts-get \ --json '{"header":{"dev-name":"eth0"}}' {'header': {'dev-index': 3, 'dev-name': 'eth0'}, 'ts-layer': 1} Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: ethtool: Add a command to expose current time stamping layer	Kory Maincent
	Time stamping on network packets may happen either in the MAC or in the PHY, but not both. In preparation for making the choice selectable, expose both the current layers via ethtool. In accordance with the kernel implementation as it stands, the current layer will always read as "phy" when a PHY time stamping device is present. Future patches will allow changing the current layer administratively. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net_tstamp: Add TIMESTAMPING SOFTWARE and HARDWARE mask	Kory Maincent
	Timestamping software or hardware flags are often used as a group, therefore adding these masks will easier future use. I did not use SOF_TIMESTAMPING_SYS_HARDWARE flag as it is deprecated and not use at all. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: phy: micrel: fix ts_info value in case of no phc	Kory Maincent
	In case of no phc we should not return SOFTWARE TIMESTAMPING flags as we do not know whether the netdev supports of timestamping. Remove it from the lan8841_ts_info and simply return 0. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: Make dev_set_hwtstamp_phylib accessible	Kory Maincent
	Make the dev_set_hwtstamp_phylib function accessible in prevision to use it from ethtool to reset the tstamp current configuration. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: macb: Convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()	Kory Maincent
	The hardware timestamping through ndo_eth_ioctl() is going away. Convert the macb driver to the new API before that can be removed. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: ethtool: Refactor identical get_ts_info implementations.	Richard Cochran
	The vlan, macvlan and the bonding drivers call their "real" device driver in order to report the time stamping capabilities. Provide a core ethtool helper function to avoid copy/paste in the stack. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: phy: Remove the call to phy_mii_ioctl in phy_hwstamp_get/set	Kory Maincent
	__phy_hwtstamp_set function were calling the phy_mii_ioctl function which will then use the ifreq pointer to call the hwtstamp callback. Now that ifreq has been removed from the hwstamp callback parameters it seems more logical to not go through the phy_mii_ioctl function and pass directly kernel_hwtstamp_config parameter to the hwtstamp callback. Lets do the same for __phy_hwtstamp_get function and return directly EOPNOTSUPP as SIOCGHWTSTAMP is not supported for now for the PHYs. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18	net: Convert PHYs hwtstamp callback to use kernel_hwtstamp_config	Kory Maincent
	The PHYs hwtstamp callback are still getting the timestamp config from ifreq and using copy_from/to_user. Get rid of these functions by using timestamp configuration in parameter. This also allow to move on to kernel_hwtstamp_config and be similar to net devices using the new ndo_hwstamp_get/set. This adds the possibility to manipulate the timestamp configuration from the kernel which was not possible with the copy_from/to_user. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-17	Merge tag 'bcachefs-2023-11-17' of https://evilpiepirate.org/git/bcachefs	Linus Torvalds
	Pull bcachefs fixes from Kent Overstreet: "Lots of small fixes for minor nits and compiler warnings. Bigger items: - The six locks lost wakeup is finally fixed: six_read_trylock() was checking for the waiting bit before decrementing the number of readers - validated the fix with a torture test. - Fix for a memory reclaim issue: when needing to reallocate a key cache key, we now do our usual GFP_NOWAIT; unlock(); GFP_KERNEL dance. - Multiple deleted inodes btree fixes - Fix an issue in fsck, where i_nlink would be recalculated incorrectly for hardlinked files if a snapshot had ever been taken. - Kill journal pre-reservations: This is a bigger patch than I would normally send at this point, but it deletes code and it fixes some of our tests that would sporadically die with the journal getting stuck, and it's a performance improvement, too" * tag 'bcachefs-2023-11-17' of https://evilpiepirate.org/git/bcachefs: (22 commits) bcachefs: Fix missing locking for dentry->d_parent access bcachefs: six locks: Fix lost wakeup bcachefs: Fix no_data_io mode checksum check bcachefs: Fix bch2_check_nlinks() for snapshots bcachefs: Don't decrease BTREE_ITER_MAX when LOCKDEP=y bcachefs: Disable debug log statements bcachefs: Fix missing transaction commit bcachefs: Fix error path in bch2_mount() bcachefs: Fix potential sleeping during mount bcachefs: Fix iterator leak in may_delete_deleted_inode() bcachefs: Kill journal pre-reservations bcachefs: Check for nonce offset inconsistency in data_update path bcachefs: Make sure to drop/retake btree locks before reclaim bcachefs: btree_trans->write_locked bcachefs: Run btree key cache shrinker less aggressively bcachefs: Split out btree_key_cache_types.h bcachefs: Guard against insufficient devices to create stripes bcachefs: Fix null ptr deref in bch2_backpointer_get_node() bcachefs: Fix multiple -Warray-bounds warnings bcachefs: Use DECLARE_FLEX_ARRAY() helper and fix multiple -Warray-bounds warnings ...
2023-11-17	Merge tag 'mm-hotfixes-stable-2023-11-17-14-04' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Thirteen hotfixes. Seven are cc:stable and the remainder pertain to post-6.6 issues or aren't considered suitable for backporting" * tag 'mm-hotfixes-stable-2023-11-17-14-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: more ptep_get() conversion parisc: fix mmap_base calculation when stack grows upwards mm/damon/core.c: avoid unintentional filtering out of schemes mm: kmem: drop __GFP_NOFAIL when allocating objcg vectors mm/damon/sysfs-schemes: handle tried region directory allocation failure mm/damon/sysfs-schemes: handle tried regions sysfs directory allocation failure mm/damon/sysfs: check error from damon_sysfs_update_target() mm: fix for negative counter: nr_file_hugepages selftests/mm: add hugetlb_fault_after_madv to .gitignore selftests/mm: restore number of hugepages selftests: mm: fix some build warnings selftests: mm: skip whole test instead of failure mm/damon/sysfs: eliminate potential uninitialized variable warning
2023-11-17	Merge tag 'block-6.7-2023-11-17' of git://git.kernel.dk/linux	Linus Torvalds
	Pull block fix from Jens Axboe: "Just a single fix from Christoph/Ming, fixing a case where integrity IO could be called without having an appropriate queue reference" * tag 'block-6.7-2023-11-17' of git://git.kernel.dk/linux: blk-mq: make sure active queue usage is held for bio_integrity_prep()
2023-11-17	Merge tag 'io_uring-6.7-2023-11-17' of git://git.kernel.dk/linux	Linus Torvalds
	Pull io_uring fix from Jens Axboe: "Just a single fixup for a change we made in this release, which caused a regression in sometimes missing fdinfo output if the SQPOLL thread had the lock held when fdinfo output was retrieved. This brings us back on par with what we had before, where just the main uring_lock will prevent that output. We'd love to get rid of that too, but that is beyond the scope of this release and will have to wait for 6.8" * tag 'io_uring-6.7-2023-11-17' of git://git.kernel.dk/linux: io_uring/fdinfo: remove need for sqpoll lock for thread/pid retrieval