linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-12-13	firmware: cs_dsp: Add mock wmfw file generator for KUnit testing	Richard Fitzgerald
	Add a mock firmware file that emulates what the firmware build tools would normally create. This will be used by KUnit tests to generate a test wmfw file. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20241212143725.1381013-4-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-12-13	firmware: cs_dsp: Add mock DSP memory map for KUnit testing	Richard Fitzgerald
	Add helper functions to implement an emulation of the DSP memory map. There are three main groups of functionality: 1. Define a mock cs_dsp_region table. 2. Calculate the addresses of memory and algorithms from the firmware header in XM. 3. Build a mock XM header in emulated XM. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20241212143725.1381013-3-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-12-13	firmware: cs_dsp: Add mock regmap for KUnit testing	Richard Fitzgerald
	Add a mock regmap implementation to act as a simulated DSP for KUnit testing. This is built as a utility module so that it could be used by clients of cs_dsp to create a mock "DSP" for their own testing. cs_dsp interacts with the DSP only through registers. Most of the register space of the DSP is RAM. ADSP cores have a small set of control registers. HALO Core DSPs have a much larger set of control registers but only a small subset are used. Most writes are "blind" in the sense that cs_dsp does not expect to receive any sort of response from the DSP. So there isn't any need to emulate a "DSP", only a set of registers that can be written and read back. The idea of the mock regmap is to use the cache to accumulate writes which can then be tested against the values that are expected to be in the registers. Stray writes can be detected by dropping the cache entries for all addresses that should have been written and then issuing a regcache_sync(). If this causes bus writes it means there were writes to unexpected registers. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20241212143725.1381013-2-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-12-13	sched/dlserver: Fix dlserver double enqueue	Vineeth Pillai (Google)
	dlserver can get dequeued during a dlserver pick_task due to the delayed deueue feature and this can lead to issues with dlserver logic as it still thinks that dlserver is on the runqueue. The dlserver throttling and replenish logic gets confused and can lead to double enqueue of dlserver. Double enqueue of dlserver could happend due to couple of reasons: Case 1 ------ Delayed dequeue feature[1] can cause dlserver being stopped during a pick initiated by dlserver: __pick_next_task pick_task_dl -> server_pick_task pick_task_fair pick_next_entity (if (sched_delayed)) dequeue_entities dl_server_stop server_pick_task goes ahead with update_curr_dl_se without knowing that dlserver is dequeued and this confuses the logic and may lead to unintended enqueue while the server is stopped. Case 2 ------ A race condition between a task dequeue on one cpu and same task's enqueue on this cpu by a remote cpu while the lock is released causing dlserver double enqueue. One cpu would be in the schedule() and releasing RQ-lock: current->state = TASK_INTERRUPTIBLE(); schedule(); deactivate_task() dl_stop_server(); pick_next_task() pick_next_task_fair() sched_balance_newidle() rq_unlock(this_rq) at which point another CPU can take our RQ-lock and do: try_to_wake_up() ttwu_queue() rq_lock() ... activate_task() dl_server_start() --> first enqueue wakeup_preempt() := check_preempt_wakeup_fair() update_curr() update_curr_task() if (current->dl_server) dl_server_update() enqueue_dl_entity() --> second enqueue This bug was not apparent as the enqueue in dl_server_start doesn't usually happen because of the defer logic. But as a side effect of the first case(dequeue during dlserver pick), dl_throttled and dl_yield will be set and this causes the time accounting of dlserver to messup and then leading to a enqueue in dl_server_start. Have an explicit flag representing the status of dlserver to avoid the confusion. This is set in dl_server_start and reset in dlserver_stop. Fixes: 63ba8422f876 ("sched/deadline: Introduce deadline servers") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # ROCK 5B Link: https://lkml.kernel.org/r/20241213032244.877029-1-vineeth@bitbyteword.org
2024-12-13	coresight: Add support for trace filtering by source	Tao Zhang
	Some replicators have hard coded filtering of "trace" data, based on the source device. This is different from the trace filtering based on TraceID, available in the standard programmable replicators. e.g., Qualcomm replicators have filtering based on custom trace protocol format and is not programmable. The source device could be connected to the replicator via intermediate components (e.g., a funnel). Thus we need platform information from the firmware tables to decide the source device corresponding to a given output port from the replicator. Given this affects "trace path building" and traversing the path back from the sink to source, add the concept of "filtering by source" to the generic coresight connection. The specified source will be marked like below in the Devicetree. test-replicator { ... ... ... ... out-ports { ... ... ... ... port@0 { reg = <0>; xyz: endpoint { remote-endpoint = <&zyx>; filter-source = <&source_1>; <-- To specify the source to }; be filtered out here. }; port@1 { reg = <1>; abc: endpoint { remote-endpoint = <&cba>; filter-source = <&source_2>; <-- To specify the source to }; be filtered out here. }; }; }; Signed-off-by: Tao Zhang <quic_taozha@quicinc.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20241213100731.25914-4-quic_taozha@quicinc.com
2024-12-13	coresight: Add a helper to check if a device is source	Tao Zhang
	Since there are a lot of places in the code to check whether the device is source, add a helper to check it. Signed-off-by: Tao Zhang <quic_taozha@quicinc.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20241213100731.25914-3-quic_taozha@quicinc.com
2024-12-13	x86/static-call: provide a way to do very early static-call updates	Juergen Gross
	Add static_call_update_early() for updating static-call targets in very early boot. This will be needed for support of Xen guest type specific hypercall functions. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2024-12-12	skbuff: allow 2-4-argument skb_frag_dma_map()	Alexander Lobakin
	skb_frag_dma_map(dev, frag, 0, skb_frag_size(frag), DMA_TO_DEVICE) is repeated across dozens of drivers and really wants a shorthand. Add a macro which will count args and handle all possible number from 2 to 5. Semantics: skb_frag_dma_map(dev, frag) -> __skb_frag_dma_map(dev, frag, 0, skb_frag_size(frag), DMA_TO_DEVICE) skb_frag_dma_map(dev, frag, offset) -> __skb_frag_dma_map(dev, frag, offset, skb_frag_size(frag) - offset, DMA_TO_DEVICE) skb_frag_dma_map(dev, frag, offset, size) -> __skb_frag_dma_map(dev, frag, offset, size, DMA_TO_DEVICE) skb_frag_dma_map(dev, frag, offset, size, dir) -> __skb_frag_dma_map(dev, frag, offset, size, dir) No object code size changes for the existing callers. Users passing less arguments also won't have bigger size comparing to the full equivalent call. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20241211172649.761483-11-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-13	power: supply: core: Add new "charge_types" property	Hans de Goede
	Add a new "charge_types" property, this is identical to "charge_type" but reading returns a list of supported charge-types with the currently active type surrounded by square brackets, e.g.: Fast [Standard] "Long_Life" This has the advantage over the existing "charge_type" property that this allows userspace to find out which charge-types are supported for writable charge_type properties. Drivers which already support "charge_type" can easily add support for this by setting power_supply_desc.charge_types to a bitmask representing valid charge_type values. The existing "charge_type" get_property() and set_property() code paths can be re-used for "charge_types". Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20241211174451.355421-3-hdegoede@redhat.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-12-12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-6.13-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12	Merge tag 'net-6.13-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, netfilter and wireless. Current release - fix to a fix: - rtnetlink: fix error code in rtnl_newlink() - tipc: fix NULL deref in cleanup_bearer() Current release - regressions: - ip: fix warning about invalid return from in ip_route_input_rcu() Current release - new code bugs: - udp: fix L4 hash after reconnect - eth: lan969x: fix cyclic dependency between modules - eth: bnxt_en: fix potential crash when dumping FW log coredump Previous releases - regressions: - wifi: mac80211: - fix a queue stall in certain cases of channel switch - wake the queues in case of failure in resume - splice: do not checksum AF_UNIX sockets - virtio_net: fix BUG()s in BQL support due to incorrect accounting of purged packets during interface stop - eth: - stmmac: fix TSO DMA API mis-usage causing oops - bnxt_en: fixes for HW GRO: GSO type on 5750X chips and oops due to incorrect aggregation ID mask on 5760X chips Previous releases - always broken: - Bluetooth: improve setsockopt() handling of malformed user input - eth: ocelot: fix PTP timestamping in presence of packet loss - ptp: kvm: x86: avoid "fail to initialize ptp_kvm" when simply not supported" * tag 'net-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits) net: dsa: tag_ocelot_8021q: fix broken reception net: dsa: microchip: KSZ9896 register regmap alignment to 32 bit boundaries net: renesas: rswitch: fix initial MPIC register setting Bluetooth: btmtk: avoid UAF in btmtk_process_coredump Bluetooth: iso: Fix circular lock in iso_conn_big_sync Bluetooth: iso: Fix circular lock in iso_listen_bis Bluetooth: SCO: Add support for 16 bits transparent voice setting Bluetooth: iso: Fix recursive locking warning Bluetooth: iso: Always release hdev at the end of iso_listen_bis Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating Bluetooth: hci_core: Fix sleeping function called from invalid context team: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL team: Fix initial vlan_feature set in __team_compute_features bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features net, team, bonding: Add netdev_base_features helper net/sched: netem: account for backlog updates from child qdisc net: dsa: felix: fix stuck CPU-injected packets with short taprio windows splice: do not checksum AF_UNIX sockets net: usb: qmi_wwan: add Telit FE910C04 compositions ...
2024-12-12	PCI: endpoint: Replace magic number '6' by PCI_STD_NUM_BARS	Rick Wertenbroek
	Replace the constant "6" by PCI_STD_NUM_BARS, as defined in include/uapi/linux/pci_regs.h: #define PCI_STD_NUM_BARS 6 /* Number of standard BARs */ Link: https://lore.kernel.org/r/20241212162547.225880-1-rick.wertenbroek@gmail.com Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-12-12	turris-omnia-mcu-interface.h: Add LED commands related definitions to global ↵	Marek Behún
	header Add definitions for contents of the OMNIA_CMD_LED_MODE and OMNIA_CMD_LED_STATE commands to the global turris-omnia-mcu-interface.h header. Signed-off-by: Marek Behún <kabel@kernel.org> Link: https://lore.kernel.org/r/20241111100355.6978-4-kabel@kernel.org Signed-off-by: Lee Jones <lee@kernel.org>
2024-12-12	turris-omnia-mcu-interface.h: Move command execution function to global header	Marek Behún
	Move the command execution functions from the turris-omnia-mcu platform driver private header to the global turris-omnia-mcu-interface.h header, so that they can be used by the LED driver. Signed-off-by: Marek Behún <kabel@kernel.org> Link: https://lore.kernel.org/r/20241111100355.6978-2-kabel@kernel.org Signed-off-by: Lee Jones <lee@kernel.org>
2024-12-12	block: Make bio_iov_bvec_set() accept pointer to const iov_iter	John Garry
	Make bio_iov_bvec_set() accept a pointer to const iov_iter, which means that we can drop the undesirable casting to struct iov_iter pointer in blk_rq_map_user_bvec(). Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241202115727.2320401-1-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-12	Merge branch 'platform-drivers-x86-platform-profile' into for-next	Ilpo Järvinen

2024-12-12	net, team, bonding: Add netdev_base_features helper	Daniel Borkmann
	Both bonding and team driver have logic to derive the base feature flags before iterating over their slave devices to refine the set via netdev_increment_features(). Add a small helper netdev_base_features() so this can be reused instead of having it open-coded multiple times. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241210141245.327886-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-11	lib: packing: add pack_fields() and unpack_fields()	Vladimir Oltean
	This is new API which caters to the following requirements: - Pack or unpack a large number of fields to/from a buffer with a small code footprint. The current alternative is to open-code a large number of calls to pack() and unpack(), or to use packing() to reduce that number to half. But packing() is not const-correct. - Use unpacked numbers stored in variables smaller than u64. This reduces the rodata footprint of the stored field arrays. - Perform error checking at compile time, rather than runtime, and return void from the API functions. Because the C preprocessor can't generate variable length code (loops), this is a bit tricky to do with macros. To handle this, implement macros which sanity check the packed field definitions based on their size. Finally, a single macro with a chain of __builtin_choose_expr() is used to select the appropriate macros. We enforce the use of ascending or descending order to avoid O(N^2) scaling when checking for overlap. Note that the macros are written with care to ensure that the compilers can correctly evaluate the resulting code at compile time. In particular, care was taken with avoiding too many nested statement expressions. Nested statement expressions trip up some compilers, especially when passing down variables created in previous statement expressions. There are two key design choices intended to keep the overall macro code size small. First, the definition of each CHECK_PACKED_FIELDS_N macro is implemented recursively, by calling the N-1 macro. This avoids needing the code to repeat multiple times. Second, the CHECK_PACKED_FIELD macro enforces that the fields in the array are sorted in order. This allows checking for overlap only with neighboring fields, rather than the general overlap case where each field would need to be checked against other fields. The overlap checks use the first two fields to determine the order of the remaining fields, thus allowing either ascending or descending order. This enables drivers the flexibility to keep the fields ordered in which ever order most naturally fits their hardware design and its associated documentation. The CHECK_PACKED_FIELDS macro is directly called from within pack_fields and unpack_fields, ensuring that all drivers using the API receive the benefits of the compile-time checks. Users do not need to directly call any of the macros directly. The CHECK_PACKED_FIELDS and its helper macros CHECK_PACKED_FIELDS_(0..50) are generated using a simple C program in scripts/gen_packed_field_checks.c This program can be compiled on demand and executed to generate the macro code in include/linux/packing.h. This will aid in the event that a driver needs more than 50 fields. The generator can be updated with a new size, and used to update the packing.h header file. In practice, the ice driver will need to support 27 fields, and the sja1105 driver will need to support 0 fields. This on-demand generation avoids the need to modify Kbuild. We do not anticipate the maximum number of fields to grow very often. - Reduced rodata footprint for the storage of the packed field arrays. To that end, we have struct packed_field_u8 and packed_field_u16, which define the fields with the associated type. More can be added as needed (unlikely for now). On these types, the same generic pack_fields() and unpack_fields() API can be used, thanks to the new C11 _Generic() selection feature, which can call pack_fields_u8() or pack_fields_16(), depending on the type of the "fields" array - a simplistic form of polymorphism. It is evaluated at compile time which function will actually be called. Over time, packing() is expected to be completely replaced either with pack() or with pack_fields(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Co-developed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241210-packing-pack-fields-and-ice-implementation-v10-3-ee56a47479ac@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-11	kexec: Consolidate machine_kexec_mask_interrupts() implementation	Eliav Farber
	Consolidate the machine_kexec_mask_interrupts implementation into a common function located in a new file: kernel/irq/kexec.c. This removes duplicate implementations from architecture-specific files in arch/arm, arch/arm64, arch/powerpc, and arch/riscv, reducing code duplication and improving maintainability. The new implementation retains architecture-specific behavior for CONFIG_GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD, which was previously implemented for ARM64. When enabled (currently for ARM64), it clears the active state of interrupts forwarded to virtual machines (VMs) before handling other interrupt masking operations. Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241204142003.32859-2-farbere@amazon.com
2024-12-11	iio: consumers: ensure read buffers for labels and ext_info are page aligned	Matteo Martelli
	Attributes of iio providers are exposed via sysfs. Typically, providers pass attribute values to the iio core, which handles formatting and printing to sysfs. However, some attributes, such as labels or extended info, are directly formatted and printed to sysfs by provider drivers using sysfs_emit() and sysfs_emit_at(). These helpers assume the read buffer, allocated by sysfs fop, is page-aligned. When these attributes are accessed by consumer drivers, the read buffer is allocated by the consumer and may not be page-aligned, leading to failures in the provider's callback that utilizes sysfs_emit*. Add a check to ensure that read buffers for labels and external info attributes are page-aligned. Update the prototype documentation as well. Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com> Link: https://patch.msgid.link/20241202-iio-kmalloc-align-v1-1-aa9568c03937@gmail.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-12-11	iio: adc: ad_sigma_delta: Store information about reset sequence length	Uwe Kleine-König
	The various chips can be reset using a sequence of SPI transfers with MOSI = 1. The length of such a sequence varies from chip to chip. Store that length in struct ad_sigma_delta_info and replace the respective parameter to ad_sd_reset() with it. Note the ad7192 used to pass 48 as length but the documentation specifies 40 as the required length. Assuming the latter is right. (Using a too long sequence doesn't hurt apart from using a longer spi transfer than necessary, so this is no relevant fix.) The motivation for storing this information is that this is useful to clear a pending R̅D̅Y̅ signal in the next change. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/9750db62fce638bf140ff48172c23bff7f785e5b.1733504533.git.u.kleine-koenig@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-12-11	iio: adc: ad_sigma_delta: Fix a race condition	Uwe Kleine-König
	The ad_sigma_delta driver helper uses irq_disable_nosync(). With that one it is possible that the irq handler still runs after the irq_disable_nosync() function call returns. Also to properly synchronize irq disabling in the different threads proper locking is needed and because it's unclear if the irq handler's irq_disable_nosync() call comes first or the one in the enabler's error path, all code locations that disable the irq must check for .irq_dis first to ensure there is exactly one disable call per enable call. So add a spinlock to the struct ad_sigma_delta and use it to synchronize irq enabling and disabling. Also only act in the irq handler if the irq is still enabled. Fixes: af3008485ea0 ("iio:adc: Add common code for ADI Sigma Delta devices") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/9e6def47e2e773e0e15b7a2c29d22629b53d91b1.1733504533.git.u.kleine-koenig@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-12-11	iio: adc: ad_sigma_delta: Add support for reading irq status using a GPIO	Uwe Kleine-König
	Some of the ADCs by Analog signal their irq condition on the MISO line. So typically that line is connected to an SPI controller and a GPIO. The GPIO is used as input and the respective interrupt is enabled when the last SPI transfer is completed. Depending on the GPIO controller the toggling MISO line might make the interrupt pending even while it's masked. In that case the irq handler is called immediately after irq_enable() and so before the device actually pulls that line low which results in non-sense values being reported to the upper layers. The only way to find out if the line was actually pulled low is to read the GPIO. (There is a flag in AD7124's status register that also signals if an interrupt was asserted, but reading that register toggles the MISO line and so might trigger another spurious interrupt.) Add the possibility to specify an interrupt GPIO in the machine description in addition to the plain interrupt. This GPIO is used then to check if the irq line is actually active in the irq handler. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/5be9a4cc4dc600ec384c88db01dd661a21506b9c.1733504533.git.u.kleine-koenig@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-12-11	fs: don't block write during exec on pre-content watched files	Amir Goldstein
	Commit 2a010c412853 ("fs: don't block i_writecount during exec") removed the legacy behavior of getting ETXTBSY on attempt to open and executable file for write while it is being executed. This commit was reverted because an application that depends on this legacy behavior was broken by the change. We need to allow HSM writing into executable files while executed to fill their content on-the-fly. To that end, disable the ETXTBSY legacy behavior for files that are watched by pre-content events. This change is not expected to cause regressions with existing systems which do not have any pre-content event listeners. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241128142532.465176-1-amir73il@gmail.com
2024-12-11	fsnotify: generate pre-content permission event on page fault	Josef Bacik
	FS_PRE_ACCESS will be generated on page fault depending on the faulting method. This pre-content event is meant to be used by hierarchical storage managers that want to fill in the file content on first read access. Export a simple helper that file systems that have their own ->fault() will use, and have a more complicated helper to be do fancy things in filemap_fault. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/aa56c50ce81b1fd18d7f5d71dd2dfced5eba9687.1731684329.git.josef@toxicpanda.com
2024-12-11	regmap: regmap_multi_reg_read(): make register list const	Richard Fitzgerald
	Mark the list of registers passed into regmap_multi_reg_read() as a pointer to const. This allows the caller to define the register list as const data. This requires making the same change to _regmap_bulk_read(), which is called by regmap_multi_reg_read(). Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20241211133558.884669-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-12-11	coresight: Add support to get static id for system trace sources	Mao Jinlong
	Dynamic trace id was introduced in coresight subsystem, so trace id is allocated dynamically. However, some hardware ATB source has static trace id and it cannot be changed via software programming. For such source, it can call coresight_get_static_trace_id to get the fixed trace id from device node and pass id to coresight_trace_id_get_static_system_id to reserve the id. Signed-off-by: Mao Jinlong <quic_jinlmao@quicinc.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20241121062829.11571-3-quic_jinlmao@quicinc.com
2024-12-11	coresight: Drop atomics in connection refcounts	James Clark
	These belong to the device being enabled or disabled and are only ever used inside the device's spinlock. Remove the atomics to not imply that there are any other concurrent accesses. If atomics were necessary I don't think they would have been enough anyway. There would be nothing to prevent an enable or disable running concurrently if not for the spinlock. Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20241128121414.2425119-1-james.clark@linaro.org
2024-12-11	iomap: fix zero padding data issue in concurrent append writes	Long Li
	During concurrent append writes to XFS filesystem, zero padding data may appear in the file after power failure. This happens due to imprecise disk size updates when handling write completion. Consider this scenario with concurrent append writes same file: Thread 1: Thread 2: ------------ ----------- write [A, A+B] update inode size to A+B submit I/O [A, A+BS] write [A+B, A+B+C] update inode size to A+B+C <I/O completes, updates disk size to min(A+B+C, A+BS)> <power failure> After reboot: 1) with A+B+C < A+BS, the file has zero padding in range [A+B, A+B+C] \|< Block Size (BS) >\| \|DDDDDDDDDDDDDDDD0000000000000000\| ^ ^ ^ A A+B A+B+C (EOF) 2) with A+B+C > A+BS, the file has zero padding in range [A+B, A+BS] \|< Block Size (BS) >\|< Block Size (BS) >\| \|DDDDDDDDDDDDDDDD0000000000000000\|00000000000000000000000000000000\| ^ ^ ^ ^ A A+B A+BS A+B+C (EOF) D = Valid Data 0 = Zero Padding The issue stems from disk size being set to min(io_offset + io_size, inode->i_size) at I/O completion. Since io_offset+io_size is block size granularity, it may exceed the actual valid file data size. In the case of concurrent append writes, inode->i_size may be larger than the actual range of valid file data written to disk, leading to inaccurate disk size updates. This patch modifies the meaning of io_size to represent the size of valid data within EOF in an ioend. If the ioend spans beyond i_size, io_size will be trimmed to provide the file with more accurate size information. This is particularly useful for on-disk size updates at completion time. After this change, ioends that span i_size will not grow or merge with other ioends in concurrent scenarios. However, these cases that need growth/merging rarely occur and it seems no noticeable performance impact. Although rounding up io_size could enable ioend growth/merging in these scenarios, we decided to keep the code simple after discussion [1]. Another benefit is that it makes the xfs_ioend_is_append() check more accurate, which can reduce unnecessary end bio callbacks of xfs_end_bio() in certain scenarios, such as repeated writes at the file tail without extending the file size. Link [1]: https://patchwork.kernel.org/project/xfs/patch/20241113091907.56937-1-leo.lilong@huawei.com Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure") # goes further back than this Signed-off-by: Long Li <leo.lilong@huawei.com> Link: https://lore.kernel.org/r/20241209114241.3725722-3-leo.lilong@huawei.com Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-10	rtnetlink: remove pad field in ndo_fdb_dump_context	Eric Dumazet
	I chose to remove this field in a separate patch to ease potential bisection, in case one ndo_fdb_dump() is still using the old way (cb->args[2] instead of ctx->fdb_idx) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241209100747.2269613-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10	rtnetlink: switch rtnl_fdb_dump() to for_each_netdev_dump()	Eric Dumazet
	This is the last netdev iterator still using net->dev_index_head[]. Convert to modern for_each_netdev_dump() for better scalability, and use common patterns in our stack. Following patch in this series removes the pad field in struct ndo_fdb_dump_context. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241209100747.2269613-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10	rtnetlink: add ndo_fdb_dump_context	Eric Dumazet
	rtnl_fdb_dump() and various ndo_fdb_dump() helpers share a hidden layout of cb->ctx. Before switching rtnl_fdb_dump() to for_each_netdev_dump() in the following patch, make this more explicit. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241209100747.2269613-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-11	power: supply: core: introduce dev_to_psy()	Thomas Weißschuh
	The psy core and drivers currently use dev_get_drvdata() to go from a 'struct device' to its 'struct power_supply'. This is not typesafe and or documented. Introduce a new helper to make this pattern explicit. Instead of using dev_get_drvdata(), use container_of_const() which also preserves the constness. Furthermore 'dev' does need to be dereferenced anymore and at some point the drvdata could be reused for something else. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20241210-power-supply-dev_to_psy-v2-7-9d8c9d24cfe4@weissschuh.net Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-12-11	power: supply: core: remove power_supply_for_each_device()	Thomas Weißschuh
	There are no users anymore. All potential future users are expected to use power_supply_for_each_psy(). Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20241210-power-supply-dev_to_psy-v2-6-9d8c9d24cfe4@weissschuh.net Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-12-11	power: supply: core: introduce power_supply_for_each_psy()	Thomas Weißschuh
	All existing callers of power_supply_for_each_device() want to iterate over 'struct power_supply', not 'struct device'. The power_supply_for_each_device() forces each caller to duplicate the logic to go from one to the other. Introduce power_supply_for_each_psy() to simplify the callers. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20241210-power-supply-dev_to_psy-v2-2-9d8c9d24cfe4@weissschuh.net Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-12-10	bpf: Fix theoretical prog_array UAF in __uprobe_perf_func()	Jann Horn
	Currently, the pointer stored in call->prog_array is loaded in __uprobe_perf_func(), with no RCU annotation and no immediately visible RCU protection, so it looks as if the loaded pointer can immediately be dangling. Later, bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section, but this is too late. It then uses rcu_dereference_check(), but this use of rcu_dereference_check() does not actually dereference anything. Fix it by aligning the semantics to bpf_prog_run_array(): Let the caller provide rcu_read_lock_trace() protection and then load call->prog_array with rcu_dereference_check(). This issue seems to be theoretical: I don't know of any way to reach this code without having handle_swbp() further up the stack, which is already holding a rcu_read_lock_trace() lock, so where we take rcu_read_lock_trace() in __uprobe_perf_func()/bpf_prog_run_array_uprobe() doesn't actually have any effect. Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241210-bpf-fix-uprobe-uaf-v4-1-5fc8959b2b74@google.com
2024-12-10	bpf: check changes_pkt_data property for extension programs	Eduard Zingerman
	When processing calls to global sub-programs, verifier decides whether to invalidate all packet pointers in current state depending on the changes_pkt_data property of the global sub-program. Because of this, an extension program replacing a global sub-program must be compatible with changes_pkt_data property of the sub-program being replaced. This commit: - adds changes_pkt_data flag to struct bpf_prog_aux: - this flag is set in check_cfg() for main sub-program; - in jit_subprogs() for other sub-programs; - modifies bpf_check_attach_btf_id() to check changes_pkt_data flag; - moves call to check_attach_btf_id() after the call to check_cfg(), because it needs changes_pkt_data flag to be set: bpf_check: ... ... - check_attach_btf_id resolve_pseudo_ldimm64 resolve_pseudo_ldimm64 --> bpf_prog_is_offloaded bpf_prog_is_offloaded check_cfg check_cfg + check_attach_btf_id ... ... The following fields are set by check_attach_btf_id(): - env->ops - prog->aux->attach_btf_trace - prog->aux->attach_func_name - prog->aux->attach_func_proto - prog->aux->dst_trampoline - prog->aux->mod - prog->aux->saved_dst_attach_type - prog->aux->saved_dst_prog_type - prog->expected_attach_type Neither of these fields are used by resolve_pseudo_ldimm64() or bpf_prog_offload_verifier_prep() (for netronome and netdevsim drivers), so the reordering is safe. Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241210041100.1898468-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-10	bpf: track changes_pkt_data property for global functions	Eduard Zingerman
	When processing calls to certain helpers, verifier invalidates all packet pointers in a current state. For example, consider the following program: __attribute__((__noinline__)) long skb_pull_data(struct __sk_buff sk, __u32 len) { return bpf_skb_pull_data(sk, len); } SEC("tc") int test_invalidate_checks(struct __sk_buff sk) { int p = (void )(long)sk->data; if ((void )(p + 1) > (void )(long)sk->data_end) return TCX_DROP; skb_pull_data(sk, 0); *p = 42; return TCX_PASS; } After a call to bpf_skb_pull_data() the pointer 'p' can't be used safely. See function filter.c:bpf_helper_changes_pkt_data() for a list of such helpers. At the moment verifier invalidates packet pointers when processing helper function calls, and does not traverse global sub-programs when processing calls to global sub-programs. This means that calls to helpers done from global sub-programs do not invalidate pointers in the caller state. E.g. the program above is unsafe, but is not rejected by verifier. This commit fixes the omission by computing field bpf_subprog_info->changes_pkt_data for each sub-program before main verification pass. changes_pkt_data should be set if: - subprogram calls helper for which bpf_helper_changes_pkt_data returns true; - subprogram calls a global function, for which bpf_subprog_info->changes_pkt_data should be set. The verifier.c:check_cfg() pass is modified to compute this information. The commit relies on depth first instruction traversal done by check_cfg() and absence of recursive function calls: - check_cfg() would eventually visit every call to subprogram S in a state when S is fully explored; - when S is fully explored: - every direct helper call within S is explored (and thus changes_pkt_data is set if needed); - every call to subprogram S1 called by S was visited with S1 fully explored (and thus S inherits changes_pkt_data from S1). The downside of such approach is that dead code elimination is not taken into account: if a helper call inside global function is dead because of current configuration, verifier would conservatively assume that the call occurs for the purpose of the changes_pkt_data computation. Reported-by: Nick Zavaritsky <mejedi@gmail.com> Closes: https://lore.kernel.org/bpf/0498CA22-5779-4767-9C0C-A9515CEA711F@gmail.com/ Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241210041100.1898468-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-10	bpf: refactor bpf_helper_changes_pkt_data to use helper number	Eduard Zingerman
	Use BPF helper number instead of function pointer in bpf_helper_changes_pkt_data(). This would simplify usage of this function in verifier.c:check_cfg() (in a follow-up patch), where only helper number is easily available and there is no real need to lookup helper proto. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241210041100.1898468-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-10	ACPI: platform_profile: Add concept of a "custom" profile	Mario Limonciello
	When two profile handlers don't agree on the current profile it's ambiguous what to show to the legacy sysfs interface. Add a "custom" profile string that userspace will be able to use the legacy sysfs interface to distinguish this situation.. Additionally drivers can choose to use this to indicate that a user has modified driver settings in a way that the platform profile advertised by a driver is not accurate. Reviewed-by: Armin Wolf <W_Armin@gmx.de> Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241206031918.1537-17-mario.limonciello@amd.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-10	ACPI: platform_profile: Create class for ACPI platform profile	Mario Limonciello
	When registering a platform profile handler create a class device that will allow changing a single platform profile handler. The class and sysfs group are no longer needed when the platform profile core is a module and unloaded, so remove them at that time as well. Reviewed-by: Armin Wolf <W_Armin@gmx.de> Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241206031918.1537-11-mario.limonciello@amd.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-10	ACPI: platform_profile: Pass the profile handler into platform_profile_notify()	Mario Limonciello
	The profile handler will be used to notify the appropriate class devices. Reviewed-by: Armin Wolf <W_Armin@gmx.de> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241206031918.1537-6-mario.limonciello@amd.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-10	ACPI: platform_profile: Add platform handler argument to ↵	Mario Limonciello
	platform_profile_remove() To allow registering and unregistering multiple platform handlers calls to platform_profile_remove() will need to know which handler is to be removed. Add an argument for this. Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca> Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Armin Wolf <W_Armin@gmx.de> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241206031918.1537-5-mario.limonciello@amd.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-10	ACPI: platform_profile: Add device pointer into platform profile handler	Mario Limonciello
	In order to let platform profile handlers manage platform profile for their driver the core code will need a pointer to the device. Add this to the structure and use it in the trivial driver cases. Reviewed-by: Armin Wolf <W_Armin@gmx.de> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241206031918.1537-4-mario.limonciello@amd.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-10	ACPI: platform-profile: Add a name member to handlers	Mario Limonciello
	In order to prepare for allowing multiple handlers, introduce a name field that can be used to distinguish between different handlers. Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca> Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Armin Wolf <W_Armin@gmx.de> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241206031918.1537-2-mario.limonciello@amd.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-10	of: Hide of_default_bus_match_table[]	Stephen Boyd
	This isn't used outside this file. Hide the array in the C file. Signed-off-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20241204194806.2665589-1-swboyd@chromium.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-12-10	block: Prevent potential deadlocks in zone write plug error recovery	Damien Le Moal
	Zone write plugging for handling writes to zones of a zoned block device always execute a zone report whenever a write BIO to a zone fails. The intent of this is to ensure that the tracking of a zone write pointer is always correct to ensure that the alignment to a zone write pointer of write BIOs can be checked on submission and that we can always correctly emulate zone append operations using regular write BIOs. However, this error recovery scheme introduces a potential deadlock if a device queue freeze is initiated while BIOs are still plugged in a zone write plug and one of these write operation fails. In such case, the disk zone write plug error recovery work is scheduled and executes a report zone. This in turn can result in a request allocation in the underlying driver to issue the report zones command to the device. But with the device queue freeze already started, this allocation will block, preventing the report zone execution and the continuation of the processing of the plugged BIOs. As plugged BIOs hold a queue usage reference, the queue freeze itself will never complete, resulting in a deadlock. Avoid this problem by completely removing from the zone write plugging code the use of report zones operations after a failed write operation, instead relying on the device user to either execute a report zones, reset the zone, finish the zone, or give up writing to the device (which is a fairly common pattern for file systems which degrade to read-only after write failures). This is not an unreasonnable requirement as all well-behaved applications, FSes and device mapper already use report zones to recover from write errors whenever possible by comparing the current position of a zone write pointer with what their assumption about the position is. The changes to remove the automatic error recovery are as follows: - Completely remove the error recovery work and its associated resources (zone write plug list head, disk error list, and disk zone_wplugs_work work struct). This also removes the functions disk_zone_wplug_set_error() and disk_zone_wplug_clear_error(). - Change the BLK_ZONE_WPLUG_ERROR zone write plug flag into BLK_ZONE_WPLUG_NEED_WP_UPDATE. This new flag is set for a zone write plug whenever a write opration targetting the zone of the zone write plug fails. This flag indicates that the zone write pointer offset is not reliable and that it must be updated when the next report zone, reset zone, finish zone or disk revalidation is executed. - Modify blk_zone_write_plug_bio_endio() to set the BLK_ZONE_WPLUG_NEED_WP_UPDATE flag for the target zone of a failed write BIO. - Modify the function disk_zone_wplug_set_wp_offset() to clear this new flag, thus implementing recovery of a correct write pointer offset with the reset (all) zone and finish zone operations. - Modify blkdev_report_zones() to always use the disk_report_zones_cb() callback so that disk_zone_wplug_sync_wp_offset() can be called for any zone marked with the BLK_ZONE_WPLUG_NEED_WP_UPDATE flag. This implements recovery of a correct write pointer offset for zone write plugs marked with BLK_ZONE_WPLUG_NEED_WP_UPDATE and within the range of the report zones operation executed by the user. - Modify blk_revalidate_seq_zone() to call disk_zone_wplug_sync_wp_offset() for all sequential write required zones when a zoned block device is revalidated, thus always resolving any inconsistency between the write pointer offset of zone write plugs and the actual write pointer position of sequential zones. Fixes: dd291d77cc90 ("block: Introduce zone write plugging") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20241209122357.47838-5-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-10	dm: Fix dm-zoned-reclaim zone write pointer alignment	Damien Le Moal
	The zone reclaim processing of the dm-zoned device mapper uses blkdev_issue_zeroout() to align the write pointer of a zone being used for reclaiming another zone, to write the valid data blocks from the zone being reclaimed at the same position relative to the zone start in the reclaim target zone. The first call to blkdev_issue_zeroout() will try to use hardware offload using a REQ_OP_WRITE_ZEROES operation if the device reports a non-zero max_write_zeroes_sectors queue limit. If this operation fails because of the lack of hardware support, blkdev_issue_zeroout() falls back to using a regular write operation with the zero-page as buffer. Currently, such REQ_OP_WRITE_ZEROES failure is automatically handled by the block layer zone write plugging code which will execute a report zones operation to ensure that the write pointer of the target zone of the failed operation has not changed and to "rewind" the zone write pointer offset of the target zone as it was advanced when the write zero operation was submitted. So the REQ_OP_WRITE_ZEROES failure does not cause any issue and blkdev_issue_zeroout() works as expected. However, since the automatic recovery of zone write pointers by the zone write plugging code can potentially cause deadlocks with queue freeze operations, a different recovery must be implemented in preparation for the removal of zone write plugging report zones based recovery. Do this by introducing the new function blk_zone_issue_zeroout(). This function first calls blkdev_issue_zeroout() with the flag BLKDEV_ZERO_NOFALLBACK to intercept failures on the first execution which attempt to use the device hardware offload with the REQ_OP_WRITE_ZEROES operation. If this attempt fails, a report zone operation is issued to restore the zone write pointer offset of the target zone to the correct position and blkdev_issue_zeroout() is called again without the BLKDEV_ZERO_NOFALLBACK flag. The report zones operation performing this recovery is implemented using the helper function disk_zone_sync_wp_offset() which calls the gendisk report_zones file operation with the callback disk_report_zones_cb(). This callback updates the target write pointer offset of the target zone using the new function disk_zone_wplug_sync_wp_offset(). dmz_reclaim_align_wp() is modified to change its call to blkdev_issue_zeroout() to a call to blk_zone_issue_zeroout() without any other change needed as the two functions are functionnally equivalent. Fixes: dd291d77cc90 ("block: Introduce zone write plugging") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20241209122357.47838-4-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-10	rseq: Validate read-only fields under DEBUG_RSEQ config	Mathieu Desnoyers
	The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://github.com/google/tcmalloc/issues/144
2024-12-10	pmdomain: core: Support naming idle states	Konrad Dybcio
	Commit 422f2d418186 ("arm64: dts: qcom: Drop undocumented domain "idle-state-name"") brought to light the common misbelief that idle-state-names also applies to e.g. PSCI power domain idle states. Make that a reality, mimicking the property name used by cpuidle states. Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Message-ID: <20241130-topic-idle_state_name-v1-2-d0ff67b0c8e9@oss.qualcomm.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>