linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-07-03	perf: RISC-V: Check standard event availability	Samuel Holland
	The RISC-V SBI PMU specification defines several standard hardware and cache events. Currently, all of these events are exposed to userspace, even when not actually implemented. They appear in the `perf list` output, and commands like `perf stat` try to use them. This is more than just a cosmetic issue, because the PMU driver's .add function fails for these events, which causes pmu_groups_sched_in() to prematurely stop scheduling in other (possibly valid) hardware events. Add logic to check which events are supported by the hardware (i.e. can be mapped to some counter), so only usable events are reported to userspace. Since the kernel does not know the mapping between events and possible counters, this check must happen during boot, when no counters are in use. Make the check asynchronous to minimize impact on boot time. Fixes: e9991434596f ("RISC-V: Add perf platform driver based on SBI PMU extension") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Tested-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-3-e01cfddcf035@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03	drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpus	Samuel Holland
	Currently, we stop all the counters while a new cpu is brought online. However, the hpmevent to counter mappings are not reset. The firmware may have some stale encoding in their mapping structure which may lead to undesirable results. We have not encountered such scenario though. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-2-e01cfddcf035@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03	drivers/perf: riscv: Do not update the event data if uptodate	Atish Patra
	In case of an counter overflow, the event data may get corrupted if called from an external overflow handler. This happens because we can't update the counter without starting it when SBI PMU extension is in use. However, the prev_count has been already updated at the first pass while the counter value is still the old one. The solution is simple where we don't need to update it again if it is already updated which can be detected using hwc state. The event state in the overflow handler is updated in the following patch. Thus, this fix can't be backported to kernel version where overflow support was added. Fixes: a8625217a054 ("drivers/perf: riscv: Implement SBI PMU snapshot function") Closes:https://lore.kernel.org/all/CC51D53B-846C-4D81-86FC-FBF969D0A0D6@pku.edu.cn/ Reported-by: garthlei@pku.edu.cn Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-1-e01cfddcf035@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03	nilfs2: fix incorrect inode allocation from reserved inodes	Ryusuke Konishi
	If the bitmap block that manages the inode allocation status is corrupted, nilfs_ifile_create_inode() may allocate a new inode from the reserved inode area where it should not be allocated. Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of struct nilfs_root"), fixed the problem that reserved inodes with inode numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to bitmap corruption, but since the start number of non-reserved inodes is read from the super block and may change, in which case inode allocation may occur from the extended reserved inode area. If that happens, access to that inode will cause an IO error, causing the file system to degrade to an error state. Fix this potential issue by adding a wraparound option to the common metadata object allocation routine and by modifying nilfs_ifile_create_inode() to disable the option so that it only allocates inodes with inode numbers greater than or equal to the inode number read in "nilfs->ns_first_ino", regardless of the bitmap status of reserved inodes. Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03	nilfs2: add missing check for inode numbers on directory entries	Ryusuke Konishi
	Syzbot reported that mounting and unmounting a specific pattern of corrupted nilfs2 filesystem images causes a use-after-free of metadata file inodes, which triggers a kernel bug in lru_add_fn(). As Jan Kara pointed out, this is because the link count of a metadata file gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(), tries to delete that inode (ifile inode in this case). The inconsistency occurs because directories containing the inode numbers of these metadata files that should not be visible in the namespace are read without checking. Fix this issue by treating the inode numbers of these internal files as errors in the sanity check helper when reading directory folios/pages. Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer analysis. Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8 Reported-by: Jan Kara <jack@suse.cz> Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3 Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03	nilfs2: fix inode number range checks	Ryusuke Konishi
	Patch series "nilfs2: fix potential issues related to reserved inodes". This series fixes one use-after-free issue reported by syzbot, caused by nilfs2's internal inode being exposed in the namespace on a corrupted filesystem, and a couple of flaws that cause problems if the starting number of non-reserved inodes written in the on-disk super block is intentionally (or corruptly) changed from its default value. This patch (of 3): In the current implementation of nilfs2, "nilfs->ns_first_ino", which gives the first non-reserved inode number, is read from the superblock, but its lower limit is not checked. As a result, if a number that overlaps with the inode number range of reserved inodes such as the root directory or metadata files is set in the super block parameter, the inode number test macros (NILFS_MDT_INODE and NILFS_VALID_INODE) will not function properly. In addition, these test macros use left bit-shift calculations using with the inode number as the shift count via the BIT macro, but the result of a shift calculation that exceeds the bit width of an integer is undefined in the C specification, so if "ns_first_ino" is set to a large value other than the default value NILFS_USER_INO (=11), the macros may potentially malfunction depending on the environment. Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and by preventing bit shifts equal to or greater than the NILFS_USER_INO constant in the inode number test macros. Also, change the type of "ns_first_ino" from signed integer to unsigned integer to avoid the need for type casting in comparisons such as the lower bound check introduced this time. Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03	mm: avoid overflows in dirty throttling logic	Jan Kara
	The dirty throttling logic is interspersed with assumptions that dirty limits in PAGE_SIZE units fit into 32-bit (so that various multiplications fit into 64-bits). If limits end up being larger, we will hit overflows, possible divisions by 0 etc. Fix these problems by never allowing so large dirty limits as they have dubious practical value anyway. For dirty_bytes / dirty_background_bytes interfaces we can just refuse to set so large limits. For dirty_ratio / dirty_background_ratio it isn't so simple as the dirty limit is computed from the amount of available memory which can change due to memory hotplug etc. So when converting dirty limits from ratios to numbers of pages, we just don't allow the result to exceed UINT_MAX. This is root-only triggerable problem which occurs when the operator sets dirty limits to >16 TB. Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Zach O'Keefe <zokeefe@google.com> Reviewed-By: Zach O'Keefe <zokeefe@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03	Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"	Jan Kara
	Patch series "mm: Avoid possible overflows in dirty throttling". Dirty throttling logic assumes dirty limits in page units fit into 32-bits. This patch series makes sure this is true (see patch 2/2 for more details). This patch (of 2): This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78. The commit is broken in several ways. Firstly, the removed (u64) cast from the multiplication will introduce a multiplication overflow on 32-bit archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the default settings with 4GB of RAM will trigger this). Secondly, the div64_u64() is unnecessarily expensive on 32-bit archs. We have div64_ul() in case we want to be safe & cheap. Thirdly, if dirty thresholds are larger than 1<<32 pages, then dirty balancing is going to blow up in many other spectacular ways anyway so trying to fix one possible overflow is just moot. Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-By: Zach O'Keefe <zokeefe@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03	mm: optimize the redundant loop of mm_update_owner_next()	Jinliang Zheng
	When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or /proc or ptrace or page migration (get_task_mm()), it is impossible to find an appropriate task_struct in the loop whose mm_struct is the same as the target mm_struct. If the above race condition is combined with the stress-ng-zombie and stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in write_lock_irq() for tasklist_lock. Recognize this situation in advance and exit early. Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tycho Andersen <tandersen@netflix.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03	Merge tag 'io_uring-6.10-20240703' of git://git.kernel.dk/linux	Linus Torvalds
	Pull io_uring fix from Jens Axboe: "A fix for a feature that went into the 6.10 merge window actually ended up causing a regression in building bundles for receives. Fix that up by ensuring we don't overwrite msg_inq before we use it in the loop" * tag 'io_uring-6.10-20240703' of git://git.kernel.dk/linux: io_uring/net: don't clear msg_inq before io_recv_buf_select() needs it
2024-07-03	Merge tag 'media/v6.10-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "Some fixes related to the IPU6 driver" * tag 'media/v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: ivsc: Depend on IPU_BRIDGE or not IPU_BRIDGE media: intel/ipu6: Fix a null pointer dereference in ipu6_isys_query_stream_by_source media: ipu6: Use the ISYS auxdev device as the V4L2 device's device
2024-07-03	s390/dasd: Fix invalid dereferencing of indirect CCW data pointer	Stefan Haberland
	Fix invalid dereferencing of indirect CCW data pointer in dasd_eckd_dump_sense() that leads to a kernel panic in error cases. When using indirect addressing for DASD CCWs (IDAW) the CCW CDA pointer does not contain the data address itself but a pointer to the IDAL. This needs to be translated from physical to virtual as well before using it. This dereferencing is also used for dasd_page_cache and also fixed although it is very unlikely that this code path ever gets used. Fixes: c0bd39601c13 ("s390/dasd: use new address translation helpers") Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-07-03	serial: imx: ensure RTS signal is not left active after shutdown	Rasmus Villemoes
	If a process is killed while writing to a /dev/ttymxc* device in RS485 mode, we observe that the RTS signal is left high, thus making it impossible for other devices to transmit anything. Moreover, the ->tx_state variable is left in state SEND, which means that when one next opens the device and configures baud rate etc., the initialization code in imx_uart_set_termios dutifully ensures the RTS pin is pulled down, but since ->tx_state is already SEND, the logic in imx_uart_start_tx() does not in fact pull the pin high before transmitting, so nothing actually gets on the wire on the other side of the transceiver. Only when that transmission is allowed to complete is the state machine then back in a consistent state. This is completely reproducible by doing something as simple as seq 10000 > /dev/ttymxc0 and hitting ctrl-C, and watching with a logic analyzer. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: stable <stable@kernel.org> Reviewed-by: Marek Vasut <marex@denx.de> Link: https://lore.kernel.org/r/20240625184206.508837-1-linux@rasmusvillemoes.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	tty: serial: ma35d1: Add a NULL check for of_node	Jacky Huang
	The pdev->dev.of_node can be NULL if the "serial" node is absent. Add a NULL check to return an error in such cases. Fixes: 930cbf92db01 ("tty: serial: Add Nuvoton ma35d1 serial driver support") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/8df7ce45-fd58-4235-88f7-43fe7cd67e8f@moroto.mountain/ Signed-off-by: Jacky Huang <ychuang3@nuvoton.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/r/20240625064128.127-1-ychuang570808@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	serial: 8250_omap: Fix Errata i2310 with RX FIFO level check	Udit Kumar
	Errata i2310[0] says, Erroneous timeout can be triggered, if this Erroneous interrupt is not cleared then it may leads to storm of interrupts. Commit 9d141c1e6157 ("serial: 8250_omap: Implementation of Errata i2310") which added the workaround but missed ensuring RX FIFO is really empty before applying the errata workaround as recommended in the errata text. Fix this by adding back check for UART_OMAP_RX_LVL to be 0 for workaround to take effect. [0] https://www.ti.com/lit/pdf/sprz536 page 23 Fixes: 9d141c1e6157 ("serial: 8250_omap: Implementation of Errata i2310") Cc: stable@vger.kernel.org Reported-by: Vignesh Raghavendra <vigneshr@ti.com> Closes: https://lore.kernel.org/all/e96d0c55-0b12-4cbf-9d23-48963543de49@ti.com/ Signed-off-by: Udit Kumar <u-kumar1@ti.com> Link: https://lore.kernel.org/r/20240625160725.2102194-1-u-kumar1@ti.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference	Miri Korenblit
	iwl_mvm_get_bss_vif might return a NULL or ERR_PTR. Some of the callers check only the NULL case, and some doesn't check at all. Some of the callers even have a pointer to the mvmvif of the bss vif, so we don't even need to call this function, and can simply get the vif from mvmvif. Do it for those cases, and for the others - properly check if IS_ERR_OR_NULL Fixes: ec0d43d26f2c ("wifi: iwlwifi: mvm: Activate EMLSR based on traffic volume") Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240703064027.a661f8c65aac.I45cf09b01af8ee3d55828863958ead741ea43b7f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03	usb: dwc3: pci: add support for the Intel Panther Lake	Heikki Krogerus
	This patch adds the necessary PCI IDs for Intel Panther Lake devices. Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Cc: stable <stable@kernel.org> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/20240628111834.1498461-1-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	wifi: iwlwifi: mvm: avoid link lookup in statistics	Johannes Berg
	We already iterate the link bss_conf/link_info and have the pointer, or know that deflink/bss_conf is used, so avoid an extra lookup and just pass the pointer. This may also avoid a crash when this is processed during restart, where the FW to link conf array (link_id_to_link_conf) may be NULLed out. Fixes: c1e458b987f2 ("wifi: iwlwifi: mvm: Move beacon filtering to be per link") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240703064026.346a6ef67a86.Iba5d65d728ca9f58518c88d029496c1250670544@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03	wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL	Emmanuel Grumbach
	Since we now want to sync the queues even when we're in RFKILL, we shouldn't wake up the wait queue since we still expect to get all the notifications from the firmware. Fixes: 4d08c0b3357c ("wifi: iwlwifi: mvm: handle BA session teardown in RF-kill") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240703064027.be7a9dbeacde.I5586cb3ca8d6e44f79d819a48a0c22351ff720c9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03	wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK	Daniel Gabay
	The WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK should be set based on the WOWLAN_KEK_KCK_MATERIAL command version. Currently, the command version in the firmware has advanced to 4, which prevents the flag from being set correctly, fix that. Signed-off-by: Daniel Gabay <daniel.gabay@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240703064026.a0f162108575.If1a9785727d2a1b0197a396680965df1b53d4096@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03	usb: core: add missing of_node_put() in usb_of_has_devices_or_graph	Javier Carrasco
	The for_each_child_of_node() macro requires an explicit call to of_node_put() on early exits to decrement the child refcount and avoid a memory leak. The child node is not required outsie the loop, and the resource must be released before the function returns. Add the missing of_node_put(). Cc: stable@vger.kernel.org Fixes: 82e82130a78b ("usb: core: Set connect_type of ports based on DT node") Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20240624-usb_core_of_memleak-v1-1-af6821c1a584@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	USB: Add USB_QUIRK_NO_SET_INTF quirk for START BP-850k	WangYuli
	START BP-850K is a dot matrix printer that crashes when it receives a Set-Interface request and needs USB_QUIRK_NO_SET_INTF to work properly. Cc: stable <stable@kernel.org> Signed-off-by: jinxiaobo <jinxiaobo@uniontech.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Link: https://lore.kernel.org/r/202E4B2BD0F0FEA4+20240702154408.631201-1-wangyuli@uniontech.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	USB: core: Fix duplicate endpoint bug by clearing reserved bits in the ↵	Alan Stern
	descriptor Syzbot has identified a bug in usbcore (see the Closes: tag below) caused by our assumption that the reserved bits in an endpoint descriptor's bEndpointAddress field will always be 0. As a result of the bug, the endpoint_is_duplicate() routine in config.c (and possibly other routines as well) may believe that two descriptors are for distinct endpoints, even though they have the same direction and endpoint number. This can lead to confusion, including the bug identified by syzbot (two descriptors with matching endpoint numbers and directions, where one was interrupt and the other was bulk). To fix the bug, we will clear the reserved bits in bEndpointAddress when we parse the descriptor. (Note that both the USB-2.0 and USB-3.1 specs say these bits are "Reserved, reset to zero".) This requires us to make a copy of the descriptor earlier in usb_parse_endpoint() and use the copy instead of the original when checking for duplicates. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-and-tested-by: syzbot+8693a0bb9c10b554272a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/0000000000003d868e061bc0f554@google.com/ Fixes: 0a8fd1346254 ("USB: fix problems with duplicate endpoint addresses") CC: Oliver Neukum <oneukum@suse.com> CC: stable@vger.kernel.org Link: https://lore.kernel.org/r/205a5edc-7fef-4159-b64a-80374b6b101a@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	xhci: always resume roothubs if xHC was reset during resume	Mathias Nyman
	Usb device connect may not be detected after runtime resume if xHC is reset during resume. In runtime resume cases xhci_resume() will only resume roothubs if there are pending port events. If the xHC host is reset during runtime resume due to a Save/Restore Error (SRE) then these pending port events won't be detected as PORTSC change bits are not immediately set by host after reset. Unconditionally resume roothubs if xHC is reset during resume to ensure device connections are detected. Also return early with error code if starting xHC fails after reset. Issue was debugged and a similar solution suggested by Remi Pommarel. Using this instead as it simplifies future refactoring. Reported-by: Remi Pommarel <repk@triplefau.lt> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218987 Suggested-by: Remi Pommarel <repk@triplefau.lt> Tested-by: Remi Pommarel <repk@triplefau.lt> Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240627145523.1453155-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	serial: imx: only set receiver level if it is zero	Stefan Eichenberger
	With commit a81dbd0463ec ("serial: imx: set receiver level before starting uart") we set the receiver level to its default value. This caused a regression when using SDMA, where the receiver level is 9 instead of 8 (default). This change will first check if the receiver level is zero and only then set it to the default. This still avoids the interrupt storm when the receiver level is zero. Fixes: a81dbd0463ec ("serial: imx: set receiver level before starting uart") Cc: stable <stable@kernel.org> Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Link: https://lore.kernel.org/r/20240703112543.148304-1-eichest@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	wifi: wilc1000: fix ies_len type in connect path	Jozef Hopko
	Commit 205c50306acf ("wifi: wilc1000: fix RCU usage in connect path") made sure that the IEs data was manipulated under the relevant RCU section. Unfortunately, while doing so, the commit brought a faulty implicit cast from int to u8 on the ies_len variable, making the parsing fail to be performed correctly if the IEs block is larger than 255 bytes. This failure can be observed with Access Points appending a lot of IEs TLVs in their beacon frames (reproduced with a Pixel phone acting as an Access Point, which brough 273 bytes of IE data in my testing environment). Fix IEs parsing by removing this undesired implicit cast. Fixes: 205c50306acf ("wifi: wilc1000: fix RCU usage in connect path") Signed-off-by: Jozef Hopko <jozef.hopko@altana.com> Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Acked-by: Ajay Singh <ajay.kathat@microchip.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://patch.msgid.link/20240701-wilc_fix_ies_data-v1-1-7486cbacf98a@bootlin.com
2024-07-03	Merge tag 'usb-serial-6.10-rc6' of ↵	Greg Kroah-Hartman
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB-serial device ids for 6.10-rc6 Here are some new modem device ids. All have been in linux-next with no reported issues. * tag 'usb-serial-6.10-rc6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: option: add Telit generic core-dump composition USB: serial: option: add Fibocom FM350-GL USB: serial: option: add Telit FN912 rmnet compositions
2024-07-03	gpio: mmio: do not calculate bgpio_bits via "ngpios"	Shiji Yang
	bgpio_bits must be aligned with the data bus width. For example, on a 32 bit big endian system and we only have 16 GPIOs. If we only assume bgpio_bits=16 we can never control the GPIO because the base address is the lowest address. low address high address ------------------------------------------------- \| byte3 \| byte2 \| byte1 \| byte0 \| ------------------------------------------------- \| NaN \| NaN \| gpio8-15 \| gpio0-7 \| ------------------------------------------------- Fixes: 55b2395e4e92 ("gpio: mmio: handle "ngpios" properly in bgpio_init()") Fixes: https://github.com/openwrt/openwrt/issues/15739 Reported-by: Mark Mentovai <mark@mentovai.com> Signed-off-by: Shiji Yang <yangshiji66@outlook.com> Suggested-By: Mark Mentovai <mark@mentovai.com> Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com> Tested-by: Lóránd Horváth <lorand.horvath82@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/TYCP286MB089577B47D70F0AB25ABA6F5BCD52@TYCP286MB0895.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-03	s390/vfio_ccw: Fix target addresses of TIC CCWs	Eric Farman
	The processing of a Transfer-In-Channel (TIC) CCW requires locating the target of the CCW in the channel program, and updating the address to reflect what will actually be sent to hardware. An error exists where the 64-bit virtual address is truncated to 32-bits (variable "cda") when performing this math. Since s390 addresses of that size are 31-bits, this leaves that additional bit enabled such that the resulting I/O triggers a channel program check. This shows up occasionally when booting a KVM guest from a passthrough DASD device: ..snip... Interrupt Response Block Data: : 0x0000000000003990 Function Ctrl : [Start] Activity Ctrl : Status Ctrl : [Alert] [Primary] [Secondary] [Status-Pending] Device Status : Channel Status : [Program-Check] cpa=: 0x00000000008d0018 prev_ccw=: 0x0000000000000000 this_ccw=: 0x0000000000000000 ...snip... dasd-ipl: Failed to run IPL1 channel program The channel program address of "0x008d0018" in the IRB doesn't look wrong, but tracing the CCWs shows the offending bit enabled: ccw=0x0000012e808d0000 cda=00a0b030 ccw=0x0000012e808d0008 cda=00a0b038 ccw=0x0000012e808d0010 cda=808d0008 ccw=0x0000012e808d0018 cda=00a0b040 Fix the calculation of the TIC CCW's data address such that it points to a valid 31-bit address regardless of the input address. Fixes: bd36cfbbb9e1 ("s390/vfio_ccw_cp: use new address translation helpers") Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20240628163738.3643513-1-farman@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-07-03	cachefiles: add missing lock protection when polling	Jingbo Xu
	Add missing lock protection in poll routine when iterating xarray, otherwise: Even with RCU read lock held, only the slot of the radix tree is ensured to be pinned there, while the data structure (e.g. struct cachefiles_req) stored in the slot has no such guarantee. The poll routine will iterate the radix tree and dereference cachefiles_req accordingly. Thus RCU read lock is not adequate in this case and spinlock is needed here. Fixes: b817e22b2e91 ("cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-10-libaokun@huaweicloud.com Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: cyclic allocation of msg_id to avoid reuse	Baokun Li
	Reusing the msg_id after a maliciously completed reopen request may cause a read request to remain unprocessed and result in a hung, as shown below: t1 \| t2 \| t3 ------------------------------------------------- cachefiles_ondemand_select_req cachefiles_ondemand_object_is_close(A) cachefiles_ondemand_set_object_reopening(A) queue_work(fscache_object_wq, &info->work) ondemand_object_worker cachefiles_ondemand_init_object(A) cachefiles_ondemand_send_req(OPEN) // get msg_id 6 wait_for_completion(&req_A->done) cachefiles_ondemand_daemon_read // read msg_id 6 req_A cachefiles_ondemand_get_fd copy_to_user // Malicious completion msg_id 6 copen 6,-1 cachefiles_ondemand_copen complete(&req_A->done) // will not set the object to close // because ondemand_id && fd is valid. // ondemand_object_worker() is done // but the object is still reopening. // new open req_B cachefiles_ondemand_init_object(B) cachefiles_ondemand_send_req(OPEN) // reuse msg_id 6 process_open_req copen 6,A.size // The expected failed copen was executed successfully Expect copen to fail, and when it does, it closes fd, which sets the object to close, and then close triggers reopen again. However, due to msg_id reuse resulting in a successful copen, the anonymous fd is not closed until the daemon exits. Therefore read requests waiting for reopen to complete may trigger hung task. To avoid this issue, allocate the msg_id cyclically to avoid reusing the msg_id for a very short duration of time. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-9-libaokun@huaweicloud.com Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: wait for ondemand_object_worker to finish when dropping object	Hou Tao
	When queuing ondemand_object_worker() to re-open the object, cachefiles_object is not pinned. The cachefiles_object may be freed when the pending read request is completed intentionally and the related erofs is umounted. If ondemand_object_worker() runs after the object is freed, it will incur use-after-free problem as shown below. process A processs B process C process D cachefiles_ondemand_send_req() // send a read req X // wait for its completion // close ondemand fd cachefiles_ondemand_fd_release() // set object as CLOSE cachefiles_ondemand_daemon_read() // set object as REOPENING queue_work(fscache_wq, &info->ondemand_work) // close /dev/cachefiles cachefiles_daemon_release cachefiles_flush_reqs complete(&req->done) // read req X is completed // umount the erofs fs cachefiles_put_object() // object will be freed cachefiles_ondemand_deinit_obj_info() kmem_cache_free(object) // both info and object are freed ondemand_object_worker() When dropping an object, it is no longer necessary to reopen the object, so use cancel_work_sync() to cancel or wait for ondemand_object_worker() to finish. Fixes: 0a7e54c1959c ("cachefiles: resend an open request if the read request's object is closed") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-8-libaokun@huaweicloud.com Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: cancel all requests for the object that is being dropped	Baokun Li
	Because after an object is dropped, requests for that object are useless, cancel them to avoid causing other problems. This prepares for the later addition of cancel_work_sync(). After the reopen requests is generated, cancel it to avoid cancel_work_sync() blocking by waiting for daemon to complete the reopen requests. Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-7-libaokun@huaweicloud.com Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: stop sending new request when dropping object	Baokun Li
	Added CACHEFILES_ONDEMAND_OBJSTATE_DROPPING indicates that the cachefiles object is being dropped, and is set after the close request for the dropped object completes, and no new requests are allowed to be sent after this state. This prepares for the later addition of cancel_work_sync(). It prevents leftover reopen requests from being sent, to avoid processing unnecessary requests and to avoid cancel_work_sync() blocking by waiting for daemon to complete the reopen requests. Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-6-libaokun@huaweicloud.com Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop	Baokun Li
	In cachefiles_check_volume_xattr(), the error returned by vfs_getxattr() is not passed to ret, so it ends up returning -ESTALE, which leads to an endless loop as follows: cachefiles_acquire_volume retry: ret = cachefiles_check_volume_xattr ret = -ESTALE xlen = vfs_getxattr // return -EIO // The ret is not updated when xlen < 0, so -ESTALE is returned. return ret // Supposed to jump out of the loop at this judgement. if (ret != -ESTALE) goto error_dir; cachefiles_bury_object // EIO causes rename failure goto retry; Hence propagate the error returned by vfs_getxattr() to avoid the above issue. Do the same in cachefiles_check_auxdata(). Fixes: 32e150037dce ("fscache, cachefiles: Store the volume coherency data") Fixes: 72b957856b0c ("cachefiles: Implement metadata/coherency data storage in xattrs") Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-5-libaokun@huaweicloud.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie()	Baokun Li
	We got the following issue in our fault injection stress test: ================================================================== BUG: KASAN: slab-use-after-free in cachefiles_withdraw_cookie+0x4d9/0x600 Read of size 8 at addr ffff888118efc000 by task kworker/u78:0/109 CPU: 13 PID: 109 Comm: kworker/u78:0 Not tainted 6.8.0-dirty #566 Call Trace: <TASK> kasan_report+0x93/0xc0 cachefiles_withdraw_cookie+0x4d9/0x600 fscache_cookie_state_machine+0x5c8/0x1230 fscache_cookie_worker+0x91/0x1c0 process_one_work+0x7fa/0x1800 [...] Allocated by task 117: kmalloc_trace+0x1b3/0x3c0 cachefiles_acquire_volume+0xf3/0x9c0 fscache_create_volume_work+0x97/0x150 process_one_work+0x7fa/0x1800 [...] Freed by task 120301: kfree+0xf1/0x2c0 cachefiles_withdraw_cache+0x3fa/0x920 cachefiles_put_unbind_pincount+0x1f6/0x250 cachefiles_daemon_release+0x13b/0x290 __fput+0x204/0xa00 task_work_run+0x139/0x230 do_exit+0x87a/0x29b0 [...] ================================================================== Following is the process that triggers the issue: p1 \| p2 ------------------------------------------------------------ fscache_begin_lookup fscache_begin_volume_access fscache_cache_is_live(fscache_cache) cachefiles_daemon_release cachefiles_put_unbind_pincount cachefiles_daemon_unbind cachefiles_withdraw_cache fscache_withdraw_cache fscache_set_cache_state(cache, FSCACHE_CACHE_IS_WITHDRAWN); cachefiles_withdraw_objects(cache) fscache_wait_for_objects(fscache) atomic_read(&fscache_cache->object_count) == 0 fscache_perform_lookup cachefiles_lookup_cookie cachefiles_alloc_object refcount_set(&object->ref, 1); object->volume = volume fscache_count_object(vcookie->cache); atomic_inc(&fscache_cache->object_count) cachefiles_withdraw_volumes cachefiles_withdraw_volume fscache_withdraw_volume __cachefiles_free_volume kfree(cachefiles_volume) fscache_cookie_state_machine cachefiles_withdraw_cookie cache = object->volume->cache; // cachefiles_volume UAF !!! After setting FSCACHE_CACHE_IS_WITHDRAWN, wait for all the cookie lookups to complete first, and then wait for fscache_cache->object_count == 0 to avoid the cookie exiting after the volume has been freed and triggering the above issue. Therefore call fscache_withdraw_volume() before calling cachefiles_withdraw_objects(). This way, after setting FSCACHE_CACHE_IS_WITHDRAWN, only the following two cases will occur: 1) fscache_begin_lookup fails in fscache_begin_volume_access(). 2) fscache_withdraw_volume() will ensure that fscache_count_object() has been executed before calling fscache_wait_for_objects(). Fixes: fe2140e2f57f ("cachefiles: Implement volume support") Suggested-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-4-libaokun@huaweicloud.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: fix slab-use-after-free in fscache_withdraw_volume()	Baokun Li
	We got the following issue in our fault injection stress test: ================================================================== BUG: KASAN: slab-use-after-free in fscache_withdraw_volume+0x2e1/0x370 Read of size 4 at addr ffff88810680be08 by task ondemand-04-dae/5798 CPU: 0 PID: 5798 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #565 Call Trace: kasan_check_range+0xf6/0x1b0 fscache_withdraw_volume+0x2e1/0x370 cachefiles_withdraw_volume+0x31/0x50 cachefiles_withdraw_cache+0x3ad/0x900 cachefiles_put_unbind_pincount+0x1f6/0x250 cachefiles_daemon_release+0x13b/0x290 __fput+0x204/0xa00 task_work_run+0x139/0x230 Allocated by task 5820: __kmalloc+0x1df/0x4b0 fscache_alloc_volume+0x70/0x600 __fscache_acquire_volume+0x1c/0x610 erofs_fscache_register_volume+0x96/0x1a0 erofs_fscache_register_fs+0x49a/0x690 erofs_fc_fill_super+0x6c0/0xcc0 vfs_get_super+0xa9/0x140 vfs_get_tree+0x8e/0x300 do_new_mount+0x28c/0x580 [...] Freed by task 5820: kfree+0xf1/0x2c0 fscache_put_volume.part.0+0x5cb/0x9e0 erofs_fscache_unregister_fs+0x157/0x1b0 erofs_kill_sb+0xd9/0x1c0 deactivate_locked_super+0xa3/0x100 vfs_get_super+0x105/0x140 vfs_get_tree+0x8e/0x300 do_new_mount+0x28c/0x580 [...] ================================================================== Following is the process that triggers the issue: mount failed \| daemon exit ------------------------------------------------------------ deactivate_locked_super cachefiles_daemon_release erofs_kill_sb erofs_fscache_unregister_fs fscache_relinquish_volume __fscache_relinquish_volume fscache_put_volume(fscache_volume, fscache_volume_put_relinquish) zero = __refcount_dec_and_test(&fscache_volume->ref, &ref); cachefiles_put_unbind_pincount cachefiles_daemon_unbind cachefiles_withdraw_cache cachefiles_withdraw_volumes list_del_init(&volume->cache_link) fscache_free_volume(fscache_volume) cache->ops->free_volume cachefiles_free_volume list_del_init(&cachefiles_volume->cache_link); kfree(fscache_volume) cachefiles_withdraw_volume fscache_withdraw_volume fscache_volume->n_accesses // fscache_volume UAF !!! The fscache_volume in cache->volumes must not have been freed yet, but its reference count may be 0. So use the new fscache_try_get_volume() helper function try to get its reference count. If the reference count of fscache_volume is 0, fscache_put_volume() is freeing it, so wait for it to be removed from cache->volumes. If its reference count is not 0, call cachefiles_withdraw_volume() with reference count protection to avoid the above issue. Fixes: fe2140e2f57f ("cachefiles: Implement volume support") Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-3-libaokun@huaweicloud.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume()	Baokun Li
	Export fscache_put_volume() and add fscache_try_get_volume() helper function to allow cachefiles to get/put fscache_volume via linux/fscache-cache.h. Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-2-libaokun@huaweicloud.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	net: stmmac: enable HW-accelerated VLAN stripping for gmac4 only	Furong Xu
	Commit 750011e239a5 ("net: stmmac: Add support for HW-accelerated VLAN stripping") enables MAC level VLAN tag stripping for all MAC cores, but leaves set_hw_vlan_mode() and rx_hw_vlan() un-implemented for both gmac and xgmac. On gmac and xgmac, ethtool reports rx-vlan-offload is on, both MAC and driver do nothing about VLAN packets actually, although VLAN works well. Driver level stripping should be used on gmac and xgmac for now. Fixes: 750011e239a5 ("net: stmmac: Add support for HW-accelerated VLAN stripping") Signed-off-by: Furong Xu <0x1207@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-03	drm/fbdev-generic: Fix framebuffer on big endian devices	Thomas Huth
	Starting with kernel 6.7, the framebuffer text console is not working anymore with the virtio-gpu device on s390x hosts. Such big endian fb devices are usinga different pixel ordering than little endian devices, e.g. DRM_FORMAT_BGRX8888 instead of DRM_FORMAT_XRGB8888. This used to work fine as long as drm_client_buffer_addfb() was still calling drm_mode_addfb() which called drm_driver_legacy_fb_format() internally to get the right format. But drm_client_buffer_addfb() has recently been reworked to call drm_mode_addfb2() instead with the format value that has been passed to it as a parameter (see commit 6ae2ff23aa43 ("drm/client: Convert drm_client_buffer_addfb() to drm_mode_addfb2()"). That format parameter is determined in drm_fbdev_generic_helper_fb_probe() via the drm_mode_legacy_fb_format() function - which only generates formats suitable for little endian devices. So to fix this issue switch to drm_driver_legacy_fb_format() here instead to take the device endianness into consideration. Fixes: 6ae2ff23aa43 ("drm/client: Convert drm_client_buffer_addfb() to drm_mode_addfb2()") Closes: https://issues.redhat.com/browse/RHEL-45158 Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240627173530.460615-1-thuth@redhat.com
2024-07-03	drm/panthor: Fix sync-only jobs	Boris Brezillon
	A sync-only job is meant to provide a synchronization point on a queue, so we can't return a NULL fence there, we have to add a signal operation to the command stream which executes after all other previously submitted jobs are done. v2: - Fixed a UAF bug - Added R-bs Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703071640.231278-3-boris.brezillon@collabora.com
2024-07-03	drm/panthor: Don't check the array stride on empty uobj arrays	Boris Brezillon
	The user is likely to leave all the drm_panthor_obj_array fields to zero when the array is empty, which will cause an EINVAL failure. v2: - Added R-bs Fixes: 4bdca1150792 ("drm/panthor: Add the driver frontend block") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703071640.231278-2-boris.brezillon@collabora.com
2024-07-02	cifs: Fix read-performance regression by dropping readahead expansion	David Howells
	cifs_expand_read() is causing a performance regression of around 30% by causing extra pagecache to be allocated for an inode in the readahead path before we begin actually dispatching RPC requests, thereby delaying the actual I/O. The expansion is sized according to the rsize parameter, which seems to be 4MiB on my test system; this is a big step up from the first requests made by the fio test program. Simple repro (look at read bandwidth number): fio --name=writetest --filename=/xfstest.test/foo --time_based --runtime=60 --size=16M --numjobs=1 --rw=read Fix this by removing cifs_expand_readahead(). Readahead expansion is mostly useful for when we're using the local cache if the local cache has a block size greater than PAGE_SIZE, so we can dispense with it when not caching. Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks") Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-02	net: ntb_netdev: Move ntb_netdev_rx_handler() to call netif_rx() from ↵	Dave Jiang
	__netif_rx() The following is emitted when using idxd (DSA) dmanegine as the data mover for ntb_transport that ntb_netdev uses. [74412.546922] BUG: using smp_processor_id() in preemptible [00000000] code: irq/52-idxd-por/14526 [74412.556784] caller is netif_rx_internal+0x42/0x130 [74412.562282] CPU: 6 PID: 14526 Comm: irq/52-idxd-por Not tainted 6.9.5 #5 [74412.569870] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.E9I.1752.P05.2402080856 02/08/2024 [74412.581699] Call Trace: [74412.584514] <TASK> [74412.586933] dump_stack_lvl+0x55/0x70 [74412.591129] check_preemption_disabled+0xc8/0xf0 [74412.596374] netif_rx_internal+0x42/0x130 [74412.600957] __netif_rx+0x20/0xd0 [74412.604743] ntb_netdev_rx_handler+0x66/0x150 [ntb_netdev] [74412.610985] ntb_complete_rxc+0xed/0x140 [ntb_transport] [74412.617010] ntb_rx_copy_callback+0x53/0x80 [ntb_transport] [74412.623332] idxd_dma_complete_txd+0xe3/0x160 [idxd] [74412.628963] idxd_wq_thread+0x1a6/0x2b0 [idxd] [74412.634046] irq_thread_fn+0x21/0x60 [74412.638134] ? irq_thread+0xa8/0x290 [74412.642218] irq_thread+0x1a0/0x290 [74412.646212] ? __pfx_irq_thread_fn+0x10/0x10 [74412.651071] ? __pfx_irq_thread_dtor+0x10/0x10 [74412.656117] ? __pfx_irq_thread+0x10/0x10 [74412.660686] kthread+0x100/0x130 [74412.664384] ? __pfx_kthread+0x10/0x10 [74412.668639] ret_from_fork+0x31/0x50 [74412.672716] ? __pfx_kthread+0x10/0x10 [74412.676978] ret_from_fork_asm+0x1a/0x30 [74412.681457] </TASK> The cause is due to the idxd driver interrupt completion handler uses threaded interrupt and the threaded handler is not hard or soft interrupt context. However __netif_rx() can only be called from interrupt context. Change the call to netif_rx() in order to allow completion via normal context for dmaengine drivers that utilize threaded irq handling. While the following commit changed from netif_rx() to __netif_rx(), baebdf48c360 ("net: dev: Makes sure netif_rx() can be invoked in any context."), the change should've been a noop instead. However, the code precedes this fix should've been using netif_rx_ni() or netif_rx_any_context(). Fixes: 548c237c0a99 ("net: Add support for NTB virtual ethernet device") Reported-by: Jerry Dai <jerry.dai@intel.com> Tested-by: Jerry Dai <jerry.dai@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20240701181538.3799546-1-dave.jiang@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-02	net: phy: aquantia: add missing include guards	Bartosz Golaszewski
	The header is missing the include guards so add them. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Fixes: fb470f70fea7 ("net: phy: aquantia: add hwmon support") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://patch.msgid.link/20240701080322.9569-1-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-02	drm/amdgpu/atomfirmware: silence UBSAN warning	Alex Deucher
	This is a variable sized array. Link: https://lists.freedesktop.org/archives/amd-gfx/2024-June/110420.html Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-07-02	Merge tag 'linux_kselftest-fixes-6.10-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "One single patch to fix the non-contiguous CBM resctrl: - AMD supports non-contiguous CBM but does not report it via CPUID. This test should not use CPUID on AMD to detect non-contiguous CBM support. Fix the problem so the test uses CPUID to discover non-contiguous CBM support only on Intel" * tag 'linux_kselftest-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/resctrl: Fix non-contiguous CBM for AMD
2024-07-02	Merge tag 'vfs-6.10-rc7.fixes.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "VFS: - Improve handling of deep ancestor chains in is_subdir() - Release locks cleanly when fctnl_setlk() races with close(). When setting a file lock fails the VFS tries to cleanup the already created lock. The helper used for this calls back into the LSM layer which may cause it to fail, leaving the stale lock accessible via /proc/locks. AFS: - Fix a comma/semicolon typo" * tag 'vfs-6.10-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Convert comma to semicolon fs: better handle deep ancestor chains in is_subdir() filelock: Remove locks reliably when fcntl/close race is detected
2024-07-02	afs: Convert comma to semicolon	Chen Ni
	Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20240702024055.1411407-1-nichen@iscas.ac.cn/ Link: https://lore.kernel.org/r/20240702024055.1411407-1-nichen@iscas.ac.cn Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-02	fs: better handle deep ancestor chains in is_subdir()	Christian Brauner
	Jan reported that 'cd ..' may take a long time in deep directory hierarchies under a bind-mount. If concurrent renames happen it is possible to livelock in is_subdir() because it will keep retrying. Change is_subdir() from simply retrying over and over to retry once and then acquire the rename lock to handle deep ancestor chains better. The list of alternatives to this approach were less then pleasant. Change the scope of rcu lock to cover the whole walk while at it. A big thanks to Jan and Linus. Both Jan and Linus had proposed effectively the same thing just that one version ended up being slightly more elegant. Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>