git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-03-05	fs: use fput_close_sync() in close()	Mateusz Guzik
	This bumps open+close rate by 1% on Sapphire Rapids by eliding one atomic. It would be higher if it was not for several other slowdowns of the same nature. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250305123644.554845-3-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	file: add fput and file_ref_put routines optimized for use when closing a fd	Mateusz Guzik
	Vast majority of the time closing a file descriptor also operates on the last reference, where a regular fput usage will result in 2 atomics. This can be changed to only suffer 1. See commentary above file_ref_put_close() for more information. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250305123644.554845-2-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	fs: predict no error in close()	Mateusz Guzik
	Vast majority of the time the system call returns 0. Letting the compiler know shortens the routine (119 -> 116) and the fast path. Disasm starting at the call to __fput_sync(): before: <+55>: call 0xffffffff816b0da0 <__fput_sync> <+60>: lea 0x201(%rbx),%eax <+66>: cmp $0x1,%eax <+69>: jbe 0xffffffff816ab707 <__x64_sys_close+103> <+71>: mov %ebx,%edx <+73>: movslq %ebx,%rax <+76>: and $0xfffffffd,%edx <+79>: cmp $0xfffffdfc,%edx <+85>: mov $0xfffffffffffffffc,%rdx <+92>: cmove %rdx,%rax <+96>: pop %rbx <+97>: pop %rbp <+98>: jmp 0xffffffff82242fa0 <__x86_return_thunk> <+103>: mov $0xfffffffffffffffc,%rax <+110>: jmp 0xffffffff816ab700 <__x64_sys_close+96> <+112>: mov $0xfffffffffffffff7,%rax <+119>: jmp 0xffffffff816ab700 <__x64_sys_close+96> after: <+56>: call 0xffffffff816b0da0 <__fput_sync> <+61>: xor %eax,%eax <+63>: test %ebp,%ebp <+65>: jne 0xffffffff816ab6ea <__x64_sys_close+74> <+67>: pop %rbx <+68>: pop %rbp <+69>: jmp 0xffffffff82242fa0 <__x86_return_thunk> # the jmp out <+74>: lea 0x201(%rbp),%edx <+80>: mov $0xfffffffffffffffc,%rax <+87>: cmp $0x1,%edx <+90>: jbe 0xffffffff816ab6e3 <__x64_sys_close+67> <+92>: mov %ebp,%edx <+94>: and $0xfffffffd,%edx <+97>: cmp $0xfffffdfc,%edx <+103>: cmovne %rbp,%rax <+107>: jmp 0xffffffff816ab6e3 <__x64_sys_close+67> <+109>: mov $0xfffffffffffffff7,%rax <+116>: jmp 0xffffffff816ab6e3 <__x64_sys_close+67> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250301104356.246031-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	ASoC: ops: Consistently treat platform_max as control value	Charles Keepax
	This reverts commit 9bdd10d57a88 ("ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min"), and makes some additional related updates. There are two ways the platform_max could be interpreted; the maximum register value, or the maximum value the control can be set to. The patch moved from treating the value as a control value to a register one. When the patch was applied it was technically correct as snd_soc_limit_volume() also used the register interpretation. However, even then most of the other usages treated platform_max as a control value, and snd_soc_limit_volume() has since been updated to also do so in commit fb9ad24485087 ("ASoC: ops: add correct range check for limiting volume"). That patch however, missed updating snd_soc_put_volsw() back to the control interpretation, and fixing snd_soc_info_volsw_range(). The control interpretation makes more sense as limiting is typically done from the machine driver, so it is appropriate to use the customer facing representation rather than the internal codec representation. Update all the code to consistently use this interpretation of platform_max. Finally, also add some comments to the soc_mixer_control struct to hopefully avoid further patches switching between the two approaches. Fixes: fb9ad24485087 ("ASoC: ops: add correct range check for limiting volume") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://patch.msgid.link/20250228151456.3703342-1-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-03-05	include/linux/pipe_fs_i: Add htmldoc annotation for "head_tail" member	K Prateek Nayak
	Add htmldoc annotation for the newly introduced "head_tail" member describing it to be a union of the pipe_inode_info's @head and @tail members. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/lkml/20250305204609.5e64768e@canb.auug.org.au/ Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex") Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-05	fs/pipe: Fix pipe_occupancy() with 16-bit indexes	Linus Torvalds
	The pipe_occupancy() logic implicitly relied on the natural unsigned modulo arithmetic in C, but that doesn't work for the new 'pipe_index_t' case, since any arithmetic will be done in 'int' (and here we had also made it 'unsigned int' due to the function call boundary). So make the modulo arithmetic explicit by casting the result to the proper type. Cc: Oleg Nesterov <oleg@redhat.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Swapnil Sapkal <swapnil.sapkal@amd.com> Cc: Alexey Gladkov <legion@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/all/CAHk-=wjyHsGLx=rxg6PKYBNkPYAejgo7=CbyL3=HGLZLsAaJFQ@mail.gmail.com/ Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-05	drm/amdkfd: Fix NULL Pointer Dereference in KFD queue	Andrew Martin
	Through KFD IOCTL Fuzzing we encountered a NULL pointer derefrence when calling kfd_queue_acquire_buffers. Fixes: 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size") Signed-off-by: Andrew Martin <Andrew.Martin@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Andrew Martin <Andrew.Martin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 049e5bf3c8406f87c3d8e1958e0a16804fa1d530) Cc: stable@vger.kernel.org
2025-03-05	ice: register devlink prior to creating health reporters	Przemek Kitszel
	ice_health_init() was introduced in the commit 2a82874a3b7b ("ice: add Tx hang devlink health reporter"). The call to it should have been put after ice_devlink_register(). It went unnoticed until next reporter by Konrad, which receives events from FW. FW is reporting all events, also from prior driver load, and thus it is not unlikely to have something at the very beginning. And that results in a splat: [ 24.455950] ? devlink_recover_notify.constprop.0+0x198/0x1b0 [ 24.455973] devlink_health_report+0x5d/0x2a0 [ 24.455976] ? __pfx_ice_health_status_lookup_compare+0x10/0x10 [ice] [ 24.456044] ice_process_health_status_event+0x1b7/0x200 [ice] Do the analogous thing for deinit patch. Fixes: 85d6164ec56d ("ice: add fw and port health reporters") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Konrad Knitter <konrad.knitter@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-05	drm/amd/display: Fix null check for pipe_ctx->plane_state in ↵	Ma Ke
	resource_build_scaling_params Null pointer dereference issue could occur when pipe_ctx->plane_state is null. The fix adds a check to ensure 'pipe_ctx->plane_state' is not null before accessing. This prevents a null pointer dereference. Found by code review. Fixes: 3be5262e353b ("drm/amd/display: Rename more dc_surface stuff to plane_state") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 63e6a77ccf239337baa9b1e7787cde9fa0462092) Cc: stable@vger.kernel.org
2025-03-05	ice: Fix switchdev slow-path in LAG	Marcin Szycik
	Ever since removing switchdev control VSI and using PF for port representor Tx/Rx, switchdev slow-path has been working improperly after failover in SR-IOV LAG. LAG assumes that the first uplink to be added to the aggregate will own VFs and have switchdev configured. After failing-over to the other uplink, representors are still configured to Tx through the uplink they are set up on, which fails because that uplink is now down. On failover, update all PRs on primary uplink to use the currently active uplink for Tx. Call netif_keep_dst(), as the secondary uplink might not be in switchdev mode. Also make sure to call ice_eswitch_set_target_vsi() if uplink is in LAG. On the Rx path, representors are already working properly, because default Tx from VFs is set to PF owning the eswitch. After failover the same PF is receiving traffic from VFs, even though link is down. Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-05	ice: fix memory leak in aRFS after reset	Grzegorz Nitka
	Fix aRFS (accelerated Receive Flow Steering) structures memory leak by adding a checker to verify if aRFS memory is already allocated while configuring VSI. aRFS objects are allocated in two cases: - as part of VSI initialization (at probe), and - as part of reset handling However, VSI reconfiguration executed during reset involves memory allocation one more time, without prior releasing already allocated resources. This led to the memory leak with the following signature: [root@os-delivery ~]# cat /sys/kernel/debug/kmemleak unreferenced object 0xff3c1ca7252e6000 (size 8192): comm "kworker/0:0", pid 8, jiffies 4296833052 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 0): [<ffffffff991ec485>] __kmalloc_cache_noprof+0x275/0x340 [<ffffffffc0a6e06a>] ice_init_arfs+0x3a/0xe0 [ice] [<ffffffffc09f1027>] ice_vsi_cfg_def+0x607/0x850 [ice] [<ffffffffc09f244b>] ice_vsi_setup+0x5b/0x130 [ice] [<ffffffffc09c2131>] ice_init+0x1c1/0x460 [ice] [<ffffffffc09c64af>] ice_probe+0x2af/0x520 [ice] [<ffffffff994fbcd3>] local_pci_probe+0x43/0xa0 [<ffffffff98f07103>] work_for_cpu_fn+0x13/0x20 [<ffffffff98f0b6d9>] process_one_work+0x179/0x390 [<ffffffff98f0c1e9>] worker_thread+0x239/0x340 [<ffffffff98f14abc>] kthread+0xcc/0x100 [<ffffffff98e45a6d>] ret_from_fork+0x2d/0x50 [<ffffffff98e083ba>] ret_from_fork_asm+0x1a/0x30 ... Fixes: 28bf26724fdb ("ice: Implement aRFS") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-05	regulator: core: Fix deadlock in create_regulator()	Ludvig Pärsson
	Currently, we are unnecessarily holding a regulator_ww_class_mutex lock when creating debugfs entries for a newly created regulator. This was brought up as a concern in the discussion in commit cba6cfdc7c3f ("regulator: core: Avoid lockdep reports when resolving supplies"). This causes the following lockdep splat after executing `ls /sys/kernel/debug` on my platform: ====================================================== WARNING: possible circular locking dependency detected 5.15.167-axis9-devel #1 Tainted: G O ------------------------------------------------------ ls/2146 is trying to acquire lock: ffffff803a562918 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x88 but task is already holding lock: ffffff80014497f8 (&sb->s_type->i_mutex_key#3){++++}-{3:3}, at: iterate_dir+0x50/0x1f4 which lock already depends on the new lock. [...] Chain exists of: &mm->mmap_lock --> regulator_ww_class_mutex --> &sb->s_type->i_mutex_key#3 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#3); lock(regulator_ww_class_mutex); lock(&sb->s_type->i_mutex_key#3); lock(&mm->mmap_lock); * DEADLOCK * This lock dependency still exists on the latest kernel and using a newer non-tainted kernel would still cause this problem. Fix by moving sysfs symlinking and creation of debugfs entries to after the release of the regulator lock. Fixes: cba6cfdc7c3f ("regulator: core: Avoid lockdep reports when resolving supplies") Fixes: eaa7995c529b ("regulator: core: avoid regulator_resolve_supply() race condition") Signed-off-by: Ludvig Pärsson <ludvig.parsson@axis.com> Link: https://patch.msgid.link/20250305-regulator_lockdep_fix-v1-1-ab938b12e790@axis.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-03-05	sched/fair: Fix potential memory corruption in child_cfs_rq_on_list	Zecheng Li
	child_cfs_rq_on_list attempts to convert a 'prev' pointer to a cfs_rq. This 'prev' pointer can originate from struct rq's leaf_cfs_rq_list, making the conversion invalid and potentially leading to memory corruption. Depending on the relative positions of leaf_cfs_rq_list and the task group (tg) pointer within the struct, this can cause a memory fault or access garbage data. The issue arises in list_add_leaf_cfs_rq, where both cfs_rq->leaf_cfs_rq_list and rq->leaf_cfs_rq_list are added to the same leaf list. Also, rq->tmp_alone_branch can be set to rq->leaf_cfs_rq_list. This adds a check `if (prev == &rq->leaf_cfs_rq_list)` after the main conditional in child_cfs_rq_on_list. This ensures that the container_of operation will convert a correct cfs_rq struct. This check is sufficient because only cfs_rqs on the same CPU are added to the list, so verifying the 'prev' pointer against the current rq's list head is enough. Fixes a potential memory corruption issue that due to current struct layout might not be manifesting as a crash but could lead to unpredictable behavior when the layout changes. Fixes: fdaba61ef8a2 ("sched/fair: Ensure that the CFS parent is added after unthrottling") Signed-off-by: Zecheng Li <zecheng@google.com> Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20250304214031.2882646-1-zecheng@google.com
2025-03-05	ice: do not configure destination override for switchdev	Larysa Zaremba
	After switchdev is enabled and disabled later, LLDP packets sending stops, despite working perfectly fine before and during switchdev state. To reproduce (creating/destroying VF is what triggers the reconfiguration): devlink dev eswitch set pci/<address> mode switchdev echo '2' > /sys/class/net/<ifname>/device/sriov_numvfs echo '0' > /sys/class/net/<ifname>/device/sriov_numvfs This happens because LLDP relies on the destination override functionality. It needs to 1) set a flag in the descriptor, 2) set the VSI permission to make it valid. The permissions are set when the PF VSI is first configured, but switchdev then enables it for the uplink VSI (which is always the PF) once more when configured and disables when deconfigured, which leads to software-generated LLDP packets being blocked. Do not modify the destination override permissions when configuring switchdev, as the enabled state is the default configuration that is never modified. Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-05	pmdomain: amlogic: fix T7 ISP secpower	Xianwei Zhao
	ISP and MIPI_ISP, these two have a parent-child relationship, ISP depends on MIPI_ISP. Fixes: ca75e4b214c6 ("pmdomain: amlogic: Add support for T7 power domains controller") Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250303-fix-t7-pwrc-v1-1-b563612bcd86@amlogic.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-03-05	block: fix conversion of GPT partition name to 7-bit	Olivier Gayot
	The utf16_le_to_7bit function claims to, naively, convert a UTF-16 string to a 7-bit ASCII string. By naively, we mean that it: * drops the first byte of every character in the original UTF-16 string * checks if all characters are printable, and otherwise replaces them by exclamation mark "!". This means that theoretically, all characters outside the 7-bit ASCII range should be replaced by another character. Examples: * lower-case alpha (ɒ) 0x0252 becomes 0x52 (R) * ligature OE (œ) 0x0153 becomes 0x53 (S) * hangul letter pieup (ㅂ) 0x3142 becomes 0x42 (B) * upper-case gamma (Ɣ) 0x0194 becomes 0x94 (not printable) so gets replaced by "!" The result of this conversion for the GPT partition name is passed to user-space as PARTNAME via udev, which is confusing and feels questionable. However, there is a flaw in the conversion function itself. By dropping one byte of each character and using isprint() to check if the remaining byte corresponds to a printable character, we do not actually guarantee that the resulting character is 7-bit ASCII. This happens because we pass 8-bit characters to isprint(), which in the kernel returns 1 for many values > 0x7f - as defined in ctype.c. This results in many values which should be replaced by "!" to be kept as-is, despite not being valid 7-bit ASCII. Examples: * e with acute accent (é) 0x00E9 becomes 0xE9 - kept as-is because isprint(0xE9) returns 1. * euro sign (€) 0x20AC becomes 0xAC - kept as-is because isprint(0xAC) returns 1. This way has broken pyudev utility[1], fixes it by using a mask of 7 bits instead of 8 bits before calling isprint. Link: https://github.com/pyudev/pyudev/issues/490#issuecomment-2685794648 [1] Link: https://lore.kernel.org/linux-block/4cac90c2-e414-4ebb-ae62-2a4589d9dc6e@canonical.com/ Cc: Mulhern <amulhern@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: stable@vger.kernel.org Signed-off-by: Olivier Gayot <olivier.gayot@canonical.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250305022154.3903128-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-05	ublk: set_params: properly check if parameters can be applied	Uday Shankar
	The parameters set by the set_params call are only applied to the block device in the start_dev call. So if a device has already been started, a subsequently issued set_params on that device will not have the desired effect, and should return an error. There is an existing check for this - set_params fails on devices in the LIVE state. But this check is not sufficient to cover the recovery case. In this case, the device will be in the QUIESCED or FAIL_IO states, so set_params will succeed. But this success is misleading, because the parameters will not be applied, since the device has already been started (by a previous ublk server). The bit UB_STATE_USED is set on completion of the start_dev; use it to detect and fail set_params commands which arrive too late to be applied (after start_dev). Signed-off-by: Uday Shankar <ushankar@purestorage.com> Fixes: 0aa73170eba5 ("ublk_drv: add SET_PARAMS/GET_PARAMS control command") Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250304-set_params-v1-1-17b5e0887606@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-05	ASoC: rt1320: set wake_capable = 0 explicitly	Bard Liao
	"generic_new_peripheral_assigned: invalid dev_num 1, wake supported 1" is reported by our internal CI test. Rt1320's wake feature is not used in Linux and that's why it is not in the wake_capable_list[] list in intel_auxdevice.c. However, BIOS may set it as wake-capable. Overwrite wake_capable to 0 in the codec driver to align with wake_capable_list[]. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Acked-by: Shuming Fan <shumingf@realtek.com> Link: https://patch.msgid.link/20250305134113.201326-1-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-03-05	net-timestamp: support TCP GSO case for a few missing flags	Jason Xing
	When I read through the TSO codes, I found out that we probably miss initializing the tx_flags of last seg when TSO is turned off, which means at the following points no more timestamp (for this last one) will be generated. There are three flags to be handled in this patch: 1. SKBTX_HW_TSTAMP 2. SKBTX_BPF 3. SKBTX_SCHED_TSTAMP Note that SKBTX_BPF[1] was added in 6.14.0-rc2 by commit 6b98ec7e882af ("bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback") and only belongs to net-next branch material for now. The common issue of the above three flags can be fixed by this single patch. This patch initializes the tx_flags to SKBTX_ANY_TSTAMP like what the UDP GSO does to make the newly segmented last skb inherit the tx_flags so that requested timestamp will be generated in each certain layer, or else that last one has zero value of tx_flags which leads to no timestamp at all. Fixes: 4ed2d765dfacc ("net-timestamp: TCP timestamping") Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-05	exfat: add a check for invalid data size	Yuezhang Mo
	Add a check for invalid data size to avoid corrupted filesystem from being further corrupted. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-05	exfat: short-circuit zero-byte writes in exfat_file_write_iter	Eric Sandeen
	When generic_write_checks() returns zero, it means that iov_iter_count() is zero, and there is no work to do. Simply return success like all other filesystems do, rather than proceeding down the write path, which today yields an -EFAULT in generic_perform_write() via the (fault_in_iov_iter_readable(i, bytes) == bytes) check when bytes == 0. Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Reported-by: Noah <kernel-org-10@maxgrass.eu> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-05	exfat: fix soft lockup in exfat_clear_bitmap	Namjae Jeon
	bitmap clear loop will take long time in __exfat_free_cluster() if data size of file/dir enty is invalid. If cluster bit in bitmap is already clear, stop clearing bitmap go to out of loop. Fixes: 31023864e67a ("exfat: add fat entry operations") Reported-by: Kun Hu <huk23@m.fudan.edu.cn>, Jiaji Qin <jjtan24@m.fudan.edu.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-05	exfat: fix just enough dentries but allocate a new cluster to dir	Yuezhang Mo
	This commit fixes the condition for allocating cluster to parent directory to avoid allocating new cluster to parent directory when there are just enough empty directory entries at the end of the parent directory. Fixes: af02c72d0b62 ("exfat: convert exfat_find_empty_entry() to use dentry cache") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-05	Merge patch series "pidfs: provide information after task has been reaped"	Christian Brauner
	Christian Brauner <brauner@kernel.org> says: Various tools need access to information about a process/task even after it has already been reaped. For example, systemd's journal logs and uses such information as the cgroup id and exit status to deal with processes that have been sent via SCM_PIDFD or SCM_PEERPIDFD. By the time the pidfd is received the process might have already been reaped. This series aims to provide information by extending the PIDFD_GET_INFO ioctl to retrieve the exit code and cgroup id. There might be other stuff that we would want in the future. Pidfd polling allows waiting on either task exit or for a task to have been reaped. The contract for PIDFD_INFO_EXIT is simply that EPOLLHUP must be observed before exit information can be retrieved, i.e., exit information is only provided once the task has been reaped. Note, that if a thread-group leader exits before other threads in the thread-group then exit information will only be available once the thread-group is empty. This aligns with wait() as well, where reaping of a thread-group leader that exited before the thread-group was empty is delayed until the thread-group is empty. With PIDFD_INFO_EXIT autoreaping might actually become usable because it means a parent can ignore SIGCHLD or set SA_NOCLDWAIT and simply use pidfd polling and PIDFD_INFO_EXIT to get get status information for its children. The kernel will autocleanup right away instead of delaying. This includes expansive selftests including for thread-group behior and multi-threaded exec by a non-thread-group leader thread. * patches from https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-0-c8c3d8361705@kernel.org: selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest selftests/pidfd: add third PIDFD_INFO_EXIT selftest selftests/pidfd: add second PIDFD_INFO_EXIT selftest selftests/pidfd: add first PIDFD_INFO_EXIT selftest selftests/pidfd: expand common pidfd header pidfs/selftests: ensure correct headers for ioctl handling selftests/pidfd: fix header inclusion pidfs: allow to retrieve exit information pidfs: record exit code and cgroupid at exit pidfs: use private inode slab cache pidfs: move setting flags into pidfs_alloc_file() pidfd: rely on automatic cleanup in __pidfd_prepare() pidfs: switch to copy_struct_to_user() Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-0-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest	Christian Brauner
	Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-16-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest	Christian Brauner
	Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-15-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest	Christian Brauner
	Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-14-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest	Christian Brauner
	Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-13-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	selftests/pidfd: add third PIDFD_INFO_EXIT selftest	Christian Brauner
	Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-12-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	selftests/pidfd: add second PIDFD_INFO_EXIT selftest	Christian Brauner
	Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-11-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	selftests/pidfd: add first PIDFD_INFO_EXIT selftest	Christian Brauner
	Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-10-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	selftests/pidfd: expand common pidfd header	Christian Brauner
	Move more infrastructure to the pidfd header. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-9-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	pidfs/selftests: ensure correct headers for ioctl handling	Christian Brauner
	Ensure that necessary ioctl infrastructure is available. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-8-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	selftests/pidfd: fix header inclusion	Christian Brauner
	Ensure that necessary defines are present. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-7-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	pidfs: allow to retrieve exit information	Christian Brauner
	Some tools like systemd's jounral need to retrieve the exit and cgroup information after a process has already been reaped. This can e.g., happen when retrieving a pidfd via SCM_PIDFD or SCM_PEERPIDFD. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-6-c8c3d8361705@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	pidfs: record exit code and cgroupid at exit	Christian Brauner
	Record the exit code and cgroupid in release_task() and stash in struct pidfs_exit_info so it can be retrieved even after the task has been reaped. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-5-c8c3d8361705@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	pidfs: use private inode slab cache	Christian Brauner
	Introduce a private inode slab cache for pidfs. In follow-up patches pidfs will gain the ability to provide exit information to userspace after the task has been reaped. This means storing exit information even after the task has already been released and struct pid's task linkage is gone. Store that information alongside the inode. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-4-c8c3d8361705@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	pidfs: move setting flags into pidfs_alloc_file()	Christian Brauner
	Instead od adding it into __pidfd_prepare() place it where the actual file allocation happens and update the outdated comment. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-3-c8c3d8361705@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	pidfd: rely on automatic cleanup in __pidfd_prepare()	Christian Brauner
	Rely on scope-based cleanup for the allocated file descriptor. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-2-c8c3d8361705@kernel.org Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	pidfs: switch to copy_struct_to_user()	Christian Brauner
	We have a helper that deals with all the required logic. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-1-c8c3d8361705@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	gpio: rcar: Use raw_spinlock to protect register access	Niklas Söderlund
	Use raw_spinlock in order to fix spurious messages about invalid context when spinlock debugging is enabled. The lock is only used to serialize register access. [ 4.239592] ============================= [ 4.239595] [ BUG: Invalid wait context ] [ 4.239599] 6.13.0-rc7-arm64-renesas-05496-gd088502a519f #35 Not tainted [ 4.239603] ----------------------------- [ 4.239606] kworker/u8:5/76 is trying to lock: [ 4.239609] ffff0000091898a0 (&p->lock){....}-{3:3}, at: gpio_rcar_config_interrupt_input_mode+0x34/0x164 [ 4.239641] other info that might help us debug this: [ 4.239643] context-{5:5} [ 4.239646] 5 locks held by kworker/u8:5/76: [ 4.239651] #0: ffff0000080fb148 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x190/0x62c [ 4.250180] OF: /soc/sound@ec500000/ports/port@0/endpoint: Read of boolean property 'frame-master' with a value. [ 4.254094] #1: ffff80008299bd80 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1b8/0x62c [ 4.254109] #2: ffff00000920c8f8 [ 4.258345] OF: /soc/sound@ec500000/ports/port@1/endpoint: Read of boolean property 'bitclock-master' with a value. [ 4.264803] (&dev->mutex){....}-{4:4}, at: __device_attach_async_helper+0x3c/0xdc [ 4.264820] #3: ffff00000a50ca40 (request_class#2){+.+.}-{4:4}, at: __setup_irq+0xa0/0x690 [ 4.264840] #4: [ 4.268872] OF: /soc/sound@ec500000/ports/port@1/endpoint: Read of boolean property 'frame-master' with a value. [ 4.273275] ffff00000a50c8c8 (lock_class){....}-{2:2}, at: __setup_irq+0xc4/0x690 [ 4.296130] renesas_sdhi_internal_dmac ee100000.mmc: mmc1 base at 0x00000000ee100000, max clock rate 200 MHz [ 4.304082] stack backtrace: [ 4.304086] CPU: 1 UID: 0 PID: 76 Comm: kworker/u8:5 Not tainted 6.13.0-rc7-arm64-renesas-05496-gd088502a519f #35 [ 4.304092] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT) [ 4.304097] Workqueue: async async_run_entry_fn [ 4.304106] Call trace: [ 4.304110] show_stack+0x14/0x20 (C) [ 4.304122] dump_stack_lvl+0x6c/0x90 [ 4.304131] dump_stack+0x14/0x1c [ 4.304138] __lock_acquire+0xdfc/0x1584 [ 4.426274] lock_acquire+0x1c4/0x33c [ 4.429942] _raw_spin_lock_irqsave+0x5c/0x80 [ 4.434307] gpio_rcar_config_interrupt_input_mode+0x34/0x164 [ 4.440061] gpio_rcar_irq_set_type+0xd4/0xd8 [ 4.444422] __irq_set_trigger+0x5c/0x178 [ 4.448435] __setup_irq+0x2e4/0x690 [ 4.452012] request_threaded_irq+0xc4/0x190 [ 4.456285] devm_request_threaded_irq+0x7c/0xf4 [ 4.459398] ata1: link resume succeeded after 1 retries [ 4.460902] mmc_gpiod_request_cd_irq+0x68/0xe0 [ 4.470660] mmc_start_host+0x50/0xac [ 4.474327] mmc_add_host+0x80/0xe4 [ 4.477817] tmio_mmc_host_probe+0x2b0/0x440 [ 4.482094] renesas_sdhi_probe+0x488/0x6f4 [ 4.486281] renesas_sdhi_internal_dmac_probe+0x60/0x78 [ 4.491509] platform_probe+0x64/0xd8 [ 4.495178] really_probe+0xb8/0x2a8 [ 4.498756] __driver_probe_device+0x74/0x118 [ 4.503116] driver_probe_device+0x3c/0x154 [ 4.507303] __device_attach_driver+0xd4/0x160 [ 4.511750] bus_for_each_drv+0x84/0xe0 [ 4.515588] __device_attach_async_helper+0xb0/0xdc [ 4.520470] async_run_entry_fn+0x30/0xd8 [ 4.524481] process_one_work+0x210/0x62c [ 4.528494] worker_thread+0x1ac/0x340 [ 4.532245] kthread+0x10c/0x110 [ 4.535476] ret_from_fork+0x10/0x20 Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250121135833.3769310-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-05	gpio: aggregator: protect driver attr handlers against module unload	Koichiro Den
	Both new_device_store and delete_device_store touch module global resources (e.g. gpio_aggregator_lock). To prevent race conditions with module unload, a reference needs to be held. Add try_module_get() in these handlers. For new_device_store, this eliminates what appears to be the most dangerous scenario: if an id is allocated from gpio_aggregator_idr but platform_device_register has not yet been called or completed, a concurrent module unload could fail to unregister/delete the device, leaving behind a dangling platform device/GPIO forwarder. This can result in various issues. The following simple reproducer demonstrates these problems: #!/bin/bash while :; do # note: whether 'gpiochip0 0' exists or not does not matter. echo 'gpiochip0 0' > /sys/bus/platform/drivers/gpio-aggregator/new_device done & while :; do modprobe gpio-aggregator modprobe -r gpio-aggregator done & wait Starting with the following warning, several kinds of warnings will appear and the system may become unstable: ------------[ cut here ]------------ list_del corruption, ffff888103e2e980->next is LIST_POISON1 (dead000000000100) WARNING: CPU: 1 PID: 1327 at lib/list_debug.c:56 __list_del_entry_valid_or_report+0xa3/0x120 [...] RIP: 0010:__list_del_entry_valid_or_report+0xa3/0x120 [...] Call Trace: <TASK> ? __list_del_entry_valid_or_report+0xa3/0x120 ? __warn.cold+0x93/0xf2 ? __list_del_entry_valid_or_report+0xa3/0x120 ? report_bug+0xe6/0x170 ? __irq_work_queue_local+0x39/0xe0 ? handle_bug+0x58/0x90 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? __list_del_entry_valid_or_report+0xa3/0x120 gpiod_remove_lookup_table+0x22/0x60 new_device_store+0x315/0x350 [gpio_aggregator] kernfs_fop_write_iter+0x137/0x1f0 vfs_write+0x262/0x430 ksys_write+0x60/0xd0 do_syscall_64+0x6c/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e [...] </TASK> ---[ end trace 0000000000000000 ]--- Fixes: 828546e24280 ("gpio: Add GPIO Aggregator") Cc: stable@vger.kernel.org Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Link: https://lore.kernel.org/r/20250224143134.3024598-2-koichiro.den@canonical.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-05	fscrypt: Change fscrypt_encrypt_pagecache_blocks() to take a folio	Matthew Wilcox (Oracle)
	ext4 and ceph already have a folio to pass; f2fs needs to be properly converted but this will do for now. This removes a reference to page->index and page->mapping as well as removing a call to compound_head(). Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250304170224.523141-1-willy@infradead.org Acked-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	platform/x86/amd/pmf: Update PMF Driver for Compatibility with new PMF-TA	Shyam Sundar S K
	The PMF driver allocates a shared memory buffer using tee_shm_alloc_kernel_buf() for communication with the PMF-TA. The latest PMF-TA version introduces new structures with OEM debug information and additional policy input conditions for evaluating the policy binary. Consequently, the shared memory size must be increased to ensure compatibility between the PMF driver and the updated PMF-TA. To do so, introduce the new PMF-TA UUID and update the PMF shared memory configuration to ensure compatibility with the latest PMF-TA version. Additionally, export the TA UUID. These updates will result in modifications to the prototypes of amd_pmf_tee_init() and amd_pmf_ta_open_session(). Link: https://lore.kernel.org/all/55ac865f-b1c7-fa81-51c4-d211c7963e7e@linux.intel.com/ Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20250305045842.4117767-2-Shyam-sundar.S-k@amd.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-03-05	platform/x86/amd/pmf: Propagate PMF-TA return codes	Shyam Sundar S K
	In the amd_pmf_invoke_cmd_init() function within the PMF driver ensure that the actual result from the PMF-TA is returned rather than a generic EIO. This change allows for proper handling of errors originating from the PMF-TA. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20250305045842.4117767-1-Shyam-sundar.S-k@amd.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-03-05	doc: correcting two prefix errors in idmappings.rst	Aiden Ma
	Add the 'k' prefix to id 21000. And id `u1000` in the third idmapping should be mapped to `k31000`, not `u31000`. Signed-off-by: Aiden Ma <jiaheng.ma@foxmail.com> Link: https://lore.kernel.org/r/tencent_4E7B1F143E8051530C21FCADF4E014DCBB06@qq.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	doc: fix inline emphasis warning	Christian Brauner
	Fix a warning spotted by linux-next build (htmldocs): Documentation/filesystems/porting.rst:1186: WARNING: Inline emphasis start-string without end-string. [docutils] Introduced by commit 88d5baf69082 ("Change inode_operations.mkdir to return struct dentry ") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 88d5baf69082 ("Change inode_operations.mkdir to return struct dentry ") Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	Merge patch series "Change inode_operations.mkdir to return struct dentry *"	Christian Brauner
	NeilBrown <neilb@suse.de> says: This revised series contains a few clean-ups as requested by various people but no substantial changes. I reviewed the mkdir functions in many (all?) filesystems and found a few that use d_instantiate() on an unlocked inode (after unlock_new_inode()) and also support export_operations. These could potentially call d_instantiate() on a directory inode which is already attached to a dentry, though making that happen would usually require guessing the filehandle correctly. I haven't tried to address those here, (this patch set doesn't make that situation any worse) but I may in the future. * patches from https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de: VFS: Change vfs_mkdir() to return the dentry. nfs: change mkdir inode_operation to return alternate dentry if needed. fuse: return correct dentry for ->mkdir ceph: return the correct dentry on mkdir hostfs: store inode in dentry after mkdir if possible. Change inode_operations.mkdir to return struct dentry * Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	VFS: Change vfs_mkdir() to return the dentry.	NeilBrown
	vfs_mkdir() does not guarantee to leave the child dentry hashed or make it positive on success, and in many such cases the filesystem had to use a different dentry which it can now return. This patch changes vfs_mkdir() to return the dentry provided by the filesystems which is hashed and positive when provided. This reduces the number of cases where the resulting dentry is not positive to a handful which don't deserve extra efforts. The only callers of vfs_mkdir() which are interested in the resulting inode are in-kernel filesystem clients: cachefiles, nfsd, smb/server. The only filesystems that don't reliably provide the inode are: - kernfs, tracefs which these clients are unlikely to be interested in - cifs in some configurations would need to do a lookup to find the created inode, but doesn't. cifs cannot be exported via NFS, is unlikely to be used by cachefiles, and smb/server only has a soft requirement for the inode, so this is unlikely to be a problem in practice. - hostfs, nfs, cifs may need to do a lookup (rarely for NFS) and it is possible for a race to make that lookup fail. Actual failure is unlikely and providing callers handle negative dentries graceful they will fail-safe. So this patch removes the lookup code in nfsd and smb/server and adjusts them to fail safe if a negative dentry is provided: - cache-files already fails safe by restarting the task from the top - it still does with this change, though it no longer calls cachefiles_put_directory() as that will crash if the dentry is negative. - nfsd reports "Server-fault" which it what it used to do if the lookup failed. This will never happen on any file-systems that it can actually export, so this is of no consequence. I removed the fh_update() call as that is not needed and out-of-place. A subsequent nfsd_create_setattr() call will call fh_update() when needed. - smb/server only wants the inode to call ksmbd_smb_inherit_owner() which updates ->i_uid (without calling notify_change() or similar) which can be safely skipping on cifs (I hope). If a different dentry is returned, the first one is put. If necessary the fact that it is new can be determined by comparing pointers. A new dentry will certainly have a new pointer (as the old is put after the new is obtained). Similarly if an error is returned (via ERR_PTR()) the original dentry is put. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-7-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05	nfs: change mkdir inode_operation to return alternate dentry if needed.	NeilBrown
	mkdir now allows a different dentry to be returned which is sometimes relevant for nfs. This patch changes the nfs_rpc_ops mkdir op to return a dentry, and passes that back to the caller. The mkdir nfs_rpc_op will return NULL if the original dentry should be used. This matches the mkdir inode_operation. nfs4_do_create() is duplicated to nfs4_do_mkdir() which is changed to handle the specifics of directories. Consequently the current special handling for directories is removed from nfs4_do_create() Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-6-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>