summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-04-15remove call_{read,write}_iter() functionsMiklos Szeredi
These have no clear purpose. This is effectively a revert of commit bb7462b6fd64 ("vfs: use helpers for calling f_op->{read,write}_iter()"). The patch was created with the help of a coccinelle script. Fixes: bb7462b6fd64 ("vfs: use helpers for calling f_op->{read,write}_iter()") Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-15kernel_file_open(): get rid of inode argumentAl Viro
always equal to ->dentry->d_inode of the path argument these days. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-15fd_is_open(): move to fs/file.cAl Viro
no users outside that... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-15close_on_exec(): pass files_struct instead of fdtableAl Viro
both callers are happier that way... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-15net: dqs: make struct dql more cache efficientBreno Leitao
With the previous change, struct dqs->stall_thrs will be in the hot path (at queue side), even if DQS is disabled. The other fields accessed in this function (last_obj_cnt and num_queued) are in the first cache line, let's move this field (stall_thrs) to the very first cache line, since there is a hole there. This does not change the structure size, since it moves an short (2 bytes) to 4-bytes whole in the first cache line. This is the new structure format now: struct dql { unsigned int num_queued; unsigned int last_obj_cnt; ... short unsigned int stall_thrs; /* XXX 2 bytes hole, try to pack */ ... /* --- cacheline 1 boundary (64 bytes) --- */ ... /* Longest stall detected, reported to user */ short unsigned int stall_max; /* XXX 2 bytes hole, try to pack */ }; Also, read the stall_thrs (now in the very first cache line) earlier, together with dql->num_queued (also in the first cache line). Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240411192241.2498631-5-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-15net: dql: Optimize stall information populationBreno Leitao
When Dynamic Queue Limit (DQL) is set, it always populate stall information through dql_queue_stall(). However, this information is only necessary if a stall threshold is set, stored in struct dql->stall_thrs. dql_queue_stall() is cheap, but not free, since it does have memory barriers and so forth. Do not call dql_queue_stall() if there is no stall threshold set, and save some CPU cycles. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240411192241.2498631-4-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-15net: dql: Separate queue function responsibilitiesBreno Leitao
The dql_queued() function currently handles both queuing object counts and populating bitmaps for reporting stalls. This commit splits the bitmap population into a separate function, allowing for conditional invocation in scenarios where the feature is disabled. This refactor maintains functionality while improving code organization. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240411192241.2498631-3-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-15net: dql: Avoid calling BUG() when WARN() is enoughBreno Leitao
If the dql_queued() function receives an invalid argument, WARN about it and continue, instead of crashing the kernel. This was raised by checkpatch, when I am refactoring this code (see following patch/commit) WARNING: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240411192241.2498631-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-15Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfigVignesh Balasubramanian
"ARCH_HAVE_EXTRA_ELF_NOTES" enables an extra note section in the core dump. Kconfig variable is preferred over ARCH_HAVE_* macro. Co-developed-by: Jini Susan George <jinisusan.george@amd.com> Signed-off-by: Jini Susan George <jinisusan.george@amd.com> Signed-off-by: Vignesh Balasubramanian <vigbalas@amd.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20240412062138.1132841-2-vigbalas@amd.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-15rcu: Add a trace event for synchronize_rcu_normal()Uladzislau Rezki (Sony)
Add an rcu_sr_normal() trace event. It takes three arguments first one is the name of RCU flavour, second one is a user id which triggeres synchronize_rcu_normal() and last one is an event. There are two traces in the synchronize_rcu_normal(). On entry, when a new request is registered and on exit point when request is completed. Please note, CONFIG_RCU_TRACE=y is required to activate traces. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-04-15rcu: Mollify sparse with RCU guardJohannes Berg
When using "guard(rcu)();" sparse will complain, because even though it now understands the cleanup attribute, it doesn't evaluate the calls from it at function exit, and thus doesn't count the context correctly. Given that there's a conditional in the resulting code: static inline void class_rcu_destructor(class_rcu_t *_T) { if (_T->lock) { rcu_read_unlock(); } } it seems that even trying to teach sparse to evalulate the cleanup attribute function it'd still be difficult to really make it understand the full context here. Suppress the sparse warning by just releasing the context in the acquisition part of the function, after all we know it's safe with the guard, that's the whole point of it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-04-15dma-buf: Do not build debugfs related code when !CONFIG_DEBUG_FSTvrtko Ursulin
There is no point in compiling in the list and mutex operations which are only used from the dma-buf debugfs code, if debugfs is not compiled in. Put the code in questions behind some kconfig guards and so save some text and maybe even a pointer per object at runtime when not enabled. Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Christian König <christian.koenig@amd.com> Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com Reviewed-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240328145323.68872-1-tursulin@igalia.com
2024-04-15drm/fb_dma: s/drm_panic_gem_get_scanout_buffer/drm_fb_dma_get_scanout_bufferMaíra Canal
On version 11 of the "drm/panic: Add a drm panic handler" series [1], Thomas suggested to change the name of the function `drm_panic_gem_get_scanout_buffer` to `drm_fb_dma_get_scanout_buffer` and this request was applied on version 12, which is the version that landed [2]. Although the name of the function changed on the C file, it didn't changed on the header file, leading to a compilation error as such: drivers/gpu/drm/imx/ipuv3/ipuv3-plane.c:780:24: error: use of undeclared identifier 'drm_fb_dma_get_scanout_buffer'; did you mean 'drm_panic_gem_get_scanout_buffer'? 780 | .get_scanout_buffer = drm_fb_dma_get_scanout_buffer, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | drm_panic_gem_get_scanout_buffer ./include/drm/drm_fb_dma_helper.h:23:5: note: 'drm_panic_gem_get_scanout_buffer' declared here 23 | int drm_panic_gem_get_scanout_buffer(struct drm_plane *plane, | ^ 1 error generated. Fix the compilation error by changing `drm_panic_gem_get_scanout_buffer` to `drm_fb_dma_get_scanout_buffer` on the header file. Link: https://lore.kernel.org/dri-devel/20240328120638.468738-1-jfalempe@redhat.com/ [1] Link: https://lore.kernel.org/dri-devel/aea2aa01-7f03-453b-8b30-8f4d90b1b47f@redhat.com/ [2] Fixes: 879b3b6511fe ("drm/fb_dma: Add generic get_scanout_buffer() for drm_panic") Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240415151013.3210278-1-mcanal@igalia.com
2024-04-15rcu: Remove redundant CONFIG_PROVE_RCU #if conditionPaul E. McKenney
The #if condition controlling the rcu_preempt_sleep_check() definition has a redundant check for CONFIG_PREEMPT_RCU, which is already checked for by an enclosing #ifndef. This commit therefore removes this redundant condition from the inner #if. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-04-15vfs, swap: compile out IS_SWAPFILE() on swapless configsAlexey Dobriyan
No swap support -- no swapfiles possible. Signed-off-by: Alexey Dobriyan (Yandex) <adobriyan@gmail.com> Link: https://lore.kernel.org/r/2391c7f5-0f83-4188-ae56-4ec7ccbf2576@p183 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-15drm/fb_dma: Add generic get_scanout_buffer() for drm_panicJocelyn Falempe
This was initialy done for imx6, but should work on most drivers using drm_fb_dma_helper. v8: * Replace get_scanout_buffer() logic with drm_panic_set_buffer() (Thomas Zimmermann) v9: * go back to get_scanout_buffer() * move get_scanout_buffer() to plane helper functions v12: * Rename drm_panic_gem_get_scanout_buffer to drm_fb_dma_get_scanout_buffer (Thomas Zimmermann) * Remove the #ifdef CONFIG_DRM_PANIC, and build it unconditionnaly, as it's a small function. (Thomas Zimmermann) Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240409163432.352518-6-jfalempe@redhat.com Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-04-15drm/panic: Add a drm panic handlerJocelyn Falempe
This module displays a user friendly message when a kernel panic occurs. It currently doesn't contain any debug information, but that can be added later. v2 * Use get_scanout_buffer() instead of the drm client API. (Thomas Zimmermann) * Add the panic reason to the panic message (Nerdopolis) * Add an exclamation mark (Nerdopolis) v3 * Rework the drawing functions, to write the pixels line by line and to use the drm conversion helper to support other formats. (Thomas Zimmermann) v4 * Use drm_fb_r1_to_32bit for fonts (Thomas Zimmermann) * Remove the default y to DRM_PANIC config option (Thomas Zimmermann) * Add foreground/background color config option * Fix the bottom lines not painted if the framebuffer height is not a multiple of the font height. * Automatically register the device to drm_panic, if the function get_scanout_buffer exists. (Thomas Zimmermann) v5 * Change the drawing API, use drm_fb_blit_from_r1() to draw the font. * Also add drm_fb_fill() to fill area with background color. * Add draw_pixel_xy() API for drivers that can't provide a linear buffer. * Add a flush() callback for drivers that needs to synchronize the buffer. * Add a void *private field, so drivers can pass private data to draw_pixel_xy() and flush(). v6 * Fix sparse warning for panic_msg and logo. v7 * Add select DRM_KMS_HELPER for the color conversion functions. v8 * Register directly each plane to the panic notifier (Sima) * Add raw_spinlock to properly handle concurrency (Sima) * Register plane instead of device, to avoid looping through plane list, and simplify code. * Replace get_scanout_buffer() logic with drm_panic_set_buffer() (Thomas Zimmermann) * Removed the draw_pixel_xy() API, will see later if it can be added back. v9 * Revert to using get_scanout_buffer() (Sima) * Move get_scanout_buffer() and panic_flush() to the plane helper functions (Thomas Zimmermann) * Register all planes with get_scanout_buffer() to the panic notifier * Use drm_panic_lock() to protect against race (Sima) v10 * Move blit and fill functions back in drm_panic (Thomas Zimmermann). * Simplify the text drawing functions. * Use kmsg_dumper instead of panic_notifier (Sima). v12 * Use array for map and pitch in struct drm_scanout_buffer to support multi-planar format later. (Thomas Zimmermann) * Better indent struct drm_scanout_buffer declaration. (Thomas Zimmermann) Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240409163432.352518-3-jfalempe@redhat.com Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-04-15drm/panic: Add drm panic lockingDaniel Vetter
Rough sketch for the locking of drm panic printing code. The upshot of this approach is that we can pretty much entirely rely on the atomic commit flow, with the pair of raw_spin_lock/unlock providing any barriers we need, without having to create really big critical sections in code. This also avoids the need that drivers must explicitly update the panic handler state, which they might forget to do, or not do consistently, and then we blow up in the worst possible times. It is somewhat racy against a concurrent atomic update, and we might write into a buffer which the hardware will never display. But there's fundamentally no way to avoid that - if we do the panic state update explicitly after writing to the hardware, we might instead write to an old buffer that the user will barely ever see. Note that an rcu protected deference of plane->state would give us the the same guarantees, but it has the downside that we then need to protect the plane state freeing functions with call_rcu too. Which would very widely impact a lot of code and therefore doesn't seem worth the complexity compared to a raw spinlock with very tiny critical sections. Plus rcu cannot be used to protect access to peek/poke registers anyway, so we'd still need it for those cases. Peek/poke registers for vram access (or a gart pte reserved just for panic code) are also the reason I've gone with a per-device and not per-plane spinlock, since usually these things are global for the entire display. Going with per-plane locks would mean drivers for such hardware would need additional locks, which we don't want, since it deviates from the per-console takeoverlocks design. Longer term it might be useful if the panic notifiers grow a bit more structure than just the absolute bare EXPORT_SYMBOL(panic_notifier_list) - somewhat aside, why is that not EXPORT_SYMBOL_GPL ... If panic notifiers would be more like console drivers with proper register/unregister interfaces we could perhaps reuse the very fancy console lock with all it's check and takeover semantics that John Ogness is developing to fix the console_lock mess. But for the initial cut of a drm panic printing support I don't think we need that, because the critical sections are extremely small and only happen once per display refresh. So generally just 60 tiny locked sections per second, which is nothing compared to a serial console running a 115kbaud doing really slow mmio writes for each byte. So for now the raw spintrylock in drm panic notifier callback should be good enough. Another benefit of making panic notifiers more like full blown consoles (that are used in panics only) would be that we get the two stage design, where first all the safe outputs are used. And then the dangerous takeover tricks are deployed (where for display drivers we also might try to intercept any in-flight display buffer flips, which if we race and misprogram fifos and watermarks can hang the memory controller on some hw). For context the actual implementation on the drm side is by Jocelyn and this patch is meant to be combined with the overall approach in v7 (v8 is a bit less flexible, which I think is the wrong direction): https://lore.kernel.org/dri-devel/20240104160301.185915-1-jfalempe@redhat.com/ Note that the locking is very much not correct there, hence this separate rfc. Starting from v10, I (Jocelyn) have included this patch in the drm_panic series, and done the corresponding changes. v2: - fix authorship, this was all my typing - some typo oopsies - link to the drm panic work by Jocelyn for context v10: - Use spinlock_irqsave/restore (John Ogness) v11: - Use macro instead of inline functions for drm_panic_lock/unlock (John Ogness) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Lukas Wunner <lukas@wunner.de> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240409163432.352518-2-jfalempe@redhat.com Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-04-15io_uring: separate header for exported net bitsPavel Begunkov
We're exporting some io_uring bits to networking, e.g. for implementing a net callback for io_uring cmds, but we don't want to expose more than needed. Add a separate header for networking. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://lore.kernel.org/r/20240409210554.1878789-1-dw@davidwei.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring: remove async request cachePavel Begunkov
io_req_complete_post() was a sole user of ->locked_free_list, but since we just gutted the function, the cache is not used anymore and can be removed. ->locked_free_list served as an asynhronous counterpart of the main request (i.e. struct io_kiocb) cache for all unlocked cases like io-wq. Now they're all forced to be completed into the main cache directly, off of the normal completion path or via io_free_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7bffccd213e370abd4de480e739d8b08ab6c1326.1712331455.git.asml.silence@gmail.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ringJens Axboe
Rather than use remap_pfn_range() for this and manually free later, switch to using vm_insert_page() and have it Just Work. This requires a bit of effort on the mmap lookup side, as the ctx uring_lock isn't held, which otherwise protects buffer_lists from being torn down, and it's not safe to grab from mmap context that would introduce an ABBA deadlock between the mmap lock and the ctx uring_lock. Instead, lookup the buffer_list under RCU, as the the list is RCU freed already. Use the existing reference count to determine whether it's possible to safely grab a reference to it (eg if it's not zero already), and drop that reference when done with the mapping. If the mmap reference is the last one, the buffer_list and the associated memory can go away, since the vma insertion has references to the inserted pages at that point. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring: Avoid anonymous enums in io_uring uapiGabriel Krisman Bertazi
While valid C, anonymous enums confuse Cython (Python to C translator), as reported by Ritesh (YoSTEALTH) [1] . Since people rely on it when building against liburing and we want to keep this header in sync with the library version, let's name the existing enums in the uapi header. [1] https://github.com/cython/cython/issues/3240 Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20240328210935.25640-1-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring/alloc_cache: switch to array based cachingJens Axboe
Currently lists are being used to manage this, but best practice is usually to have these in an array instead as that it cheaper to manage. Outside of that detail, games are also played with KASAN as the list is inside the cached entry itself. Finally, all users of this need a struct io_cache_entry embedded in their struct, which is union'ized with something else in there that isn't used across the free -> realloc cycle. Get rid of all of that, and simply have it be an array. This will not change the memory used, as we're just trading an 8-byte member entry for the per-elem array size. This reduces the overhead of the recycled allocations, and it reduces the amount of code code needed to support recycling to about half of what it currently is. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring/uring_cmd: switch to always allocating async dataJens Axboe
Basic conversion ensuring async_data is allocated off the prep path. Adds a basic alloc cache as well, as passthrough IO can be quite high in rate. Tested-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring/rw: always setup io_async_rw for read/write requestsJens Axboe
read/write requests try to put everything on the stack, and then alloc and copy if a retry is needed. This necessitates a bunch of nasty code that deals with intermediate state. Get rid of this, and have the prep side setup everything that is needed upfront, which greatly simplifies the opcode handlers. This includes adding an alloc cache for io_async_rw, to make it cheap to handle. In terms of cost, this should be basically free and transparent. For the worst case of {READ,WRITE}_FIXED which didn't need it before, performance is unaffected in the normal peak workload that is being used to test that. Still runs at 122M IOPS. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring: get rid of intermediate aux cqe cachesPavel Begunkov
io_post_aux_cqe(), which is used for multishot requests, delays completions by putting CQEs into a temporary array for the purpose completion lock/flush batching. DEFER_TASKRUN doesn't need any locking, so for it we can put completions directly into the CQ and defer post completion handling with a flag. That leaves !DEFER_TASKRUN, which is not that interesting / hot for multishot requests, so have conditional locking with deferred flush for them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/b1d05a81fd27aaa2a07f9860af13059e7ad7a890.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring: remove struct io_tw_state::lockedPavel Begunkov
ctx is always locked for task_work now, so get rid of struct io_tw_state::locked. Note I'm stopping one step before removing io_tw_state altogether, which is not empty, because it still serves the purpose of indicating which function is a tw callback and forcing users not to invoke them carelessly out of a wrong context. The removal can always be done later. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/e95e1ea116d0bfa54b656076e6a977bc221392a4.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15nvme/io_uring: use helper for polled completionsJens Axboe
NVMe is making up issue_flags, which is a no-no in general, and to make matters worse, they are completely the wrong ones. For a pure polled request, which it does check for, we're already inside the ctx->uring_lock when the completions are run off io_do_iopoll(). Hence the correct flag would be '0' rather than IO_URING_F_UNLOCKED. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring/cmd: document some uring_cmd related helpersPavel Begunkov
Add comments warning users that they're only allowed to pass issue_flags that were given from io_uring. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/82ff8a45f2c3eb5f3a04a33f0692e5e4a1320455.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15ACPI: platform-profile: add platform_profile_cycle()Gergo Koteles
Some laptops have a key to switch platform profiles. Add a platform_profile_cycle() function to cycle between the enabled profiles. Signed-off-by: Gergo Koteles <soyer@irl.hu> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/5a97deddf72aa5e764d881eb39a7ba35c01a903e.1712597199.git.soyer@irl.hu Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-04-15dt-bindings: leds: Add LED_FUNCTION_FNLOCKGergo Koteles
Newer laptops have FnLock LED. Add a define for this very common function. Signed-off-by: Gergo Koteles <soyer@irl.hu> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Lee Jones <lee@kernel.org> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/8ac95e85a53dc0b8cce1e27fc1cab6d19221543b.1712063200.git.soyer@irl.hu Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-04-15gpiolib: acpi: Pass con_id instead of property into acpi_dev_gpio_irq_get_by()Andy Shevchenko
Pass the con_id instead of property so that callers won't repeat the GPIO suffixes to try. Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2024-04-15drm/edid: add drm_edid_print_product_id()Jani Nikula
Add a function to print a decoded EDID vendor and product id to a drm printer, optionally with the raw data. v2: - refactor date printing - use seq_buf to avoid kasprintf() (Ville) - handle week == 0 (Ville) - use be16_to_cpu() on manufacturer_name Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Melissa Wen <mwen@igalia.com> # v1 Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/32bbc83ee6557809ef6d7a5edb1bc8ef4d56d10f.1712655867.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-04-15drm/edid: add drm_edid_get_product_id()Jani Nikula
Add a struct drm_edid based function to get the vendor and product ID from an EDID. Add a separate struct for defining this part of the EDID, with defined byte order for manufacturer name, product code and serial number. v2: Define manufacturer_name as __be16 instead of u8[2] (Ville) Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/df0e7dedbf7f2c190039d6e6eae3e126eba113c9.1712655867.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-04-15iommu: account IOMMU allocated memoryPasha Tatashin
In order to be able to limit the amount of memory that is allocated by IOMMU subsystem, the memory must be accounted. Account IOMMU as part of the secondary pagetables as it was discussed at LPC. The value of SecPageTables now contains mmeory allocation by IOMMU and KVM. There is a difference between GFP_ACCOUNT and what NR_IOMMU_PAGES shows. GFP_ACCOUNT is set only where it makes sense to charge to user processes, i.e. IOMMU Page Tables, but there more IOMMU shared data that should not really be charged to a specific process. Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20240413002522.1101315-12-pasha.tatashin@soleen.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-15iommu: observability of the IOMMU allocationsPasha Tatashin
Add NR_IOMMU_PAGES into node_stat_item that counts number of pages that are allocated by the IOMMU subsystem. The allocations can be view per-node via: /sys/devices/system/node/nodeN/vmstat. For example: $ grep iommu /sys/devices/system/node/node*/vmstat /sys/devices/system/node/node0/vmstat:nr_iommu_pages 106025 /sys/devices/system/node/node1/vmstat:nr_iommu_pages 3464 The value is in page-count, therefore, in the above example the iommu allocations amount to ~428M. Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20240413002522.1101315-11-pasha.tatashin@soleen.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-15media: videodev2: Fix v4l2_ext_control packing.Ricardo Ribalda
The structure is packed, which requires that all its fields need to be also packed. ./include/uapi/linux/videodev2.h:1810:2: warning: field within 'struct v4l2_ext_control' is less aligned than 'union v4l2_ext_control::(anonymous at ./include/uapi/linux/videodev2.h:1810:2)' and is usually due to 'struct v4l2_ext_control' being packed, which can lead to unaligned accesses [-Wunaligned-access] Explicitly set the inner union as packed. Marking the inner union as 'packed' does not change the layout, since the whole struct is already packed, it just silences the clang warning. See also this llvm discussion: https://github.com/llvm/llvm-project/issues/55520 Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-04-15media: dvb: Fix dtvs_stats packing.Ricardo Ribalda
The structure is packed, which requires that all its fields need to be also packed. ./include/uapi/linux/dvb/frontend.h:854:2: warning: field within 'struct dtv_stats' is less aligned than 'union dtv_stats::(anonymous at ./include/uapi/linux/dvb/frontend.h:854:2)' and is usually due to 'struct dtv_stats' being packed, which can lead to unaligned accesses [-Wunaligned-access] Explicitly set the inner union as packed. Marking the inner union as 'packed' does not change the layout, since the whole struct is already packed, it just silences the clang warning. See also this llvm discussion: https://github.com/llvm/llvm-project/issues/55520 Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-04-15flow_offload: add control flag checking helpersAsbjørn Sloth Tønnesen
These helpers aim to help drivers, with checking for the presence of unsupported control flags. For drivers supporting at least one control flag: flow_rule_is_supp_control_flags() For drivers using flow_rule_match_control(), but not using flags: flow_rule_has_control_flags() For drivers not using flow_rule_match_control(): flow_rule_match_has_control_flags() While primarily aimed at FLOW_DISSECTOR_KEY_CONTROL and flow_rule_match_control(), then the first two can also be used with FLOW_DISSECTOR_KEY_ENC_CONTROL and flow_rule_match_enc_control(). These helpers mirrors the existing check done in sfc: drivers/net/ethernet/sfc/tc.c +276 Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15srcu: Make Tiny SRCU explicitly disable preemptionPaul E. McKenney
Because Tiny SRCU is used only in kernels built with either CONFIG_PREEMPT_NONE=y or CONFIG_PREEMPT_VOLUNTARY=y, there has not been any need for TINY SRCU to explicitly disable preemption. However, the prospect of lazy preemption changes that, and the lazy-preemption patches do result in rcutorture runs finding both too-short grace periods and grace-period hangs for Tiny SRCU. This commit therefore adds the needed preempt_disable() and preempt_enable() calls to Tiny SRCU. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Ankur Arora <ankur.a.arora@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-04-15rcu: Update lockdep while in RCU read-side critical sectionUladzislau Rezki (Sony)
With Ankur's lazy-/auto-preemption patches applied and with a lazy-preemptible kernel in combination with a non-preemptible RCU, lockdep sometimes complains about context switches within RCU read-side critical sections. This is a false positive due to rcu_read_unlock() updating lockdep state too late: __release(RCU); __rcu_read_unlock(); // Context switch here results in lockdep false positive!!! rcu_lock_release(&rcu_lock_map); /* Keep acq info for rls diags. */ Although this complaint could also happen with preemptible RCU in a preemptible kernel, the odds of that happening aer quite low. In constrast, with non-preemptible RCU, a long critical section has a high probability of performing a context switch from the preempt_enable() in __rcu_read_unlock(). The fix is straightforward, just move the rcu_lock_release() within rcu_read_unlock() to obtain the reverse order from that of rcu_read_lock(): rcu_lock_release(&rcu_lock_map); /* Keep acq info for rls diags. */ __release(RCU); __rcu_read_unlock(); This commit makes this change. Co-developed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Co-developed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Cc: Ankur Arora <ankur.a.arora@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de>
2024-04-14perf: Move perf_event_fasync() to perf_event.hKyle Huey
This will allow it to be called from perf_output_wakeup(). Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240413141618.4160-2-khuey@kylehuey.com
2024-04-14Merge branch 'linus' into perf/core, to pick up perf/urgent fixesIngo Molnar
Pick up perf/urgent fixes that are upstream already, but not yet in the perf/core development branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-14Merge tag 'timers-urgent-2024-04-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: - Address a (valid) W=1 build warning - Fix timer self-tests - Annotate a KCSAN warning wrt. accesses to the tick_do_timer_cpu global variable - Address a !CONFIG_BUG build warning * tag 'timers-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests: kselftest: Fix build failure with NOLIBC selftests: timers: Fix abs() warning in posix_timers test selftests: kselftest: Mark functions that unconditionally call exit() as __noreturn selftests: timers: Fix posix_timers ksft_print_msg() warning selftests: timers: Fix valid-adjtimex signed left-shift undefined behavior bug: Fix no-return-statement warning with !CONFIG_BUG timekeeping: Use READ/WRITE_ONCE() for tick_do_timer_cpu selftests/timers/posix_timers: Reimplement check_timer_distribution() irqflags: Explicitly ignore lockdep_hrtimer_exit() argument
2024-04-14Merge tag 'locking-urgent-2024-04-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Fix a PREEMPT_RT build bug" * tag 'locking-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking: Make rwsem_assert_held_write_nolockdep() build with PREEMPT_RT=y
2024-04-14Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio bugfixes from Michael Tsirkin: "Some small, obvious (in hindsight) bugfixes: - new ioctl in vhost-vdpa has a wrong # - not too late to fix - vhost has apparently been lacking an smp_rmb() - due to code duplication :( The duplication will be fixed in the next merge cycle, this is a minimal fix - an error message in vhost talks about guest moving used index - which of course never happens, guest only ever moves the available index - i2c-virtio didn't set the driver owner so it did not get refcounted correctly" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: correct misleading printing information vhost-vdpa: change ioctl # for VDPA_GET_VRING_SIZE virtio: store owner from modules with register_virtio_driver() vhost: Add smp_rmb() in vhost_enable_notify() vhost: Add smp_rmb() in vhost_vq_avail_empty()
2024-04-14net: change maximum number of UDP segments to 128Yuri Benditovich
The commit fc8b2a619469 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation") adds check of potential number of UDP segments vs UDP_MAX_SEGMENTS in linux/virtio_net.h. After this change certification test of USO guest-to-guest transmit on Windows driver for virtio-net device fails, for example with packet size of ~64K and mss of 536 bytes. In general the USO should not be more restrictive than TSO. Indeed, in case of unreasonably small mss a lot of segments can cause queue overflow and packet loss on the destination. Limit of 128 segments is good for any practical purpose, with minimal meaningful mss of 536 the maximal UDP packet will be divided to ~120 segments. The number of segments for UDP packets is validated vs UDP_MAX_SEGMENTS also in udp.c (v4,v6), this does not affect quest-to-guest path but does affect packets sent to host, for example. It is important to mention that UDP_MAX_SEGMENTS is kernel-only define and not available to user mode socket applications. In order to request MSS smaller than MTU the applications just uses setsockopt with SOL_UDP and UDP_SEGMENT and there is no limitations on socket API level. Fixes: fc8b2a619469 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation") Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-14bootconfig: use memblock_free_late to free xbc memory to buddyQiang Zhang
On the time to free xbc memory in xbc_exit(), memblock may has handed over memory to buddy allocator. So it doesn't make sense to free memory back to memblock. memblock_free() called by xbc_exit() even causes UAF bugs on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled like x86. Following KASAN logs shows this case. This patch fixes the xbc memory free problem by calling memblock_free() in early xbc init error rewind path and calling memblock_free_late() in xbc exit path to free memory to buddy allocator. [ 9.410890] ================================================================== [ 9.418962] BUG: KASAN: use-after-free in memblock_isolate_range+0x12d/0x260 [ 9.426850] Read of size 8 at addr ffff88845dd30000 by task swapper/0/1 [ 9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 6.9.0-rc3-00208-g586b5dfb51b9 #5 [ 9.446403] Hardware name: Intel Corporation RPLP LP5 (CPU:RaptorLake)/RPLP LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D-00000000 Dec 28 2023 [ 9.460789] Call Trace: [ 9.463518] <TASK> [ 9.465859] dump_stack_lvl+0x53/0x70 [ 9.469949] print_report+0xce/0x610 [ 9.473944] ? __virt_addr_valid+0xf5/0x1b0 [ 9.478619] ? memblock_isolate_range+0x12d/0x260 [ 9.483877] kasan_report+0xc6/0x100 [ 9.487870] ? memblock_isolate_range+0x12d/0x260 [ 9.493125] memblock_isolate_range+0x12d/0x260 [ 9.498187] memblock_phys_free+0xb4/0x160 [ 9.502762] ? __pfx_memblock_phys_free+0x10/0x10 [ 9.508021] ? mutex_unlock+0x7e/0xd0 [ 9.512111] ? __pfx_mutex_unlock+0x10/0x10 [ 9.516786] ? kernel_init_freeable+0x2d4/0x430 [ 9.521850] ? __pfx_kernel_init+0x10/0x10 [ 9.526426] xbc_exit+0x17/0x70 [ 9.529935] kernel_init+0x38/0x1e0 [ 9.533829] ? _raw_spin_unlock_irq+0xd/0x30 [ 9.538601] ret_from_fork+0x2c/0x50 [ 9.542596] ? __pfx_kernel_init+0x10/0x10 [ 9.547170] ret_from_fork_asm+0x1a/0x30 [ 9.551552] </TASK> [ 9.555649] The buggy address belongs to the physical page: [ 9.561875] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x45dd30 [ 9.570821] flags: 0x200000000000000(node=0|zone=2) [ 9.576271] page_type: 0xffffffff() [ 9.580167] raw: 0200000000000000 ffffea0011774c48 ffffea0012ba1848 0000000000000000 [ 9.588823] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 [ 9.597476] page dumped because: kasan: bad access detected [ 9.605362] Memory state around the buggy address: [ 9.610714] ffff88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 9.618786] ffff88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 9.626857] >ffff88845dd30000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 9.634930] ^ [ 9.638534] ffff88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 9.646605] ffff88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 9.654675] ================================================================== Link: https://lore.kernel.org/all/20240414114944.1012359-1-qiang4.zhang@linux.intel.com/ Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed") Cc: Stable@vger.kernel.org Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-04-13Merge tag 'counter-updates-for-6.10' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/wbg/counter into char-misc-next William writes: Counter updates for 6.10 Three key updates of note herein: - Introduction of the COUNTER_COMP_FREQUENCY() macro to simplify creation of "frequency" Counter extensions - Three additional Signals (Clock, Channel 3, and Channel 4) are supported for the stm32-timer-cnt - Counter events support added for the stm32-timer-cnt There are also some miscellaneous cleanups and improvements, such as constifying Counter structures, resolving a kernel-doc description warning, and converting platform_driver remove callbacks to remove_new. * tag 'counter-updates-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wbg/counter: counter: ti-ecap-capture: Utilize COUNTER_COMP_FREQUENCY macro counter: ti-eqep: Convert to platform remove callback returning void counter: ti-ecap-capture: Convert to platform remove callback returning void MAINTAINERS: Update email addresses for William Breathitt Gray counter: stm32-timer-cnt: add support for capture events counter: stm32-timer-cnt: add support for overflow events counter: stm32-timer-cnt: probe number of channels from registers counter: stm32-timer-cnt: introduce channels counter: stm32-timer-cnt: add checks on quadrature encoder capability counter: stm32-timer-cnt: add counter prescaler extension counter: stm32-timer-cnt: introduce clock signal counter: stm32-timer-cnt: adopt signal definitions counter: stm32-timer-cnt: rename counter counter: stm32-timer-cnt: rename quadrature signal counter: Introduce the COUNTER_COMP_FREQUENCY() macro counter: constify the struct device_type usage counter: make counter_bus_type const counter: linux/counter.h: fix Excess kernel-doc description warning
2024-04-13vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirementsLinus Torvalds
"The definition of insanity is doing the same thing over and over again and expecting different results” We've tried to do this before, most recently with commit bb2314b47996 ("fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink") about a decade ago. But the effort goes back even further than that, eg this thread back from 1998 that is so old that we don't even have it archived in lore: https://lkml.org/lkml/1998/3/10/108 which also points out some of the reasons why it's dangerous. Or, how about then in 2003: https://lkml.org/lkml/2003/4/6/112 where we went through some of the same arguments, just wirh different people involved. In particular, having access to a file descriptor does not necessarily mean that you have access to the path that was used for lookup, and there may be very good reasons why you absolutely must not have access to a path to said file. For example, if we were passed a file descriptor from the outside into some limited environment (think chroot, but also user namespaces etc) a 'flink()' system call could now make that file visible inside a context where it's not supposed to be visible. In the process the user may also be able to re-open it with permissions that the original file descriptor did not have (eg a read-only file descriptor may be associated with an underlying file that is writable). Another variation on this is if somebody else (typically root) opens a file in a directory that is not accessible to others, and passes the file descriptor on as a read-only file. Again, the access to the file descriptor does not imply that you should have access to a path to the file in the filesystem. So while we have tried this several times in the past, it never works. The last time we did this, that commit bb2314b47996 quickly got reverted again in commit f0cc6ffb8ce8 (Revert "fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink"), with a note saying "We may re-do this once the whole discussion about the interface is done". Well, the discussion is long done, and didn't come to any resolution. There's no question that 'flink()' would be a useful operation, but it's a dangerous one. However, it does turn out that since 2008 (commit d76b0d9b2d87: "CRED: Use creds in file structs") we have had a fairly straightforward way to check whether the file descriptor was opened by the same credentials as the credentials of the flink(). That allows the most common patterns that people want to use, which tend to be to either open the source carefully (ie using the openat2() RESOLVE_xyz flags, and/or checking ownership with fstat() before linking), or to use O_TMPFILE and fill in the file contents before it's exposed to the world with linkat(). But it also means that if the file descriptor was opened by somebody else, or we've gone through a credentials change since, the operation no longer works (unless we have CAP_DAC_READ_SEARCH capabilities in the opener's user namespace, as before). Note that the credential equality check is done by using pointer equality, which means that it's not enough that you have effectively the same user - they have to be literally identical, since our credentials are using copy-on-write semantics. So you can't change your credentials to something else and try to change it back to the same ones between the open() and the linkat(). This is not meant to be some kind of generic permission check, this is literally meant as a "the open and link calls are 'atomic' wrt user credentials" check. It also means that you can't just move things between namespaces, because the credentials aren't just a list of uid's and gid's: they includes the pointer to the user_ns that the capabilities are relative to. So let's try this one more time and see if maybe this approach ends up being workable after all. Cc: Andrew Lutomirski <luto@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240411001012.12513-1-torvalds@linux-foundation.org [brauner: relax capability check to opener of the file] Link: https://lore.kernel.org/all/20231113-undenkbar-gediegen-efde5f1c34bc@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>