summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2022-07-17mm: move page zone helpers from mm.h to mmzone.hAlex Sierra
It makes more sense to have these helpers in zone specific header file, rather than the generic mm.h Link: https://lkml.kernel.org/r/20220715150521.18165-3-alex.sierra@amd.com Signed-off-by: Alex Sierra <alex.sierra@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: David Hildenbrand <david@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17mm: rename is_pinnable_page() to is_longterm_pinnable_page()Alex Sierra
Patch series "Add MEMORY_DEVICE_COHERENT for coherent device memory mapping", v9. This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory owned by a device that can be mapped into CPU page tables like MEMORY_DEVICE_GENERIC and can also be migrated like MEMORY_DEVICE_PRIVATE. This patch series is mostly self-contained except for a few places where it needs to update other subsystems to handle the new memory type. System stability and performance are not affected according to our ongoing testing, including xfstests. How it works: The system BIOS advertises the GPU device memory (aka VRAM) as SPM (special purpose memory) in the UEFI system address map. The amdgpu driver registers the memory with devmap as MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for this hardware page migration capability is the Frontier supercomputer project. This functionality is not AMD-specific. We expect other GPU vendors to find this functionality useful, and possibly other hardware types in the future. Our test nodes in the lab are similar to the Frontier configuration, with .5 TB of system memory plus 256 GB of device memory split across 4 GPUs, all in a single coherent address space. Page migration is expected to improve application efficiency significantly. We will report empirical results as they become available. Coherent device type pages at gup are now migrated back to system memory if they are being pinned long-term (FOLL_LONGTERM). The reason is, that long-term pinning would interfere with the device memory manager owning the device-coherent pages (e.g. evictions in TTM). These series incorporate Alistair Popple patches to do this migration from pin_user_pages() calls. hmm_gup_test has been added to hmm-test to test different get user pages calls. This series includes handling of device-managed anonymous pages returned by vm_normal_pages. Although they behave like normal pages for purposes of mapping in CPU page tables and for COW, they do not support LRU lists, NUMA migration or THP. We also introduced a FOLL_LRU flag that adds the same behaviour to follow_page and related APIs, to allow callers to specify that they expect to put pages on an LRU list. This patch (of 14): is_pinnable_page() and folio_is_pinnable() are renamed to is_longterm_pinnable_page() and folio_is_longterm_pinnable() respectively. These functions are used in the FOLL_LONGTERM flag context. Link: https://lkml.kernel.org/r/20220715150521.18165-1-alex.sierra@amd.com Link: https://lkml.kernel.org/r/20220715150521.18165-2-alex.sierra@amd.com Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17mm: Add PAGE_ALIGN_DOWN macroDavid Gow
This is just the same as PAGE_ALIGN(), but rounds the address down, not up. Suggested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Gow <davidgow@google.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2022-07-17net/mlx5: fs, allow flow table creation with a UIDMark Bloch
Add UID field to flow table attributes to allow creating flow tables with a non default (zero) uid. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-17net/mlx5: fs, expose flow table ID to usersMark Bloch
Expose the flow table ID to users. This will be used by downstream patches to allow creating steering rules that point to a flow table ID. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-17net/mlx5: Expose the ability to point to any UID from shared UIDMark Bloch
Expose shared_object_to_user_object_allowed, this capability means an object created with shared UID can point to any UID. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-17Merge tag 'input-for-v5.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - fix Goodix driver to properly behave on the Aya Neo Next - some more sanity checks in usbtouchscreen driver - a tweak in wm97xx driver in preparation for remove() to return void - a clarification in input core regarding units of measurement for resolution on touch events. * tag 'input-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: document the units for resolution of size axes Input: goodix - call acpi_device_fix_up_power() in some cases Input: wm97xx - make .remove() obviously always return 0 Input: usbtouchscreen - add driver_info sanity check
2022-07-17KVM: arm64: vgic: Consolidate userspace access for base address settingMarc Zyngier
Align kvm_vgic_addr() with the rest of the code by moving the userspace accesses into it. kvm_vgic_addr() is also made static. Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-07-17KVM: arm64: vgic-v2: Add helper for legacy dist/cpuif base address settingMarc Zyngier
We carry a legacy interface to set the base addresses for GICv2. As this is currently plumbed into the same handling code as the modern interface, it limits the evolution we can make there. Add a helper dedicated to this handling, with a view of maybe removing this in the future. Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-07-17media: mc-entity: Add a new helper function to get a remote pad for a padLaurent Pinchart
The newly added media_entity_remote_source_pad_unique() helper function handles use cases where the entity has a link enabled uniqueness constraint covering all pads. There are use cases where the constraint covers a specific pad only. Add a new media_pad_remote_pad_unique() function to handle this. It operates as media_entity_remote_source_pad_unique(), but on a given pad instead of on the entity. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-17media: mc-entity: Add a new helper function to get a remote padLaurent Pinchart
The media_entity_remote_pad_first() helper function returns the first remote pad it finds connected to a given pad. Beside being possibly non-deterministic (as it stops at the first enabled link), the fact that it returns the first match makes it unsuitable for drivers that need to guarantee that a single link is enabled, for instance when an entity can process data from one of multiple sources at a time. For those use cases, add a new helper function, media_entity_remote_pad_unique(), that operates on an entity and returns a remote pad, with a guarantee that only one link is enabled. To ease its use in drivers, also add an inline wrapper that locates source pads specifically. A wrapper that locates sink pads can easily be added when needed. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-17media: mc-entity: Rename media_entity_remote_pad() to ↵Laurent Pinchart
media_pad_remote_pad_first() The media_entity_remote_pad() is misnamed, as it operates on a pad and not an entity. Rename it to media_pad_remote_pad_first() to clarify its behaviour. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-17media: v4l2-async: Add notifier operation to destroy asd instancesLaurent Pinchart
Drivers typically extend the v4l2_async_subdev structure by embedding it in a driver-specific structure, to store per-subdev custom data. The v4l2_async_subdev instances are freed by the v4l2-async framework, which makes this mechanism cumbersome to use safely when custom data needs special treatment to be destroyed (such as freeing additional memory, or releasing references to kernel objects). To ease this, add a .destroy() operation to the v4l2_async_notifier_operations structure. The operation is called right before the v4l2_async_subdev is freed, giving drivers a chance to destroy data if needed. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-17media: videobuf2: Introduce vb2_find_buffer()Ezequiel Garcia
All users of vb2_find_timestamp() combine it with vb2_get_buffer() to retrieve a videobuf2 buffer, given a u64 timestamp. Introduce an API for this use-case. Users will be converted to the new API as follow-up commits. Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Acked-by: Tomasz Figa <tfiga@chromium.org> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-17media: Add P010 tiled formatEzequiel Garcia
Add P010 tiled format [rebased, updated pixel format name and added description] Tested-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-16Merge tag 'tty-5.19-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty and serial driver fixes from Greg KH: "Here are some TTY and Serial driver fixes for 5.19-rc7. They resolve a number of reported problems including: - longtime bug in pty_write() that has been reported in the past. - 8250 driver fixes - new serial device ids - vt overlapping data copy bugfix - other tiny serial driver bugfixes All of these have been in linux-next for a while with no reported problems" * tag 'tty-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: use new tty_insert_flip_string_and_push_buffer() in pty_write() tty: extract tty_flip_buffer_commit() from tty_flip_buffer_push() serial: 8250: dw: Fix the macro RZN1_UART_xDMACR_8_WORD_BURST vt: fix memory overlapping when deleting chars in the buffer serial: mvebu-uart: correctly report configured baudrate value serial: 8250: Fix PM usage_count for console handover serial: 8250: fix return error code in serial8250_request_std_resource() serial: stm32: Clear prev values before setting RTS delays tty: Add N_CAN327 line discipline ID for ELM327 based CAN driver serial: 8250: Fix __stop_tx() & DMA Tx restart races serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle tty: serial: samsung_tty: set dma burst_size to 1 serial: 8250: dw: enable using pdata with ACPI
2022-07-16iio: trigger: move trig->owner init to trigger allocate() stageDmitry Rokosov
To provide a new IIO trigger to the IIO core, usually driver executes the following pipeline: allocate()/register()/get(). Before, IIO core assigned trig->owner as a pointer to the module which registered this trigger at the register() stage. But actually the trigger object is owned by the module earlier, on the allocate() stage, when trigger object is successfully allocated for the driver. This patch moves trig->owner initialization from register() stage of trigger initialization pipeline to allocate() stage to eliminate all misunderstandings and time gaps between trigger object creation and owner acquiring. Signed-off-by: Dmitry Rokosov <ddrokosov@sberdevices.ru> Link: https://lore.kernel.org/r/20220601174837.20292-1-ddrokosov@sberdevices.ru Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2022-07-16fs: remove no_llseekJason A. Donenfeld
Now that all callers of ->llseek are going through vfs_llseek(), we don't gain anything by keeping no_llseek around. Nothing actually calls it and setting ->llseek to no_lseek is completely equivalent to leaving it NULL. Longer term (== by the end of merge window) we want to remove all such intializations. To simplify the merge window this commit does *not* touch initializers - it only defines no_llseek as NULL (and simplifies the tests on file opening). At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-07-16serial: remove VR41XX serial driverThomas Bogendoerfer
Commit d3164e2f3b0a ("MIPS: Remove VR41xx support") removed support for MIPS VR41xx platform, so remove exclusive drivers for this platform, too. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Link: https://lore.kernel.org/r/20220715140322.135825-1-tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-16Merge tag 'extcon-next-for-5.20' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-next Chanwoo writes: Update extcon next for v5.20 Detailed description for this pull request: 1. Add new connector type of both EXTCON_DISP_CVBS and EXTCON_DISP_EDP - Add both EXTCON_DISP_CVBS for Composite Video Broadcast Signal[1] and EXTCON_DISP_EDP for Embedded Display Port[2]. [1] https://en.wikipedia.org/wiki/Composite_video [2] https://en.wikipedia.org/wiki/DisplayPort#eDP 2. Fix the minor issues of extcon provider driver - Drop unused remove function on extcon-fsa9480.c - Remove extraneous space before a debug message on extcon-palmas.c - Remove duplicate word in the comment - Drop useless mask_invert flag on irqchip on extcon-sm5502.c - Drop useless mask_invert flag on irqchip on extcon-rt8973a.c * tag 'extcon-next-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon: extcon: Add EXTCON_DISP_CVBS and EXTCON_DISP_EDP extcon: rt8973a: Drop useless mask_invert flag on irqchip extcon: sm5502: Drop useless mask_invert flag on irqchip extcon: Drop unexpected word "the" in the comments extcon: Remove extraneous space before a debug message extcon: fsa9480: Drop no-op remove function
2022-07-16Merge tag 'icc-5.20-rc1-v2' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next Georgi writes: interconnect changes for 5.20 Here are the interconnect changes for the 5.20-rc1 merge window consisting of two new drivers, misc driver improvements and new device managed API. Core change: - Add device managed bulk API Driver changes: - New driver for NXP i.MX8MP platforms - New driver for Qualcomm SM6350 platforms - Multiple bucket support for Qualcomm RPM-based drivers. Signed-off-by: Georgi Djakov <djakov@kernel.org> * tag 'icc-5.20-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc: PM / devfreq: imx: Register i.MX8MP interconnect device interconnect: imx: Add platform driver for imx8mp interconnect: imx: configure NoC mode/prioriry/ext_control interconnect: imx: introduce imx_icc_provider interconnect: imx: set src node interconnect: imx: fix max_node_id interconnect: qcom: icc-rpm: Set bandwidth and clock for bucket values interconnect: qcom: icc-rpm: Support multiple buckets interconnect: qcom: icc-rpm: Change to use qcom_icc_xlate_extended() interconnect: qcom: Move qcom_icc_xlate_extended() to a common file dt-bindings: interconnect: Update property for icc-rpm path tag interconnect: icc-rpm: Set destination bandwidth as well as source bandwidth interconnect: qcom: msm8939: Use icc_sync_state interconnect: add device managed bulk API dt-bindings: interconnect: add fsl,imx8mp.h dt-bindings: interconnect: imx8m: Add bindings for imx8mp noc interconnect: qcom: Add SM6350 driver support dt-bindings: interconnect: Add Qualcomm SM6350 NoC support dt-bindings: interconnect: qcom: Split out rpmh-common bindings interconnect: qcom: icc-rpmh: Support child NoC device probe
2022-07-15net: ipv4: new arp_accept option to accept garp only if in-networkJaehee Park
In many deployments, we want the option to not learn a neighbor from garp if the src ip is not in the same subnet as an address configured on the interface that received the garp message. net.ipv4.arp_accept sysctl is currently used to control creation of a neigh from a received garp packet. This patch adds a new option '2' to net.ipv4.arp_accept which extends option '1' by including the subnet check. Signed-off-by: Jaehee Park <jhpark1013@gmail.com> Suggested-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-15tcp/udp: Make early_demux back namespacified.Kuniyuki Iwashima
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for TCP/UDP in commit dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp"). However, the .proc_handler() was wrong and actually disabled us from changing the behaviour in each netns. We can execute early_demux if net.ipv4.ip_early_demux is on and each proto .early_demux() handler is not NULL. When we toggle (tcp|udp)_early_demux, the change itself is saved in each netns variable, but the .early_demux() handler is a global variable, so the handler is switched based on the init_net's sysctl variable. Thus, netns (tcp|udp)_early_demux knobs have nothing to do with the logic. Whether we CAN execute proto .early_demux() is always decided by init_net's sysctl knob, and whether we DO it or not is by each netns ip_early_demux knob. This patch namespacifies (tcp|udp)_early_demux again. For now, the users of the .early_demux() handler are TCP and UDP only, and they are called directly to avoid retpoline. So, we can remove the .early_demux() handler from inet6?_protos and need not dereference them in ip6?_rcv_finish_core(). If another proto needs .early_demux(), we can restore it at that time. Fixes: dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-15tracing/events: Add __vstring() and __assign_vstr() helper macrosSteven Rostedt (Google)
There's several places that open code the following logic: TP_STRUCT__entry(__dynamic_array(char, msg, MSG_MAX)), TP_fast_assign(vsnprintf(__get_str(msg), MSG_MAX, vaf->fmt, *vaf->va);) To load a string created by variable array va_list. The main issue with this approach is that "MSG_MAX" usage in the __dynamic_array() portion. That actually just reserves the MSG_MAX in the event, and even wastes space because there's dynamic meta data also saved in the event to denote the offset and size of the dynamic array. It would have been better to just use a static __array() field. Instead, create __vstring() and __assign_vstr() that work like __string and __assign_str() but instead of taking a destination string to copy, take a format string and a va_list pointer and fill in the values. It uses the helper: #define __trace_event_vstr_len(fmt, va) \ ({ \ va_list __ap; \ int __ret; \ \ va_copy(__ap, *(va)); \ __ret = vsnprintf(NULL, 0, fmt, __ap) + 1; \ va_end(__ap); \ \ min(__ret, TRACE_EVENT_STR_MAX); \ }) To figure out the length to store the string. It may be slightly slower as it needs to run the vsnprintf() twice, but it now saves space on the ring buffer. Link: https://lkml.kernel.org/r/20220705224749.053570613@goodmis.org Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Kalle Valo <kvalo@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Arend van Spriel <aspriel@gmail.com> Cc: Franky Lin <franky.lin@broadcom.com> Cc: Hante Meuleman <hante.meuleman@broadcom.com> Cc: Gregory Greenman <gregory.greenman@intel.com> Cc: Peter Chen <peter.chen@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mathias Nyman <mathias.nyman@intel.com> Cc: Chunfeng Yun <chunfeng.yun@mediatek.com> Cc: Bin Liu <b-liu@ti.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <a@unstable.cc> Cc: Sven Eckelmann <sven@narfation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-15acl: make posix_acl_clone() available to overlayfsChristian Brauner
The ovl_get_acl() function needs to alter the POSIX ACLs retrieved from the lower filesystem. Instead of hand-rolling a overlayfs specific posix_acl_clone() variant allow export it. It's not special and it's not deeply internal anyway. Link: https://lore.kernel.org/r/20220708090134.385160-3-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: linux-unionfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-07-15acl: move idmapped mount fixup into vfs_{g,s}etxattr()Christian Brauner
This cycle we added support for mounting overlayfs on top of idmapped mounts. Recently I've started looking into potential corner cases when trying to add additional tests and I noticed that reporting for POSIX ACLs is currently wrong when using idmapped layers with overlayfs mounted on top of it. I'm going to give a rather detailed explanation to both the origin of the problem and the solution. Let's assume the user creates the following directory layout and they have a rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would expect files on your host system to be owned. For example, ~/.bashrc for your regular user would be owned by 1000:1000 and /root/.bashrc would be owned by 0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs filesystem. The user chooses to set POSIX ACLs using the setfacl binary granting the user with uid 4 read, write, and execute permissions for their .bashrc file: setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc Now they to expose the whole rootfs to a container using an idmapped mount. So they first create: mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap} mkdir -pv /vol/contpool/ctrover/{over,work} chown 10000000:10000000 /vol/contpool/ctrover/{over,work} The user now creates an idmapped mount for the rootfs: mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \ /var/lib/lxc/c2/rootfs \ /vol/contpool/lowermap This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at /vol/contpool/lowermap/home/ubuntu/.bashrc. Assume the user wants to expose these idmapped mounts through an overlayfs mount to a container. mount -t overlay overlay \ -o lowerdir=/vol/contpool/lowermap, \ upperdir=/vol/contpool/overmap/over, \ workdir=/vol/contpool/overmap/work \ /vol/contpool/merge The user can do this in two ways: (1) Mount overlayfs in the initial user namespace and expose it to the container. (2) Mount overlayfs on top of the idmapped mounts inside of the container's user namespace. Let's assume the user chooses the (1) option and mounts overlayfs on the host and then changes into a container which uses the idmapping 0:10000000:65536 which is the same used for the two idmapped mounts. Now the user tries to retrieve the POSIX ACLs using the getfacl command getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc and to their surprise they see: # file: vol/contpool/merge/home/ubuntu/.bashrc # owner: 1000 # group: 1000 user::rw- user:4294967295:rwx group::r-- mask::rwx other::r-- indicating the the uid wasn't correctly translated according to the idmapped mount. The problem is how we currently translate POSIX ACLs. Let's inspect the callchain in this example: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */ sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get == ovl_posix_acl_xattr_get() | -> ovl_xattr_get() | -> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get() /* lower filesystem callback */ |> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 4); /* FAILURE */ -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4); } If the user chooses to use option (2) and mounts overlayfs on top of idmapped mounts inside the container things don't look that much better: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get == ovl_posix_acl_xattr_get() | -> ovl_xattr_get() | -> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get() /* lower filesystem callback */ |> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns, 4); /* FAILURE */ -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4); } As is easily seen the problem arises because the idmapping of the lower mount isn't taken into account as all of this happens in do_gexattr(). But do_getxattr() is always called on an overlayfs mount and inode and thus cannot possible take the idmapping of the lower layers into account. This problem is similar for fscaps but there the translation happens as part of vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain: setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc The expected outcome here is that we'll receive the cap_net_raw capability as we are able to map the uid associated with the fscap to 0 within our container. IOW, we want to see 0 as the result of the idmapping translations. If the user chooses option (1) we get the following callchain for fscaps: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */ sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() ________________________________ -> cap_inode_getsecurity() | | { V | 10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000); | 10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000); | /* Expected result is 0 and thus that we own the fscap. */ | 0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000); | } | -> vfs_getxattr_alloc() | -> handler->get == ovl_other_xattr_get() | -> vfs_getxattr() | -> xattr_getsecurity() | -> security_inode_getsecurity() | -> cap_inode_getsecurity() | { | 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0); | 10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); | 10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000); | |____________________________________________________________________| } -> vfs_getxattr_alloc() -> handler->get == /* lower filesystem callback */ And if the user chooses option (2) we get: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() _______________________________ -> cap_inode_getsecurity() | | { V | 10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0); | 10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000); | /* Expected result is 0 and thus that we own the fscap. */ | 0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000); | } | -> vfs_getxattr_alloc() | -> handler->get == ovl_other_xattr_get() | |-> vfs_getxattr() | -> xattr_getsecurity() | -> security_inode_getsecurity() | -> cap_inode_getsecurity() | { | 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0); | 10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); | 0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); | |____________________________________________________________________| } -> vfs_getxattr_alloc() -> handler->get == /* lower filesystem callback */ We can see how the translation happens correctly in those cases as the conversion happens within the vfs_getxattr() helper. For POSIX ACLs we need to do something similar. However, in contrast to fscaps we cannot apply the fix directly to the kernel internal posix acl data structure as this would alter the cached values and would also require a rework of how we currently deal with POSIX ACLs in general which almost never take the filesystem idmapping into account (the noteable exception being FUSE but even there the implementation is special) and instead retrieve the raw values based on the initial idmapping. The correct values are then generated right before returning to userspace. The fix for this is to move taking the mount's idmapping into account directly in vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user(). To this end we split out two small and unexported helpers posix_acl_getxattr_idmapped_mnt() and posix_acl_setxattr_idmapped_mnt(). The former to be called in vfs_getxattr() and the latter to be called in vfs_setxattr(). Let's go back to the original example. Assume the user chose option (1) and mounted overlayfs on top of idmapped mounts on the host: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */ sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | |> __vfs_getxattr() | | -> handler->get == ovl_posix_acl_xattr_get() | | -> ovl_xattr_get() | | -> vfs_getxattr() | | |> __vfs_getxattr() | | | -> handler->get() /* lower filesystem callback */ | | |> posix_acl_getxattr_idmapped_mnt() | | { | | 4 = make_kuid(&init_user_ns, 4); | | 10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4); | | 10000004 = from_kuid(&init_user_ns, 10000004); | | |_______________________ | | } | | | | | |> posix_acl_getxattr_idmapped_mnt() | | { | | V | 10000004 = make_kuid(&init_user_ns, 10000004); | 10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004); | 10000004 = from_kuid(&init_user_ns, 10000004); | } |_________________________________________________ | | | | |> posix_acl_fix_xattr_to_user() | { V 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004); /* SUCCESS */ 4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004); } And similarly if the user chooses option (1) and mounted overayfs on top of idmapped mounts inside the container: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | |> __vfs_getxattr() | | -> handler->get == ovl_posix_acl_xattr_get() | | -> ovl_xattr_get() | | -> vfs_getxattr() | | |> __vfs_getxattr() | | | -> handler->get() /* lower filesystem callback */ | | |> posix_acl_getxattr_idmapped_mnt() | | { | | 4 = make_kuid(&init_user_ns, 4); | | 10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4); | | 10000004 = from_kuid(&init_user_ns, 10000004); | | |_______________________ | | } | | | | | |> posix_acl_getxattr_idmapped_mnt() | | { V | 10000004 = make_kuid(&init_user_ns, 10000004); | 10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004); | 10000004 = from_kuid(0(&init_user_ns, 10000004); | |_________________________________________________ | } | | | |> posix_acl_fix_xattr_to_user() | { V 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004); /* SUCCESS */ 4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004); } The last remaining problem we need to fix here is ovl_get_acl(). During ovl_permission() overlayfs will call: ovl_permission() -> generic_permission() -> acl_permission_check() -> check_acl() -> get_acl() -> inode->i_op->get_acl() == ovl_get_acl() > get_acl() /* on the underlying filesystem) ->inode->i_op->get_acl() == /*lower filesystem callback */ -> posix_acl_permission() passing through the get_acl request to the underlying filesystem. This will retrieve the acls stored in the lower filesystem without taking the idmapping of the underlying mount into account as this would mean altering the cached values for the lower filesystem. So we block using ACLs for now until we decided on a nice way to fix this. Note this limitation both in the documentation and in the code. The most straightforward solution would be to have ovl_get_acl() simply duplicate the ACLs, update the values according to the idmapped mount and return it to acl_permission_check() so it can be used in posix_acl_permission() forgetting them afterwards. This is a bit heavy handed but fairly straightforward otherwise. Link: https://github.com/brauner/mount-idmapped/issues/9 Link: https://lore.kernel.org/r/20220708090134.385160-2-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: linux-unionfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-07-15mnt_idmapping: add vfs[g,u]id_into_k[g,u]id()Christian Brauner
Add two tiny helpers to conver a vfsuid into a kuid. Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-07-15Merge tag 'ovl-fixes-5.19-rc7' of ↵Christian Brauner
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into fs.idmapped.overlay.acl Bring in Miklos' tree which contains the temporary fix for POSIX ACLs with overlayfs on top of idmapped layers. We will add a proper fix on top of it and then revert the temporary fix. Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-07-15security: Add LSM hook to setgroups() syscallMicah Morton
Give the LSM framework the ability to filter setgroups() syscalls. There are already analagous hooks for the set*uid() and set*gid() syscalls. The SafeSetID LSM will use this new hook to ensure setgroups() calls are allowed by the installed security policy. Tested by putting print statement in security_task_fix_setgroups() hook and confirming that it gets hit when userspace does a setgroups() syscall. Acked-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Micah Morton <mortonm@chromium.org>
2022-07-15neighbor: tracing: Have neigh_create event use __string()Steven Rostedt (Google)
The dev field of the neigh_create event uses __dynamic_array() with a fixed size, which defeats the purpose of __dynamic_array(). Looking at the logic, as it already uses __assign_str(), just use the same logic in __string to create the size needed. It appears that because "dev" can be NULL, it needs the check. But __string() can have the same checks as __assign_str() so use them there too. Link: https://lkml.kernel.org/r/20220705183741.35387e3f@rorschach.local.home Cc: David Ahern <dsahern@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-15tracing/ipv4/ipv6: Use static array for name field in fib*_lookup_table eventSteven Rostedt (Google)
The fib_lookup_table and fib6_lookup_table events declare name as a dynamic_array, but also give it a fixed size, which defeats the purpose of the dynamic array, especially since the dynamic array also includes meta data in the event to specify its size. Since the size of the name is at most 16 bytes (defined by IFNAMSIZ), it is not worth spending the effort to determine the size of the string. Just use a fixed size array and copy into it. This will save 4 bytes that are used for the meta data that saves the size and position of a dynamic array, and even slightly speed up the event processing. Link: https://lkml.kernel.org/r/20220704091436.3705edbf@rorschach.local.home Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "RISC-V: - Fix missing PAGE_PFN_MASK - Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests() x86: - Fix for nested virtualization when TSC scaling is active - Estimate the size of fastcc subroutines conservatively, avoiding disastrous underestimation when return thunks are enabled - Avoid possible use of uninitialized fields of 'struct kvm_lapic_irq' Generic: - Mark as such the boolean values available from the statistics file descriptors - Clarify statistics documentation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: emulate: do not adjust size of fastop and setcc subroutines KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op() Documentation: kvm: clarify histogram units kvm: stats: tell userspace which values are boolean x86/kvm: fix FASTOP_SIZE when return thunks are enabled KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1 RISC-V: KVM: Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests() riscv: Fix missing PAGE_PFN_MASK
2022-07-15Merge tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fix from Ilya Dryomov: "A folio locking fixup that Xiubo and David cooperated on, marked for stable. Most of it is in netfs but I picked it up into ceph tree on agreement with David" * tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client: netfs: do not unlock and put the folio twice
2022-07-15firmware: arm_scmi: Get detailed power scale from perfLukasz Luba
In SCMI v3.1 the power scale can be in micro-Watts. The upper layers, e.g. cpufreq and EM should handle received power values properly (upscale when needed). Thus, provide an interface which allows to check what is the scale for power values. The old interface allowed to distinguish between bogo-Watts and milli-Watts only (which was good for older SCMI spec). Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-15PM: EM: convert power field to micro-Watts precision and align driversLukasz Luba
The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-15Merge tag 'soc-fixes-5.19-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "Most of the contents are bugfixes for the devicetree files: - A Qualcomm MSM8974 pin controller regression, caused by a cleanup patch that gets partially reverted here. - Missing properties for Broadcom BCM49xx to fix timer detection and SMP boot. - Fix touchscreen pinctrl for imx6ull-colibri board - Multiple fixes for Rockchip rk3399 based machines including the vdu clock-rate fix, otg port fix on Quartz64-A and ethernet on Quartz64-B - Fixes for misspelled DT contents causing minor problems on imx6qdl-ts7970m, orangepi-zero, sama5d2, kontron-kswitch-d10, and ls1028a And a couple of changes elsewhere: - Fix binding for Allwinner D1 display pipeline - Trivial code fixes to the TEE and reset controller driver subsystems and the rockchip platform code. - Multiple updates to the MAINTAINERS files, marking the Palm Treo support as orphaned, and fixing some entries for added or changed file names" * tag 'soc-fixes-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits) arm64: dts: broadcom: bcm4908: Fix cpu node for smp boot arm64: dts: broadcom: bcm4908: Fix timer node for BCM4906 SoC ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero ARM: dts: at91: sama5d2: Fix typo in i2s1 node tee: tee_get_drvdata(): fix description of return value optee: Remove duplicate 'of' in two places. ARM: dts: kswitch-d10: use open drain mode for coma-mode pins ARM: dts: colibri-imx6ull: fix snvs pinmux group optee: smc_abi.c: fix wrong pointer passed to IS_ERR/PTR_ERR() MAINTAINERS: add polarfire rng, pci and clock drivers MAINTAINERS: mark ARM/PALM TREO SUPPORT orphan ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count arm64: dts: ls1028a: Update SFP node to include clock dt-bindings: display: sun4i: Fix D1 pipeline count ARM: dts: qcom: msm8974: re-add missing pinctrl reset: Fix devm bulk optional exclusive control getter MAINTAINERS: rectify entry for SYNOPSYS AXS10x RESET CONTROLLER DRIVER ARM: rockchip: Add missing of_node_put() in rockchip_suspend_init() arm64: dts: rockchip: Assign RK3399 VDU clock rate arm64: dts: rockchip: Fix Quartz64-A dwc3 otg port behavior ...
2022-07-15media: uapi: move HEVC stateless controls out of stagingBenjamin Gaignard
HEVC uAPI is used by 2 mainline drivers (Hantro, Cedrus) and at least 2 out-of-tree drivers (rkvdec, RPi). The uAPI has been reviewed so it is time to make it 'public' by un-staging it. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Acked-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Tested-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-15media: uapi: Change data_bit_offset definitionBenjamin Gaignard
'F.7.3.6.1 General slice segment header syntax' section of HEVC specification describes that a slice header always end aligned on byte boundary, therefore we only need to provide the data offset in bytes. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Acked-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Tested-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-15media: uapi: HEVC: fix padding in v4l2 control structuresBenjamin Gaignard
Fix padding where needed to remove holes Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Acked-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Tested-by: Jernej Skrabec <jernej.skrabec@gmail.com> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-15media: hantro: Stop using Hantro dedicated controlBenjamin Gaignard
The number of bits to skip in the slice header can be computed in the driver by using sps, pps and decode_params information. This makes it possible to remove Hantro dedicated control. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Acked-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Tested-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-15media: uapi: Move the HEVC stateless control type out of stagingBenjamin Gaignard
Move the HEVC stateless controls types out of staging, and re-number them. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Acked-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Tested-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-15media: uapi: Move parsed HEVC pixel format out of stagingBenjamin Gaignard
Move HEVC pixel format since we are ready to stabilize the uAPI Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Acked-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Tested-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-15media: uapi: Add V4L2_CID_STATELESS_HEVC_ENTRY_POINT_OFFSETS controlBenjamin Gaignard
The number of 'entry point offset' can be very variable. Instead of using a large static array define a v4l2 dynamic array of U32 (V4L2_CTRL_TYPE_U32). The number of entry point offsets is reported by the elems field and in struct v4l2_ctrl_hevc_slice_params.num_entry_point_offsets field. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Acked-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Tested-by: Jernej Skrabec <jernej.skrabec@gmail.com> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-15kexec, KEYS: make the code in bzImage64_verify_sig genericCoiby Xu
commit 278311e417be ("kexec, KEYS: Make use of platform keyring for signature verify") adds platform keyring support on x86 kexec but not arm64. The code in bzImage64_verify_sig uses the keys on the .builtin_trusted_keys, .machine, if configured and enabled, .secondary_trusted_keys, also if configured, and .platform keyrings to verify the signed kernel image as PE file. Cc: kexec@lists.infradead.org Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Reviewed-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Coiby Xu <coxu@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15kexec: clean up arch_kexec_kernel_verify_sigCoiby Xu
Before commit 105e10e2cf1c ("kexec_file: drop weak attribute from functions"), there was already no arch-specific implementation of arch_kexec_kernel_verify_sig. With weak attribute dropped by that commit, arch_kexec_kernel_verify_sig is completely useless. So clean it up. Note later patches are dependent on this patch so it should be backported to the stable tree as well. Cc: stable@vger.kernel.org Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Reviewed-by: Michal Suchanek <msuchanek@suse.de> Acked-by: Baoquan He <bhe@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> [zohar@linux.ibm.com: reworded patch description "Note"] Link: https://lore.kernel.org/linux-integrity/20220714134027.394370-1-coxu@redhat.com/ Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15kexec: drop weak attribute from functionsNaveen N. Rao
Drop __weak attribute from functions in kexec_core.c: - machine_kexec_post_load() - arch_kexec_protect_crashkres() - arch_kexec_unprotect_crashkres() - crash_free_reserved_phys_range() Link: https://lkml.kernel.org/r/c0f6219e03cb399d166d518ab505095218a902dd.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15kexec_file: drop weak attribute from functionsNaveen N. Rao
As requested (http://lkml.kernel.org/r/87ee0q7b92.fsf@email.froward.int.ebiederm.org), this series converts weak functions in kexec to use the #ifdef approach. Quoting the 3e35142ef99fe ("kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]") changelog: : Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") : [1], binutils (v2.36+) started dropping section symbols that it thought : were unused. This isn't an issue in general, but with kexec_file.c, gcc : is placing kexec_arch_apply_relocations[_add] into a separate : .text.unlikely section and the section symbol ".text.unlikely" is being : dropped. Due to this, recordmcount is unable to find a non-weak symbol in : .text.unlikely to generate a relocation record against. This patch (of 2); Drop __weak attribute from functions in kexec_file.c: - arch_kexec_kernel_image_probe() - arch_kimage_file_post_load_cleanup() - arch_kexec_kernel_image_load() - arch_kexec_locate_mem_hole() - arch_kexec_kernel_verify_sig() arch_kexec_kernel_image_load() calls into kexec_image_load_default(), so drop the static attribute for the latter. arch_kexec_kernel_verify_sig() is not overridden by any architecture, so drop the __weak attribute. Link: https://lkml.kernel.org/r/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/2cd7ca1fe4d6bb6ca38e3283c717878388ed6788.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15drivers/base: fix userspace break from using bin_attributes for cpumap and ↵Phil Auld
cpulist Using bin_attributes with a 0 size causes fstat and friends to return that 0 size. This breaks userspace code that retrieves the size before reading the file. Rather than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI") let's put in a size value at compile time. For cpulist the maximum size is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1)/2 which for 8192 is 20480 (8192 * 5)/2. In order to get near that you'd need a system with every other CPU on one node. For example: (0,2,4,8, ... ). To simplify the math and support larger NR_CPUS in the future we are using (NR_CPUS * 7)/2. We also set it to a min of PAGE_SIZE to retain the older behavior for smaller NR_CPUS. The cpumap file the size works out to be NR_CPUS/4 + NR_CPUS/32 - 1 (or NR_CPUS * 9/32 - 1) including the ","s. Add a set of macros for these values to cpumask.h so they can be used in multiple places. Apply these to the handful of such files in drivers/base/topology.c as well as node.c. As an example, on an 80 cpu 4-node system (NR_CPUS == 8192): before: -r--r--r--. 1 root root 0 Jul 12 14:08 system/node/node0/cpulist -r--r--r--. 1 root root 0 Jul 11 17:25 system/node/node0/cpumap after: -r--r--r--. 1 root root 28672 Jul 13 11:32 system/node/node0/cpulist -r--r--r--. 1 root root 4096 Jul 13 11:31 system/node/node0/cpumap CONFIG_NR_CPUS = 16384 -r--r--r--. 1 root root 57344 Jul 13 14:03 system/node/node0/cpulist -r--r--r--. 1 root root 4607 Jul 13 14:02 system/node/node0/cpumap The actual number of cpus doesn't matter for the reported size since they are based on NR_CPUS. Fixes: 75bd50fa841d ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI") Fixes: bb9ec13d156e ("topology: use bin_attribute to break the size limitation of cpumap ABI") Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Yury Norov <yury.norov@gmail.com> Cc: stable@vger.kernel.org Acked-by: Yury Norov <yury.norov@gmail.com> (for include/linux/cpumask.h) Signed-off-by: Phil Auld <pauld@redhat.com> Link: https://lore.kernel.org/r/20220715134924.3466194-1-pauld@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-15firmware: stratix10-svc: fix kernel-doc warningDinh Nguyen
include/linux/firmware/intel/stratix10-svc-client.h:55: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Flag bit for COMMAND_RECONFIG Fixes: 4a4709d470e6 ("firmware: stratix10-svc: add new FCS commands") Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20220715150349.2413994-1-dinguyen@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-15media: uapi: HEVC: Define V4L2_CID_STATELESS_HEVC_SLICE_PARAMS as a dynamic ↵Benjamin Gaignard
array Make explicit that V4L2_CID_STATELESS_HEVC_SLICE_PARAMS control is a dynamic array control type. Some drivers may be able to receive multiple slices in one control to improve decoding performance. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>