summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2022-11-08Merge git://linuxtv.org/sailus/media_tree into media_stageMauro Carvalho Chehab
* git://linuxtv.org/sailus/media_tree: (47 commits) media: i2c: ov4689: code cleanup media: ov9650: Drop platform data code path media: ov7670: Drop unused include media: ov2640: Drop legacy includes media: tc358746: add Toshiba TC358746 Parallel to CSI-2 bridge driver media: dt-bindings: add bindings for Toshiba TC358746 phy: dphy: add support to calculate the timing based on hs_clk_rate phy: dphy: refactor get_default_config v4l: subdev: Warn if disabling streaming failed, return success dw9768: Enable low-power probe on ACPI media: i2c: imx290: Replace GAIN control with ANALOGUE_GAIN media: i2c: imx290: Add crop selection targets support media: i2c: imx290: Factor out format retrieval to separate function media: i2c: imx290: Move registers with fixed value to init array media: i2c: imx290: Create controls for fwnode properties media: i2c: imx290: Implement HBLANK and VBLANK controls media: i2c: imx290: Split control initialization to separate function media: i2c: imx290: Fix max gain value media: i2c: imx290: Add exposure time control media: i2c: imx290: Define more register macros ...
2022-11-07mm/percpu: remove unused PERCPU_DYNAMIC_EARLY_SLOTSBaoquan He
Since commit 40064aeca35c ("percpu: replace area map allocator with bitmap"), there's no place to use PERCPU_DYNAMIC_EARLY_SLOTS. So clean it up. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>
2022-11-07mtd: nand: drop EXPORT_SYMBOL_GPL for nanddev_erase()Dario Binacchi
This function is only used within this module, so it is no longer necessary to use EXPORT_SYMBOL_GPL(). Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20221018170205.1733958-1-dario.binacchi@amarulasolutions.com
2022-11-07arm_pmu: rework ACPI probingMark Rutland
The current ACPI PMU probing logic tries to associate PMUs with CPUs when the CPU is first brought online, in order to handle late hotplug, though PMUs are only registered during early boot, and so for late hotplugged CPUs this can only associate the CPU with an existing PMU. We tried to be clever and the have the arm_pmu_acpi_cpu_starting() callback allocate a struct arm_pmu when no matching instance is found, in order to avoid duplication of logic. However, as above this doesn't do anything useful for late hotplugged CPUs, and this requires us to allocate memory in an atomic context, which is especially problematic for PREEMPT_RT, as reported by Valentin and Pierre. This patch reworks the probing to detect PMUs for all online CPUs in the arm_pmu_acpi_probe() function, which is more aligned with how DT probing works. The arm_pmu_acpi_cpu_starting() callback only tries to associate CPUs with an existing arm_pmu instance, avoiding the problem of allocating in atomic context. Note that as we didn't previously register PMUs for late-hotplugged CPUs, this change doesn't result in a loss of existing functionality, though we will now warn when we cannot associate a CPU with a PMU. This change allows us to pull the hotplug callback registration into the arm_pmu_acpi_probe() function, as we no longer need the callbacks to be invoked shortly after probing the boot CPUs, and can register it without invoking the calls. For the moment the arm_pmu_acpi_init() initcall remains to register the SPE PMU, though in future this should probably be moved elsewhere (e.g. the arm64 ACPI init code), since this doesn't need to be tied to the regular CPU PMU code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lore.kernel.org/r/20210810134127.1394269-2-valentin.schneider@arm.com/ Reported-by: Pierre Gondois <pierre.gondois@arm.com> Link: https://lore.kernel.org/linux-arm-kernel/20220912155105.1443303-1-pierre.gondois@arm.com/ Cc: Pierre Gondois <pierre.gondois@arm.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Will Deacon <will@kernel.org> Reviewed-and-tested-by: Pierre Gondois <pierre.gondois@arm.com> Link: https://lore.kernel.org/r/20220930111844.1522365-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-07ACPI: ARM Performance Monitoring Unit Table (APMT) initial supportBesar Wicaksono
ARM Performance Monitoring Unit Table describes the properties of PMU support in ARM-based system. The APMT table contains a list of nodes, each represents a PMU in the system that conforms to ARM CoreSight PMU architecture. The properties of each node include information required to access the PMU (e.g. MMIO base address, interrupt number) and also identification. For more detailed information, please refer to the specification below: * APMT: https://developer.arm.com/documentation/den0117/latest * ARM Coresight PMU: https://developer.arm.com/documentation/ihi0091/latest The initial support adds the detection of APMT table and generic infrastructure to create platform devices for ARM CoreSight PMUs. Similar to IORT the root pointer of APMT is preserved during runtime and each PMU platform device is given a pointer to the corresponding APMT node. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20220929002834.32664-1-bwicaksono@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-07can: dev: fix skb drop checkOliver Hartkopp
In commit a6d190f8c767 ("can: skb: drop tx skb if in listen only mode") the priv->ctrlmode element is read even on virtual CAN interfaces that do not create the struct can_priv at startup. This out-of-bounds read may lead to CAN frame drops for virtual CAN interfaces like vcan and vxcan. This patch mainly reverts the original commit and adds a new helper for CAN interface drivers that provide the required information in struct can_priv. Fixes: a6d190f8c767 ("can: skb: drop tx skb if in listen only mode") Reported-by: Dariusz Stojaczyk <Dariusz.Stojaczyk@opensynergy.com> Cc: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Cc: Max Staudt <max@enpas.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/all/20221102095431.36831-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org # 6.0.x [mkl: patch pch_can, too] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-11-04lsm: make security_socket_getpeersec_stream() sockptr_t safePaul Moore
Commit 4ff09db1b79b ("bpf: net: Change sk_getsockopt() to take the sockptr_t argument") made it possible to call sk_getsockopt() with both user and kernel address space buffers through the use of the sockptr_t type. Unfortunately at the time of conversion the security_socket_getpeersec_stream() LSM hook was written to only accept userspace buffers, and in a desire to avoid having to change the LSM hook the commit author simply passed the sockptr_t's userspace buffer pointer. Since the only sk_getsockopt() callers at the time of conversion which used kernel sockptr_t buffers did not allow SO_PEERSEC, and hence the security_socket_getpeersec_stream() hook, this was acceptable but also very fragile as future changes presented the possibility of silently passing kernel space pointers to the LSM hook. There are several ways to protect against this, including careful code review of future commits, but since relying on code review to catch bugs is a recipe for disaster and the upstream eBPF maintainer is "strongly against defensive programming", this patch updates the LSM hook, and all of the implementations to support sockptr_t and safely handle both user and kernel space buffers. Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-04bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)Peter Zijlstra
The dispatcher function is currently abusing the ftrace __fentry__ call location for its own purposes -- this obviously gives trouble when the dispatcher and ftrace are both in use. A previous solution tried using __attribute__((patchable_function_entry())) which works, except it is GCC-8+ only, breaking the build on the earlier still supported compilers. Instead use static_call() -- which has its own annotations and does not conflict with ftrace -- to rewrite the dispatch function. By using: return static_call()(ctx, insni, bpf_func) you get a perfect forwarding tail call as function body (iow a single jmp instruction). By having the default static_call() target be bpf_dispatcher_nop_func() it retains the default behaviour (an indirect call to the argument function). Only once a dispatcher program is attached is the target rewritten to directly call the JIT'ed image. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/Y1/oBlK0yFk5c/Im@hirez.programming.kicks-ass.net Link: https://lore.kernel.org/bpf/20221103120647.796772565@infradead.org
2022-11-04bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")Peter Zijlstra
Because __attribute__((patchable_function_entry)) is only available since GCC-8 this solution fails to build on the minimum required GCC version. Undo these changes so we might try again -- without cluttering up the patches with too many changes. This is an almost complete revert of: dbe69b299884 ("bpf: Fix dispatcher patchable function entry to 5 bytes nop") ceea991a019c ("bpf: Move bpf_dispatcher function out of ftrace locations") (notably the arch/x86/Kconfig hunk is kept). Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/439d8dc735bb4858875377df67f1b29a@AcuMS.aculab.com Link: https://lore.kernel.org/bpf/20221103120647.728830733@infradead.org
2022-11-04Merge tag 'hardening-v6.1-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fix from Kees Cook: - Correctly report struct member size on memcpy overflow (Kees Cook) * tag 'hardening-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: fortify: Capture __bos() results in const temp vars
2022-11-04Merge tag 'efi-fixes-for-v6.1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - A pair of tweaks to the EFI random seed code so that externally provided version of this config table are handled more robustly - Another fix for the v6.0 EFI variable refactor that turned out to break Apple machines which don't provide QueryVariableInfo() - Add some guard rails to the EFI runtime service call wrapper so we can recover from synchronous exceptions caused by firmware * tag 'efi-fixes-for-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: arm64: efi: Recover from synchronous exceptions occurring in firmware efi: efivars: Fix variable writes with unsupported query_variable_store() efi: random: Use 'ACPI reclaim' memory for random seed efi: random: reduce seed size to 32 bytes efi/tpm: Pass correct address to memblock_reserve
2022-11-04mm/slab: remove !CONFIG_TRACING variants of kmalloc_[node_]trace()Vlastimil Babka
For !CONFIG_TRACING kernels, the kmalloc() implementation tries (in cases where the allocation size is build-time constant) to save a function call, by inlining kmalloc_trace() to a kmem_cache_alloc() call. However since commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") this path now fails to pass the original request size to be eventually recorded (for kmalloc caches with debugging enabled). We could adjust the code to call __kmem_cache_alloc_node() as the CONFIG_TRACING variant, but that would as a result inline a call with 5 parameters, bloating the kmalloc() call sites. The cost of extra function call (to kmalloc_trace()) seems like a lesser evil. It also appears that the !CONFIG_TRACING variant is incompatible with upcoming hardening efforts [1] so it's easier if we just remove it now. Kernels with no tracing are rare these days and the benefit is dubious anyway. [1] https://lore.kernel.org/linux-mm/20221101222520.never.109-kees@kernel.org/T/#m20ecf14390e406247bde0ea9cce368f469c539ed Link: https://lore.kernel.org/all/097d8fba-bd10-a312-24a3-a4068c4f424c@suse.cz/ Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-11-04spi: Merge spi_controller.{slave,target}_abort()Geert Uytterhoeven
Mixing SPI slave/target handlers and SPI slave/target controllers using legacy and modern naming does not work well: there are now two different callbacks for aborting a slave/target operation, of which only one is populated, while spi_{slave,target}_abort() check and use only one, which may be the unpopulated one. Fix this by merging the slave/target abort callbacks into a single callback using a union, like is already done for the slave/target flags. Fixes: b8d3b056a78dcc94 ("spi: introduce new helpers with using modern naming") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/809c82d54b85dd87ef7ee69fc93016085be85cec.1667555967.git.geert+renesas@glider.be Signed-off-by: Mark Brown <broonie@kernel.org>
2022-11-04Merge tag 'drm-intel-gt-next-2022-11-03' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: - Fix for #7306: [Arc A380] white flickering when using arc as a secondary gpu (Matt A) - Add Wa_18017747507 for DG2 (Wayne) - Avoid spurious WARN on DG1 due to incorrect cache_dirty flag (Niranjana, Matt A) - Corrections to CS timestamp support for Gen5 and earlier (Ville) - Fix a build error used with clang compiler on hwmon (GG) - Improvements to LMEM handling with RPM (Anshuman, Matt A) - Cleanups in dmabuf code (Mike) - Selftest improvements (Matt A) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Y2N11wu175p6qeEN@jlahtine-mobl.ger.corp.intel.com
2022-11-03Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2022-11-04 We've added 8 non-merge commits during the last 3 day(s) which contain a total of 10 files changed, 113 insertions(+), 16 deletions(-). The main changes are: 1) Fix memory leak upon allocation failure in BPF verifier's stack state tracking, from Kees Cook. 2) Fix address leakage when BPF progs release reference to an object, from Youlin Li. 3) Fix BPF CI breakage from buggy in.h uapi header dependency, from Andrii Nakryiko. 4) Fix bpftool pin sub-command's argument parsing, from Pu Lehui. 5) Fix BPF sockmap lockdep warning by cancelling psock work outside of socket lock, from Cong Wang. 6) Follow-up for BPF sockmap to fix sk_forward_alloc accounting, from Wang Yufen. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add verifier test for release_reference() bpf: Fix wrong reg type conversion in release_reference() bpf, sock_map: Move cancel_work_sync() out of sock lock tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI net/ipv4: Fix linux/in.h header dependencies bpftool: Fix NULL pointer dereference when pin {PROG, MAP, LINK} without FILE bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues bpf, verifier: Fix memory leak in array reallocation for stack state ==================== Link: https://lore.kernel.org/r/20221104000445.30761-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03regulator: userspace-consumer: Handle regulator-output DT nodesZev Weiss
In addition to adding some fairly simple OF support code, we make some slight adjustments to the userspace-consumer driver to properly support use with regulator-output hardware: - We now do an exclusive get of the supply regulators so as to prevent regulator_init_complete_work from automatically disabling them. - Instead of assuming that the supply is initially disabled, we now query its state to determine the initial value of drvdata->enabled. Signed-off-by: Zev Weiss <zev@bewilderbeest.net> Link: https://lore.kernel.org/r/20221031233704.22575-4-zev@bewilderbeest.net Signed-off-by: Mark Brown <broonie@kernel.org>
2022-11-03regulator: devres: Add devm_regulator_bulk_get_exclusive()Zev Weiss
We had an exclusive variant of the devm_regulator_get() API, but no corresponding variant for the bulk API; let's add one now. We add a generalized version of the existing regulator_bulk_get() function that additionally takes a get_type parameter and redefine regulator_bulk_get() in terms of it, then do similarly with devm_regulator_bulk_get(), and finally add the new devm_regulator_bulk_get_exclusive(). Signed-off-by: Zev Weiss <zev@bewilderbeest.net> Link: https://lore.kernel.org/r/20221031233704.22575-2-zev@bewilderbeest.net Signed-off-by: Mark Brown <broonie@kernel.org>
2022-11-03bpf, sock_map: Move cancel_work_sync() out of sock lockCong Wang
Stanislav reported a lockdep warning, which is caused by the cancel_work_sync() called inside sock_map_close(), as analyzed below by Jakub: psock->work.func = sk_psock_backlog() ACQUIRE psock->work_mutex sk_psock_handle_skb() skb_send_sock() __skb_send_sock() sendpage_unlocked() kernel_sendpage() sock->ops->sendpage = inet_sendpage() sk->sk_prot->sendpage = tcp_sendpage() ACQUIRE sk->sk_lock tcp_sendpage_locked() RELEASE sk->sk_lock RELEASE psock->work_mutex sock_map_close() ACQUIRE sk->sk_lock sk_psock_stop() sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED) cancel_work_sync() __cancel_work_timer() __flush_work() // wait for psock->work to finish RELEASE sk->sk_lock We can move the cancel_work_sync() out of the sock lock protection, but still before saved_close() was called. Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()") Reported-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20221102043417.279409-1-xiyou.wangcong@gmail.com
2022-11-02blk-mq: add tagset quiesce interfaceChao Leng
Drivers that have shared tagsets may need to quiesce potentially a lot of request queues that all share a single tagset (e.g. nvme). Add an interface to quiesce all the queues on a given tagset. This interface is useful because it can speedup the quiesce by doing it in parallel. Because some queues should not need to be quiesced (e.g. the nvme connect_q) when quiescing the tagset, introduce a QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag to allow this new interface to ski quiescing a particular queue. Signed-off-by: Chao Leng <lengchao@huawei.com> [hch: simplify for the per-tag_set srcu_struct] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Chao Leng <lengchao@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-02blk-mq: pass a tagset to blk_mq_wait_quiesce_doneChristoph Hellwig
Nothing in blk_mq_wait_quiesce_done needs the request_queue now, so just pass the tagset, and move the non-mq check into the only caller that needs it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chao Leng <lengchao@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-02blk-mq: move the srcu_struct used for quiescing to the tagsetChristoph Hellwig
All I/O submissions have fairly similar latencies, and a tagset-wide quiesce is a fairly common operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Chao Leng <lengchao@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-12-hch@lst.de [axboe: fix whitespace] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-02memory: omap-gpmc: wait pin additionsBenedikt Niedermayr
This patch introduces support for setting the wait-pin polarity as well as using the same wait-pin for different CS regions. The waitpin polarity can be configured via the WAITPIN<X>POLARITY bits in the GPMC_CONFIG register. This is currently not supported by the driver. This patch adds support for setting the required register bits with the "ti,wait-pin-polarity" dt-property. The wait-pin can also be shared between different CS regions for special usecases. Therefore GPMC must keep track of wait-pin allocations, so it knows that either GPMC itself or another driver has the ownership. Signed-off-by: Benedikt Niedermayr <benedikt.niedermayr@siemens.com> Link: https://lore.kernel.org/r/20221102133047.1654449-2-benedikt.niedermayr@siemens.com Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2022-11-02device property: Introduce fwnode_device_is_compatible() helperAndy Shevchenko
The fwnode_device_is_compatible() helper searches for the given string in the "compatible" string array property and, if found, returns true. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
2022-11-02cpufreq: Generalize of_perf_domain_get_sharing_cpumask phandle formatHector Martin
of_perf_domain_get_sharing_cpumask currently assumes a 1-argument phandle format, and directly returns the argument. Generalize this to return the full of_phandle_args, so it can be used by drivers which use other phandle styles (e.g. separate nodes). This also requires changing the CPU sharing match to compare the full args structure. Also, make sure to of_node_put(args.np) (the original code was leaking a reference). Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2022-11-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "x86: - fix lock initialization race in gfn-to-pfn cache (+selftests) - fix two refcounting errors - emulator fixes - mask off reserved bits in CPUID - fix bug with disabling SGX RISC-V: - update MAINTAINERS" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/xen: Fix eventfd error handling in kvm_xen_eventfd_assign() KVM: x86: smm: number of GPRs in the SMRAM image depends on the image format KVM: x86: emulator: update the emulation mode after CR0 write KVM: x86: emulator: update the emulation mode after rsm KVM: x86: emulator: introduce emulator_recalc_and_set_mode KVM: x86: emulator: em_sysexit should update ctxt->mode KVM: selftests: Mark "guest_saw_irq" as volatile in xen_shinfo_test KVM: selftests: Add tests in xen_shinfo_test to detect lock races KVM: Reject attempts to consume or refresh inactive gfn_to_pfn_cache KVM: Initialize gfn_to_pfn_cache locks in dedicated helper KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailable KVM: x86: Exempt pending triple fault from event injection sanity check MAINTAINERS: git://github -> https://github.com for kvm-riscv KVM: debugfs: Return retval of simple_attr_open() if it fails KVM: x86: Reduce refcount if single_open() fails in kvm_mmu_rmaps_stat_open() KVM: x86: Mask off reserved bits in CPUID.8000001FH KVM: x86: Mask off reserved bits in CPUID.8000001AH KVM: x86: Mask off reserved bits in CPUID.80000008H KVM: x86: Mask off reserved bits in CPUID.80000006H KVM: x86: Mask off reserved bits in CPUID.80000001H
2022-11-01spi: introduce new helpers with using modern namingYang Yingliang
For using modern names host/target to instead of all the legacy names, I think it takes 3 steps: - step1: introduce new helpers with modern naming. - step2: switch to use these new helpers in all drivers. - step3: remove all legacy helpers and update all legacy names. This patch is for step1, it introduces new helpers with host/target naming for drivers using. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221011092204.950288-1-yangyingliang@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-10-31cgroup: cgroup refcnt functions should be exported when CONFIG_DEBUG_CGROUP_REFTejun Heo
6ab428604f72 ("cgroup: Implement DEBUG_CGROUP_REF") added a config option which forces cgroup refcnt functions to be not inlined so that they can be kprobed for debugging. However, it forgot export them when the config is enabled breaking modules which make use of css reference counting. Fix it by adding CGROUP_REF_EXPORT() macro to cgroup_refcnt.h which is defined to EXPORT_SYMBOL_GPL when CONFIG_DEBUG_CGROUP_REF is set. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 6ab428604f72 ("cgroup: Implement DEBUG_CGROUP_REF")
2022-10-31fs: introduce dedicated idmap type for mountsChristian Brauner
Last cycle we've already made the interaction with idmapped mounts more robust and type safe by introducing the vfs{g,u}id_t type. This cycle we concluded the conversion and removed the legacy helpers. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate filesystem and mount namespaces and what different roles they have to play. Especially for filesystem developers without much experience in this area this is an easy source for bugs. Instead of passing the plain namespace we introduce a dedicated type struct mnt_idmap and replace the pointer with a pointer to a struct mnt_idmap. There are no semantic or size changes for the mount struct caused by this. We then start converting all places aware of idmapped mounts to rely on struct mnt_idmap. Once the conversion is done all helpers down to the really low-level make_vfs{g,u}id() and from_vfs{g,u}id() will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two, removing and thus eliminating the possibility of any bugs. Fwiw, I fixed some issues in that area a while ago in ntfs3 and ksmbd in the past. Afterwards, only low-level code can ultimately use the associated namespace for any permission checks. Even most of the vfs can be ultimately completely oblivious about this and filesystems will never interact with it directly in any form in the future. A struct mnt_idmap currently encompasses a simple refcount and a pointer to the relevant namespace the mount is idmapped to. If a mount isn't idmapped then it will point to a static nop_mnt_idmap. If it is an idmapped mount it will point to a new struct mnt_idmap. As usual there are no allocations or anything happening for non-idmapped mounts. Everthing is carefully written to be a nop for non-idmapped mounts as has always been the case. If an idmapped mount or mount tree is created a new struct mnt_idmap is allocated and a reference taken on the relevant namespace. For each mount in a mount tree that gets idmapped or a mount that inherits the idmap when it is cloned the reference count on the associated struct mnt_idmap is bumped. Just a reminder that we only allow a mount to change it's idmapping a single time and only if it hasn't already been attached to the filesystems and has no active writers. The actual changes are fairly straightforward. This will have huge benefits for maintenance and security in the long run even if it causes some churn. I'm aware that there's some cost for all of you. And I'll commit to doing this work and make this as painless as I can. Note that this also makes it possible to extend struct mount_idmap in the future. For example, it would be possible to place the namespace pointer in an anonymous union together with an idmapping struct. This would allow us to expose an api to userspace that would let it specify idmappings directly instead of having to go through the detour of setting up namespaces at all. This just adds the infrastructure and doesn't do any conversions. Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-31block: simplify blksize_bits() implementationDawei Li
Convert current looping-based implementation into bit operation, which can bring improvement for: 1) bitops is more efficient for its arch-level optimization. 2) Given that blksize_bits() is inline, _if_ @size is compile-time constant, it's possible that order_base_2() _may_ make output compile-time evaluated, depending on code context and compiler behavior. Signed-off-by: Dawei Li <set_pte_at@outlook.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/TYCP286MB23238842958D7C083D6B67CECA349@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-30Merge tag 'fbdev-for-6.1-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev Pull fbdev fixes from Helge Deller: "A use-after-free bugfix in the smscufx driver and various minor error path fixes, smaller build fixes, sysfs fixes and typos in comments in the stifb, sisfb, da8xxfb, xilinxfb, sm501fb, gbefb and cyber2000fb drivers" * tag 'fbdev-for-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbdev: cyber2000fb: fix missing pci_disable_device() fbdev: sisfb: use explicitly signed char fbdev: smscufx: Fix several use-after-free bugs fbdev: xilinxfb: Make xilinxfb_release() return void fbdev: sisfb: fix repeated word in comment fbdev: gbefb: Convert sysfs snprintf to sysfs_emit fbdev: sm501fb: Convert sysfs snprintf to sysfs_emit fbdev: stifb: Fall back to cfb_fillrect() on 32-bit HCRX cards fbdev: da8xx-fb: Fix error handling in .remove() fbdev: MIPS supports iomem addresses
2022-10-30Merge tag 'char-misc-6.1-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc fixes from Greg KH: "Some small driver fixes for 6.1-rc3. They include: - iio driver bugfixes - counter driver bugfixes - coresight bugfixes, including a revert and then a second fix to get it right. All of these have been in linux-next with no reported problems" * tag 'char-misc-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits) misc: sgi-gru: use explicitly signed char coresight: cti: Fix hang in cti_disable_hw() Revert "coresight: cti: Fix hang in cti_disable_hw()" counter: 104-quad-8: Fix race getting function mode and direction counter: microchip-tcb-capture: Handle Signal1 read and Synapse coresight: cti: Fix hang in cti_disable_hw() coresight: Fix possible deadlock with lock dependency counter: ti-ecap-capture: fix IS_ERR() vs NULL check counter: Reduce DEFINE_COUNTER_ARRAY_POLARITY() to defining counter_array iio: bmc150-accel-core: Fix unsafe buffer attributes iio: adxl367: Fix unsafe buffer attributes iio: adxl372: Fix unsafe buffer attributes iio: at91-sama5d2_adc: Fix unsafe buffer attributes iio: temperature: ltc2983: allocate iio channels once tools: iio: iio_utils: fix digit calculation iio: adc: stm32-adc: fix channel sampling time init iio: adc: mcp3911: mask out device ID in debug prints iio: adc: mcp3911: use correct id bits iio: adc: mcp3911: return proper error code on failure to allocate trigger iio: adc: mcp3911: fix sizeof() vs ARRAY_SIZE() bug ...
2022-10-30sched/psi: Use task->psi_flags to clear in CPU migrationChengming Zhou
The commit d583d360a620 ("psi: Fix psi state corruption when schedule() races with cgroup move") fixed a race problem by making cgroup_move_task() use task->psi_flags instead of looking at the scheduler state. We can extend task->psi_flags usage to CPU migration, which should be a minor optimization for performance and code simplicity. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20220926081931.45420-1-zhouchengming@bytedance.com
2022-10-30sched/psi: Stop relying on timer_pending() for poll_work reschedulingSuren Baghdasaryan
Psi polling mechanism is trying to minimize the number of wakeups to run psi_poll_work and is currently relying on timer_pending() to detect when this work is already scheduled. This provides a window of opportunity for psi_group_change to schedule an immediate psi_poll_work after poll_timer_fn got called but before psi_poll_work could reschedule itself. Below is the depiction of this entire window: poll_timer_fn wake_up_interruptible(&group->poll_wait); psi_poll_worker wait_event_interruptible(group->poll_wait, ...) psi_poll_work psi_schedule_poll_work if (timer_pending(&group->poll_timer)) return; ... mod_timer(&group->poll_timer, jiffies + delay); Prior to 461daba06bdc we used to rely on poll_scheduled atomic which was reset and set back inside psi_poll_work and therefore this race window was much smaller. The larger window causes increased number of wakeups and our partners report visible power regression of ~10mA after applying 461daba06bdc. Bring back the poll_scheduled atomic and make this race window even narrower by resetting poll_scheduled only when we reach polling expiration time. This does not completely eliminate the possibility of extra wakeups caused by a race with psi_group_change however it will limit it to the worst case scenario of one extra wakeup per every tracking window (0.5s in the worst case). This patch also ensures correct ordering between clearing poll_scheduled flag and obtaining changed_states using memory barrier. Correct ordering between updating changed_states and setting poll_scheduled is ensured by atomic_xchg operation. By tracing the number of immediate rescheduling attempts performed by psi_group_change and the number of these attempts being blocked due to psi monitor being already active, we can assess the effects of this change: Before the patch: Run#1 Run#2 Run#3 Immediate reschedules attempted: 684365 1385156 1261240 Immediate reschedules blocked: 682846 1381654 1258682 Immediate reschedules (delta): 1519 3502 2558 Immediate reschedules (% of attempted): 0.22% 0.25% 0.20% After the patch: Run#1 Run#2 Run#3 Immediate reschedules attempted: 882244 770298 426218 Immediate reschedules blocked: 881996 769796 426074 Immediate reschedules (delta): 248 502 144 Immediate reschedules (% of attempted): 0.03% 0.07% 0.03% The number of non-blocked immediate reschedules dropped from 0.22-0.25% to 0.03-0.07%. The drop is attributed to the decrease in the race window size and the fact that we allow this race only when psi monitors reach polling window expiration time. Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger scheduling mechanism") Reported-by: Kathleen Chang <yt.chang@mediatek.com> Reported-by: Wenju Xu <wenju.xu@mediatek.com> Reported-by: Jonathan Chen <jonathan.jmchen@mediatek.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: SH Chen <show-hong.chen@mediatek.com> Link: https://lore.kernel.org/r/20221028194541.813985-1-surenb@google.com
2022-10-30sched/psi: Fix avgs_work re-arm in psi_avgs_work()Chengming Zhou
Pavan reported a problem that PSI avgs_work idle shutoff is not working at all. Because PSI_NONIDLE condition would be observed in psi_avgs_work()->collect_percpu_times()->get_recent_times() even if only the kworker running avgs_work on the CPU. Although commit 1b69ac6b40eb ("psi: fix aggregation idle shut-off") avoided the ping-pong wake problem when the worker sleep, psi_avgs_work() still will always re-arm the avgs_work, so shutoff is not working. This patch changes to use PSI_STATE_RESCHEDULE to flag whether to re-arm avgs_work in get_recent_times(). For the current CPU, we re-arm avgs_work only when (NR_RUNNING > 1 || NR_IOWAIT > 0 || NR_MEMSTALL > 0), for other CPUs we can just check PSI_NONIDLE delta. The new flag is only used in psi_avgs_work(), so we check in get_recent_times() that current_work() is avgs_work. One potential problem is that the brief period of non-idle time incurred between the aggregation run and the kworker's dequeue will be stranded in the per-cpu buckets until avgs_work run next time. The buckets can hold 4s worth of time, and future activity will wake the avgs_work with a 2s delay, giving us 2s worth of data we can leave behind when shut off the avgs_work. If the kworker run other works after avgs_work shut off and doesn't have any scheduler activities for 2s, this maybe a problem. Reported-by: Pavan Kondeti <quic_pkondeti@quicinc.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20221014110551.22695-1-zhouchengming@bytedance.com
2022-10-29Merge tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - make the multipath dma alignment match the non-multipath one (Keith Busch) - fix a bogus use of sg_init_marker() (Nam Cao) - fix circulr locking in nvme-tcp (Sagi Grimberg) - Initialization fix for requests allocated via the special hw queue allocator (John) - Fix for a regression added in this release with the batched completions of end_io backed requests (Ming) - Error handling leak fix for rbd (Yang) - Error handling leak fix for add_disk() failure (Yu) * tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux: blk-mq: Properly init requests from blk_mq_alloc_request_hctx() blk-mq: don't add non-pt request with ->end_io to batch rbd: fix possible memory leak in rbd_sysfs_init() nvme-multipath: set queue dma alignment to 3 nvme-tcp: fix possible circular locking when deleting a controller under memory pressure nvme-tcp: replace sg_init_marker() with sg_init_table() block: fix memory leak for elevator on add_disk failure
2022-10-29Merge tag 'mm-hotfixes-stable-2022-10-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "Eight fix pre-6.0 bugs and the remainder address issues which were introduced in the 6.1-rc merge cycle, or address issues which aren't considered sufficiently serious to warrant a -stable backport" * tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits) mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region lib: maple_tree: remove unneeded initialization in mtree_range_walk() mmap: fix remap_file_pages() regression mm/shmem: ensure proper fallback if page faults mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page() x86: fortify: kmsan: fix KMSAN fortify builds x86: asm: make sure __put_user_size() evaluates pointer once Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default x86/purgatory: disable KMSAN instrumentation mm: kmsan: export kmsan_copy_page_meta() mm: migrate: fix return value if all subpages of THPs are migrated successfully mm/uffd: fix vma check on userfault for wp mm: prep_compound_tail() clear page->private mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs mm/page_isolation: fix clang deadcode warning fs/ext4/super.c: remove unused `deprecated_msg' ipc/msg.c: fix percpu_counter use after free memory tier, sysfs: rename attribute "nodes" to "nodelist" MAINTAINERS: git://github.com -> https://github.com for nilfs2 mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops ...
2022-10-28fortify: Capture __bos() results in const temp varsKees Cook
In two recent run-time memcpy() bound checking bug reports (NFS[1] and JFS[2]), the _detection_ was working correctly (in the sense that the requested copy size was larger than the destination field size), but the _warning text_ was showing the destination field size as SIZE_MAX ("unknown size"). This should be impossible, since the detection function will explicitly give up if the destination field size is unknown. For example, the JFS warning was: memcpy: detected field-spanning write (size 132) of single field "ip->i_link" at fs/jfs/namei.c:950 (size 18446744073709551615) Other cases of this warning (e.g.[3]) have reported correctly, and the reproducer only happens under GCC (at least 10.2 and 12.1), so this currently appears to be a GCC bug. Explicitly capturing the __builtin_object_size() results in const temporary variables fixes the report. For example, the JFS reproducer now correctly reports the field size (128): memcpy: detected field-spanning write (size 132) of single field "ip->i_link" at fs/jfs/namei.c:950 (size 128) Examination of the .text delta (which is otherwise identical), shows the literal value used in the report changing: - mov $0xffffffffffffffff,%rcx + mov $0x80,%ecx [1] https://lore.kernel.org/lkml/Y0zEzZwhOxTDcBTB@codemonkey.org.uk/ [2] https://syzkaller.appspot.com/bug?id=23d613df5259b977dac1696bec77f61a85890e3d [3] https://lore.kernel.org/all/202210110948.26b43120-yujie.liu@intel.com/ Cc: "Dr. David Alan Gilbert" <linux@treblig.org> Cc: llvm@lists.linux.dev Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2022-10-28cgroup: Implement DEBUG_CGROUP_REFTejun Heo
It's really difficult to debug when cgroup or css refs leak. Let's add a debug option to force the refcnt function to not be inlined so that they can be kprobed for debugging. Signed-off-by: Tejun Heo <tj@kernel.org>
2022-10-28x86: fortify: kmsan: fix KMSAN fortify buildsAlexander Potapenko
Ensure that KMSAN builds replace memset/memcpy/memmove calls with the respective __msan_XXX functions, and that none of the macros are redefined twice. This should allow building kernel with both CONFIG_KMSAN and CONFIG_FORTIFY_SOURCE. Link: https://lkml.kernel.org/r/20221024212144.2852069-5-glider@google.com Link: https://github.com/google/kmsan/issues/89 Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28mm/uffd: fix vma check on userfault for wpPeter Xu
We used to have a report that pte-marker code can be reached even when uffd-wp is not compiled in for file memories, here: https://lore.kernel.org/all/YzeR+R6b4bwBlBHh@x1n/T/#u I just got time to revisit this and found that the root cause is we simply messed up with the vma check, so that for !PTE_MARKER_UFFD_WP system, we will allow UFFDIO_REGISTER of MINOR & WP upon shmem as the check was wrong: if (vm_flags & VM_UFFD_MINOR) return is_vm_hugetlb_page(vma) || vma_is_shmem(vma); Where we'll allow anything to pass on shmem as long as minor mode is requested. Axel did it right when introducing minor mode but I messed it up in b1f9e876862d when moving code around. Fix it. Link: https://lkml.kernel.org/r/20221024193336.1233616-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20221024193336.1233616-2-peterx@redhat.com Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs") Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28Merge branch 'fs.acl.rework' into for-nextChristian Brauner
2022-10-28acl: make vfs_posix_acl_to_xattr() staticChristian Brauner
After reworking posix acls this helper isn't used anywhere outside the core posix acl paths. Make it static. Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-27Merge tag 'net-6.1-rc3-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from 802.15.4 (Zigbee et al). Current release - regressions: - ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1 Current release - new code bugs: - mptcp: fix abba deadlock on fastopen - eth: stmmac: rk3588: allow multiple gmac controllers in one system Previous releases - regressions: - ip: rework the fix for dflt addr selection for connected nexthop - net: couple more fixes for misinterpreting bits in struct page after the signature was added Previous releases - always broken: - ipv6: ensure sane device mtu in tunnels - openvswitch: switch from WARN to pr_warn on a user-triggerable path - ethtool: eeprom: fix null-deref on genl_info in dump - ieee802154: more return code fixes for corner cases in dgram_sendmsg - mac802154: fix link-quality-indicator recording - eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack offload - eth: fec: limit register access on i.MX6UL - eth: bcm4908_enet: update TX stats after actual transmission - can: rcar_canfd: improve IRQ handling for RZ/G2L Misc: - genetlink: piggy back on the newly added resv_op_start to enforce more sanity checks on new commands" * tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) net: enetc: survive memory pressure without crashing kcm: do not sense pfmemalloc status in kcm_sendpage() net: do not sense pfmemalloc status in skb_append_pagefrags() net/mlx5e: Fix macsec sci endianness at rx sa update net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function net/mlx5e: Fix macsec rx security association (SA) update/delete net/mlx5e: Fix macsec coverity issue at rx sa update net/mlx5: Fix crash during sync firmware reset net/mlx5: Update fw fatal reporter state on PCI handlers successful recover net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed net/mlx5e: TC, Reject forwarding from internal port to internal port net/mlx5: Fix possible use-after-free in async command interface net/mlx5: ASO, Create the ASO SQ with the correct timestamp format net/mlx5e: Update restore chain id for slow path packets net/mlx5e: Extend SKB room check to include PTP-SQ net/mlx5: DR, Fix matcher disconnect error flow net/mlx5: Wait for firmware to enable CRS before pci_restore_state net/mlx5e: Do not increment ESN when updating IPsec ESN state netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed ...
2022-10-27Merge tag 'hardening-v6.1-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Fix older Clang vs recent overflow KUnit test additions (Nick Desaulniers, Kees Cook) - Fix kern-doc visibility for overflow helpers (Kees Cook) * tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: overflow: Refactor test skips for Clang-specific issues overflow: disable failing tests for older clang versions overflow: Fix kern-doc markup for functions
2022-10-27Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds
Pull fscrypt fix from Eric Biggers: "Fix a memory leak that was introduced by a change that went into -rc1" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: fix keyring memory leak on mount failure
2022-10-27perf: Rewrite core context handlingPeter Zijlstra
There have been various issues and limitations with the way perf uses (task) contexts to track events. Most notable is the single hardware PMU task context, which has resulted in a number of yucky things (both proposed and merged). Notably: - HW breakpoint PMU - ARM big.little PMU / Intel ADL PMU - Intel Branch Monitoring PMU - AMD IBS PMU - S390 cpum_cf PMU - PowerPC trace_imc PMU *Current design:* Currently we have a per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' Each task has an array of pointers to a perf_event_context. Each perf_event_context has a direct relation to a PMU and a group of events for that PMU. The task related perf_event_context's have a pointer back to that task. Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which includes a perf_event_context, which again has a direct relation to that PMU, and a group of events for that PMU. The perf_cpu_context also tracks which task context is currently associated with that CPU and includes a few other things like the hrtimer for rotation etc. Each perf_event is then associated with its PMU and one perf_event_context. *Proposed design:* New design proposed by this patch reduce to a single task context and a single CPU context but adds some intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' With the new design, perf_event_context will hold all events for all pmus in the (respective pinned/flexible) rbtrees. This can be achieved by adding pmu to rbtree key: {cpu, pmu, cgroup, group_index} Each perf_event_context carries a list of perf_event_pmu_context which is used to hold per-pmu-per-context state. For example, it keeps track of currently active events for that pmu, a pmu specific task_ctx_data, a flag to tell whether rotation is required or not etc. Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state like hrtimer details to drive the event rotation, a pointer to perf_event_pmu_context of currently running task and some other ancillary information. Each perf_event is associated to it's pmu, perf_event_context and perf_event_pmu_context. Further optimizations to current implementation are possible. For example, ctx_resched() can be optimized to reschedule only single pmu events. Much thanks to Ravi for picking this up and pushing it towards completion. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221008062424.313-1-ravi.bangoria@amd.com
2022-10-27net/mlx5: Fix possible use-after-free in async command interfaceTariq Toukan
mlx5_cmd_cleanup_async_ctx should return only after all its callback handlers were completed. Before this patch, the below race between mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible and lead to a use-after-free: 1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e. elevated by 1, a single inflight callback). 2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1. 3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and is about to call wake_up(). 4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns immediately as the condition (num_inflight == 0) holds. 5. mlx5_cmd_cleanup_async_ctx returns. 6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx object. 7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed object. Fix it by syncing using a completion object. Mark it completed when num_inflight reaches 0. Trace: BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270 Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0 CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x57/0x7d print_report.cold+0x2d5/0x684 ? do_raw_spin_lock+0x23d/0x270 kasan_report+0xb1/0x1a0 ? do_raw_spin_lock+0x23d/0x270 do_raw_spin_lock+0x23d/0x270 ? rwlock_bug.part.0+0x90/0x90 ? __delete_object+0xb8/0x100 ? lock_downgrade+0x6e0/0x6e0 _raw_spin_lock_irqsave+0x43/0x60 ? __wake_up_common_lock+0xb9/0x140 __wake_up_common_lock+0xb9/0x140 ? __wake_up_common+0x650/0x650 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kasan_set_track+0x21/0x30 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kfree+0x1ba/0x520 ? do_raw_spin_unlock+0x54/0x220 mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core] ? dump_command+0xcc0/0xcc0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? cmd_comp_notifier+0x7e/0xb0 [mlx5_core] cmd_comp_notifier+0x7e/0xb0 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 ? irq_release+0x140/0x140 [mlx5_core] irq_int_handler+0x19/0x30 [mlx5_core] __handle_irq_event_percpu+0x1f2/0x620 handle_irq_event+0xb2/0x1d0 handle_edge_irq+0x21e/0xb00 __common_interrupt+0x79/0x1a0 common_interrupt+0x78/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:default_idle+0x42/0x60 Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 <c3> 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00 RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242 RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110 RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3 R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005 R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000 ? default_idle_call+0xcc/0x450 default_idle_call+0xec/0x450 do_idle+0x394/0x450 ? arch_cpu_idle_exit+0x40/0x40 ? do_idle+0x17/0x450 cpu_startup_entry+0x19/0x20 start_secondary+0x221/0x2b0 ? set_cpu_sibling_map+0x2070/0x2070 secondary_startup_64_no_verify+0xcd/0xdb </TASK> Allocated by task 49502: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 kvmalloc_node+0x48/0xe0 mlx5e_bulk_async_init+0x35/0x110 [mlx5_core] mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 49502: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x11d/0x1b0 kfree+0x1ba/0x520 mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: e355477ed9e4 ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27blk-mq: don't add non-pt request with ->end_io to batchMing Lei
dm-rq implements ->end_io callback for request issued to underlying queue, and it isn't passthrough request. Commit ab3e1d3bbab9 ("block: allow end_io based requests in the completion batch handling") doesn't clear rq->bio and rq->__data_len for request with ->end_io in blk_mq_end_request_batch(), and this way is actually dangerous, but so far it is only for nvme passthrough request. dm-rq needs to clean up remained bios in case of partial completion, and req->bio is required, then use-after-free is triggered, so the underlying clone request can't be completed in blk_mq_end_request_batch. Fix panic by not adding such request into batch list, and the issue can be triggered simply by exposing nvme pci to dm-mpath simply. Fixes: ab3e1d3bbab9 ("block: allow end_io based requests in the completion batch handling") Cc: dm-devel@redhat.com Cc: Mike Snitzer <snitzer@kernel.org> Reported-by: Changhui Zhong <czhong@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20221027085709.513175-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-27phy: dphy: add support to calculate the timing based on hs_clk_rateMarco Felsch
For MIPI-CSI sender use-case it is common to specify the allowed link-frequencies which should be used for the MIPI link and is half the hs-clock rate. This commit adds a helper to calculate the D-PHY timing based on the hs-clock rate so we don't need to calculate the timings within the driver. Signed-off-by: Marco Felsch <m.felsch@pengutronix.de> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
2022-10-27KVM: Initialize gfn_to_pfn_cache locks in dedicated helperMichal Luczaj
Move the gfn_to_pfn_cache lock initialization to another helper and call the new helper during VM/vCPU creation. There are race conditions possible due to kvm_gfn_to_pfn_cache_init()'s ability to re-initialize the cache's locks. For example: a race between ioctl(KVM_XEN_HVM_EVTCHN_SEND) and kvm_gfn_to_pfn_cache_init() leads to a corrupted shinfo gpc lock. (thread 1) | (thread 2) | kvm_xen_set_evtchn_fast | read_lock_irqsave(&gpc->lock, ...) | | kvm_gfn_to_pfn_cache_init | rwlock_init(&gpc->lock) read_unlock_irqrestore(&gpc->lock, ...) | Rename "cache_init" and "cache_destroy" to activate+deactivate to avoid implying that the cache really is destroyed/freed. Note, there more races in the newly named kvm_gpc_activate() that will be addressed separately. Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> [sean: call out that this is a bug fix] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013211234.1318131-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>