Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Initial support for the ARMv9 Scalable Matrix Extension (SME).
SME takes the approach used for vectors in SVE and extends this to
provide architectural support for matrix operations. No KVM support
yet, SME is disabled in guests.
- Support for crashkernel reservations above ZONE_DMA via the
'crashkernel=X,high' command line option.
- btrfs search_ioctl() fix for live-lock with sub-page faults.
- arm64 perf updates: support for the Hisilicon "CPA" PMU for
monitoring coherent I/O traffic, support for Arm's CMN-650 and
CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup.
- Kselftest updates for SME, BTI, MTE.
- Automatic generation of the system register macros from a 'sysreg'
file describing the register bitfields.
- Update the type of the function argument holding the ESR_ELx register
value to unsigned long to match the architecture register size
(originally 32-bit but extended since ARMv8.0).
- stacktrace cleanups.
- ftrace cleanups.
- Miscellaneous updates, most notably: arm64-specific huge_ptep_get(),
avoid executable mappings in kexec/hibernate code, drop TLB flushing
from get_clear_flush() (and rename it to get_clear_contig()),
ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits)
arm64/sysreg: Generate definitions for FAR_ELx
arm64/sysreg: Generate definitions for DACR32_EL2
arm64/sysreg: Generate definitions for CSSELR_EL1
arm64/sysreg: Generate definitions for CPACR_ELx
arm64/sysreg: Generate definitions for CONTEXTIDR_ELx
arm64/sysreg: Generate definitions for CLIDR_EL1
arm64/sve: Move sve_free() into SVE code section
arm64: Kconfig.platforms: Add comments
arm64: Kconfig: Fix indentation and add comments
arm64: mm: avoid writable executable mappings in kexec/hibernate code
arm64: lds: move special code sections out of kernel exec segment
arm64/hugetlb: Implement arm64 specific huge_ptep_get()
arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
arm64: kdump: Do not allocate crash low memory if not needed
arm64/sve: Generate ZCR definitions
arm64/sme: Generate defintions for SVCR
arm64/sme: Generate SMPRI_EL1 definitions
arm64/sme: Automatically generate SMPRIMAP_EL2 definitions
arm64/sme: Automatically generate SMIDR_EL1 defines
arm64/sme: Automatically generate defines for SMCR
...
Merge tag 's390-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Make use of the IBM z16 processor activity instrumentation facility
to count cryptography operations: add a new PMU device driver so that
perf can make use of this.
- Add new IBM z16 extended counter set to cpumf support.
- Add vdso randomization support.
- Add missing KCSAN instrumentation to barriers and spinlocks, which
should make s390's KCSAN support complete.
- Add support for IPL-complete-control facility: notify the hypervisor
that kexec finished work and the kernel starts.
- Improve error logging for PCI.
- Various small changes to work around llvm's integrated assembler
limitations, and one bug, to make it finally possible to compile the
kernel with llvm's integrated assembler. This also requires raising
the minimum clang version to 14.0.0.
- Various other small enhancements, bug fixes, and cleanups all over
the place.
* tag 's390-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
s390/head: get rid of 31 bit leftovers
scripts/min-tool-version.sh: raise minimum clang version to 14.0.0 for s390
s390/boot: do not emit debug info for assembly with llvm's IAS
s390/boot: workaround llvm IAS bug
s390/purgatory: workaround llvm's IAS limitations
s390/entry: workaround llvm's IAS limitations
s390/alternatives: remove padding generation code
s390/alternatives: provide identical sized orginal/alternative sequences
s390/cpumf: add new extended counter set for IBM z16
s390/preempt: disable __preempt_count_add() optimization for PROFILE_ALL_BRANCHES
s390/stp: clock_delta should be signed
s390/stp: fix todoff size
s390/pai: add support for cryptography counters
entry: Rename arch_check_user_regs() to arch_enter_from_user_mode()
s390/compat: cleanup compat_linux.h header file
s390/entry: remove broken and not needed code
s390/boot: convert parmarea to C
s390/boot: convert initial lowcore to C
s390/ptrace: move short psw definitions to ptrace header file
s390/head: initialize all new psws
...
Merge tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
"This includes some small changes to kernel/stop_machine.c and arch/x86
which are deps of the new Intel IFS support.
Highlights:
- New drivers:
- Intel "In Field Scan" (IFS) support
- Winmate FM07/FM07P buttons
- Mellanox SN2201 support
- AMD PMC driver enhancements
- Lots of various other small fixes and hardware-id additions"
* tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (54 commits)
platform/x86/intel/ifs: Add CPU_SUP_INTEL dependency
platform/x86: intel_cht_int33fe: Set driver data
platform/x86: intel-hid: fix _DSM function index handling
platform/x86: toshiba_acpi: use kobj_to_dev()
platform/x86: samsung-laptop: use kobj_to_dev()
platform/x86: gigabyte-wmi: Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
tools/power/x86/intel-speed-select: Fix warning for perf_cap.cpu
tools/power/x86/intel-speed-select: Display error on turbo mode disabled
Documentation: In-Field Scan
platform/x86/intel/ifs: add ABI documentation for IFS
trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations
platform/x86/intel/ifs: Add IFS sysfs interface
platform/x86/intel/ifs: Add scan test support
platform/x86/intel/ifs: Authenticate and copy to secured memory
platform/x86/intel/ifs: Check IFS Image sanity
platform/x86/intel/ifs: Read IFS firmware image
platform/x86/intel/ifs: Add stub driver for In-Field Scan
stop_machine: Add stop_core_cpuslocked() for per-core operations
x86/msr-index: Define INTEGRITY_CAPABILITIES MSR
x86/microcode/intel: Expose collect_cpu_info_early() for IFS
...
Merge tag 'x86_splitlock_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 splitlock updates from Borislav Petkov:
- Add Raptor Lake to the set of CPU models which support splitlock
- Make life miserable for apps using split locks by slowing them down
considerably while the rest of the system remains responsive. The
hope is it will hurt more and people will really fix their misaligned
locks apps. As a result, free a TIF bit.
* tag 'x86_splitlock_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/split_lock: Enable the split lock feature on Raptor Lake
x86/split-lock: Remove unused TIF_SLD bit
x86/split_lock: Make life miserable for split lockers
Merge tag 'x86_core_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Borislav Petkov:
- Remove all the code around GS switching on 32-bit now that it is not
needed anymore
- Other misc improvements
* tag 'x86_core_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bug: Use normal relative pointers in 'struct bug_entry'
x86/nmi: Make register_nmi_handler() more robust
x86/asm: Merge load_gs_index()
x86/32: Remove lazy GS macros
ELF: Remove elf_core_copy_kernel_regs()
x86/32: Simplify ELF_CORE_COPY_REGS
Merge tag 'x86_build_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Borislav Petkov:
- Add a "make x86_debug.config" target which enables a bunch of useful
config debug options when trying to debug an issue
- A gcc-12 build warnings fix
* tag 'x86_build_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Wrap literal addresses in absolute_pointer()
x86/configs: Add x86 debugging Kconfig fragment plus docs
Merge tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel TDX support from Borislav Petkov:
"Intel Trust Domain Extensions (TDX) support.
This is the Intel version of a confidential computing solution called
Trust Domain Extensions (TDX). This series adds support to run the
kernel as part of a TDX guest. It provides guest protections similar
to AMD's SEV-SNP, such as guest memory and register state encryption,
memory integrity protection and a lot more.
Design-wise, it differs from AMD's solution considerably: it uses a
software module which runs in a special CPU mode called Secure
Arbitration Mode (SEAM). As the name suggests, this module serves as
sort of an arbiter which the confidential guest calls for services it
needs during its lifetime.
Just like AMD's SNP set, this series reworks and streamlines certain
parts of x86 arch code so that this feature can be properly
accommodated"
* tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
x86/tdx: Fix RETs in TDX asm
x86/tdx: Annotate a noreturn function
x86/mm: Fix spacing within memory encryption features message
x86/kaslr: Fix build warning in KASLR code in boot stub
Documentation/x86: Document TDX kernel architecture
ACPICA: Avoid cache flush inside virtual machines
x86/tdx/ioapic: Add shared bit for IOAPIC base address
x86/mm: Make DMA memory shared for TD guest
x86/mm/cpa: Add support for TDX shared memory
x86/tdx: Make pages shared in ioremap()
x86/topology: Disable CPU online/offline control for TDX guests
x86/boot: Avoid #VE during boot for TDX platforms
x86/boot: Set CR0.NE early and keep it set during the boot
x86/acpi/x86/boot: Add multiprocessor wake-up support
x86/boot: Add a trampoline for booting APs via firmware handoff
x86/tdx: Wire up KVM hypercalls
x86/tdx: Port I/O: Add early boot support
x86/tdx: Port I/O: Add runtime hypercalls
x86/boot: Port I/O: Add decompression-time support for TDX
x86/boot: Port I/O: Allow to hook up alternative helpers
...
Merge tag 'timers-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and timekeeping updates from Thomas Gleixner:
- Expose CLOCK_TAI to instrumentation to aid with TSN debugging.
- Ensure that the clockevent is stopped when there is no timer armed to
avoid pointless wakeups.
- Make the sched clock frequency handling and rounding consistent.
- Provide a better debugobject hint for delayed works. The timer
callback is always the same, which makes it difficult to identify the
underlying work. Use the work function as a hint instead.
- Move the timer specific sysctl code into the timer subsystem.
- The usual set of improvements and cleanups
* tag 'timers-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Provide a better debugobjects hint for delayed works
time/sched_clock: Fix formatting of frequency reporting code
time/sched_clock: Use Hz as the unit for clock rate reporting below 4kHz
time/sched_clock: Round the frequency reported to nearest rather than down
timekeeping: Consolidate fast timekeeper
timekeeping: Annotate ktime_get_boot_fast_ns() with data_race()
timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped
timekeeping: Introduce fast accessor to clock tai
tracing/timer: Add missing argument documentation of trace points
clocksource: Replace cpumask_weight() with cpumask_empty()
timers: Move timer sysctl into the timer code
clockevents: Use dedicated list iterator variable
timers: Simplify calc_index()
timers: Initialize base::next_expiry_recalc in timers_prepare_cpu()
Merge tag 'irq-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt handling updates from Thomas Gleixner:
"Core code:
- Make the managed interrupts more robust by shutting them down in
the core code when the assigned affinity mask does not contain
online CPUs.
- Make the irq simulator chip work on RT
- A small set of cpumask and power management cleanups
Drivers:
- A set of changes which mark GPIO interrupt chips immutable to
prevent the GPIO subsystem from modifying it under the hood. This
provides the necessary infrastructure and converts a set of GPIO
and pinctrl drivers over.
- A set of changes to make the pseudo-NMI handling for GICv3 more
robust: a missing barrier and consistent handling of the priority
mask.
- Another set of GICv3 improvements and fixes, but nothing
outstanding
- The usual set of improvements and cleanups all over the place
- No new irqchip drivers and not even a new device tree binding!
100+ interrupt chips are truly enough"
* tag 'irq-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
irqchip: Add Kconfig symbols for sunxi drivers
irqchip/gic-v3: Fix priority mask handling
irqchip/gic-v3: Refactor ISB + EOIR at ack time
irqchip/gic-v3: Ensure pseudo-NMIs have an ISB between ack and handling
genirq/irq_sim: Make the irq_work always run in hard irq context
irqchip/armada-370-xp: Do not touch Performance Counter Overflow on A375, A38x, A39x
irqchip/gic: Improved warning about incorrect type
irqchip/csky: Return true/false (not 1/0) from bool functions
irqchip/imx-irqsteer: Add runtime PM support
irqchip/imx-irqsteer: Constify irq_chip struct
irqchip/armada-370-xp: Enable MSI affinity configuration
irqchip/aspeed-scu-ic: Fix irq_of_parse_and_map() return value
irqchip/aspeed-i2c-ic: Fix irq_of_parse_and_map() return value
irqchip/sun6i-r: Use NULL for chip_data
irqchip/xtensa-mx: Fix initial IRQ affinity in non-SMP setup
irqchip/exiu: Fix acknowledgment of edge triggered interrupts
irqchip/gic-v3: Claim iomem resources
dt-bindings: interrupt-controller: arm,gic-v3: Make the v2 compat requirements explicit
irqchip/gic-v3: Relax polling of GIC{R,D}_CTLR.RWP
irqchip/gic-v3: Detect LPI invalidation MMIO registers
...
Merge tag 'smp-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug updates from Thomas Gleixner:
- Initialize the per-CPU structures during early boot so that the state
is consistent from the very beginning.
- Make the virtualization hotplug state handling more robust and let
the core bring up CPUs which timed out in an earlier attempt again.
- Make the x86/xen CPU state tracking consistent on a failed online
attempt, so a consecutive bringup does not fall over the inconsistent
state.
* tag 'smp-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Initialise all cpuhp_cpu_state structs earlier
cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
x86/xen: Allow to retry if cpu_initialize_context() failed.
Merge tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
"Here are the core block changes for 5.19. This contains:
- blk-throttle accounting fix (Laibin)
- Series removing redundant assignments (Michal)
- Expose bio cache via the bio_set, so that DM can use it (Mike)
- Finish off the bio allocation interface cleanups by dealing with
the weirdest member of the family. bio_kmalloc combines a kmalloc
for the bio and bio_vecs with a hidden bio_init call and magic
cleanup semantics (Christoph)
- Clean up the block layer API so that APIs consumed by file systems
are (almost) only struct block_device based, so that file systems
don't have to poke into block layer internals like the
request_queue (Christoph)
- Clean up the blk_execute_rq* API (Christoph)
- Clean up various loose ends in the blk-cgroup code to make it easier
to follow in preparation of reworking the blkcg assignment for bios
(Christoph)
- Fix use-after-free issues in BFQ when processes with merged queues
get moved to different cgroups (Jan)
- BFQ fixes (Jan)
- Various fixes and cleanups (Bart, Chengming, Fanjun, Julia, Ming,
Wolfgang, me)"
* tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block: (83 commits)
blk-mq: fix typo in comment
bfq: Remove bfq_requeue_request_body()
bfq: Remove superfluous conversion from RQ_BIC()
bfq: Allow current waker to defend against a tentative one
bfq: Relax waker detection for shared queues
blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE()
blk-throttle: Set BIO_THROTTLED when bio has been throttled
blk-cgroup: Remove unnecessary rcu_read_lock/unlock()
blk-cgroup: always terminate io.stat lines
block, bfq: make bfq_has_work() more accurate
block, bfq: protect 'bfqd->queued' by 'bfqd->lock'
block: cleanup the VM accounting in submit_bio
block: Fix the bio.bi_opf comment
block: reorder the REQ_ flags
blk-iocost: combine local_stat and desc_stat to stat
block: improve the error message from bio_check_eod
block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone
block: remove superfluous calls to blkcg_bio_issue_init
kthread: unexport kthread_blkcg
blk-cgroup: cleanup blkcg_maybe_throttle_current
...
Merge tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
"Here are the main io_uring changes for 5.19. This contains:
- Fixes for sparse type warnings (Christoph, Vasily)
- Support for multi-shot accept (Hao)
- Support for io_uring managed fixed files, rather than always
needing the application to manage the indices (me)
- Fix for a spurious poll wakeup (Dylan)
- CQE overflow fixes (Dylan)
- Support more types of cancelations (me)
- Support for co-operative task_work signaling, rather than always
forcing an IPI (me)
- Support for doing poll first when appropriate, rather than always
attempting a transfer first (me)
- Provided buffer cleanups and support for mapped buffers (me)
- Improve how io_uring handles inflight SCM files (Pavel)
- Speedups for registered files (Pavel, me)
- Organize the completion data in a struct in io_kiocb rather than
keep it in separate spots (Pavel)
- task_work improvements (Pavel)
- Cleanup and optimize the submission path, in general and for
handling links (Pavel)
- Speedups for registered resource handling (Pavel)
- Support sparse buffers and file maps (Pavel, me)
- Various fixes and cleanups (Almog, Pavel, me)"
* tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits)
io_uring: fix incorrect __kernel_rwf_t cast
io_uring: disallow mixed provided buffer group registrations
io_uring: initialize io_buffer_list head when shared ring is unregistered
io_uring: add fully sparse buffer registration
io_uring: use rcu_dereference in io_close
io_uring: consistently use the EPOLL* defines
io_uring: make apoll_events a __poll_t
io_uring: drop a spurious inline on a forward declaration
io_uring: don't use ERR_PTR for user pointers
io_uring: use a rwf_t for io_rw.flags
io_uring: add support for ring mapped supplied buffers
io_uring: add io_pin_pages() helper
io_uring: add buffer selection support to IORING_OP_NOP
io_uring: fix locking state for empty buffer group
io_uring: implement multishot mode for accept
io_uring: let fast poll support multishot
io_uring: add REQ_F_APOLL_MULTISHOT for requests
io_uring: add IORING_ACCEPT_MULTISHOT for accept
io_uring: only wake when the correct events are set
io_uring: avoid io-wq -EAGAIN looping for !IOPOLL
...
Merge tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU update from Paul McKenney:
- Documentation updates
- Miscellaneous fixes
- Callback-offloading updates, mainly simplifications
- RCU-tasks updates, including some -rt fixups, handling of systems
with sparse CPU numbering, and a fix for a boot-time race-condition
failure
- Put SRCU on a memory diet in order to reduce the size of the
srcu_struct structure
- Torture-test updates fixing some bugs in tests and closing some
testing holes
- Torture-test updates for the RCU tasks flavors, most notably ensuring
that building rcutorture and friends does not change the
RCU-tasks-related Kconfig options
- Torture-test scripting updates
- Expedited grace-period updates, most notably providing
milliseconds-scale (not all that) soft real-time response from
synchronize_rcu_expedited().
This is also the first time in almost 30 years of RCU that someone
other than me has pushed for a reduction in the RCU CPU stall-warning
timeout, in this case by more than three orders of magnitude from 21
seconds to 20 milliseconds. This tighter timeout applies only to
expedited grace periods
* tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
rcu: Move expedited grace period (GP) work to RT kthread_worker
rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
srcu: Drop needless initialization of sdp in srcu_gp_start()
srcu: Prevent expedited GPs and blocking readers from consuming CPU
srcu: Add contention check to call_srcu() srcu_data ->lock acquisition
srcu: Automatically determine size-transition strategy at boot
rcutorture: Make torture.sh allow for --kasan
rcutorture: Make torture.sh refscale and rcuscale specify Tasks Trace RCU
rcutorture: Make kvm.sh allow more memory for --kasan runs
torture: Save "make allmodconfig" .config file
scftorture: Remove extraneous "scf" from per_version_boot_params
rcutorture: Adjust scenarios' Kconfig options for CONFIG_PREEMPT_DYNAMIC
torture: Enable CSD-lock stall reports for scftorture
torture: Skip vmlinux check for kvm-again.sh runs
scftorture: Adjust for TASKS_RCU Kconfig option being selected
rcuscale: Allow rcuscale without RCU Tasks Rude/Trace
rcuscale: Allow rcuscale without RCU Tasks
refscale: Allow refscale without RCU Tasks Rude/Trace
refscale: Allow refscale without RCU Tasks
rcutorture: Allow specifying per-scenario stat_interval
...
perf: Fix sys_perf_event_open() race against self
Norbert reported that it's possible to race sys_perf_event_open() such
that the loser ends up in a different context from the group leader,
triggering many WARNs.
The move_group case checks for races against itself, but the
!move_group case doesn't, seemingly relying on the previous
group_leader->ctx == ctx check. However, that check is racy due to not
holding any locks at that time.
Therefore, re-check the result after acquiring the locks and bail
out if they no longer match.
Additionally, clarify the not_move_group case from the
move_group-vs-move_group race.
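Schematically, the re-validation happens under ctx->mutex; a sketch
(variable and label names are illustrative, not the literal patch):

  mutex_lock(&ctx->mutex);

  /*
   * Re-check now that the lock is held: the earlier lockless
   * group_leader->ctx == ctx check may have raced with a move.
   */
  if (!move_group && group_leader && group_leader->ctx != ctx) {
          err = -EINVAL;
          goto err_locked;
  }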
Fixes: f63a8daa5812 ("perf: Fix event->ctx locking")
Reported-by: Norbert Slusarek <nslusarek@gmx.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branches 'for-next/sme', 'for-next/stacktrace',
'for-next/fault-in-subpage', 'for-next/misc', 'for-next/ftrace' and
'for-next/crashkernel', remote-tracking branch 'arm64/for-next/perf' into
for-next/core
* arm64/for-next/perf:
perf/arm-cmn: Decode CAL devices properly in debugfs
perf/arm-cmn: Fix filter_sel lookup
perf/marvell_cn10k: Fix tad_pmu_event_init() to check pmu type first
drivers/perf: hisi: Add Support for CPA PMU
drivers/perf: hisi: Associate PMUs in SICL with CPUs online
drivers/perf: arm_spe: Expose saturating counter to 16-bit
perf/arm-cmn: Add CMN-700 support
perf/arm-cmn: Refactor occupancy filter selector
perf/arm-cmn: Add CMN-650 support
dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700
perf: check return value of armpmu_request_irq()
perf: RISC-V: Remove non-kernel-doc ** comments
* for-next/sme: (30 commits)
: Scalable Matrix Extensions support.
arm64/sve: Move sve_free() into SVE code section
arm64/sve: Make kernel FPU protection RT friendly
arm64/sve: Delay freeing memory in fpsimd_flush_thread()
arm64/sme: More sensibly define the size for the ZA register set
arm64/sme: Fix NULL check after kzalloc
arm64/sme: Add ID_AA64SMFR0_EL1 to __read_sysreg_by_encoding()
arm64/sme: Provide Kconfig for SME
KVM: arm64: Handle SME host state when running guests
KVM: arm64: Trap SME usage in guest
KVM: arm64: Hide SME system registers from guests
arm64/sme: Save and restore streaming mode over EFI runtime calls
arm64/sme: Disable streaming mode and ZA when flushing CPU state
arm64/sme: Add ptrace support for ZA
arm64/sme: Implement ptrace support for streaming mode SVE registers
arm64/sme: Implement ZA signal handling
arm64/sme: Implement streaming SVE signal handling
arm64/sme: Disable ZA and streaming mode when handling signals
arm64/sme: Implement traps and syscall handling for SME
arm64/sme: Implement ZA context switching
arm64/sme: Implement streaming SVE context switching
...
* for-next/stacktrace:
: Stacktrace cleanups.
arm64: stacktrace: align with common naming
arm64: stacktrace: rename stackframe to unwind_state
arm64: stacktrace: rename unwinder functions
arm64: stacktrace: make struct stackframe private to stacktrace.c
arm64: stacktrace: delete PCS comment
arm64: stacktrace: remove NULL task check from unwind_frame()
* for-next/fault-in-subpage:
: btrfs search_ioctl() live-lock fix using fault_in_subpage_writeable().
btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
arm64: Add support for user sub-page fault probing
mm: Add fault_in_subpage_writeable() to probe at sub-page granularity
* for-next/misc:
: Miscellaneous patches.
arm64: Kconfig.platforms: Add comments
arm64: Kconfig: Fix indentation and add comments
arm64: mm: avoid writable executable mappings in kexec/hibernate code
arm64: lds: move special code sections out of kernel exec segment
arm64/hugetlb: Implement arm64 specific huge_ptep_get()
arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
arm64: mm: Make arch_faults_on_old_pte() check for migratability
arm64: mte: Clean up user tag accessors
arm64/hugetlb: Drop TLB flush from get_clear_flush()
arm64: Declare non global symbols as static
arm64: mm: Cleanup useless parameters in zone_sizes_init()
arm64: fix types in copy_highpage()
arm64: Set ARCH_NR_GPIO to 2048 for ARCH_APPLE
arm64: cputype: Avoid overflow using MIDR_IMPLEMENTOR_MASK
arm64: document the boot requirements for MTE
arm64/mm: Compute PTRS_PER_[PMD|PUD] independently of PTRS_PER_PTE
* for-next/ftrace:
: ftrace cleanups.
arm64/ftrace: Make function graph use ftrace directly
ftrace: cleanup ftrace_graph_caller enable and disable
* for-next/crashkernel:
: Support for crashkernel reservations above ZONE_DMA.
arm64: kdump: Do not allocate crash low memory if not needed
docs: kdump: Update the crashkernel description for arm64
of: Support more than one crash kernel regions for kexec -s
of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
arm64: kdump: Reimplement crashkernel=X
arm64: Use insert_resource() to simplify code
kdump: return -ENOENT if required cmdline option does not exist
Merge tag 'irqchip-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- Add new infrastructure to stop gpiolib from rewriting irq_chip
structures behind our back. Convert a few of them, but this will
obviously be a long effort.
- A bunch of GICv3 improvements, such as using MMIO-based invalidations
when possible, and reducing the amount of polling we perform when
reconfiguring interrupts.
- Another set of GICv3 improvements for the Pseudo-NMI functionality,
with a nice cleanup making it easy to reason about the various
states we can be in when an NMI fires.
- The usual bunch of misc fixes and minor improvements.
Link: https://lore.kernel.org/all/20220519165308.998315-1-maz@kernel.org
audit,io_uring,io-wq: call __audit_uring_exit for dummy contexts
Not calling the function for dummy contexts will cause the context to
not be reset. During the next syscall, this will cause an error in
__audit_syscall_entry:
  WARN_ON(context->context != AUDIT_CTX_UNUSED);
  WARN_ON(context->name_count);
  if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
          audit_panic("unrecoverable error in audit_syscall_entry()");
          return;
  }
These problematic dummy contexts are created via the following call
chain:
  exit_to_user_mode_prepare
    -> arch_do_signal_or_restart
       -> get_signal
          -> task_work_run
             -> tctx_task_work
                -> io_req_task_submit
                   -> io_issue_sqe
                      -> audit_uring_entry
Cc: stable@vger.kernel.org
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Julian Orth <ju.orth@gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Merge tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"The recent expansion of the sched switch tracepoint inserted a new
argument in the middle of the arguments. This reordering broke BPF
programs which relied on the old argument list.
While tracepoints are not considered stable ABI, it's not trivial to
make BPF cope with such a change, but it's being worked on. For now
restore the original argument order and move the new argument to the
end of the argument list"
* tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/tracing: Append prev_state to tp args instead
Merge tag 'irq-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for a recent (introduced in 5.16) regression in the core
interrupt code.
The consolidation of the interrupt handler invocation code added an
unconditional warning when generic_handle_domain_irq() is invoked from
outside hard interrupt context. That's overbroad as the requirement
for invoking these handlers in hard interrupt context is only required
for certain interrupt types. The subsequently called code already
contains a warning which triggers conditionally for interrupt chips
which indicate this requirement in their properties.
Remove the overbroad one"
* tag 'irq-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Remove WARN_ON_ONCE() in generic_handle_domain_irq()
genirq/irq_sim: Make the irq_work always run in hard irq context
The IRQ simulator uses irq_work to trigger an interrupt. Without the
IRQ_WORK_HARD_IRQ flag the irq_work will be performed in thread context
on PREEMPT_RT. This causes locking errors later in handle_simple_irq()
which expects to be invoked with disabled interrupts.
Triggering individual interrupts in hardirq context should not lead to
unexpected high latencies since this is also what the hardware
controller does. Also it is used as a simulator so...
Use IRQ_WORK_INIT_HARD() to carry out the irq_work in hardirq context on
PREEMPT_RT.
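The change amounts to a single initializer; roughly (context struct
name illustrative):

  - work_ctx->work = IRQ_WORK_INIT(irq_sim_handle_irq);
  + work_ctx->work = IRQ_WORK_INIT_HARD(irq_sim_handle_irq);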
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/YnuZBoEVMGwKkLm+@linutronix.de
timers: Provide a better debugobjects hint for delayed works
With debugobjects enabled the timer hint for freeing of active timers
embedded inside delayed works is always the same, i.e. the hint is
delayed_work_timer_fn, even though the function the delayed work is going
to run can be wildly different depending on what work was queued. Enabling
workqueue debugobjects doesn't help either because the delayed work isn't
considered active until it is actually queued to run on a workqueue. If the
work is freed while the timer is pending the work isn't considered active
so there is no information from workqueue debugobjects.
Special case delayed works in the timer debugobjects hint logic so that the
delayed work function is returned instead of the delayed_work_timer_fn.
This will help to understand which delayed work was pending that got
freed.
Apply the same treatment for kthread_delayed_work because it follows the
same pattern.
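A sketch of the resulting hint callback, close to but not literally
the patch:

  static void *timer_debug_hint(void *addr)
  {
          struct timer_list *timer = addr;

          /* Peel back delayed works to expose the real work function. */
          if (timer->function == delayed_work_timer_fn) {
                  struct delayed_work *dwork =
                          container_of(timer, struct delayed_work, timer);

                  return dwork->work.func;
          }
          return timer->function;
  }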
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220511201951.42408-1-swboyd@chromium.org
Merge branch 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"Waiman's fix for a cgroup2 cpuset bug where it could miss nodes which
were hot-added"
* 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
stop_machine: Add stop_core_cpuslocked() for per-core operations
Hardware core level testing features require near simultaneous execution
of WRMSR instructions on all threads of a core to initiate a test.
Provide a customized cut down version of stop_machine_cpuslocked() that
just operates on the threads of a single core.
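The resulting interface mirrors stop_machine_cpuslocked(), scoped to
the SMT siblings of one core (a sketch of the declaration):

  /*
   * Run @fn on all threads of the core containing @cpu, with those
   * threads stopped; the caller must hold cpus_read_lock().
   */
  int stop_core_cpuslocked(unsigned int cpu, cpu_stop_fn_t fn, void *data);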
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-4-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
sched/tracing: Append prev_state to tp args instead
Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
sched_switch event, 2022-01-20) added a new prev_state argument to the
sched_switch tracepoint, before the prev task_struct pointer.
This reordering of arguments broke BPF programs that use the raw
tracepoint (e.g. tp_btf programs). The type of the second argument has
changed and existing programs that assume a task_struct* argument
(e.g. for bpf_task_storage access) will now fail to verify.
If we instead append the new argument to the end, all existing programs
would continue to work and can conditionally extract the prev_state
argument on supported kernel versions.
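Schematically, the tracepoint prototype changes like this (a sketch of
include/trace/events/sched.h):

  /* 5.18-rc ordering: prev_state inserted in the middle */
  TP_PROTO(bool preempt, unsigned int prev_state,
           struct task_struct *prev, struct task_struct *next)

  /* fixed: new argument appended, old positions preserved */
  TP_PROTO(bool preempt, struct task_struct *prev,
           struct task_struct *next, unsigned int prev_state)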
Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20)
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com
Merge branch 'exp.2022.05.11a' into HEAD
exp.2022.05.11a: Expedited-grace-period latency-reduction updates.
rcu: Move expedited grace period (GP) work to RT kthread_worker
Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period
latency because its workqueues run at SCHED_OTHER, and thus can be
delayed by normal processes. This commit avoids these delays by moving
the expedited GP work items to a real-time-priority kthread_worker.
This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by
default on PREEMPT_RT=y kernels which disable expedited grace periods
after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1.
The results were evaluated on arm64 Android devices (6GB RAM) running
a 5.10 kernel, capturing trace data in critical user-level code.
The table below shows the resulting order-of-magnitude improvements
in synchronize_rcu_expedited() latency:
------------------------------------------------------------------------
| | workqueues | kthread_worker | Diff |
------------------------------------------------------------------------
| Count | 725 | 688 | |
------------------------------------------------------------------------
| Min Duration (ns) | 326 | 447 | 37.12% |
------------------------------------------------------------------------
| Q1 (ns) | 39,428 | 38,971 | -1.16% |
------------------------------------------------------------------------
| Q2 - Median (ns) | 98,225 | 69,743 | -29.00% |
------------------------------------------------------------------------
| Q3 (ns) | 342,122 | 126,638 | -62.98% |
------------------------------------------------------------------------
| Max Duration (ns) | 372,766,967 | 2,329,671 | -99.38% |
------------------------------------------------------------------------
| Avg Duration (ns) | 2,746,353 | 151,242 | -94.49% |
------------------------------------------------------------------------
| Standard Deviation (ns) | 19,327,765 | 294,408 | |
------------------------------------------------------------------------
The table below shows the range of maximums/minimums for
synchronize_rcu_expedited() latency across all experiments:
------------------------------------------------------------------------
| | workqueues | kthread_worker | Diff |
------------------------------------------------------------------------
| Total No. of Experiments | 25 | 23 | |
------------------------------------------------------------------------
| Largest Maximum (ns) | 372,766,967 | 2,329,671 | -99.38% |
------------------------------------------------------------------------
| Smallest Maximum (ns) | 38,819 | 86,954 | 124.00% |
------------------------------------------------------------------------
| Range of Maximums (ns) | 372,728,148 | 2,242,717 | |
------------------------------------------------------------------------
| Largest Minimum (ns) | 88,623 | 27,588 | -68.87% |
------------------------------------------------------------------------
| Smallest Minimum (ns) | 326 | 447 | 37.12% |
------------------------------------------------------------------------
| Range of Minimums (ns) | 88,297 | 27,141 | |
------------------------------------------------------------------------
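Mechanically this is the standard kthread_worker API with an RT
priority; a minimal sketch (worker name follows the patch loosely,
priority and handler name are illustrative):

  struct sched_param param = { .sched_priority = 2 };
  struct kthread_worker *kw;

  kw = kthread_create_worker(0, "rcu_exp_gp_kthread_worker");
  sched_setscheduler_nocheck(kw->task, SCHED_FIFO, &param);

  kthread_init_work(&rew->rew_work, wait_rcu_exp_gp);
  kthread_queue_work(kw, &rew->rew_work);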
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Reported-by: Tim Murray <timmurray@google.com>
Reported-by: Wei Wang <wvw@google.com>
Tested-by: Kyle Lin <kylelin@google.com>
Tested-by: Chunwei Lu <chunweilu@google.com>
Tested-by: Lulu Wang <luluw@google.com>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
Currently both expedited and regular grace period stall warnings use
a single timeout value with units of seconds. However, recent Android
use cases require a sub-100-millisecond expedited RCU CPU stall
warning. Given that expedited RCU grace periods normally complete in
far less than a single millisecond, especially for small systems,
this is not unreasonable.
Therefore, introduce a CONFIG_RCU_EXP_CPU_STALL_TIMEOUT kernel
configuration option that defaults to 20 msec on Android and otherwise
remains the same as that of the non-expedited stall warnings. It can
also be changed at run time via /sys/.../parameters/rcu_exp_cpu_stall_timeout.
[ paulmck: Default of zero to use CONFIG_RCU_STALL_TIMEOUT. ]
Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
genirq: Remove WARN_ON_ONCE() in generic_handle_domain_irq()
Since commit 0953fb263714 ("irq: remove handle_domain_{irq,nmi}()"),
generic_handle_domain_irq() warns if called outside hardirq context, even
though the function calls down to handle_irq_desc(), which warns about the
same, but conditionally on handle_enforce_irqctx().
The newly added warning is a false positive if the interrupt originates
from any other irqchip than x86 APIC or ARM GIC/GICv3. Those are the only
ones for which handle_enforce_irqctx() returns true. Per commit
c16816acd086 ("genirq: Add protection against unsafe usage of
generic_handle_irq()"):
"In general calling generic_handle_irq() with interrupts disabled from non
interrupt context is harmless. For some interrupt controllers like the
x86 trainwrecks this is outright dangerous as it might corrupt state if
an interrupt affinity change is pending."
Examples for interrupt chips where the warning is a false positive are
USB-attached GPIO controllers such as drivers/gpio/gpio-dln2.c:
USB gadgets are incapable of directly signaling an interrupt because they
cannot initiate a bus transaction by themselves. All communication on
the bus is initiated by the host controller, which polls a gadget's
Interrupt Endpoint in regular intervals. If an interrupt is pending,
that information is passed up the stack in softirq context, from which a
hardirq is synthesized via generic_handle_domain_irq().
Remove the warning to eliminate such false positives.
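The removed check sat at the top of the generic handler; schematically:

  int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq)
  {
          /*
           * Removed: WARN_ON_ONCE(!in_hardirq());
           * handle_irq_desc() already warns, conditionally on
           * handle_enforce_irqctx().
           */
          return handle_irq_desc(irq_resolve_mapping(domain, hwirq));
  }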
Fixes: 0953fb263714 ("irq: remove handle_domain_{irq,nmi}()")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jakub Kicinski <kuba@kernel.org>
CC: Linus Walleij <linus.walleij@linaro.org>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Octavian Purdila <octavian.purdila@nxp.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220505113207.487861b2@kernel.org
Link: https://lore.kernel.org/r/20220506203242.GA1855@wunner.de
Link: https://lore.kernel.org/r/c3caf60bfa78e5fdbdf483096b7174da65d1813a.1652168866.git.lukas@wunner.de
entry: Rename arch_check_user_regs() to arch_enter_from_user_mode()
When entering the kernel from userspace, arch_check_user_regs() is
used to verify that struct pt_regs contains valid values. Note that
the NMI codepath doesn't call this function. s390 needs a place in the
generic entry code to modify a cpu data structure when switching from
userspace to kernel mode. As arch_check_user_regs() is exactly this,
rename it to arch_enter_from_user_mode().
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20220504062351.2954280-2-tmricht@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Merge tag 'timers-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A fix and an email address update:
- Mark the NMI safe time accessors notrace to prevent tracer
recursion when they are selected as trace clocks.
- John Stultz has a new email address"
* tag 'timers-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Mark NMI safe time accessors as notrace
MAINTAINERS: Update email address for John Stultz
Merge tag 'irq-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A fix for the threaded interrupt core.
A quick sequence of request/free_irq() can result in a hang because
the interrupt thread did not reach the thread function and got stopped
in the kthread core already. That leaves a stale threads_active counter
around, which makes an invocation of synchronize_irq() on that
interrupt hang forever.
Ensure that the thread reached the thread function in request_irq() to
prevent that"
* tag 'irq-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Synchronize interrupt thread startup
kdump: return -ENOENT if required cmdline option does not exist
In the current crashkernel=Y,low support on other architectures, the
option is optional: when it is not given, the kernel automatically
tries to allocate the minimum required memory below 4G.
However, __parse_crashkernel() returns '-EINVAL' for all error cases,
so it can't distinguish a nonexistent option from an invalid one.
Change __parse_crashkernel() to return '-ENOENT' for the nonexistent
option case. With this change, crashkernel low memory will take the
default value if crashkernel=,low is not specified, while the
crashkernel reservation will fail and bail out if an invalid option is
specified.
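A sketch of how a caller can now tell the two cases apart (argument
order follows __parse_crashkernel() loosely; the default-size constant
is illustrative):

  ret = __parse_crashkernel(cmdline, system_ram, &low_size, &low_base,
                            "crashkernel=", ",low");
  if (ret == -ENOENT) {
          /* Option absent: fall back to the default low reservation. */
          low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
  } else if (ret) {
          /* Option present but malformed: fail the reservation. */
          return ret;
  }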
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20220506114402.365-2-thunder.leizhen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
There are 3 places where the cpu and node masks of the top cpuset can
be initialized in the order they are executed:
1) start_kernel -> cpuset_init()
2) start_kernel -> cgroup_init() -> cpuset_bind()
3) kernel_init_freeable() -> do_basic_setup() -> cpuset_init_smp()
The first cpuset_init() call just sets all the bits in the masks.
The second cpuset_bind() call sets cpus_allowed and mems_allowed to the
default v2 values. The third cpuset_init_smp() call sets them back to
v1 values.
For systems with cgroup v2 setup, cpuset_bind() is called once. As a
result, cpu and memory node hot add may fail to update the cpu and node
masks of the top cpuset to include the newly added cpu or node in a
cgroup v2 environment.
For systems with cgroup v1 setup, cpuset_bind() is called again by
rebind_subsystem() when the v1 cpuset filesystem is mounted as shown
in the dmesg log below with an instrumented kernel.
[ 2.609781] cpuset_bind() called - v2 = 1
[ 3.079473] cpuset_init_smp() called
[ 7.103710] cpuset_bind() called - v2 = 0
smp_init() is called after the first two init functions. So we don't
have a complete list of active cpus and memory nodes until later in
cpuset_init_smp() which is the right time to set up effective_cpus
and effective_mems.
To fix this cgroup v2 mask setup problem, the potentially incorrect
cpus_allowed & mems_allowed settings in cpuset_init_smp() are removed.
For cgroup v2 systems, the initial cpuset_bind() call will set the masks
correctly. For cgroup v1 systems, the second call to cpuset_bind()
will do the right setup.
cc: stable@vger.kernel.org
Signed-off-by: Waiman Long <longman@redhat.com>
Tested-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
genirq: Synchronize interrupt thread startup
A kernel hang can be observed when running setserial in a loop on a kernel
with force threaded interrupts. The sequence of events is:
  setserial
    open("/dev/ttyXXX")
      request_irq()
    do_stuff()
      -> serial interrupt
         -> wake(irq_thread)
            desc->threads_active++;
    close()
      free_irq()
        kthread_stop(irq_thread)
      synchronize_irq()  <- hangs because desc->threads_active != 0
The thread is created in request_irq() and woken up, but does not get on a
CPU to reach the actual thread function, which would handle the pending
wake-up. kthread_stop() sets the should stop condition which makes the
thread immediately exit, which in turn leaves the stale threads_active
count around.
This problem was introduced with commit 519cc8652b3a, which addressed
an interrupt sharing issue in the PCIe code.
Before that commit free_irq() invoked synchronize_irq(), which waits for
the hard interrupt handler and also for associated threads to complete.
To address the PCIe issue synchronize_irq() was replaced with
__synchronize_hardirq(), which only waits for the hard interrupt handler to
complete, but not for threaded handlers.
This was done under the assumption that the interrupt thread already
reached the thread function and waits for a wake-up, which is guaranteed to
be handled before acting on the stop condition. The problematic case, that
the thread would not reach the thread function, was obviously overlooked.
Make sure that the interrupt thread is really started and reaches
thread_fn() before returning from __setup_irq().
This utilizes the existing wait queue in the interrupt descriptor. The
wait queue is unused for non-shared interrupts. For shared interrupts the
usage might cause a spurious wake-up of a waiter in synchronize_irq() or the
completion of a threaded handler might cause a spurious wake-up of the
waiter for the ready flag. Both are harmless and have no functional impact.
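A sketch of the resulting startup handshake, close to but not
literally the patch:

  static void wake_up_and_wait_for_irq_thread_ready(struct irq_desc *desc,
                                                    struct irqaction *action)
  {
          if (!action || !action->thread)
                  return;

          wake_up_process(action->thread);
          /* Wait until the thread signals that it reached thread_fn(). */
          wait_event(desc->wait_for_threads,
                     test_bit(IRQTF_READY, &action->thread_flags));
  }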
[ tglx: Amended changelog ]
Fixes: 519cc8652b3a ("genirq: Synchronize only with single thread on free_irq()")
Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com
Merge branches 'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b',
'rcu-tasks.2022.04.11b', 'srcu.2022.05.03a', 'torture.2022.04.11b',
'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD
docs.2022.04.20a: Documentation updates.
fixes.2022.04.20a: Miscellaneous fixes.
nocb.2022.04.11b: Callback-offloading updates.
rcu-tasks.2022.04.11b: RCU-tasks updates.
srcu.2022.05.03a: Put SRCU on a memory diet.
torture.2022.04.11b: Torture-test updates.
torture-tasks.2022.04.20a: Avoid torture testing changing RCU configuration.
torturescript.2022.04.20a: Torture-test scripting updates.
srcu: Drop needless initialization of sdp in srcu_gp_start()
Commit 9c7ef4c30f12 ("srcu: Make Tree SRCU able to operate without
snp_node array") initializes the local variable sdp differently depending
on the srcu's state in srcu_gp_start(). Either way, this initialization
overwrites the value used when sdp is defined.
This commit therefore drops this pointless definition-time initialization.
Although there is no functional change, compiler code generation may
be affected.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
srcu: Prevent expedited GPs and blocking readers from consuming CPU
If an SRCU reader blocks while a synchronize_srcu_expedited() waits for
that same reader, then that grace period will spawn an endless series of
workqueue handlers, consuming a full CPU. This quickly gets pointless
because consuming more CPU isn't going to make that reader get done
faster, especially if it is blocked waiting for an external event.
This commit therefore spawns at most one pair of back-to-back workqueue
handlers per expedited grace period phase, instead inserting increasing
delays as that grace period phase grows older, but capped at 10 jiffies.
In any case, if there have been at least 100 back-to-back workqueue
handlers within a single jiffy, regardless of grace period or grace-period
phase, then a one-jiffy delay is inserted.
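A simplified sketch of the delay policy (variable names are
hypothetical, not the SRCU code):

  unsigned long delay = min(jiffies - phase_start, 10UL); /* age, capped */

  if (READ_ONCE(handlers_this_jiffy) >= 100)  /* global throttle */
          delay = max(delay, 1UL);
  queue_delayed_work(srcu_wq, &ssp->work, delay);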
[ paulmck: Apply feedback from kernel test robot. ]
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reported-by: Song Liu <song@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
srcu: Add contention check to call_srcu() srcu_data ->lock acquisition
This commit increases the sensitivity of contention detection by adding
checks to the acquisition of the srcu_data structure's lock on the
call_srcu() code path.
Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
srcu: Automatically determine size-transition strategy at boot
This commit adds a srcutree.convert_to_big option of zero that causes
SRCU to decide at boot whether to wait for contention (small systems) or
immediately expand to large (large systems). A new srcutree.big_cpu_lim
(defaulting to 128) defines how many CPUs constitute a large system.
Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
kthread: unexport kthread_blkcg
kthread_blkcg is only used by the built-in blk-cgroup code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk-cgroup: replace bio_blkcg with bio_blkcg_css
All callers of bio_blkcg actually want the CSS, so replace it with an
interface that does return the CSS. This now allows moving
struct blkcg_gq to block/blk-cgroup.h instead of exposing it in a
public header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pass the cgroup_subsys_state instead of the blkg so that blktrace
doesn't need to poke into blk-cgroup internals, and give the name a
blk prefix as the current name is way too generic for a public
interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
time/sched_clock: Fix formatting of frequency reporting code
Use flat rather than nested indentation for chained else/if clauses as
per coding-style.rst:
  if (x == y) {
          ..
  } else if (x > y) {
          ...
  } else {
          ....
  }
This also improves readability.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2204240148220.9383@angie.orcam.me.uk
time/sched_clock: Use Hz as the unit for clock rate reporting below 4kHz
The kernel uses kHz as the unit for clock rates reported between 1MHz
(inclusive) and 4MHz (exclusive), e.g.:
sched_clock: 64 bits at 1000kHz, resolution 1000ns, wraps every 2199023255500ns
This reduces the amount of data lost due to rounding, but this approach
was not replicated when support was added for proper reporting of
sub-kHz clock rates. Take the same approach for rates between 1kHz
(inclusive) and 4kHz (exclusive), which makes the reporting consistent.
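The resulting unit selection, roughly:

  if (rate >= 4000000) {                /* >= 4 MHz: report MHz */
          div = 1000000; unit = "MHz";
  } else if (rate >= 4000) {            /* 4 kHz to 4 MHz: report kHz */
          div = 1000; unit = "kHz";
  } else {                              /* below 4 kHz: report Hz */
          div = 1; unit = "Hz";
  }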
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2204240106380.9383@angie.orcam.me.uk
time/sched_clock: Round the frequency reported to nearest rather than down
The frequencies reported for clock sources are rounded down, which gives
misleading figures, e.g.:
I/O ASIC clock frequency 24999480Hz
sched_clock: 32 bits at 24MHz, resolution 40ns, wraps every 85901132779ns
MIPS counter frequency 59998512Hz
sched_clock: 32 bits at 59MHz, resolution 16ns, wraps every 35792281591ns
Rounding to nearest is more adequate:
I/O ASIC clock frequency 24999664Hz
sched_clock: 32 bits at 25MHz, resolution 40ns, wraps every 85900499947ns
MIPS counter frequency 59999728Hz
sched_clock: 32 bits at 60MHz, resolution 16ns, wraps every 35791556599ns
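The fix is mechanical: truncating division becomes
DIV_ROUND_CLOSEST(). With the MIPS counter above:

  r = rate / 1000000;                   /* 59998512 -> 59 (rounded down) */
  r = DIV_ROUND_CLOSEST(rate, 1000000); /* 59998512 -> 60 (nearest)      */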
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2204240055590.9383@angie.orcam.me.uk
pm_runtime_resume_and_get() achieves the same and simplifies the code.
[ tglx: Simplify it further by presetting retval ]
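The usual shape of this cleanup:

  /* before */
  ret = pm_runtime_get_sync(dev);
  if (ret < 0) {
          pm_runtime_put_noidle(dev);
          return ret;
  }

  /* after: drops the usage count itself on failure */
  ret = pm_runtime_resume_and_get(dev);
  if (ret < 0)
          return ret;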
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220418110716.2559453-1-chi.minghao@zte.com.cn
timekeeping: Consolidate fast timekeeper
Provide an inline function which replaces the copy & paste.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220415091921.072296632@linutronix.de
timekeeping: Annotate ktime_get_boot_fast_ns() with data_race()
Accessing timekeeper::offset_boot in ktime_get_boot_fast_ns() is an
intended data race as the reader side cannot synchronize with a writer and
there is no space in struct tk_read_base of the NMI safe timekeeper.
Mark it so.
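The annotated read then looks roughly like:

  u64 notrace ktime_get_boot_fast_ns(void)
  {
          struct timekeeper *tk = &tk_core.timekeeper;

          return (ktime_get_mono_fast_ns() +
                  ktime_to_ns(data_race(tk->offs_boot)));
  }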
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220415091920.956045162@linutronix.de
mm: Fix PASID use-after-free issue
The PASID is being freed too early. It needs to stay around until after
device drivers that might be using it have had a chance to clear it out
of the hardware.
The relevant refcounts are:
  mmget()/mmput()    refcount the mm's address space
  mmgrab()/mmdrop()  refcount the mm itself
The PASID is currently tied to the life of the mm's address space and freed
in __mmput(). This makes logical sense because the PASID can't be used
once the address space is gone.
But, this misses an important point: even after the address space is gone,
the PASID will still be programmed into a device. Device drivers might,
for instance, still need to flush operations that are outstanding and need
to use that PASID. They do this at file->release() time.
Device drivers call the IOMMU driver to hold a reference on the mm itself
and drop it at file->release() time. But, the IOMMU driver holds a
reference on the mm itself, not the address space. The address space (and
the PASID) is long gone by the time the driver tries to clean up. This is
effectively a use-after-free bug on the PASID.
To fix this, move the PASID free operation from __mmput() to __mmdrop().
This ensures that the IOMMU driver's existing mmgrab() keeps the PASID
allocated until it drops its mm reference.
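Schematically (a sketch, with the helper name used by the PASID code):

  void __mmdrop(struct mm_struct *mm)
  {
          /* ... existing teardown ... */
          mm_pasid_drop(mm);      /* moved here from __mmput() */
          free_mm(mm);
  }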
Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit")
Reported-by: Zhangfei Gao <zhangfei.gao@foxmail.com>
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Suggested-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Zhangfei Gao <zhangfei.gao@foxmail.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20220428180041.806809-1-fenghua.yu@intel.com
task_work: allow TWA_SIGNAL without a rescheduling IPI
Some use cases don't always need an IPI when sending a TWA_SIGNAL
notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL, except
it doesn't send an IPI to the target task. It merely sets
TIF_NOTIFY_SIGNAL and wakes up the task.
This can be useful in avoiding a forceful transition to the kernel if the
task is running in userspace. Depending on the task_work in question, it
may be quite fine waiting for the next reschedule or kernel enter anyway,
or the use case may even have other mechanisms for hinting to the task
that a transition may be useful. This can drive more cooperative
scheduling of task_work.
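Callers opt in per notification; a sketch (callback name illustrative):

  struct callback_head work;

  init_task_work(&work, my_callback);
  /* Sets TIF_NOTIFY_SIGNAL and wakes @task, but skips the IPI. */
  err = task_work_add(task, &work, TWA_SIGNAL_NO_IPI);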
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/821f42b6-7d91-8074-8212-d34998097de4@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>