Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP and CPU hotplug updates from Thomas Gleixner:
- Switch the smp_call_function*() @csd argument to call_single_data_t
type, which is a cache-line aligned typedef of the underlying struct
__call_single_data.
This ensures that the call data is not crossing a cacheline which
avoids bouncing an extra cache-line for the SMP function call
- Prevent offlining of the last housekeeping CPU when CPU isolation is
active.
Offlining the last housekeeping CPU makes no sense in general, but
also caused the scheduler to panic due to the empty CPU mask when
rebuilding the scheduler domains.
- Remove an unused CPU hotplug state
* tag 'smp-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Don't offline the last non-isolated CPU
cpu/hotplug: Remove unused cpuhp_state CPUHP_AP_X86_VDSO_VMA_ONLINE
smp: Change function signatures to use call_single_data_t
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Core:
- Exclude managed interrupts in the calculation of interrupts which
are targeted to a CPU which is about to be offlined to ensure that
there are enough free vectors on the still online CPUs to migrate
them over.
Managed interrupts do not need to be accounted because they are
either shut down on offline or migrated to an already reserved and
guaranteed slot on a still online CPU in the interrupts affinity
mask.
Including managed interrupts is overaccounting and can result in
needlessly aborting hibernation on large server machines.
- The usual set of small improvements
Drivers:
- Make the generic interrupt chip implementation handle interrupt
domains correctly and initialize the name pointers correctly
- Add interrupt affinity setting support to the Renesas RZG2L chip
driver.
- Prevent registering syscore operations multiple times in the SiFive
PLIC chip driver.
- Update device tree handling in the NXP Layerscape MSI chip driver"
* tag 'irq-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/sifive-plic: Fix syscore registration for multi-socket systems
irqchip/ls-scfg-msi: Use device_get_match_data()
genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware
genirq/matrix: Exclude managed interrupts in irq_matrix_allocated()
PCI/MSI: Provide stubs for IMS functions
irqchip/renesas-rzg2l: Enhance driver to support interrupt affinity setting
genirq/generic-chip: Fix the irq_chip name for /proc/interrupts
irqdomain: Annotate struct irq_domain with __counted_by
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core updates from Thomas Gleixner:
"Two small updates to ptrace_stop():
- Add a comment to explain that the preempt_disable() before
unlocking tasklist lock is not a correctness problem and just
avoids the tracer to preempt the tracee before the tracee schedules
out.
- Make that preempt_disable() conditional on PREEMPT_RT=n.
RT enabled kernels cannot disable preemption at this point because
cgroup_enter_frozen() and sched_submit_work() acquire spinlocks or
rwlocks which are substituted by sleeping locks on RT. Acquiring a
sleeping lock in a preemption disable region is obviously not
possible.
This obviously brings back the potential slowdown of ptrace() for
RT enabled kernels, but that's a price to be paid for latency
guarantees"
* tag 'core-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT
signal: Add a proper comment about preempt_disable() in ptrace_stop()
|
|
The connection could be binded to the existing session for Multichannel.
session will be destroyed when binded connections are released.
So no need to wait for that's connection at logoff.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.7-2023-10-27:
amdgpu:
- RAS fixes
- Seamless boot fixes
- NBIO 7.7 fix
- SMU 14.0 fixes
- GC 11.5 fixes
- DML2 fixes
- ASPM fixes
- VPE fixes
- Misc code cleanups
- SRIOV fixes
- Add some missing copyright notices
- DCN 3.5 fixes
- FAMS fixes
- Backlight fix
- S/G display fix
- fdinfo cleanups
- EXT_COHERENT fixes for APU and NUMA systems
amdkfd:
- Misc fixes
- Misc code cleanups
- SVM fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231027200343.57132-1-alexander.deucher@amd.com
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build update from Ingo Molnar:
"Enable CONFIG_DEBUG_ENTRY=y in the x86 defconfigs"
* tag 'x86-build-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/defconfig: Enable CONFIG_DEBUG_ENTRY=y
|
|
As pointed out by Linus, closure_sync() was racy; we could skip blocking
immediately after a get() and a put(), but then that would skip any
barrier corresponding to the other thread's put() barrier.
To fix this, always do the full __closure_sync() sequence whenever any
get() has happened and the closure might have been used by other
threads.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
atomic_(dec|sub)_return_release() are a thing now - use them.
Also, delete the useless barrier in set_closure_fn(): it's redundant
with the memory barrier in closure_put(0.
Since closure_put() would now otherwise just have a release barrier, we
also need a new barrier when the ref hits 0 -
smp_acquire__after_ctrl_dep().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm handling updates from Ingo Molnar:
- Add new NX-stack self-test
- Improve NUMA partial-CFMWS handling
- Fix #VC handler bugs resulting in SEV-SNP boot failures
- Drop the 4MB memory size restriction on minimal NUMA nodes
- Reorganize headers a bit, in preparation to header dependency
reduction efforts
- Misc cleanups & fixes
* tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size
selftests/x86/lam: Zero out buffer for readlink()
x86/sev: Drop unneeded #include
x86/sev: Move sev_setup_arch() to mem_encrypt.c
x86/tdx: Replace deprecated strncpy() with strtomem_pad()
selftests/x86/mm: Add new test that userspace stack is in fact NX
x86/sev: Make boot_ghcb_page[] static
x86/boot: Move x86_cache_alignment initialization to correct spot
x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach
x86/sev-es: Allow copy_from_kernel_nofault() in earlier boot
x86_64: Show CR4.PSE on auxiliaries like on BSP
x86/iommu/docs: Update AMD IOMMU specification document URL
x86/sev/docs: Update document URL in amd-memory-encryption.rst
x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions from <asm/processor.h> to <asm/pgtable.h>
ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window
x86/numa: Introduce numa_fill_memblks()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 irq fix from Ingo Molnar:
"Fix out-of-order NMI nesting checks resulting in false positive
warnings"
* tag 'x86-irq-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/nmi: Fix out-of-order NMI nesting checks & false positive warning
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry updates from Ingo Molnar:
- Make IA32_EMULATION boot time configurable with
the new ia32_emulation=<bool> boot option
- Clean up fast syscall return validation code: convert
it to C and refactor the code
- As part of this, optimize the canonical RIP test code
* tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/32: Clean up syscall fast exit tests
x86/entry/64: Use TASK_SIZE_MAX for canonical RIP test
x86/entry/64: Convert SYSRET validation tests to C
x86/entry/32: Remove SEP test for SYSEXIT
x86/entry/32: Convert do_fast_syscall_32() to bool return type
x86/entry/compat: Combine return value test from syscall handler
x86/entry/64: Remove obsolete comment on tracing vs. SYSRET
x86: Make IA32_EMULATION boot time configurable
x86/entry: Make IA32 syscalls' availability depend on ia32_enabled()
x86/elf: Make loading of 32bit processes depend on ia32_enabled()
x86/entry: Compile entry_SYSCALL32_ignore() unconditionally
x86/entry: Rename ignore_sysret()
x86: Introduce ia32_enabled()
|
|
This commit adds mount option 'zero_size_dir'. If this option
enabled, don't allocate a cluster to directory when creating
it, and set the directory size to 0.
On Windows, a cluster is allocated for a directory when it is
created, so the mount option is disabled by default.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
After repairing a corrupted file system with exfatprogs' fsck.exfat,
zero-size directories may result. It is also possible to create
zero-size directories in other exFAT implementation, such as Paragon
ufsd dirver.
As described in the specification, the lower directory size limits
is 0 bytes.
Without this commit, sub-directories and files cannot be created
under a zero-size directory, and it cannot be removed.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Add GET and SET attributes ioctls to enable attribute modification.
We already do this in FAT and a few userspace utils made for it would
benefit from this also working on exFAT, namely fatattr.
Signed-off-by: Jan Cincera <hcincera@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v6.7-rc1:
drm-misc-next-2023-10-19 + following:
UAPI Changes:
Cross-subsystem Changes:
- Convert fbdev drivers to use fbdev i/o mem helpers.
Core Changes:
- Use cross-references for macros in docs.
- Make drm_client_buffer_addb use addfb2.
- Add NV20 and NV30 YUV formats.
- Documentation updates for create_dumb ioctl.
- CI fixes.
- Allow variable number of run-queues in scheduler.
Driver Changes:
- Rename drm/ast constants.
- Make ili9882t its own driver.
- Assorted fixes in ivpu, vc4, bridge/synopsis, amdgpu.
- Add planar formats to rockchip.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/3d92fae8-9b1b-4165-9ca8-5fda11ee146b@linux.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 assembly code updates from Ingo Molnar:
- Micro-optimize the x86 bitops code
- Define target-specific {raw,this}_cpu_try_cmpxchg{64,128}() to
improve code generation
- Define and use raw_cpu_try_cmpxchg() preempt_count_set()
- Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op
- Remove the unused __sw_hweight64() implementation on x86-32
- Misc fixes and cleanups
* tag 'x86-asm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/lib: Address kernel-doc warnings
x86/entry: Fix typos in comments
x86/entry: Remove unused argument %rsi passed to exc_nmi()
x86/bitops: Remove unused __sw_hweight64() assembly implementation on x86-32
x86/percpu: Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op
x86/percpu: Use raw_cpu_try_cmpxchg() in preempt_count_set()
x86/percpu: Define raw_cpu_try_cmpxchg and this_cpu_try_cmpxchg()
x86/percpu: Define {raw,this}_cpu_try_cmpxchg{64,128}
x86/asm/bitops: Use __builtin_clz{l|ll} to evaluate constant expressions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
- Rework PE header generation, primarily to generate a modern, 4k
aligned kernel image view with narrower W^X permissions.
- Further refine init-lifetime annotations
- Misc cleanups & fixes
* tag 'x86-boot-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/boot: efistub: Assign global boot_params variable
x86/boot: Rename conflicting 'boot_params' pointer to 'boot_params_ptr'
x86/head/64: Move the __head definition to <asm/init.h>
x86/head/64: Add missing __head annotation to startup_64_load_idt()
x86/head/64: Mark 'startup_gdt[]' and 'startup_gdt_descr' as __initdata
x86/boot: Harmonize the style of array-type parameter for fixup_pointer() calls
x86/boot: Fix incorrect startup_gdt_descr.size
x86/boot: Compile boot code with -std=gnu11 too
x86/boot: Increase section and file alignment to 4k/512
x86/boot: Split off PE/COFF .data section
x86/boot: Drop PE/COFF .reloc section
x86/boot: Construct PE/COFF .text section from assembler
x86/boot: Derive file size from _edata symbol
x86/boot: Define setup size in linker script
x86/boot: Set EFI handover offset directly in header asm
x86/boot: Grab kernel_info offset from zoffset header directly
x86/boot: Drop references to startup_64
x86/boot: Drop redundant code setting the root device
x86/boot: Omit compression buffer from PE/COFF image memory footprint
x86/boot: Remove the 'bugger off' message
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 header file cleanup from Ingo Molnar:
"Replace <asm/export.h> uses with <linux/export.h> and then remove
<asm/export.h>"
* tag 'x86-headers-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/headers: Remove <asm/export.h>
x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>
x86/headers: Remove unnecessary #include <asm/export.h>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance event updates from Ingo Molnar:
- Add AMD Unified Memory Controller (UMC) events introduced with Zen 4
- Simplify & clean up the uncore management code
- Fall back from RDPMC to RDMSR on certain uncore PMUs
- Improve per-package and cstate event reading
- Extend the Intel ref-cycles event to GP counters
- Fix Intel MTL event constraints
- Improve the Intel hybrid CPU handling code
- Micro-optimize the RAPL code
- Optimize perf_cgroup_switch()
- Improve large AUX area error handling
- Misc fixes and cleanups
* tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
perf/x86/amd/uncore: Pass through error code for initialization failures, instead of -ENODEV
perf/x86/amd/uncore: Fix uninitialized return value in amd_uncore_init()
x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations
perf: Optimize perf_cgroup_switch()
perf/x86/amd/uncore: Add memory controller support
perf/x86/amd/uncore: Add group exclusivity
perf/x86/amd/uncore: Use rdmsr if rdpmc is unavailable
perf/x86/amd/uncore: Move discovery and registration
perf/x86/amd/uncore: Refactor uncore management
perf/core: Allow reading package events from perf_event_read_local
perf/x86/cstate: Allow reading the package statistics from local CPU
perf/x86/intel/pt: Fix kernel-doc comments
perf/x86/rapl: Annotate 'struct rapl_pmus' with __counted_by
perf/core: Rename perf_proc_update_handler() -> perf_event_max_sample_rate_handler(), for readability
perf/x86/rapl: Fix "Using plain integer as NULL pointer" Sparse warning
perf/x86/rapl: Use local64_try_cmpxchg in rapl_event_update()
perf/x86/rapl: Stop doing cpu_relax() in the local64_cmpxchg() loop in rapl_event_update()
perf/core: Bail out early if the request AUX area is out of bound
perf/x86/intel: Extend the ref-cycles event to GP counters
perf/x86/intel: Fix broken fixed event constraints extension
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
"Misc fixes and cleanups:
- Fix potential MAX_NAME_LEN limit related build failures
- Fix scripts/faddr2line symbol filtering bug
- Fix scripts/faddr2line on LLVM=1
- Fix scripts/faddr2line to accept readelf output with mapping
symbols
- Minor cleanups"
* tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
scripts/faddr2line: Skip over mapping symbols in output from readelf
scripts/faddr2line: Use LLVM addr2line and readelf if LLVM=1
scripts/faddr2line: Don't filter out non-function symbols from readelf
objtool: Remove max symbol name length limitation
objtool: Propagate early errors
objtool: Use 'the fallthrough' pseudo-keyword
x86/speculation, objtool: Use absolute relocations for annotations
x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Fair scheduler (SCHED_OTHER) improvements:
- Remove the old and now unused SIS_PROP code & option
- Scan cluster before LLC in the wake-up path
- Use candidate prev/recent_used CPU if scanning failed for cluster
wakeup
NUMA scheduling improvements:
- Improve the VMA access-PID code to better skip/scan VMAs
- Extend tracing to cover VMA-skipping decisions
- Improve/fix the recently introduced sched_numa_find_nth_cpu() code
- Generalize numa_map_to_online_node()
Energy scheduling improvements:
- Remove the EM_MAX_COMPLEXITY limit
- Add tracepoints to track energy computation
- Make the behavior of the 'sched_energy_aware' sysctl more
consistent
- Consolidate and clean up access to a CPU's max compute capacity
- Fix uclamp code corner cases
RT scheduling improvements:
- Drive dl_rq->overloaded with dl_rq->pushable_dl_tasks updates
- Drive the ->rto_mask with rt_rq->pushable_tasks updates
Scheduler scalability improvements:
- Rate-limit updates to tg->load_avg
- On x86 disable IBRS when CPU is offline to improve single-threaded
performance
- Micro-optimize in_task() and in_interrupt()
- Micro-optimize the PSI code
- Avoid updating PSI triggers and ->rtpoll_total when there are no
state changes
Core scheduler infrastructure improvements:
- Use saved_state to reduce some spurious freezer wakeups
- Bring in a handful of fast-headers improvements to scheduler
headers
- Make the scheduler UAPI headers more widely usable by user-space
- Simplify the control flow of scheduler syscalls by using lock
guards
- Fix sched_setaffinity() vs. CPU hotplug race
Scheduler debuggability improvements:
- Disallow writing invalid values to sched_rt_period_us
- Fix a race in the rq-clock debugging code triggering warnings
- Fix a warning in the bandwidth distribution code
- Micro-optimize in_atomic_preempt_off() checks
- Enforce that the tasklist_lock is held in for_each_thread()
- Print the TGID in sched_show_task()
- Remove the /proc/sys/kernel/sched_child_runs_first sysctl
... and misc cleanups & fixes"
* tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
sched/fair: Remove SIS_PROP
sched/fair: Use candidate prev/recent_used CPU if scanning failed for cluster wakeup
sched/fair: Scan cluster before scanning LLC in wake-up path
sched: Add cpus_share_resources API
sched/core: Fix RQCF_ACT_SKIP leak
sched/fair: Remove unused 'curr' argument from pick_next_entity()
sched/nohz: Update comments about NEWILB_KICK
sched/fair: Remove duplicate #include
sched/psi: Update poll => rtpoll in relevant comments
sched: Make PELT acronym definition searchable
sched: Fix stop_one_cpu_nowait() vs hotplug
sched/psi: Bail out early from irq time accounting
sched/topology: Rename 'DIE' domain to 'PKG'
sched/psi: Delete the 'update_total' function parameter from update_triggers()
sched/psi: Avoid updating PSI triggers and ->rtpoll_total when there are no state changes
sched/headers: Remove comment referring to rq::cpu_load, since this has been removed
sched/numa: Complete scanning of inactive VMAs when there is no alternative
sched/numa: Complete scanning of partial VMAs regardless of PID activity
sched/numa: Move up the access pid reset logic
sched/numa: Trace decisions related to skipping VMAs
...
|
|
- Remove unused includes like <linux/parser.h> and <linux/prefetch.h>;
- Move common includes into "internal.h".
Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20231026021627.23284-2-mengferry@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Let's open code this helper for simplicity.
Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20231026021627.23284-1-mengferry@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Move erofs_load_compr_cfgs() into decompressor.c as well as introduce
a callback instead of a hard-coded switch for each algorithm for
simplicity.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231022130957.11398-1-xiang@kernel.org
|
|
The LZMA algorithm support has been landed for more than one year since
Linux 5.16. Besides, the new XZ Utils 5.4 has been available in most
Linux distributions.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231021020137.1646959-1-hsiangkao@linux.alibaba.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Info Molnar:
"Futex improvements:
- Add the 'futex2' syscall ABI, which is an attempt to get away from
the multiplex syscall and adds a little room for extentions, while
lifting some limitations.
- Fix futex PI recursive rt_mutex waiter state bug
- Fix inter-process shared futexes on no-MMU systems
- Use folios instead of pages
Micro-optimizations of locking primitives:
- Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
architectures, to improve lockref code generation
- Improve the x86-32 lockref_get_not_zero() main loop by adding
build-time CMPXCHG8B support detection for the relevant lockref
code, and by better interfacing the CMPXCHG8B assembly code with
the compiler
- Introduce arch_sync_try_cmpxchg() on x86 to improve
sync_try_cmpxchg() code generation. Convert some sync_cmpxchg()
users to sync_try_cmpxchg().
- Micro-optimize rcuref_put_slowpath()
Locking debuggability improvements:
- Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well
- Enforce atomicity of sched_submit_work(), which is de-facto atomic
but was un-enforced previously.
- Extend <linux/cleanup.h>'s no_free_ptr() with __must_check
semantics
- Fix ww_mutex self-tests
- Clean up const-propagation in <linux/seqlock.h> and simplify the
API-instantiation macros a bit
RT locking improvements:
- Provide the rt_mutex_*_schedule() primitives/helpers and use them
in the rtmutex code to avoid recursion vs. rtlock on the PI state.
- Add nested blocking lockdep asserts to rt_mutex_lock(),
rtlock_lock() and rwbase_read_lock()
.. plus misc fixes & cleanups"
* tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
futex: Don't include process MM in futex key on no-MMU
locking/seqlock: Fix grammar in comment
alpha: Fix up new futex syscall numbers
locking/seqlock: Propagate 'const' pointers within read-only methods, remove forced type casts
locking/lockdep: Fix string sizing bug that triggers a format-truncation compiler-warning
locking/seqlock: Change __seqprop() to return the function pointer
locking/seqlock: Simplify SEQCOUNT_LOCKNAME()
locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()
locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg()
locking/atomic/x86: Introduce arch_sync_try_cmpxchg()
locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback
locking/seqlock: Fix typo in comment
futex/requeue: Remove unnecessary ‘NULL’ initialization from futex_proxy_trylock_atomic()
locking/local, arch: Rewrite local_add_unless() as a static inline function
locking/debug: Fix debugfs API return value checks to use IS_ERR()
locking/ww_mutex/test: Make sure we bail out instead of livelock
locking/ww_mutex/test: Fix potential workqueue corruption
locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootup
futex: Add sys_futex_requeue()
futex: Add flags2 argument to futex_requeue()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu fixlet from Borislav Petkov:
- kernel-doc fix
* tag 'x86_fpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu/xstate: Address kernel-doc warning
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Borislav Petkov:
- Make sure PCI function 4 IDs of AMD family 0x19, models 0x60-0x7f are
actually used in the amd_nb.c enumeration
- Add support for extracting NUMA information from devicetree for
Hyper-V usages
- Add PCI device IDs for the new AMD MI300 AI accelerators
- Annotate an array in struct uv_rtc_timer_head with the new
__counted_by attribute
- Rework UV's NMI action parameter handling
* tag 'x86_platform_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/amd_nb: Use Family 19h Models 60h-7Fh Function 4 IDs
x86/numa: Add Devicetree support
x86/of: Move the x86_flattree_get_config() call out of x86_dtb_init()
x86/amd_nb: Add AMD Family MI300 PCI IDs
x86/platform/uv: Annotate struct uv_rtc_timer_head with __counted_by
x86/platform/uv: Rework NMI "action" modparam handling
|
|
Add new mount options lowerdir+ and datadir+ that can be used to add
layers to lower layers stack one by one.
Unlike the legacy lowerdir mount option, special characters (i.e. colons
and cammas) are not unescaped with these new mount options.
The new mount options can be repeated to compose a large stack of lower
layers, but they may not be mixed with the lagacy lowerdir mount option,
because for displaying lower layers in mountinfo, we do not want to mix
escaped with unescaped lower layers path syntax.
Similar to data-only layer rules with the lowerdir mount option, the
datadir+ option must follow at least one lowerdir+ option and the
lowerdir+ option must not follow the datadir+ option.
If the legacy lowerdir mount option follows lowerdir+ and datadir+
mount options, it overrides them. Sepcifically, calling:
fsconfig(FSCONFIG_SET_STRING, "lowerdir", "", 0);
can be used to reset previously setup lower layers.
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Link: https://lore.kernel.org/r/CAJfpegt7VC94KkRtb1dfHG8+4OzwPBLYqhtc8=QFUxpFJE+=RQ@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
In preparation for new mount options to add lowerdirs one by one,
generalize ovl_parse_param_upperdir() into helper ovl_parse_layer()
that will be used for parsing a single lower layers.
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Link: https://lore.kernel.org/r/CAJfpegt7VC94KkRtb1dfHG8+4OzwPBLYqhtc8=QFUxpFJE+=RQ@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
We are about to add new mount options for adding lowerdir one by one,
but those mount options will not support escaping.
For the existing case, where lowerdir mount option is provided as a colon
separated list, store the user provided (possibly escaped) string and
display it as is when showing the lowerdir mount option.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Commit beae836e9c61 ("ovl: temporarily disable appending lowedirs")
removed the ability to append lowerdirs with syntax lowerdir=":<path>".
Remove leftover code and comments that are irrelevant with lowerdir
append mode disabled.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
An xattr whiteout (called "xwhiteout" in the code) is a reguar file of
zero size with the "overlay.whiteout" xattr set. A file like this in a
directory with the "overlay.whiteouts" xattrs set will be treated the
same way as a regular whiteout.
The "overlay.whiteouts" directory xattr is used in order to
efficiently handle overlay checks in readdir(), as we only need to
checks xattrs in affected directories.
The advantage of this kind of whiteout is that they can be escaped
using the standard overlay xattr escaping mechanism. So, a file with a
"overlay.overlay.whiteout" xattr would be unescaped to
"overlay.whiteout", which could then be consumed by another overlayfs
as a whiteout.
Overlayfs itself doesn't create whiteouts like this, but a userspace
mechanism could use this alternative mechanism to convert images that
may contain whiteouts to be used with overlayfs.
To work as a whiteout for both regular overlayfs mounts as well as
userxattr mounts both the "user.overlay.whiteout*" and the
"trusted.overlay.whiteout*" xattrs will need to be created.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
There are cases where you want to use an overlayfs mount as a lowerdir
for another overlayfs mount. For example, if the system rootfs is on
overlayfs due to composefs, or to make it volatile (via tmps), then
you cannot currently store a lowerdir on the rootfs. This means you
can't e.g. store on the rootfs a prepared container image for use
using overlayfs.
To work around this, we introduce an escapment mechanism for overlayfs
xattrs. Whenever the lower/upper dir has a xattr named
"overlay.overlay.XYZ", we list it as "overlay.XYZ" in listxattrs, and
when the user calls getxattr or setxattr on "overlay.XYZ", we apply to
"overlay.overlay.XYZ" in the backing directories.
This allows storing any kind of overlay xattrs in a overlayfs mount
that can be used as a lowerdir in another mount. It is possible to
stack this mechanism multiple times, such that
"overlay.overlay.overlay.XYZ" will survive two levels of overlay mounts,
however this is not all that useful in practice because of stack depth
limitations of overlayfs mounts.
Note: These escaped xattrs are copied to upper during copy-up.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
These match the ones for e.g. XATTR_TRUSTED_PREFIX_LEN.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
This moves the code from super.c and inode.c, and makes ovl_xattr_get/set()
static.
This is in preparation for doing more work on xattrs support.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
When lower fs is a nested overlayfs, calling encode_fh() on a lower
directory dentry may trigger copy up and take sb_writers on the upper fs
of the lower nested overlayfs.
The lower nested overlayfs may have the same upper fs as this overlayfs,
so nested sb_writers lock is illegal.
Move all the callers that encode lower fh to before ovl_want_write().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
overlayfs file open (ovl_maybe_lookup_lowerdata) and overlay file llseek
take the ovl_inode_lock, without holding upper sb_writers.
In case of nested lower overlay that uses same upper fs as this overlay,
lockdep will warn about (possibly false positive) circular lock
dependency when doing open/llseek of lower ovl file during copy up with
our upper sb_writers held, because the locking ordering seems reverse to
the locking order in ovl_copy_up_start():
- lower ovl_inode_lock
- upper sb_writers
Let the copy up "transaction" keeps an elevated mnt write count on upper
mnt, but leaves taking upper sb_writers to lower level helpers only when
they actually need it. This allows to avoid holding upper sb_writers
during lower file open/llseek and prevents the lockdep warning.
Minimizing the scope of upper sb_writers during copy up is also needed
for fixing another possible deadlocks by a following patch.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Make the locking order of ovl_inode_lock() strictly between the two
vfs stacked layers, i.e.:
- ovl vfs locks: sb_writers, inode_lock, ...
- ovl_inode_lock
- upper vfs locks: sb_writers, inode_lock, ...
To that effect, move ovl_want_write() into the helpers ovl_nlink_start()
and ovl_copy_up_start which currently take the ovl_inode_lock() after
ovl_want_write().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
ovl_get_write_access() gets write access to upper mnt without taking
freeze protection on upper sb and ovl_start_write() only takes freeze
protection on upper sb.
These helpers will be used to breakup the large ovl_want_write() scope
during copy up into finer grained freeze protection scopes.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
A simple wrapper for updating ovl inode size/mtime, to conform
with ovl_file_accessed().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
ovl_copyattr() may be called concurrently from aio completion context
without any lock and that could lead to overlay inode attributes getting
permanently out of sync with real inode attributes.
Use ovl inode spinlock to protect ovl_copyattr().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
We want to protect concurrent updates of ovl inode size and mtime
(i.e. ovl_copyattr()) from aio completion context.
Punt write aio completion to a workqueue so that we can protect
ovl_copyattr() with a spinlock.
Export sb_init_dio_done_wq(), so that overlayfs can use its own
dio workqueue to punt aio completions.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/8620dfd3-372d-4ae0-aa3f-2fe97dda1bca@kernel.dk/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
If ovl file is opened O_APPEND, the underlying realfile is also
opened O_APPEND, so it makes sense to propagate the IOCB_APPEND flags
on sync writes to realfile, just as we do with aio writes.
Effectively, because sync ovl writes are protected by inode lock,
this change only makes a difference if the realfile is written to (size
extending writes) from underneath overlayfs. The behavior in this case
is undefined, so it is ok if we change the behavior (to fail the ovl
IOCB_APPEND write).
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Overlayfs implements its own function to translate iocb flags into rw
flags, so that they can be passed into another vfs call.
With commit ce71bfea207b4 ("fs: align IOCB_* flags with RWF_* flags")
Jens created a 1:1 matching between the iocb flags and rw flags,
simplifying the conversion.
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Make sure the "svm" feature flag is cleared from /proc/cpuinfo when
virtualization support is disabled in the BIOS on AMD and Hygon
platforms
- A minor cleanup
* tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/amd: Remove redundant 'break' statement
x86/cpu: Clear SVM feature if disabled by BIOS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Add support for non-contiguous capacity bitmasks being added to
Intel's CAT implementation
- Other improvements to resctrl code: better configuration,
simplifications, debugging support, fixes
* tag 'x86_cache_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Display RMID of resource group
x86/resctrl: Add support for the files of MON groups only
x86/resctrl: Display CLOSID for resource group
x86/resctrl: Introduce "-o debug" mount option
x86/resctrl: Move default group file creation to mount
x86/resctrl: Unwind properly from rdt_enable_ctx()
x86/resctrl: Rename rftype flags for consistency
x86/resctrl: Simplify rftype flag definitions
x86/resctrl: Add multiple tasks to the resctrl group at once
Documentation/x86: Document resctrl's new sparse_masks
x86/resctrl: Add sparse_masks file in info
x86/resctrl: Enable non-contiguous CBMs in Intel CAT
x86/resctrl: Rename arch_has_sparse_bitmaps
x86/resctrl: Fix remaining kernel-doc warnings
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 hw mitigation updates from Borislav Petkov:
- A bunch of improvements, cleanups and fixlets to the SRSO mitigation
machinery and other, general cleanups to the hw mitigations code, by
Josh Poimboeuf
- Improve the return thunk detection by objtool as it is absolutely
important that the default return thunk is not used after returns
have been patched. Future work to detect and report this better is
pending
- Other misc cleanups and fixes
* tag 'x86_bugs_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/retpoline: Document some thunk handling aspects
x86/retpoline: Make sure there are no unconverted return thunks due to KCSAN
x86/callthunks: Delete unused "struct thunk_desc"
x86/vdso: Run objtool on vdso32-setup.o
objtool: Fix return thunk patching in retpolines
x86/srso: Remove unnecessary semicolon
x86/pti: Fix kernel warnings for pti= and nopti cmdline options
x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk()
x86/nospec: Refactor UNTRAIN_RET[_*]
x86/rethunk: Use SYM_CODE_START[_LOCAL]_NOALIGN macros
x86/srso: Disentangle rethunk-dependent options
x86/srso: Move retbleed IBPB check into existing 'has_microcode' code block
x86/bugs: Remove default case for fully switched enums
x86/srso: Remove 'pred_cmd' label
x86/srso: Unexport untraining functions
x86/srso: Improve i-cache locality for alias mitigation
x86/srso: Fix unret validation dependencies
x86/srso: Fix vulnerability reporting for missing microcode
x86/srso: Print mitigation for retbleed IBPB case
x86/srso: Print actual mitigation if requested mitigation isn't possible
...
|