Age | Commit message (Collapse) | Author |
|
On x86_64, with allmodconfig, struct uprobe_task is 72 bytes long, with a
hole and some padding.
/* size: 72, cachelines: 2, members: 7 */
/* sum members: 64, holes: 1, sum holes: 4 */
/* padding: 4 */
/* forced alignments: 1, forced holes: 1, sum forced holes: 4 */
/* last cacheline: 8 bytes */
Reorder the structure to fill the hole and avoid the padding.
This way, the whole structure fits in a single cacheline and some memory is
saved when it is allocated.
/* size: 64, cachelines: 1, members: 7 */
/* forced alignments: 1 */
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/a9f541d0cedf421f765c77a1fb93d6a979778a88.1730495562.git.christophe.jaillet@wanadoo.fr
|
|
Hardware traces, such as instruction traces, can produce a vast amount of
trace data, so being able to reduce tracing to more specific circumstances
can be useful.
The ability to pause or resume tracing when another event happens, can do
that.
Add ability for an event to "pause" or "resume" AUX area tracing.
Add aux_pause bit to perf_event_attr to indicate that, if the event
happens, the associated AUX area tracing should be paused. Ditto
aux_resume. Do not allow aux_pause and aux_resume to be set together.
Add aux_start_paused bit to perf_event_attr to indicate to an AUX area
event that it should start in a "paused" state.
Add aux_paused to struct hw_perf_event for AUX area events to keep track of
the "paused" state. aux_paused is initialized to aux_start_paused.
Add PERF_EF_PAUSE and PERF_EF_RESUME modes for ->stop() and ->start()
callbacks. Call as needed, during __perf_event_output(). Add
aux_in_pause_resume to struct perf_buffer to prevent races with the NMI
handler. Pause/resume in NMI context will miss out if it coincides with
another pause/resume.
To use aux_pause or aux_resume, an event must be in a group with the AUX
area event as the group leader.
Example (requires Intel PT and tools patches also):
$ perf record --kcore -e intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/ uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.043 MB perf.data ]
$ perf script --call-trace
uname 30805 [000] 24001.058782799: name: 0x7ffc9c1865b0
uname 30805 [000] 24001.058784424: psb offs: 0
uname 30805 [000] 24001.058784424: cbr: 39 freq: 3904 MHz (139%)
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __x64_sys_newuname
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) down_read
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __cond_resched
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) in_lock_functions
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_sub
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) up_read
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) in_lock_functions
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) preempt_count_sub
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) _copy_to_user
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_to_user_mode
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_work
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) perf_syscall_exit
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_alloc
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_get_recursion_context
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_tp_event
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_update
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) tracing_gen_ctx_irq_test
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_event
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __perf_event_account_interrupt
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __this_cpu_preempt_check
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_output_forward
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_aux_pause
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) ring_buffer_get
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_lock
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_unlock
uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) pt_event_stop
uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) native_write_msr
uname 30805 [000] 24001.058785463: ([kernel.kallsyms]) native_write_msr
uname 30805 [000] 24001.058785639: 0x0
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20241022155920.17511-3-adrian.hunter@intel.com
|
|
Avoid taking refcount on uprobe in prepare_uretprobe(), instead take
uretprobe-specific SRCU lock and keep it active as kernel transfers
control back to user space.
Given we can't rely on user space returning from traced function within
reasonable time period, we need to make sure not to keep SRCU lock
active for too long, though. To that effect, we employ a timer callback
which is meant to terminate SRCU lock region after predefined timeout
(currently set to 100ms), and instead transfer underlying struct
uprobe's lifetime protection to refcounting.
This fallback to less scalable refcounting after 100ms is a fine
tradeoff from uretprobe's scalability and performance perspective,
because uretprobing *long running* user functions inherently doesn't run
into scalability issues (there is just not enough frequency to cause
noticeable issues with either performance or scalability).
The overall trick is in ensuring synchronization between current thread
and timer's callback fired on some other thread. To cope with that with
minimal logic complications, we add hprobe wrapper which is used to
contain all the synchronization related issues behind a small number of
basic helpers: hprobe_expire() for "downgrading" uprobe from SRCU-protected
state to refcounted state, and a hprobe_consume() and hprobe_finalize()
pair of single-use consuming helpers. Other than that, whatever current
thread's logic is there stays the same, as timer thread cannot modify
return_instance state (or add new/remove old return_instances). It only
takes care of SRCU unlock and uprobe refcounting, which is hidden from
the higher-level uretprobe handling logic.
We use atomic xchg() in hprobe_consume(), which is called from
performance critical handle_uretprobe_chain() function run in the
current context. When uncontended, this xchg() doesn't seem to hurt
performance as there are no other competing CPUs fighting for the same
cache line. We also mark struct return_instance as ____cacheline_aligned
to ensure no false sharing can happen.
Another technical moment. We need to make sure that the list of return
instances can be safely traversed under RCU from timer callback, so we
delay return_instance freeing with kfree_rcu() and make sure that list
modifications use RCU-aware operations.
Also, given SRCU lock survives transition from kernel to user space and
back we need to use lower-level __srcu_read_lock() and
__srcu_read_unlock() to avoid lockdep complaining.
Just to give an impression of a kind of performance improvements this
change brings, below are benchmarking results with and without these
SRCU changes, assuming other uprobe optimizations (mainly RCU Tasks
Trace for entry uprobes, lockless RB-tree lookup, and lockless VMA to
uprobe lookup) are left intact:
WITHOUT SRCU for uretprobes
===========================
uretprobe-nop ( 1 cpus): 2.197 ± 0.002M/s ( 2.197M/s/cpu)
uretprobe-nop ( 2 cpus): 3.325 ± 0.001M/s ( 1.662M/s/cpu)
uretprobe-nop ( 3 cpus): 4.129 ± 0.002M/s ( 1.376M/s/cpu)
uretprobe-nop ( 4 cpus): 6.180 ± 0.003M/s ( 1.545M/s/cpu)
uretprobe-nop ( 8 cpus): 7.323 ± 0.005M/s ( 0.915M/s/cpu)
uretprobe-nop (16 cpus): 6.943 ± 0.005M/s ( 0.434M/s/cpu)
uretprobe-nop (32 cpus): 5.931 ± 0.014M/s ( 0.185M/s/cpu)
uretprobe-nop (64 cpus): 5.145 ± 0.003M/s ( 0.080M/s/cpu)
uretprobe-nop (80 cpus): 4.925 ± 0.005M/s ( 0.062M/s/cpu)
WITH SRCU for uretprobes
========================
uretprobe-nop ( 1 cpus): 1.968 ± 0.001M/s ( 1.968M/s/cpu)
uretprobe-nop ( 2 cpus): 3.739 ± 0.003M/s ( 1.869M/s/cpu)
uretprobe-nop ( 3 cpus): 5.616 ± 0.003M/s ( 1.872M/s/cpu)
uretprobe-nop ( 4 cpus): 7.286 ± 0.002M/s ( 1.822M/s/cpu)
uretprobe-nop ( 8 cpus): 13.657 ± 0.007M/s ( 1.707M/s/cpu)
uretprobe-nop (32 cpus): 45.305 ± 0.066M/s ( 1.416M/s/cpu)
uretprobe-nop (64 cpus): 42.390 ± 0.922M/s ( 0.662M/s/cpu)
uretprobe-nop (80 cpus): 47.554 ± 2.411M/s ( 0.594M/s/cpu)
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241024044159.3156646-3-andrii@kernel.org
|
|
The rapl pmu is die scope, which is supported by the generic perf_event
subsystem now.
Set the scope for the rapl PMU and remove all the cpumask and hotplug
codes.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Link: https://lore.kernel.org/r/20241010142604.770192-2-kan.liang@linux.intel.com
|
|
This change allows the uprobe consumer to behave as session which
means that 'handler' and 'ret_handler' callbacks are connected in
a way that allows to:
- control execution of 'ret_handler' from 'handler' callback
- share data between 'handler' and 'ret_handler' callbacks
The session concept fits to our common use case where we do filtering
on entry uprobe and based on the result we decide to run the return
uprobe (or not).
It's also convenient to share the data between session callbacks.
To achive this we are adding new return value the uprobe consumer
can return from 'handler' callback:
UPROBE_HANDLER_IGNORE
- Ignore 'ret_handler' callback for this consumer.
And store cookie and pass it to 'ret_handler' when consumer has both
'handler' and 'ret_handler' callbacks defined.
We store shared data in the return_consumer object array as part of
the return_instance object. This way the handle_uretprobe_chain can
find related return_consumer and its shared data.
We also store entry handler return value, for cases when there are
multiple consumers on single uprobe and some of them are ignored and
some of them not, in which case the return probe gets installed and
we need to have a way to find out which consumer needs to be ignored.
The tricky part is when consumer is registered 'after' the uprobe
entry handler is hit. In such case this consumer's 'ret_handler' gets
executed as well, but it won't have the proper data pointer set,
so we can filter it out.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241018202252.693462-3-jolsa@kernel.org
|
|
Adding data pointer to both entry and exit consumer handlers and all
its users. The functionality itself is coming in following change.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241018202252.693462-2-jolsa@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two cpufreq issues, one in the core and one in the
intel_pstate driver:
- Fix CPU device node reference counting in the cpufreq core (Miquel
Sabaté Solà)
- Turn the spinlock used by the intel_pstate driver in hard IRQ
context into a raw one to prevent the driver from crashing when
PREEMPT_RT is enabled (Uwe Kleine-König)"
* tag 'pm-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Avoid a bad reference count on CPU node
cpufreq: intel_pstate: Make hwp_notify_lock a raw spinlock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Slightly high amount of changes in this round, partly because of my
vacation in the last weeks. But all changes are small and nothing
looks worrisome.
The biggest LOCs is MAINTAINERS updates, and there is a core change
for card-ID string creation for non-ASCII inputs. Others are rather
device-specific, such as new quirks and device IDs for ASoC, usual
HD-audio and USB-audio quirks and fixes, as well as regression fixes
in HD-audio HDMI audio and Conexant codec"
* tag 'sound-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits)
ALSA: hda/conexant: Fix conflicting quirk for System76 Pangolin
ALSA: line6: add hw monitor volume control to POD HD500X
ALSA: gus: Fix some error handling paths related to get_bpos() usage
ALSA: hda: Add missing parameter description for snd_hdac_stream_timecounter_init()
ALSA: usb-audio: Add native DSD support for Luxman D-08u
ALSA: core: add isascii() check to card ID generator
MAINTAINERS: ALSA: use linux-sound@vger.kernel.org list
Revert "ALSA: hda: Conditionally use snooping for AMD HDMI"
ASoC: intel: sof_sdw: Add check devm_kasprintf() returned value
ASoC: imx-card: Set card.owner to avoid a warning calltrace if SND=m
ASoC: dt-bindings: davinci-mcasp: Fix interrupts property
ASoC: qcom: sm8250: add qrb4210-rb2-sndcard compatible string
ASoC: dt-bindings: qcom,sm8250: add qrb4210-rb2-sndcard
ALSA: hda: fix trigger_tstamp_latched
ALSA: hda/realtek: Add a quirk for HP Pavilion 15z-ec200
ALSA: hda/generic: Drop obsoleted obey_preferred_dacs flag
ALSA: hda/generic: Unconditionally prefer preferred_dacs pairs
ALSA: silence integer wrapping warning
ASoC: Intel: soc-acpi: arl: Fix some missing empty terminators
ASoC: Intel: soc-acpi-intel-rpl-match: add missing empty item
...
|
|
Pull drm fixes from Dave Airlie:
"Weekly fixes, xe and amdgpu lead the way, with panthor, and few core
components getting various fixes. Nothing seems too out of the
ordinary.
atomic:
- Use correct type when reading damage rectangles
display:
- Fix kernel docs
dp-mst:
- Fix DSC decompression detection
hdmi:
- Fix infoframe size
sched:
- Update maintainers
- Fix race condition whne queueing up jobs
- Fix locking in drm_sched_entity_modify_sched()
- Fix pointer deref if entity queue changes
sysfb:
- Disable sysfb if framebuffer parent device is unknown
amdgpu:
- DML2 fix
- DSC fix
- Dispclk fix
- eDP HDR fix
- IPS fix
- TBT fix
i915:
- One fix for bitwise and logical "and" mixup in PM code
xe:
- Restore pci state on resume
- Fix locking on submission, queue and vm
- Fix UAF on queue destruction
- Fix resource release on freq init error path
- Use rw_semaphore to reduce contention on ASID->VM lookup
- Fix steering for media on Xe2_HPM
- Tuning updates to Xe2
- Resume TDR after GT reset to prevent jobs running forever
- Move id allocation to avoid userspace using a guessed number to
trigger UAF
- Fix OA stream close preventing pbatch buffers to complete
- Fix NPD when migrating memory on LNL
- Fix memory leak when aborting binds
panthor:
- Fix locking
- Set FOP_UNSIGNED_OFFSET in fops instance
- Acquire lock in panthor_vm_prepare_map_op_ctx()
- Avoid uninitialized variable in tick_ctx_cleanup()
- Do not block scheduler queue if work is pending
- Do not add write fences to the shared BOs
vbox:
- Fix VLA handling"
* tag 'drm-fixes-2024-10-04' of https://gitlab.freedesktop.org/drm/kernel: (41 commits)
drm/xe: Fix memory leak when aborting binds
drm/xe: Prevent null pointer access in xe_migrate_copy
drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close
drm/xe/queue: move xa_alloc to prevent UAF
drm/xe/vm: move xa_alloc to prevent UAF
drm/xe: Clean up VM / exec queue file lock usage.
drm/xe: Resume TDR after GT reset
drm/xe/xe2: Add performance tuning for L3 cache flushing
drm/xe/xe2: Extend performance tuning to media GT
drm/xe/mcr: Use Xe2_LPM steering tables for Xe2_HPM
drm/xe: Use helper for ASID -> VM in GPU faults and access counters
drm/xe: Convert to USM lock to rwsem
drm/xe: use devm_add_action_or_reset() helper
drm/xe: fix UAF around queue destruction
drm/xe/guc_submit: add missing locking in wedged_fini
drm/xe: Restore pci state upon resume
drm/amd/display: Fix system hang while resume with TBT monitor
drm/amd/display: Enable idle workqueue for more IPS modes
drm/amd/display: Add HDR workaround for specific eDP
drm/amd/display: avoid set dispclk to 0
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify fixes from Jan Kara:
"Fixes for an inotify deadlock and a data race in fsnotify"
* tag 'fsnotify_for_v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
inotify: Fix possible deadlock in fsnotify_destroy_mark
fsnotify: Avoid data race between fsnotify_recalc_mask() and fsnotify_object_watched()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- in incremental send, fix invalid clone operation for file that got
its size decreased
- fix __counted_by() annotation of send path cache entries, we do not
store the terminating NUL
- fix a longstanding bug in relocation (and quite hard to hit by
chance), drop back reference cache that can get out of sync after
transaction commit
- wait for fixup worker kthread before finishing umount
- add missing raid-stripe-tree extent for NOCOW files, zoned mode
cannot have NOCOW files but RST is meant to be a standalone feature
- handle transaction start error during relocation, avoid potential
NULL pointer dereference of relocation control structure (reported by
syzbot)
- disable module-wide rate limiting of debug level messages
- minor fix to tracepoint definition (reported by checkpatch.pl)
* tag 'for-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: disable rate limiting when debug enabled
btrfs: wait for fixup workers before stopping cleaner kthread during umount
btrfs: fix a NULL pointer dereference when failed to start a new trasacntion
btrfs: send: fix invalid clone operation for file that got its size decreased
btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent event class
btrfs: drop the backref cache during relocation if we commit
btrfs: also add stripe entries for NOCOW writes
btrfs: send: fix buffer overflow detection when copying path to cache entry
|
|
Pull close_range() fix from Al Viro:
"Fix the logic in descriptor table trimming"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
close_range(): fix the logics in descriptor table trimming
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from ieee802154, bluetooth and netfilter.
Current release - regressions:
- eth: mlx5: fix wrong reserved field in hca_cap_2 in mlx5_ifc
- eth: am65-cpsw: fix forever loop in cleanup code
Current release - new code bugs:
- eth: mlx5: HWS, fixed double-free in error flow of creating SQ
Previous releases - regressions:
- core: avoid potential underflow in qdisc_pkt_len_init() with UFO
- core: test for not too small csum_start in virtio_net_hdr_to_skb()
- vrf: revert "vrf: remove unnecessary RCU-bh critical section"
- bluetooth:
- fix uaf in l2cap_connect
- fix possible crash on mgmt_index_removed
- dsa: improve shutdown sequence
- eth: mlx5e: SHAMPO, fix overflow of hd_per_wq
- eth: ip_gre: fix drops of small packets in ipgre_xmit
Previous releases - always broken:
- core: fix gso_features_check to check for both
dev->gso_{ipv4_,}max_size
- core: fix tcp fraglist segmentation after pull from frag_list
- netfilter: nf_tables: prevent nf_skb_duplicated corruption
- sctp: set sk_state back to CLOSED if autobind fails in
sctp_listen_start
- mac802154: fix potential RCU dereference issue in
mac802154_scan_worker
- eth: fec: restart PPS after link state change"
* tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits)
sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start
dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems
doc: net: napi: Update documentation for napi_schedule_irqoff
net/ncsi: Disable the ncsi work before freeing the associated structure
net: phy: qt2025: Fix warning: unused import DeviceId
gso: fix udp gso fraglist segmentation after pull from frag_list
bridge: mcast: Fail MDB get request on empty entry
vrf: revert "vrf: Remove unnecessary RCU-bh critical section"
net: ethernet: ti: am65-cpsw: Fix forever loop in cleanup code
net: phy: realtek: Check the index value in led_hw_control_get
ppp: do not assume bh is held in ppp_channel_bridge_input()
selftests: rds: move include.sh to TEST_FILES
net: test for not too small csum_start in virtio_net_hdr_to_skb()
net: gso: fix tcp fraglist segmentation after pull from frag_list
ipv4: ip_gre: Fix drops of small packets in ipgre_xmit
net: stmmac: dwmac4: extend timeout for VLAN Tag register busy bit check
net: add more sanity checks to qdisc_pkt_len_init()
net: avoid potential underflow in qdisc_pkt_len_init() with UFO
net: ethernet: ti: cpsw_ale: Fix warning on some platforms
net: microchip: Make FDMA config symbol invisible
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"vfs:
- Ensure that iter_folioq_get_pages() advances to the next slot
otherwise it will end up using the same folio with an out-of-bound
offset.
iomap:
- Dont unshare delalloc extents which can't be reflinked, and thus
can't be shared.
- Constrain the file range passed to iomap_file_unshare() directly in
iomap instead of requiring the callers to do it.
netfs:
- Use folioq_count instead of folioq_nr_slot to prevent an
unitialized value warning in netfs_clear_buffer().
- Fix missing wakeup after issuing writes by scheduling the write
collector only if all the subrequest queues are empty and thus no
writes are pending.
- Fix two minor documentation bugs"
* tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: constrain the file range passed to iomap_file_unshare
iomap: don't bother unsharing delalloc extents
netfs: Fix missing wakeup after issuing writes
Documentation: add missing folio_queue entry
folio_queue: fix documentation
netfs: Fix a KMSAN uninit-value error in netfs_clear_buffer
iov_iter: fix advancing slot in iter_folioq_get_pages()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix incorrect documentation in uapi/linux/netfilter/nf_tables.h
regarding flowtable hooks, from Phil Sutter.
2) Fix nft_audit.sh selftests with newer nft binaries, due to different
(valid) audit output, also from Phil.
3) Disable BH when duplicating packets via nf_dup infrastructure,
otherwise race on nf_skb_duplicated for locally generated traffic.
From Eric.
4) Missing return in callback of selftest C program, from zhang jiao.
netfilter pull request 24-10-02
* tag 'nf-24-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: Add missing return value
netfilter: nf_tables: prevent nf_skb_duplicated corruption
selftests: netfilter: Fix nft_audit.sh for newer nft binaries
netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTED
====================
Link: https://patch.msgid.link/20241002202421.1281311-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
syzbot was able to trigger this warning [1], after injecting a
malicious packet through af_packet, setting skb->csum_start and thus
the transport header to an incorrect value.
We can at least make sure the transport header is after
the end of the network header (with a estimated minimal size).
[1]
[ 67.873027] skb len=4096 headroom=16 headlen=14 tailroom=0
mac=(-1,-1) mac_len=0 net=(16,-6) trans=10
shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0))
csum(0xa start=10 offset=0 ip_summed=3 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0
priority=0x0 mark=0x0 alloc_cpu=10 vlan_all=0x0
encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
[ 67.877172] dev name=veth0_vlan feat=0x000061164fdd09e9
[ 67.877764] sk family=17 type=3 proto=0
[ 67.878279] skb linear: 00000000: 00 00 10 00 00 00 00 00 0f 00 00 00 08 00
[ 67.879128] skb frag: 00000000: 0e 00 07 00 00 00 28 00 08 80 1c 00 04 00 00 02
[ 67.879877] skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.880647] skb frag: 00000020: 00 00 02 00 00 00 08 00 1b 00 00 00 00 00 00 00
[ 67.881156] skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.881753] skb frag: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.882173] skb frag: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.882790] skb frag: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.883171] skb frag: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.883733] skb frag: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.884206] skb frag: 00000090: 00 00 00 00 00 00 00 00 00 00 69 70 76 6c 61 6e
[ 67.884704] skb frag: 000000a0: 31 00 00 00 00 00 00 00 00 00 2b 00 00 00 00 00
[ 67.885139] skb frag: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.885677] skb frag: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.886042] skb frag: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.886408] skb frag: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.887020] skb frag: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.887384] skb frag: 00000100: 00 00
[ 67.887878] ------------[ cut here ]------------
[ 67.887908] offset (-6) >= skb_headlen() (14)
[ 67.888445] WARNING: CPU: 10 PID: 2088 at net/core/dev.c:3332 skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
[ 67.889353] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p 9pnet netfs
[ 67.890111] CPU: 10 UID: 0 PID: 2088 Comm: b363492833 Not tainted 6.11.0-virtme #1011
[ 67.890183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 67.890309] RIP: 0010:skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
[ 67.891043] Call Trace:
[ 67.891173] <TASK>
[ 67.891274] ? __warn (kernel/panic.c:741)
[ 67.891320] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
[ 67.891333] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 67.891348] ? handle_bug (arch/x86/kernel/traps.c:239)
[ 67.891363] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
[ 67.891372] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
[ 67.891388] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
[ 67.891399] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
[ 67.891416] ip_do_fragment (net/ipv4/ip_output.c:777 (discriminator 1))
[ 67.891448] ? __ip_local_out (./include/linux/skbuff.h:1146 ./include/net/l3mdev.h:196 ./include/net/l3mdev.h:213 net/ipv4/ip_output.c:113)
[ 67.891459] ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:200)
[ 67.891470] ? ip_route_output_flow (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:96 (discriminator 13) ./include/linux/rcupdate.h:871 (discriminator 13) net/ipv4/route.c:2625 (discriminator 13) ./include/net/route.h:141 (discriminator 13) net/ipv4/route.c:2852 (discriminator 13))
[ 67.891484] ipvlan_process_v4_outbound (drivers/net/ipvlan/ipvlan_core.c:445 (discriminator 1))
[ 67.891581] ipvlan_queue_xmit (drivers/net/ipvlan/ipvlan_core.c:542 drivers/net/ipvlan/ipvlan_core.c:604 drivers/net/ipvlan/ipvlan_core.c:670)
[ 67.891596] ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:227)
[ 67.891607] dev_hard_start_xmit (./include/linux/netdevice.h:4916 ./include/linux/netdevice.h:4925 net/core/dev.c:3588 net/core/dev.c:3604)
[ 67.891620] __dev_queue_xmit (net/core/dev.h:168 (discriminator 25) net/core/dev.c:4425 (discriminator 25))
[ 67.891630] ? skb_copy_bits (./include/linux/uaccess.h:233 (discriminator 1) ./include/linux/uaccess.h:260 (discriminator 1) ./include/linux/highmem-internal.h:230 (discriminator 1) net/core/skbuff.c:3018 (discriminator 1))
[ 67.891645] ? __pskb_pull_tail (net/core/skbuff.c:2848 (discriminator 4))
[ 67.891655] ? skb_partial_csum_set (net/core/skbuff.c:5657)
[ 67.891666] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/skbuff.h:2791 (discriminator 3) ./include/linux/skbuff.h:2799 (discriminator 3) ./include/linux/virtio_net.h:109 (discriminator 3))
[ 67.891684] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1))
[ 67.891700] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4))
[ 67.891716] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1))
[ 67.891734] ? do_sock_setsockopt (net/socket.c:2335)
[ 67.891747] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355)
[ 67.891761] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1))
[ 67.891772] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
[ 67.891785] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Fixes: 9181d6f8a2bb ("net: add more sanity check in virtio_net_hdr_to_skb()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240926165836.3797406-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2024-09-25
* tag 'mlx5-fixes-2024-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix crash caused by calling __xfrm_state_delete() twice
net/mlx5e: SHAMPO, Fix overflow of hd_per_wq
net/mlx5: HWS, changed E2BIG error to a negative return code
net/mlx5: HWS, fixed double-free in error flow of creating SQ
net/mlx5: Fix wrong reserved field in hca_cap_2 in mlx5_ifc
net/mlx5e: Fix NULL deref in mlx5e_tir_builder_alloc()
net/mlx5: Added cond_resched() to crdump collection
net/mlx5: Fix error path in multi-packet WQE transmit
====================
Link: https://patch.msgid.link/20240925202013.45374-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull generic unaligned.h cleanups from Al Viro:
"Get rid of architecture-specific <asm/unaligned.h> includes, replacing
them with a single generic <linux/unaligned.h> header file.
It's the second largest (after asm/io.h) class of asm/* includes, and
all but two architectures actually end up using exact same file.
Massage the remaining two (arc and parisc) to do the same and just
move the thing to from asm-generic/unaligned.h to linux/unaligned.h"
[ This is one of those things that we're better off doing outside the
merge window, and would only cause extra conflict noise if it was in
linux-next for the next release due to all the trivial #include line
updates. Rip off the band-aid. - Linus ]
* tag 'pull-work.unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
move asm/unaligned.h to linux/unaligned.h
arc: get rid of private asm/unaligned.h
parisc: get rid of private asm/unaligned.h
|
|
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.12
A bunch of fixes here that came in during the merge window and the first
week of release, plus some new quirks and device IDs. There's nothing
major here, it's a bit bigger than it might've been due to there being
no fixes sent during the merge window due to your vacation.
|
|
[Syzbot reported]
WARNING: possible circular locking dependency detected
6.11.0-rc4-syzkaller-00019-gb311c1b497e5 #0 Not tainted
------------------------------------------------------
kswapd0/78 is trying to acquire lock:
ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline]
ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578
but task is already holding lock:
ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6841 [inline]
ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xbb4/0x35a0 mm/vmscan.c:7223
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}:
...
kmem_cache_alloc_noprof+0x3d/0x2a0 mm/slub.c:4044
inotify_new_watch fs/notify/inotify/inotify_user.c:599 [inline]
inotify_update_watch fs/notify/inotify/inotify_user.c:647 [inline]
__do_sys_inotify_add_watch fs/notify/inotify/inotify_user.c:786 [inline]
__se_sys_inotify_add_watch+0x72e/0x1070 fs/notify/inotify/inotify_user.c:729
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&group->mark_mutex){+.+.}-{3:3}:
...
__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline]
fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578
fsnotify_destroy_marks+0x14a/0x660 fs/notify/mark.c:934
fsnotify_inoderemove include/linux/fsnotify.h:264 [inline]
dentry_unlink_inode+0x2e0/0x430 fs/dcache.c:403
__dentry_kill+0x20d/0x630 fs/dcache.c:610
shrink_kill+0xa9/0x2c0 fs/dcache.c:1055
shrink_dentry_list+0x2c0/0x5b0 fs/dcache.c:1082
prune_dcache_sb+0x10f/0x180 fs/dcache.c:1163
super_cache_scan+0x34f/0x4b0 fs/super.c:221
do_shrink_slab+0x701/0x1160 mm/shrinker.c:435
shrink_slab+0x1093/0x14d0 mm/shrinker.c:662
shrink_one+0x43b/0x850 mm/vmscan.c:4815
shrink_many mm/vmscan.c:4876 [inline]
lru_gen_shrink_node mm/vmscan.c:4954 [inline]
shrink_node+0x3799/0x3de0 mm/vmscan.c:5934
kswapd_shrink_node mm/vmscan.c:6762 [inline]
balance_pgdat mm/vmscan.c:6954 [inline]
kswapd+0x1bcd/0x35a0 mm/vmscan.c:7223
[Analysis]
The problem is that inotify_new_watch() is using GFP_KERNEL to allocate
new watches under group->mark_mutex, however if dentry reclaim races
with unlinking of an inode, it can end up dropping the last dentry reference
for an unlinked inode resulting in removal of fsnotify mark from reclaim
context which wants to acquire group->mark_mutex as well.
This scenario shows that all notification groups are in principle prone
to this kind of a deadlock (previously, we considered only fanotify and
dnotify to be problematic for other reasons) so make sure all
allocations under group->mark_mutex happen with GFP_NOFS.
Reported-and-tested-by: syzbot+c679f13773f295d2da53@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c679f13773f295d2da53
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240927143642.2369508-1-lizhi.xu@windriver.com
|
|
When the trigger_tstamp_latched flag is set, the PCM core code assumes that
the low-level driver handles the trigger timestamping itself. Ensure that
runtime->trigger_tstamp is always updated.
Buglink: https://github.com/alsa-project/alsa-lib/issues/387
Reported-by: Zeno Endemann <zeno.endemann@mailbox.org>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Link: https://patch.msgid.link/20241002081306.1788405-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
In the parse_perf_domain function, if the call to
of_parse_phandle_with_args returns an error, then the reference to the
CPU device node that was acquired at the start of the function would not
be properly decremented.
Address this by declaring the variable with the __free(device_node)
cleanup attribute.
Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20240917134246.584026-1-mikisabate@gmail.com
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
event class
While running checkpatch.pl against a patch that modifies the
btrfs_qgroup_extent event class, it complained about using a comma instead
of a semicolon:
$ ./scripts/checkpatch.pl qgroups/0003-btrfs-qgroups-remove-bytenr-field-from-struct-btrfs_.patch
WARNING: Possible comma where semicolon could be used
#215: FILE: include/trace/events/btrfs.h:1720:
+ __entry->bytenr = bytenr,
__entry->num_bytes = rec->num_bytes;
total: 0 errors, 1 warnings, 184 lines checked
So replace the comma with a semicolon to silence checkpatch and possibly
other tools. It also makes the code consistent with the rest.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
s/folioq_count/folioq_full/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20241001134729.3f65ae78@canb.auug.org.au
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Commit 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()")
added a dev->gso_max_size test to gso_features_check() in order to fall
back to GSO when needed.
This was added as it was noticed that some drivers could misbehave if TSO
packets get too big. However, the check doesn't respect dev->gso_ipv4_max_size
limit. For instance, a device could be configured with BIG TCP for IPv4,
but not IPv6.
Therefore, add a netif_get_gso_max_size() equivalent to netif_get_gro_max_size()
and use the helper to respect both limits before falling back to GSO engine.
Fixes: 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240923212242.15669-2-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add a small netif_get_gro_max_size() helper which returns the maximum IPv4
or IPv6 GRO size of the netdevice.
We later add a netif_get_gso_max_size() equivalent as well for GSO, so that
these helpers can be used consistently instead of open-coded checks.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240923212242.15669-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
atomic:
- Use correct type when reading damage rectangles
display:
- Fix kernel docs
dp-mst:
- Fix DSC decompression detection
hdmi:
- Fix infoframe size
panthor:
- Fix locking
sched:
- Update maintainers
- Fix race condition whne queueing up jobs
sysfb:
- Disable sysfb if framebuffer parent device is unknown
vbox:
- Fix VLA handling
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240926121045.GA561653@localhost.localdomain
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"afs:
- Fix setting of the server responding flag
- Remove unused struct afs_address_list and afs_put_address_list()
function
- Fix infinite loop because of unresponsive servers
- Ensure that afs_retry_request() function is correctly added to the
afs_req_ops netfs operations table
netfs:
- Fix netfs_folio tracepoint handling to handle NULL mappings
- Add a missing folio_queue API documentation
- Ensure that netfs_write_folio() correctly advances the iterator via
iov_iter_advance()
- Fix a dentry leak during concurrent cull and cookie lookup
operations in cachefiles
pidfs:
- Correctly handle accessing another task's pid namespace"
* tag 'vfs-6.12-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
netfs: Fix the netfs_folio tracepoint to handle NULL mapping
netfs: Add folio_queue API documentation
netfs: Advance iterator correctly rather than jumping it
afs: Fix the setting of the server responding flag
afs: Remove unused struct and function prototype
afs: Fix possible infinite loop with unresponsive servers
pidfs: check for valid pid namespace
afs: Fix missing wire-up of afs_retry_request()
cachefiles: fix dentry leak in cachefiles_open_file()
|
|
Fix the netfs_folio tracepoint to handle folios that have a NULL mapping
pointer. In such a case, just substitute a zero inode number.
Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/2917423.1727697556@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add API documentation for folio_queue.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/2912369.1727691281@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-doc@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Cloning a descriptor table picks the size that would cover all currently
opened files. That's fine for clone() and unshare(), but for close_range()
there's an additional twist - we clone before we close, and it would be
a shame to have
close_range(3, ~0U, CLOSE_RANGE_UNSHARE)
leave us with a huge descriptor table when we are not going to keep
anything past stderr, just because some large file descriptor used to
be open before our call has taken it out.
Unfortunately, it had been dealt with in an inherently racy way -
sane_fdtable_size() gets a "don't copy anything past that" argument
(passed via unshare_fd() and dup_fd()), close_range() decides how much
should be trimmed and passes that to unshare_fd().
The problem is, a range that used to extend to the end of descriptor
table back when close_range() had looked at it might very well have stuff
grown after it by the time dup_fd() has allocated a new files_struct
and started to figure out the capacity of fdtable to be attached to that.
That leads to interesting pathological cases; at the very least it's a
QoI issue, since unshare(CLONE_FILES) is atomic in a sense that it takes
a snapshot of descriptor table one might have observed at some point.
Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination
of unshare(CLONE_FILES) with plain close_range(), ending up with a
weird state that would never occur with unshare(2) is confusing, to put
it mildly.
It's not hard to get rid of - all it takes is passing both ends of the
range down to sane_fdtable_size(). There we are under ->files_lock,
so the race is trivially avoided.
So we do the following:
* switch close_files() from calling unshare_fd() to calling
dup_fd().
* undo the calling convention change done to unshare_fd() in
60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE"
* introduce struct fd_range, pass a pointer to that to dup_fd()
and sane_fdtable_size() instead of "trim everything past that point"
they are currently getting. NULL means "we are not going to be punching
any holes"; NR_OPEN_MAX is gone.
* make sane_fdtable_size() use find_last_bit() instead of
open-coding it; it's easier to follow that way.
* while we are at it, have dup_fd() report errors by returning
ERR_PTR(), no need to use a separate int *errorp argument.
Fixes: 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE"
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- handle chained SGLs in the new tracing code (Christoph Hellwig)
* tag 'dma-mapping-6.12-2024-09-29' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: fix DMA API tracing for chained scatterlists
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"lockdep:
- Fix potential deadlock between lockdep and RCU (Zhiguo Niu)
- Use str_plural() to address Coccinelle warning (Thorsten Blum)
- Add debuggability enhancement (Luis Claudio R. Goncalves)
static keys & calls:
- Fix static_key_slow_dec() yet again (Peter Zijlstra)
- Handle module init failure correctly in static_call_del_module()
(Thomas Gleixner)
- Replace pointless WARN_ON() in static_call_module_notify() (Thomas
Gleixner)
<linux/cleanup.h>:
- Add usage and style documentation (Dan Williams)
rwsems:
- Move is_rwsem_reader_owned() and rwsem_owner() under
CONFIG_DEBUG_RWSEMS (Waiman Long)
atomic ops, x86:
- Redeclare x86_32 arch_atomic64_{add,sub}() as void (Uros Bizjak)
- Introduce the read64_nonatomic macro to x86_32 with cx8 (Uros
Bizjak)"
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS
jump_label: Fix static_key_slow_dec() yet again
static_call: Replace pointless WARN_ON() in static_call_module_notify()
static_call: Handle module init failure correctly in static_call_del_module()
locking/lockdep: Simplify character output in seq_line()
lockdep: fix deadlock issue between lockdep and rcu
lockdep: Use str_plural() to fix Coccinelle warning
cleanup: Add usage and style documentation
lockdep: suggest the fix for "lockdep bfs error:-1" on print_bfs_bug
locking/atomic/x86: Redeclare x86_32 arch_atomic64_{add,sub}() as void
locking/atomic/x86: Introduce the read64_nonatomic macro to x86_32 with cx8
|
|
Merge all pending locking commits into a single branch.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull x86 kvm updates from Paolo Bonzini:
"x86:
- KVM currently invalidates the entirety of the page tables, not just
those for the memslot being touched, when a memslot is moved or
deleted.
This does not traditionally have particularly noticeable overhead,
but Intel's TDX will require the guest to re-accept private pages
if they are dropped from the secure EPT, which is a non starter.
Actually, the only reason why this is not already being done is a
bug which was never fully investigated and caused VM instability
with assigned GeForce GPUs, so allow userspace to opt into the new
behavior.
- Advertise AVX10.1 to userspace (effectively prep work for the
"real" AVX10 functionality that is on the horizon)
- Rework common MSR handling code to suppress errors on userspace
accesses to unsupported-but-advertised MSRs
This will allow removing (almost?) all of KVM's exemptions for
userspace access to MSRs that shouldn't exist based on the vCPU
model (the actual cleanup is non-trivial future work)
- Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
splits the 64-bit value into the legacy ICR and ICR2 storage,
whereas Intel (APICv) stores the entire 64-bit value at the ICR
offset
- Fix a bug where KVM would fail to exit to userspace if one was
triggered by a fastpath exit handler
- Add fastpath handling of HLT VM-Exit to expedite re-entering the
guest when there's already a pending wake event at the time of the
exit
- Fix a WARN caused by RSM entering a nested guest from SMM with
invalid guest state, by forcing the vCPU out of guest mode prior to
signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
nested guest)
- Overhaul the "unprotect and retry" logic to more precisely identify
cases where retrying is actually helpful, and to harden all retry
paths against putting the guest into an infinite retry loop
- Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
rmaps in the shadow MMU
- Refactor pieces of the shadow MMU related to aging SPTEs in
prepartion for adding multi generation LRU support in KVM
- Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
enabled, i.e. when the CPU has already flushed the RSB
- Trace the per-CPU host save area as a VMCB pointer to improve
readability and cleanup the retrieval of the SEV-ES host save area
- Remove unnecessary accounting of temporary nested VMCB related
allocations
- Set FINAL/PAGE in the page fault error code for EPT violations if
and only if the GVA is valid. If the GVA is NOT valid, there is no
guest-side page table walk and so stuffing paging related metadata
is nonsensical
- Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
instead of emulating posted interrupt delivery to L2
- Add a lockdep assertion to detect unsafe accesses of vmcs12
structures
- Harden eVMCS loading against an impossible NULL pointer deref
(really truly should be impossible)
- Minor SGX fix and a cleanup
- Misc cleanups
Generic:
- Register KVM's cpuhp and syscore callbacks when enabling
virtualization in hardware, as the sole purpose of said callbacks
is to disable and re-enable virtualization as needed
- Enable virtualization when KVM is loaded, not right before the
first VM is created
Together with the previous change, this simplifies a lot the logic
of the callbacks, because their very existence implies
virtualization is enabled
- Fix a bug that results in KVM prematurely exiting to userspace for
coalesced MMIO/PIO in many cases, clean up the related code, and
add a testcase
- Fix a bug in kvm_clear_guest() where it would trigger a buffer
overflow _if_ the gpa+len crosses a page boundary, which thankfully
is guaranteed to not happen in the current code base. Add WARNs in
more helpers that read/write guest memory to detect similar bugs
Selftests:
- Fix a goof that caused some Hyper-V tests to be skipped when run on
bare metal, i.e. NOT in a VM
- Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
guest
- Explicitly include one-off assets in .gitignore. Past Sean was
completely wrong about not being able to detect missing .gitignore
entries
- Verify userspace single-stepping works when KVM happens to handle a
VM-Exit in its fastpath
- Misc cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
Documentation: KVM: fix warning in "make htmldocs"
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
selftests: kvm: s390: Add VM run test case
KVM: SVM: let alternatives handle the cases when RSB filling is required
KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
KVM: x86: Update retry protection fields when forcing retry on emulation failure
KVM: x86: Apply retry protection to "unprotect on failure" path
KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
...
|
|
Pull ceph updates from Ilya Dryomov:
"Three CephFS fixes from Xiubo and Luis and a bunch of assorted
cleanups"
* tag 'ceph-for-6.12-rc1' of https://github.com/ceph/ceph-client:
ceph: remove the incorrect Fw reference check when dirtying pages
ceph: Remove empty definition in header file
ceph: Fix typo in the comment
ceph: fix a memory leak on cap_auths in MDS client
ceph: flush all caps releases when syncing the whole filesystem
ceph: rename ceph_flush_cap_releases() to ceph_flush_session_cap_releases()
libceph: use min() to simplify code in ceph_dns_resolve_name()
ceph: Convert to use jiffies macro
ceph: Remove unused declarations
|
|
Pull bitmap updates from Yury Norov:
- switch all bitmamp APIs from inline to __always_inline (Brian Norris)
The __always_inline series improves on code generation, and now with
the latest compiler versions is required to avoid compilation
warnings. It spent enough in my backlog, and I'm thankful to Brian
Norris for taking over and moving it forward.
- introduce GENMASK_U128() macro (Anshuman Khandual)
GENMASK_U128() is a prerequisite needed for arm64 development
* tag 'bitmap-for-6.12' of https://github.com/norov/linux:
lib/test_bits.c: Add tests for GENMASK_U128()
uapi: Define GENMASK_U128
nodemask: Switch from inline to __always_inline
cpumask: Switch from inline to __always_inline
bitmap: Switch from inline to __always_inline
find: Switch from inline to __always_inline
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link (cxl) updates from Dave Jiang:
"Major changes address HDM decoder initialization from DVSEC ranges,
refactoring the code related to cxl mailboxes to be independent of the
memory devices, and adding support for shared upstream link
access_coordinate calculation, as well as a change to remove locking
from memory notifier callback.
In addition, a number of misc cleanups and refactoring of the code are
also included.
Address HDM decoder initialization from DVSEC ranges:
- Only register non-zero DVSEC ranges
- Remove duplicate implementation of waiting for memory_info_valid
- Simplify the checking of mem_enabled in cxl_hdm_decode_init()
Refactor the code related to cxl mailboxes to be independent of the memory devices:
- Move cxl headers in include/linux/ to include/cxl
- Move all mailbox related data to 'struct cxl_mailbox'
- Refactor mailbox APIs with 'struct cxl_mailbox' as input instead of
memory device state
Add support for shared upstream link access_coordinate calculation for
configurations that have multiple targets under a switch or a root
port where the aggregated bandwidth can be greater than the upstream
link of the switch/RP upstream link:
- Preserve the CDAT access_coordinate from an endpoint
- Add the support for shared upstream link access_coordinate calculation
- Add documentation to explain how the calculations are done
Remove locking from memory notifier callback.
Misc cleanups:
- Convert devm_cxl_add_root() to return using ERR_CAST()
- cxl_test use dev_is_platform() instead of open coding
- Remove duplicate include of header core.h in core/cdat.c
- use scoped resource management to drop put_device() for cxl_port
- Use scoped_guard to drop device_lock() for cxl_port
- Refactor __devm_cxl_add_port() to drop gotos
- Rename cxl_setup_parent_dport to cxl_dport_init_aer and
cxl_dport_map_regs() to cxl_dport_map_ras()
- Refactor cxl_dport_init_aer() to be more concise
- Remove duplicate host_bridge->native_aer checking in
cxl_dport_init_ras_reporting()
- Fix comment for cxl_query_cmd()"
* tag 'cxl-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits)
cxl: Add documentation to explain the shared link bandwidth calculation
cxl: Calculate region bandwidth of targets with shared upstream link
cxl: Preserve the CDAT access_coordinate for an endpoint
cxl: Fix comment regarding cxl_query_cmd() return data
cxl: Convert cxl_internal_send_cmd() to use 'struct cxl_mailbox' as input
cxl: Move mailbox related bits to the same context
cxl: move cxl headers to new include/cxl/ directory
cxl/region: Remove lock from memory notifier callback
cxl/pci: simplify the check of mem_enabled in cxl_hdm_decode_init()
cxl/pci: Check Mem_info_valid bit for each applicable DVSEC
cxl/pci: Remove duplicated implementation of waiting for memory_info_valid
cxl/pci: Fix to record only non-zero ranges
cxl/pci: Remove duplicate host_bridge->native_aer checking
cxl/pci: cxl_dport_map_rch_aer() cleanup
cxl/pci: Rename cxl_setup_parent_dport() and cxl_dport_map_regs()
cxl/port: Refactor __devm_cxl_add_port() to drop goto pattern
cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port
cxl/port: Use __free() to drop put_device() for cxl_port
cxl: Remove duplicate included header file core.h
tools/testing/cxl: Use dev_is_platform()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"19 hotfixes. 13 are cc:stable.
There's a focus on fixes for the memfd_pin_folios() work which was
added into 6.11. Apart from that, the usual shower of singleton fixes"
* tag 'mm-hotfixes-stable-2024-09-27-09-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
ocfs2: fix uninit-value in ocfs2_get_block()
zram: don't free statically defined names
memory tiers: use default_dram_perf_ref_source in log message
Revert "list: test: fix tests for list_cut_position()"
kselftests: mm: fix wrong __NR_userfaultfd value
compiler.h: specify correct attribute for .rodata..c_jump_table
mm/damon/Kconfig: update DAMON doc URL
mm: kfence: fix elapsed time for allocated/freed track
ocfs2: fix deadlock in ocfs2_get_system_file_inode
ocfs2: reserve space for inline xattr before attaching reflink tree
mm: migrate: annotate data-race in migrate_folio_unmap()
mm/hugetlb: simplify refs in memfd_alloc_folio
mm/gup: fix memfd_pin_folios alloc race panic
mm/gup: fix memfd_pin_folios hugetlb page allocation
mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak
mm/hugetlb: fix memfd_pin_folios free_huge_pages leak
mm/filemap: fix filemap_get_folios_contig THP panic
mm: make SPLIT_PTE_PTLOCKS depend on SMP
tools: fix shared radix-tree build
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull more xen updates from Juergen Gross:
"A second round of Xen related changes and features:
- a small fix of the xen-pciback driver for a warning issued by
sparse
- support PCI passthrough when using a PVH dom0
- enable loading the kernel in PVH mode at arbitrary addresses,
avoiding conflicts with the memory map when running as a Xen dom0
using the host memory layout"
* tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/pvh: Add 64bit relocation page tables
x86/kernel: Move page table macros to header
x86/pvh: Set phys_base when calling xen_prepare_pvh()
x86/pvh: Make PVH entrypoint PIC for x86-64
xen: sync elfnote.h from xen tree
xen/pciback: fix cast to restricted pci_ers_result_t and pci_power_t
xen/privcmd: Add new syscall to get gsi from dev
xen/pvh: Setup gsi for passthrough device
xen/pci: Add a function to reset device for xen
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:
- Misc VDO fixes
- Remove unused declarations dm_get_rq_mapinfo() and dm_zone_map_bio()
- Dm-delay: Improve kernel documentation
- Dm-crypt: Allow to specify the integrity key size as an option
- Dm-bufio: Remove pointless NULL check
- Small code cleanups: Use ERR_CAST; remove unlikely() around IS_ERR;
use __assign_bit
- Dm-integrity: Fix gcc 5 warning; convert comma to semicolon; fix
smatch warning
- Dm-integrity: Support recalculation in the 'I' mode
- Revert "dm: requeue IO if mapping table not yet available"
- Dm-crypt: Small refactoring to make the code more readable
- Dm-cache: Remove pointless error check
- Dm: Fix spelling errors
- Dm-verity: Restart or panic on an I/O error if restart or panic was
requested
- Dm-verity: Fallback to platform keyring also if key in trusted
keyring is rejected
* tag 'for-6.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
dm verity: fallback to platform keyring also if key in trusted keyring is rejected
dm-verity: restart or panic on an I/O error
dm: fix spelling errors
dm-cache: remove pointless error check
dm vdo: handle unaligned discards correctly
dm vdo indexer: Convert comma to semicolon
dm-crypt: Use common error handling code in crypt_set_keyring_key()
dm-crypt: Use up_read() together with key_put() only once in crypt_set_keyring_key()
Revert "dm: requeue IO if mapping table not yet available"
dm-integrity: check mac_size against HASH_MAX_DIGESTSIZE in sb_mac()
dm-integrity: support recalculation in the 'I' mode
dm integrity: Convert comma to semicolon
dm integrity: fix gcc 5 warning
dm: Make use of __assign_bit() API
dm integrity: Remove extra unlikely helper
dm: Convert to use ERR_CAST()
dm bufio: Remove NULL check of list_entry()
dm-crypt: Allow to specify the integrity key size as option
dm: Remove unused declaration and empty definition "dm_zone_map_bio"
dm delay: enhance kernel documentation
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is a small set of patches for the driver core code for 6.12-rc1.
This set is the one that caused the most delay on my side, due to lots
of last-minute reports of problems in the async shutdown feature that
was added. In the end, I've reverted all of the patches in that series
so we are back to "normal" and the patch set is being reworked for the
next merge window.
Other than the async shutdown patches that were reverted, included in
here are:
- minor driver core cleanups
- minor driver core bus and class api cleanups and simplifications
for some callbacks
- some const markings of structures
- other even more minor cleanups
All of these, including the last minute reverts, have been in
linux-next, but all of the reports of problems in linux-next were
before the reverts happened. After the reverts, all is good"
* tag 'driver-core-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
Revert "driver core: don't always lock parent in shutdown"
Revert "driver core: separate function to shutdown one device"
Revert "driver core: shut down devices asynchronously"
Revert "nvme-pci: Make driver prefer asynchronous shutdown"
Revert "driver core: fix async device shutdown hang"
driver core: fix async device shutdown hang
driver core: attribute_container: Remove unused functions
driver core: Trivially simplify ((struct device_private *)curr)->device->p to @curr
devres: Correclty strip percpu address space of devm_free_percpu() argument
driver core: Make parameter check consistent for API cluster device_(for_each|find)_child()
bus: fsl-mc: make fsl_mc_bus_type const
nvme-pci: Make driver prefer asynchronous shutdown
driver core: shut down devices asynchronously
driver core: separate function to shutdown one device
driver core: don't always lock parent in shutdown
platform: Make platform_bus_type constant
driver core: class: Check namespace relevant parameters in class_register()
driver:base:core: Adding a "Return:" line in comment for device_link_add()
drivers/base: Introduce device_match_t for device finding APIs
firmware_loader: Block path traversal
...
|
|
no_llseek had been defined to NULL two years ago, in commit 868941b14441
("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, there is an assembler message when generating kernel/bpf/core.o
under CONFIG_OBJTOOL with LoongArch compiler toolchain:
Warning: setting incorrect section attributes for .rodata..c_jump_table
This is because the section ".rodata..c_jump_table" should be readonly,
but there is a "W" (writable) part of the flags:
$ readelf -S kernel/bpf/core.o | grep -A 1 "rodata..c"
[34] .rodata..c_j[...] PROGBITS 0000000000000000 0000d2e0
0000000000000800 0000000000000000 WA 0 0 8
There is no above issue on x86 due to the generated section flag is only
"A" (allocatable). In order to silence the warning on LoongArch, specify
the attribute like ".rodata..c_jump_table,\"a\",@progbits #" explicitly,
then the section attribute of ".rodata..c_jump_table" must be readonly
in the kernel/bpf/core.o file.
Before:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, DATA
After:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
By the way, AFAICT, maybe the root cause is related with the different
compiler behavior of various archs, so to some extent this change is a
workaround for LoongArch, and also there is no effect for x86 which is the
only port supported by objtool before LoongArch with this patch.
Link: https://lkml.kernel.org/r/20240924062710.1243-1-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> [6.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
memfd_pin_folios followed by unpin_folios leaves resv_huge_pages elevated
if the pages were not already faulted in. During a normal page fault,
resv_huge_pages is consumed here:
hugetlb_fault()
alloc_hugetlb_folio()
dequeue_hugetlb_folio_vma()
dequeue_hugetlb_folio_nodemask()
dequeue_hugetlb_folio_node_exact()
free_huge_pages--
resv_huge_pages--
During memfd_pin_folios, the page is created by calling
alloc_hugetlb_folio_nodemask instead of alloc_hugetlb_folio, and
resv_huge_pages is not modified:
memfd_alloc_folio()
alloc_hugetlb_folio_nodemask()
dequeue_hugetlb_folio_nodemask()
dequeue_hugetlb_folio_node_exact()
free_huge_pages--
alloc_hugetlb_folio_nodemask has other callers that must not modify
resv_huge_pages. Therefore, to fix, define an alternate version of
alloc_hugetlb_folio_nodemask for this call site that adjusts
resv_huge_pages.
Link: https://lkml.kernel.org/r/1725373521-451395-4-git-send-email-steven.sistare@oracle.com
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC update from Arnd Bergmann:
"Convert ep93xx to devicetree
This concludes a long journey towards replacing the old board files
with devictree description on the Cirrus Logic EP93xx platform.
Nikita Shubin has been working on this for a long time, for details
see the last post on
https://lore.kernel.org/lkml/20240909-ep93xx-v12-0-e86ab2423d4b@maquefel.me/"
* tag 'soc-ep93xx-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (47 commits)
dt-bindings: gpio: ep9301: Add missing "#interrupt-cells" to examples
MAINTAINERS: Update EP93XX ARM ARCHITECTURE maintainer
soc: ep93xx: drop reference to removed EP93XX_SOC_COMMON config
net: cirrus: use u8 for addr to calm down sparse
dmaengine: cirrus: use snprintf() to calm down gcc 13.3.0
dmaengine: ep93xx: Fix a NULL vs IS_ERR() check in probe()
pinctrl: ep93xx: Fix raster pins typo
spi: ep93xx: update kerneldoc comments for ep93xx_spi
clk: ep93xx: Fix off by one in ep93xx_div_recalc_rate()
clk: ep93xx: add module license
dmaengine: cirrus: remove platform code
ASoC: cirrus: edb93xx: Delete driver
ARM: ep93xx: soc: drop defines
ARM: ep93xx: delete all boardfiles
ata: pata_ep93xx: remove legacy pinctrl use
pwm: ep93xx: drop legacy pinctrl
ARM: ep93xx: DT for the Cirrus ep93xx SoC platforms
ARM: dts: ep93xx: Add EDB9302 DT
ARM: dts: ep93xx: add ts7250 board
ARM: dts: add Cirrus EP93XX SoC .dtsi
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are only two small patches, one cleanup for arch/alpha and a
preparation patch cleaning up the handling of runtime constants in the
linker scripts"
* tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
runtime constants: move list of constants to vmlinux.lds.h
alpha: no need to include asm/xchg.h twice
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Not a lot happening in EFI land this cycle.
- Prevent kexec from crashing on a corrupted TPM log by using a
memory type that is reserved by default
- Log correctable errors reported via CPER
- A couple of cosmetic fixes"
* tag 'efi-next-for-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: Remove redundant null pointer checks in efi_debugfs_init()
efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption
efi/cper: Print correctable AER information
efi: Remove unused declaration efi_initialize_iomem_resources()
|
|
This reverts commit fb97d2eb542faf19a8725afbd75cbc2518903210.
The logging was questionable to begin with, but it seems to actively
deadlock on the task lock.
"On second thought, let's not log core dump failures. 'Tis a silly place"
because if you can't tell your core dump is truncated, maybe you should
just fix your debugger instead of adding bugs to the kernel.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Link: https://lore.kernel.org/all/d122ece6-3606-49de-ae4d-8da88846bef2@oracle.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|