Age | Commit message (Collapse) | Author |
|
AIL flushing can get stuck here:
[316649.005769] INFO: task xfsaild/pmem1:324525 blocked for more than 123 seconds.
[316649.007807] Not tainted 5.17.0-rc6-dgc+ #975
[316649.009186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[316649.011720] task:xfsaild/pmem1 state:D stack:14544 pid:324525 ppid: 2 flags:0x00004000
[316649.014112] Call Trace:
[316649.014841] <TASK>
[316649.015492] __schedule+0x30d/0x9e0
[316649.017745] schedule+0x55/0xd0
[316649.018681] io_schedule+0x4b/0x80
[316649.019683] xfs_buf_wait_unpin+0x9e/0xf0
[316649.021850] __xfs_buf_submit+0x14a/0x230
[316649.023033] xfs_buf_delwri_submit_buffers+0x107/0x280
[316649.024511] xfs_buf_delwri_submit_nowait+0x10/0x20
[316649.025931] xfsaild+0x27e/0x9d0
[316649.028283] kthread+0xf6/0x120
[316649.030602] ret_from_fork+0x1f/0x30
in the situation where flushing gets preempted between the unpin
check and the buffer trylock under nowait conditions:
blk_start_plug(&plug);
list_for_each_entry_safe(bp, n, buffer_list, b_list) {
if (!wait_list) {
if (xfs_buf_ispinned(bp)) {
pinned++;
continue;
}
Here >>>>>>
if (!xfs_buf_trylock(bp))
continue;
This means submission is stuck until something else triggers a log
force to unpin the buffer.
To get onto the delwri list to begin with, the buffer pin state has
already been checked, and hence it's relatively rare we get a race
between flushing and encountering a pinned buffer in delwri
submission to begin with. Further, to increase the pin count the
buffer has to be locked, so the only way we can hit this race
without failing the trylock is to be preempted between the pincount
check seeing zero and the trylock being run.
Hence to avoid this problem, just invert the order of trylock vs
pin check. We shouldn't hit that many pinned buffers here, so
optimising away the trylock for pinned buffers should not matter for
performance at all.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
After 963 iterations of generic/530, it deadlocked during recovery
on a pinned inode cluster buffer like so:
XFS (pmem1): Starting recovery (logdev: internal)
INFO: task kworker/8:0:306037 blocked for more than 122 seconds.
Not tainted 5.17.0-rc6-dgc+ #975
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/8:0 state:D stack:13024 pid:306037 ppid: 2 flags:0x00004000
Workqueue: xfs-inodegc/pmem1 xfs_inodegc_worker
Call Trace:
<TASK>
__schedule+0x30d/0x9e0
schedule+0x55/0xd0
schedule_timeout+0x114/0x160
__down+0x99/0xf0
down+0x5e/0x70
xfs_buf_lock+0x36/0xf0
xfs_buf_find+0x418/0x850
xfs_buf_get_map+0x47/0x380
xfs_buf_read_map+0x54/0x240
xfs_trans_read_buf_map+0x1bd/0x490
xfs_imap_to_bp+0x4f/0x70
xfs_iunlink_map_ino+0x66/0xd0
xfs_iunlink_map_prev.constprop.0+0x148/0x2f0
xfs_iunlink_remove_inode+0xf2/0x1d0
xfs_inactive_ifree+0x1a3/0x900
xfs_inode_unlink+0xcc/0x210
xfs_inodegc_worker+0x1ac/0x2f0
process_one_work+0x1ac/0x390
worker_thread+0x56/0x3c0
kthread+0xf6/0x120
ret_from_fork+0x1f/0x30
</TASK>
task:mount state:D stack:13248 pid:324509 ppid:324233 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x30d/0x9e0
schedule+0x55/0xd0
schedule_timeout+0x114/0x160
__down+0x99/0xf0
down+0x5e/0x70
xfs_buf_lock+0x36/0xf0
xfs_buf_find+0x418/0x850
xfs_buf_get_map+0x47/0x380
xfs_buf_read_map+0x54/0x240
xfs_trans_read_buf_map+0x1bd/0x490
xfs_imap_to_bp+0x4f/0x70
xfs_iget+0x300/0xb40
xlog_recover_process_one_iunlink+0x4c/0x170
xlog_recover_process_iunlinks.isra.0+0xee/0x130
xlog_recover_finish+0x57/0x110
xfs_log_mount_finish+0xfc/0x1e0
xfs_mountfs+0x540/0x910
xfs_fs_fill_super+0x495/0x850
get_tree_bdev+0x171/0x270
xfs_fs_get_tree+0x15/0x20
vfs_get_tree+0x24/0xc0
path_mount+0x304/0xba0
__x64_sys_mount+0x108/0x140
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
task:xfsaild/pmem1 state:D stack:14544 pid:324525 ppid: 2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x30d/0x9e0
schedule+0x55/0xd0
io_schedule+0x4b/0x80
xfs_buf_wait_unpin+0x9e/0xf0
__xfs_buf_submit+0x14a/0x230
xfs_buf_delwri_submit_buffers+0x107/0x280
xfs_buf_delwri_submit_nowait+0x10/0x20
xfsaild+0x27e/0x9d0
kthread+0xf6/0x120
ret_from_fork+0x1f/0x30
We have the mount process waiting on an inode cluster buffer read,
inodegc doing unlink waiting on the same inode cluster buffer, and
the AIL push thread blocked in writeback waiting for the inode
cluster buffer to become unpinned.
What has happened here is that the AIL push thread has raced with
the inodegc process modifying, committing and pinning the inode
cluster buffer here in xfs_buf_delwri_submit_buffers() here:
blk_start_plug(&plug);
list_for_each_entry_safe(bp, n, buffer_list, b_list) {
if (!wait_list) {
if (xfs_buf_ispinned(bp)) {
pinned++;
continue;
}
Here >>>>>>
if (!xfs_buf_trylock(bp))
continue;
Basically, the AIL has found the buffer wasn't pinned and got the
lock without blocking, but then the buffer was pinned. This implies
the processing here was pre-empted between the pin check and the
lock, because the pin count can only be increased while holding the
buffer locked. Hence when it has gone to submit the IO, it has
blocked waiting for the buffer to be unpinned.
With all executing threads now waiting on the buffer to be unpinned,
we normally get out of situations like this via the background log
worker issuing a log force which will unpinned stuck buffers like
this. But at this point in recovery, we haven't started the log
worker. In fact, the first thing we do after processing intents and
unlinked inodes is *start the log worker*. IOWs, we start it too
late to have it break deadlocks like this.
Avoid this and any other similar deadlock vectors in intent and
unlinked inode recovery by starting the log worker before we recover
intents and unlinked inodes. This part of recovery runs as though
the filesystem is fully active, so we really should have the same
infrastructure running as we normally do at runtime.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
When an enum is used in the visible parts of a trace event that is
exported to user space, the user space applications like perf and
trace-cmd do not have a way to know what the value of the enum is. To
solve this, at boot up (or module load) the printk formats are modified to
replace the enum with their numeric value in the string output.
Array fields of the event are defined by [<nr-elements>] in the type
portion of the format file so that the user space parsers can correctly
parse the array into the appropriate size chunks. But in some trace
events, an enum is used in defining the size of the array, which once
again breaks the parsing of user space tooling.
This was solved the same way as the print formats were, but it modified
the type strings of the trace event. This caused crashes in some
architectures because, as supposed to the print string, is a const string
value. This was not detected on x86, as it appears that const strings are
still writable (at least in boot up), but other architectures this is not
the case, and writing to a const string will cause a kernel fault.
To fix this, use kstrdup() to copy the type before modifying it. If the
trace event is for the core kernel there's no need to free it because the
string will be in use for the life of the machine being on line. For
modules, create a link list to store all the strings being allocated for
modules and when the module is removed, free them.
Link: https://lore.kernel.org/all/yt9dr1706b4i.fsf@linux.ibm.com/
Link: https://lkml.kernel.org/r/20220318153432.3984b871@gandalf.local.home
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Fixes: b3bc8547d3be ("tracing: Have TRACE_DEFINE_ENUM affect trace event types as well")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The commit in Fixes started adding INT3 after RETs as a mitigation
against straight-line speculation.
The fastop SETcc implementation in kvm's insn emulator uses macro magic
to generate all possible SETcc functions and to jump to them when
emulating the respective instruction.
However, it hardcodes the size and alignment of those functions to 4: a
three-byte SETcc insn and a single-byte RET. BUT, with SLS, there's an
INT3 that gets slapped after the RET, which brings the whole scheme out
of alignment:
15: 0f 90 c0 seto %al
18: c3 ret
19: cc int3
1a: 0f 1f 00 nopl (%rax)
1d: 0f 91 c0 setno %al
20: c3 ret
21: cc int3
22: 0f 1f 00 nopl (%rax)
25: 0f 92 c0 setb %al
28: c3 ret
29: cc int3
and this explodes like this:
int3: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 2435 Comm: qemu-system-x86 Not tainted 5.17.0-rc8-sls #1
Hardware name: Dell Inc. Precision WorkStation T3400 /0TP412, BIOS A14 04/30/2012
RIP: 0010:setc+0x5/0x8 [kvm]
Code: 00 00 0f 1f 00 0f b6 05 43 24 06 00 c3 cc 0f 1f 80 00 00 00 00 0f 90 c0 c3 cc 0f \
1f 00 0f 91 c0 c3 cc 0f 1f 00 0f 92 c0 c3 cc <0f> 1f 00 0f 93 c0 c3 cc 0f 1f 00 \
0f 94 c0 c3 cc 0f 1f 00 0f 95 c0
Call Trace:
<TASK>
? x86_emulate_insn [kvm]
? x86_emulate_instruction [kvm]
? vmx_handle_exit [kvm_intel]
? kvm_arch_vcpu_ioctl_run [kvm]
? kvm_vcpu_ioctl [kvm]
? __x64_sys_ioctl
? do_syscall_64
? entry_SYSCALL_64_after_hwframe
</TASK>
Raise the alignment value when SLS is enabled and use a macro for that
instead of hard-coding naked numbers.
Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation")
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Link: https://lore.kernel.org/r/YjGzJwjrvxg5YZ0Z@audible.transient.net
[Add a comment and a bit of safety checking, since this is going to be changed
again for IBT support. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fix from Arnd Bergmann:
"Here is one last regression fix for 5.17, reverting a patch that went
into 5.16 as a cleanup that ended up breaking external interrupts on
Layerscape chips.
The revert makes it work again, but also reintroduces a build time
warning about the nonstandard DT binding that will have to be dealt
with in the future"
* tag 'soc-fixes-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
Revert "arm64: dts: freescale: Fix 'interrupt-map' parent address cells"
|
|
The flowtable object is already passed as argument to
nf_flow_table_iterate(), do use not data pointer to pass flowtable.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Already available through the flowtable object, remove it.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.
Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.
Example 1: (System.map)
ffffffff832fc78c t init
ffffffff832fc79e t init
ffffffff832fc8f8 t init
Example 2: (initcall_debug log)
calling init+0x0/0x12 @ 1
initcall init+0x0/0x12 returned 0 after 15 usecs
calling init+0x0/0x60 @ 1
initcall init+0x0/0x60 returned 0 after 2 usecs
calling init+0x0/0x9a @ 1
initcall init+0x0/0x9a returned 0 after 74 usecs
Fixes: f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if we can elide the load. Cancel if the new candidate
isn't identical to previous store.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The fib expression stores to a register, so we can't add empty stub.
Check that the register that is being written is in fact redundant.
In most cases, this is expected to cancel tracking as re-use is
unlikely.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
tunnel expression performs. This allows to skip this redundant operation.
If the destination contains a different selector, update the register
tracking information. This patch does not perform bitwise tracking.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
xfrm expression performs. This allows to skip this redundant operation.
If the destination contains a different selector, update the register
tracking information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
socket expression performs. This allows to skip this redundant
operation. If the destination contains a different selector, update the
register tracking information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The immediate expression might clobber existing data on the registers,
cancel register tracking for the destination register.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
osf expression performs. Always cancel register tracking for jhash since
this requires tracking multiple source registers in case of
concatenations. Perform register tracking (without bitwise) for symhash
since input does not come from source register.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Allow to recycle the previous output of the OS fingerprint expression
if flags and ttl are the same.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Random and increment are stateful, each invocation results in fresh output.
Cancel register tracking for these two expressions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
its enough to export the meta get reduce helper and then call it
from nft_meta_bridge too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In most cases, nft_lookup will be read-only, i.e. won't clobber
registers. In case of map, we need to cancel the registers that will
see stores.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this ct
expression performs. This allows to skip this redundant operation. If
the destination contains a different selector, update the register
tracking information.
Export nft_expr_reduce_bitwise as a symbol since nft_ct might be
compiled as a module.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Output of expressions might be larger than one single register, this might
clobber existing data. Reset tracking for all destination registers that
required to store the expression output.
This patch adds three new helper functions:
- nft_reg_track_update: cancel previous register tracking and update it.
- nft_reg_track_cancel: cancel any previous register tracking info.
- __nft_reg_track_cancel: cancel only one single register tracking info.
Partial register clobbering detection is also supported by checking the
.num_reg field which describes the number of register that are used.
This patch updates the following expressions:
- meta_bridge
- bitwise
- byteorder
- meta
- payload
to use these helper functions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Skip register tracking for expressions that perform read-only operations
on the registers. Define and use a cookie pointer NFT_REDUCE_READONLY to
avoid defining stubs for these expressions.
This patch re-enables register tracking which was disabled in ed5f85d42290
("netfilter: nf_tables: disable register tracking"). Follow up patches
add remaining register tracking for existing expressions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The function sets the pernet boolean to avoid the spurious warning from
nf_ct_lookup_helper() when assigning conntrack helpers via nftables.
Fixes: 1a64edf54f55 ("netfilter: nft_ct: add helper set support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small(ish) fixes, both in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fnic: Finish scsi_cmnd before dropping the spinlock
scsi: mpt3sas: Page fault in reply q processing
|
|
as of commit 4608fdfc07e1
("netfilter: conntrack: collect all entries in one cycle")
conntrack gc was changed to run every 2 minutes.
On systems where conntrack hash table is set to large value, most evictions
happen from gc worker rather than the packet path due to hash table
distribution.
This causes netlink event overflows when events are collected.
This change collects average expiry of scanned entries and
reschedules to the average remaining value, within 1 to 60 second interval.
To avoid event overflows, reschedule after each bucket and add a
limit for both run time and number of evictions per run.
If more entries have to be evicted, reschedule and restart 1 jiffy
into the future.
Reported-by: Karel Rericha <karel@maxtel.cz>
Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Clean up some cruft accumulated over time:
- The default definition of CONFIG_ZBOOT_ROM_* got fixed in commit
39c3e304567a ("ARM: 8984/1: Kconfig: set default ZBOOT_ROM_TEXT/BSS
value to 0x0"), so we don't need the explicit setting anymore.
- CONFIG_ABX500_CORE now explicitly depends on the ARMv7 ARCH_U8500,
so we don't need to disable that symbol explicitly anymore.
- CONFIG_DEBUG_FS was just moved around in the generated defconfig.
No change to the generated .config or savedefconfig.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20220317183043.948432-5-andre.przywara@arm.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Commit 91185d55b32e ("drm: Remove DRM_KMS_FB_HELPER Kconfig option")
led to de-selection of CONFIG_FB, which was a prerequisite for
BACKLIGHT_CLASS_DEVICE, which CONFIG_DRM_PANEL_SIMPLE depended on.
Explicitly set CONFIG_FB, to bring DRM_PANEL_SIMPLE, DRM_PANEL_EDP,
FB_IMX and FB_ATMEL back into the generated .config.
This also adds some new FB related features like fonts and the
framebuffer console.
See also commit 8c1768967e27 ("ARM: config: mutli v7: Reenable FB
dependency"), which solved the same problem for multi_v7_defconfig.
This relies on [1], to fix a broken Kconfig dependency.
[1] https://lore.kernel.org/dri-devel/20220315084559.23510-1-tzimmermann@suse.de/raw
Fixes: 91185d55b32e ("drm: Remove DRM_KMS_FB_HELPER Kconfig option")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20220317183043.948432-4-andre.przywara@arm.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Commit 06b93644f4d1 ("media: Kconfig: add an option to filter in/out
platform drivers") introduced CONFIG_MEDIA_PLATFORM_SUPPORT, to allow
more fine grained control over the inclusion of certain Kconfig files.
multi_v5_defconfig was selecting some drivers described in
drivers/media/platform/Kconfig, which now wasn't included anymore.
Explicitly set the new symbol in multi_v5_defconfig to bring those
drivers back.
This enables some new V4L2 and VIDEOBUF2 features, but as modules only.
Fixes: 06b93644f4d1 ("media: Kconfig: add an option to filter in/out platform drivers")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20220317183043.948432-3-andre.przywara@arm.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Some ARMv5 platforms got removed from the kernel over time, so we don't
need their symbols in the multi_v5_defconfig anymore:
- ARCH_U300 got removed with commit ce1380c9f4bc ("ARM: remove u300
platform"), so CONFIG_ARCH_U300 is not around anymore.
- ARCH_U300 was the only platform selecting ARCH_AMBA, which I2C_NOMADIK
depends on. So we won't need this symbol anymore, and can't select it
anyway.
- i.MX27 got converted to DT in 879c0e5e0ac7 ("ARM: imx: Remove i.MX27
board files"). Remove the now obsolete board file version symbols.
No change in the generated .config or savedefconfig.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20220317183043.948432-2-andre.przywara@arm.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The intel,sysmgr-syscon in EDAC/memory controller node is not a
recognized and documented property, so drop it to fix error:
sdr_edac@f87f8000: 'intel,sysmgr-syscon' does not match any of the regexes: 'pinctrl-[0-9]+'
Align also the node name with Devicetree specification (generic, not
specific, and EDAC is purely Linux term).
Fixes: ef82c9be844f ("arm64: dts: n5x: add sdr edac support")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20220318121044.108750-1-krzysztof.kozlowski@canonical.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Various spelling mistakes in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20220318103729.157574-21-Julia.Lawall@inria.fr'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Compiler plugins can be built starting with xtensa gcc 12. Enable plugin
support for xtensa when gcc-12 or newer is used.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
Don't use numeric labels for complex branching logic. Mark each branch
with named local label and use them. Rearrange exit back to kernel mode
to avoid conditional label definition.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
NMI exit path to userspace should neither check TIF_DB_DISABLED nor call
check_tlb_sanity because NMI shouldn't touch anything related to
userspace. Drop kernel/userspace check in NMI exit path.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
xtensa currently has two different definitions for stack alignment.
Replace it with single definition usable in both C and assembly.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
Align it with helpers like bpf_find_btf_id, so all functions returning
BTF in out parameter follow the same rule of raising reference
consistently, regardless of module or vmlinux BTF.
Adjust existing callers to handle the change accordinly.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220317115957.3193097-10-memxor@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Avoid iterating empty evlist, fixing a segfault with 'perf stat --null'
- Ignore case in topdown.slots check, fixing issue with Intel Icelake
JSON metrics.
- Fix symbol size calculation condition for fixing up corner case
symbol end address obtained from Kallsyms.
* tag 'perf-tools-fixes-for-v5.17-2022-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf parse-events: Ignore case in topdown.slots check
perf evlist: Avoid iteration for empty evlist.
perf symbols: Fix symbol size calculation condition
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fix from Greg KH:
"Here is a single driver fix for 5.17-final that has been submitted
many times but I somehow missed it in my patch queue:
- fix for counter sysfs code for reported problem
This has been in linux-next all week with no reported issues"
* tag 'char-misc-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
counter: Stop using dev_get_drvdata() to get the counter device
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small remaining USB fixes for 5.17-final.
They include:
- two USB gadget driver fixes for reported problems
- usbtmc driver fix for syzbot found issues
- musb patch partial revert to resolve a reported regression.
All of these have been in linux-next this week with no reported
problems"
* tag 'usb-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: Fix use-after-free bug by not setting udc->dev.driver
usb: usbtmc: Fix bug in pipe direction for control transfers
partially Revert "usb: musb: Set the DT node on the child device"
usb: gadget: rndis: prevent integer overflow in rndis_set_response()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-03-18
1) XDP multi buffer support
This series enables XDP on non-linear legacy RQ in multi buffer mode.
When XDP is enabled, fragmentation scheme on non-linear legacy RQ is
adjusted to comply to limitations of XDP multi buffer (fragments of the
same size). DMA addresses of fragments are stored in struct page for the
completion handler to be able to unmap them. XDP_TX is supported.
XDP_REDIRECT is not yet supported, the XDP core blocks it for multi
buffer packets at the moment.
2) Trivial cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2022-03-19
1) Delete duplicated functions that calls same xfrm_api_check.
From Leon Romanovsky.
2) Align userland API of the default policy structure to the
internal structures. From Nicolas Dichtel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a netlink message is received, netlink_recvmsg() fills in the address
of the sender. One of the fields is the 32-bit bitfield nl_groups, which
carries the multicast group on which the message was received. The least
significant bit corresponds to group 1, and therefore the highest group
that the field can represent is 32. Above that, the UB sanitizer flags the
out-of-bounds shift attempts.
Which bits end up being set in such case is implementation defined, but
it's either going to be a wrong non-zero value, or zero, which is at least
not misleading. Make the latter choice deterministic by always setting to 0
for higher-numbered multicast groups.
To get information about membership in groups >= 32, userspace is expected
to use nl_pktinfo control messages[0], which are enabled by NETLINK_PKTINFO
socket option.
[0] https://lwn.net/Articles/147608/
The way to trigger this issue is e.g. through monitoring the BRVLAN group:
# bridge monitor vlan &
# ip link add name br type bridge
Which produces the following citation:
UBSAN: shift-out-of-bounds in net/netlink/af_netlink.c:162:19
shift exponent 32 is too large for 32-bit type 'int'
Fixes: f7fa9b10edbb ("[NETLINK]: Support dynamic number of multicast groups per netlink family")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/2bef6aabf201d1fc16cca139a744700cff9dcb04.1647527635.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This code is fine, but it's easier to review if we use snprintf()
instead of sprintf().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/r/20220318074723.GA6617@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The skb will be checked in kfree_skb(), so remove the outside check.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20220318072728.2659578-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2022-03-17
This series contains updates to i40e and igb drivers.
Tom Rix moves a conversion to little endian to occur only when the
value is used for i40e. He also zeros out a structure to resolve
possible use of garbage value for igb as reported by clang.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
igb: zero hwtstamp by default
i40e: little endian only valid checksums
====================
Link: https://lore.kernel.org/r/20220317160236.3534321-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The cifs_demultiplexer_thread should only call cifs_reconnect.
If any other thread wants to trigger a reconnect, they can do
so by updating the server tcpStatus to CifsNeedReconnect.
The last patch attempted to use the same helper function for
both types of threads, but that causes other issues
with lock dependencies.
This patch creates a new helper for non-cifsd threads, that
will indicate to cifsd that the server needs reconnect.
Fixes: 2a05137a0575 ("cifs: mark sessions for reconnection in helper function")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Remove the spinlock around the tree traversal as we are calling possibly
sleeping functions.
We do not need a spinlock here as there will be no modifications to this
tree at this point.
This prevents warnings like this to occur in dmesg:
[ 653.774996] BUG: sleeping function called from invalid context at kernel/loc\
king/mutex.c:280
[ 653.775088] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1827, nam\
e: umount
[ 653.775152] preempt_count: 1, expected: 0
[ 653.775191] CPU: 0 PID: 1827 Comm: umount Tainted: G W OE 5.17.0\
-rc7-00006-g4eb628dd74df #135
[ 653.775195] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-\
1.fc33 04/01/2014
[ 653.775197] Call Trace:
[ 653.775199] <TASK>
[ 653.775202] dump_stack_lvl+0x34/0x44
[ 653.775209] __might_resched.cold+0x13f/0x172
[ 653.775213] mutex_lock+0x75/0xf0
[ 653.775217] ? __mutex_lock_slowpath+0x10/0x10
[ 653.775220] ? _raw_write_lock_irq+0xd0/0xd0
[ 653.775224] ? dput+0x6b/0x360
[ 653.775228] cifs_kill_sb+0xff/0x1d0 [cifs]
[ 653.775285] deactivate_locked_super+0x85/0x130
[ 653.775289] cleanup_mnt+0x32c/0x4d0
[ 653.775292] ? path_umount+0x228/0x380
[ 653.775296] task_work_run+0xd8/0x180
[ 653.775301] exit_to_user_mode_loop+0x152/0x160
[ 653.775306] exit_to_user_mode_prepare+0x89/0xd0
[ 653.775315] syscall_exit_to_user_mode+0x12/0x30
[ 653.775322] do_syscall_64+0x48/0x90
[ 653.775326] entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 187af6e98b44e5d8f25e1d41a92db138eb54416f ("cifs: fix handlecache and multiuser")
Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When session gets reconnected during mount then read size in super block fs context
gets set to zero and after negotiate, rsize is not modified which results in
incorrect read with requested bytes as zero. Fixes intermittent failure
of xfstest generic/240
Note that stable requires a different version of this patch which will be
sent to the stable mailing list.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
RHBZ:1997367
When we collapse a range in smb3_collapse_range() we must make sure
we update the inode size and pagecache accordingly.
If not, both inode size and pagecahce may be stale until it is refreshed.
This can be demonstrated for the inode size by running :
xfs_io -i -f -c "truncate 320k" -c "fcollapse 64k 128k" -c "fiemap -v" \
/mnt/testfile
where we can see the result of stale data in the fiemap output.
The third line of the output is wrong, all this data should be truncated.
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: hole 128
1: [128..383]: 128..383 256 0x1
2: [384..639]: hole 256
And the correct output, when the inode size has been updated correctly should
look like this:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: hole 128
1: [128..383]: 128..383 256 0x1
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In multiuser each individual user has their own tcon structure for the
share and thus their own handle for a cached directory.
When we umount such a share we much make sure to release the pinned down dentry
for each such tcon and not just the master tcon.
Otherwise we will get nasty warnings on umount that dentries are still in use:
[ 3459.590047] BUG: Dentry 00000000115c6f41{i=12000000019d95,n=/} still in use\
(2) [unmount of cifs cifs]
...
[ 3459.590492] Call Trace:
[ 3459.590500] d_walk+0x61/0x2a0
[ 3459.590518] ? shrink_lock_dentry.part.0+0xe0/0xe0
[ 3459.590526] shrink_dcache_for_umount+0x49/0x110
[ 3459.590535] generic_shutdown_super+0x1a/0x110
[ 3459.590542] kill_anon_super+0x14/0x30
[ 3459.590549] cifs_kill_sb+0xf5/0x104 [cifs]
[ 3459.590773] deactivate_locked_super+0x36/0xa0
[ 3459.590782] cleanup_mnt+0x131/0x190
[ 3459.590789] task_work_run+0x5c/0x90
[ 3459.590798] exit_to_user_mode_loop+0x151/0x160
[ 3459.590809] exit_to_user_mode_prepare+0x83/0xd0
[ 3459.590818] syscall_exit_to_user_mode+0x12/0x30
[ 3459.590828] do_syscall_64+0x48/0x90
[ 3459.590833] entry_SYSCALL_64_after_hwframe+0x44/0xae
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|