Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Remove unused ftrace_direct_funcs variables
- Fix a possible NULL pointer dereference race in eventfs
- Update do_div() usage in trace event benchmark test
- Speedup direct function registration with asynchronous RCU callback.
The synchronization was done in the registration code and this caused
delays when registering direct callbacks. Move the freeing to a
call_rcu() that will prevent delaying of the registering.
- Replace simple_strtoul() usage with kstrtoul()
* tag 'trace-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Fix a possible null pointer dereference in eventfs_find_events()
ftrace: Fix possible use-after-free issue in ftrace_location()
ftrace: Remove unused global 'ftrace_direct_func_count'
ftrace: Remove unused list 'ftrace_direct_funcs'
tracing: Improve benchmark test performance by using do_div()
ftrace: Use asynchronous grace period for register_ftrace_direct()
ftrace: Replaces simple_strtoul in ftrace
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
- tracing/probes: Add new pseudo-types %pd and %pD support for dumping
dentry name from 'struct dentry *' and file name from 'struct file *'
- uprobes performance optimizations:
- Speed up the BPF uprobe event by delaying the fetching of the
uprobe event arguments that are not used in BPF
- Avoid locking by speculatively checking whether uprobe event is
valid
- Reduce lock contention by using read/write_lock instead of
spinlock for uprobe list operation. This improved BPF uprobe
benchmark result 43% on average
- rethook: Remove non-fatal warning messages when tracing stack from
BPF and skip rcu_is_watching() validation in rethook if possible
- objpool: Optimize objpool (which is used by kretprobes and fprobe as
rethook backend storage) by inlining functions and avoid caching
nr_cpu_ids because it is a const value
- fprobe: Add entry/exit callbacks types (code cleanup)
- kprobes: Check ftrace was killed in kprobes if it uses ftrace
* tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
kprobe/ftrace: bail out if ftrace was killed
selftests/ftrace: Fix required features for VFS type test case
objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
objpool: enable inlining objpool_push() and objpool_pop() operations
rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get()
ftrace: make extra rcu_is_watching() validation check optional
uprobes: reduce contention on uprobes_tree access
rethook: Remove warning messages printed for finding return address of a frame.
fprobe: Add entry/exit callbacks types
selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"
selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"
Documentation: tracing: add new type '%pd' and '%pD' for kprobe
tracing/probes: support '%pD' type for print struct file's name
tracing/probes: support '%pd' type for print struct dentry's name
uprobes: add speculative lockless system-wide uprobe filter check
uprobes: prepare uprobe args buffer lazily
uprobes: encapsulate preparation of uprobe args buffer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig updates from Masami Hiramatsu:
- Do not put unneeded quotes on the extra command line items which was
inserted from the bootconfig.
- Remove redundant spaces from the extra command line.
* tag 'bootconfig-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
init/main.c: Minor cleanup for the setup_command_line() function
init/main.c: Remove redundant space from saved_command_line
bootconfig: do not put quotes on cmdline items unless necessary
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Remove sentinel elements from ctl_table structs in kernel/*
Removing sentinels in ctl_table arrays reduces the build time size
and runtime memory consumed by ~64 bytes per array. Removals for
net/, io_uring/, mm/, ipc/ and security/ are set to go into mainline
through their respective subsystems making the next release the most
likely place where the final series that removes the check for
proc_name == NULL will land.
This adds to removals already in arch/, drivers/ and fs/.
- Adjust ctl_table definitions and references to allow constification
- Remove unused ctl_table function arguments
- Move non-const elements from ctl_table to ctl_table_header
- Make ctl_table pointers const in ctl_table_root structure
Making the static ctl_table structs const will increase safety by
keeping the pointers to proc_handler functions in .rodata. Though no
ctl_tables where made const in this PR, the ground work for making
that possible has started with these changes sent by Thomas
Weißschuh.
* tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: drop now unnecessary out-of-bounds check
sysctl: move sysctl type to ctl_table_header
sysctl: drop sysctl_is_perm_empty_ctl_table
sysctl: treewide: constify argument ctl_table_root::permissions(table)
sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table)
bpf: Remove the now superfluous sentinel elements from ctl_table array
delayacct: Remove the now superfluous sentinel elements from ctl_table array
kprobes: Remove the now superfluous sentinel elements from ctl_table array
printk: Remove the now superfluous sentinel elements from ctl_table array
scheduler: Remove the now superfluous sentinel elements from ctl_table array
seccomp: Remove the now superfluous sentinel elements from ctl_table array
timekeeping: Remove the now superfluous sentinel elements from ctl_table array
ftrace: Remove the now superfluous sentinel elements from ctl_table array
umh: Remove the now superfluous sentinel elements from ctl_table array
kernel misc: Remove the now superfluous sentinel elements from ctl_table array
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DT Bindings:
- Convert samsung,exynos5-dp, atmel,lcdc, aspeed,ast2400-wdt bindings
to schemas
- Add bindings for Allwinner H616 NMI controller, Renesas r8a779g0
irqc, Renesas R-Car V4M TMU and CMT timers, Freescale S32G3
linflexuart, and Mediatek MT7988 XHCI
- Add 'reg' constraints on DSI and SPI display panels
- More dropping of unnecessary quotes in schemas
- Use full paths rather than relative paths in schema $refs
- Drop redundant storing of phandle for reserved memory
DT Core:
- Use scope based cleanups for kfree() and of_node_put()
- Track interrupt-map and power-supplies for fw_devlink
- Add buffer overflow check in of_modalias()
- Add and use __of_prop_free() helper for freeing struct property"
* tag 'devicetree-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (25 commits)
of: property: Add fw_devlink support for interrupt-map property
dt-bindings: display: panel: constrain 'reg' in DSI panels
dt-bindings: display: panel: constrain 'reg' in SPI panels
dt-bindings: display: samsung,ams495qa01: add missing SPI properties ref
dt-bindings: Use full path to other schemas
dt-bindings: PCI: qcom,pcie-sm8350: Drop redundant 'oneOf' sub-schema
of: module: add buffer overflow check in of_modalias()
dt-bindings: PCI: microchip: increase number of items in ranges property
dt-bindings: Drop unnecessary quotes on keys
dt-bindings: interrupt-controller: mediatek,mt6577-sysirq: Drop unnecessary quotes
of: property: Use scope based cleanup on port_node
of: reserved_mem: Remove the use of phandle from the reserved_mem APIs
of: property: fw_devlink: Add support for "power-supplies" binding
dt-bindings: watchdog: aspeed,ast2400-wdt: Convert to DT schema
dt-bindings: irq: sun7i-nmi: Add binding for the H616 NMI controller
dt-bindings: interrupt-controller: renesas,irqc: Add r8a779g0 support
dt-bindings: timer: renesas,tmu: Add R-Car V4M support
dt-bindings: timer: renesas,cmt: Add R-Car V4M support
of: Use scope based of_node_put() cleanups
of: Use scope based kfree() cleanups
...
|
|
Vladimir said when adding this test:
The bridge driver fares particularly badly [...] mainly because
it does not implement IFF_UNICAST_FLT.
See commit 90b9566aa5cd ("selftests: forwarding: add a test for
local_termination.sh").
We don't want to hide the known gaps, but having a test which
always fails prevents us from catching regressions. Report
the cases we know may fail as XFAIL.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20240516152513.1115270-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adjust the initialization sequence of KSZ88x3 switches to enable
802.1p priority control on Port 2 before configuring Port 1. This
change ensures the apptrust functionality on Port 1 operates
correctly, as it depends on the priority settings of Port 2. The
prior initialization sequence incorrectly configured Port 1 first,
which could lead to functional discrepancies.
Fixes: a1ea57710c9d ("net: dsa: microchip: dcb: add special handling for KSZ88X3 family")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://lore.kernel.org/r/20240517050121.2174412-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove myself as reviewer for TI's ethernet drivers
Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20240516082545.6412-1-r-gunasekaran@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update the list with the current maintainers of TI's CPSW ethernet
peripheral.
Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20240516054932.27597-1-r-gunasekaran@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit a36e185e8c85
("udp: Handle ICMP errors for tunnels with same destination port on both endpoints")
UDP's handling of ICMP errors has allowed for UDP-encap tunnels to
determine socket associations in scenarios where the UDP hash lookup
could not.
Subsequently, commit d26796ae58940
("udp: check udp sock encap_type in __udp_lib_err")
subtly tweaked the approach such that UDP ICMP error handling would be
skipped for any UDP socket which has encapsulation enabled.
In the case of L2TP tunnel sockets using UDP-encap, this latter
modification effectively broke ICMP error reporting for the L2TP
control plane.
To a degree this isn't catastrophic inasmuch as the L2TP control
protocol defines a reliable transport on top of the underlying packet
switching network which will eventually detect errors and time out.
However, paying attention to the ICMP error reporting allows for more
timely detection of errors in L2TP userspace, and aids in debugging
connectivity issues.
Reinstate ICMP error handling for UDP encap L2TP tunnels:
* implement struct udp_tunnel_sock_cfg .encap_err_rcv in order to allow
the L2TP code to handle ICMP errors;
* only implement error-handling for tunnels which have a managed
socket: unmanaged tunnels using a kernel socket have no userspace to
report errors back to;
* flag the error on the socket, which allows for userspace to get an
error such as -ECONNREFUSED back from sendmsg/recvmsg;
* pass the error into ip[v6]_icmp_error() which allows for userspace to
get extended error information via. MSG_ERRQUEUE.
Fixes: d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Link: https://lore.kernel.org/r/20240513172248.623261-1-tparkin@katalix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
- define sigset_t in parisc uapi header to fix build of util-linux
- define HAVE_ARCH_HUGETLB_UNMAPPED_AREA to avoid compiler warning
- drop unused 'exc_reg' struct in math-emu code
* tag 'parisc-for-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
parisc/math-emu: Remove unused struct 'exc_reg'
parisc: Define sigset_t in parisc uapi header
|
|
... and be more idiomatic when calculating ->pageofs_in.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20240425200017.GF1031757@ZenIV
[ Gao Xiang: don't use `offset_in_page(mptr)` due to EROFS_NO_KMAP. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
There's only one place where struct z_erofs_maprecorder ->kaddr is
used not in the same function that has assigned it -
the value read in unpack_compacted_index() gets calculated in
z_erofs_load_compact_lcluster(). With minor massage we can switch
to storing it with offset in block already added.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20240425195944.GE1031757@ZenIV
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Most of the callers of erofs_read_metabuf() have the following form:
block = erofs_blknr(sb, offset);
off = erofs_blkoff(sb, offset);
p = erofs_read_metabuf(...., erofs_pos(sb, block), ...);
if (IS_ERR(p))
return PTR_ERR(p);
q = p + off;
// no further uses of p, block or off.
The value passed to erofs_read_metabuf() is offset rounded down to block
size, i.e. offset - off. Passing offset as-is would increase the return
value by off in case of success and keep the return value unchanged in
in case of error. In other words, the same could be achieved by
q = erofs_read_metabuf(...., offset, ...);
if (IS_ERR(q))
return PTR_ERR(q);
This commit convert these simple cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20240425195915.GD1031757@ZenIV
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
just lift the call of erofs_pos() into the callers; it will
collapse in most of them, but that's better done caller-by-caller.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20240425195846.GC1031757@ZenIV
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Avoid unnecessary #ifdefs and simplify the code a bit.
Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240517095652.2282972-1-hongzhen@linux.alibaba.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git
Al Viro has a series of "->bd_inode elimination" which touches several
subsystems, but he also has EROFS-specific further cleanup patches
which I tend to go with EROFS tree for more testing.
Let's merge "#misc.erofs" as Al suggested in the previous email [1]:
"#misc.erofs (the first two commits) is put into never-rebased mode;
you pull it into your tree and do whatever's convenient with the rest.
I merge the same branch into block_device work; that way it doesn't
cause conflicts whatever else happens in our trees."
[1] https://lore.kernel.org/r/20240503041542.GV2118490@ZenIV
Signed-off-by: Gao Xiang <xiang@kernel.org>
|
|
Force the max_register of the test regmap to be one register longer
than the number of test registers, to prevent an array overflow in
the test loop.
The test defines num_reg_defaults = 6. With 6 registers and
stride == 2 the valid register addresses would be 0, 2, 4, 6, 8, 10.
However the loop checks attempting to access the odd address, so on
the final register it accesses address 11, and it writes entry [11]
of the read/written arrays.
Originally this worked because the max_register of the regmap was
hardcoded to be BLOCK_TEST_SIZE (== 12).
commit 710915743d53 ("regmap: kunit: Run sparse cache tests at non-zero
register addresses")
introduced the ability to start the test address range from any address,
which means adjusting the max_register. If max_register was not forced,
it was calculated either from num_reg_defaults or BLOCK_TEST_SIZE. This
correctly calculated that with num_reg_defaults == 6 and stride == 2 the
final valid address is 10. So the read/written arrays are allocated to
contain entries [0..10]. When stride attempted to access [11] it was
overflowing the array.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 710915743d53 ("regmap: kunit: Run sparse cache tests at non-zero register addresses")
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://msgid.link/r/20240517144703.1200995-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
- Allow per-process DEXCR (Dynamic Execution Control Register) settings
via prctl, notably NPHIE which controls hashst/hashchk for ROP
protection.
- Install powerpc selftests in sub-directories. Note this changes the
way run_kselftest.sh needs to be invoked for powerpc selftests.
- Change fadump (Firmware Assisted Dump) to better handle memory
add/remove.
- Add support for passing additional parameters to the fadump kernel.
- Add support for updating the kdump image on CPU/memory add/remove
events.
- Other small features, cleanups and fixes.
Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd
Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe
Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David
Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff
Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin
Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh
Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan
Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang,
Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth
Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui.
* tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits)
powerpc/fadump: Fix section mismatch warning
powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
powerpc/fadump: update documentation about bootargs_append
powerpc/fadump: pass additional parameters when fadump is active
powerpc/fadump: setup additional parameters for dump capture kernel
powerpc/pseries/fadump: add support for multiple boot memory regions
selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction"
KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()
KVM: PPC: Fix documentation for ppc mmu caps
KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver
KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception
powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
powerpc/code-patching: Use dedicated memory routines for patching
powerpc/code-patching: Test patch_instructions() during boot
powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()
powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX
powerpc: Fix typos
powerpc/eeh: Fix spelling of the word "auxillary" and update comment
macintosh/ams: Fix unused variable warning
powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large
...
|
|
Pull ARM updates from Russell King:
- Updates to AMBA bus subsystem to drop .owner struct device_driver
initialisations, moving that to code instead.
- Add LPAE privileged-access-never support
- Add support for Clang CFI
- clkdev: report over-sized device or connection strings
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux: (36 commits)
ARM: 9398/1: Fix userspace enter on LPAE with CC_OPTIMIZE_FOR_SIZE=y
clkdev: report over-sized strings when creating clkdev entries
ARM: 9393/1: mm: Use conditionals for CFI branches
ARM: 9392/2: Support CLANG CFI
ARM: 9391/2: hw_breakpoint: Handle CFI breakpoints
ARM: 9390/2: lib: Annotate loop delay instructions for CFI
ARM: 9389/2: mm: Define prototypes for all per-processor calls
ARM: 9388/2: mm: Type-annotate all per-processor assembly routines
ARM: 9387/2: mm: Rewrite cacheflush vtables in CFI safe C
ARM: 9386/2: mm: Use symbol alias for cache functions
ARM: 9385/2: mm: Type-annotate all cache assembly routines
ARM: 9384/2: mm: Make tlbflush routines CFI safe
ARM: 9382/1: ftrace: Define ftrace_stub_graph
ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement
ARM: 9357/2: Reduce the number of #ifdef CONFIG_CPU_SW_DOMAIN_PAN
ARM: 9356/2: Move asm statements accessing TTBCR into C functions
ARM: 9355/2: Add TTBCR_* definitions to pgtable-3level-hwdef.h
ARM: 9379/1: coresight: tpda: drop owner assignment
ARM: 9378/1: coresight: etm4x: drop owner assignment
ARM: 9377/1: hwrng: nomadik: drop owner assignment
...
|
|
for_each_sibling_event() checks leader's ctx but it doesn't have the ctx
yet if it's the leader. Like in perf_event_validate_size(), we should
skip checking siblings in that case.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Fixes: f3c0eba28704 ("perf: Add a few assertions")
Reported-by: Greg Thelen <gthelen@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Tuan Phan <tuanphan@os.amperecomputing.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20240514180050.182454-1-namhyung@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit a46c27026da1 ("blk-mq: don't schedule block kworker on isolated CPUs")
rules out isolated CPUs from hctx->cpumask, and hctx->cpumask should only be
used for scheduling kworker.
Add helper blk_mq_cpu_mapped_to_hctx() and apply it into cpuhp handlers.
This patch avoids to forget clearing INACTIVE of hctx state in case that one
isolated CPU becomes online, and fixes hang issue when allocating request
from this hctx's tags.
Cc: Raju Cheerla <rcheerla@redhat.com>
Fixes: a46c27026da1 ("blk-mq: don't schedule block kworker on isolated CPUs")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240517020514.149771-1-ming.lei@redhat.com
Tested-by: Raju Cheerla <rcheerla@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This code calls folio_put() on an error pointer which will lead to a
crash. Check for both error pointers and NULL pointers before calling
folio_put().
Fixes: 5eea586b47f0 ("ext4: convert bd_buddy_page to bd_buddy_folio")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/eaafa1d9-a61c-4af4-9f97-d3ad72c60200@moroto.mountain
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The legacy decompressor has elaborate logic to ensure that the
randomized physical placement of the decompressed kernel image does not
conflict with any memory reservations, including ones specified on the
command line using mem=, memmap=, efi_fake_mem= or hugepages=, which are
taken into account by the kernel proper at a later stage.
When booting in EFI mode, it is the firmware's job to ensure that the
chosen range does not conflict with any memory reservations that it
knows about, and this is trivially achieved by using the firmware's
memory allocation APIs.
That leaves reservations specified on the command line, though, which
the firmware knows nothing about, as these regions have no other special
significance to the platform. Since commit
a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
these reservations are not taken into account when randomizing the
physical placement, which may result in conflicts where the memory
cannot be reserved by the kernel proper because its own executable image
resides there.
To avoid having to duplicate or reuse the existing complicated logic,
disable physical KASLR entirely when such overrides are specified. These
are mostly diagnostic tools or niche features, and physical KASLR (as
opposed to virtual KASLR, which is much more important as it affects the
memory addresses observed by code executing in the kernel) is something
we can live without.
Closes: https://lkml.kernel.org/r/FA5F6719-8824-4B04-803E-82990E65E627%40akamai.com
Reported-by: Ben Chaney <bchaney@akamai.com>
Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
Cc: <stable@vger.kernel.org> # v6.1+
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Yi notes relative to commit f6944d4a0b87 ("vfio/pci: Collect hot-reset
devices to local buffer") that we previously tested the resulting
device count with a WARN_ON, which was removed when we switched to
the in-loop user copy in commit b56b7aabcf3c ("vfio/pci: Copy hot-reset
device info to userspace in the devices loop"). Finding no devices in
the bus/slot would be an unexpected condition, so let's restore the
warning and trigger a -ERANGE error here as success with no devices
would be an unexpected result to userspace as well.
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20240516174831.2257970-1-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
This reverts commit 2632e25217696712681dd1f3ecc0d71624ea3b23.
Johannes (and others) report data corruption with dm-crypt on Apple M1
which has been bisected to this change. Revert the offending commit
while we figure out what's going on.
Cc: stable@vger.kernel.org
Reported-by: Johannes Nixdorf <mixi@shadowice.org>
Link: https://lore.kernel.org/all/D1B7GPIR9K1E.5JFV37G0YTIF@shadowice.org/
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp
Merge OPP updates for v6.10 from Viresh Kumar:
"- Fix required_opp_tables for multiple genpds using same table (Viresh
Kumar)."
* tag 'opp-updates-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
OPP: Fix required_opp_tables for multiple genpds using same table
|
|
Prevent ecc_digits_from_bytes from reading too many bytes from the input
byte array in case an insufficient number of bytes is provided to fill the
output digit array of ndigits. Therefore, initialize the most significant
digits with 0 to avoid trying to read too many bytes later on. Convert the
function into a regular function since it is getting too big for an inline
function.
If too many bytes are provided on the input byte array the extra bytes
are ignored since the input variable 'ndigits' limits the number of digits
that will be filled.
Fixes: d67c96fb97b5 ("crypto: ecdsa - Convert byte arrays with key coordinates to digits")
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Using completion_done to determine whether the caller has gone
away only works after a complete call. Furthermore it's still
possible that the caller has not yet called wait_for_completion,
resulting in another potential UAF.
Fix this by making the caller use cancel_work_sync and then freeing
the memory safely.
Fixes: 7d42e097607c ("crypto: qat - resolve race condition during AER recovery")
Cc: <stable@vger.kernel.org> #6.8+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Commit 9ad18043fb35 ("thermal: core: Send trip crossing notifications
at init time if needed") overlooked the case when a trip point that
has started as invalid is set to a valid temperature later. Namely,
the initial threshold value for all trips is zero, so if a previously
invalid trip becomes valid and its (new) low temperature is above the
zone temperature, a spurious trip crossing notification will occur and
it may trigger the WARN_ON() in handle_thermal_trip().
To address this, set the initial threshold for all trips to INT_MAX.
There is also the case when a valid writable trip becomes invalid that
requires special handling. First, in accordance with the change
mentioned above, the trip's threshold needs to be set to INT_MAX to
avoid the same issue. Second, if the trip in question is passive and
it has been crossed by the thermal zone temperature on the way up, the
zone's passive count has been incremented and it is in the passive
polling mode, so its passive count needs to be adjusted to allow the
passive polling to be turned off eventually.
Fixes: 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed")
Fixes: 042a3d80f118 ("thermal: core: Move passive polling management to the core")
Reported-by: Zhang Rui <zhang.rui@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
|
|
The required_opp_tables parsing is not perfect, as the OPP core does the
parsing solely based on the DT node pointers.
The core sets the required_opp_tables entry to the first OPP table in
the "opp_tables" list, that matches with the node pointer.
If the target DT OPP table is used by multiple devices and they all
create separate instances of 'struct opp_table' from it, then it is
possible that the required_opp_tables entry may be set to the incorrect
sibling device.
Unfortunately, there is no clear way to initialize the right values
during the initial parsing and we need to do this at a later point of
time.
Cross check the OPP table again while the genpds are attached and fix
them if required.
Also add a new API for the genpd core to fetch the device pointer for
the genpd.
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Reported-by: Vladimir Lypak <vladimir.lypak@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218682
Co-developed-by: Vladimir Lypak <vladimir.lypak@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
*-objs suffix is reserved rather for (user-space) host programs while
usually *-y suffix is used for kernel drivers (although *-objs works
for that purpose for now).
Let's correct the old usages of *-objs in Makefiles.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20240508152658.1445809-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Commit 262fc47ac174 ('xen/balloon: don't use PV mode extra memory for zone
device allocations') removed the addition of the extra memory ranges to the
unpopulated range allocator, using those only for the balloon driver.
This forces the unpopulated allocator to attach hotplug ranges even when spare
memory (as part of the extra memory ranges) is available. Furthermore, on PVH
domains it defeats the purpose of commit 38620fc4e893 ('x86/xen: attempt to
inflate the memory balloon on PVH'), as extra memory ranges would only be
used to map foreign memory if the kernel is built without XEN_UNPOPULATED_ALLOC
support.
Fix this by adding a helpers that adds the extra memory ranges to the list of
unpopulated pages, and zeroes the ranges so they are not also consumed by the
balloon driver.
This should have been part of 38620fc4e893, hence the fixes tag.
Note the current logic relies on unpopulated_init() (and hence
arch_xen_unpopulated_init()) always being called ahead of balloon_init(), so
that the extra memory regions are consumed by arch_xen_unpopulated_init().
Fixes: 38620fc4e893 ('x86/xen: attempt to inflate the memory balloon on PVH')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20240429155053.72509-1-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Use try_cmpxchg() instead of cmpxchg(*ptr, old, new) == old.
The x86 CMPXCHG instruction returns success in the ZF flag,
so this change saves a compare after CMPXCHG.
Also, try_cmpxchg() implicitly assigns old *ptr value to "old"
when CMPXCHG fails. There is no need to explicitly assign
old *ptr value to the temporary, which can simplify the
surrounding source code.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20240405083335.507471-1-ubizjak@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Jiawen Wu says:
====================
Wangxun fixes
Fixed some bugs when using ethtool to operate network devices.
v4 -> v5:
- Simplify if...else... to fix features.
v3 -> v4:
- Require both ctag and stag to be enabled or disabled.
v2 -> v3:
- Drop the first patch.
v1 -> v2:
- Factor out the same code.
- Remove statistics printing with more than 64 queues.
- Detail the commit logs to describe issues.
- Remove reset flag check in wx_update_stats().
- Change to set VLAN CTAG and STAG to be consistent.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When VLAN tag strip is changed to enable or disable, the hardware requires
the Rx ring to be in a disabled state, otherwise the feature cannot be
changed.
Fixes: f3b03c655f67 ("net: wangxun: Implement vlan add and kill functions")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hardware requires VLAN CTAG and STAG configuration always matches. And
whether VLAN CTAG or STAG changes, the configuration needs to be changed
as well.
Fixes: 6670f1ece2c8 ("net: txgbe: Add netdev features support")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the issue where some Rx features cannot be changed.
When using ethtool -K to turn off rx offload, it returns error and
displays "Could not change any device features". And netdev->features
is not assigned a new value to actually configure the hardware.
Fixes: 6dbedcffcf54 ("net: libwx: Implement xx_set_features ops")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sparse warns about a large memset() call within
zcrypt_device_status_mask_ext():
drivers/s390/crypto/zcrypt_api.c:1303:15: warning: memset with byte count of 262144
Get rid of this warning by making sure that all callers of this function
allocate memory with __GFP_ZERO, which zeroes memory already at allocation
time, which again allows to remove the memset() call.
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
cpu_max_write()
In the cgroup v2 CPU subsystem, assuming we have a
cgroup named 'test', and we set cpu.max and cpu.max.burst:
# echo 1000000 > /sys/fs/cgroup/test/cpu.max
# echo 1000000 > /sys/fs/cgroup/test/cpu.max.burst
then we check cpu.max and cpu.max.burst:
# cat /sys/fs/cgroup/test/cpu.max
1000000 100000
# cat /sys/fs/cgroup/test/cpu.max.burst
1000000
Next we set cpu.max again and check cpu.max and
cpu.max.burst:
# echo 2000000 > /sys/fs/cgroup/test/cpu.max
# cat /sys/fs/cgroup/test/cpu.max
2000000 100000
# cat /sys/fs/cgroup/test/cpu.max.burst
1000
... we find that the cpu.max.burst value changed unexpectedly.
In cpu_max_write(), the unit of the burst value returned
by tg_get_cfs_burst() is microseconds, while in cpu_max_write(),
the burst unit used for calculation should be nanoseconds,
which leads to the bug.
To fix it, get the burst value directly from tg->cfs_bandwidth.burst.
Fixes: f4183717b370 ("sched/fair: Introduce the burstable CFS controller")
Reported-by: Qixin Liao <liaoqixin@huawei.com>
Signed-off-by: Cheng Yu <serein.chengyu@huawei.com>
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240424132438.514720-1-serein.chengyu@huawei.com
|
|
On 05/03/2024 15:05, Vincent Guittot wrote:
I'm fine with either and that was my first thought here, too, but it did seem like
the comment was mostly placed there to justify the 'unexpected' high utilization
when explicitly passing FREQUENCY_UTIL and the need to clamp it then.
So removing did feel slightly more natural to me anyway.
So alternatively:
From: Christian Loehle <christian.loehle@arm.com>
Date: Tue, 5 Mar 2024 09:34:41 +0000
Subject: [PATCH] sched/fair: Remove stale FREQUENCY_UTIL mention
effective_cpu_util() flags were removed, so remove mentioning of the
flag.
commit 9c0b4bb7f6303 ("sched/cpufreq: Rework schedutil governor performance estimation")
reworked effective_cpu_util() removing enum cpu_util_type. Modify the
comment accordingly.
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/0e2833ee-0939-44e0-82a2-520a585a0153@arm.com
|
|
Change se->load.weight to se_weight(se) in the calculation for the
initial util_avg to avoid unnecessarily inflating the util_avg by 1024
times.
The reason is that se->load.weight has the unit/scale as the scaled-up
load, while cfs_rg->avg.load_avg has the unit/scale as the true task
weight (as mapped directly from the task's nice/priority value). With
CONFIG_32BIT, the scaled-up load is equal to the true task weight. With
CONFIG_64BIT, the scaled-up load is 1024 times the true task weight.
Thus, the current code may inflate the util_avg by 1024 times. The
follow-up capping will not allow the util_avg value to go wild. But the
calculation should have the correct logic.
Signed-off-by: Dawei Li <daweilics@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Link: https://lore.kernel.org/r/20240315015916.21545-1-daweilics@gmail.com
|
|
Add a clarification that domain levels are system-specific
and where to check for system details.
Signed-off-by: Vitalii Bursov <vitaly@bursov.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/42b177a2e897cdf880caf9c2025f5b609e820334.1714488502.git.vitaly@bursov.com
|
|
Knowing domain's level exactly can be useful when setting
relax_domain_level or cpuset.sched_relax_domain_level
Usage:
cat /debug/sched/domains/cpu0/domain1/level
to dump cpu0 domain1's level.
SDM macro is not used because sd->level is 'int' and
it would hide the type mismatch between 'int' and 'u32'.
Signed-off-by: Vitalii Bursov <vitaly@bursov.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/9489b6475f6dd6fbc67c617752d4216fa094da53.1714488502.git.vitaly@bursov.com
|
|
Change relax_domain_level checks so that it would be possible
to include or exclude all domains from newidle balancing.
This matches the behavior described in the documentation:
-1 no request. use system default or follow request of others.
0 no search.
1 search siblings (hyperthreads in a core).
"2" enables levels 0 and 1, level_max excludes the last (level_max)
level, and level_max+1 includes all levels.
Fixes: 1d3504fcf560 ("sched, cpuset: customize sched domains, core")
Signed-off-by: Vitalii Bursov <vitaly@bursov.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/bd6de28e80073c79466ec6401cdeae78f0d4423d.1714488502.git.vitaly@bursov.com
|
|
Commit in Fixes moved the optimize_nops() call inside apply_relocation()
and made it a second optimization pass after the relocations have been
done.
Since optimize_nops() works only on NOPs, that is fine and it'll simply
jump over instructions which are not NOPs.
However, it made that call with repl_len as the buffer length to
optimize.
However, it can happen that there are alternatives calls like this one:
alternative("mfence; lfence", "", ALT_NOT(X86_FEATURE_APIC_MSRS_FENCE));
where the replacement length is 0. And using repl_len is wrong because
apply_alternatives() expands the buffer size to the length of the source
insn that is being patched, by padding it with one-byte NOPs:
for (; insn_buff_sz < a->instrlen; insn_buff_sz++)
insn_buff[insn_buff_sz] = 0x90;
Long story short: pass the length of the original instruction(s) as the
length of the temporary buffer which to optimize.
Result:
SMP alternatives: feat: 11*32+27, old: (lapic_next_deadline+0x9/0x50 (ffffffff81061829) len: 6), repl: (ffffffff89b1cc60, len: 0) flags: 0x1
SMP alternatives: ffffffff81061829: old_insn: 0f ae f0 0f ae e8
SMP alternatives: ffffffff81061829: final_insn: 90 90 90 90 90 90
=>
SMP alternatives: feat: 11*32+27, old: (lapic_next_deadline+0x9/0x50 (ffffffff81061839) len: 6), repl: (ffffffff89b1cc60, len: 0) flags: 0x1
SMP alternatives: ffffffff81061839: [0:6) optimized NOPs: 66 0f 1f 44 00 00
SMP alternatives: ffffffff81061839: old_insn: 0f ae f0 0f ae e8
SMP alternatives: ffffffff81061839: final_insn: 66 0f 1f 44 00 00
Fixes: da8f9cf7e721 ("x86/alternatives: Get rid of __optimize_nops()")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240515104804.32004-1-bp@kernel.org
|
|
After enabling -Wimplicit-fallthrough for the x86 boot code, clang
warns:
arch/x86/boot/printf.c:257:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
257 | case 'u':
| ^
Clang is a little more pedantic than GCC, which does not warn when
falling through to a case that is just break or return. Clang's version
is more in line with the kernel's own stance in deprecated.rst, which
states that all switch/case blocks must end in either break,
fallthrough, continue, goto, or return. Add the missing break to silence
the warning.
Fixes: dd0716c2b877 ("x86/boot: Add a fallthrough annotation")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240516-x86-boot-fix-clang-implicit-fallthrough-v1-1-04dc320ca07c@kernel.org
Closes: https://lore.kernel.org/oe-kbuild-all/202405162054.ryP73vy1-lkp@intel.com/
|
|
trafgen performance considerably sank on hosts with many cores
after the blamed commit.
packet_read_pending() is very expensive, and calling it
in af_packet fast path defeats Daniel intent in commit
b013840810c2 ("packet: use percpu mmap tx frame pending refcount")
tpacket_destruct_skb() makes room for one packet, we can immediately
wakeup a producer, no need to completely drain the tx ring.
Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240515163358.4105915-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The rtnl_lock would stay locked if allocating promisc_allmulti failed.
Also changed the allocation to GFP_KERNEL.
Fixes: ff7c7d9f5261 ("virtio_net: Remove command data from control_buf")
Reported-by: Eric Dumazet <edumaset@google.com>
Link: https://lore.kernel.org/netdev/CANn89iLazVaUCvhPm6RPJJ0owra_oFnx7Fhc8d60gV-65ad3WQ@mail.gmail.com/
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20240515163125.569743-1-danielj@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot loves netrom, and found a possible deadlock in nr_rt_ioctl [1]
Make sure we always acquire nr_node_list_lock before nr_node_lock(nr_node)
[1]
WARNING: possible circular locking dependency detected
6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0 Not tainted
------------------------------------------------------
syz-executor350/5129 is trying to acquire lock:
ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_node_lock include/net/netrom.h:152 [inline]
ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:464 [inline]
ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697
but task is already holding lock:
ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline]
ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (nr_node_list_lock){+...}-{2:2}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
nr_remove_node net/netrom/nr_route.c:299 [inline]
nr_del_node+0x4b4/0x820 net/netrom/nr_route.c:355
nr_rt_ioctl+0xa95/0x1090 net/netrom/nr_route.c:683
sock_do_ioctl+0x158/0x460 net/socket.c:1222
sock_ioctl+0x629/0x8e0 net/socket.c:1341
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&nr_node->node_lock){+...}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
nr_node_lock include/net/netrom.h:152 [inline]
nr_dec_obs net/netrom/nr_route.c:464 [inline]
nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697
sock_do_ioctl+0x158/0x460 net/socket.c:1222
sock_ioctl+0x629/0x8e0 net/socket.c:1341
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(nr_node_list_lock);
lock(&nr_node->node_lock);
lock(nr_node_list_lock);
lock(&nr_node->node_lock);
*** DEADLOCK ***
1 lock held by syz-executor350/5129:
#0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
#0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline]
#0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697
stack backtrace:
CPU: 0 PID: 5129 Comm: syz-executor350 Not tainted 6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
nr_node_lock include/net/netrom.h:152 [inline]
nr_dec_obs net/netrom/nr_route.c:464 [inline]
nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697
sock_do_ioctl+0x158/0x460 net/socket.c:1222
sock_ioctl+0x629/0x8e0 net/socket.c:1341
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240515142934.3708038-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|