Age | Commit message (Collapse) | Author |
|
Similar to PSWPOUT, we should count the number of base pages instead of
large folios.
Link: https://lkml.kernel.org/r/20241023210201.2798-1-21cnbao@gmail.com
Fixes: 242d12c98174 ("mm: support large folios swap-in for sync io devices")
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
An anonymous large folio can be split into non order-0 folios,
try_to_map_unused_to_zeropage() should not VM_BUG_ON compound pages but
just return false. This fixes the crash when splitting anonymous large
folios to non order-0 folios.
Link: https://lkml.kernel.org/r/20241023171236.1122535-1-ziy@nvidia.com
Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Usama Arif <usamaarif642@gmail.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When we compile and load lib/slub_kunit.c,it will cause a panic.
The root cause is that __kmalloc_cache_noprof was directly called instead
of kmem_cache_alloc,which resulted in no alloc_tag being allocated.This
caused current->alloc_tag to be null,leading to a null pointer dereference
in alloc_tag_ref_set.
Despite the fact that my colleague Pei Xiao will later fix the code in
slub_kunit.c,we still need fix null pointer check logic for ref and tag to
avoid panic caused by a null pointer dereference.
Here is the log for the panic:
[ 74.779373][ T2158] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[ 74.780130][ T2158] Mem abort info:
[ 74.780406][ T2158] ESR = 0x0000000096000004
[ 74.780756][ T2158] EC = 0x25: DABT (current EL), IL = 32 bits
[ 74.781225][ T2158] SET = 0, FnV = 0
[ 74.781529][ T2158] EA = 0, S1PTW = 0
[ 74.781836][ T2158] FSC = 0x04: level 0 translation fault
[ 74.782288][ T2158] Data abort info:
[ 74.782577][ T2158] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 74.783068][ T2158] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 74.783533][ T2158] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 74.784010][ T2158] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000105f34000
[ 74.784586][ T2158] [0000000000000020] pgd=0000000000000000, p4d=0000000000000000
[ 74.785293][ T2158] Internal error: Oops: 0000000096000004 [#1] SMP
[ 74.785805][ T2158] Modules linked in: slub_kunit kunit ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle 4
[ 74.790661][ T2158] CPU: 0 UID: 0 PID: 2158 Comm: kunit_try_catch Kdump: loaded Tainted: G W N 6.12.0-rc3+ #2
[ 74.791535][ T2158] Tainted: [W]=WARN, [N]=TEST
[ 74.791889][ T2158] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 74.792479][ T2158] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 74.793101][ T2158] pc : alloc_tagging_slab_alloc_hook+0x120/0x270
[ 74.793607][ T2158] lr : alloc_tagging_slab_alloc_hook+0x120/0x270
[ 74.794095][ T2158] sp : ffff800084d33cd0
[ 74.794418][ T2158] x29: ffff800084d33cd0 x28: 0000000000000000 x27: 0000000000000000
[ 74.795095][ T2158] x26: 0000000000000000 x25: 0000000000000012 x24: ffff80007b30e314
[ 74.795822][ T2158] x23: ffff000390ff6f10 x22: 0000000000000000 x21: 0000000000000088
[ 74.796555][ T2158] x20: ffff000390285840 x19: fffffd7fc3ef7830 x18: ffffffffffffffff
[ 74.797283][ T2158] x17: ffff8000800e63b4 x16: ffff80007b33afc4 x15: ffff800081654c00
[ 74.798011][ T2158] x14: 0000000000000000 x13: 205d383531325420 x12: 5b5d383734363537
[ 74.798744][ T2158] x11: ffff800084d337e0 x10: 000000000000005d x9 : 00000000ffffffd0
[ 74.799476][ T2158] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008219d188 x6 : c0000000ffff7fff
[ 74.800206][ T2158] x5 : ffff0003fdbc9208 x4 : ffff800081edd188 x3 : 0000000000000001
[ 74.800932][ T2158] x2 : 0beaa6dee1ac5a00 x1 : 0beaa6dee1ac5a00 x0 : ffff80037c2cb000
[ 74.801656][ T2158] Call trace:
[ 74.801954][ T2158] alloc_tagging_slab_alloc_hook+0x120/0x270
[ 74.802494][ T2158] __kmalloc_cache_noprof+0x148/0x33c
[ 74.802976][ T2158] test_kmalloc_redzone_access+0x4c/0x104 [slub_kunit]
[ 74.803607][ T2158] kunit_try_run_case+0x70/0x17c [kunit]
[ 74.804124][ T2158] kunit_generic_run_threadfn_adapter+0x2c/0x4c [kunit]
[ 74.804768][ T2158] kthread+0x10c/0x118
[ 74.805141][ T2158] ret_from_fork+0x10/0x20
[ 74.805540][ T2158] Code: b9400a80 11000400 b9000a80 97ffd858 (f94012d3)
[ 74.806176][ T2158] SMP: stopping secondary CPUs
[ 74.808130][ T2158] Starting crashdump kernel...
Link: https://lkml.kernel.org/r/20241020070819.307944-1-hao.ge@linux.dev
Fixes: e0a955bf7f61 ("mm/codetag: add pgalloc_tag_copy()")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM) family
of functions, and requests "too many" pages, then the call will
erroneously leave pages pinned. This is visible in user space as an
actual memory leak.
Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls to
exhaust memory.
The root cause of the problem is this sequence, within
__gup_longterm_locked():
__get_user_pages_locked()
rc = check_and_migrate_movable_pages()
...which gets retried in a loop. The loop error handling is incomplete,
clearly due to a somewhat unusual and complicated tri-state error API.
But anyway, if -ENOMEM, or in fact, any unexpected error is returned from
check_and_migrate_movable_pages(), then __gup_longterm_locked() happily
returns the error, while leaving the pages pinned.
In the failed case, which is an app that requests (via a device driver)
30720000000 bytes to be pinned, and then exits, I see this:
$ grep foll /proc/vmstat
nr_foll_pin_acquired 7502048
nr_foll_pin_released 2048
And after applying this patch, it returns to balanced pins:
$ grep foll /proc/vmstat
nr_foll_pin_acquired 7502048
nr_foll_pin_released 7502048
Note that the child routine, check_and_migrate_movable_folios(), avoids
this problem, by unpinning any folios in the **folios argument, before
returning an error.
Fix this by making check_and_migrate_movable_pages() behave in exactly the
same way as check_and_migrate_movable_folios(): unpin all pages in
**pages, before returning an error.
Also, documentation was an aggravating factor, so:
1) Consolidate the documentation for these two routines, now that they
have identical external behavior.
2) Rewrite the consolidated documentation:
a) Clearly list the three return code cases, and what happens in
each case.
b) Mention that one of the cases unpins the pages or folios, before
returning an error code.
Link: https://lkml.kernel.org/r/20241018223411.310331-1-jhubbard@nvidia.com
Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Shigeru Yoshida <syoshida@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
linux-firmware commit 808cba84 ("mtk_wed: add firmware for mt7988
Wireless Ethernet Dispatcher") added mt7988_wo_{0,1}.bin in the
'mediatek/mt7988' directory while driver current expects the files in
the 'mediatek' directory.
Change path in the driver header now that the firmware has been added.
Fixes: e2f64db13aa1 ("net: ethernet: mtk_wed: introduce WED support for MT7988")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/Zxz0GWTR5X5LdWPe@pidgin.makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Petr Machata says:
====================
mlxsw: Fixes
In this patchset:
- Tx header should be pushed for each packet which is transmitted via
Spectrum ASICs. Patch #1 adds a missing call to skb_cow_head() to make
sure that there is both enough room to push the Tx header and that the
SKB header is not cloned and can be modified.
- Commit b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers
allocation") converted mlxsw to use page pool for Rx buffers allocation.
Sync for CPU and for device should be done for Rx pages. In patches #2
and #3, add the missing calls to sync pages for, respectively, CPU and
the device.
- Patch #4 then fixes a bug to IPv6 GRE forwarding offload. Patch #5 adds
a generic forwarding test that fails with mlxsw ports prior to the fix.
====================
Link: https://patch.msgid.link/cover.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Test that after changing the remote address of an ip6gre net device
traffic is forwarded as expected. Test with both flat and hierarchical
topologies and with and without an input / output keys.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/02b05246d2cdada0cf2fccffc0faa8a424d0f51b.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The device stores IPv6 addresses that are used for encapsulation in
linear memory that is managed by the driver.
Changing the remote address of an ip6gre net device never worked
properly, but since cited commit the following reproducer [1] would
result in a warning [2] and a memory leak [3]. The problem is that the
new remote address is never added by the driver to its hash table (and
therefore the device) and the old address is never removed from it.
Fix by programming the new address when the configuration of the ip6gre
net device changes and removing the old one. If the address did not
change, then the above would result in increasing the reference count of
the address and then decreasing it.
[1]
# ip link add name bla up type ip6gre local 2001:db8:1::1 remote 2001:db8:2::1 tos inherit ttl inherit
# ip link set dev bla type ip6gre remote 2001:db8:3::1
# ip link del dev bla
# devlink dev reload pci/0000:01:00.0
[2]
WARNING: CPU: 0 PID: 1682 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3002 mlxsw_sp_ipv6_addr_put+0x140/0x1d0
Modules linked in:
CPU: 0 UID: 0 PID: 1682 Comm: ip Not tainted 6.12.0-rc3-custom-g86b5b55bc835 #151
Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023
RIP: 0010:mlxsw_sp_ipv6_addr_put+0x140/0x1d0
[...]
Call Trace:
<TASK>
mlxsw_sp_router_netdevice_event+0x55f/0x1240
notifier_call_chain+0x5a/0xd0
call_netdevice_notifiers_info+0x39/0x90
unregister_netdevice_many_notify+0x63e/0x9d0
rtnl_dellink+0x16b/0x3a0
rtnetlink_rcv_msg+0x142/0x3f0
netlink_rcv_skb+0x50/0x100
netlink_unicast+0x242/0x390
netlink_sendmsg+0x1de/0x420
____sys_sendmsg+0x2bd/0x320
___sys_sendmsg+0x9a/0xe0
__sys_sendmsg+0x7a/0xd0
do_syscall_64+0x9e/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[3]
unreferenced object 0xffff898081f597a0 (size 32):
comm "ip", pid 1626, jiffies 4294719324
hex dump (first 32 bytes):
20 01 0d b8 00 02 00 00 00 00 00 00 00 00 00 01 ...............
21 49 61 83 80 89 ff ff 00 00 00 00 01 00 00 00 !Ia.............
backtrace (crc fd9be911):
[<00000000df89c55d>] __kmalloc_cache_noprof+0x1da/0x260
[<00000000ff2a1ddb>] mlxsw_sp_ipv6_addr_kvdl_index_get+0x281/0x340
[<000000009ddd445d>] mlxsw_sp_router_netdevice_event+0x47b/0x1240
[<00000000743e7757>] notifier_call_chain+0x5a/0xd0
[<000000007c7b9e13>] call_netdevice_notifiers_info+0x39/0x90
[<000000002509645d>] register_netdevice+0x5f7/0x7a0
[<00000000c2e7d2a9>] ip6gre_newlink_common.isra.0+0x65/0x130
[<0000000087cd6d8d>] ip6gre_newlink+0x72/0x120
[<000000004df7c7cc>] rtnl_newlink+0x471/0xa20
[<0000000057ed632a>] rtnetlink_rcv_msg+0x142/0x3f0
[<0000000032e0d5b5>] netlink_rcv_skb+0x50/0x100
[<00000000908bca63>] netlink_unicast+0x242/0x390
[<00000000cdbe1c87>] netlink_sendmsg+0x1de/0x420
[<0000000011db153e>] ____sys_sendmsg+0x2bd/0x320
[<000000003b6d53eb>] ___sys_sendmsg+0x9a/0xe0
[<00000000cae27c62>] __sys_sendmsg+0x7a/0xd0
Fixes: cf42911523e0 ("mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/e91012edc5a6cb9df37b78fd377f669381facfcb.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Non-coherent architectures, like ARM, may require invalidating caches
before the device can use the DMA mapped memory, which means that before
posting pages to device, drivers should sync the memory for device.
Sync for device can be configured as page pool responsibility. Set the
relevant flag and define max_len for sync.
Cc: Jiri Pirko <jiri@resnulli.us>
Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/92e01f05c4f506a4f0a9b39c10175dcc01994910.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When Rx packet is received, drivers should sync the pages for CPU, to
ensure the CPU reads the data written by the device and not stale
data from its cache.
Add the missing sync call in Rx path, sync the actual length of data for
each fragment.
Cc: Jiri Pirko <jiri@resnulli.us>
Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/461486fac91755ca4e04c2068c102250026dcd0b.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Tx header should be pushed for each packet which is transmitted via
Spectrum ASICs. The cited commit moved the call to skb_cow_head() from
mlxsw_sp_port_xmit() to functions which handle Tx header.
In case that mlxsw_sp->ptp_ops->txhdr_construct() is used to handle Tx
header, and txhdr_construct() is mlxsw_sp_ptp_txhdr_construct(), there is
no call for skb_cow_head() before pushing Tx header size to SKB. This flow
is relevant for Spectrum-1 and Spectrum-4, for PTP packets.
Add the missing call to skb_cow_head() to make sure that there is both
enough room to push the Tx header and that the SKB header is not cloned and
can be modified.
An additional set will be sent to net-next to centralize the handling of
the Tx header by pushing it to every packet just before transmission.
Cc: Richard Cochran <richardcochran@gmail.com>
Fixes: 24157bc69f45 ("mlxsw: Send PTP packets as data packets to overcome a limitation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/5145780b07ebbb5d3b3570f311254a3a2d554a44.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As documented in skbuff.h, devices with NETIF_F_IPV6_CSUM capability
can only checksum TCP and UDP over IPv6 if the IP header does not
contains extension.
This is enforced for UDP packets emitted from user-space to an IPv6
address as they go through ip6_make_skb(), which calls
__ip6_append_data() where a check is done on the header size before
setting CHECKSUM_PARTIAL.
But the introduction of UDP encapsulation with fou6 added a code-path
where it is possible to get an skb with a partial UDP checksum and an
IPv6 header with extension:
* fou6 adds a UDP header with a partial checksum if the inner packet
does not contains a valid checksum.
* ip6_tunnel adds an IPv6 header with a destination option extension
header if encap_limit is non-zero (the default value is 4).
The thread linked below describes in more details how to reproduce the
problem with GRE-in-UDP tunnel.
Add a check on the network header size in skb_csum_hwoffload_help() to
make sure no IPv6 packet with extension header is handed to a network
device with NETIF_F_IPV6_CSUM capability.
Link: https://lore.kernel.org/netdev/26548921.1r3eYUQgxm@benoit.monin/T/#u
Fixes: aa3463d65e7b ("fou: Add encap ops for IPv6 tunnels")
Signed-off-by: Benoît Monin <benoit.monin@gmx.fr>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/5fbeecfc311ea182aa1d1c771725ab8b4cac515e.1729778144.git.benoit.monin@gmx.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The llvm fix [1] did not make it for 19.0.0, but ended up getting
backported to llvm 19.1.3 [2]. Thus, fix the version requirement to
correctly specify which versions have the bug.
Link: https://github.com/llvm/llvm-project/pull/104826 [1]
Link: https://github.com/llvm/llvm-project/pull/113938 [2]
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202410281414.c351044e-oliver.sang@intel.com
Fixes: 8b8ca9c25fe6 ("cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS")
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20241030-cfi-icall-1913-v1-1-ab8a26e13733@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
There is currently no eyeq6 compatible string defined in the binding
documentation. Only eyeq6h version is defined, so let's use it.
Note that there are actually no codes relying on eyeq6h; the purpose
of this patch is mainly to be coherent with the documentation.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
For the bcm6358/6368 SoC the brcm,bmips-cbr-reg due to bootloader
misconfiguration or HW bug from running the system from TP1.
A workaround is now present to handle broken system that suffer from
this bug hence add the now required property.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409251520.pE12GzHd-lkp@intel.com/
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
While most MIPS64 CPUs use 32-bit values for their VP Local Reset
Exception Base registers, some I6500 CPUs can utilize a 64-bit value,
allowing addressing up to 47 bits of physical memory.
For the EyeQ6H CPU, where physical memory addresses exceed the 4GB
limit, utilizing this feature is mandatory to enable SMP support.
Unfortunately, there is no way to detect this capability based solely
on the ID of the CPU. According to Imagination, which designed the
CPU, the only reliable method is to fill the reset base field with
0xFF and then read back its value. If the upper part of the read-back
value is zero, it indicates that the address space is limited to 32
bits.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
When MIPS_FP_SUPPORT is disabled, __sanitize_fcr31() is defined as
nothing, which triggers a gcc warning:
In file included from kernel/sched/core.c:79:
kernel/sched/core.c: In function 'context_switch':
./arch/mips/include/asm/switch_to.h:114:39: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
114 | __sanitize_fcr31(next); \
| ^
kernel/sched/core.c:5316:9: note: in expansion of macro 'switch_to'
5316 | switch_to(prev, next, prev);
| ^~~~~~~~~
Fix this by providing an empty body for __sanitize_fcr31() like one is
defined for __mips_mt_fpaff_switch_to().
Fixes: 36a498035bd2 ("MIPS: Avoid FCSR sanitization when CONFIG_MIPS_FP_SUPPORT=n")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
Replace open-coded pieces with sysfs_emit() helper in sysfs .show()
callbacks.
Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com>
Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
When we remount filesystem with 'abort' mount option while changing
other mount options as well (as is LTP test doing), we can return error
from the system call after commit d3476f3dad4a ("ext4: don't set
SB_RDONLY after filesystem errors") because the application of mount
option changes detects shutdown filesystem and refuses to do anything.
The behavior of application of other mount options in presence of
'abort' mount option is currently rather arbitary as some mount option
changes are handled before 'abort' and some after it.
Move aborting of the filesystem to the end of remount handling so all
requested changes are properly applied before the filesystem is shutdown
to have a reasonably consistent behavior.
Fixes: d3476f3dad4a ("ext4: don't set SB_RDONLY after filesystem errors")
Reported-by: Jan Stancek <jstancek@redhat.com>
Link: https://lore.kernel.org/all/Zvp6L+oFnfASaoHl@t14s
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: Jan Stancek <jstancek@redhat.com>
Link: https://patch.msgid.link/20241004221556.19222-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
find_group_other() and find_group_orlov() read *_lo, *_hi with
ext4_free_inodes_count without additional locking. This can cause
data-race warning, but since the lock is held for most writes and free
inodes value is generally not a problem even if it is incorrect, it is
more appropriate to use READ_ONCE()/WRITE_ONCE() than to add locking.
==================================================================
BUG: KCSAN: data-race in ext4_free_inodes_count / ext4_free_inodes_set
write to 0xffff88810404300e of 2 bytes by task 6254 on cpu 1:
ext4_free_inodes_set+0x1f/0x80 fs/ext4/super.c:405
__ext4_new_inode+0x15ca/0x2200 fs/ext4/ialloc.c:1216
ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391
vfs_symlink+0xca/0x1d0 fs/namei.c:4615
do_symlinkat+0xe3/0x340 fs/namei.c:4641
__do_sys_symlinkat fs/namei.c:4657 [inline]
__se_sys_symlinkat fs/namei.c:4654 [inline]
__x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654
x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
read to 0xffff88810404300e of 2 bytes by task 6257 on cpu 0:
ext4_free_inodes_count+0x1c/0x80 fs/ext4/super.c:349
find_group_other fs/ext4/ialloc.c:594 [inline]
__ext4_new_inode+0x6ec/0x2200 fs/ext4/ialloc.c:1017
ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391
vfs_symlink+0xca/0x1d0 fs/namei.c:4615
do_symlinkat+0xe3/0x340 fs/namei.c:4641
__do_sys_symlinkat fs/namei.c:4657 [inline]
__se_sys_symlinkat fs/namei.c:4654 [inline]
__x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654
x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Cc: stable@vger.kernel.org
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://patch.msgid.link/20241003125337.47283-1-aha310510@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
An ext4_journal_stop(handle) call was immediately used after a return value
check for a ext4_orphan_add() call in this function implementation.
Thus call such a function only once instead directly before the check.
This issue was transformed by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/cf895072-43cf-412c-bced-8268498ad13e@web.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Avoid taking refcount on uprobe in prepare_uretprobe(), instead take
uretprobe-specific SRCU lock and keep it active as kernel transfers
control back to user space.
Given we can't rely on user space returning from traced function within
reasonable time period, we need to make sure not to keep SRCU lock
active for too long, though. To that effect, we employ a timer callback
which is meant to terminate SRCU lock region after predefined timeout
(currently set to 100ms), and instead transfer underlying struct
uprobe's lifetime protection to refcounting.
This fallback to less scalable refcounting after 100ms is a fine
tradeoff from uretprobe's scalability and performance perspective,
because uretprobing *long running* user functions inherently doesn't run
into scalability issues (there is just not enough frequency to cause
noticeable issues with either performance or scalability).
The overall trick is in ensuring synchronization between current thread
and timer's callback fired on some other thread. To cope with that with
minimal logic complications, we add hprobe wrapper which is used to
contain all the synchronization related issues behind a small number of
basic helpers: hprobe_expire() for "downgrading" uprobe from SRCU-protected
state to refcounted state, and a hprobe_consume() and hprobe_finalize()
pair of single-use consuming helpers. Other than that, whatever current
thread's logic is there stays the same, as timer thread cannot modify
return_instance state (or add new/remove old return_instances). It only
takes care of SRCU unlock and uprobe refcounting, which is hidden from
the higher-level uretprobe handling logic.
We use atomic xchg() in hprobe_consume(), which is called from
performance critical handle_uretprobe_chain() function run in the
current context. When uncontended, this xchg() doesn't seem to hurt
performance as there are no other competing CPUs fighting for the same
cache line. We also mark struct return_instance as ____cacheline_aligned
to ensure no false sharing can happen.
Another technical moment. We need to make sure that the list of return
instances can be safely traversed under RCU from timer callback, so we
delay return_instance freeing with kfree_rcu() and make sure that list
modifications use RCU-aware operations.
Also, given SRCU lock survives transition from kernel to user space and
back we need to use lower-level __srcu_read_lock() and
__srcu_read_unlock() to avoid lockdep complaining.
Just to give an impression of a kind of performance improvements this
change brings, below are benchmarking results with and without these
SRCU changes, assuming other uprobe optimizations (mainly RCU Tasks
Trace for entry uprobes, lockless RB-tree lookup, and lockless VMA to
uprobe lookup) are left intact:
WITHOUT SRCU for uretprobes
===========================
uretprobe-nop ( 1 cpus): 2.197 ± 0.002M/s ( 2.197M/s/cpu)
uretprobe-nop ( 2 cpus): 3.325 ± 0.001M/s ( 1.662M/s/cpu)
uretprobe-nop ( 3 cpus): 4.129 ± 0.002M/s ( 1.376M/s/cpu)
uretprobe-nop ( 4 cpus): 6.180 ± 0.003M/s ( 1.545M/s/cpu)
uretprobe-nop ( 8 cpus): 7.323 ± 0.005M/s ( 0.915M/s/cpu)
uretprobe-nop (16 cpus): 6.943 ± 0.005M/s ( 0.434M/s/cpu)
uretprobe-nop (32 cpus): 5.931 ± 0.014M/s ( 0.185M/s/cpu)
uretprobe-nop (64 cpus): 5.145 ± 0.003M/s ( 0.080M/s/cpu)
uretprobe-nop (80 cpus): 4.925 ± 0.005M/s ( 0.062M/s/cpu)
WITH SRCU for uretprobes
========================
uretprobe-nop ( 1 cpus): 1.968 ± 0.001M/s ( 1.968M/s/cpu)
uretprobe-nop ( 2 cpus): 3.739 ± 0.003M/s ( 1.869M/s/cpu)
uretprobe-nop ( 3 cpus): 5.616 ± 0.003M/s ( 1.872M/s/cpu)
uretprobe-nop ( 4 cpus): 7.286 ± 0.002M/s ( 1.822M/s/cpu)
uretprobe-nop ( 8 cpus): 13.657 ± 0.007M/s ( 1.707M/s/cpu)
uretprobe-nop (32 cpus): 45.305 ± 0.066M/s ( 1.416M/s/cpu)
uretprobe-nop (64 cpus): 42.390 ± 0.922M/s ( 0.662M/s/cpu)
uretprobe-nop (80 cpus): 47.554 ± 2.411M/s ( 0.594M/s/cpu)
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241024044159.3156646-3-andrii@kernel.org
|
|
Currently put_uprobe() might trigger mutex_lock()/mutex_unlock(), which
makes it unsuitable to be called from more restricted context like softirq.
Let's make put_uprobe() agnostic to the context in which it is called,
and use work queue to defer the mutex-protected clean up steps.
RB tree removal step is also moved into work-deferred callback to avoid
potential deadlock between softirq-based timer callback, added in the
next patch, and the rest of uprobe code.
We can rework locking altogher as a follow up, but that's significantly
more tricky, so warrants its own patch set. For now, we need to make
sure that changes in the next patch that add timer thread work correctly
with existing approach, while concentrating on SRCU + timeout logic.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241024044159.3156646-2-andrii@kernel.org
|
|
The rapl pmu is die scope, which is supported by the generic perf_event
subsystem now.
Set the scope for the rapl PMU and remove all the cpumask and hotplug
codes.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Link: https://lore.kernel.org/r/20241010142604.770192-2-kan.liang@linux.intel.com
|
|
There are extra codes in the CPU hotplug function to allocate rapl pmus.
The generic PMU hotplug support is hard to be applied.
As long as the rapl pmus can be allocated upfront for each die/socket,
the code doesn't need to be implemented in the CPU hotplug function.
Move the code to the init_rapl_pmus(), and allocate a PMU for each
possible die/socket.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Link: https://lore.kernel.org/r/20241010142604.770192-1-kan.liang@linux.intel.com
|
|
Commit 98442f0ccd82 ("sched: Fix delayed_dequeue vs
switched_from_fair()") forgot about scx :/
Fixes: 98442f0ccd82 ("sched: Fix delayed_dequeue vs switched_from_fair()")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lkml.kernel.org/r/20241030104934.GK14555@noisy.programming.kicks-ass.net
|
|
The barrier_nospec() in 64-bit copy_from_user() is slow. Instead use
pointer masking to force the user pointer to all 1's for an invalid
address.
The kernel test robot reports a 2.6% improvement in the per_thread_ops
benchmark [1].
This is a variation on a patch originally by Josh Poimboeuf [2].
Link: https://lore.kernel.org/202410281344.d02c72a2-oliver.sang@intel.com [1]
Link: https://lore.kernel.org/5b887fe4c580214900e21f6c61095adf9a142735.1730166635.git.jpoimboe@kernel.org [2]
Tested-and-reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Update more header copies with the kernel sources, including const.h,
msr-index.h, arm64's cputype.h, kvm's, bits.h and unaligned.h
- The return from 'write' isn't a pid, fix cut'n'paste error in 'perf
trace'
- Fix up the python binding build on architectures without
HAVE_KVM_STAT_SUPPORT
- Add some more bounds checks to augmented_raw_syscalls.bpf.c (used to
collect syscall pointer arguments in 'perf trace') to make the
resulting bytecode to pass the kernel BPF verifier, allowing us to go
back accepting clang 12.0.1 as the minimum version required for
compiling BPF sources
- Add __NR_capget for x86 to fix a regression on running perf + intel
PT (hw tracing) as non-root setting up the capabilities as described
in https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
- Fix missing syscalltbl in non-explicitly listed architectures,
noticed on ARM 32-bit, that still needs a .tbl generator for the
syscall id<->name tables, should be added for v6.13
- Handle 'perf test' failure when handling broken DWARF for ASM files
* tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf cap: Add __NR_capget to arch/x86 unistd
tools headers: Update the linux/unaligned.h copy with the kernel sources
tools headers arm64: Sync arm64's cputype.h with the kernel sources
tools headers: Synchronize {uapi/}linux/bits.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORT
perf test: Handle perftool-testsuite_probe failure due to broken DWARF
tools headers UAPI: Sync kvm headers with the kernel sources
perf trace: Fix non-listed archs in the syscalltbl routines
perf build: Change the clang check back to 12.0.1
perf trace augmented_raw_syscalls: Add more checks to pass the verifier
perf trace augmented_raw_syscalls: Add extra array index bounds checking to satisfy some BPF verifiers
perf trace: The return from 'write' isn't a pid
tools headers UAPI: Sync linux/const.h with the kernel headers
|
|
Dai pointed out that the xa_init_flags() in rpcrdma_add_one() needs
to have a matching xa_destroy() in rpcrdma_remove_one() to release
underlying memory that the xarray might have accrued during
operation.
Reported-by: Dai Ngo <dai.ngo@oracle.com>
Fixes: 7e86845a0346 ("rpcrdma: Implement generic device removal")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Hou Tao says:
====================
The patch set fixes several issues in bits iterator. Patch #1 fixes the
kmemleak problem of bits iterator. Patch #2~#3 fix the overflow problem
of nr_bits. Patch #4 fixes the potential stack corruption when bits
iterator is used on 32-bit host. Patch #5 adds more test cases for bits
iterator.
Please see the individual patches for more details. And comments are
always welcome.
---
v4:
* patch #1: add ack from Yafang
* patch #3: revert code-churn like changes:
(1) compute nr_bytes and nr_bits before the check of nr_words.
(2) use nr_bits == 64 to check for single u64, preventing build
warning on 32-bit hosts.
* patch #4: use "BITS_PER_LONG == 32" instead of "!defined(CONFIG_64BIT)"
v3: https://lore.kernel.org/bpf/20241025013233.804027-1-houtao@huaweicloud.com/T/#t
* split the bits-iterator related patches from "Misc fixes for bpf"
patch set
* patch #1: use "!nr_bits || bits >= nr_bits" to stop the iteration
* patch #2: add a new helper for the overflow problem
* patch #3: decrease the limitation from 512 to 511 and check whether
nr_bytes is too large for bpf memory allocator explicitly
* patch #5: add two more test cases for bit iterator
v2: http://lore.kernel.org/bpf/d49fa2f4-f743-c763-7579-c3cab4dd88cb@huaweicloud.com
====================
Link: https://lore.kernel.org/r/20241030100516.3633640-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add more test cases for bits iterator:
(1) huge word test
Verify the multiplication overflow of nr_bits in bits_iter. Without
the overflow check, when nr_words is 67108865, nr_bits becomes 64,
causing bpf_probe_read_kernel_common() to corrupt the stack.
(2) max word test
Verify correct handling of maximum nr_words value (511).
(3) bad word test
Verify early termination of bits iteration when bits iterator
initialization fails.
Also rename bits_nomem to bits_too_big to better reflect its purpose.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
On 32-bit hosts (e.g., arm32), when a bpf program passes a u64 to
bpf_iter_bits_new(), bpf_iter_bits_new() will use bits_copy to store the
content of the u64. However, bits_copy is only 4 bytes, leading to stack
corruption.
The straightforward solution would be to replace u64 with unsigned long
in bpf_iter_bits_new(). However, this introduces confusion and problems
for 32-bit hosts because the size of ulong in bpf program is 8 bytes,
but it is treated as 4-bytes after passed to bpf_iter_bits_new().
Fix it by changing the type of both bits and bit_count from unsigned
long to u64. However, the change is not enough. The main reason is that
bpf_iter_bits_next() uses find_next_bit() to find the next bit and the
pointer passed to find_next_bit() is an unsigned long pointer instead
of a u64 pointer. For 32-bit little-endian host, it is fine but it is
not the case for 32-bit big-endian host. Because under 32-bit big-endian
host, the first iterated unsigned long will be the bits 32-63 of the u64
instead of the expected bits 0-31. Therefore, in addition to changing
the type, swap the two unsigned longs within the u64 for 32-bit
big-endian host.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Check the validity of nr_words in bpf_iter_bits_new(). Without this
check, when multiplication overflow occurs for nr_bits (e.g., when
nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur
due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008).
Fix it by limiting the maximum value of nr_words to 511. The value is
derived from the current implementation of BPF memory allocator. To
ensure compatibility if the BPF memory allocator's size limitation
changes in the future, use the helper bpf_mem_alloc_check_size() to
check whether nr_bytes is too larger. And return -E2BIG instead of
-ENOMEM for oversized nr_bytes.
Fixes: 4665415975b0 ("bpf: Add bits iterator")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introduce bpf_mem_alloc_check_size() to check whether the allocation
size exceeds the limitation for the kmalloc-equivalent allocator. The
upper limit for percpu allocation is LLIST_NODE_SZ bytes larger than
non-percpu allocation, so a percpu argument is added to the helper.
The helper will be used in the following patch to check whether the size
parameter passed to bpf_mem_alloc() is too big.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf_iter_bits_destroy() uses "kit->nr_bits <= 64" to check whether the
bits are dynamically allocated. However, the check is incorrect and may
cause a kmemleak as shown below:
unreferenced object 0xffff88812628c8c0 (size 32):
comm "swapper/0", pid 1, jiffies 4294727320
hex dump (first 32 bytes):
b0 c1 55 f5 81 88 ff ff f0 f0 f0 f0 f0 f0 f0 f0 ..U...........
f0 f0 f0 f0 f0 f0 f0 f0 00 00 00 00 00 00 00 00 ..............
backtrace (crc 781e32cc):
[<00000000c452b4ab>] kmemleak_alloc+0x4b/0x80
[<0000000004e09f80>] __kmalloc_node_noprof+0x480/0x5c0
[<00000000597124d6>] __alloc.isra.0+0x89/0xb0
[<000000004ebfffcd>] alloc_bulk+0x2af/0x720
[<00000000d9c10145>] prefill_mem_cache+0x7f/0xb0
[<00000000ff9738ff>] bpf_mem_alloc_init+0x3e2/0x610
[<000000008b616eac>] bpf_global_ma_init+0x19/0x30
[<00000000fc473efc>] do_one_initcall+0xd3/0x3c0
[<00000000ec81498c>] kernel_init_freeable+0x66a/0x940
[<00000000b119f72f>] kernel_init+0x20/0x160
[<00000000f11ac9a7>] ret_from_fork+0x3c/0x70
[<0000000004671da4>] ret_from_fork_asm+0x1a/0x30
That is because nr_bits will be set as zero in bpf_iter_bits_next()
after all bits have been iterated.
Fix the issue by setting kit->bit to kit->nr_bits instead of setting
kit->nr_bits to zero when the iteration completes in
bpf_iter_bits_next(). In addition, use "!nr_bits || bits >= nr_bits" to
check whether the iteration is complete and still use "nr_bits > 64" to
indicate whether bits are dynamically allocated. The "!nr_bits" check is
necessary because bpf_iter_bits_new() may fail before setting
kit->nr_bits, and this condition will stop the iteration early instead
of accessing the zeroed or freed kit->bits.
Considering the initial value of kit->bits is -1 and the type of
kit->nr_bits is unsigned int, change the type of kit->nr_bits to int.
The potential overflow problem will be handled in the following patch.
Fixes: 4665415975b0 ("bpf: Add bits iterator")
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Fix __hci_cmd_sync_sk() to return not NULL for unknown opcodes.
__hci_cmd_sync_sk() returns NULL if a command returns a status event.
However, it also returns NULL where an opcode doesn't exist in the
hci_cc table because hci_cmd_complete_evt() assumes status = skb->data[0]
for unknown opcodes.
This leads to null-ptr-deref in cmd_sync for HCI_OP_READ_LOCAL_CODECS as
there is no hci_cc for HCI_OP_READ_LOCAL_CODECS, which always assumes
status = skb->data[0].
KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
CPU: 1 PID: 2000 Comm: kworker/u9:5 Not tainted 6.9.0-ga6bcb805883c-dirty #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: hci7 hci_power_on
RIP: 0010:hci_read_supported_codecs+0xb9/0x870 net/bluetooth/hci_codec.c:138
Code: 08 48 89 ef e8 b8 c1 8f fd 48 8b 75 00 e9 96 00 00 00 49 89 c6 48 ba 00 00 00 00 00 fc ff df 4c 8d 60 70 4c 89 e3 48 c1 eb 03 <0f> b6 04 13 84 c0 0f 85 82 06 00 00 41 83 3c 24 02 77 0a e8 bf 78
RSP: 0018:ffff888120bafac8 EFLAGS: 00010212
RAX: 0000000000000000 RBX: 000000000000000e RCX: ffff8881173f0040
RDX: dffffc0000000000 RSI: ffffffffa58496c0 RDI: ffff88810b9ad1e4
RBP: ffff88810b9ac000 R08: ffffffffa77882a7 R09: 1ffffffff4ef1054
R10: dffffc0000000000 R11: fffffbfff4ef1055 R12: 0000000000000070
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88810b9ac000
FS: 0000000000000000(0000) GS:ffff8881f6c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6ddaa3439e CR3: 0000000139764003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
hci_read_local_codecs_sync net/bluetooth/hci_sync.c:4546 [inline]
hci_init_stage_sync net/bluetooth/hci_sync.c:3441 [inline]
hci_init4_sync net/bluetooth/hci_sync.c:4706 [inline]
hci_init_sync net/bluetooth/hci_sync.c:4742 [inline]
hci_dev_init_sync net/bluetooth/hci_sync.c:4912 [inline]
hci_dev_open_sync+0x19a9/0x2d30 net/bluetooth/hci_sync.c:4994
hci_dev_do_open net/bluetooth/hci_core.c:483 [inline]
hci_power_on+0x11e/0x560 net/bluetooth/hci_core.c:1015
process_one_work kernel/workqueue.c:3267 [inline]
process_scheduled_works+0x8ef/0x14f0 kernel/workqueue.c:3348
worker_thread+0x91f/0xe50 kernel/workqueue.c:3429
kthread+0x2cb/0x360 kernel/kthread.c:388
ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Fixes: abfeea476c68 ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY")
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes, both in drivers (ufs and scsi_debug)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix another deadlock during RTC update
scsi: scsi_debug: Fix do_device_access() handling of unexpected SG copy length
|
|
The error flow in nfsd4_copy() calls cleanup_async_copy(), which
already decrements nn->pending_async_copies.
Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
When printing a dynamic array in a trace event, the method is rather ugly.
It has the format of:
__print_array(__get_dynamic_array(array),
__get_dynmaic_array_len(array) / el_size, el_size)
Since dynamic arrays are known to the tracing infrastructure, create a
helper macro that does the above for you.
__print_dynamic_array(array, el_size)
Which would expand to the same output.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-3-avadhut.naik@amd.com
|
|
Currently, exporting new additional machine check error information
involves adding new fields for the same at the end of the struct mce.
This additional information can then be consumed through mcelog or
tracepoint.
However, as new MSRs are being added (and will be added in the future)
by CPU vendors on their newer CPUs with additional machine check error
information to be exported, the size of struct mce will balloon on some
CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
different CPU vendors may export the additional information in varying
sizes.
The problem particularly intensifies since struct mce is exposed to
userspace as part of UAPI. It's bloating through vendor-specific data
should be avoided to limit the information being sent out to userspace.
Add a new structure mce_hw_err to wrap the existing struct mce. The same
will prevent its ballooning since vendor-specifc data, if any, can now be
exported through a union within the wrapper structure and through
__dynamic_array in mce_record tracepoint.
Furthermore, new internal kernel fields can be added to the wrapper
struct without impacting the user space API.
[ bp: Restore reverse x-mas tree order of function vars declarations. ]
Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-2-avadhut.naik@amd.com
|
|
The domain attributes returned by the perf protocol can end up reporting
identical names across domains, resulting in debugfs node creation failure.
Use the GENPD_FLAG_DEV_NAME_FW to ensure the genpd providers end up with an
unique name.
Logs: [X1E reports 'NCC' for all its scmi perf domains]
debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
Reported-by: Johan Hovold <johan+linaro@kernel.org>
Closes: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
Cc: stable@vger.kernel.org
Message-ID: <20241030125512.2884761-6-quic_sibis@quicinc.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Introduce GENPD_FLAG_DEV_NAME_FW flag which instructs genpd to generate
an unique device name using ida. It is aimed to be used by genpd providers
which derive their names directly from FW making them susceptible to
debugfs node creation failures.
Reported-by: Johan Hovold <johan+linaro@kernel.org>
Closes: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
Fixes: 718072ceb211 ("PM: domains: create debugfs nodes when adding power domains")
Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
Cc: stable@vger.kernel.org
Message-ID: <20241030125512.2884761-5-quic_sibis@quicinc.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
If we don't do that, the group is considered usable by userspace, but
all further GROUP_SUBMIT will fail with -EINVAL.
Changes in v3:
- Add R-bs
Changes in v2:
- New patch
Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029152912.270346-3-boris.brezillon@collabora.com
|
|
Userspace can use GROUP_SUBMIT errors as a trigger to check the group
state and recreate the group if it became unusable. Make sure we
report an error when the group became unusable.
Changes in v3:
- None
Changes in v2:
- Add R-bs
Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029152912.270346-2-boris.brezillon@collabora.com
|
|
The system and GPU MMU page size might differ, which becomes a
problem for FW sections that need to be mapped at explicit addresses
since our PAGE_SIZE alignment might cover a VA range that's
expected to be used for another section.
Make sure we never map more than we need.
Changes in v3:
- Add R-bs
Changes in v2:
- Plan for per-VM page sizes so the MCU VM and user VM can
have different pages sizes
Fixes: 2718d91816ee ("drm/panthor: Add the FW logical block")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241030150231.768949-1-boris.brezillon@collabora.com
|
|
This was previously fixed with commit 1147dd0503564fa0e0348
("nvme: fix error-handling for io_uring nvme-passthrough"), but the
change was mistakenly undone in a later commit.
Fixes: d6aacee9255e7f ("nvme: use bio_integrity_map_user")
Cc: stable@vger.kernel.org
Reported-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
ctrl->dh_key might be used across multiple calls to nvmet_setup_dhgroup()
for the same controller. So it's better to nullify it after release on
error path in order to avoid double free later in nvmet_destroy_auth().
Found by Linux Verification Center (linuxtesting.org) with Svace.
Fixes: 7a277c37d352 ("nvmet-auth: Diffie-Hellman key exchange support")
Cc: stable@vger.kernel.org
Signed-off-by: Vitaliy Shevtsov <v.shevtsov@maxima.ru>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
A recent commit enables integrity checks for formats the previous kernel
versions registered with the "nop" integrity profile. This means
namespaces using that format become unreadable when upgrading the kernel
past that commit.
Introduce a module parameter to restore the "nop" integrity profile so
that storage can be readable once again. This could be a boot device, so
the setting needs to happen at module load time.
Fixes: 921e81db524d17 ("nvme: allow integrity when PI is not in first bytes")
Reported-by: David Wei <dw@davidwei.uk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The seed is only used for kernel generation and verification. That
doesn't happen for user buffers, so passing the seed around doesn't
accomplish anything.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20241016201309.1090320-1-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Quirk is needed to enable headset microphone on missing pin 0x19.
Signed-off-by: Christoffer Sandberg <cs@tuxedo.de>
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20241029151653.80726-2-wse@tuxedocomputers.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|