summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-24powerpc/mm: Convert to using lock_mm_and_find_vma()Michael Ellerman
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-24arm64/mm: Convert to using lock_mm_and_find_vma()Linus Torvalds
This converts arm64 to use the new page fault helper. It was very straightforward, but still needed a fix for the "obvious" conversion I initially did. Thanks to Suren for the fix and testing. Fixed-and-tested-by: Suren Baghdasaryan <surenb@google.com> Unnecessary-code-removal-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-24mm: make the page fault mmap locking killableLinus Torvalds
This is done as a separate patch from introducing the new lock_mm_and_find_vma() helper, because while it's an obvious change, it's not what x86 used to do in this area. We already abort the page fault on fatal signals anyway, so why should we wait for the mmap lock only to then abort later? With the new helper function that returns without the lock held on failure anyway, this is particularly easy and straightforward. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-24mm: introduce new 'lock_mm_and_find_vma()' page fault helperLinus Torvalds
.. and make x86 use it. This basically extracts the existing x86 "find and expand faulting vma" code, but extends it to also take the mmap lock for writing in case we actually do need to expand the vma. We've historically short-circuited that case, and have some rather ugly special logic to serialize the stack segment expansion (since we only hold the mmap lock for reading) that doesn't match the normal VM locking. That slight violation of locking worked well, right up until it didn't: the maple tree code really does want proper locking even for simple extension of an existing vma. So extract the code for "look up the vma of the fault" from x86, fix it up to do the necessary write locking, and make it available as a helper function for other architectures that can use the common helper. Note: I say "common helper", but it really only handles the normal stack-grows-down case. Which is all architectures except for PA-RISC and IA64. So some rare architectures can't use the helper, but if they care they'll just need to open-code this logic. It's also worth pointing out that this code really would like to have an optimistic "mmap_upgrade_trylock()" to make it quicker to go from a read-lock (for the common case) to taking the write lock (for having to extend the vma) in the normal single-threaded situation where there is no other locking activity. But that _is_ all the very uncommon special case, so while it would be nice to have such an operation, it probably doesn't matter in reality. I did put in the skeleton code for such a possible future expansion, even if it only acts as pseudo-documentation for what we're doing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-24fbdev: fix potential OOB read in fast_imageblit()Zhang Shurong
There is a potential OOB read at fast_imageblit, for "colortab[(*src >> 4)]" can become a negative value due to "const char *s = image->data, *src". This change makes sure the index for colortab always positive or zero. Similar commit: https://patchwork.kernel.org/patch/11746067 Potential bug report: https://groups.google.com/g/syzkaller-bugs/c/9ubBXKeKXf4/m/k-QXy4UgAAAJ Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de>
2023-06-24hwmon: (corsair-psu) update Series 2022 and 2023 supportWilken Gottwalt
The series 2022/2023 reports slightly longer vendor/product strings and shares USB ids. Technically the reply size is the USB HID packet size (64 bytes) but all the supported commands do not use more than 8 bytes and replies reporting back strings do not use more then 24 bytes (vendor and product are in one string in the newer devices now). The rest of the reply is always filled with '\0'. Also update comments and documentation accordingly. Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net> Link: https://lore.kernel.org/r/ZJbB72CAPmLflhHG@monster.localdomain Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-06-24xtensa: dump userspace code around the exception PCMax Filippov
In the absence of other debug facilities dumping user code around the unhandled exception address may help debugging the issue. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2023-06-24spi: dt-bindings: atmel,at91rm9200-spi: fix broken sam9x7 compatibleKrzysztof Kozlowski
Commit a3eb95484f27 ("spi: dt-bindings: atmel,at91rm9200-spi: add sam9x7 compatible") adding sam9x7 compatible did not make any sense as it added new compatible into middle of existing compatible list. The intention was probably to add new set of compatibles with sam9x7 as first one. Fixes: a3eb95484f27 ("spi: dt-bindings: atmel,at91rm9200-spi: add sam9x7 compatible") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/Message-Id: <20230624082054.37697-1-krzysztof.kozlowski@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2023-06-24MAINTAINERS: adjust entry in VIA UNICHROME(PRO)/CHROME9 FRAMEBUFFER DRIVERLukas Bulwahn
Commit d4313a68ec91 ("fbdev/media: Use GPIO descriptors for VIA GPIO") moves via-gpio.h from include/linux to drivers/video/fbdev/via, but misses to adjust the file entry for the VIA UNICHROME(PRO)/CHROME9 FRAMEBUFFER DRIVER section. Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a broken reference. Remove the file entry in VIA UNICHROME(PRO)/CHROME9 FRAMEBUFFER DRIVER, as the new location of the header is already covered by the file entry drivers/video/fbdev/via/. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Fixes: d4313a68ec91 ("fbdev/media: Use GPIO descriptors for VIA GPIO") Signed-off-by: Helge Deller <deller@gmx.de>
2023-06-23Merge branch 'mlxsw-maintain-candidate-rifs'Jakub Kicinski
Petr Machata says: ==================== mlxsw: Maintain candidate RIFs The mlxsw driver currently makes the assumption that the user applies configuration in a bottom-up manner. Thus netdevices need to be added to the bridge before IP addresses are configured on that bridge or SVI added on top of it. Enslaving a netdevice to another netdevice that already has uppers is in fact forbidden by mlxsw for this reason. Despite this safety, it is rather easy to get into situations where the offloaded configuration is just plain wrong. As an example, take a front panel port, configure an IP address: it gets a RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the port from the bridge again, but the RIF never comes back. There is a number of similar situations, where changing the configuration there and back utterly breaks the offload. The situation is going to be made better by implementing a range of replays and post-hoc offloads. This patch set lays the ground for replay of next hops. The particular issue that it deals with is that currently, driver-specific bookkeeping for next hops is hooked off RIF objects, which come and go across the lifetime of a netdevice. We would rather keep these objects at an entity that mirrors the lifetime of the netdevice itself. That way they are at hand and can be offloaded when a RIF is eventually created. To that end, with this patchset, mlxsw keeps a hash table of CRIFs: candidate RIFs, persistent handles for netdevices that mlxsw deems potentially interesting. The lifetime of a CRIF matches that of the underlying netdevice, and thus a RIF can always assume a CRIF exists. A CRIF is where next hops are kept, and when RIF is created, these next hops can be easily offloaded. (Previously only the next hops created after the RIF was created were offloaded.) - Patches #1 and #2 are minor adjustments. - In patches #3 and #4, add CRIF bookkeeping. - In patch #5, link CRIFs to RIFs such that given a netdevice-backed RIF, the corresponding CRIF is easy to look up. - Patch #6 is a clean-up allowed by the previous patches - Patches #7 and #8 move next hop tracking to CRIFs No observable effects are intended as of yet. This will be useful once there is support for RIF creation for netdevices that become mlxsw uppers, which will come in following patch sets. ==================== Link: https://lore.kernel.org/r/cover.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Track next hops at CRIFsPetr Machata
Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The reason is that eventually, next hops for mlxsw uppers should be offloaded and unoffloaded on demand as a netdevice becomes an upper, or stops being one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a netdevice is not an mlxsw uppers. CRIFs are kept track of throughout the netdevice lifetime. Correspondingly, track at each next hop not its RIF, but its CRIF (from which a RIF can always be deduced). Note that now that next hops are tracked at a CRIF, it is not necessary to move each over to a new RIF when it is necessary to edit a RIF. Therefore drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy() call mlxsw_sp_nexthop_rif_update() directly. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/e7c1c0a7dd13883b0f09aeda12c4fcf4d63a70e3.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Split nexthop finalization to two stagesPetr Machata
Nexthop finalization consists of two steps: the part where the offload is removed, because the backing RIF is now gone; and the part where the association to the RIF is severed. Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later be called independently. Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini() vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a historical happenstance than a conscious decision. The two cleanups do not depend on each other, and this change should have no observable effects. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7134559534c5f5c4807c3a1569fae56f8887e763.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Use router.lb_crif instead of .lb_rif_indexPetr Machata
A previous patch added a pointer to loopback CRIF to the router data structure. That makes the loopback RIF index redundant, as everything necessary can be derived from the CRIF. Drop the field and adjust the code accordingly. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/8637bf959bc5b6c9d5184b9bd8a0cd53c5132835.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Link CRIFs to RIFsPetr Machata
When a RIF is about to be created, the registration of the netdevice that it should be associated with must have been seen in the past, and a CRIF created. Therefore make this a hard requirement by looking up the CRIF during RIF creation, and complaining loudly when there isn't one. This then allows to keep a link between a RIF and its corresponding CRIF (and back, as the relationship is one-to-at-most-one), which do. The CRIF will later be useful as the objects tracked there will be offloaded lazily as a result of RIF creation. CRIFs are created when an "interesting" netdevice is registered, and destroyed after such device is unregistered. CRIFs are supposed to already exist when a RIF creation request arises, and exist at least as long as that RIF exists. This makes for a simple invariant: it is always safe to dereference CRIF pointer from "its" RIF. To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER event is delivered. The reason is that if a RIF's netdevices has an IPv6 address, removal of this address is notified in an atomic block. To remove the RIF, the IPv6 removal handler schedules a work item. It must be safe for this work item to access the associated CRIF as well. Thus when a netdevice that backs the CRIF is removed, if it still has a RIF, do not actually free the CRIF, only toggle its can_destroy flag, which this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/68c8e33afa6b8c03c431b435e1685ffdff752e63.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIFPetr Machata
CRIFs are generally not maintained for loopback RIFs. However, the RIF for the default VRF is used for offloading of blackhole nexthops. Nexthops expect to have a valid CRIF. Therefore in this patch, add code to maintain CRIF for the loopback RIF as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7f2b2fcc98770167ed1254a904c3f7f585ba43f0.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Maintain a hash table of CRIFsPetr Machata
CRIFs are objects that mlxsw maintains for netdevices that may not have an associated RIF (i.e. they may not have been instantiated in the ASIC), but if indeed they do not, it is quite possible they will in the future. These netdevices are candidate RIFs, hence CRIFs. Netdevices for which CRIFs are created include e.g. bridges, LAGs, or front panel ports. The idea is that next hops would be kept at CRIFs, not RIFs, and thus it would be easier to offload and unoffload the entities that have been added before the RIF was created. In this patch, add the code for low-level CRIF maintenance: create and destroy, and keep in a table keyed by the netdevice pointer for easy recall. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/186d44e399c475159da20689f2c540719f2d1ed0.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Use mlxsw_sp_ul_rif_get() to get main VRF LB RIFPetr Machata
The current function, mlxsw_sp_router_ul_rif_get(), is a wrapper around the function mentioned in the subject. As such it forms an external interface of the router code. In future patches we will want to maintain connection between RIFs and the CRIFs (introduced in the next patch) that back them. That will not hold for the VRF-based loopback netdevices, so the whole CRIF business can be kept hidden from the rest of mlxsw. But for the main VRF loopback RIF we do want to keep the RIF-CRIF connection, because that RIF is used for blackhole next hops, and the next hop code can be kept simpler for assuming rif->crif is valid. Hence, instead, call mlxsw_sp_ul_rif_get() to create the main VRF loopback RIF. This being an internal function will take the CRIF argument anyway. Furthermore, the function does not lock, which is not necessary at this point in code yet. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7a39a011a02a84164cd7f5da7985ec5b2ae01ba5.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Add extack argument to mlxsw_sp_lb_rif_init()Petr Machata
The extack will be handy in later patches. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/e87ba300121010d580b80a281877573a7b1377ca.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23bonding: do not assume skb mac_header is setEric Dumazet
Drivers must not assume in their ndo_start_xmit() that skbs have their mac_header set. skb->data is all what is needed. bonding seems to be one of the last offender as caught by syzbot: WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 skb_mac_offset include/linux/skbuff.h:2913 [inline] WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_xmit_hash drivers/net/bonding/bond_main.c:4170 [inline] WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_xmit_3ad_xor_slave_get drivers/net/bonding/bond_main.c:5149 [inline] WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_3ad_xor_xmit drivers/net/bonding/bond_main.c:5186 [inline] WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 __bond_start_xmit drivers/net/bonding/bond_main.c:5442 [inline] WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_start_xmit+0x14ab/0x19d0 drivers/net/bonding/bond_main.c:5470 Modules linked in: CPU: 1 PID: 12155 Comm: syz-executor.3 Not tainted 6.1.30-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 RIP: 0010:skb_mac_header include/linux/skbuff.h:2907 [inline] RIP: 0010:skb_mac_offset include/linux/skbuff.h:2913 [inline] RIP: 0010:bond_xmit_hash drivers/net/bonding/bond_main.c:4170 [inline] RIP: 0010:bond_xmit_3ad_xor_slave_get drivers/net/bonding/bond_main.c:5149 [inline] RIP: 0010:bond_3ad_xor_xmit drivers/net/bonding/bond_main.c:5186 [inline] RIP: 0010:__bond_start_xmit drivers/net/bonding/bond_main.c:5442 [inline] RIP: 0010:bond_start_xmit+0x14ab/0x19d0 drivers/net/bonding/bond_main.c:5470 Code: 8b 7c 24 30 e8 76 dd 1a 01 48 85 c0 74 0d 48 89 c3 e8 29 67 2e fe e9 15 ef ff ff e8 1f 67 2e fe e9 10 ef ff ff e8 15 67 2e fe <0f> 0b e9 45 f8 ff ff e8 09 67 2e fe e9 dc fa ff ff e8 ff 66 2e fe RSP: 0018:ffffc90002fff6e0 EFLAGS: 00010283 RAX: ffffffff835874db RBX: 000000000000ffff RCX: 0000000000040000 RDX: ffffc90004dcf000 RSI: 00000000000000b5 RDI: 00000000000000b6 RBP: ffffc90002fff8b8 R08: ffffffff83586d16 R09: ffffffff83586584 R10: 0000000000000007 R11: ffff8881599fc780 R12: ffff88811b6a7b7e R13: 1ffff110236d4f6f R14: ffff88811b6a7ac0 R15: 1ffff110236d4f76 FS: 00007f2e9eb47700(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2e421000 CR3: 000000010e6d4000 CR4: 00000000003526e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> [<ffffffff8471a49f>] netdev_start_xmit include/linux/netdevice.h:4925 [inline] [<ffffffff8471a49f>] __dev_direct_xmit+0x4ef/0x850 net/core/dev.c:4380 [<ffffffff851d845b>] dev_direct_xmit include/linux/netdevice.h:3043 [inline] [<ffffffff851d845b>] packet_direct_xmit+0x18b/0x300 net/packet/af_packet.c:284 [<ffffffff851c7472>] packet_snd net/packet/af_packet.c:3112 [inline] [<ffffffff851c7472>] packet_sendmsg+0x4a22/0x64d0 net/packet/af_packet.c:3143 [<ffffffff8467a4b2>] sock_sendmsg_nosec net/socket.c:716 [inline] [<ffffffff8467a4b2>] sock_sendmsg net/socket.c:736 [inline] [<ffffffff8467a4b2>] __sys_sendto+0x472/0x5f0 net/socket.c:2139 [<ffffffff8467a715>] __do_sys_sendto net/socket.c:2151 [inline] [<ffffffff8467a715>] __se_sys_sendto net/socket.c:2147 [inline] [<ffffffff8467a715>] __x64_sys_sendto+0xe5/0x100 net/socket.c:2147 [<ffffffff8553071f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff8553071f>] do_syscall_64+0x2f/0x50 arch/x86/entry/common.c:80 [<ffffffff85600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 7b8fc0103bb5 ("bonding: add a vlan+srcmac tx hashing option") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jarod Wilson <jarod@redhat.com> Cc: Moshe Tal <moshet@nvidia.com> Cc: Jussi Maki <joamaki@gmail.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230622152304.2137482-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23net: bcmgenet: Ensure MDIO unregistration has clocks enabledFlorian Fainelli
With support for Ethernet PHY LEDs having been added, while unregistering a MDIO bus and its child device liks PHYs there may be "late" accesses to the MDIO bus. One typical use case is setting the PHY LEDs brightness to OFF for instance. We need to ensure that the MDIO bus controller remains entirely functional since it runs off the main GENET adapter clock. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20230617155500.4005881-1-andrew@lunn.ch/ Fixes: 9a4e79697009 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230622103107.1760280-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-24Add Renesas PMIC RAA215300 and built-in RTCMark Brown
Merge series from Biju Das <biju.das.jz@bp.renesas.com>: This patch series aims to add support for Renesas PMIC RAA215300 and built-in RTC found on this PMIC device. The details of PMIC can be found here[1]. Renesas PMIC RAA215300 exposes two separate i2c devices, one for the main device and another for rtc device.
2023-06-23kernel/time/posix-stubs.c: remove duplicated includeYang Li
./kernel/time/posix-stubs.c: linux/syscalls.h is included more than once. Link: https://lkml.kernel.org/r/20230608082312.123939-1-yang.lee@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5463 Fixes: c1956519cd7e ("syscalls: add sys_ni_posix_timers prototype") Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23ocfs2: remove redundant assignment to variable bit_offColin Ian King
Variable bit_off is being assigned a value that is never read, it is being re-assigned a new value in the following while loop. Remove the assignment. Cleans up clang scan build warning: fs/ocfs2/localalloc.c:976:18: warning: Although the value stored to 'bit_off' is used in the enclosing expression, the value is never actually read from 'bit_off' [deadcode.DeadStores] Link: https://lkml.kernel.org/r/20230622102736.2831126-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDYLukas Bulwahn
Commit a5fcc2367e22 ("watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific") accidentially introduces a typo in one of the config dependencies of HARDLOCKUP_DETECTOR_PREFER_BUDDY. Fix this accidental typo. Link: https://lkml.kernel.org/r/20230623040717.8645-1-lukas.bulwahn@gmail.com Fixes: a5fcc2367e22 ("watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Douglas Anderson <dianders@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.hDouglas Anderson
The powerpc architecture was the only one that defined arch_trigger_cpumask_backtrace() in asm/nmi.h instead of asm/irq.h. Move it to be consistent. This fixes compile time errors introduced by commit 7ca8fe94aa92 ("watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH"). That commit caused <asm/nmi.h> to stop being included if the hardlockup detector wasn't enabled. The specific errors were: error: implicit declaration of function `nmi_cpu_backtrace' error: implicit declaration of function `nmi_trigger_cpumask_backtrace' NOTE: when moving this into irq.h, we also change the guards from just checking if "CONFIG_NMI_IPI" is defined to also checking if "CONFIG_PPC_BOOK3S_64" is defined. This matches the code in arch/powerpc/kernel/stacktrace.c. Previously this worked because <asm.nmi.h> was included if "CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH" was defined. For powerpc that's only selected if "CONFIG_PPC_BOOK3S_64" is defined. [dianders@chromium.org: change the guards to include CONFIG_PPC_BOOK3S_64] Link: https://lkml.kernel.org/r/20230622202816.v2.1.Ice67126857506712559078e7de26d32d26e64631@changeid Link: https://lkml.kernel.org/r/20230621164809.1.Ice67126857506712559078e7de26d32d26e64631@changeid Fixes: 7ca8fe94aa92 ("watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Closes: https://lore.kernel.org/r/871qi5otdh.fsf@mail.lhotse Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Douglas Anderson <dianders@chromium.org> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Tom Rix <trix@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23devres: show which resource was invalid in __devm_ioremap_resource()Ben Dooks
The other error prints in this call show the resource which wsan't valid, so add this to the first print when it checks for basic validity of the resource. Link: https://lkml.kernel.org/r/20230621163050.477668-1-ben.dooks@codethink.co.uk Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm/hugetlb: remove hugetlb_set_page_subpool()Sidhartha Kumar
All users have been converted to hugetlb_set_folio_subpool() so we can safely remove this function. Link: https://lkml.kernel.org/r/20230623054948.280627-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Tarun Sahu <tsahu@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: nommu: correct the range of mmap_sem_read_lock in task_mem()lipeifeng
During the seq_printf,the mmap_sem_read_lock protection is not required. Link: https://lkml.kernel.org/r/20230622040152.1173-1-lipeifeng@oppo.com Signed-off-by: lipeifeng <lipeifeng@oppo.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23hugetlb: revert use of page_cache_next_miss()Mike Kravetz
Ackerley Tng reported an issue with hugetlbfs fallocate as noted in the Closes tag. The issue showed up after the conversion of hugetlb page cache lookup code to use page_cache_next_miss. User visible effects are: - hugetlbfs fallocate incorrectly returns -EEXIST if pages are presnet in the file. - hugetlb pages will not be included in core dumps if they need to be brought in via GUP. - userfaultfd UFFDIO_COPY will not notice pages already present in the cache. It may try to allocate a new page and potentially return ENOMEM as opposed to EEXIST. Revert the use page_cache_next_miss() in hugetlb code. IMPORTANT NOTE FOR STABLE BACKPORTS: This patch will apply cleanly to v6.3. However, due to the change of filemap_get_folio() return values, it will not function correctly. This patch must be modified for stable backports. [dan.carpenter@linaro.org: fix hugetlbfs_pagecache_present()] Link: https://lkml.kernel.org/r/efa86091-6a2c-4064-8f55-9b44e1313015@moroto.mountain Link: https://lkml.kernel.org/r/20230621212403.174710-2-mike.kravetz@oracle.com Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reported-by: Ackerley Tng <ackerleytng@google.com> Closes: https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Erdem Aktas <erdemaktas@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Vishal Annapurve <vannapurve@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23Revert "page cache: fix page_cache_next/prev_miss off by one"Mike Kravetz
This reverts commit 9425c591e06a9ab27a145ba655fb50532cf0bcc9 The reverted commit fixed up routines primarily used by readahead code such that they could also be used by hugetlb. Unfortunately, this caused a performance regression as pointed out by the Closes: tag. The hugetlb code which uses page_cache_next_miss will be addressed in a subsequent patch. Link: https://lkml.kernel.org/r/20230621212403.174710-1-mike.kravetz@oracle.com Fixes: 9425c591e06a ("page cache: fix page_cache_next/prev_miss off by one") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202306211346.1e9ff03e-oliver.sang@intel.com Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Ackerley Tng <ackerleytng@google.com> Cc: Erdem Aktas <erdemaktas@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Vishal Annapurve <vannapurve@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm/vmscan: fix root proactive reclaim unthrottling unbalanced nodeYosry Ahmed
When memory.reclaim was introduced, it became the first case where cgroup_reclaim() is true for the root cgroup. Johannes concluded [1] that for most cases this is okay, except for one case. Historically, kswapd would throttle reclaim on a node if a lot of pages marked for reclaim are under writeback (aka the node is congested). This occurred by setting LRUVEC_CONGESTED bit in lruvec->flags. The bit would be cleared when the node is balanced. Similarly, cgroup reclaim would set the same bit when an lruvec is congested, and clear it on the way out of reclaim (to throttle local reclaimers). Before the introduction of memory.reclaim, the root memcg was the only target of kswapd reclaim, and non-root memcgs were the only targets of cgroup reclaim, so they would never interfere. Using the same bit for both was fine. After memory.reclaim, it is possible for cgroup reclaim on the root cgroup to clear the bit set by kswapd. This would result in reclaim on the node to be unthrottled before the node is balanced. Fix this by introducing separate bits for cgroup-level and node-level congestion. kswapd can unthrottle an lruvec that is marked as congested by cgroup reclaim (as the entire node should no longer be congested), but not vice versa (to prevent premature unthrottling before the entire node is balanced). [1]https://lore.kernel.org/lkml/20230405200150.GA35884@cmpxchg.org/ Link: https://lkml.kernel.org/r/20230621023101.432780-1-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reported-by: Johannes Weiner <hannes@cmpxchg.org> Closes: https://lore.kernel.org/lkml/20230405200150.GA35884@cmpxchg.org/ Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: memcg: rename and document global_reclaim()Yosry Ahmed
Evidently, global_reclaim() can be a confusing name. Especially that it used to exist before with a subtly different definition (removed by commit b5ead35e7e1d ("mm: vmscan: naming fixes: global_reclaim() and sane_reclaim()"). It can be interpreted as non-cgroup reclaim, even though it returns true for cgroup reclaim on the root memcg (through memory.reclaim). Rename it to root_reclaim() in an attempt to make it less ambiguous, and add documentation to it as well as cgroup_reclaim. Link: https://lkml.kernel.org/r/20230621023053.432374-1-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reported-by: Johannes Weiner <hannes@cmpxchg.org> Closes: https://lore.kernel.org/lkml/20230405200150.GA35884@cmpxchg.org/ Acked-by: Yu Zhao <yuzhao@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: kill [add|del]_page_to_lru_list()Kefeng Wang
Now no one call [add|del]_page_to_lru_list(), let's drop unused page interfaces. Link:https://lkml.kernel.org/r/20230619110718.65679-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: James Gowans <jgowans@amazon.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: compaction: convert to use a folio in isolate_migratepages_block()Kefeng Wang
Directly use a folio instead of page_folio() when page successfully isolated (hugepage and movable page) and after folio_get_nontail_page(), which removes several calls to compound_head(). Link: https://lkml.kernel.org/r/20230619110718.65679-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: James Gowans <jgowans@amazon.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: zswap: fix double invalidate with exclusive loadsYosry Ahmed
If exclusive loads are enabled for zswap, we invalidate the entry before returning from zswap_frontswap_load(), after dropping the local reference. However, the tree lock is dropped during decompression after the local reference is acquired, so the entry could be invalidated before we drop the local ref. If this happens, the entry is freed once we drop the local ref, and zswap_invalidate_entry() tries to invalidate an already freed entry. Fix this by: (a) Making sure zswap_invalidate_entry() is always called with a local ref held, to avoid being called on a freed entry. (b) Making sure zswap_invalidate_entry() only drops the ref if the entry was actually on the rbtree. Otherwise, another invalidation could have already happened, and the initial ref is already dropped. With these changes, there is no need to check that there is no need to make sure the entry still exists in the tree in zswap_reclaim_entry() before invalidating it, as zswap_reclaim_entry() will make this check internally. Link: https://lkml.kernel.org/r/20230621093009.637544-1-yosryahmed@google.com Fixes: b9c91c43412f ("mm: zswap: support exclusive loads") Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: remove unnecessary pagevec includesMatthew Wilcox (Oracle)
These files no longer need pagevec.h, mostly due to function declarations being moved out of it. Link: https://lkml.kernel.org/r/20230621164557.3510324-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: remove references to pagevecMatthew Wilcox (Oracle)
Most of these should just refer to the LRU cache rather than the data structure used to implement the LRU cache. Link: https://lkml.kernel.org/r/20230621164557.3510324-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: rename invalidate_mapping_pagevec to mapping_try_invalidateMatthew Wilcox (Oracle)
We don't use pagevecs for the LRU cache any more, and we don't know that the failed invalidations were due to the folio being in an LRU cache. So rename it to be more accurate. Link: https://lkml.kernel.org/r/20230621164557.3510324-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: remove struct pagevecMatthew Wilcox (Oracle)
All users are now converted to use the folio_batch so we can get rid of this data structure. Link: https://lkml.kernel.org/r/20230621164557.3510324-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23net: convert sunrpc from pagevec to folio_batchMatthew Wilcox (Oracle)
Remove the last usage of pagevecs. There is a slight change here; we now free the folio_batch as soon as it fills up instead of freeing the folio_batch when we try to add a page to a full batch. This should have no effect in practice. Link: https://lkml.kernel.org/r/20230621164557.3510324-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23i915: convert i915_gpu_error to use a folio_batchMatthew Wilcox (Oracle)
Remove one of the last remaining users of pagevec. Link: https://lkml.kernel.org/r/20230621164557.3510324-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23pagevec: rename fbatch_count()Matthew Wilcox (Oracle)
This should always have been called folio_batch_count(). Link: https://lkml.kernel.org/r/20230621164557.3510324-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: remove check_move_unevictable_pages()Matthew Wilcox (Oracle)
All callers have now been converted to call check_move_unevictable_folios(). Link: https://lkml.kernel.org/r/20230621164557.3510324-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23drm: convert drm_gem_put_pages() to use a folio_batchMatthew Wilcox (Oracle)
Remove a few hidden compound_head() calls by converting the returned page to a folio once and using the folio APIs. Link: https://lkml.kernel.org/r/20230621164557.3510324-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23i915: convert shmem_sg_free_table() to use a folio_batchMatthew Wilcox (Oracle)
Remove a few hidden compound_head() calls by converting the returned page to a folio once and using the folio APIs. We also only increment the refcount on the folio once instead of once for each page. Ideally, we would have a for_each_sgt_folio macro, but until then this will do. Link: https://lkml.kernel.org/r/20230621164557.3510324-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23scatterlist: add sg_set_folio()Matthew Wilcox (Oracle)
This wrapper for sg_set_page() lets drivers add folios to a scatterlist more easily. We could, perhaps, do better by using a different page in the folio if offset is larger than UINT_MAX, but let's hope we get a better data structure than this before we need to care about such large folios. Link: https://lkml.kernel.org/r/20230621164557.3510324-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: add __folio_batch_release()Matthew Wilcox (Oracle)
This performs the same role as __pagevec_release(), ie skipping the check for batch length of 0. Link: https://lkml.kernel.org/r/20230621164557.3510324-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23afs: convert pagevec to folio_batch in afs_extend_writeback()Matthew Wilcox (Oracle)
Patch series "Remove pagevecs". Removes a folio->page->folio conversion for each folio that's involved. More importantly, removes one of the last few uses of a pagevec. Link: https://lkml.kernel.org/r/20230621164557.3510324-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230621164557.3510324-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: page_alloc: use the correct type of list for free pagesBaolin Wang
Commit bf75f200569d ("mm/page_alloc: add page->buddy_list and page->pcp_list") introduces page->buddy_list and page->pcp_list as a union with page->lru, but missed to change get_page_from_free_area() to use page->buddy_list to clarify the correct type of list for a free page. Link: https://lkml.kernel.org/r/7e7ab533247d40c0ea0373c18a6a48e5667f9e10.1687333557.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23mm: backing-dev: make bdi_class a static const structureIvan Orlov
Now that the driver core allows for struct class to be in read-only memory, move the bdi_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at load time. Link: https://lkml.kernel.org/r/20230620183314.682822-2-gregkh@linuxfoundation.org Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>