summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-09-26sunhme: Use (net)dev_foo wherever possibleSean Anderson
Wherever possible, use the associated netdev (or device) when printing errors or other messages. This makes it immediately clear what device caused the error, and provides more information than just the device name. Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26sunhme: Convert printk(KERN_FOO ...) to pr_foo(...)Sean Anderson
This is a mostly-mechanical translation of the existing printks into pr_foos. In several places, I have pasted messages which were broken over several lines to allow for easier grepping. Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26sunhme: Clean up debug infrastructureSean Anderson
Remove all the single-use debug conditionals, and just collect the debug defines at the top of the file. HMD seems like it is used for general debug info, so just redefine it as pr_debug. Additionally, instead of using the default loglevel, use the debug loglevel for debugging. Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26sunhme: Convert FOO((...)) to FOO(...)Sean Anderson
With the power of variadic macros, double parentheses are unnecessary. Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26sunhme: switch to devresRolf Eike Beer
This not only removes a lot of code, it also fixes the memleak of the DMA memory when register_netdev() fails. Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> [ rebased onto net-next/master; fixed error reporting ] Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26sunhme: Regularize probe errorsSean Anderson
This fixes several error paths to ensure they return an appropriate error (instead of ENODEV). Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26sunhme: Return an ERR_PTR from quattro_pci_findSean Anderson
In order to differentiate between a missing bridge and an OOM condition, return ERR_PTRs from quattro_pci_find. This also does some general linting in the area. Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26sunhme: forward the error code from pci_enable_device()Rolf Eike Beer
This already returns a proper error value, so pass it to the caller. Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26sunhme: Remove versionSean Anderson
Module versions are not very useful: > The basic problem is, the version string does not identify the sources > with enough accuracy. It says nothing about back ported fixes in > stable kernels. It tells you nothing about vendor patches to the > network core, etc. https://lore.kernel.org/all/Yf6mtvA1zO7cdzr7@lunn.ch/ While we're at it, inline the author and use the driver name a bit more. Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26sunhme: remove unused tx_dump_ring()Rolf Eike Beer
I can't find a reference to it in the entire git history. Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26Merge branch 'net-dsa-remove-unnecessary-i2c_set_clientdata'Jakub Kicinski
Yang Yingliang says: ==================== net: dsa: remove unnecessary i2c_set_clientdata() This patchset https://lore.kernel.org/all/20220921140524.3831101-8-yangyingliang@huawei.com/T/ removed all set_drvdata(NULL) in driver remove function. i2c_set_clientdata() is another wrapper of set drvdata function, to follow the same convention, remove i2c_set_clientdata() called in driver remove function in drivers/net/dsa/. ==================== Link: https://lore.kernel.org/r/20220923143742.87093-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: dsa: xrs700x: remove unnecessary i2c_set_clientdata()Yang Yingliang
Remove unnecessary i2c_set_clientdata() in ->remove(), the driver_data will be set to NULL in device_unbind_cleanup() after calling ->remove(). Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: dsa: microchip: ksz9477: remove unnecessary i2c_set_clientdata()Yang Yingliang
Remove unnecessary i2c_set_clientdata() in ->remove(), the driver_data will be set to NULL in device_unbind_cleanup() after calling ->remove(). Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: dsa: lan9303: remove unnecessary i2c_set_clientdata()Yang Yingliang
Remove unnecessary i2c_set_clientdata() in ->remove(), the driver_data will be set to NULL in device_unbind_cleanup() after calling ->remove(). Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-27libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205Niklas Cassel
Commit 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile") added an explicit entry for AMD Green Sardine AHCI controller using the board_ahci_mobile configuration (this configuration has later been renamed to board_ahci_low_power). The board_ahci_low_power configuration enables support for low power modes. This explicit entry takes precedence over the generic AHCI controller entry, which does not enable support for low power modes. Therefore, when commit 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile") was backported to stable kernels, it make some Pioneer optical drives, which was working perfectly fine before the commit was backported, stop working. The real problem is that the Pioneer optical drives do not handle low power modes correctly. If these optical drives would have been tested on another AHCI controller using the board_ahci_low_power configuration, this issue would have been detected earlier. Unfortunately, the board_ahci_low_power configuration is only used in less than 15% of the total AHCI controller entries, so many devices have never been tested with an AHCI controller with low power modes. Fixes: 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile") Cc: stable@vger.kernel.org Reported-by: Jaap Berkhout <j.j.berkhout@staalenberk.nl> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-09-26Merge tag 'x86_urgent_for_v6.0-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Dave Hansen: - A performance fix for recent large AMD systems that avoids an ancient cpu idle hardware workaround - A new Intel model number. Folks like these upstream as soon as possible so that each developer doing feature development doesn't need to carry their own #define - SGX fixes for a userspace crash and a rare kernel warning * tag 'x86_urgent_for_v6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ACPI: processor idle: Practically limit "Dummy wait" workaround to old Intel systems x86/sgx: Handle VA page allocation failure for EAUG on PF. x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxd x86/cpu: Add CPU model numbers for Meteor Lake
2022-09-26ARM: dts: integrator: Fix DMA rangesLinus Walleij
A recent change affecting the behaviour of phys_to_dma() to actually require the device tree ranges to work unmasked a bug in the Integrator DMA ranges. The PL110 uses the CMA allocator to obtain coherent allocations from a dedicated 1MB video memory, leading to the following call chain: drm_gem_cma_create() dma_alloc_attrs() dma_alloc_from_dev_coherent() __dma_alloc_from_coherent() dma_get_device_base() phys_to_dma() translate_phys_to_dma() phys_to_dma() by way of translate_phys_to_dma() will nowadays not provide 1:1 mappings unless the ranges are properly defined in the device tree and reflected into the dev->dma_range_map. There is a bug in the device trees because the DMA ranges are incorrectly specified, and the patch uncovers this bug. Solution: - Fix the LB (logic bus) ranges to be 1-to-1 like they should have always been. - Provide a 1:1 dma-ranges attribute to the PL110. - Mark the PL110 display controller as DMA coherent. This makes the DMA ranges work right and makes the PL110 framebuffer work again. Fixes: af6f23b88e95 ("ARM/dma-mapping: use the generic versions of dma_to_phys/phys_to_dma by default") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220926073311.1610568-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-09-26xdp: Adjust xdp_frame layout to avoid using bitfieldsJesper Dangaard Brouer
Practical experience (and advice from Alexei) tell us that bitfields in structs lead to un-optimized assembly code. I've verified this change does lead to better x86_64 assembly, both via objdump and playing with code snippets in godbolt.org. Using scripts/bloat-o-meter shows the code size is reduced with 24 bytes for xdp_convert_buff_to_frame() that gets inlined e.g. in i40e_xmit_xdp_tx_ring() which were used for microbenchmarking. Microbenchmarking results do show improvements, but very small and varying between 0.5 to 2 nanosec improvement per packet. The member @metasize is changed from u8 to u32. Future users of this area could split this into two u16 fields. I've also benchmarked with two u16 fields showing equal performance gains and code size reduction. The moved member @frame_sz doesn't change sizeof struct due to existing padding. Like xdp_buff member @frame_sz is placed next to @flags, which allows compiler to optimize assignment of these. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/166393728005.2213882.4162674859542409548.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26Merge tag 'mm-hotfixes-stable-2022-09-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull last (?) hotfixes from Andrew Morton: "26 hotfixes. 8 are for issues which were introduced during this -rc cycle, 18 are for earlier issues, and are cc:stable" * tag 'mm-hotfixes-stable-2022-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits) x86/uaccess: avoid check_object_size() in copy_from_user_nmi() mm/page_isolation: fix isolate_single_pageblock() isolation behavior mm,hwpoison: check mm when killing accessing process mm/hugetlb: correct demote page offset logic mm: prevent page_frag_alloc() from corrupting the memory mm: bring back update_mmu_cache() to finish_fault() frontswap: don't call ->init if no ops are registered mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all() mm: fix madivse_pageout mishandling on non-LRU page powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush mm: gup: fix the fast GUP race against THP collapse mm: fix dereferencing possible ERR_PTR vmscan: check folio_test_private(), not folio_get_private() mm: fix VM_BUG_ON in __delete_from_swap_cache() tools: fix compilation after gfp_types.h split mm/damon/dbgfs: fix memory leak when using debugfs_lookup() mm/migrate_device.c: copy pte dirty bit to page mm/migrate_device.c: add missing flush_cache_page() mm/migrate_device.c: flush TLB while holding PTL x86/mm: disable instrumentations of mm/pgprot.c ...
2022-09-26net: hippi: Add missing pci_disable_device() in rr_init_one()ruanjinjie
Add missing pci_disable_device() if rr_init_one() fails Signed-off-by: ruanjinjie <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20220923094320.3109154-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26Merge branch 'improve-tsn_lib-selftests-for-future-distributed-tasks'Jakub Kicinski
Vladimir Oltean says: ==================== Improve tsn_lib selftests for future distributed tasks Some of the boards I am working with are limited in the number of ports that they offer, and as more TSN related selftests are added, it is important to be able to distribute the work among multiple boards. A large part of implementing that is ensuring network-wide synchronization, but also permitting more streams of data to flow through the network. There is the more important aspect of also coordinating the timing characteristics of those streams, and that is also something that is tackled, although not in this modest patch set. The goal here is not to introduce new selftests yet, but just to lay a better foundation for them. These patches are a part of the cleanup work I've done while working on selftests for frame preemption. They are regression-tested with psfp.sh. ==================== Link: https://lore.kernel.org/r/20220923210016.3406301-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26selftests: net: tsn_lib: run phc2sys in automatic modeVladimir Oltean
We can make the phc2sys helper not only synchronize a PHC to CLOCK_REALTIME, which is what it currently does, but also CLOCK_REALTIME to a PHC, which is going to be needed in distributed TSN tests. Instead of making the complexity of the arguments passed to phc2sys_start() explode, we can let it figure out the sync direction automatically, based on ptp4l's port states. Towards that goal, pass just the path to the desired ptp4l instance's UNIX domain socket, and remove the $if_name argument (from which it derives the PHC). Also adapt the one caller from the ocelot psfp.sh test. In the case of psfp.sh, phc2sys_start is able to properly figure out that CLOCK_REALTIME is the source clock and swp1's PHC is the destination, because of the way in which ptp4l_start for the UDS_ADDRESS_SWP1 was called: with slave_only=false, so it will always win the BMCA and always become the sync master between itself and $h1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26selftests: net: tsn_lib: allow multiple isochron receiversVladimir Oltean
Move the PID variable for the isochron receiver into a separate namespace per stats port, to allow multiple receivers (and/or orchestration daemons) to be instantiated by the same script. Preserve the existing behavior by making isochron_do() use the default stats TCP port of 5000. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26selftests: net: tsn_lib: allow running ptp4l on multiple interfacesVladimir Oltean
Switch ports will want to act as Boundary Clocks, which are configured using ptp4l by specifying the "-i" argument multiple times. Since we track a log file and a pid file for each ptp4l instance, and we want to be compatible with the existing single-port callers of ptp4l_start and ptp4l_stop, pass the interface list as a single string of space-separated values. Based on this, we create a label for each ptp4l instance, where the spaces are replaced with underscores (ptp4l_start "eth0 eth1" generates "ptp4l_pid_eth0_eth1"). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26selftests: net: tsn_lib: don't overwrite isochron receiver extra args with UDSVladimir Oltean
The extra_args argument ($3) of isochron_recv_start is overwritten with uds ($2), if that argument exists. This is currently not a problem, because the only TSN selftest (ocelot/psfp.sh) omits remote sync so it does not specify to the receiver a UNIX domain socket for ptp4l. So $uds is currently an empty string. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probePeng Wu
The devm_ioremap() function returns NULL on error, it doesn't return error pointers. Fixes: 3a1a274e933f ("mlxbf_gige: compute MDIO period based on i1clk") Signed-off-by: Peng Wu <wupeng58@huawei.com> Link: https://lore.kernel.org/r/20220923023640.116057-1-wupeng58@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26cxgb4: fix missing unlock on ETHOFLD desc collect fail pathRafael Mendonca
The label passed to the QDESC_GET for the ETHOFLD TXQ, RXQ, and FLQ, is the 'out' one, which skips the 'out_unlock' label, and thus doesn't unlock the 'uld_mutex' before returning. Additionally, since commit 5148e5950c67 ("cxgb4: add EOTID tracking and software context dump"), the access to these ETHOFLD hardware queues should be protected by the 'mqprio_mutex' instead. Fixes: 2d0cb84dd973 ("cxgb4: add ETHOFLD hardware queue support") Fixes: 5148e5950c67 ("cxgb4: add EOTID tracking and software context dump") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Reviewed-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Link: https://lore.kernel.org/r/20220922175109.764898-1-rafaelmendsr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26Merge tag 'ext4_for_linus_fixes2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull missed ext4 fix from Ted Ts'o: "Fix an potential unitialzied variable bug; this was a fixup that I had forgotten to apply before the last pull request for ext4. My bad" * tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fixup possible uninitialized variable access in ext4_mb_choose_next_group_cr1()
2022-09-26net: ethernet: adin1110: Add missing MODULE_DEVICE_TABLEYang Yingliang
This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this driver when it is built as an external module. Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220922070438.586692-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: vertexcom: mse102x: Silence no spi_device_id warningsWei Yongjun
SPI devices use the spi_device_id for module autoloading even on systems using device tree, after commit 5fa6863ba692 ("spi: Check we have a spi_device_id for each DT compatible"), kernel warns as follows since the spi_device_id is missing: SPI driver mse102x has no spi_device_id for vertexcom,mse1021 SPI driver mse102x has no spi_device_id for vertexcom,mse1022 Add spi_device_id entries to silence the warnings, and ensure driver module autoloading works. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Link: https://lore.kernel.org/r/20220922065717.1448498-1-weiyongjun@huaweicloud.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: ethernet: adi: Fix return value check in adin1110_probe_netdevs()Wei Yongjun
In case of error, the function get_phy_device() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Link: https://lore.kernel.org/r/20220922021023.811581-1-weiyongjun@huaweicloud.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26Merge branch ↵Jakub Kicinski
'net-dsa-microchip-ksz9477-enable-interrupt-for-internal-phy-link-detection' Arun Ramadoss says: ==================== net: dsa: microchip: ksz9477: enable interrupt for internal phy link detection This patch series implements the common interrupt handling for ksz9477 based switches and lan937x. The ksz9477 and lan937x has similar interrupt registers except ksz9477 has 4 port based interrupts whereas lan937x has 6 interrupts. The patch moves the phy interrupt hanler implemented in lan937x_main.c to ksz_common.c, along with the mdio_register functionality. ==================== Link: https://lore.kernel.org/r/20220922071028.18012-1-arun.ramadoss@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: phy: micrel: enable interrupt for ksz9477 phyArun Ramadoss
Config_intr and handle_interrupt are enabled for ksz9477 phy. It is similar to all other phys in the micrel phys. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: dsa: microchip: use common irq routines for girq and pirqArun Ramadoss
The global port interrupt routines and individual ports interrupt routines has similar implementation except the mask & status register and number of nested irqs in them. The mask & status register and pointer to ksz_device is added to ksz_irq and uses the ksz_irq as irq_chip_data. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: dsa: microchip: move interrupt handling logic from lan937x to ksz_commonArun Ramadoss
To support the phy link detection through interrupt method for ksz9477 based switch, the interrupt handling routines are moved from lan937x_main.c to ksz_common.c. The only changes made are functions names are prefixed with ksz_ instead of lan937x_. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: dsa: microchip: lan937x: return zero if mdio node not presentArun Ramadoss
Currently, if the mdio node is not present in the dts file then lan937x_mdio_register return -ENODEV and entire probing process fails. To make the mdio_register generic for all ksz series switches and to maintain back-compatibility with existing dts file, return -ENODEV is replaced with return 0. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: dsa: microchip: enable phy interrupts only if interrupt enabled in dtsArun Ramadoss
In the lan937x_mdio_register function, phy interrupts are enabled irrespective of irq is enabled in the switch. Now, the check is added to enable the phy interrupt only if the irq is enabled in the switch. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: dsa: microchip: determine number of port irq based on switch typeArun Ramadoss
Currently the number of port irqs is hard coded for the lan937x switch as 6. In order to make the generic interrupt handler for ksz switches, number of port irq supported by the switch is added to the ksz_chip_data. It is 4 for ksz9477, 2 for ksz9897 and 3 for ksz9567. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net: sched: act_ct: fix possible refcount leak in tcf_ct_init()Hangyu Hua
nf_ct_put need to be called to put the refcount got by tcf_ct_fill_params to avoid possible refcount leak when tcf_ct_flow_table_get fails. Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20220923020046.8021-1-hbh25y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net/sched: taprio: simplify list iteration in taprio_dev_notifier()Vladimir Oltean
taprio_dev_notifier() subscribes to netdev state changes in order to determine whether interfaces which have a taprio root qdisc have changed their link speed, so the internal calculations can be adapted properly. The 'qdev' temporary variable serves no purpose, because we just use it only once, and can just as well use qdisc_dev(q->root) directly (or the "dev" that comes from the netdev notifier; this is because qdev is only interesting if it was the subject of the state change, _and_ its root qdisc belongs in the taprio list). The 'found' variable also doesn't really serve too much of a purpose either; we can just call taprio_set_picos_per_byte() within the loop, and exit immediately afterwards. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20220923145921.3038904-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26x86/uaccess: avoid check_object_size() in copy_from_user_nmi()Kees Cook
The check_object_size() helper under CONFIG_HARDENED_USERCOPY is designed to skip any checks where the length is known at compile time as a reasonable heuristic to avoid "likely known-good" cases. However, it can only do this when the copy_*_user() helpers are, themselves, inline too. Using find_vmap_area() requires taking a spinlock. The check_object_size() helper can call find_vmap_area() when the destination is in vmap memory. If show_regs() is called in interrupt context, it will attempt a call to copy_from_user_nmi(), which may call check_object_size() and then find_vmap_area(). If something in normal context happens to be in the middle of calling find_vmap_area() (with the spinlock held), the interrupt handler will hang forever. The copy_from_user_nmi() call is actually being called with a fixed-size length, so check_object_size() should never have been called in the first place. Given the narrow constraints, just replace the __copy_from_user_inatomic() call with an open-coded version that calls only into the sanitizers and not check_object_size(), followed by a call to raw_copy_from_user(). [akpm@linux-foundation.org: no instrument_copy_from_user() in my tree...] Link: https://lkml.kernel.org/r/20220919201648.2250764-1-keescook@chromium.org Link: https://lore.kernel.org/all/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com Fixes: 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Yu Zhao <yuzhao@google.com> Reported-by: Florian Lehner <dev@der-flo.net> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Florian Lehner <dev@der-flo.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm/page_isolation: fix isolate_single_pageblock() isolation behaviorZi Yan
set_migratetype_isolate() does not allow isolating MIGRATE_CMA pageblocks unless it is used for CMA allocation. isolate_single_pageblock() did not have the same behavior when it is used together with set_migratetype_isolate() in start_isolate_page_range(). This allows alloc_contig_range() with migratetype other than MIGRATE_CMA, like MIGRATE_MOVABLE (used by alloc_contig_pages()), to isolate first and last pageblock but fail the rest. The failure leads to changing migratetype of the first and last pageblock to MIGRATE_MOVABLE from MIGRATE_CMA, corrupting the CMA region. This can happen during gigantic page allocations. Like Doug said here: https://lore.kernel.org/linux-mm/a3363a52-883b-dcd1-b77f-f2bb378d6f2d@gmail.com/T/#u, for gigantic page allocations, the user would notice no difference, since the allocation on CMA region will fail as well as it did before. But it might hurt the performance of device drivers that use CMA, since CMA region size decreases. Fix it by passing migratetype into isolate_single_pageblock(), so that set_migratetype_isolate() used by isolate_single_pageblock() will prevent the isolation happening. Link: https://lkml.kernel.org/r/20220914023913.1855924-1-zi.yan@sent.com Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: Doug Berger <opendmb@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Doug Berger <opendmb@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm,hwpoison: check mm when killing accessing processShuai Xue
The GHES code calls memory_failure_queue() from IRQ context to queue work into workqueue and schedule it on the current CPU. Then the work is processed in memory_failure_work_func() by kworker and calls memory_failure(). When a page is already poisoned, commit a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address") make memory_failure() call kill_accessing_process() that: - holds mmap locking of current->mm - does pagetable walk to find the error virtual address - and sends SIGBUS to the current process with error info. However, the mm of kworker is not valid, resulting in a null-pointer dereference. So check mm when killing the accessing process. [akpm@linux-foundation.org: remove unrelated whitespace alteration] Link: https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address") Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Bixuan Cui <cuibixuan@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm/hugetlb: correct demote page offset logicDoug Berger
With gigantic pages it may not be true that struct page structures are contiguous across the entire gigantic page. The nth_page macro is used here in place of direct pointer arithmetic to correct for this. Mike said: : This error could cause addressing exceptions. However, this is only : possible in configurations where CONFIG_SPARSEMEM && : !CONFIG_SPARSEMEM_VMEMMAP. Such a configuration option is rare and : unknown to be the default anywhere. Link: https://lkml.kernel.org/r/20220914190917.3517663-1-opendmb@gmail.com Fixes: 8531fc6f52f5 ("hugetlb: add hugetlb demote page support") Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm: prevent page_frag_alloc() from corrupting the memoryMaurizio Lombardi
A number of drivers call page_frag_alloc() with a fragment's size > PAGE_SIZE. In low memory conditions, __page_frag_cache_refill() may fail the order 3 cache allocation and fall back to order 0; In this case, the cache will be smaller than the fragment, causing memory corruptions. Prevent this from happening by checking if the newly allocated cache is large enough for the fragment; if not, the allocation will fail and page_frag_alloc() will return NULL. Link: https://lkml.kernel.org/r/20220715125013.247085-1-mlombard@redhat.com Fixes: b63ae8ca096d ("mm/net: Rename and move page fragment handling from net/ to mm/") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Cc: Chen Lin <chen45464546@163.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm: bring back update_mmu_cache() to finish_fault()Sergei Antonov
Running this test program on ARMv4 a few times (sometimes just once) reproduces the bug. int main() { unsigned i; char paragon[SIZE]; void* ptr; memset(paragon, 0xAA, SIZE); ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED, -1, 0); if (ptr == MAP_FAILED) return 1; printf("ptr = %p\n", ptr); for (i=0;i<10000;i++){ memset(ptr, 0xAA, SIZE); if (memcmp(ptr, paragon, SIZE)) { printf("Unexpected bytes on iteration %u!!!\n", i); break; } } munmap(ptr, SIZE); } In the "ptr" buffer there appear runs of zero bytes which are aligned by 16 and their lengths are multiple of 16. Linux v5.11 does not have the bug, "git bisect" finds the first bad commit: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths") Before the commit update_mmu_cache() was called during a call to filemap_map_pages() as well as finish_fault(). After the commit finish_fault() lacks it. Bring back update_mmu_cache() to finish_fault() to fix the bug. Also call update_mmu_tlb() only when returning VM_FAULT_NOPAGE to more closely reproduce the code of alloc_set_pte() function that existed before the commit. On many platforms update_mmu_cache() is nop: x86, see arch/x86/include/asm/pgtable ARMv6+, see arch/arm/include/asm/tlbflush.h So, it seems, few users ran into this bug. Link: https://lkml.kernel.org/r/20220908204809.2012451-1-saproj@gmail.com Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths") Signed-off-by: Sergei Antonov <saproj@gmail.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26frontswap: don't call ->init if no ops are registeredChristoph Hellwig
If no frontswap module (i.e. zswap) was registered, frontswap_ops will be NULL. In such situation, swapon crashes with the following stack trace: Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000020a4fab000 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 96000004 [#1] SMP Modules linked in: zram fsl_dpaa2_eth pcs_lynx phylink ahci_qoriq crct10dif_ce ghash_ce sbsa_gwdt fsl_mc_dpio nvme lm90 nvme_core at803x xhci_plat_hcd rtc_fsl_ftm_alarm xgmac_mdio ahci_platform i2c_imx ip6_tables ip_tables fuse Unloaded tainted modules: cppc_cpufreq():1 CPU: 10 PID: 761 Comm: swapon Not tainted 6.0.0-rc2-00454-g22100432cf14 #1 Hardware name: SolidRun Ltd. SolidRun CEX7 Platform, BIOS EDK II Jun 21 2022 pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : frontswap_init+0x38/0x60 lr : __do_sys_swapon+0x8a8/0x9f4 sp : ffff80000969bcf0 x29: ffff80000969bcf0 x28: ffff37bee0d8fc00 x27: ffff80000a7f5000 x26: fffffcdefb971e80 x25: ffffaba797453b90 x24: 0000000000000064 x23: ffff37c1f209d1a8 x22: ffff37bee880e000 x21: ffffaba797748560 x20: ffff37bee0d8fce4 x19: ffffaba797748488 x18: 0000000000000014 x17: 0000000030ec029a x16: ffffaba795a479b0 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000001 x11: ffff37c63c0aba18 x10: 0000000000000000 x9 : ffffaba7956b8c88 x8 : ffff80000969bcd0 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 0000000000000001 x4 : 0000000000000000 x3 : ffffaba79730f000 x2 : ffff37bee0d8fc00 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: frontswap_init+0x38/0x60 __do_sys_swapon+0x8a8/0x9f4 __arm64_sys_swapon+0x28/0x3c invoke_syscall+0x78/0x100 el0_svc_common.constprop.0+0xd4/0xf4 do_el0_svc+0x38/0x4c el0_svc+0x34/0x10c el0t_64_sync_handler+0x11c/0x150 el0t_64_sync+0x190/0x194 Code: d000e283 910003fd f9006c41 f946d461 (f9400021) ---[ end trace 0000000000000000 ]--- Link: https://lkml.kernel.org/r/20220909130829.3262926-1-hch@lst.de Fixes: 1da0d94a3ec8 ("frontswap: remove support for multiple ops") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()Naoya Horiguchi
NULL pointer dereference is triggered when calling thp split via debugfs on the system with offlined memory blocks. With debug option enabled, the following kernel messages are printed out: page:00000000467f4890 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x121c000 flags: 0x17fffc00000000(node=0|zone=2|lastcpupid=0x1ffff) raw: 0017fffc00000000 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: unmovable page page:000000007d7ab72e is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) ------------[ cut here ]------------ kernel BUG at include/linux/mm.h:1248! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 16 PID: 20964 Comm: bash Tainted: G I 6.0.0-rc3-foll-numa+ #41 ... RIP: 0010:split_huge_pages_write+0xcf4/0xe30 This shows that page_to_nid() in page_zone() is unexpectedly called for an offlined memmap. Use pfn_to_online_page() to get struct page in PFN walker. Link: https://lkml.kernel.org/r/20220908041150.3430269-1-naoya.horiguchi@linux.dev Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Co-developed-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm: fix madivse_pageout mishandling on non-LRU pageMinchan Kim
MADV_PAGEOUT tries to isolate non-LRU pages and gets a warning from isolate_lru_page below. Fix it by checking PageLRU in advance. ------------[ cut here ]------------ trying to isolate tail page WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140 Modules linked in: CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:isolate_lru_page+0x130/0x140 Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/ Link: https://lkml.kernel.org/r/20220908151204.762596-1-minchan@kernel.org Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT") Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: 韩天ç`• <hantianshuo@iie.ac.cn> Suggested-by: Yang Shi <shy828301@gmail.com> Acked-by: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flushYang Shi
The IPI broadcast is used to serialize against fast-GUP, but fast-GUP will move to use RCU instead of disabling local interrupts in fast-GUP. Using an IPI is the old-styled way of serializing against fast-GUP although it still works as expected now. And fast-GUP now fixed the potential race with THP collapse by checking whether PMD is changed or not. So IPI broadcast in radix pmd collapse flush is not necessary anymore. But it is still needed for hash TLB. Link: https://lkml.kernel.org/r/20220907180144.555485-2-shy828301@gmail.com Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Yang Shi <shy828301@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>