git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-01-04	filemap: Convert tracing of page cache operations to folio	Matthew Wilcox (Oracle)
	Pass the folio instead of a page. The page was already implicitly a folio as it accessed page->mapping directly. Add the order of the folio to the tracepoint, as this is important information. Also drop printing the address of the struct page as the pfn provides better information than the struct page address. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04	filemap: Add filemap_unaccount_folio()	Matthew Wilcox (Oracle)
	Replace unaccount_page_cache_page() with filemap_unaccount_folio(). The bug handling path could be a bit more robust (eg taking into account the mapcounts of tail pages), but it's really never supposed to happen. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04	filemap: Convert page_cache_delete to take a folio	Matthew Wilcox (Oracle)
	It was already assuming a head page, so this is a straightforward conversion. Convert the one caller to call page_folio(), even though it must currently be passing in a head page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04	filemap: Add folio_put_wait_locked()	Matthew Wilcox (Oracle)
	Convert all three callers of put_and_wait_on_page_locked() to folio_put_wait_locked(). This shrinks the kernel overall by 19 bytes. filemap_update_page() shrinks by 19 bytes while __migration_entry_wait() is unchanged. folio_put_wait_locked() is 14 bytes smaller than put_and_wait_on_page_locked(), but pmd_migration_entry_wait() grows by 14 bytes. It removes the assumption from pmd_migration_entry_wait() that pages cannot be larger than a PMD (which is true today, but may be interesting to explore in the future). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04	mm: Add folio_test_pmd_mappable()	Matthew Wilcox (Oracle)
	Add a predicate to determine if the folio might be mapped by a PMD entry. If CONFIG_TRANSPARENT_HUGEPAGE is disabled, we know it can't be, even if it's large enough. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04	iov_iter: Convert iter_xarray to use folios	Matthew Wilcox (Oracle)
	Take advantage of how kmap_local_folio() works to simplify the loop. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04	iov_iter: Add copy_folio_to_iter()	Matthew Wilcox (Oracle)
	This wrapper around copy_page_to_iter() works because copy_page_to_iter() handles compound pages correctly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04	dm btree spine: eliminate duplicate le32_to_cpu() in node_check()	Joe Thornber
	Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-01-04	dm btree spine: remove extra node_check function declaration	Joe Thornber
	Should have been removed as part of commit f73e2e70ec48 ("dm btree spine: remove paranoid node_check call in node_prep_for_write()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-01-04	EDAC/i10nm: Release mdev/mbase when failing to detect HBM	Qiuxu Zhuo
	On systems without HBM (High Bandwidth Memory) mdev/mbase are not released/unmapped. Add the code to release mdev/mbase when failing to detect HBM. [Tony: re-word commit message] Cc: <stable@vger.kernel.org> Fixes: c945088384d0 ("EDAC/i10nm: Add support for high bandwidth memory") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211224091126.1246-1-qiuxu.zhuo@intel.com
2022-01-04	Merge tag 'mac80211-next-for-net-next-2022-01-04' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Just a few more changes: - mac80211: allow non-standard VHT MCSes 10/11 - mac80211: add sleepable station iterator for drivers - nl80211: clarify a comment - mac80211: small cleanup to use typed element helpers * tag 'mac80211-next-for-net-next-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: mac80211: use ieee80211_bss_get_elem() nl80211: clarify comment for mesh PLINK_BLOCKED state mac80211: Add stations iterator where the iterator function may sleep mac80211: allow non-standard VHT MCS-10/11 ==================== Link: https://lore.kernel.org/r/20220104153403.69749-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-04	erofs: use meta buffers for zmap operations	Gao Xiang
	Get rid of old erofs_get_meta_page() within zmap operations by using on-stack meta buffers in order to prepare subpage and folio features. Finally, erofs_get_meta_page() is useless. Get rid of it! Link: https://lore.kernel.org/r/20220102040017.51352-6-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-01-04	erofs: use meta buffers for xattr operations	Gao Xiang
	Get rid of old erofs_get_meta_page() within xattr operations by using on-stack meta buffers in order to prepare subpage and folio features. Link: https://lore.kernel.org/r/20220102040017.51352-5-hsiangkao@linux.alibaba.com Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-01-04	erofs: use meta buffers for super operations	Gao Xiang
	Get rid of old erofs_get_meta_page() within super operations by using on-stack meta buffers in order to prepare subpage and folio features. Link: https://lore.kernel.org/r/20220102081317.109797-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-01-04	erofs: use meta buffers for inode operations	Gao Xiang
	Get rid of old erofs_get_meta_page() within inode operations by using on-stack meta buffers in order to prepare subpage and folio features. Link: https://lore.kernel.org/r/20220102040017.51352-3-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-01-04	erofs: introduce meta buffer operations	Gao Xiang
	In order to support subpage and folio for all uncompressed files, introduce meta buffer descriptors, which can be effectively stored on stack, in place of meta page operations. This converts the uncompressed data path to meta buffers. Link: https://lore.kernel.org/r/20220102040017.51352-2-hsiangkao@linux.alibaba.com Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-01-04	fs: dlm: print cluster addr if non-cluster node connects	Alexander Aring
	This patch prints the cluster node address if a non-cluster node (according to the dlm config setting) tries to connect. The current hexdump call will print in a different loglevel and only available if dynamic debug is enabled. Additional we using the ip address format strings to print an IETF ip4/6 string represenation. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-01-04	x86: intel_epb: Allow model specific normal EPB value	Srinivas Pandruvada
	The current EPB "normal" is defined as 6 and set whenever power-up EPB value is 0. This setting resulted in the desired out of box power and performance for several CPU generations. But this value is not suitable for AlderLake mobile CPUs, as this resulted in higher uncore power. Since EPB is model specific, this is not unreasonable to have different behavior. Allow a capability where "normal" EPB can be redefined. For AlderLake mobile CPUs this desired normal value is 7. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-04	Merge tag 'mac80211-for-net-2022-01-04' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Two more changes: - mac80211: initialize a variable to avoid using it uninitialized - mac80211 mesh: put some data structures into the container to fix bugs with and not have to deal with allocation failures * tag 'mac80211-for-net-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211: mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh mac80211: initialize variable have_higher_than_11mbit ==================== Link: https://lore.kernel.org/r/20220104144449.64937-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-04	regulator: remove redundant ret variable	Minghao Chi
	Return value from regmap_update_bits() directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Link: https://lore.kernel.org/r/20220104104139.601031-1-chi.minghao@zte.com.cn Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-04	spi: ar934x: fix transfer size	Oskari Lemmela
	If bits_per_word is configured, transfer only word amount of data per iteration. Signed-off-by: Oskari Lemmela <oskari@lemmela.net> Link: https://lore.kernel.org/r/20211222055958.1383233-2-oskari@lemmela.net Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-04	arm64: perf: Don't register user access sysctl handler multiple times	Will Deacon
	Commit e2012600810c ("arm64: perf: Add userspace counter access disable switch") introduced a new 'perf_user_access' sysctl file to enable and disable direct userspace access to the PMU counters. Sadly, Geert reports that on his big.LITTLE SoC ('Renesas Salvator-XS w/ R-Car H3'), the file is created for each PMU type probed, resulting in a splat during boot: \| hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available \| sysctl duplicate entry: /kernel//perf_user_access \| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc3-arm64-renesas-00003-ge2012600810c #1420 \| Hardware name: Renesas Salvator-X 2nd version board based on r8a77951 (DT) \| Call trace: \| dump_backtrace+0x0/0x190 \| show_stack+0x14/0x20 \| dump_stack_lvl+0x88/0xb0 \| dump_stack+0x14/0x2c \| __register_sysctl_table+0x384/0x818 \| register_sysctl+0x20/0x28 \| armv8_pmu_init.constprop.0+0x118/0x150 \| armv8_a57_pmu_init+0x1c/0x28 \| arm_pmu_device_probe+0x1b4/0x558 \| armv8_pmu_device_probe+0x18/0x20 \| platform_probe+0x64/0xd0 \| hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available Introduce a state variable to track creation of the sysctl file and ensure that it is only created once. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: e2012600810c ("arm64: perf: Add userspace counter access disable switch") Link: https://lore.kernel.org/r/CAMuHMdVcDxR9sGzc5pcnORiotonERBgc6dsXZXMd6wTvLGA9iw@mail.gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2022-01-04	mac80211: use ieee80211_bss_get_elem()	Johannes Berg
	Instead of ieee80211_bss_get_ie(), use the more typed ieee80211_bss_get_elem(). Link: https://lore.kernel.org/r/20211220113609.56f8e2a70152.Id5a56afb8a4f9b38d10445e5a1874e93e84b5251@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04	nl80211: clarify comment for mesh PLINK_BLOCKED state	Felix Fietkau
	When a mesh link is in blocked state, it is very useful to still allow auth requests from the peer to re-establish it. When a remote node is power cycled, the peer state can easily end up in blocked state if multiple auth attempts are performed. Since this can lead to several minutes of downtime, we should accept auth attempts of the peer after it has come back. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20211220105147.88625-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04	mac80211: Add stations iterator where the iterator function may sleep	Martin Blumenstingl
	ieee80211_iterate_active_interfaces() and ieee80211_iterate_active_interfaces_atomic() already exist, where the former allows the iterator function to sleep. Add ieee80211_iterate_stations() which is similar to ieee80211_iterate_stations_atomic() but allows the iterator to sleep. This is needed for adding SDIO support to the rtw88 driver. Some interators there are reading or writing registers. With the SDIO ops (sdio_readb, sdio_writeb and friends) this means that the iterator function may sleep. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Link: https://lore.kernel.org/r/20211228211501.468981-2-martin.blumenstingl@googlemail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04	mac80211: allow non-standard VHT MCS-10/11	Ping-Ke Shih
	Some AP can possibly try non-standard VHT rate and mac80211 warns and drops packets, and leads low TCP throughput. Rate marked as a VHT rate but data is invalid: MCS: 10, NSS: 2 WARNING: CPU: 1 PID: 7817 at net/mac80211/rx.c:4856 ieee80211_rx_list+0x223/0x2f0 [mac8021 Since commit c27aa56a72b8 ("cfg80211: add VHT rate entries for MCS-10 and MCS-11") has added, mac80211 adds this support as well. After this patch, throughput is good and iw can get the bitrate: rx bitrate: 975.1 MBit/s VHT-MCS 10 80MHz short GI VHT-NSS 2 or rx bitrate: 1083.3 MBit/s VHT-MCS 11 80MHz short GI VHT-NSS 2 Buglink: https://bugzilla.suse.com/show_bug.cgi?id=1192891 Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://lore.kernel.org/r/20220103013623.17052-1-pkshih@realtek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04	ACPI / x86: Skip AC and battery devices on x86 Android tablets with broken DSDTs	Hans de Goede
	So far all of the tablets for which the skip i2c-client/serdev enumeration quirks have been added also all have broken ACPI AC / battery devices extend the existing quirks for these tablets to also skip the broken AC / battery devices. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-04	ACPI / x86: Introduce an acpi_quirk_skip_acpi_ac_and_battery() helper	Hans de Goede
	Some x86 ACPI boards have broken AC and battery ACPI devices in their ACPI tables. This is often tied to these devices using certain PMICs where the factory OS image seems to be using native charger and fuel-gauge drivers instead. So far both the AC and battery drivers have almost identical checks for these PMICs including both of them having a DMI based mechanism to force usage of the ACPI AC and battery drivers on some boards even though one of these PMICs is present, with the same 2 boards listed in both driver's DMI tables for this. The only difference is that the AC driver checks for 2 PMICs and the battery driver only for one. This has grown this way because the other (Whiskey Cove) PMIC is only used on a few boards (3 known boards) and although some of these do have non working ACPI battery devices, their _STA method always returns 0, but that really should not be relied on. This patch factors out the shared checks into a new acpi_quirk_skip_acpi_ac_and_battery() helper and moves the AC and battery drivers over to this new helper. Note the DMI table is shared with acpi_quirk_skip_i2c_client_enumeration() and acpi_quirk_skip_serdev_enumeration(), because boards needing DMI quirks for either of these typically also have broken AC and battery ACPI devices. The ACPI_QUIRK_SKIP_ACPI_AC_AND_BATTERY quirk is not set yet on boards already in this DMI table, to avoid introducing any functional changes in this refactoring patch. Besided sharing the code between the AC and battery drivers this refactoring also moves this quirk handling to under #ifdef CONFIG_X86, removing this x86 specific code from non x86 ACPI builds. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-04	Merge branch 'acpi-scan' into acpi-x86	Rafael J. Wysocki
	Merge recent device enumeration changes to satisfy dependencies.
2022-01-04	RDMA/rxe: Prevent double freeing rxe_map_set()	Li Zhijian
	The same rxe_map_set could be freed twice: rxe_reg_user_mr() -> rxe_mr_init_user() -> rxe_mr_free_map_set() # 1st -> rxe_drop_ref() ... -> rxe_mr_cleanup() -> rxe_mr_free_map_set() # 2nd Follow normal convection and put resource cleanup either in the error unwind of the allocator, or the overall free function. Leave the object unchanged with a NULL cur_map_set on failure and remove the unncessary free in rxe_mr_init_user(). Link: https://lore.kernel.org/r/20211228014406.1033444-1-lizhijian@cn.fujitsu.com Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-04	mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh	Pavel Skripkin
	Syzbot hit NULL deref in rhashtable_free_and_destroy(). The problem was in mesh_paths and mpp_paths being NULL. mesh_pathtbl_init() could fail in case of memory allocation failure, but nobody cared, since ieee80211_mesh_init_sdata() returns void. It led to leaving 2 pointers as NULL. Syzbot has found null deref on exit path, but it could happen anywhere else, because code assumes these pointers are valid. Since all ieee80211_*_setup_sdata functions are void and do not fail, let's embedd mesh_paths and mpp_paths into parent struct to avoid adding error handling on higher levels and follow the pattern of others setup_sdata functions Fixes: 60854fd94573 ("mac80211: mesh: convert path table to rhashtable") Reported-and-tested-by: syzbot+860268315ba86ea6b96b@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Link: https://lore.kernel.org/r/20211230195547.23977-1-paskripkin@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04	mac80211: initialize variable have_higher_than_11mbit	Tom Rix
	Clang static analysis reports this warnings mlme.c:5332:7: warning: Branch condition evaluates to a garbage value have_higher_than_11mbit) ^~~~~~~~~~~~~~~~~~~~~~~ have_higher_than_11mbit is only set to true some of the time in ieee80211_get_rates() but is checked all of the time. So have_higher_than_11mbit needs to be initialized to false. Fixes: 5d6a1b069b7f ("mac80211: set basic rates earlier") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20211223162848.3243702-1-trix@redhat.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04	drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check	Dan Carpenter
	The devm_ioremap() function does not return error pointers. It returns NULL. Fixes: 036a7584bede ("drivers: perf: Add LLC-TAD perf counter support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211217145907.GA16611@kili Signed-off-by: Will Deacon <will@kernel.org>
2022-01-04	perf/smmuv3: Fix unused variable warning when CONFIG_OF=n	Will Deacon
	The kbuild robot reports that building the SMMUv3 PMU driver with CONFIG_OF=n results in a warning for W=1 builds: >> drivers/perf/arm_smmuv3_pmu.c:889:34: warning: unused variable 'smmu_pmu_of_match' [-Wunused-const-variable] static const struct of_device_id smmu_pmu_of_match[] = { ^ Guard the match table with #ifdef CONFIG_OF. Link: https://lore.kernel.org/r/202201041700.01KZEzhb-lkp@intel.com Fixes: 3f7be4356176 ("perf/smmuv3: Add devicetree support") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Will Deacon <will@kernel.org>
2022-01-04	headers/uninline: Uninline single-use function: kobject_has_children()	Ingo Molnar
	This was the only usage of <linux/kref_api.h> in <linux/kobject_api.h>, so we'll able to decouple the two after this change. Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-04	ethernet/sfc: remove redundant rc variable	Minghao Chi
	Return value from efx_mcdi_rpc() directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	Merge branch 'namespacify-mtu-ipv4'	David S. Miller
	xu xin says: ==================== ipv4: Namespaceify two sysctls related with mtu The following patch series enables the min_pmtu and mtu_expires to be visible and configurable per net namespace. Different namespace application might have different requirements on the setting of min_pmtu and mtu_expires. If these two patches are applied, inside a net namespace we create, we can see two more sysctls under /proc/sys/net/ipv4/route: 1. min_pmtu 2. mtu_expires where min_pmtu and mtu_expires are configurable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	Namespaceify mtu_expires sysctl	xu xin
	This patch enables the sysctl mtu_expires to be configured per net namespace. Signed-off-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	Namespaceify min_pmtu sysctl	xu xin
	This patch enables the sysctl min_pmtu to be configured per net namespace. Signed-off-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc	Eric Dumazet
	tx_queue_len can be set to ~0U, we need to be more careful about overflows. __fls(0) is undefined, as this report shows: UBSAN: shift-out-of-bounds in net/sched/sch_qfq.c:1430:24 shift exponent 51770272 is too large for 32-bit type 'int' CPU: 0 PID: 25574 Comm: syz-executor.0 Not tainted 5.16.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x201/0x2d8 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:151 [inline] __ubsan_handle_shift_out_of_bounds+0x494/0x530 lib/ubsan.c:330 qfq_init_qdisc+0x43f/0x450 net/sched/sch_qfq.c:1430 qdisc_create+0x895/0x1430 net/sched/sch_api.c:1253 tc_modify_qdisc+0x9d9/0x1e20 net/sched/sch_api.c:1660 rtnetlink_rcv_msg+0x934/0xe60 net/core/rtnetlink.c:5571 netlink_rcv_skb+0x200/0x470 net/netlink/af_netlink.c:2496 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x814/0x9f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0xaea/0xe60 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] ____sys_sendmsg+0x5b9/0x910 net/socket.c:2409 ___sys_sendmsg net/socket.c:2463 [inline] __sys_sendmsg+0x280/0x370 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	netrom: fix copying in user data in nr_setsockopt	Christoph Hellwig
	This code used to copy in an unsigned long worth of data before the sockptr_t conversion, so restore that. Fixes: a7b75c5a8c41 ("net: pass a sockptr_t into ->setsockopt") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	net: fixup build after bpf header changes	Jakub Kicinski
	Recent bpf-next merge brought in header changes which uncovered includes missing in net-next which were not present in bpf-next. Build problems happen only on less-popular arches like hppa, sparc, alpha etc. I could repro the build problem with ice but not the mlx5 problem Abdul was reporting. mlx5 does look like it should include filter.h, anyway. Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Fixes: e63a02348958 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next") Link: https://lore.kernel.org/all/7c03768d-d948-c935-a7ab-b1f963ac7eed@linux.vnet.ibm.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	net: lantiq_xrx200: add ingress SG DMA support	Aleksander Jan Bajkowski
	This patch adds support for scatter gather DMA. DMA in PMAC splits the packet into several buffers when the MTU on the CPU port is less than the MTU of the switch. The first buffer starts at an offset of NET_IP_ALIGN. In subsequent buffers, dma ignores the offset. Thanks to this patch, the user can still connect to the device in such a situation. For normal configurations, the patch has no effect on performance. Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	Merge branch 'srv6-traceroute'	David S. Miller
	Andrew Lunn says: ==================== Fix traceroute in the presence of SRv6 When using SRv6 the destination IP address in the IPv6 header is not always the true destination, it can be a router along the path that SRv6 is using. When ICMP reports an error, e.g, time exceeded, which is what traceroute uses, it included the packet which invoked the error into the ICMP message body. Upon receiving such an ICMP packet, the invoking packet is examined and an attempt is made to find the socket which sent the packet, so the error can be reported. Lookup is performed using the source and destination address. If the intermediary router IP address from the IP header is used, the lookup fails. It is necessary to dig into the header and find the true destination address in the Segment Router header, SRH. v2: Play games with the skb->network_header rather than clone the skb v3: Move helpers into seg6.c v4: Move short helper into header file. Rework getting SRH destination address v5: Fix comment to describe function, not caller Patch 1 exports a helper which can find the SRH in a packet Patch 2 does the actual examination of the invoking packet Patch 3 makes use of the results when trying to find the socket. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	udp6: Use Segment Routing Header for dest address if present	Andrew Lunn
	When finding the socket to report an error on, if the invoking packet is using Segment Routing, the IPv6 destination address is that of an intermediate router, not the end destination. Extract the ultimate destination address from the segment address. This change allows traceroute to function in the presence of Segment Routing. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	icmp: ICMPV6: Examine invoking packet for Segment Route Headers.	Andrew Lunn
	RFC8754 says: ICMP error packets generated within the SR domain are sent to source nodes within the SR domain. The invoking packet in the ICMP error message may contain an SRH. Since the destination address of a packet with an SRH changes as each segment is processed, it may not be the destination used by the socket or application that generated the invoking packet. For the source of an invoking packet to process the ICMP error message, the ultimate destination address of the IPv6 header may be required. The following logic is used to determine the destination address for use by protocol-error handlers. * Walk all extension headers of the invoking IPv6 packet to the routing extension header preceding the upper-layer header. - If routing header is type 4 Segment Routing Header (SRH) o The SID at Segment List[0] may be used as the destination address of the invoking packet. Mangle the skb so the network header points to the invoking packet inside the ICMP packet. The seg6 helpers can then be used on the skb to find any segment routing headers. If found, mark this fact in the IPv6 control block of the skb, and store the offset into the packet of the SRH. Then restore the skb back to its old state. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	seg6: export get_srh() for ICMP handling	Andrew Lunn
	An ICMP error message can contain in its message body part of an IPv6 packet which invoked the error. Such a packet might contain a segment router header. Export get_srh() so the ICMP code can make use of it. Since his changes the scope of the function from local to global, add the seg6_ prefix to keep the namespace clean. And move it into seg6.c so it is always available, not just when IPV6_SEG6_LWTUNNEL is enabled. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	phy: nxp-c45-tja11xx: add extts and perout support	Radu Pirea (NXP OSS)
	Add support for external timestamp and periodic signal output. TJA1103 have one periodic signal and one external time stamp signal that can be multiplexed on all 11 gpio pins. The periodic signal can be only enabled or disabled. Have no start time and if is enabled will be generated with a period of one second in sync with the LTC seconds counter. The phase change is possible only with a half of a second. The external timestamp signal has no interrupt and no valid bit and that's why the timestamps are handled by polling in .do_aux_work. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	Merge branch 'act_tc-offload-originating-device'	David S. Miller
	Paul Blakey says: ==================== net/sched: Pass originating device to drivers offloading ct connection Currently, drivers register to a ct zone that can be shared by multiple devices. This can be inefficient for the driver to offload, as it needs to handle all the cases where the tuple can come from, instead of where it's most likely will arive from. For example, consider the following tc rules: tc filter add dev dev1 ... flower action ct commit zone 5 \ action mirred egress redirect dev dev2 tc filter add dev dev2 ... flower action ct zone 5 \ action goto chain chain 2 tc filter add dev dev2 ... flower ct_state +trk+est ... \ action mirred egress redirect dev dev1 Both dev2 and dev1 register to the zone 5 flow table (created by act_ct). A tuple originating on dev1, going to dev2, will be offloaded to both devices, and both will need to offload both directions, resulting in 4 total rules. The traffic will only hit originiating tuple on dev1, and reply tuple on dev2. By passing the originating device that created the connection with the tuple, dev1 can choose to offload only the originating tuple, and dev2 only the reply tuple. Resulting in a more efficient offload. The first patch adds an act_ct nf conntrack extension, to temporarily store the originiating device from the skb before offloading the connection once the connection is established. Once sent to offload, it fills the tuple originating device. The second patch get this information from tuples which pass in openvswitch. The third patch is Mellanox driver ct offload implementation using this information to provide a hint to firmware of where this offloaded tuple packets will arrive from (LOCAL or UPLINK port), and thus increase insertion rate. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04	net/mlx5: CT: Set flow source hint from provided tuple device	Paul Blakey
	Get originating device from tuple offload metadata match ingress_ifindex, and set flow_source hint to either LOCAL for vf/sf reps, UPLINK for uplink/wire/tunnel devices/bond, or ANY (as before this patch) for all others. This allows lower layer (software steering or firmware) to insert the tuple rule only in one table (either rx or tx) instead of two (rx and tx). Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>