summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-10-26btrfs: defrag: remove the old infrastructureQu Wenruo
Now the old infrastructure can all be removed, defrag Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()Qu Wenruo
The function defrag_one_cluster() is able to defrag one range well enough, we only need to do preparation for it, including: - Clamp and align the defrag range - Exclude invalid cases - Proper inode locking The old infrastructures will not be removed in this patch, as it would be too noisy to review. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: introduce helper to defrag one clusterQu Wenruo
This new helper, defrag_one_cluster(), will defrag one cluster (at most 256K): - Collect all initial targets - Kick in readahead when possible - Call defrag_one_range() on each initial target With some extra range clamping. - Update @sectors_defragged parameter This involves one behavior change, the defragged sectors accounting is no longer as accurate as old behavior, as the initial targets are not consistent. We can have new holes punched inside the initial target, and we will skip such holes later. But the defragged sectors accounting doesn't need to be that accurate anyway, thus I don't want to pass those extra accounting burden into defrag_one_range(). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: introduce helper to defrag a rangeQu Wenruo
A new helper, defrag_one_range(), is introduced to defrag one range. This function will mostly prepare the needed pages and extent status for defrag_one_locked_target(). As we can only have a consistent view of extent map with page and extent bits locked, we need to re-check the range passed in to get a real target list for defrag_one_locked_target(). Since defrag_collect_targets() will call defrag_lookup_extent() and lock extent range, we also need to teach those two functions to skip extent lock. Thus new parameter, @locked, is introduced to skip extent lock if the caller has already locked the range. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: introduce helper to defrag a contiguous prepared rangeQu Wenruo
A new helper, defrag_one_locked_target(), introduced to do the real part of defrag. The caller needs to ensure both page and extents bits are locked, and no ordered extent exists for the range, and all writeback is finished. The core defrag part is pretty straight-forward: - Reserve space - Set extent bits to defrag - Update involved pages to be dirty Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: introduce helper to collect target file extentsQu Wenruo
Introduce a helper, defrag_collect_targets(), to collect all possible targets to be defragged. This function will not consider things like max_sectors_to_defrag, thus caller should be responsible to ensure we don't exceed the limit. This function will be the first stage of later defrag rework. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: factor out page preparation into a helperQu Wenruo
In cluster_pages_for_defrag(), we have complex code block inside one for() loop. The code block is to prepare one page for defrag, this will ensure: - The page is locked and set up properly. - No ordered extent exists in the page range. - The page is uptodate. This behavior is pretty common and will be reused by later defrag rework. So factor out the code into its own helper, defrag_prepare_one_page(), for later usage, and cleanup the code by a little. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: replace hard coded PAGE_SIZE with sectorsizeQu Wenruo
When testing subpage defrag support, I always find some strange inode nbytes error, after a lot of debugging, it turns out that defrag_lookup_extent() is using PAGE_SIZE as size for lookup_extent_mapping(). Since lookup_extent_mapping() is calling __lookup_extent_mapping() with @strict == 1, this means any extent map smaller than one page will be ignored, prevent subpage defrag to grab a correct extent map. There are quite some PAGE_SIZE usage in ioctl.c, but most of them are correct usages, and can be one of the following cases: - ioctl structure size check We want ioctl structure to be contained inside one page. - real page operations The remaining cases in defrag_lookup_extent() and check_defrag_in_cache() will be addressed in this patch. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: also check PagePrivate for subpage cases in ↵Qu Wenruo
cluster_pages_for_defrag() In function cluster_pages_for_defrag() we have a window where we unlock page, either start the ordered range or read the content from disk. When we re-lock the page, we need to make sure it still has the correct page->private for subpage. Thus add the extra PagePrivate check here to handle subpage cases properly. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: pass file_ra_state instead of file to btrfs_defrag_file()Qu Wenruo
Currently btrfs_defrag_file() accepts both "struct inode" and "struct file" as parameter. We can easily grab "struct inode" from "struct file" using file_inode() helper. The reason why we need "struct file" is just to re-use its f_ra. Change this to pass "struct file_ra_state" parameter, so that it's more clear what we really want. Since we're here, also add some comments on the function btrfs_defrag_file(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: rename and switch to bool btrfs_chunk_readonlyAnand Jain
btrfs_chunk_readonly() checks if the given chunk is writeable. It returns 1 for readonly, and 0 for writeable. So the return argument type bool shall suffice instead of the current type int. Also, rename btrfs_chunk_readonly() to btrfs_chunk_writeable() as we check if the bg is writeable, and helps to keep the logic at the parent function simpler to understand. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: reflink: initialize return value to 0 in btrfs_extent_same()Sidong Yang
Fix a warning reported by smatch that ret could be returned without initialized. The dedupe operations are supposed to to return 0 for a 0 length range but the caller does not pass olen == 0. To keep this behaviour and also fix the warning initialize ret to 0. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Sidong Yang <realwakka@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: subpage: pack all subpage bitmaps into a larger bitmapQu Wenruo
Currently we use u16 bitmap to make 4k sectorsize work for 64K page size. But this u16 bitmap is not large enough to contain larger page size like 128K, nor is space efficient for 16K page size. To handle both cases, here we pack all subpage bitmaps into a larger bitmap, now btrfs_subpage::bitmaps[] will be the ultimate bitmap for subpage usage. Each sub-bitmap will has its start bit number recorded in btrfs_subpage_info::*_start, and its bitmap length will be recorded in btrfs_subpage_info::bitmap_nr_bits. All subpage bitmap operations will be converted from using direct u16 operations to bitmap operations, with above *_start calculated. For 64K page size with 4K sectorsize, this should not cause much difference. While for 16K page size, we will only need 1 unsigned long (u32) to store all the bitmaps, which saves quite some space. Furthermore, this allows us to support larger page size like 128K and 258K. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26fs: remove leftover comments from mandatory locking removalJeff Layton
Stragglers from commit f7e33bdbd6d1 ("fs: remove mandatory file locking support"). Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-10-26MAINTAINERS: drop obsolete file pattern in SDHCI DRIVER sectionLukas Bulwahn
Commit 5c67aa59bd8f ("mmc: sdhci-pci: Remove dead code (struct sdhci_pci_data et al)") removes ./include/linux/mmc/sdhci-pci-data.h; so, there is no further file that matches 'include/linux/mmc/sdhci*'. Hence, ./scripts/get_maintainer.pl --self-test=patterns complains: warning: no file matches F: include/linux/mmc/sdhci* Drop this obsolete file pattern in SECURE DIGITAL HOST CONTROLLER INTERFACE (SDHCI) DRIVER. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20211022054740.25222-1-lukas.bulwahn@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26mmc: sdhci-esdhc-imx: add NXP S32G2 supportChester Lin
Support the SDHCI controller found on NXP S32G2 platform. The new flag ESDHC_FLAG_SKIP_ERR004536 is used because the hardware erratum bit is not applicable for S32G2. Signed-off-by: Chester Lin <clin@suse.com> Link: https://lore.kernel.org/r/20211021071333.32485-3-clin@suse.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26dt-bindings: mmc: fsl-imx-esdhc: add NXP S32G2 supportChester Lin
Add support for the SDHC binding of S32G2. Signed-off-by: Chester Lin <clin@suse.com> Reviewed-by: Andreas Färber <afaerber@suse.de> Link: https://lore.kernel.org/r/20211021071333.32485-2-clin@suse.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26Merge branch 'fixes' into nextUlf Hansson
2021-10-26mmc: cqhci: clear HALT state after CQE enableWenbin Mei
While mmc0 enter suspend state, we need halt CQE to send legacy cmd(flush cache) and disable cqe, for resume back, we enable CQE and not clear HALT state. In this case MediaTek mmc host controller will keep the value for HALT state after CQE disable/enable flow, so the next CQE transfer after resume will be timeout due to CQE is in HALT state, the log as below: <4>.(4)[318:kworker/4:1H]mmc0: cqhci: timeout for tag 2 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: ============ CQHCI REGISTER DUMP =========== <4>.(4)[318:kworker/4:1H]mmc0: cqhci: Caps: 0x100020b6 | Version: 0x00000510 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: Config: 0x00001103 | Control: 0x00000001 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: Int stat: 0x00000000 | Int enab: 0x00000006 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: Int sig: 0x00000006 | Int Coal: 0x00000000 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: TDL base: 0xfd05f000 | TDL up32: 0x00000000 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: Doorbell: 0x8000203c | TCN: 0x00000000 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x00000000 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: Task clr: 0x00000000 | SSC1: 0x00001000 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: SSC2: 0x00000001 | DCMD rsp: 0x00000000 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: RED mask: 0xfdf9a080 | TERRI: 0x00000000 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: Resp idx: 0x00000000 | Resp arg: 0x00000000 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: CRNQP: 0x00000000 | CRNQDUN: 0x00000000 <4>.(4)[318:kworker/4:1H]mmc0: cqhci: CRNQIS: 0x00000000 | CRNQIE: 0x00000000 This change check HALT state after CQE enable, if CQE is in HALT state, we will clear it. Signed-off-by: Wenbin Mei <wenbin.mei@mediatek.com> Cc: stable@vger.kernel.org Acked-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host") Link: https://lore.kernel.org/r/20211026070812.9359-1-wenbin.mei@mediatek.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26mmc: vub300: fix control-message timeoutsJohan Hovold
USB control-message timeouts are specified in milliseconds and should specifically not vary with CONFIG_HZ. Fixes: 88095e7b473a ("mmc: Add new VUB300 USB-to-SD/SDIO/MMC driver") Cc: stable@vger.kernel.org # 3.0 Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20211025115608.5287-1-johan@kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26mmc: dw_mmc: exynos: fix the finding clock sample valueJaehoon Chung
Even though there are candiates value if can't find best value, it's returned -EIO. It's not proper behavior. If there is not best value, use a first candiate value to work eMMC. Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Christian Hewitt <christianshewitt@gmail.com> Cc: stable@vger.kernel.org Fixes: c537a1c5ff63 ("mmc: dw_mmc: exynos: add variable delay tuning sequence") Link: https://lore.kernel.org/r/20211022082106.1557-1-jh80.chung@samsung.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26MAINTAINERS: Add maintainers for DHCOM i.MX6 and DHCOM/DHCOR STM32MP1Christoph Niedermaier
Add maintainers for DH electronics DHCOM i.MX6 and DHCOM/DHCOR STM32MP1 boards. Signed-off-by: Christoph Niedermaier <cniedermaier@dh-electronics.com> Cc: linux-arm-kernel@lists.infradead.org Cc: kernel@dh-electronics.com Cc: arnd@arndb.de Link: https://lore.kernel.org/r/20211025073706.2794-1-cniedermaier@dh-electronics.com' To: soc@kernel.org To: linux-kernel@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-26block: drain queue after disk is removed from sysfsMing Lei
Before removing disk from sysfs, userspace still may change queue via sysfs, such as switching elevator or setting wbt latency, both may reinitialize wbt, then the warning in blk_free_queue_stats() will be triggered since rq_qos_exit() is moved to del_gendisk(). Fixes the issue by moving draining queue & tearing down after disk is removed from sysfs, at that time no one can come into queue's store()/show(). Reported-by: Yi Zhang <yi.zhang@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211026101204.2897166-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26blk-mq: don't issue request directly in case that current is to be blockedMing Lei
When flushing plug list in case that current will be blocked, we can't issue request directly because ->queue_rq() may sleep, otherwise scheduler may complain. Fixes: dc5fc361d891 ("block: attempt direct issue of plug list") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211026082257.2889890-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26Merge tag 'qcom-arm64-fixes-for-5.15-2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm ARM64 DTS one more fix for 5.15 This reverts a clock change in the Qualcomm RB5 devicetree which in some combinations of firmware and configuration causes the device to crash during boot. Data on an adjacent platform indicates that this is probably not be the root cause of the problem, but this resolves the regression seen on RB5 and will allow the SM8250 platform to boot v5.15. * tag 'qcom-arm64-fixes-for-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: Revert "arm64: dts: qcom: sm8250: remove bus clock from the mdss node for sm8250 target" Link: https://lore.kernel.org/r/20211025201213.1145348-1-bjorn.andersson@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-26MAINTAINERS: please remove myself from the Prestera driverVadym Kochan
Signed-off-by: Vadym Kochan <vkochan@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: lan78xx: fix division by zero in send pathJohan Hovold
Add the missing endpoint max-packet sanity check to probe() to avoid division by zero in lan78xx_tx_bh() in case a malicious device has broken descriptors (or when doing descriptor fuzz testing). Note that USB core will reject URBs submitted for endpoints with zero wMaxPacketSize but that drivers doing packet-size calculations still need to handle this (cf. commit 2548288b4fb0 ("USB: Fix: Don't skip endpoint descriptors with maxpacket=0")). Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Cc: stable@vger.kernel.org # 4.3 Cc: Woojung.Huh@microchip.com <Woojung.Huh@microchip.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26Merge branch 'phy-supported-interfaces-bitmap'David S. Miller
Russell King says: ==================== Introduce supported interfaces bitmap This series introduces a new bitmap to allow us to indicate which phy_interface_t modes are supported. Currently, phylink will call ->validate with PHY_INTERFACE_MODE_NA to request all link mode capabilities from the MAC driver before choosing an interface to use. This leads in some cases to some rather hairly code. This can be simplified if phylink is aware of the interface modes that the MAC supports, and it can instead walk those modes, calling ->validate for each one, and combining the results. This series merely introduces the support; there is no change of behaviour until MAC drivers populate their supported_interfaces bitmap. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: phylink: use supported_interfaces for phylink validationRussell King (Oracle)
If the network device supplies a supported interface bitmap, we can use that during phylink's validation to simplify MAC drivers in two ways by using the supported_interfaces bitmap to: 1. reject unsupported interfaces before calling into the MAC driver. 2. generate the set of all supported link modes across all supported interfaces (used mainly for SFP, but also some 10G PHYs.) Suggested-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: phylink: add MAC phy_interface_t bitmapRussell King
Add a phy_interface_t bitmap so the MAC driver can specifiy which PHY interface modes it supports. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: phy: add phy_interface_t bitmap supportRussell King (Oracle)
Add support for a bitmap for phy interface modes, which includes: - a macro to declare the interface bitmap - an inline helper to zero the interface bitmap - an inline helper to detect an empty interface bitmap - inline helpers to do a bitwise AND and OR operations on two interface bitmaps Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26Merge branch 'dsa-isolation-prep'David S. Miller
Vladimir Oltean says: ==================== DSA preparations for FDB isolation between bridges This series makes 2 small changes to DSA's SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE handler, which will make it possible to offer switch drivers a stable association between a FDB entry and a bridge device in a future series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: dsa: stop calling dev_hold in dsa_slave_fdb_eventVladimir Oltean
Now that we guarantee that SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events have finished executing by the time we leave our bridge upper interface, we've established a stronger boundary condition for how long the dsa_slave_switchdev_event_work() might run. As such, it is no longer possible for DSA slave interfaces to become unregistered, since they are still bridge ports. So delete the unnecessary dev_hold() and dev_put(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: dsa: flush switchdev workqueue when leaving the bridgeVladimir Oltean
DSA is preparing to offer switch drivers an API through which they can associate each FDB entry with a struct net_device *bridge_dev. This can be used to perform FDB isolation (the FDB lookup performed on the ingress of a standalone, or bridged port, should not find an FDB entry that is present in the FDB of another bridge). In preparation of that work, DSA needs to ensure that by the time we call the switch .port_fdb_add and .port_fdb_del methods, the dp->bridge_dev pointer is still valid, i.e. the port is still a bridge port. This is not guaranteed because the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE API requires drivers that must have sleepable context to handle those events to schedule the deferred work themselves. DSA does this through the dsa_owq. It can happen that a port leaves a bridge, del_nbp() flushes the FDB on that port, SWITCHDEV_FDB_DEL_TO_DEVICE is notified in atomic context, DSA schedules its deferred work, but del_nbp() finishes unlinking the bridge as a master from the port before DSA's deferred work is run. Fundamentally, the port must not be unlinked from the bridge until all FDB deletion deferred work items have been flushed. The bridge must wait for the completion of these hardware accesses. An attempt has been made to address this issue centrally in switchdev by making SWITCHDEV_FDB_DEL_TO_DEVICE deferred (=> blocking) at the switchdev level, which would offer implicit synchronization with del_nbp: https://patchwork.kernel.org/project/netdevbpf/cover/20210820115746.3701811-1-vladimir.oltean@nxp.com/ but it seems that any attempt to modify switchdev's behavior and make the events blocking there would introduce undesirable side effects in other switchdev consumers. The most undesirable behavior seems to be that switchdev_deferred_process_work() takes the rtnl_mutex itself, which would be worse off than having the rtnl_mutex taken individually from drivers which is what we have now (except DSA which has removed that lock since commit 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")). So to offer the needed guarantee to DSA switch drivers, I have come up with a compromise solution that does not require switchdev rework: we already have a hook at the last moment in time when the bridge is still an upper of ours: the NETDEV_PRECHANGEUPPER handler. We can flush the dsa_owq manually from there, which makes all FDB deletions synchronous. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26ifb: Depend on netfilter alternatively to tcLukas Wunner
IFB originally depended on NET_CLS_ACT for traffic redirection. But since v4.5, that may be achieved with NFT_FWD_NETDEV as well. Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: <stable@vger.kernel.org> # v4.5+: bcfabee1afd9: netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress Cc: <stable@vger.kernel.org> # v4.5+ Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26mctp: Implement extended addressingJeremy Kerr
This change allows an extended address struct - struct sockaddr_mctp_ext - to be passed to sendmsg/recvmsg. This allows userspace to specify output ifindex and physical address information (for sendmsg) or receive the input ifindex/physaddr for incoming messages (for recvmsg). This is typically used by userspace for MCTP address discovery and assignment operations. The extended addressing facility is conditional on a new sockopt: MCTP_OPT_ADDR_EXT; userspace must explicitly enable addressing before the kernel will consume/populate the extended address data. Includes a fix for an uninitialised var: Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: ax88796c: Remove pointless check in ax88796c_open()Nathan Chancellor
Clang warns: drivers/net/ethernet/asix/ax88796c_main.c:851:24: error: address of array 'ax_local->phydev->advertising' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion] if (ax_local->phydev->advertising && ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ ~~ advertising cannot be NULL here if ax_local is not NULL, which cannot happen due to the check in ax88796c_probe(). Remove the check. Link: https://github.com/ClangBuiltLinux/linux/issues/1492 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: ax88796c: Fix clang -Wimplicit-fallthrough in ax88796c_set_mac()Nathan Chancellor
Clang warns: drivers/net/ethernet/asix/ax88796c_main.c:696:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] case SPEED_10: ^ drivers/net/ethernet/asix/ax88796c_main.c:696:2: note: insert 'break;' to avoid fall-through case SPEED_10: ^ break; drivers/net/ethernet/asix/ax88796c_main.c:706:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] case DUPLEX_HALF: ^ drivers/net/ethernet/asix/ax88796c_main.c:706:2: note: insert 'break;' to avoid fall-through case DUPLEX_HALF: ^ break; Clang is a little more pedantic than GCC, which permits implicit fallthroughs to cases that contain just break or return. Clang's version is more in line with the kernel's own stance in deprecated.rst, which states that all switch/case blocks must end in either break, fallthrough, continue, goto, or return. Add the missing breaks to fix the warning. Link: https://github.com/ClangBuiltLinux/linux/issues/1491 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: mana: Allow setting the number of queues while the NIC is downHaiyang Zhang
The existing code doesn't allow setting the number of queues while the NIC is down. Update the ethtool handler functions to support setting the number of queues while the NIC is at down state. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: hsr: Add support for redbox supervision framesAndreas Oetken
added support for the redbox supervision frames as defined in the IEC-62439-3:2018. Signed-off-by: Andreas Oetken <andreas.oetken@siemens-energy.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: batman-adv: fix error handlingPavel Skripkin
Syzbot reported ODEBUG warning in batadv_nc_mesh_free(). The problem was in wrong error handling in batadv_mesh_init(). Before this patch batadv_mesh_init() was calling batadv_mesh_free() in case of any batadv_*_init() calls failure. This approach may work well, when there is some kind of indicator, which can tell which parts of batadv are initialized; but there isn't any. All written above lead to cleaning up uninitialized fields. Even if we hide ODEBUG warning by initializing bat_priv->nc.work, syzbot was able to hit GPF in batadv_nc_purge_paths(), because hash pointer in still NULL. [1] To fix these bugs we can unwind batadv_*_init() calls one by one. It is good approach for 2 reasons: 1) It fixes bugs on error handling path 2) It improves the performance, since we won't call unneeded batadv_*_free() functions. So, this patch makes all batadv_*_init() clean up all allocated memory before returning with an error to no call correspoing batadv_*_free() and open-codes batadv_mesh_free() with proper order to avoid touching uninitialized fields. Link: https://lore.kernel.org/netdev/000000000000c87fbd05cef6bcb0@google.com/ [1] Reported-and-tested-by: syzbot+28b0702ada0bf7381f58@syzkaller.appspotmail.com Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26Merge branch 'tcp_stream_alloc_skb'David S. Miller
Eric Dumazet says: ==================== tcp: tcp_stream_alloc_skb() changes sk_stream_alloc_skb() is only used by TCP. Rename it to tcp_stream_alloc_skb() and apply small optimizations. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26tcp: remove unneeded code from tcp_stream_alloc_skb()Eric Dumazet
Aligning @size argument to 4 bytes is not needed. The header alignment has nothing to do with @size. It really depends on skb->head alignment and MAX_TCP_HEADER. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26tcp: use MAX_TCP_HEADER in tcp_stream_alloc_skbEric Dumazet
Both IPv4 and IPv6 uses same reserve, no need risking cache line misses to fetch its value. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26tcp: rename sk_stream_alloc_skbEric Dumazet
sk_stream_alloc_skb() is only used by TCP. Rename it to make this clear, and move its declaration to include/net/tcp.h Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26tracing/hwlat: Make some internal symbols staticWang ShaoBo
The sparse tool complains as follows: kernel/trace/trace_hwlat.c:82:27: warning: symbol 'hwlat_single_cpu_data' was not declared. Should it be static? kernel/trace/trace_hwlat.c:83:1: warning: symbol '__pcpu_scope_hwlat_per_cpu_data' was not declared. Should it be static? This symbol is not used outside of trace_hwlat.c, so this commit marks it static. Link: https://lkml.kernel.org/r/20211021035225.1050685-1-bobo.shaobowang@huawei.com Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Fix missing trace_boot_init_histograms kstrdup NULL checksMathieu Desnoyers
trace_boot_init_histograms misses NULL pointer checks for kstrdup failure. Link: https://lkml.kernel.org/r/20211015195550.22742-1-mathieu.desnoyers@efficios.com Fixes: 64dc7f6958ef5 ("tracing/boot: Show correct histogram error command") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26net: annotate data-race in neigh_output()Eric Dumazet
neigh_output() reads n->nud_state and hh->hh_len locklessly. This is fine, but we need to add annotations and document this. We evaluate skip_cache first to avoid reading these fields if the cache has to by bypassed. syzbot report: BUG: KCSAN: data-race in __neigh_event_send / ip_finish_output2 write to 0xffff88810798a885 of 1 bytes by interrupt on cpu 1: __neigh_event_send+0x40d/0xac0 net/core/neighbour.c:1128 neigh_event_send include/net/neighbour.h:444 [inline] neigh_resolve_output+0x104/0x410 net/core/neighbour.c:1476 neigh_output include/net/neighbour.h:510 [inline] ip_finish_output2+0x80a/0xaa0 net/ipv4/ip_output.c:221 ip_finish_output+0x3b5/0x510 net/ipv4/ip_output.c:309 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:423 dst_output include/net/dst.h:450 [inline] ip_local_out+0x164/0x220 net/ipv4/ip_output.c:126 __ip_queue_xmit+0x9d3/0xa20 net/ipv4/ip_output.c:525 ip_queue_xmit+0x34/0x40 net/ipv4/ip_output.c:539 __tcp_transmit_skb+0x142a/0x1a00 net/ipv4/tcp_output.c:1405 tcp_transmit_skb net/ipv4/tcp_output.c:1423 [inline] tcp_xmit_probe_skb net/ipv4/tcp_output.c:4011 [inline] tcp_write_wakeup+0x4a9/0x810 net/ipv4/tcp_output.c:4064 tcp_send_probe0+0x2c/0x2b0 net/ipv4/tcp_output.c:4079 tcp_probe_timer net/ipv4/tcp_timer.c:398 [inline] tcp_write_timer_handler+0x394/0x520 net/ipv4/tcp_timer.c:626 tcp_write_timer+0xb9/0x180 net/ipv4/tcp_timer.c:642 call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1421 expire_timers+0x135/0x240 kernel/time/timer.c:1466 __run_timers+0x368/0x430 kernel/time/timer.c:1734 run_timer_softirq+0x19/0x30 kernel/time/timer.c:1747 __do_softirq+0x12c/0x26e kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0x4e/0xa0 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1097 asm_sysvec_apic_timer_interrupt+0x12/0x20 native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline] arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline] acpi_safe_halt drivers/acpi/processor_idle.c:109 [inline] acpi_idle_do_entry drivers/acpi/processor_idle.c:553 [inline] acpi_idle_enter+0x258/0x2e0 drivers/acpi/processor_idle.c:688 cpuidle_enter_state+0x2b4/0x760 drivers/cpuidle/cpuidle.c:237 cpuidle_enter+0x3c/0x60 drivers/cpuidle/cpuidle.c:351 call_cpuidle kernel/sched/idle.c:158 [inline] cpuidle_idle_call kernel/sched/idle.c:239 [inline] do_idle+0x1a3/0x250 kernel/sched/idle.c:306 cpu_startup_entry+0x15/0x20 kernel/sched/idle.c:403 secondary_startup_64_no_verify+0xb1/0xbb read to 0xffff88810798a885 of 1 bytes by interrupt on cpu 0: neigh_output include/net/neighbour.h:507 [inline] ip_finish_output2+0x79a/0xaa0 net/ipv4/ip_output.c:221 ip_finish_output+0x3b5/0x510 net/ipv4/ip_output.c:309 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:423 dst_output include/net/dst.h:450 [inline] ip_local_out+0x164/0x220 net/ipv4/ip_output.c:126 __ip_queue_xmit+0x9d3/0xa20 net/ipv4/ip_output.c:525 ip_queue_xmit+0x34/0x40 net/ipv4/ip_output.c:539 __tcp_transmit_skb+0x142a/0x1a00 net/ipv4/tcp_output.c:1405 tcp_transmit_skb net/ipv4/tcp_output.c:1423 [inline] tcp_xmit_probe_skb net/ipv4/tcp_output.c:4011 [inline] tcp_write_wakeup+0x4a9/0x810 net/ipv4/tcp_output.c:4064 tcp_send_probe0+0x2c/0x2b0 net/ipv4/tcp_output.c:4079 tcp_probe_timer net/ipv4/tcp_timer.c:398 [inline] tcp_write_timer_handler+0x394/0x520 net/ipv4/tcp_timer.c:626 tcp_write_timer+0xb9/0x180 net/ipv4/tcp_timer.c:642 call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1421 expire_timers+0x135/0x240 kernel/time/timer.c:1466 __run_timers+0x368/0x430 kernel/time/timer.c:1734 run_timer_softirq+0x19/0x30 kernel/time/timer.c:1747 __do_softirq+0x12c/0x26e kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0x4e/0xa0 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1097 asm_sysvec_apic_timer_interrupt+0x12/0x20 native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline] arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline] acpi_safe_halt drivers/acpi/processor_idle.c:109 [inline] acpi_idle_do_entry drivers/acpi/processor_idle.c:553 [inline] acpi_idle_enter+0x258/0x2e0 drivers/acpi/processor_idle.c:688 cpuidle_enter_state+0x2b4/0x760 drivers/cpuidle/cpuidle.c:237 cpuidle_enter+0x3c/0x60 drivers/cpuidle/cpuidle.c:351 call_cpuidle kernel/sched/idle.c:158 [inline] cpuidle_idle_call kernel/sched/idle.c:239 [inline] do_idle+0x1a3/0x250 kernel/sched/idle.c:306 cpu_startup_entry+0x15/0x20 kernel/sched/idle.c:403 rest_init+0xee/0x100 init/main.c:734 arch_call_rest_init+0xa/0xb start_kernel+0x5e4/0x669 init/main.c:1142 secondary_startup_64_no_verify+0xb1/0xbb value changed: 0x20 -> 0x01 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26tipc: fix size validations for the MSG_CRYPTO typeMax VA
The function tipc_crypto_key_rcv is used to parse MSG_CRYPTO messages to receive keys from other nodes in the cluster in order to decrypt any further messages from them. This patch verifies that any supplied sizes in the message body are valid for the received message. Fixes: 1ef6f7c9390f ("tipc: add automatic session key exchange") Signed-off-by: Max VA <maxv@sentinelone.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26nfc: port100: fix using -ERRNO as command type maskKrzysztof Kozlowski
During probing, the driver tries to get a list (mask) of supported command types in port100_get_command_type_mask() function. The value is u64 and 0 is treated as invalid mask (no commands supported). The function however returns also -ERRNO as u64 which will be interpret as valid command mask. Return 0 on every error case of port100_get_command_type_mask(), so the probing will stop. Cc: <stable@vger.kernel.org> Fixes: 0347a6ab300a ("NFC: port100: Commands mechanism implementation") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>