summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-05-27net_sched: get rid of unnecessary dev_qdisc_reset()Cong Wang
Resetting old qdisc on dev_queue->qdisc_sleeping in dev_qdisc_reset() is redundant, because this qdisc, even if not same with dev_queue->qdisc, is reset via qdisc_put() right after calling dev_graft_qdisc() when hitting refcnt 0. This is very easy to observe with qdisc_reset() tracepoint and stack traces. Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net_sched: avoid resetting active qdisc for multiple timesCong Wang
Except for sch_mq and sch_mqprio, each dev queue points to the same root qdisc, so when we reset the dev queues with netdev_for_each_tx_queue() we end up resetting the same instance of the root qdisc for multiple times. Avoid this by checking the __QDISC_STATE_DEACTIVATED bit in each iteration, so for sch_mq/sch_mqprio, we still reset all of them like before, for the rest, we only reset it once. Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net_sched: add a tracepoint for qdisc creationCong Wang
With this tracepoint, we could know when qdisc's are created, especially those default qdisc's. Sample output: tc-736 [001] ...1 56.230107: qdisc_create: dev=ens3 kind=pfifo parent=1:0 tc-736 [001] ...1 56.230113: qdisc_create: dev=ens3 kind=hfsc parent=ffff:ffff tc-738 [001] ...1 56.256816: qdisc_create: dev=ens3 kind=pfifo parent=1:100 tc-739 [001] ...1 56.267584: qdisc_create: dev=ens3 kind=pfifo parent=1:200 tc-740 [001] ...1 56.279649: qdisc_create: dev=ens3 kind=fq_codel parent=1:100 tc-741 [001] ...1 56.289996: qdisc_create: dev=ens3 kind=pfifo_fast parent=1:200 tc-745 [000] .N.1 111.687483: qdisc_create: dev=ens3 kind=ingress parent=ffff:fff1 Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net_sched: add tracepoints for qdisc_reset() and qdisc_destroy()Cong Wang
Add two tracepoints for qdisc_reset() and qdisc_destroy() to track qdisc resetting and destroying. Sample output: tc-756 [000] ...3 138.355662: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355720: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355867: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355930: qdisc_destroy: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-757 [000] ...2 143.073780: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.073878: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.074114: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.074228: qdisc_destroy: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net_sched: use qdisc_reset() in qdisc_destroy()Cong Wang
qdisc_destroy() calls ops->reset() and cleans up qdisc->gso_skb and qdisc->skb_bad_txq, these are nearly same with qdisc_reset(), so just call it directly, and cosolidate the code for the next patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27drivers: ipa: remove discription of nonexistent elementWang Wenhu
No element named "client" exists within "struct ipa_endpoint". It might be a heritage forgotten to be removed. Delete it now. Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27drivers: ipa: fix typoes for ipaWang Wenhu
Change "transactio" -> "transaction". Also an alignment correction. Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27Merge branch 'tcp-tcp_v4_err-cleanups'David S. Miller
Eric Dumazet says: ==================== tcp: tcp_v4_err() cleanups This series is a followup of patch 239174945dac ("tcp: tcp_v4_err() icmp skb is named icmp_skb"). Move the RFC 6069 code into a helper, and rename icmp_skb to standard skb name so that tcp_v4_err() and tcp_v6_err() are using consistent names. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27tcp: rename tcp_v4_err() skb parameterEric Dumazet
This essentially reverts 4d1a2d9ec1c1 ("Revert Backoff [v3]: Rename skb to icmp_skb in tcp_v4_err()") Now we have tcp_ld_RTO_revert() helper, we can use the usual name for sk_buff parameter, so that tcp_v4_err() and tcp_v6_err() use similar names. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27tcp: add tcp_ld_RTO_revert() helperEric Dumazet
RFC 6069 logic has been implemented for IPv4 only so far, right in the middle of tcp_v4_err() and was error prone. Move this code to one helper, to make tcp_v4_err() more readable and to eventually expand RFC 6069 to IPv6 in the future. Also perform sock_owned_by_user() check a bit sooner. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27Merge branch 'hns3-next'David S. Miller
Huazhong Tan says: ==================== net: hns3: misc updates for -next This patchset includes some misc updates for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net: hns3: add a print for initializing CMDQ when reset pendingHuazhong Tan
When initializing CMDQ fails because of reset pending, there is no hint for debugging, so adds a log for it. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net: hns3: remove unnecessary MAC enable in app loopbackYufeng Mo
Packets will not pass through MAC during app loopback. Therefore, it is meaningless to enable MAC while doing app loopback. This patch removes this unnecessary action. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net: hns3: change the order of reinitializing RoCE and NIC client during resetYufeng Mo
The HNS RDMA driver will support VF device later, whose re-initialization should be done after PF's. This patch changes the order of hclge_reset_prepare_up() and hclge_notify_roce_client(), so that PF's RoCE client will be reinitialized before VF's. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net: hns3: add a resetting check in hclgevf_init_nic_client_instance()Guangbin Huang
To prevent from initializing VF NIC client in reset handling state, this patch adds resetting check in hclgevf_init_nic_client_instance(). Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27Merge branch 'net-mscc-allow-forwarding-ioctl-operations-to-attached-PHYs'David S. Miller
Antoine Tenart says: ==================== net: mscc: allow forwarding ioctl operations to attached PHYs These two patches allow forwarding ioctl to the PHY MII implementation, and support is added for offloading timestamping operations to compatible attached PHYs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net: mscc: allow offloading timestamping operations to the PHYAntoine Tenart
This patch adds support for offloading timestamping operations not only to the Ocelot switch (as already supported) but to compatible PHYs. When both the PHY and the Ocelot switch support timestamping operations, the PHY implementation is chosen as the timestamp will happen closer to the medium. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net: mscc: use the PHY MII ioctl interface when possibleAntoine Tenart
Allow ioctl to be implemented by the PHY, when a PHY is attached to the Ocelot switch. In case the ioctl is a request to set or get the hardware timestamp, use the Ocelot switch implementation for now. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27copy_xstate_to_kernel(): don't leave parts of destination uninitializedAl Viro
copy the corresponding pieces of init_fpstate into the gaps instead. Cc: stable@kernel.org Tested-by: Alexander Potapenko <glider@google.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-27netfilter: nf_tables: skip flowtable hooknum and priority on device updatesPablo Neira Ayuso
On device updates, the hooknum and priority attributes are not required. This patch makes optional these two netlink attributes. Moreover, bail out with EOPNOTSUPP if userspace tries to update the hooknum and priority for existing flowtables. While at this, turn EINVAL into EOPNOTSUPP in case the hooknum is not ingress. EINVAL is reserved for missing netlink attribute / malformed netlink messages. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: allow to register flowtable with no devicesPablo Neira Ayuso
A flowtable might be composed of dynamic interfaces only. Such dynamic interfaces might show up at a later stage. This patch allows users to register a flowtable with no devices. Once the dynamic interface becomes available, the user adds the dynamic devices to the flowtable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: delete devices from flowtablePablo Neira Ayuso
This patch allows users to delete devices from existing flowtables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: add devices to existing flowtablePablo Neira Ayuso
This patch allows users to add devices to an existing flowtable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: pass hook list to flowtable event notifierPablo Neira Ayuso
Update the flowtable netlink notifier to take the list of hooks as input. This allows to reuse this function in incremental flowtable hook updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: add nft_flowtable_hooks_destroy()Pablo Neira Ayuso
This patch adds a helper function destroy the flowtable hooks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: pass hook list to nft_{un,}register_flowtable_net_hooks()Pablo Neira Ayuso
This patch prepares for incremental flowtable hook updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: generalise flowtable hook parsingPablo Neira Ayuso
Update nft_flowtable_parse_hook() to take the flowtable hook list as parameter. This allows to reuse this function to update the hooks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: ctnetlink: add kernel side filtering for dumpRomain Bellan
Conntrack dump does not support kernel side filtering (only get exists, but it returns only one entry. And user has to give a full valid tuple) It means that userspace has to implement filtering after receiving many irrelevant entries, consuming resources (conntrack table is sometimes very huge, much more than a routing table for example). This patch adds filtering in kernel side. To achieve this goal, we: * Add a new CTA_FILTER netlink attributes, actually a flag list to parametize filtering * Convert some *nlattr_to_tuple() functions, to allow a partial parsing of CTA_TUPLE_ORIG and CTA_TUPLE_REPLY (so nf_conntrack_tuple it not fully set) Filtering is now possible on: * IP SRC/DST values * Ports for TCP and UDP flows * IMCP(v6) codes types and IDs Filtering is done as an "AND" operator. For example, when flags PROTO_SRC_PORT, PROTO_NUM and IP_SRC are sets, only entries matching all values are dumped. Changes since v1: Set NLM_F_DUMP_FILTERED in nlm flags if entries are filtered Changes since v2: Move several constants to nf_internals.h Move a fix on netlink values check in a separate patch Add a check on not-supported flags Return EOPNOTSUPP if CDA_FILTER is set in ctnetlink_flush_conntrack (not yet implemented) Code style issues Changes since v3: Fix compilation warning reported by kbuild test robot Changes since v4: Fix a regression introduced in v3 (returned EINVAL for valid netlink messages without CTA_MARK) Changes since v5: Change definition of CTA_FILTER_F_ALL Fix a regression when CTA_TUPLE_ZONE is not set Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27net: dsa: felix: send VLANs on CPU port as egress-taggedVladimir Oltean
As explained in other commits before (b9cd75e66895 and 87b0f983f66f), ocelot switches have a single egress-untagged VLAN per port, and the driver would deny adding a second one while an egress-untagged VLAN already exists. But on the CPU port (where the VLAN configuration is implicit, because there is no net device for the bridge to control), the DSA core attempts to add a VLAN using the same flags as were used for the front-panel port. This would make adding any untagged VLAN fail due to the CPU port rejecting the configuration: bridge vlan add dev swp0 vid 100 pvid untagged [ 1865.854253] mscc_felix 0000:00:00.5: Port already has a native VLAN: 1 [ 1865.860824] mscc_felix 0000:00:00.5: Failed to add VLAN 100 to port 5: -16 (note that port 5 is the CPU port and not the front-panel swp0). So this hardware will send all VLANs as tagged towards the CPU. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net: dsa: felix: accept VLAN config regardless of bridge VLAN awareness stateVladimir Oltean
The ocelot core library is written with the idea in mind that the VLAN table is populated by the bridge. Otherwise, not even a sane default pvid is provided: in standalone mode, the default pvid is 0, and the core expects the bridge layer to change it to 1. So without this patch, the VLAN table is completely empty at the end of the commands below, and traffic is broken as a result: ip link add dev br0 type bridge vlan_filtering 0 && ip link set dev br0 up for eth in $(ls /sys/bus/pci/devices/0000\:00\:00.5/net/); do ip link set dev $eth master br0 ip link set dev $eth up done ip link set dev br0 type bridge vlan_filtering 1 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net: add large ecmp group nexthop testsStephen Worley
Add a couple large ecmp group nexthop selftests to cover the remnant fixed by d69100b8eee27c2d60ee52df76e0b80a8d492d34. The tests create 100 x32 ecmp groups of ipv4 and ipv6 and then dump them. On kernels without the fix, they will fail due to data remnant during the dump. Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27bridge: multicast: work around clang bugArnd Bergmann
Clang-10 and clang-11 run into a corner case of the register allocator on 32-bit ARM, leading to excessive stack usage from register spilling: net/bridge/br_multicast.c:2422:6: error: stack frame size of 1472 bytes in function 'br_multicast_get_stats' [-Werror,-Wframe-larger-than=] Work around this by marking one of the internal functions as noinline_for_stack. Link: https://bugs.llvm.org/show_bug.cgi?id=45802#c9 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll()Dongli Zhang
There may be a race between nvme_reap_pending_cqes() and nvme_poll(), e.g., when doing live reset while polling the nvme device. CPU X CPU Y nvme_poll() nvme_dev_disable() -> nvme_stop_queues() -> nvme_suspend_io_queues() -> nvme_suspend_queue() -> spin_lock(&nvmeq->cq_poll_lock); -> nvme_reap_pending_cqes() -> nvme_process_cq() -> nvme_process_cq() In the above scenario, the nvme_process_cq() for the same queue may be running on both CPU X and CPU Y concurrently. It is much more easier to reproduce the issue when CONFIG_PREEMPT is enabled in kernel. When CONFIG_PREEMPT is disabled, it would take longer time for nvme_stop_queues()-->blk_mq_quiesce_queue() to wait for grace period. This patch protects nvme_process_cq() with nvmeq->cq_poll_lock in nvme_reap_pending_cqes(). Fixes: fa46c6fb5d61 ("nvme/pci: move cqe check after device shutdown") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-05-27mtk-star-emac: mark PM functions as __maybe_unusedArnd Bergmann
Without CONFIG_PM, the compiler warns about two unused functions: drivers/net/ethernet/mediatek/mtk_star_emac.c:1472:12: error: unused function 'mtk_star_suspend' [-Werror,-Wunused-function] drivers/net/ethernet/mediatek/mtk_star_emac.c:1488:12: error: unused function 'mtk_star_resume' [-Werror,-Wunused-function] Mark these as __maybe_unused. Fixes: 8c7bd5a454ff ("net: ethernet: mtk-star-emac: new driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27bridge: mrp: Rework the MRP netlink interfaceHoratiu Vultur
This patch reworks the MRP netlink interface. Before, each attribute represented a binary structure which made it hard to be extended. Therefore update the MRP netlink interface such that each existing attribute to be a nested attribute which contains the fields of the binary structures. In this way the MRP netlink interface can be extended without breaking the backwards compatibility. It is also using strict checking for attributes under the MRP top attribute. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net: dsa: b53: remove redundant premature assignment to new_pvidColin Ian King
Variable new_pvid is being assigned with a value that is never read, the following if statement updates new_pvid with a new value in both of the if paths. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net: ethernet: mtk-star-emac: fix error path in RX handlingBartosz Golaszewski
The dma_addr field in desc_data must not be overwritten until after the new skb is mapped. Currently we do replace it with uninitialized value in error path. This change fixes it by moving the assignment before the label to which we jump after mapping or allocation errors. Fixes: 8c7bd5a454ff ("net: ethernet: mtk-star-emac: new driver") Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> # build Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27mlxsw: spectrum_router: remove redundant initialization of pointer br_devColin Ian King
The pointer br_dev is being initialized with a value that is never read and is being updated with a new value later on. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27nexthop: Fix type of event_type in call_nexthop_notifiersNathan Chancellor
Clang warns: net/ipv4/nexthop.c:841:30: warning: implicit conversion from enumeration type 'enum nexthop_event_type' to different enumeration type 'enum fib_event_type' [-Wenum-conversion] call_nexthop_notifiers(net, NEXTHOP_EVENT_DEL, nh); ~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~ 1 warning generated. Use the right type for event_type so that clang does not warn. Fixes: 8590ceedb701 ("nexthop: add support for notifiers") Link: https://github.com/ClangBuiltLinux/linux/issues/1038 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27vsock: fix timeout in vsock_accept()Stefano Garzarella
The accept(2) is an "input" socket interface, so we should use SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout. So this patch replace sock_sndtimeo() with sock_rcvtimeo() to use the right timeout in the vsock_accept(). Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27nfp: flower: fix used time of merge flow statisticsHeinrich Kuhn
Prior to this change the correct value for the used counter is calculated but not stored nor, therefore, propagated to user-space. In use-cases such as OVS use-case at least this results in active flows being removed from the hardware datapath. Which results in both unnecessary flow tear-down and setup, and packet processing on the host. This patch addresses the problem by saving the calculated used value which allows the value to propagate to user-space. Found by inspection. Fixes: aa6ce2ea0c93 ("nfp: flower: support stats update for merge flows") Signed-off-by: Heinrich Kuhn <heinrich.kuhn@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net/sched: fix infinite loop in sch_fq_pieDavide Caratti
this command hangs forever: # tc qdisc add dev eth0 root fq_pie flows 65536 watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [tc:1028] [...] CPU: 1 PID: 1028 Comm: tc Not tainted 5.7.0-rc6+ #167 RIP: 0010:fq_pie_init+0x60e/0x8b7 [sch_fq_pie] Code: 4c 89 65 50 48 89 f8 48 c1 e8 03 42 80 3c 30 00 0f 85 2a 02 00 00 48 8d 7d 10 4c 89 65 58 48 89 f8 48 c1 e8 03 42 80 3c 30 00 <0f> 85 a7 01 00 00 48 8d 7d 18 48 c7 45 10 46 c3 23 00 48 89 f8 48 RSP: 0018:ffff888138d67468 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: 1ffff9200018d2b2 RBX: ffff888139c1c400 RCX: ffffffffffffffff RDX: 000000000000c5e8 RSI: ffffc900000e5000 RDI: ffffc90000c69590 RBP: ffffc90000c69580 R08: fffffbfff79a9699 R09: fffffbfff79a9699 R10: 0000000000000700 R11: fffffbfff79a9698 R12: ffffc90000c695d0 R13: 0000000000000000 R14: dffffc0000000000 R15: 000000002347c5e8 FS: 00007f01e1850e40(0000) GS:ffff88814c880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000067c340 CR3: 000000013864c000 CR4: 0000000000340ee0 Call Trace: qdisc_create+0x3fd/0xeb0 tc_modify_qdisc+0x3be/0x14a0 rtnetlink_rcv_msg+0x5f3/0x920 netlink_rcv_skb+0x121/0x350 netlink_unicast+0x439/0x630 netlink_sendmsg+0x714/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5b4/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x9a/0x370 entry_SYSCALL_64_after_hwframe+0x44/0xa9 we can't accept 65536 as a valid number for 'nflows', because the loop on 'idx' in fq_pie_init() will never end. The extack message is correct, but it doesn't say that 0 is not a valid number for 'flows': while at it, fix this also. Add a tdc selftest to check correct validation of 'flows'. CC: Ivan Vecera <ivecera@redhat.com> Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27Merge tag 'fsnotify_for_v5.7-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify FAN_DIR_MODIFY disabling from Jan Kara: "A single patch that disables FAN_DIR_MODIFY support that was merged in this merge window. When discussing further functionality we realized it may be more logical to guard it with a feature flag or to call things slightly differently (or maybe not) so let's not set the API in stone for now." * tag 'fsnotify_for_v5.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: turn off support for FAN_DIR_MODIFY
2020-05-27Merge branch 'for-5.7-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Reverted stricter synchronization for cgroup recursive stats which was prepping it for event counter usage which never got merged. The change was causing performation regressions in some cases. - Restore bpf-based device-cgroup operation even when cgroup1 device cgroup is disabled. - An out-param init fix. * 'for-5.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: device_cgroup: Cleanup cgroup eBPF device filter code xattr: fix uninitialized out-param Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window"
2020-05-27RDMA/core: Fix double destruction of uobjectJason Gunthorpe
Fix use after free when user user space request uobject concurrently for the same object, within the RCU grace period. In that case, remove_handle_idr_uobject() is called twice and we will have an extra put on the uobject which cause use after free. Fix it by leaving the uobject write locked after it was removed from the idr. Call to rdma_lookup_put_uobject with UVERBS_LOOKUP_DESTROY instead of UVERBS_LOOKUP_WRITE will do the work. refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 1381 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 1381 Comm: syz-executor.0 Not tainted 5.5.0-rc3 #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x94/0xce panic+0x234/0x56f __warn+0x1cc/0x1e1 report_bug+0x200/0x310 fixup_bug.part.11+0x32/0x80 do_error_trap+0xd3/0x100 do_invalid_op+0x31/0x40 invalid_op+0x1e/0x30 RIP: 0010:refcount_warn_saturate+0xfe/0x1a0 Code: 0f 0b eb 9b e8 23 f6 6d ff 80 3d 6c d4 19 03 00 75 8d e8 15 f6 6d ff 48 c7 c7 c0 02 55 bd c6 05 57 d4 19 03 01 e8 a2 58 49 ff <0f> 0b e9 6e ff ff ff e8 f6 f5 6d ff 80 3d 42 d4 19 03 00 0f 85 5c RSP: 0018:ffffc90002df7b98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88810f6a193c RCX: ffffffffba649009 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88811b0283cc RBP: 0000000000000003 R08: ffffed10236060e3 R09: ffffed10236060e3 R10: 0000000000000001 R11: ffffed10236060e2 R12: ffff88810f6a193c R13: ffffc90002df7d60 R14: 0000000000000000 R15: ffff888116ae6a08 uverbs_uobject_put+0xfd/0x140 __uobj_perform_destroy+0x3d/0x60 ib_uverbs_close_xrcd+0x148/0x170 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465b49 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f759d122c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bfa8 RCX: 0000000000465b49 RDX: 000000000000000c RSI: 0000000020000080 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f759d1236bc R13: 00000000004ca27c R14: 000000000070de40 R15: 00000000ffffffff Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: 0x39400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: 7452a3c745a2 ("IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate") Link: https://lore.kernel.org/r/20200527135534.482279-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27fanotify: turn off support for FAN_DIR_MODIFYAmir Goldstein
FAN_DIR_MODIFY has been enabled by commit 44d705b0370b ("fanotify: report name info for FAN_DIR_MODIFY event") in 5.7-rc1. Now we are planning further extensions to the fanotify API and during that we realized that FAN_DIR_MODIFY may behave slightly differently to be more consistent with extensions we plan. So until we finalize these extensions, let's not bind our hands with exposing FAN_DIR_MODIFY to userland. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-27Merge branch 'exec-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull execve fix from Eric Biederman: "While working on my exec cleanups I found a bug in exec that winds up miscomputing the ambient credentials during exec. Andy appears to have to been confused as to why credentials are computed for both the script and the interpreter From the original patch description: [3] Linux very confusingly processes both the script and the interpreter if applicable, for reasons that elude me. The results from thinking about a script's file capabilities and/or setuid bits are mostly discarded. The only value in struct cred that gets changed in cap_bprm_set_creds that I could find that might persist between the script and the interpreter was cap_ambient. Which is fixed with this trivial change" * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: exec: Always set cap_ambient in cap_bprm_set_creds
2020-05-27x86: Hide the archdata.iommu field behind generic IOMMU_APIKrzysztof Kozlowski
There is a generic, kernel wide configuration symbol for enabling the IOMMU specific bits: CONFIG_IOMMU_API. Implementations (including INTEL_IOMMU and AMD_IOMMU driver) select it so use it here as well. This makes the conditional archdata.iommu field consistent with other platforms and also fixes any compile test builds of other IOMMU drivers, when INTEL_IOMMU or AMD_IOMMU are not selected). For the case when INTEL_IOMMU/AMD_IOMMU and COMPILE_TEST are not selected, this should create functionally equivalent code/choice. With COMPILE_TEST this field could appear if other IOMMU drivers are chosen but neither INTEL_IOMMU nor AMD_IOMMU are not. Reported-by: kbuild test robot <lkp@intel.com> Fixes: e93a1695d7fb ("iommu: Enable compile testing for some of drivers") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200518120855.27822-2-krzk@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-27ia64: Hide the archdata.iommu field behind generic IOMMU_APIKrzysztof Kozlowski
There is a generic, kernel wide configuration symbol for enabling the IOMMU specific bits: CONFIG_IOMMU_API. Implementations (including INTEL_IOMMU driver) select it so use it here as well. This makes the conditional archdata.iommu field consistent with other platforms and also fixes any compile test builds of other IOMMU drivers, when INTEL_IOMMU is not selected). For the case when INTEL_IOMMU and COMPILE_TEST are not selected, this should create functionally equivalent code/choice. With COMPILE_TEST this field could appear if other IOMMU drivers are chosen but INTEL_IOMMU not. Reported-by: kbuild test robot <lkp@intel.com> Fixes: e93a1695d7fb ("iommu: Enable compile testing for some of drivers") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20200518120855.27822-1-krzk@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-27Merge tag 'gpio-fixes-for-v5.7' of ↵Linus Walleij
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into fixes gpio fixes for v5.7 - fix mutex and spinlock ordering in gpio-mlxbf2 - fix the return value checks on devm_platform_ioremap_resource in gpio-pxa and gpio-bcm-kona