summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-09-10Merge branch 'net-Fix-bridge-enslavement-failure'David S. Miller
Ido Schimmel says: ==================== net: Fix bridge enslavement failure Patch #1 fixes an issue in which an upper netdev cannot be enslaved to a bridge when it has multiple netdevs with different parent identifiers beneath it. Patch #2 adds a test case using two netdevsim instances. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10selftests: rtnetlink: Test bridge enslavement with different parent IDsIdo Schimmel
Test that an upper device of netdevs with different parent IDs can be enslaved to a bridge. The test fails without previous commit. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: Fix bridge enslavement failureIdo Schimmel
When a netdev is enslaved to a bridge, its parent identifier is queried. This is done so that packets that were already forwarded in hardware will not be forwarded again by the bridge device between netdevs belonging to the same hardware instance. The operation fails when the netdev is an upper of netdevs with different parent identifiers. Instead of failing the enslavement, have dev_get_port_parent_id() return '-EOPNOTSUPP' which will signal the bridge to skip the query operation. Other callers of the function are not affected by this change. Fixes: 7e1146e8c10c ("net: devlink: introduce devlink_compat_switch_id_get() helper") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: mvneta: fix possible use-after-free in mvneta_xdp_put_buffLorenzo Bianconi
Release first buffer as last one since it contains references to subsequent fragments. This code will be optimized introducing multi-buffer bit in xdp_buff structure. Fixes: ca0e014609f05 ("net: mvneta: move skb build after descriptors processing") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10clk: qcom: lpass: Correct goto target in lpass_core_sc7180_probe()Jing Xiangfeng
lpass_core_sc7180_probe() misses to call pm_clk_destroy() and pm_runtime_disable() in error paths. Correct goto target to fix it. This issue is found by code inspection. Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Link: https://lore.kernel.org/r/20200827141629.101802-1-jingxiangfeng@huawei.com Fixes: edab812d802d ("clk: qcom: lpass: Add support for LPASS clock controller for SC7180") Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2020-09-10s390/qeth: delay draining the TX buffersJulian Wiedmann
Wait until the QDIO data connection is severed. Otherwise the device might still be processing the buffers, and end up accessing skb data that we already freed. Fixes: 8b5026bc1693 ("s390/qeth: fix qdio teardown after early init error") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: Fix broken NETIF_F_CSUM_MASK spell in netdev_features.hMiaohe Lin
Remove the weird space inside the NETIF_F_CSUM_MASK. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: Correct the comment of dst_dev_put()Miaohe Lin
Since commit 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries"), we use blackhole_netdev to invalidate dst entries instead of loopback device anymore. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10selftests/mptcp: Better delay & reordering configurationChristoph Paasch
The delay was intended to be configured to "simulate" a high(er) BDP link. As such, it needs to be set as part of the loss-configuration and not as part of the netem reordering configuration. The reordering-config also requires a delay but that delay is the reordering-extend. So, a good approach is to set the reordering-extend as a function of the configured latency. E.g., 25% of the overall latency. To speed up the selftests, we limit the delay to 50ms maximum to avoid having the selftests run for too long. Finally, the intention of tc_reorder was that when it is unset, the test picks a random configuration. However, currently it is always initialized and thus the random config won't be picked up. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/6 Reported-and-reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10Merge branch 'tcp-add-tos-reflection-feature'David S. Miller
Wei Wang says: ==================== tcp: add tos reflection feature This patch series adds a new tcp feature to reflect TOS value received in SYN, and send it out in SYN-ACK, and eventually set the TOS value of the established socket with this reflected TOS value. This provides a way to set the traffic class/QoS level for all traffic in the same connection to be the same as the incoming SYN. It could be useful for datacenters to provide equivalent QoS according to the incoming request. This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is by default turned off. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10tcp: reflect tos value received in SYN to the socketWei Wang
This commit adds a new TCP feature to reflect the tos value received in SYN, and send it out on the SYN-ACK, and eventually set the tos value of the established socket with this reflected tos value. This provides a way to set the traffic class/QoS level for all traffic in the same connection to be the same as the incoming SYN request. It could be useful in data centers to provide equivalent QoS according to the incoming request. This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is by default turned off. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10ip: pass tos into ip_build_and_send_pkt()Wei Wang
This commit adds tos as a new passed in parameter to ip_build_and_send_pkt() which will be used in the later commit. This is a pure restructure and does not have any functional change. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10tcp: record received TOS value in the request socketWei Wang
A new field is added to the request sock to record the TOS value received on the listening socket during 3WHS: When not under syn flood, it is recording the TOS value sent in SYN. When under syn flood, it is recording the TOS value sent in the ACK. This is a preparation patch in order to do TOS reflection in the later commit. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10Merge tag 'f2fs-for-5.9-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "Small bug fixes for: - SMR drive fix - infinite loop when building free node ids - EOF at DIO read" * tag 'f2fs-for-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: Return EOF on unaligned end of file DIO read f2fs: fix indefinite loop scanning for free nid f2fs: Fix type of section block count variables
2020-09-10net: mventa: drop mvneta_stats from mvneta_swbm_rx_frame signatureLorenzo Bianconi
Remove mvneta_stats from mvneta_swbm_rx_frame signature since now stats are accounted in mvneta_run_xdp routine Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10Merge branch 'netpoll-make-sure-napi_list-is-safe-for-RCU-traversal'David S. Miller
Jakub Kicinski says: ==================== netpoll: make sure napi_list is safe for RCU traversal This series is a follow-up to the fix in commit 96e97bc07e90 ("net: disable netpoll on fresh napis"). To avoid any latent race conditions convert dev->napi_list to a proper RCU list. We need minor restructuring because it looks like netif_napi_del() used to be idempotent, and it may be quite hard to track down everyone who depends on that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: make sure napi_list is safe for RCU traversalJakub Kicinski
netpoll needs to traverse dev->napi_list under RCU, make sure it uses the right iterator and that removal from this list is handled safely. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: manage napi add/del idempotence explicitlyJakub Kicinski
To RCUify napi->dev_list we need to replace list_del_init() with list_del_rcu(). There is no _init() version for RCU for obvious reasons. Up until now netif_napi_del() was idempotent so to make sure it remains such add a bit which is set when NAPI is listed, and cleared when it removed. Since we don't expect multiple calls to netif_napi_add() to be correct, add a warning on that side. Now that napi_hash_add / napi_hash_del are only called by napi_add / del we can actually steal its bit. We just need to make sure hash node is initialized correctly. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: remove napi_hash_del() from driver-facing APIJakub Kicinski
We allow drivers to call napi_hash_del() before calling netif_napi_del() to batch RCU grace periods. This makes the API asymmetric and leaks internal implementation details. Soon we will want the grace period to protect more than just the NAPI hash table. Restructure the API and have drivers call a new function - __netif_napi_del() if they want to take care of RCU waits. Note that only core was checking the return status from napi_hash_del() so the new helper does not report if the NAPI was actually deleted. Some notes on driver oddness: - veth observed the grace period before calling netif_napi_del() but that should not matter - myri10ge observed normal RCU flavor - bnx2x and enic did not actually observe the grace period (unless they did so implicitly) - virtio_net and enic only unhashed Rx NAPIs The last two points seem to indicate that the calls to napi_hash_del() were a left over rather than an optimization. Regardless, it's easy enough to correct them. This patch may introduce extra synchronize_net() calls for interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on free_netdev() to call netif_napi_del(). This seems inevitable since we want to use RCU for netpoll dev->napi_list traversal, and almost no drivers set IFF_DISABLE_NETPOLL. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10hdlc_ppp: add range checks in ppp_cp_parse_cr()Dan Carpenter
There are a couple bugs here: 1) If opt[1] is zero then this results in a forever loop. If the value is less than 2 then it is invalid. 2) It assumes that "len" is more than sizeof(valid_accm) or 6 which can result in memory corruption. In the case of LCP_OPTION_ACCM, then we should check "opt[1]" instead of "len" because, if "opt[1]" is less than sizeof(valid_accm) then "nak_len" gets out of sync and it can lead to memory corruption in the next iterations through the loop. In case of LCP_OPTION_MAGIC, the only valid value for opt[1] is 6, but the code is trying to log invalid data so we should only discard the data when "len" is less than 6 because that leads to a read overflow. Reported-by: ChenNan Of Chaitin Security Research Lab <whutchennan@gmail.com> Fixes: e022c2f07ae5 ("WAN: new synchronous PPP implementation for generic HDLC.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: phy: call phy_disable_interrupts() in phy_attach_direct() insteadYoshihiro Shimoda
Since the micrel phy driver calls phy_init_hw() as a workaround, the commit 9886a4dbd2aa ("net: phy: call phy_disable_interrupts() in phy_init_hw()") disables the interrupt unexpectedly. So, call phy_disable_interrupts() in phy_attach_direct() instead. Otherwise, the phy cannot link up after the ethernet cable was disconnected. Note that other drivers (like at803x.c) also calls phy_init_hw(). So, perhaps, the driver caused a similar issue too. Fixes: 9886a4dbd2aa ("net: phy: call phy_disable_interrupts() in phy_init_hw()") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10hv_netvsc: Cache the current data path to avoid duplicate call and messageDexuan Cui
The previous change "hv_netvsc: Switch the data path at the right time during hibernation" adds the call of netvsc_vf_changed() upon NETDEV_CHANGE, so it's necessary to avoid the duplicate call and message when the VF is brought UP or DOWN. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10hv_netvsc: Switch the data path at the right time during hibernationDexuan Cui
When netvsc_resume() is called, the mlx5 VF NIC has not been resumed yet, so in the future the host might sliently fail the call netvsc_vf_changed() -> netvsc_switch_datapath() there, even if the call works now. Call netvsc_vf_changed() in the NETDEV_CHANGE event handler: at that time the mlx5 VF NIC has been resumed. Fixes: 19162fd4063a ("hv_netvsc: Fix hibernation for mlx5 VF driver") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10Merge branch 'mlx4-avoid-devlink-port-type-not-set-warnings'David S. Miller
Jakub Kicinski says: ==================== mlx4: avoid devlink port type not set warnings This small set addresses the issue of mlx4 potentially not setting devlink port type when Ethernet or IB driver is not built, but port has that type. v2: - add patch 1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10mlx4: make sure to always set the port typeJakub Kicinski
Even tho mlx4_core registers the devlink ports, it's mlx4_en and mlx4_ib which set their type. In situations where one of the two is not built yet the machine has ports of given type we see the devlink warning from devlink_port_type_warn() trigger. Having ports of a type not supported by the kernel may seem surprising, but it does occur in practice - when the unsupported port is not plugged in to a switch anyway users are more than happy not to see it (and potentially allocate any resources to it). Set the type in mlx4_core if type-specific driver is not built. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10devlink: don't crash if netdev is NULLJakub Kicinski
Following change will add support for a corner case where we may not have a netdev to pass to devlink_port_type_eth_set() but we still want to set port type. This is definitely a corner case, and drivers should not normally pass NULL netdev - print a warning message when this happens. Sadly for other port types (ib) switches don't have a device reference, the way we always do for Ethernet, so we can't put the warning in __devlink_port_type_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: mvneta: rely on MVNETA_MAX_RX_BUF_SIZE for pkt split in ↵Lorenzo Bianconi
mvneta_swbm_rx_frame() In order to easily change the rx buffer size, rely on MVNETA_MAX_RX_BUF_SIZE instead of PAGE_SIZE in mvneta_swbm_rx_frame routine for rx buffer split. Currently this is not an issue since we set MVNETA_MAX_RX_BUF_SIZE to PAGE_SIZE - MVNETA_SKB_PAD but it is a good to have to configure a different rx buffer size. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: sch_generic: aviod concurrent reset and enqueue op for lockless qdiscYunsheng Lin
Currently there is concurrent reset and enqueue operation for the same lockless qdisc when there is no lock to synchronize the q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in qdisc_deactivate() called by dev_deactivate_queue(), which may cause out-of-bounds access for priv->ring[] in hns3 driver if user has requested a smaller queue num when __dev_xmit_skb() still enqueue a skb with a larger queue_mapping after the corresponding qdisc is reset, and call hns3_nic_net_xmit() with that skb later. Reused the existing synchronize_net() in dev_deactivate_many() to make sure skb with larger queue_mapping enqueued to old qdisc(which is saved in dev_queue->qdisc_sleeping) will always be reset when dev_reset_queue() is called. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10selftests: bpf: Test iterating a sockmapLorenz Bauer
Add a test that exercises a basic sockmap / sockhash iteration. For now we simply count the number of elements seen. Once sockmap update from iterators works we can extend this to perform a full copy. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200909162712.221874-4-lmb@cloudflare.com
2020-09-10net: dsa: microchip: look for phy-mode in port nodesHelmut Grohne
Documentation/devicetree/bindings/net/dsa/dsa.txt says that the phy-mode property should be specified on port nodes. However, the microchip drivers read it from the switch node. Let the driver use the per-port property and fall back to the old location with a warning. Fix in-tree users. Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de> Link: https://lore.kernel.org/netdev/20200617082235.GA1523@laureti-dev/ Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: Allow iterating sockmap and sockhashLorenz Bauer
Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and hashtable implementation. sockmap and sockhash share the same iteration context: a pointer to an arbitrary key and a pointer to a socket. Both pointers may be NULL, and so BPF has to perform a NULL check before accessing them. Technically it's not possible for sockhash iteration to yield a NULL socket, but we ignore this to be able to use a single iteration point. Iteration will visit all keys that remain unmodified during the lifetime of the iterator. It may or may not visit newly added ones. Switch from using rcu_dereference_raw to plain rcu_dereference, so we gain another guard rail if CONFIG_PROVE_RCU is enabled. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200909162712.221874-3-lmb@cloudflare.com
2020-09-10net: sockmap: Remove unnecessary sk_fullsock checksLorenz Bauer
The lookup paths for sockmap and sockhash currently include a check that returns NULL if the socket we just found is not a full socket. However, this check is not necessary. On insertion we ensure that we have a full socket (caveat around sock_ops), so request sockets are not a problem. Time-wait sockets are allocated separate from the original socket and then fed into the hashdance. They don't affect the sockets already stored in the sockmap. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200909162712.221874-2-lmb@cloudflare.com
2020-09-10mptcp: fix kmalloc flag in mptcp_pm_nl_get_local_idGeliang Tang
mptcp_pm_nl_get_local_id may be called in interrupt context, so we need to use GFP_ATOMIC flag to allocate memory to avoid sleeping in atomic context. [ 280.209809] BUG: sleeping function called from invalid context at mm/slab.h:498 [ 280.209812] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1680, name: kworker/1:3 [ 280.209814] INFO: lockdep is turned off. [ 280.209816] CPU: 1 PID: 1680 Comm: kworker/1:3 Tainted: G W 5.9.0-rc3-mptcp+ #146 [ 280.209818] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 280.209820] Workqueue: events mptcp_worker [ 280.209822] Call Trace: [ 280.209824] <IRQ> [ 280.209826] dump_stack+0x77/0xa0 [ 280.209829] ___might_sleep.cold+0xa6/0xb6 [ 280.209832] kmem_cache_alloc_trace+0x1d1/0x290 [ 280.209835] mptcp_pm_nl_get_local_id+0x23c/0x410 [ 280.209840] subflow_init_req+0x1e9/0x2ea [ 280.209843] ? inet_reqsk_alloc+0x1c/0x120 [ 280.209845] ? kmem_cache_alloc+0x264/0x290 [ 280.209849] tcp_conn_request+0x303/0xae0 [ 280.209854] ? printk+0x53/0x6a [ 280.209857] ? tcp_rcv_state_process+0x28f/0x1374 [ 280.209859] tcp_rcv_state_process+0x28f/0x1374 [ 280.209864] ? tcp_v4_do_rcv+0xb3/0x1f0 [ 280.209866] tcp_v4_do_rcv+0xb3/0x1f0 [ 280.209869] tcp_v4_rcv+0xed6/0xfa0 [ 280.209873] ip_protocol_deliver_rcu+0x28/0x270 [ 280.209875] ip_local_deliver_finish+0x89/0x120 [ 280.209877] ip_local_deliver+0x180/0x220 [ 280.209881] ip_rcv+0x166/0x210 [ 280.209885] __netif_receive_skb_one_core+0x82/0x90 [ 280.209888] process_backlog+0xd6/0x230 [ 280.209891] net_rx_action+0x13a/0x410 [ 280.209895] __do_softirq+0xcf/0x468 [ 280.209899] asm_call_on_stack+0x12/0x20 [ 280.209901] </IRQ> [ 280.209903] ? ip_finish_output2+0x240/0x9a0 [ 280.209906] do_softirq_own_stack+0x4d/0x60 [ 280.209908] do_softirq.part.0+0x2b/0x60 [ 280.209911] __local_bh_enable_ip+0x9a/0xa0 [ 280.209913] ip_finish_output2+0x264/0x9a0 [ 280.209916] ? rcu_read_lock_held+0x4d/0x60 [ 280.209920] ? ip_output+0x7a/0x250 [ 280.209922] ip_output+0x7a/0x250 [ 280.209925] ? __ip_finish_output+0x330/0x330 [ 280.209928] __ip_queue_xmit+0x1dc/0x5a0 [ 280.209931] __tcp_transmit_skb+0xa0f/0xc70 [ 280.209937] tcp_connect+0xb03/0xff0 [ 280.209939] ? lockdep_hardirqs_on_prepare+0xe7/0x190 [ 280.209942] ? ktime_get_with_offset+0x125/0x150 [ 280.209944] ? trace_hardirqs_on+0x1c/0xe0 [ 280.209948] tcp_v4_connect+0x449/0x550 [ 280.209953] __inet_stream_connect+0xbb/0x320 [ 280.209955] ? mark_held_locks+0x49/0x70 [ 280.209958] ? lockdep_hardirqs_on_prepare+0xe7/0x190 [ 280.209960] ? __local_bh_enable_ip+0x6b/0xa0 [ 280.209963] inet_stream_connect+0x32/0x50 [ 280.209966] __mptcp_subflow_connect+0x1fd/0x242 [ 280.209972] mptcp_pm_create_subflow_or_signal_addr+0x2db/0x600 [ 280.209975] mptcp_worker+0x543/0x7a0 [ 280.209980] process_one_work+0x26d/0x5b0 [ 280.209984] ? process_one_work+0x5b0/0x5b0 [ 280.209987] worker_thread+0x48/0x3d0 [ 280.209990] ? process_one_work+0x5b0/0x5b0 [ 280.209993] kthread+0x117/0x150 [ 280.209996] ? kthread_park+0x80/0x80 [ 280.209998] ret_from_fork+0x22/0x30 Fixes: 01cacb00b35cb ("mptcp: add netlink-based PM") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10Merge branch 'mptcp-fix-subflow-s-local_id-remote_id-issues'David S. Miller
Geliang Tang says: ==================== mptcp: fix subflow's local_id/remote_id issues v2: - add Fixes tags; - simply with 'return addresses_equal'; - use 'reversed Xmas tree' way. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10mptcp: fix subflow's remote_id issuesGeliang Tang
This patch set the init remote_id to zero, otherwise it will be a random number. Then it added the missing subflow's remote_id setting code both in __mptcp_subflow_connect and in subflow_ulp_clone. Fixes: 01cacb00b35cb ("mptcp: add netlink-based PM") Fixes: ec3edaa7ca6ce ("mptcp: Add handling of outgoing MP_JOIN requests") Fixes: f296234c98a8f ("mptcp: Add handling of incoming MP_JOIN requests") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10mptcp: fix subflow's local_id issuesGeliang Tang
In mptcp_pm_nl_get_local_id, skc_local is the same as msk_local, so it always return 0. Thus every subflow's local_id is 0. It's incorrect. This patch fixed this issue. Also, we need to ignore the zero address here, like 0.0.0.0 in IPv4. When we use the zero address as a local address, it means that we can use any one of the local addresses. The zero address is not a new address, we don't need to add it to PM, so this patch added a new function address_zero to check whether an address is the zero address, if it is, we ignore this address. Fixes: 01cacb00b35cb ("mptcp: add netlink-based PM") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10Merge branch 'Allow-more-than-255-IPv4-multicast-interfaces'David S. Miller
Paul Davey says: ==================== Allow more than 255 IPv4 multicast interfaces Currently it is not possible to use more than 255 multicast interfaces for IPv4 due to the format of the igmpmsg header which only has 8 bits available for the VIF ID. There is space available in the igmpmsg header to store the full VIF ID in the form of an unused byte following the VIF ID field. There is also enough space for the full VIF ID in the Netlink cache notifications, however the value is currently taken directly from the igmpmsg header and has thus already been truncated. Adding the high byte of the VIF ID into the unused3 byte of igmpmsg allows use of more than 255 IPv4 multicast interfaces. The full VIF ID is also available in the Netlink notification by assembling it from both bytes from the igmpmsg. Additionally this reveals a deficiency in the Netlink cache report notifications, they lack any means for differentiating cache reports relating to different multicast routing tables. This is easily resolved by adding the multicast route table ID to the cache reports. changes in v2: - Added high byte of VIF ID to igmpmsg struct replacing unused3 member. - Assemble VIF ID in Netlink notification from both bytes in igmpmsg header. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10ipmr: Use full VIF ID in netlink cache reportsPaul Davey
Insert the full 16 bit VIF ID into ipmr Netlink cache reports. The VIF_ID attribute has 32 bits of space so can store the full VIF ID extracted from the high and low byte fields in the igmpmsg. Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10ipmr: Add high byte of VIF ID to igmpmsgPaul Davey
Use the unused3 byte in struct igmpmsg to hold the high 8 bits of the VIF ID. If using more than 255 IPv4 multicast interfaces it is necessary to have access to a VIF ID for cache reports that is wider than 8 bits, the VIF ID present in the igmpmsg reports sent to mroute_sk was only 8 bits wide in the igmpmsg header. Adding the high 8 bits of the 16 bit VIF ID in the unused byte allows use of more than 255 IPv4 multicast interfaces. Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10ipmr: Add route table ID to netlink cache reportsPaul Davey
Insert the multicast route table ID as a Netlink attribute to Netlink cache report notifications. When multiple route tables are in use it is necessary to have a way to determine which route table a given cache report belongs to when receiving the cache report. Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10tipc: fix shutdown() of connection oriented socketTetsuo Handa
I confirmed that the problem fixed by commit 2a63866c8b51a3f7 ("tipc: fix shutdown() of connectionless socket") also applies to stream socket. ---------- #include <sys/socket.h> #include <unistd.h> #include <sys/wait.h> int main(int argc, char *argv[]) { int fds[2] = { -1, -1 }; socketpair(PF_TIPC, SOCK_STREAM /* or SOCK_DGRAM */, 0, fds); if (fork() == 0) _exit(read(fds[0], NULL, 1)); shutdown(fds[0], SHUT_RDWR); /* This must make read() return. */ wait(NULL); /* To be woken up by _exit(). */ return 0; } ---------- Since shutdown(SHUT_RDWR) should affect all processes sharing that socket, unconditionally setting sk->sk_shutdown to SHUTDOWN_MASK will be the right behavior. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10tools: bpftool: Include common options from separate fileQuentin Monnet
Nearly all man pages for bpftool have the same common set of option flags (--help, --version, --json, --pretty, --debug). The description is duplicated across all the pages, which is more difficult to maintain if the description of an option changes. It may also be confusing to sort out what options are not "common" and should not be copied when creating new manual pages. Let's move the description for those common options to a separate file, which is included with a RST directive when generating the man pages. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200909162500.17010-3-quentin@isovalent.com
2020-09-10tools: bpftool: Print optional built-in features along with versionQuentin Monnet
Bpftool has a number of features that can be included or left aside during compilation. This includes: - Support for libbfd, providing the disassembler for JIT-compiled programs. - Support for BPF skeletons, used for profiling programs or iterating on the PIDs of processes associated with BPF objects. In order to make it easy for users to understand what features were compiled for a given bpftool binary, print the status of the two features above when showing the version number for bpftool ("bpftool -V" or "bpftool version"). Document this in the main manual page. Example invocations: $ bpftool version ./bpftool v5.9.0-rc1 features: libbfd, skeletons $ bpftool -p version { "version": "5.9.0-rc1", "features": { "libbfd": true, "skeletons": true } } Some other parameters are optional at compilation ("DISASM_FOUR_ARGS_SIGNATURE", LIBCAP support) but they do not impact significantly bpftool's behaviour from a user's point of view, so their status is not reported. Available commands and supported program types depend on the version number, and are therefore not reported either. Note that they are already available, albeit without JSON, via bpftool's help messages. v3: - Use a simple list instead of boolean values for plain output. v2: - Fix JSON (object instead or array for the features). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200909162500.17010-2-quentin@isovalent.com
2020-09-10selftests, bpftool: Add bpftool (and eBPF helpers) documentation buildQuentin Monnet
eBPF selftests include a script to check that bpftool builds correctly with different command lines. Let's add one build for bpftool's documentation so as to detect errors or warning reported by rst2man when compiling the man pages. Also add a build to the selftests Makefile to make sure we build bpftool documentation along with bpftool when building the selftests. This also builds and checks warnings for the man page for eBPF helpers, which is built along bpftool's documentation. This change adds rst2man as a dependency for selftests (it comes with Python's "docutils"). v2: - Use "--exit-status=1" option for rst2man instead of counting lines from stderr. - Also build bpftool as part as the selftests build (and not only when the tests are actually run). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200909162251.15498-3-quentin@isovalent.com
2020-09-10tools: bpftool: Log info-level messages when building bpftool man pagesQuentin Monnet
To build man pages for bpftool (and for eBPF helper functions), rst2man can log different levels of information. Let's make it log all levels to keep the RST files clean. Doing so, rst2man complains about double colons, used for literal blocks, that look like underlines for section titles. Let's add the necessary blank lines. v2: - Use "--verbose" instead of "-r 1" (same behaviour but more readable). - Pass it through a RST2MAN_OPTS variable so we can easily pass other options too. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200909162251.15498-2-quentin@isovalent.com
2020-09-10bpf: Remove duplicate headersChen Zhou
Remove duplicate headers which are included twice. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200908132201.184005-1-chenzhou10@huawei.com
2020-09-10powercap: make documentation reflect codeAmit Kucheria
Fix up the documentation of the struct powercap_control_type members to match the code. Also fixup stray whitespace. Signed-off-by: Amit Kucheria <amitk@kernel.org> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-10PM: <linux/device.h>: fix @em_pd kernel-doc warningRandy Dunlap
Fix kernel-doc warning in <linux/device.h>: ../include/linux/device.h:613: warning: Function parameter or member 'em_pd' not described in 'device' Fixes: 1bc138c62295 ("PM / EM: add support for other devices than CPUs in Energy Model") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-10powercap/intel_rapl: add support for AlderLakeZhang Rui
Add intel_rapl support for the AlderLake platform. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-10powercap/intel_rapl: add support for RocketLakeZhang Rui
Add intel_rapl support for the RocketLake platform. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>