summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-11-22ethtool: add support to set/get tx copybreak buf size via ethtoolHao Chen
Add support for ethtool to set/get tx copybreak buf size. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20af_unix: fix regression in read after shutdownVincent Whitchurch
On kernels before v5.15, calling read() on a unix socket after shutdown(SHUT_RD) or shutdown(SHUT_RDWR) would return the data previously written or EOF. But now, while read() after shutdown(SHUT_RD) still behaves the same way, read() after shutdown(SHUT_RDWR) always fails with -EINVAL. This behaviour change was apparently inadvertently introduced as part of a bug fix for a different regression caused by the commit adding sockmap support to af_unix, commit 94531cfcbe79c359 ("af_unix: Add unix_stream_proto for sockmap"). Those commits, for unclear reasons, started setting the socket state to TCP_CLOSE on shutdown(SHUT_RDWR), while this state change had previously only been done in unix_release_sock(). Restore the original behaviour. The sockmap tests in tests/selftests/bpf continue to pass after this patch. Fixes: d0c6416bd7091647f60 ("unix: Fix an issue in unix_shutdown causing the other end read/write failures") Link: https://lore.kernel.org/lkml/20211111140000.GA10779@axis.com/ Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20mptcp: use delegate action to schedule 3rd ack retransPaolo Abeni
Scheduling a delack in mptcp_established_options_mp() is not a good idea: such function is called by tcp_send_ack() and the pending delayed ack will be cleared shortly after by the tcp_event_ack_sent() call in __tcp_transmit_skb(). Instead use the mptcp delegated action infrastructure to schedule the delayed ack after the current bh processing completes. Additionally moves the schedule_3rdack_retransmission() helper into protocol.c to avoid making it visible in a different compilation unit. Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests") Reviewed-by: Mat Martineau <mathew.j.martineau>@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20mptcp: fix delack timerEric Dumazet
To compute the rtx timeout schedule_3rdack_retransmission() does multiple things in the wrong way: srtt_us is measured in usec/8 and the timeout itself is an absolute value. Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests") Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau>@linux.intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20mptcp: sockopt: add SOL_IP freebind & transparent optionsFlorian Westphal
These options also need to be set before bind, so do the sync of msk to new ssk socket a bit earlier. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20mptcp: Support for IP_TOS for MPTCP setsockopt()Poorva Sonparote
SOL_IP provides a way to configure network layer attributes in a socket. This patch adds support for IP_TOS for setsockopt(.. ,SOL_IP, ..) Support for SOL_IP is added in mptcp_setsockopt() and IP_TOS is handled in a private function. The idea here is to take in the value passed for IP_TOS and set it to the current subflow, open subflows as well new subflows that might be created after the initial call to setsockopt(). This sync is done using sync_socket_options(.., ssk) and setting the value of tos using __ip_sock_set_tos(ssk,..). The patch has been tested using the packetdrill script here - https://github.com/multipath-tcp/mptcp_net-next/issues/220#issuecomment-947863717 Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/220 Signed-off-by: Poorva Sonparote <psonparo@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20ipv4: Exposing __ip_sock_set_tos() in ip.hPoorva Sonparote
Making the static function __ip_sock_set_tos() from net/ipv4/ip_sockglue.c accessible by declaring it in include/net/ip.h The reason for doing this is to use this function to set IP_TOS value in mptcp_setsockopt() without the lock. Signed-off-by: Poorva Sonparote <psonparo@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20net: kunit: add a test for dev_addr_listsJakub Kicinski
Add a KUnit test for the dev_addr API. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20dev_addr_list: put the first addr on the treeJakub Kicinski
Since all netdev->dev_addr modifications go via dev_addr_mod() we can put it on the list. When address is change remove it and add it back. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20dev_addr: add a modification checkJakub Kicinski
netdev->dev_addr should only be modified via helpers, but someone may be casting off the const. Add a runtime check to catch abuses. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20net: unexport dev_addr_init() & dev_addr_flush()Jakub Kicinski
There are no module callers in-tree and it's hard to justify why anyone would init or flush addresses of a netdev (note the flush is more of a destructor, it frees netdev->dev_addr). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20net: constify netdev->dev_addrJakub Kicinski
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. We converted all users to make modifications via appropriate helpers, make netdev->dev_addr const. The update helpers need to upcast from the buffer to struct netdev_hw_addr. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmapJohn Fastabend
When a sock is added to a sock map we evaluate what proto op hooks need to be used. However, when the program is removed from the sock map we have not been evaluating if that changes the required program layout. Before the patch listed in the 'fixes' tag this was not causing failures because the base program set handles all cases. Specifically, the case with a stream parser and the case with out a stream parser are both handled. With the fix below we identified a race when running with a proto op that attempts to read skbs off both the stream parser and the skb->receive_queue. Namely, that a race existed where when the stream parser is empty checking the skb->receive_queue from recvmsg at the precies moment when the parser is paused and the receive_queue is not empty could result in skipping the stream parser. This may break a RX policy depending on the parser to run. The fix tag then loads a specific proto ops that resolved this race. But, we missed removing that proto ops recv hook when the sock is removed from the sockmap. The result is the stream parser is stopped so no more skbs will be aggregated there, but the hook and BPF program continues to be attached on the psock. User space will then get an EBUSY when trying to read the socket because the recvmsg() handler is now waiting on a stopped stream parser. To fix we rerun the proto ops init() function which will look at the new set of progs attached to the psock and rest the proto ops hook to the correct handlers. And in the above case where we remove the sock from the sock map the RX prog will no longer be listed so the proto ops is removed. Fixes: c5d2177a72a16 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211119181418.353932-3-john.fastabend@gmail.com
2021-11-20bpf, sockmap: Attach map progs to psock early for feature probesJohn Fastabend
When a TCP socket is added to a sock map we look at the programs attached to the map to determine what proto op hooks need to be changed. Before the patch in the 'fixes' tag there were only two categories -- the empty set of programs or a TX policy. In any case the base set handled the receive case. After the fix we have an optimized program for receive that closes a small, but possible, race on receive. This program is loaded only when the map the psock is being added to includes a RX policy. Otherwise, the race is not possible so we don't need to handle the race condition. In order for the call to sk_psock_init() to correctly evaluate the above conditions all progs need to be set in the psock before the call. However, in the current code this is not the case. We end up evaluating the requirements on the old prog state. If your psock is attached to multiple maps -- for example a tx map and rx map -- then the second update would pull in the correct maps. But, the other pattern with a single rx enabled map the correct receive hooks are not used. The result is the race fixed by the patch in the fixes tag below may still be seen in this case. To fix we simply set all psock->progs before doing the call into sock_map_init(). With this the init() call gets the full list of programs and chooses the correct proto ops on the first iteration instead of requiring the second update to pull them in. This fixes the race case when only a single map is used. Fixes: c5d2177a72a16 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211119181418.353932-2-john.fastabend@gmail.com
2021-11-19net/bridge: replace simple_strtoul to kstrtolBernard Zhao
simple_strtoull is obsolete, use kstrtol instead. Signed-off-by: Bernard Zhao <bernard@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-19ethtool: stats: Use struct_group() to clear all stats at onceKees Cook
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add struct_group() to mark region of struct stats_reply_data that should be initialized, which can now be done in a single memset() call. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-19net/af_iucv: Use struct_group() to zero struct iucv_sock regionKees Cook
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add struct_group() to mark the region of struct iucv_sock that gets initialized to zero. Avoid the future warning: In function 'fortify_memset_chk', inlined from 'iucv_sock_alloc' at net/iucv/af_iucv.c:476:2: ./include/linux/fortify-string.h:199:4: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 199 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/lkml/19ff61a0-0cda-6000-ce56-dc6b367c00d6@linux.ibm.com/ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-19ipv6: Use memset_after() to zero rt6_infoKees Cook
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Use memset_after() to clear everything after the dst_entry member of struct rt6_info. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-19net: 802: Use memset_startat() to clear struct fieldsKees Cook
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Use memset_startat() so memset() doesn't get confused about writing beyond the destination member that is intended to be the starting point of zeroing through the end of the struct. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-19net: dccp: Use memset_startat() for TP zeroingKees Cook
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Use memset_startat() so memset() doesn't get confused about writing beyond the destination member that is intended to be the starting point of zeroing through the end of the struct. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-19net/af_iucv: fix kernel doc commentsHeiko Carstens
Fix kernel doc comments where appropriate, or remove incorrect kernel doc indicators. Acked-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-19net/iucv: fix kernel doc commentsHeiko Carstens
Fix kernel doc comments where appropriate or remove incorrect kernel doc indicators. Also move kernel doc comments directly before functions. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Add selftest for vrf+conntrack, from Florian Westphal. 2) Extend nfqueue selftest to cover nfqueue, also from Florian. 3) Remove duplicated include in nft_payload, from Wan Jiabing. 4) Several improvements to the nat port shadowing selftest, from Phil Sutter. 5) Fix filtering of reply tuple in ctnetlink, from Florent Fourcot. 6) Do not override error with -EINVAL in filter setup path, also from Florent. 7) Honor sysctl_expire_nodest_conn regardless conn_reuse_mode for reused connections, from yangxingwu. 8) Replace snprintf() by sysfs_emit() in xt_IDLETIMER as reported by Coccinelle, from Jing Yao. 9) Incorrect IPv6 tunnel match in flowtable offload, from Will Mortensen. 10) Switch port shadow selftest to use socat, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-19cfg80211: move offchan_cac_event to a dedicated workLorenzo Bianconi
In order to make cfg80211_offchan_cac_abort() (renamed from cfg80211_offchan_cac_event) callable in other contexts and without so much locking restrictions, make it trigger a new work instead of operating directly. Do some other renames while at it to clarify. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/6145c3d0f30400a568023f67981981d24c7c6133.1635325205.git.lorenzo@kernel.org [rewrite commit log] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-19mac80211: introduce set_radar_offchan callbackLorenzo Bianconi
Similar to cfg80211, introduce set_radar_offchan callback in mac80211_ops in order to configure a dedicated offchannel chain available on some hw (e.g. mt7915) to perform offchannel CAC detection and avoid tx/rx downtime. Tested-by: Evelyn Tsai <evelyn.tsai@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/201110606d4f3a7dfdf31440e351f2e2c375d4f0.1634979655.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-19cfg80211: implement APIs for dedicated radar detection HWLorenzo Bianconi
If a dedicated (off-channel) radar detection hardware (chain) is available in the hardware/driver, allow this to be used by calling the NL80211_CMD_RADAR_DETECT command with a new flag attribute requesting off-channel radar detection is used. Offchannel CAC (channel availability check) avoids the CAC downtime when switching to a radar channel or when turning on the AP. Drivers advertise support for this using the new feature flag NL80211_EXT_FEATURE_RADAR_OFFCHAN. Tested-by: Evelyn Tsai <evelyn.tsai@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/7468e291ef5d05d692c1738d25b8f778d8ea5c3f.1634979655.git.lorenzo@kernel.org Link: https://lore.kernel.org/r/1e60e60fef00e14401adae81c3d49f3e5f307537.1634979655.git.lorenzo@kernel.org Link: https://lore.kernel.org/r/85fa50f57fc3adb2934c8d9ca0be30394de6b7e8.1634979655.git.lorenzo@kernel.org Link: https://lore.kernel.org/r/4b6c08671ad59aae0ac46fc94c02f31b1610eb72.1634979655.git.lorenzo@kernel.org Link: https://lore.kernel.org/r/241849ccaf2c228873c6f8495bf87b19159ba458.1634979655.git.lorenzo@kernel.org [remove offchan_mutex, fix cfg80211_stop_offchan_radar_detection(), remove gfp_t argument, fix documentation, fix tracing] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-18Merge tag 'net-5.16-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, mac80211. Current release - regressions: - devlink: don't throw an error if flash notification sent before devlink visible - page_pool: Revert "page_pool: disable dma mapping support...", turns out there are active arches who need it Current release - new code bugs: - amt: cancel delayed_work synchronously in amt_fini() Previous releases - regressions: - xsk: fix crash on double free in buffer pool - bpf: fix inner map state pruning regression causing program rejections - mac80211: drop check for DONT_REORDER in __ieee80211_select_queue, preventing mis-selecting the best effort queue - mac80211: do not access the IV when it was stripped - mac80211: fix radiotap header generation, off-by-one - nl80211: fix getting radio statistics in survey dump - e100: fix device suspend/resume Previous releases - always broken: - tcp: fix uninitialized access in skb frags array for Rx 0cp - bpf: fix toctou on read-only map's constant scalar tracking - bpf: forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs - tipc: only accept encrypted MSG_CRYPTO msgs - smc: transfer remaining wait queue entries during fallback, fix missing wake ups - udp: validate checksum in udp_read_sock() (when sockmap is used) - sched: act_mirred: drop dst for the direction from egress to ingress - virtio_net_hdr_to_skb: count transport header in UFO, prevent allowing bad skbs into the stack - nfc: reorder the logic in nfc_{un,}register_device, fix unregister - ipsec: check return value of ipv6_skip_exthdr - usb: r8152: add MAC passthrough support for more Lenovo Docks" * tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits) ptp: ocp: Fix a couple NULL vs IS_ERR() checks net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock() net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound ipv6: check return value of ipv6_skip_exthdr e100: fix device suspend/resume devlink: Don't throw an error if flash notification sent before devlink visible page_pool: Revert "page_pool: disable dma mapping support..." ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port() octeontx2-af: debugfs: don't corrupt user memory NFC: add NCI_UNREG flag to eliminate the race NFC: reorder the logic in nfc_{un,}register_device NFC: reorganize the functions in nci_request tipc: check for null after calling kmemdup i40e: Fix display error code in dmesg i40e: Fix creation of first queue by omitting it if is not power of two i40e: Fix warning message and call stack during rmmod i40e driver i40e: Fix ping is lost after configuring ADq on VF i40e: Fix changing previously set num_queue_pairs for PFs i40e: Fix NULL ptr dereference on VSI filter sync i40e: Fix correct max_pkt_size on VF RX queue ...
2021-11-18xfrm: Remove duplicate assignmentluo penghao
The statement in the switch is repeated with the statement at the beginning of the while loop, so this statement is meaningless. The clang_analyzer complains as follows: net/xfrm/xfrm_policy.c:3392:2 warning: Value stored to 'exthdr' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-11-18ipv6/esp6: Remove structure variables and alignment statementsluo penghao
The definition of this variable is just to find the length of the structure after aligning the structure. The PTR alignment function is to optimize the size of the structure. In fact, it doesn't seem to be of much use, because both members of the structure are of type u32. So I think that the definition of the variable and the corresponding alignment can be deleted, the value of extralen can be directly passed in the size of the structure. The clang_analyzer complains as follows: net/ipv6/esp6.c:117:27 warning: Value stored to 'extra' during its initialization is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-11-18mctp/test: Update refcount checking in route fragment testsJeremy Kerr
In 99ce45d5e, we moved a route refcount decrement from mctp_do_fragment_route into the caller. This invalidates the assumption that the route test makes about refcount behaviour, so the route tests fail. This change fixes the test case to suit the new refcount behaviour. Fixes: 99ce45d5e7db ("mctp: Implement extended addressing") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18ipv6: ah6: use swap() to make code cleanerYao Jing
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid opencoding it. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Yao Jing <yao.jing2@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18ipv6: check return value of ipv6_skip_exthdrJordy Zomer
The offset value is used in pointer math on skb->data. Since ipv6_skip_exthdr may return -1 the pointer to uh and th may not point to the actual udp and tcp headers and potentially overwrite other stuff. This is why I think this should be checked. EDIT: added {}'s, thanks Kees Signed-off-by: Jordy Zomer <jordy@pwning.systems> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18devlink: Don't throw an error if flash notification sent before devlink visibleLeon Romanovsky
The mlxsw driver calls to various devlink flash routines even before users can get any access to the devlink instance itself. For example, mlxsw_core_fw_rev_validate() one of such functions. __mlxsw_core_bus_device_register -> mlxsw_core_fw_rev_validate -> mlxsw_core_fw_flash -> mlxfw_firmware_flash -> mlxfw_status_notify -> devlink_flash_update_status_notify -> __devlink_flash_update_notify -> WARN_ON(...) It causes to the WARN_ON to trigger warning about devlink not registered. Fixes: cf530217408e ("devlink: Notify users when objects are accessible") Reported-by: Danielle Ratson <danieller@nvidia.com> Tested-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18page_pool: Revert "page_pool: disable dma mapping support..."Yunsheng Lin
This reverts commit d00e60ee54b12de945b8493cf18c1ada9e422514. As reported by Guillaume in [1]: Enabling LPAE always enables CONFIG_ARCH_DMA_ADDR_T_64BIT in 32-bit systems, which breaks the bootup proceess when a ethernet driver is using page pool with PP_FLAG_DMA_MAP flag. As we were hoping we had no active consumers for such system when we removed the dma mapping support, and LPAE seems like a common feature for 32 bits system, so revert it. 1. https://www.spinics.net/lists/netdev/msg779890.html Fixes: d00e60ee54b1 ("page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reported-by: "kernelci.org bot" <bot@kernelci.org> Tested-by: "kernelci.org bot" <bot@kernelci.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17ipv4/raw: support binding to nonlocal addressesRiccardo Paolo Bestetti
Add support to inet v4 raw sockets for binding to nonlocal addresses through the IP_FREEBIND and IP_TRANSPARENT socket options, as well as the ipv4.ip_nonlocal_bind kernel parameter. Add helper function to inet_sock.h to check for bind address validity on the base of the address type and whether nonlocal address are enabled for the socket via any of the sockopts/sysctl, deduplicating checks in ipv4/ping.c, ipv4/af_inet.c, ipv6/af_inet6.c (for mapped v4->v6 addresses), and ipv4/raw.c. Add test cases with IP[V6]_FREEBIND verifying that both v4 and v6 raw sockets support binding to nonlocal addresses after the change. Add necessary support for the test cases to nettest. Signed-off-by: Riccardo Paolo Bestetti <pbl@bestov.io> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20211117090010.125393-1-pbl@bestov.io Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17NFC: add NCI_UNREG flag to eliminate the raceLin Ma
There are two sites that calls queue_work() after the destroy_workqueue() and lead to possible UAF. The first site is nci_send_cmd(), which can happen after the nci_close_device as below nfcmrvl_nci_unregister_dev | nfc_genl_dev_up nci_close_device | flush_workqueue | del_timer_sync | nci_unregister_device | nfc_get_device destroy_workqueue | nfc_dev_up nfc_unregister_device | nci_dev_up device_del | nci_open_device | __nci_request | nci_send_cmd | queue_work !!! Another site is nci_cmd_timer, awaked by the nci_cmd_work from the nci_send_cmd. ... | ... nci_unregister_device | queue_work destroy_workqueue | nfc_unregister_device | ... device_del | nci_cmd_work | mod_timer | ... | nci_cmd_timer | queue_work !!! For the above two UAF, the root cause is that the nfc_dev_up can race between the nci_unregister_device routine. Therefore, this patch introduce NCI_UNREG flag to easily eliminate the possible race. In addition, the mutex_lock in nci_close_device can act as a barrier. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20211116152732.19238-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17NFC: reorder the logic in nfc_{un,}register_deviceLin Ma
There is a potential UAF between the unregistration routine and the NFC netlink operations. The race that cause that UAF can be shown as below: (FREE) | (USE) nfcmrvl_nci_unregister_dev | nfc_genl_dev_up nci_close_device | nci_unregister_device | nfc_get_device nfc_unregister_device | nfc_dev_up rfkill_destory | device_del | rfkill_blocked ... | ... The root cause for this race is concluded below: 1. The rfkill_blocked (USE) in nfc_dev_up is supposed to be placed after the device_is_registered check. 2. Since the netlink operations are possible just after the device_add in nfc_register_device, the nfc_dev_up() can happen anywhere during the rfkill creation process, which leads to data race. This patch reorder these actions to permit 1. Once device_del is finished, the nfc_dev_up cannot dereference the rfkill object. 2. The rfkill_register need to be placed after the device_add of nfc_dev because the parent device need to be created first. So this patch keeps the order but inject device_lock to prevent the data race. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: be055b2f89b5 ("NFC: RFKILL support") Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20211116152652.19217-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17NFC: reorganize the functions in nci_requestLin Ma
There is a possible data race as shown below: thread-A in nci_request() | thread-B in nci_close_device() | mutex_lock(&ndev->req_lock); test_bit(NCI_UP, &ndev->flags); | ... | test_and_clear_bit(NCI_UP, &ndev->flags) mutex_lock(&ndev->req_lock); | | This race will allow __nci_request() to be awaked while the device is getting removed. Similar to commit e2cb6b891ad2 ("bluetooth: eliminate the potential race condition when removing the HCI controller"). this patch alters the function sequence in nci_request() to prevent the data races between the nci_close_device(). Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Link: https://lore.kernel.org/r/20211115145600.8320-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17tipc: check for null after calling kmemdupTadeusz Struk
kmemdup can return a null pointer so need to check for it, otherwise the null key will be dereferenced later in tipc_crypto_key_xmit as can be seen in the trace [1]. Cc: tipc-discussion@lists.sourceforge.net Cc: stable@vger.kernel.org # 5.15, 5.14, 5.10 [1] https://syzkaller.appspot.com/bug?id=bca180abb29567b189efdbdb34cbf7ba851c2a58 Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/20211115160143.5099-1-tadeusz.struk@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17net: no longer stop all TX queues in dev_watchdog()Eric Dumazet
There is no reason for stopping all TX queues from dev_watchdog() Not only this stops feeding the NIC, it also migrates all qdiscs to be serviced on the cpu calling netif_tx_unlock(), causing a potential latency artifact. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: do not inline netif_tx_lock()/netif_tx_unlock()Eric Dumazet
These are not fast path, there is no point in inlining them. Also provide netif_freeze_queues()/netif_unfreeze_queues() so that we can use them from dev_watchdog() in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: annotate accesses to queue->trans_startEric Dumazet
In following patches, dev_watchdog() will no longer stop all queues. It will read queue->trans_start locklessly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: use an atomic_long_t for queue->trans_timeoutEric Dumazet
tx_timeout_show() assumed dev_watchdog() would stop all the queues, to fetch queue->trans_timeout under protection of the queue->_xmit_lock. As we want to no longer disrupt transmits, we use an atomic_long_t instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: david decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17Merge tag 'for-net-next-2021-11-16' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for AOSP Bluetooth Quality Report - Enables AOSP extension for Mediatek Chip (MT7921 & MT7922) - Rework of HCI command execution serialization ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: sched: act_mirred: drop dst for the direction from egress to ingressXin Long
Without dropping dst, the packets sent from local mirred/redirected to ingress will may still use the old dst. ip_rcv() will drop it as the old dst is for output and its .input is dst_discard. This patch is to fix by also dropping dst for those packets that are mirred or redirected from egress to ingress in act_mirred. Note that we don't drop it for the direction change from ingress to egress, as on which there might be a user case attaching a metadata dst by act_tunnel_key that would be used later. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16net: align static siphash keysEric Dumazet
siphash keys use 16 bytes. Define siphash_aligned_key_t macro so that we can make sure they are not crossing a cache line boundary. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16Merge tag 'mac80211-for-net-2021-11-16' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Couple of fixes: * bad dont-reorder check * throughput LED trigger for various new(ish) paths * radiotap header generation * locking assertions in mac80211 with monitor mode * radio statistics * don't try to access IV when not present * call stop_ap for P2P_GO as well as we should * tag 'mac80211-for-net-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211: mac80211: fix throughput LED trigger mac80211: fix monitor_sdata RCU/locking assertions mac80211: drop check for DONT_REORDER in __ieee80211_select_queue mac80211: fix radiotap header generation mac80211: do not access the IV when it was stripped nl80211: fix radio statistics in survey dump cfg80211: call cfg80211_stop_ap when switch from P2P_GO type ==================== Link: https://lore.kernel.org/r/20211116160845.157214-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2021-11-16 We've added 12 non-merge commits during the last 5 day(s) which contain a total of 23 files changed, 573 insertions(+), 73 deletions(-). The main changes are: 1) Fix pruning regression where verifier went overly conservative rejecting previsouly accepted programs, from Alexei Starovoitov and Lorenz Bauer. 2) Fix verifier TOCTOU bug when using read-only map's values as constant scalars during verification, from Daniel Borkmann. 3) Fix a crash due to a double free in XSK's buffer pool, from Magnus Karlsson. 4) Fix libbpf regression when cross-building runqslower, from Jean-Philippe Brucker. 5) Forbid use of bpf_ktime_get_coarse_ns() and bpf_timer_*() helpers in tracing programs due to deadlock possibilities, from Dmitrii Banshchikov. 6) Fix checksum validation in sockmap's udp_read_sock() callback, from Cong Wang. 7) Various BPF sample fixes such as XDP stats in xdp_sample_user, from Alexander Lobakin. 8) Fix libbpf gen_loader error handling wrt fd cleanup, from Kumar Kartikeya Dwivedi. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: udp: Validate checksum in udp_read_sock() bpf: Fix toctou on read-only map's constant scalar tracking samples/bpf: Fix build error due to -isystem removal selftests/bpf: Add tests for restricted helpers bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs libbpf: Perform map fd cleanup for gen_loader in case of error samples/bpf: Fix incorrect use of strlen in xdp_redirect_cpu tools/runqslower: Fix cross-build samples/bpf: Fix summary per-sec stats in xdp_sample_user selftests/bpf: Check map in map pruning bpf: Fix inner map state pruning regression. xsk: Fix crash on double free in buffer pool ==================== Link: https://lore.kernel.org/r/20211116141134.6490-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16Bluetooth: Attempt to clear HCI_LE_ADV on adv set terminated error eventArchie Pusaka
We should clear the flag if the adv instance removed due to receiving this error status is the last one we have. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>