summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-11-16net: hns3: cleanup of stray struct hns3_link_mode_mappingSalil Mehta
This patch cleans-up the stray left over code. It has no functionality impact. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net/smc: fix fastopen for non-blocking connect()Ursula Braun
FASTOPEN does not work with SMC-sockets. Since SMC allows fallback to TCP native during connection start, the FASTOPEN setsockopts trigger this fallback, if the SMC-socket is still in state SMC_INIT. But if a FASTOPEN setsockopt is called after a non-blocking connect(), this is broken, and fallback does not make sense. This change complements commit cd2063604ea6 ("net/smc: avoid fallback in case of non-blocking connect") and fixes the syzbot reported problem "WARNING in smc_unhash_sk". Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com Fixes: e1bbdd570474 ("net/smc: reduce sock_put() for fallback sockets") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16bonding: symmetric ICMP transmitMatteo Croce
A bonding with layer2+3 or layer3+4 hashing uses the IP addresses and the ports to balance packets between slaves. With some network errors, we receive an ICMP error packet by the remote host or a router. If sent by a router, the source IP can differ from the remote host one. Additionally the ICMP protocol has no port numbers, so a layer3+4 bonding will get a different hash than the previous one. These two conditions could let the packet go through a different interface than the other packets of the same flow: # tcpdump -qltnni veth0 |sed 's/^/0: /' & # tcpdump -qltnni veth1 |sed 's/^/1: /' & # hping3 -2 192.168.0.2 -p 9 0: IP 192.168.0.1.2251 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 1: IP 192.168.0.1.2252 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 1: IP 192.168.0.1.2253 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 0: IP 192.168.0.1.2254 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 An ICMP error packet contains the header of the packet which caused the network error, so inspect it and match the flow against it, so we can send the ICMP via the same interface of the previous packet in the flow. Move the IP and port dissect code into a generic function bond_flow_ip() and if we are dissecting an ICMP error packet, call it again with the adjusted offset. # hping3 -2 192.168.0.2 -p 9 1: IP 192.168.0.1.1224 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 1: IP 192.168.0.1.1225 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 0: IP 192.168.0.1.1226 > 192.168.0.2.9: UDP, length 0 0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 0: IP 192.168.0.1.1227 > 192.168.0.2.9: UDP, length 0 0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net: mscc: ocelot: omit error check from of_get_phy_modeHoratiu Vultur
The commit 0c65b2b90d13c ("net: of_get_phy_mode: Change API to solve int/unit warnings") updated the function of_get_phy_mode declaration. Now it returns an error code and in case the node doesn't contain the property 'phy-mode' or 'phy-connection-type' it returns -EINVAL and would set the phy_interface_t to PHY_INTERFACE_MODE_NA. Ocelot VSC7514 has 4 internal phys which have the phy interface PHY_INTERFACE_MODE_NA. So because of_get_phy_mode would assign PHY_INTERFACE_MODE_NA to phy_mode when there is an error, there is no need to add the error check. Updates for v2: - drop error check because of_get_phy_mode already assigns phy_interface to PHY_INTERFACE_MODE in case of error. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net: core: allow fast GRO for skbs with Ethernet header in headAlexander Lobakin
Commit 78d3fd0b7de8 ("gro: Only use skb_gro_header for completely non-linear packets") back in May'09 (v2.6.31-rc1) has changed the original condition '!skb_headlen(skb)' to 'skb->mac_header == skb->tail' in gro_reset_offset() saying: "Since the drivers that need this optimisation all provide completely non-linear packets" (note that this condition has become the current 'skb_mac_header(skb) == skb_tail_pointer(skb)' later with commmit ced14f6804a9 ("net: Correct comparisons and calculations using skb->tail and skb-transport_header") without any functional changes). For now, we have the following rough statistics for v5.4-rc7: 1) napi_gro_frags: 14 2) napi_gro_receive with skb->head containing (most of) payload: 83 3) napi_gro_receive with skb->head containing all the headers: 20 4) napi_gro_receive with skb->head containing only Ethernet header: 2 With the current condition, fast GRO with the usage of NAPI_GRO_CB(skb)->frag0 is available only in the [1] case. Packets pushed by [2] and [3] go through the 'slow' path, but it's not a problem for them as they already contain all the needed headers in skb->head, so pskb_may_pull() only moves skb->data. The layout of skbs in the fourth [4] case at the moment of dev_gro_receive() is identical to skbs that have come through [1], as napi_frags_skb() pulls Ethernet header to skb->head. The only difference is that the mentioned condition is always false for them, because skb_put() and friends irreversibly alter the tail pointer. They also go through the 'slow' path, but now every single pskb_may_pull() in every single .gro_receive() will call the *really* slow __pskb_pull_tail() to pull headers to head. This significantly decreases the overall performance for no visible reasons. The only two users of method [4] is: * drivers/staging/qlge * drivers/net/wireless/iwlwifi (all three variants: dvm, mvm, mvm-mq) Note that in case with wireless drivers we can't use [1] (napi_gro_frags()) at least for now and mac80211 stack always performs pushes and pulls anyways, so performance hit is inavoidable. At the moment of v2.6.31 the mentioned change was necessary (that's why I don't add the "Fixes:" tag), but it became obsolete since skb_gro_mac_header() has gone in commit a50e233c50db ("net-gro: restore frag0 optimization"), so we can simply revert the condition in gro_reset_offset() to allow skbs from [4] go through the 'fast' path just like in case [1]. This was tested on a 600 MHz MIPS CPU and a custom driver and this patch gave boosts up to 40 Mbps to method [4] in both directions comparing to net-next, which made overall performance relatively close to [1] (without it, [4] is the slowest). v2: - Add more references and explanations to commit message - Fix some typos ibid - No functional changes Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16rds: ib: update WR sizes when bringing up connectionDag Moxnes
Currently WR sizes are updated from rds_ib_sysctl_max_send_wr and rds_ib_sysctl_max_recv_wr when a connection is shut down. As a result, a connection being down while rds_ib_sysctl_max_send_wr or rds_ib_sysctl_max_recv_wr are updated, will not update the sizes when it comes back up. Move resizing of WRs to rds_ib_setup_qp so that connections will be setup with the most current WR sizes. Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net: gemini: add missed free_netdevChuhong Yuan
This driver forgets to free allocated netdev in remove like what is done in probe failure. Add the free to fix it. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16Merge branch 'bnx2x-Remove-function-casts'David S. Miller
Kees Cook says: ==================== bnx2x: Remove function casts In order to make the entire kernel usable under Clang's Control Flow Integrity protections, function prototype casts need to be avoided because this will trip CFI checks at runtime (i.e. a mismatch between the caller's expected function prototype and the destination function's prototype). Many of these cases can be found with -Wcast-function-type, which found that bnx2x had a bunch of needless (or at least confusing) function casts. This series removes them all. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16bnx2x: Remove hw_reset_t function castsKees Cook
All .rw_reset callbacks except bnx2x_84833_hw_reset_phy() use a void return type. No callers of .hw_reset check a return value and bnx2x_84833_hw_reset_phy() unconditionally returns 0. Remove all hw_reset_t casts and fix the return type to void. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16bnx2x: Remove format_fw_ver_t function castsKees Cook
The return values for format_fw_ver_t callbacks are supposed to be "int", not "u8". Ultimately, the top-level caller doesn't actually check the return value at all, but just clean this all up anyway and fix the prototypes so that casts are no longer needed. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16bnx2x: Remove config_init_t function castsKees Cook
No callers of .config_init check return values. Remove the casting and change all callbacks to have the correct function prototype. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16bnx2x: Remove read_status_t function castsKees Cook
The function casts for .read_status callbacks end up casting some int return values to u8. This seems to be bug-prone (-EINVAL being returned into something that appears to be true/false), but fixing the function prototypes doesn't change the existing behavior. Fix the return values to remove the casts. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16bnx2x: Drop redundant callback function castsKees Cook
NULL is already "void *" so it will auto-cast in assignments and initializers. Additionally, all the callbacks for .link_reset, .config_loopback, .set_link_led, and .phy_specific_func are already correct. No casting is needed for these, so remove them. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16enetc: update TSN Qbv PSPEED set according to adjust link speedPo Liu
ENETC has a register PSPEED to indicate the link speed of hardware. It is need to update accordingly. PSPEED field needs to be updated with the port speed for QBV scheduling purposes. Or else there is chance for gate slot not free by frame taking the MAC if PSPEED and phy speed not match. So update PSPEED when link adjust. This is implement by the adjust_link. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16enetc: Configure the Time-Aware Scheduler via tc-taprio offloadPo Liu
ENETC supports in hardware for time-based egress shaping according to IEEE 802.1Qbv. This patch implement the Qbv enablement by the hardware offload method qdisc tc-taprio method. Also update cbdr writeback to up level since control bd ring may writeback data to control bd ring. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16page_pool: do not release pool until inflight == 0.Jonathan Lemon
The page pool keeps track of the number of pages in flight, and it isn't safe to remove the pool until all pages are returned. Disallow removing the pool until all pages are back, so the pool is always available for page producers. Make the page pool responsible for its own delayed destruction instead of relying on XDP, so the page pool can be used without the xdp memory model. When all pages are returned, free the pool and notify xdp if the pool is registered with the xdp memory system. Have the callback perform a table walk since some drivers (cpsw) may share the pool among multiple xdp_rxq_info. Note that the increment of pages_state_release_cnt may result in inflight == 0, resulting in the pool being released. Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16Merge branch 'smc-last-part-of-termination-improvements'David S. Miller
Karsten Graul says: ==================== last part of termination improvements Patches 1 and 2 finish the set of termination patches, introducing a reboot handler that terminates all link groups. Patch 3 adds an rcu_barrier before the module is unloaded, and patch 4 is cleanup. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net/smc: remove unused constantUrsula Braun
Constant SMC_CLOSE_WAIT_LISTEN_CLCSOCK_TIME is defined, but since commit 3d502067599f ("net/smc: simplify wait when closing listen socket") no longer used. Remove it. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net/smc: use rcu_barrier() on module unloadUrsula Braun
Add rcu_barrier() to make sure no RCU readers or callbacks are pending when the module is unloaded. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net/smc: guarantee removal of link groups in rebootUrsula Braun
When rebooting it should be guaranteed all link groups are cleaned up and freed. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net/smc: introduce bookkeeping of SMCR link groupsUrsula Braun
If the smc module is unloaded return control from exit routine only, if all link groups are freed. If an IB device is thrown away return control from device removal only, if all link groups belonging to this device are freed. Counters for the total number of SMCR link groups and for the total number of SMCR links per IB device are introduced. smc module unloading continues only if the total number of SMCR link groups is zero. IB device removal continues only it the total number of SMCR links per IB device has decreased to zero. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvidVladimir Oltean
This sequence of operations: ip link set dev br0 type bridge vlan_filtering 1 bridge vlan del dev swp2 vid 1 ip link set dev br0 type bridge vlan_filtering 1 ip link set dev br0 type bridge vlan_filtering 0 apparently fails with the message: [ 31.305716] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering [ 31.322161] sja1105 spi0.1: Couldn't determine PVID attributes (pvid 0) [ 31.328939] sja1105 spi0.1: Failed to setup VLAN tagging for port 1: -2 [ 31.335599] ------------[ cut here ]------------ [ 31.340215] WARNING: CPU: 1 PID: 194 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4 [ 31.349981] br0: Commit of attribute (id=6) failed. [ 31.354890] Modules linked in: [ 31.357942] CPU: 1 PID: 194 Comm: ip Not tainted 5.4.0-rc6-01792-gf4f632e07665-dirty #2062 [ 31.366167] Hardware name: Freescale LS1021A [ 31.370437] [<c03144dc>] (unwind_backtrace) from [<c030e184>] (show_stack+0x10/0x14) [ 31.378153] [<c030e184>] (show_stack) from [<c11d1c1c>] (dump_stack+0xe0/0x10c) [ 31.385437] [<c11d1c1c>] (dump_stack) from [<c034c730>] (__warn+0xf4/0x10c) [ 31.392373] [<c034c730>] (__warn) from [<c034c7bc>] (warn_slowpath_fmt+0x74/0xb8) [ 31.399827] [<c034c7bc>] (warn_slowpath_fmt) from [<c11ca204>] (switchdev_port_attr_set_now+0x9c/0xa4) [ 31.409097] [<c11ca204>] (switchdev_port_attr_set_now) from [<c117036c>] (__br_vlan_filter_toggle+0x6c/0x118) [ 31.418971] [<c117036c>] (__br_vlan_filter_toggle) from [<c115d010>] (br_changelink+0xf8/0x518) [ 31.427637] [<c115d010>] (br_changelink) from [<c0f8e9ec>] (__rtnl_newlink+0x3f4/0x76c) [ 31.435613] [<c0f8e9ec>] (__rtnl_newlink) from [<c0f8eda8>] (rtnl_newlink+0x44/0x60) [ 31.443329] [<c0f8eda8>] (rtnl_newlink) from [<c0f89f20>] (rtnetlink_rcv_msg+0x2cc/0x51c) [ 31.451477] [<c0f89f20>] (rtnetlink_rcv_msg) from [<c1008df8>] (netlink_rcv_skb+0xb8/0x110) [ 31.459796] [<c1008df8>] (netlink_rcv_skb) from [<c1008648>] (netlink_unicast+0x17c/0x1f8) [ 31.468026] [<c1008648>] (netlink_unicast) from [<c1008980>] (netlink_sendmsg+0x2bc/0x3b4) [ 31.476261] [<c1008980>] (netlink_sendmsg) from [<c0f43858>] (___sys_sendmsg+0x230/0x250) [ 31.484408] [<c0f43858>] (___sys_sendmsg) from [<c0f44c84>] (__sys_sendmsg+0x50/0x8c) [ 31.492209] [<c0f44c84>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x28) [ 31.500090] Exception stack(0xedf47fa8 to 0xedf47ff0) [ 31.505122] 7fa0: 00000002 b6f2e060 00000003 beabd6a4 00000000 00000000 [ 31.513265] 7fc0: 00000002 b6f2e060 5d6e3213 00000128 00000000 00000001 00000006 000619c4 [ 31.521405] 7fe0: 00086078 beabd658 0005edbc b6e7ce68 The reason is the implementation of br_get_pvid: static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg) { if (!vg) return 0; smp_rmb(); return vg->pvid; } Since VID 0 is an invalid pvid from the bridge's point of view, let's add this check in dsa_8021q_restore_pvid to avoid restoring a pvid that doesn't really exist. Fixes: 5f33183b7fdf ("net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16Merge branch 'seg6-fixes-to-Segment-Routing-in-IPv6'David S. Miller
Andrea Mayer says: ==================== seg6: fixes to Segment Routing in IPv6 This patchset is divided in 2 patches and it introduces some fixes to Segment Routing in IPv6, which are: - in function get_srh() fix the srh pointer after calling pskb_may_pull(); - fix the skb->transport_header after calling decap_and_validate() function; Any comments on the patchset are welcome. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16seg6: fix skb transport_header after decap_and_validate()Andrea Mayer
in the receive path (more precisely in ip6_rcv_core()) the skb->transport_header is set to skb->network_header + sizeof(*hdr). As a consequence, after routing operations, destination input expects to find skb->transport_header correctly set to the next protocol (or extension header) that follows the network protocol. However, decap behaviors (DX*, DT*) remove the outer IPv6 and SRH extension and do not set again the skb->transport_header pointer correctly. For this reason, the patch sets the skb->transport_header to the skb->network_header + sizeof(hdr) in each DX* and DT* behavior. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16seg6: fix srh pointer in get_srh()Andrea Mayer
pskb_may_pull may change pointers in header. For this reason, it is mandatory to reload any pointer that points into skb header. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net: stmmac: Use the correct style for SPDX License IdentifierNishad Kamdar
This patch corrects the SPDX License Identifier style in header files related to STMicroelectronics based Multi-Gigabit Ethernet driver. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used). Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16octeontx2-af: Use the correct style for SPDX License IdentifierNishad Kamdar
This patch corrects the SPDX License Identifier style in header files related to Marvell OcteonTX2 network devices. It uses an expilict block comment for the SPDX License Identifier. Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "11 fixes" MM fixes and one xz decompressor fix. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/debug.c: PageAnon() is true for PageKsm() pages mm/debug.c: __dump_page() prints an extra line mm/page_io.c: do not free shared swap slots mm/memory_hotplug: fix try_offline_node() mm,thp: recheck each page before collapsing file THP mm: slub: really fix slab walking for init_on_free mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocations mm: fix trying to reclaim unevictable lru page when calling madvise_pageout mm: mempolicy: fix the wrong return value and potential pages leak of mbind
2019-11-16x86/speculation: Fix redundant MDS mitigation messageWaiman Long
Since MDS and TAA mitigations are inter-related for processors that are affected by both vulnerabilities, the followiing confusing messages can be printed in the kernel log: MDS: Vulnerable MDS: Mitigation: Clear CPU buffers To avoid the first incorrect message, defer the printing of MDS mitigation after the TAA mitigation selection has been done. However, that has the side effect of printing TAA mitigation first before MDS mitigation. [ bp: Check box is affected/mitigations are disabled first before printing and massage. ] Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Mark Gross <mgross@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191115161445.30809-3-longman@redhat.com
2019-11-16usb: typec: tcpm: Remove tcpc_config configuration mechanismHans de Goede
All configuration can and should be done through fwnodes instead of through the tcpc_config struct and there are no existing users left of struct tcpc_config, so lets remove it. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20191114111840.40876-1-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-16staging: rtl*: Remove tasklet callback castsKees Cook
In order to make the entire kernel usable under Clang's Control Flow Integrity protections, function prototype casts need to be avoided because this will trip CFI checks at runtime (i.e. a mismatch between the caller's expected function prototype and the destination function's prototype). Many of these cases can be found with -Wcast-function-type, which found that the rtl wifi drivers had a bunch of needless function casts. Remove function casts for tasklet callbacks in the various drivers. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/201911150926.2894A4F973@keescook Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-16x86/speculation: Fix incorrect MDS/TAA mitigation statusWaiman Long
For MDS vulnerable processors with TSX support, enabling either MDS or TAA mitigations will enable the use of VERW to flush internal processor buffers at the right code path. IOW, they are either both mitigated or both not. However, if the command line options are inconsistent, the vulnerabilites sysfs files may not report the mitigation status correctly. For example, with only the "mds=off" option: vulnerabilities/mds:Vulnerable; SMT vulnerable vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable The mds vulnerabilities file has wrong status in this case. Similarly, the taa vulnerability file will be wrong with mds mitigation on, but taa off. Change taa_select_mitigation() to sync up the two mitigation status and have them turned off if both "mds=off" and "tsx_async_abort=off" are present. Update documentation to emphasize the fact that both "mds=off" and "tsx_async_abort=off" have to be specified together for processors that are affected by both TAA and MDS to be effective. [ bp: Massage and add kernel-parameters.txt change too. ] Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: linux-doc@vger.kernel.org Cc: Mark Gross <mgross@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191115161445.30809-2-longman@redhat.com
2019-11-16mei: bus: add more client attributes to sysfsAlexander Usyskin
Export more client attributes via sysfs that are usually obtained upon connection. In some cases, for example a monitoring application may wish to know the attributes without actually performing the connection. Added attributes: max number of connections, fixed address, max message length. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20191116142136.17535-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-16x86/entry/64: Remove pointless jump in paranoid_exitThomas Gleixner
Jump directly to restore_regs_and_return_to_kernel instead of making a pointless extra jump through .Lparanoid_exit_restore Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191023123117.779277679@linutronix.de
2019-11-16x86/entry/32: Remove unused resume_userspace labelThomas Gleixner
The C reimplementation of SYSENTER left that unused ENTRY() label around. Remove it. Fixes: 5f310f739b4c ("x86/entry/32: Re-implement SYSENTER using the new C path") Originally-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191023123117.686514045@linutronix.de
2019-11-16ARM: 8938/1: kernel: initialize broadcast hrtimer based clock event deviceBenjamin Gaignard
On platforms implementing CPU power management, the CPUidle subsystem can allow CPUs to enter idle states where local timers logic is lost on power down. To keep the software timers functional the kernel relies on an always-on broadcast timer to be present in the platform to relay the interrupt signalling the timer expiries. For platforms implementing CPU core gating that do not implement an always-on HW timer or implement it in a broken way, this patch adds code to initialize the kernel hrtimer based clock event device upon boot (which can be chosen as tick broadcast device by the kernel). It relies on a dynamically chosen CPU to be always powered-up. This CPU then relays the timer interrupt to CPUs in deep-idle states through its HW local timer device. Having a CPU always-on has implications on power management platform capabilities and makes CPUidle suboptimal, since at least a CPU is kept always in a shallow idle state by the kernel to relay timer interrupts, but at least leaves the kernel with a functional system with some working power management capabilities. The hrtimer based clock event device is unconditionally registered, but has the lowest possible rating such that any broadcast-capable HW clock event device present will be chosen in preference as the tick broadcast device. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-11-16ARM: 8937/1: spectre-v2: remove Brahma-B53 from hardeningDoug Berger
When the default processor handling was added to the function cpu_v7_spectre_init() it only excluded other ARM implemented processor cores. The Broadcom Brahma B53 core is not implemented by ARM so it ended up falling through into the set of processors that attempt to use the ARM_SMCCC_ARCH_WORKAROUND_1 service to harden the branch predictor. Since this workaround is not necessary for the Brahma-B53 this commit explicitly checks for it and prevents it from applying a branch predictor hardening workaround. Fixes: 10115105cb3a ("ARM: spectre-v2: add firmware based hardening") Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-11-16x86/entry/32: Clarify register saving in __switch_to_asm()Thomas Gleixner
commit 6690e86be83a ("sched/x86: Save [ER]FLAGS on context switch") re-introduced the flags saving on context switch to prevent AC leakage. The pushf/popf instructions are right among the callee saved register section, so the comment explaining the save/restore is not entirely correct. Add a seperate comment to pushf/popf explaining the reason. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16selftests/x86/iopl: Extend test to cover IOPL emulationThomas Gleixner
Add tests that the now emulated iopl() functionality: - does not longer allow user space to disable interrupts. - does restore a I/O bitmap when IOPL is dropped Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16x86/ioperm: Extend IOPL config to control ioperm() as wellThomas Gleixner
If iopl() is disabled, then providing ioperm() does not make much sense. Rename the config option and disable/enable both syscalls with it. Guard the code with #ifdefs where appropriate. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16x86/iopl: Remove legacy IOPL optionThomas Gleixner
The IOPL emulation via the I/O bitmap is sufficient. Remove the legacy cruft dealing with the (e)flags based IOPL mechanism. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> (Paravirt and Xen parts) Acked-by: Andy Lutomirski <luto@kernel.org>
2019-11-16x86/iopl: Restrict iopl() permission scopeThomas Gleixner
The access to the full I/O port range can be also provided by the TSS I/O bitmap, but that would require to copy 8k of data on scheduling in the task. As shown with the sched out optimization TSS.io_bitmap_base can be used to switch the incoming task to a preallocated I/O bitmap which has all bits zero, i.e. allows access to all I/O ports. Implementing this allows to provide an iopl() emulation mode which restricts the IOPL level 3 permissions to I/O port access but removes the STI/CLI permission which is coming with the hardware IOPL mechansim. Provide a config option to switch IOPL to emulation mode, make it the default and while at it also provide an option to disable IOPL completely. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org>
2019-11-16x86/iopl: Fixup misleading commentThomas Gleixner
The comment for the sys_iopl() implementation is outdated and actively misleading in some parts. Fix it up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org>
2019-11-16selftests/x86/ioperm: Extend testing so the shared bitmap is exercisedThomas Gleixner
Add code to the fork path which forces the shared bitmap to be duplicated and the reference count to be dropped. Verify that the child modifications did not affect the parent. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16x86/ioperm: Share I/O bitmap if identicalThomas Gleixner
The I/O bitmap is duplicated on fork. That's wasting memory and slows down fork. There is no point to do so. As long as the bitmap is not modified it can be shared between threads and processes. Add a refcount and just share it on fork. If a task modifies the bitmap then it has to do the duplication if and only if it is shared. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org>
2019-11-16x86/ioperm: Remove bitmap if all permissions droppedThomas Gleixner
If ioperm() results in a bitmap with all bits set (no permissions to any I/O port), then handling that bitmap on context switch and exit to user mode is pointless. Drop it. Move the bitmap exit handling to the ioport code and reuse it for both the thread exit path and dropping it. This allows to reuse this code for the upcoming iopl() emulation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org>
2019-11-16x86/ioperm: Move TSS bitmap update to exit to user workThomas Gleixner
There is no point to update the TSS bitmap for tasks which use I/O bitmaps on every context switch. It's enough to update it right before exiting to user space. That reduces the context switch bitmap handling to invalidating the io bitmap base offset in the TSS when the outgoing task has TIF_IO_BITMAP set. The invaldiation is done on purpose when a task with an IO bitmap switches out to prevent any possible leakage of an activated IO bitmap. It also removes the requirement to update the tasks bitmap atomically in ioperm(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16x86/ioperm: Add bitmap sequence numberThomas Gleixner
Add a globally unique sequence number which is incremented when ioperm() is changing the I/O bitmap of a task. Store the new sequence number in the io_bitmap structure and compare it with the sequence number of the I/O bitmap which was last loaded on a CPU. Only update the bitmap if the sequence is different. That should further reduce the overhead of I/O bitmap scheduling when there are only a few I/O bitmap users on the system. The 64bit sequence counter is sufficient. A wraparound of the sequence counter assuming an ioperm() call every nanosecond would require about 584 years of uptime. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16x86/ioperm: Move iobitmap data into a structThomas Gleixner
No point in having all the data in thread_struct, especially as upcoming changes add more. Make the bitmap in the new struct accessible as array of longs and as array of characters via a union, so both the bitmap functions and the update logic can avoid type casts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16x86/tss: Move I/O bitmap data into a seperate structThomas Gleixner
Move the non hardware portion of I/O bitmap data into a seperate struct for readability sake. Originally-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>