summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-09-16net: dsa: sja1105: Add static config tables for schedulingVladimir Oltean
In order to support tc-taprio offload, the TTEthernet egress scheduling core registers must be made visible through the static interface. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: dsa: Pass ndo_setup_tc slave callback to driversVladimir Oltean
DSA currently handles shared block filters (for the classifier-action qdisc) in the core due to what I believe are simply pragmatic reasons - hiding the complexity from drivers and offerring a simple API for port mirroring. Extend the dsa_slave_setup_tc function by passing all other qdisc offloads to the driver layer, where the driver may choose what it implements and how. DSA is simply a pass-through in this case. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16taprio: Add support for hardware offloadingVinicius Costa Gomes
This allows taprio to offload the schedule enforcement to capable network cards, resulting in more precise windows and less CPU usage. The gate mask acts on traffic classes (groups of queues of same priority), as specified in IEEE 802.1Q-2018, and following the existing taprio and mqprio semantics. It is up to the driver to perform conversion between tc and individual netdev queues if for some reason it needs to make that distinction. Full offload is requested from the network interface by specifying "flags 2" in the tc qdisc creation command, which in turn corresponds to the TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD bit. The important detail here is the clockid which is implicitly /dev/ptpN for full offload, and hence not configurable. A reference counting API is added to support the use case where Ethernet drivers need to keep the taprio offload structure locally (i.e. they are a multi-port switch driver, and configuring a port depends on the settings of other ports as well). The refcount_t variable is kept in a private structure (__tc_taprio_qopt_offload) and not exposed to drivers. In the future, the private structure might also be expanded with a backpointer to taprio_sched *q, to implement the notification system described in the patch (of when admin became oper, or an error occurred, etc, so the offload can be monitored with 'tc qdisc show'). Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16Merge tag 'core-process-v5.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd/waitid updates from Christian Brauner: "This contains two features and various tests. First, it adds support for waiting on process through pidfds by adding the P_PIDFD type to the waitid() syscall. This completes the basic functionality of the pidfd api (cf. [1]). In the meantime we also have a new adition to the userspace projects that make use of the pidfd api. The qt project was nice enough to send a mail pointing out that they have a pr up to switch to the pidfd api (cf. [2]). Second, this tag contains an extension to the waitid() syscall to make it possible to wait on the current process group in a race free manner (even though the actual problem is very unlikely) by specifing 0 together with the P_PGID type. This extension traces back to a discussion on the glibc development mailing list. There are also a range of tests for the features above. Additionally, the test-suite which detected the pidfd-polling race we fixed in [3] is included in this tag" [1] https://lwn.net/Articles/794707/ [2] https://codereview.qt-project.org/c/qt/qtbase/+/108456 [3] commit b191d6491be6 ("pidfd: fix a poll race when setting exit_state") * tag 'core-process-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: waitid: Add support for waiting for the current process group tests: add pidfd poll tests tests: move common definitions and functions into pidfd.h pidfd: add pidfd_wait tests pidfd: add P_PIDFD to waitid()
2019-09-16net: phylink: clarify where phylink should be usedRussell King
Update the phylink documentation to make it clear that phylink is designed to be used on the MAC facing side of the link, rather than between a SFP and PHY. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16Merge branch 'bnxt_en-error-recovery-follow-up-patches'David S. Miller
Michael Chan says: ==================== bnxt_en: error recovery follow-up patches. A follow-up patchset for the recently added health and error recovery feature. The first fix is to prevent .ndo_set_rx_mode() from proceeding when reset is in progress. The 2nd fix is for the firmware coredump command. The 3rd and 4th patches update the error recovery process slightly to add a state that polls and waits for the firmware to be down. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16bnxt_en: Add a new BNXT_FW_RESET_STATE_POLL_FW_DOWN state.Vasundhara Volam
This new state is required when firmware indicates that the error recovery process requires polling for firmware state to be completely down before initiating reset. For example, firmware may take some time to collect the crash dump before it is down and ready to be reset. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16bnxt_en: Update firmware interface spec. to 1.10.0.100.Michael Chan
Some error recovery updates to the spec., among other minor changes. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16bnxt_en: Increase timeout for HWRM_DBG_COREDUMP_XX commandsVasundhara Volam
Firmware coredump messages take much longer than standard messages, so increase the timeout accordingly. Fixes: 6c5657d085ae ("bnxt_en: Add support for ethtool get dump.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16bnxt_en: Don't proceed in .ndo_set_rx_mode() when device is not in open state.Michael Chan
Check the BNXT_STATE_OPEN flag instead of netif_running() in bnxt_set_rx_mode(). If the driver is going through any reset, such as firmware reset or even TX timeout, it may not be ready to set the RX mode and may crash. The new rx mode settings will be picked up when the device is opened again later. Fixes: 230d1f0de754 ("bnxt_en: Handle firmware reset.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16null_blk: format pr_* logs with pr_fmtAndré Almeida
Instead of writing "null_blk: " at the beginning of each pr_err/info/warn log message, format messages using pr_fmt() macro. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-16null_blk: match the type of parameter nr_devicesAndré Almeida
Since the variable nr_devices is an unsigned int, the module_param() should also use this type. Change the type so they can match. Fixes: f7c4ce890dd2 ("null_blk: validate the number of devices") Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-16null_blk: do not fail the module load with zero devicesAndré Almeida
The module load should fail only if there is something wrong with the configuration or if an error prevents it to work properly. The module should be able to be loaded with (nr_device == 0), since it will not trigger errors or be in malfunction state. Preventing loading with zero devices also breaks applications that configures this module using configfs API. Remove the nr_device check to fix this. Fixes: f7c4ce890dd2 ("null_blk: validate the number of devices") Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-16tcp: Add snd_wnd to TCP_INFOThomas Higdon
Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP performance problems -- > (1) Usually when we're diagnosing TCP performance problems, we do so > from the sender, since the sender makes most of the > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc). > From the sender-side the thing that would be most useful is to see > tp->snd_wnd, the receive window that the receiver has advertised to > the sender. This serves the purpose of adding an additional __u32 to avoid the would-be hole caused by the addition of the tcpi_rcvi_ooopack field. Signed-off-by: Thomas Higdon <tph@fb.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16tcp: Add TCP_INFO counter for packets received out-of-orderThomas Higdon
For receive-heavy cases on the server-side, we want to track the connection quality for individual client IPs. This counter, similar to the existing system-wide TCPOFOQueue counter in /proc/net/netstat, tracks out-of-order packet reception. By providing this counter in TCP_INFO, it will allow understanding to what degree receive-heavy sockets are experiencing out-of-order delivery and packet drops indicating congestion. Please note that this is similar to the counter in NetBSD TCP_INFO, and has the same name. Also note that we avoid increasing the size of the tcp_sock struct by taking advantage of a hole. Signed-off-by: Thomas Higdon <tph@fb.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: mdio: switch to using gpiod_get_optional()Dmitry Torokhov
The MDIO device reset line is optional and now that gpiod_get_optional() returns proper value when GPIO support is compiled out, there is no reason to use fwnode_get_named_gpiod() that I plan to hide away. Let's switch to using more standard gpiod_get_optional() and gpiod_set_consumer_name() to keep the nice "PHY reset" label. Also there is no reason to only try to fetch the reset GPIO when we have OF node, gpiolib can fetch GPIO data from firmwares as well. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-09-16 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Now that initial BPF backend for gcc has been merged upstream, enable BPF kselftest suite for bpf-gcc. Also fix a BE issue with access to bpf_sysctl.file_pos, from Ilya. 2) Follow-up fix for link-vmlinux.sh to remove bash-specific extensions related to recent work on exposing BTF info through sysfs, from Andrii. 3) AF_XDP zero copy fixes for i40e and ixgbe driver which caused umem headroom to be added twice, from Ciara. 4) Refactoring work to convert sock opt tests into test_progs framework in BPF kselftests, from Stanislav. 5) Fix a general protection fault in dev_map_hash_update_elem(), from Toke. 6) Cleanup to use BPF_PROG_RUN() macro in KCM, from Sami. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16Merge branch 'sched/rt' into sched/core, to pick up -rt changesIngo Molnar
Pick up the first couple of patches working towards PREEMPT_RT. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-16s390: remove pointless drivers-y in drivers/s390/MakefileMasahiro Yamada
This is unused. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-09-16s390/cpum_sf: Fix line length and format stringThomas Richter
Rewrite some lines to match line length and replace format string 0x%x to %#x. Add and remove blank line. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-09-16s390/pci: fix MSI message dataSebastian Ott
After recent changes the MSI message data needs to specify the function-relative IRQ number. Reported-and-tested-by: Alexander Schmidt <alexs@linux.ibm.com> Signed-off-by: Sebastian Ott <sebott@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-09-16bpf: fix accessing bpf_sysctl.file_pos on s390Ilya Leoshkevich
"ctx:file_pos sysctl:read write ok" fails on s390 with "Read value != nux". This is because verifier rewrites a complete 32-bit bpf_sysctl.file_pos update to a partial update of the first 32 bits of 64-bit *bpf_sysctl_kern.ppos, which is not correct on big-endian systems. Fix by using an offset on big-endian systems. Ditto for bpf_sysctl.file_pos reads. Currently the test does not detect a problem there, since it expects to see 0, which it gets with high probability in error cases, so change it to seek to offset 3 and expect 3 in bpf_sysctl.file_pos. Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20190816105300.49035-1-iii@linux.ibm.com/
2019-09-16xdp: Fix race in dev_map_hash_update_elem() when replacing elementToke Høiland-Jørgensen
syzbot found a crash in dev_map_hash_update_elem(), when replacing an element with a new one. Jesper correctly identified the cause of the crash as a race condition between the initial lookup in the map (which is done before taking the lock), and the removal of the old element. Rather than just add a second lookup into the hashmap after taking the lock, fix this by reworking the function logic to take the lock before the initial lookup. Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index") Reported-and-tested-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16Merge branch 'bpf-af-xdp-unaligned-fixes'Daniel Borkmann
Ciara Loftus says: ==================== This patch set contains some fixes for AF_XDP zero copy in the i40e and ixgbe drivers as well as a fix for the 'xdpsock' sample application when running in unaligned mode. Patches 1 and 2 fix a regression for the i40e and ixgbe drivers which caused the umem headroom to be added to the xdp handle twice, resulting in an incorrect value being received by the user for the case where the umem headroom is non-zero. Patch 3 fixes an issue with the xdpsock sample application whereby the start of the tx packet data (offset) was not being set correctly when the application was being run in unaligned mode. This patch set has been applied against commit a2c11b034142 ("kcm: use BPF_PROG_RUN") ==================== Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16samples/bpf: fix xdpsock l2fwd tx for unaligned modeCiara Loftus
Preserve the offset of the address of the received descriptor, and include it in the address set for the tx descriptor, so the kernel can correctly locate the start of the packet data. Fixes: 03895e63ff97 ("samples/bpf: add buffer recycling for unaligned chunks to xdpsock") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16ixgbe: fix xdp handle calculationsCiara Loftus
Commit 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations") reintroduced the addition of the umem headroom to the xdp handle in the ixgbe_zca_free, ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc functions. However, the headroom is already added to the handle in the function ixgbe_run_xdp_zc. This commit removes the latter addition and fixes the case where the headroom is non-zero. Fixes: 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16i40e: fix xdp handle calculationsCiara Loftus
Commit 4c5d9a7fa149 ("i40e: fix xdp handle calculations") reintroduced the addition of the umem headroom to the xdp handle in the i40e_zca_free, i40e_alloc_buffer_slow_zc and i40e_alloc_buffer_zc functions. However, the headroom is already added to the handle in the function i40_run_xdp_zc. This commit removes the latter addition and fixes the case where the headroom is non-zero. Fixes: 4c5d9a7fa149 ("i40e: fix xdp handle calculations") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16selftests/bpf: add bpf-gcc supportIlya Leoshkevich
Now that binutils and gcc support for BPF is upstream, make use of it in BPF selftests using alu32-like approach. Share as much as possible of CFLAGS calculation with clang. Fixes only obvious issues, leaving more complex ones for later: - Use gcc-provided bpf-helpers.h instead of manually defining the helpers, change bpf_helpers.h include guard to avoid conflict. - Include <linux/stddef.h> for __always_inline. - Add $(OUTPUT)/../usr/include to include path in order to use local kernel headers instead of system kernel headers when building with O=. In order to activate the bpf-gcc support, one needs to configure binutils and gcc with --target=bpf and make them available in $PATH. In particular, gcc must be installed as `bpf-gcc`, which is the default. Right now with binutils 25a2915e8dba and gcc r275589 only a handful of tests work: # ./test_progs_bpf_gcc # Summary: 7/39 PASSED, 1 SKIPPED, 98 FAILED The reason for those failures are as follows: - Build errors: - `error: too many function arguments for eBPF` for __always_inline functions read_str_var and read_map_var - must be inlining issue, and for process_l3_headers_v6, which relies on optimizing away function arguments. - `error: indirect call in function, which are not supported by eBPF` where there are no obvious indirect calls in the source calls, e.g. in __encap_ipip_none. - `error: field 'lock' has incomplete type` for fields of `struct bpf_spin_lock` type - bpf_spin_lock is re#defined by bpf-helpers.h, so its usage is sensitive to order of #includes. - `error: eBPF stack limit exceeded` in sysctl_tcp_mem. - Load errors: - Missing object files due to above build errors. - `libbpf: failed to create map (name: 'test_ver.bss')`. - `libbpf: object file doesn't contain bpf program`. - `libbpf: Program '.text' contains unrecognized relo data pointing to section 0`. - `libbpf: BTF is required, but is missing or corrupted` - no BTF support in gcc yet. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16net: stmmac: socfpga: re-use the `interface` parameter from platform dataAlexandru Ardelean
The socfpga sub-driver defines an `interface` field in the `socfpga_dwmac` struct and parses it on init. The shared `stmmac_probe_config_dt()` function also parses this from the device-tree and makes it available on the returned `plat_data` (which is the same data available via `netdev_priv()`). All that's needed now is to dig that information out, via some `dev_get_drvdata()` && `netdev_priv()` calls and re-use it. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16Merge branch 'More-fixes-for-unlocked-cls-hardware-offload-API-refactoring'David S. Miller
Vlad Buslov says: ==================== More fixes for unlocked cls hardware offload API refactoring Two fixes for my "Refactor cls hardware offload API to support rtnl-independent drivers" series and refactoring patch that implements infrastructure necessary for the fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: sched: use get_dev() action API in flow_action infraVlad Buslov
When filling in hardware intermediate representation tc_setup_flow_action() directly obtains, checks and takes reference to dev used by mirred action, instead of using act->ops->get_dev() API created specifically for this purpose. In order to remove code duplication, refactor flow_action infra to use action API when obtaining mirred action target dev. Extend get_dev() with additional argument that is used to provide dev destructor to the user. Fixes: 5a6ff4b13d59 ("net: sched: take reference to action dev before calling offloads") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: sched: take reference to psample group in flow_action infraVlad Buslov
With recent patch set that removed rtnl lock dependency from cls hardware offload API rtnl lock is only taken when reading action data and can be released after action-specific data is parsed into intermediate representation. However, sample action psample group is passed by pointer without obtaining reference to it first, which makes it possible to concurrently overwrite the action and deallocate object pointed by psample_group pointer after rtnl lock is released but before driver finished using the pointer. To prevent such race condition, obtain reference to psample group while it is used by flow_action infra. Extend psample API with function psample_group_take() that increments psample group reference counter. Extend struct tc_action_ops with new get_psample_group() API. Implement the API for action sample using psample_group_take() and already existing psample_group_put() as a destructor. Use it in tc_setup_flow_action() to take reference to psample group pointed to by entry->sample.psample_group and release it in tc_cleanup_flow_action(). Disable bh when taking psample_groups_lock. The lock is now taken while holding action tcf_lock that is used by data path and requires bh to be disabled, so doing the same for psample_groups_lock is necessary to preserve SOFTIRQ-irq-safety. Fixes: 918190f50eb6 ("net: sched: flower: don't take rtnl lock for cls hw offloads API") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: sched: extend flow_action_entry with destructorVlad Buslov
Generalize flow_action_entry cleanup by extending the structure with pointer to destructor function. Set the destructor in tc_setup_flow_action(). Refactor tc_cleanup_flow_action() to call entry->destructor() instead of using switch that dispatches by entry->id and manually executes cleanup. This refactoring is necessary for following patches in this series that require destructor to use tc_action->ops callbacks that can't be easily obtained in tc_cleanup_flow_action(). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16MAINTAINERS: update FORCEDETH MAINTAINERS infoRain River
Many FORCEDETH NICs are used in our hosts. Several bugs are fixed and some features are developed for FORCEDETH NICs. And I have been reviewing patches for FORCEDETH NIC for several months. Mark me as the FORCEDETH NIC maintainer. I will send out the patches and maintain FORCEDETH NIC. Signed-off-by: Rain River <rain.1986.08.12@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net/wan: dscc4: remove broken dscc4 driverDan Carpenter
Using static analysis, I discovered that the "dpriv->pci_priv->pdev" pointer is always NULL. This pointer was supposed to be initialized during probe and is essential for the driver to work. It would be easy to add a "ppriv->pdev = pdev;" to dscc4_found1() but this driver has been broken since before we started using git and no one has complained so probably we should just remove it. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16MAINTAINERS: xen-netback: update my email addressPaul Durrant
My Citrix email address will expire shortly. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wl@xen.org> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: stmmac: Hold rtnl lock in suspend/resume callbacksJose Abreu
We need to hold rnl lock in suspend and resume callbacks because phylink requires it. Otherwise we will get a WARN() in suspend and resume. Also, move phylink start and stop callbacks to inside device's internal lock so that we prevent concurrent HW accesses. Fixes: 74371272f97f ("net: stmmac: Convert to phylink and remove phylib logic") Reported-by: Christophe ROULLIER <christophe.roullier@st.com> Tested-by: Christophe ROULLIER <christophe.roullier@st.com> Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16ip6_gre: fix a dst leak in ip6erspan_tunnel_xmitXin Long
In ip6erspan_tunnel_xmit(), if the skb will not be sent out, it has to be freed on the tx_err path. Otherwise when deleting a netns, it would cause dst/dev to leak, and dmesg shows: unregister_netdevice: waiting for lo to become free. Usage count = 1 Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16qed: fix spelling mistake "fullill" -> "fulfill"Colin Ian King
There is a spelling mistake in a DP_VERBOSE debug message. Fix it. (Using American English spelling as this is the most common way to spell this in the kernel). Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: dsa: b53: Add support for port_egress_floods callbackFlorian Fainelli
Add support for configuring the per-port egress flooding control for both Unicast and Multicast traffic. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16udp: correct reuseport selection with connected socketsWillem de Bruijn
UDP reuseport groups can hold a mix unconnected and connected sockets. Ensure that connections only receive all traffic to their 4-tuple. Fast reuseport returns on the first reuseport match on the assumption that all matches are equal. Only if connections are present, return to the previous behavior of scoring all sockets. Record if connections are present and if so (1) treat such connected sockets as an independent match from the group, (2) only return 2-tuple matches from reuseport and (3) do not return on the first 2-tuple reuseport match to allow for a higher scoring match later. New field has_conns is set without locks. No other fields in the bitmap are modified at runtime and the field is only ever set unconditionally, so an RMW cannot miss a change. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Craig Gallek <kraig@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16Merge tag 'asoc-v5.4-2' of ↵Takashi Iwai
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Final merge window fixes for v5.4 A few small fixes and one feature that came in since I sent you the earlier pull request.
2019-09-15block: also check RQF_STATS in blk_mq_need_time_stamp()Hou Tao
In __blk_mq_end_request() if block stats needs update, we should ensure now is valid instead of 0 even when iostat is disabled. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-15block: make rq sector size accessible for block statsHou Tao
Currently rq->data_len will be decreased by partial completion or zeroed by completion, so when blk_stat_add() is invoked, data_len will be zero and there will never be samples in poll_cb because blk_mq_poll_stats_bkt() will return -1 if data_len is zero. We could move blk_stat_add() back to __blk_mq_complete_request(), but that would make the effort of trying to call ktime_get_ns() once in vain. Instead we can reuse throtl_size field, and use it for both block stats and block throttle, and adjust the logic in blk_mq_poll_stats_bkt() accordingly. Fixes: 4bc6339a583c ("block: move blk_stat_add() to __blk_mq_end_request()") Tested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-15Linux 5.3v5.3Linus Torvalds
2019-09-15Revert "ext4: make __ext4_get_inode_loc plug"Linus Torvalds
This reverts commit b03755ad6f33b7b8cd7312a3596a2dbf496de6e7. This is sad, and done for all the wrong reasons. Because that commit is good, and does exactly what it says: avoids a lot of small disk requests for the inode table read-ahead. However, it turns out that it causes an entirely unrelated problem: the getrandom() system call was introduced back in 2014 by commit c6e9d6f38894 ("random: introduce getrandom(2) system call"), and people use it as a convenient source of good random numbers. But part of the current semantics for getrandom() is that it waits for the entropy pool to fill at least partially (unlike /dev/urandom). And at least ArchLinux apparently has a systemd that uses getrandom() at boot time, and the improvements in IO patterns means that existing installations suddenly start hanging, waiting for entropy that will never happen. It seems to be an unlucky combination of not _quite_ enough entropy, together with a particular systemd version and configuration. Lennart says that the systemd-random-seed process (which is what does this early access) is supposed to not block any other boot activity, but sadly that doesn't actually seem to be the case (possibly due bogus dependencies on cryptsetup for encrypted swapspace). The correct fix is to fix getrandom() to not block when it's not appropriate, but that fix is going to take a lot more discussion. Do we just make it act like /dev/urandom by default, and add a new flag for "wait for entropy"? Do we add a boot-time option? Or do we just limit the amount of time it will wait for entropy? So in the meantime, we do the revert to give us time to discuss the eventual fix for the fundamental problem, at which point we can re-apply the ext4 inode table access optimization. Reported-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Willy Tarreau <w@1wt.eu> Cc: Alexander E. Patrakov <patrakov@gmail.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-15net/rds: Fix 'ib_evt_handler_call' element in 'rds_ib_stat_names'Gerd Rausch
All entries in 'rds_ib_stat_names' are stringified versions of the corresponding "struct rds_ib_statistics" element without the "s_"-prefix. Fix entry 'ib_evt_handler_call' to do the same. Fixes: f4f943c958a2 ("RDS: IB: ack more receive completions to improve performance") Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-15net_sched: let qdisc_put() accept NULL pointerCong Wang
When tcf_block_get() fails in sfb_init(), q->qdisc is still a NULL pointer which leads to a crash in sfb_destroy(). Similar for sch_dsmark. Instead of fixing each separately, Linus suggested to just accept NULL pointer in qdisc_put(), which would make callers easier. (For sch_dsmark, the bug probably exists long before commit 6529eaba33f0.) Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Reported-by: syzbot+d5870a903591faaca4ae@syzkaller.appspotmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-15net: dsa: Fix load order between DSA drivers and taggersAndrew Lunn
The DSA core, DSA taggers and DSA drivers all make use of module_init(). Hence they get initialised at device_initcall() time. The ordering is non-deterministic. It can be a DSA driver is bound to a device before the needed tag driver has been initialised, resulting in the message: No tagger for this switch Rather than have this be fatal, return -EPROBE_DEFER so that it is tried again later once all the needed drivers have been loaded. Fixes: d3b8c04988ca ("dsa: Add boilerplate helper to register DSA tag driver modules") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-15net/sched: fix race between deactivation and dequeue for NOLOCK qdiscPaolo Abeni
The test implemented by some_qdisc_is_busy() is somewhat loosy for NOLOCK qdisc, as we may hit the following scenario: CPU1 CPU2 // in net_tx_action() clear_bit(__QDISC_STATE_SCHED...); // in some_qdisc_is_busy() val = (qdisc_is_running(q) || test_bit(__QDISC_STATE_SCHED, &q->state)); // here val is 0 but... qdisc_run(q) // ... CPU1 is going to run the qdisc next As a conseguence qdisc_run() in net_tx_action() can race with qdisc_reset() in dev_qdisc_reset(). Such race is not possible for !NOLOCK qdisc as both the above bit operations are under the root qdisc lock(). After commit 021a17ed796b ("pfifo_fast: drop unneeded additional lock on dequeue") the race can cause use after free and/or null ptr dereference, but the root cause is likely older. This patch addresses the issue explicitly checking for deactivation under the seqlock for NOLOCK qdisc, so that the qdisc_run() in the critical scenario becomes a no-op. Note that the enqueue() op can still execute concurrently with dev_qdisc_reset(), but that is safe due to the skb_array() locking, and we can't avoid that for NOLOCK qdiscs. Fixes: 021a17ed796b ("pfifo_fast: drop unneeded additional lock on dequeue") Reported-by: Li Shuang <shuali@redhat.com> Reported-and-tested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>