summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-04-11libbpf: Fix build with gcc-8Andrey Ignatov
Reported in [1]. With gcc 8.3.0 the following error is issued: cc -Ibpf@sta -I. -I.. -I.././include -I.././include/uapi -fdiagnostics-color=always -fsanitize=address,undefined -fno-omit-frame-pointer -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -g -fPIC -g -O2 -Werror -Wall -Wno-pointer-arith -Wno-sign-compare -MD -MQ 'bpf@sta/src_libbpf.c.o' -MF 'bpf@sta/src_libbpf.c.o.d' -o 'bpf@sta/src_libbpf.c.o' -c ../src/libbpf.c ../src/libbpf.c: In function 'bpf_object__elf_collect': ../src/libbpf.c:947:18: error: 'map_def_sz' may be used uninitialized in this function [-Werror=maybe-uninitialized] if (map_def_sz <= sizeof(struct bpf_map_def)) { ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../src/libbpf.c:827:18: note: 'map_def_sz' was declared here int i, map_idx, map_def_sz, nr_syms, nr_maps = 0, nr_maps_glob = 0; ^~~~~~~~~~ According to [2] -Wmaybe-uninitialized is enabled by -Wall. Same error is generated by clang's -Wconditional-uninitialized. [1] https://github.com/libbpf/libbpf/pull/29#issuecomment-481902601 [2] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html Fixes: d859900c4c56 ("bpf, libbpf: support global data/bss/rodata sections") Reported-by: Evgeny Vereshchagin <evvers@ya.ru> Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11mailmap: add entry for email addressesDaniel Borkmann
Redirect email addresses from git log to the mainly used ones for Alexei and myself such that it is consistent with the ones in MAINTAINERS file. Useful in particular when git mailmap is enabled on broader scope, for example: $ git config --global log.mailmap true Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2019-04-11net: fou: remove redundant code in gue_udp_recvLorenzo Bianconi
Remove not useful protocol version check in gue_udp_recv since just gue version 0 can hit that code. Moreover remove duplicated hdrlen computation Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recvLorenzo Bianconi
gue tunnels run iptunnel_pull_offloads on received skbs. This can determine a possible use-after-free accessing guehdr pointer since the packet will be 'uncloned' running pskb_expand_head if it is a cloned gso skb (e.g if the packet has been sent though a veth device) Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10tipc: missing entries in name table of publicationsHoang Le
When binding multiple services with specific type 1Ki, 2Ki.., this leads to some entries in the name table of publications missing when listed out via 'tipc name show'. The problem is at identify zero last_type conditional provided via netlink. The first is initial 'type' when starting name table dummping. The second is continuously with zero type (node state service type). Then, lookup function failure to finding node state service type in next iteration. To solve this, adding more conditional to marked as dirty type and lookup correct service type for the next iteration instead of select the first service as initial 'type' zero. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10fou: correct spelling of encapsulationSimon Horman
Correct spelling of encapsulation. Found by inspection. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10vhost: reject zero size iova rangeJason Wang
We used to accept zero size iova range which will lead a infinite loop in translate_desc(). Fixing this by failing the request in this case. Reported-by: syzbot+d21e6e297322a900c128@syzkaller.appspotmail.com Fixes: 6b1e6cc7 ("vhost: new device IOTLB API") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net/tls: prevent bad memory access in tls_is_sk_tx_device_offloaded()Jakub Kicinski
Unlike '&&' operator, the '&' does not have short-circuit evaluation semantics. IOW both sides of the operator always get evaluated. Fix the wrong operator in tls_is_sk_tx_device_offloaded(), which would lead to out-of-bounds access for for non-full sockets. Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10failover: allow name change on IFF_UP slave interfacesSi-Wei Liu
When a netdev appears through hot plug then gets enslaved by a failover master that is already up and running, the slave will be opened right away after getting enslaved. Today there's a race that userspace (udev) may fail to rename the slave if the kernel (net_failover) opens the slave earlier than when the userspace rename happens. Unlike bond or team, the primary slave of failover can't be renamed by userspace ahead of time, since the kernel initiated auto-enslavement is unable to, or rather, is never meant to be synchronized with the rename request from userspace. As the failover slave interfaces are not designed to be operated directly by userspace apps: IP configuration, filter rules with regard to network traffic passing and etc., should all be done on master interface. In general, userspace apps only care about the name of master interface, while slave names are less important as long as admin users can see reliable names that may carry other information describing the netdev. For e.g., they can infer that "ens3nsby" is a standby slave of "ens3", while for a name like "eth0" they can't tell which master it belongs to. Historically the name of IFF_UP interface can't be changed because there might be admin script or management software that is already relying on such behavior and assumes that the slave name can't be changed once UP. But failover is special: with the in-kernel auto-enslavement mechanism, the userspace expectation for device enumeration and bring-up order is already broken. Previously initramfs and various userspace config tools were modified to bypass failover slaves because of auto-enslavement and duplicate MAC address. Similarly, in case that users care about seeing reliable slave name, the new type of failover slaves needs to be taken care of specifically in userspace anyway. It's less risky to lift up the rename restriction on failover slave which is already UP. Although it's possible this change may potentially break userspace component (most likely configuration scripts or management software) that assumes slave name can't be changed while UP, it's relatively a limited and controllable set among all userspace components, which can be fixed specifically to listen for the rename events on failover slaves. Userspace component interacting with slaves is expected to be changed to operate on failover master interface instead, as the failover slave is dynamic in nature which may come and go at any point. The goal is to make the role of failover slaves less relevant, and userspace components should only deal with failover master in the long run. Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11drm/i915/gvt: Roundup fb->height into tile's height at calucation fb->sizeXiong Zhang
When fb is tiled and fb->height isn't the multiple of tile's height, the format fb->size = fb->stride * fb->height, will get a smaller size than the actual size. As the memory height of tiled fb should be multiple of tile's height. Fixes: 7f1a93b1f1d1 ("drm/i915/gvt: Correct the calculation of plane size") Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2019-04-10Merge branch 'net-sched-taprio-fix-picos_per_byte-miscalculation'David S. Miller
Leandro Dorileo says: ==================== net/sched: taprio: fix picos_per_byte miscalculation This set fixes miscalculations based on invalid link speed values. Changes in v6: + Avoid locking a spinlock while calling __ethtool_get_link_ksettings() (suggested by: Cong Wang); Changes in v5: + Don't iterate over all the net_device maintained list (suggested by: Florian Fainelli); Changes in v4: + converted pr_info calls to netdev_dbg (suggested by: Florian Fainelli); Changes in v3: + yet pr_info() format warnings; Changes in v2: + fixed pr_info() format both on cbs and taprio patches; ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net/sched: cbs: fix port_rate miscalculationLeandro Dorileo
The Credit Based Shaper heavily depends on link speed to calculate the scheduling credits, we can't properly calculate the credits if the device has failed to report the link speed. In that case we can't dequeue packets assuming a wrong port rate that will result into an inconsistent credit distribution. This patch makes sure we fail to dequeue case: 1) __ethtool_get_link_ksettings() reports error or 2) the ethernet driver failed to set the ksettings' speed value (setting link speed to SPEED_UNKNOWN). Additionally we properly re calculate the port rate whenever the link speed is changed. Fixes: 3d0bd028ffb4a ("net/sched: Add support for HW offloading for CBS") Signed-off-by: Leandro Dorileo <leandro.maciel.dorileo@intel.com> Reviewed-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net/sched: taprio: fix picos_per_byte miscalculationLeandro Dorileo
The Time Aware Priority Scheduler is heavily dependent to link speed, it relies on it to calculate transmission bytes per cycle, we can't properly calculate the so called budget if the device has failed to report the link speed. In that case we can't dequeue packets assuming a wrong budget. This patch makes sure we fail to dequeue case: 1) __ethtool_get_link_ksettings() reports error or 2) the ethernet driver failed to set the ksettings' speed value (setting link speed to SPEED_UNKNOWN). Additionally we re calculate the budget whenever the link speed is changed. Fixes: 5a781ccbd19e4 ("tc: Add support for configuring the taprio scheduler") Signed-off-by: Leandro Dorileo <leandro.maciel.dorileo@intel.com> Reviewed-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10team: set slave to promisc if team is already in promisc modeHangbin Liu
After adding a team interface to bridge, the team interface will enter promisc mode. Then if we add a new slave to team0, the slave will keep promisc off. Fix it by setting slave to promisc on if team master is already in promisc mode, also do the same for allmulti. v2: add promisc and allmulti checking when delete ports Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net/tls: fix build without CONFIG_TLS_DEVICEJakub Kicinski
buildbot noticed that TLS_HW is not defined if CONFIG_TLS_DEVICE=n. Wrap the cleanup branch into an ifdef, tls_device_free_resources_tx() wouldn't be compiled either in this case. Fixes: 35b71a34ada6 ("net/tls: don't leak partially sent record in device mode") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10clk: x86: Add system specific quirk to mark clocks as criticalDavid Müller
Since commit 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL"), the pmc_plt_clocks of the Bay Trail SoC are unconditionally gated off. Unfortunately this will break systems where these clocks are used for external purposes beyond the kernel's knowledge. Fix it by implementing a system specific quirk to mark the necessary pmc_plt_clks as critical. Fixes: 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL") Signed-off-by: David Müller <dave.mueller@gmx.ch> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2019-04-10block: do not leak memory in bio_copy_user_iov()Jérôme Glisse
When bio_add_pc_page() fails in bio_copy_user_iov() we should free the page we just allocated otherwise we are leaking it. Cc: linux-block@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-10net: strparser: fix commentJakub Kicinski
Fix comment. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10PCI: pciehp: Ignore Link State Changes after powering off a slotSergey Miroshnichenko
During a safe hot remove, the OS powers off the slot, which may cause a Data Link Layer State Changed event. The slot has already been set to OFF_STATE, so that event results in re-enabling the device, making it impossible to safely remove it. Clear out the Presence Detect Changed and Data Link Layer State Changed events when the disabled slot has settled down. It is still possible to re-enable the device if it remains in the slot after pressing the Attention Button by pressing it again. Fixes the problem that Micah reported below: an NVMe drive power button may not actually turn off the drive. Link: https://bugzilla.kernel.org/show_bug.cgi?id=203237 Reported-by: Micah Parrish <micah.parrish@hpe.com> Tested-by: Micah Parrish <micah.parrish@hpe.com> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com> [bhelgaas: changelog, add bugzilla URL] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v4.19+
2019-04-10Merge branch 'tls-leaks'David S. Miller
Jakub Kicinski says: ==================== net: tls: fix memory leaks and freeing skbs This series fixes two memory issues and a stack overflow. First two patches are fairly simple leaks. Third patch partially reverts an optimization made to the strparser which causes creation of skb->frag_list->skb->frag_list... chains of 100s of skbs, leading to recursive kfree_skb() filling up the kernel stack. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net: strparser: partially revert "strparser: Call skb_unclone conditionally"Jakub Kicinski
This reverts the first part of commit 4e485d06bb8c ("strparser: Call skb_unclone conditionally"). To build a message with multiple fragments we need our own root of frag_list. We can't simply use the frag_list of orig_skb, because it will lead to linking all orig_skbs together creating very long frag chains, and causing stack overflow on kfree_skb() (which is called recursively on the frag_lists). BUG: stack guard page was hit at 00000000d40fad41 (stack is 0000000029dde9f4..000000008cce03d5) kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP RIP: 0010:free_one_page+0x2b/0x490 Call Trace: __free_pages_ok+0x143/0x2c0 skb_release_data+0x8e/0x140 ? skb_release_data+0xad/0x140 kfree_skb+0x32/0xb0 [...] skb_release_data+0xad/0x140 ? skb_release_data+0xad/0x140 kfree_skb+0x32/0xb0 skb_release_data+0xad/0x140 ? skb_release_data+0xad/0x140 kfree_skb+0x32/0xb0 skb_release_data+0xad/0x140 ? skb_release_data+0xad/0x140 kfree_skb+0x32/0xb0 skb_release_data+0xad/0x140 ? skb_release_data+0xad/0x140 kfree_skb+0x32/0xb0 skb_release_data+0xad/0x140 __kfree_skb+0xe/0x20 tcp_disconnect+0xd6/0x4d0 tcp_close+0xf4/0x430 ? tcp_check_oom+0xf0/0xf0 tls_sk_proto_close+0xe4/0x1e0 [tls] inet_release+0x36/0x60 __sock_release+0x37/0xa0 sock_close+0x11/0x20 __fput+0xa2/0x1d0 task_work_run+0x89/0xb0 exit_to_usermode_loop+0x9a/0xa0 do_syscall_64+0xc0/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Let's leave the second unclone conditional, as I'm not entirely sure what is its purpose :) Fixes: 4e485d06bb8c ("strparser: Call skb_unclone conditionally") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net/tls: don't leak partially sent record in device modeJakub Kicinski
David reports that tls triggers warnings related to sk->sk_forward_alloc not being zero at destruction time: WARNING: CPU: 5 PID: 6831 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110 WARNING: CPU: 5 PID: 6831 at net/ipv4/af_inet.c:160 inet_sock_destruct+0x15b/0x170 When sender fills up the write buffer and dies from SIGPIPE. This is due to the device implementation not cleaning up the partially_sent_record. This is because commit a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") moved the partial record cleanup to the SW-only path. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net/tls: fix the IV leaksJakub Kicinski
Commit f66de3ee2c16 ("net/tls: Split conf to rx + tx") made freeing of IV and record sequence number conditional to SW path only, but commit e8f69799810c ("net/tls: Add generic NIC offload infrastructure") also allocates that state for the device offload configuration. Remember to free it. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10sparc64/pci_sun4v: fix ATU checks for large DMA masksChristoph Hellwig
Now that we allow drivers to always need to set larger than required DMA masks we need to be a little more careful in the sun4v PCI iommu driver to chose when to select the ATU support - a larger DMA mask can be set even when the platform does not support ATU, so we always have to check if it is avaiable before using it. Add a little helper for that and use it in all the places where we make ATU usage decisions based on the DMA mask. Fixes: 24132a419c68 ("sparc64/pci_sun4v: allow large DMA masks") Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Meelis Roos <mroos@linux.ee> Acked-by: David S. Miller <davem@davemloft.net>
2019-04-10ipv4: Handle RTA_GATEWAY set to 0David Ahern
Govindarajulu reported a regression with Network Manager which sends an RTA_GATEWAY attribute with the address set to 0. Fixup the handling of RTA_GATEWAY to only set fc_gw_family if the gateway address is actually set. Fixes: f35b794b3b405 ("ipv4: Prepare fib_config for IPv6 gateway") Reported-by: Govindarajulu Varadarajan <govind.varadar@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Several driver bug fixes posted in the last several weeks - Several bug fixes for the hfi1 driver 'TID RDMA' functionality merged into 5.1. Since TID RDMA is on by default these all seem to be regressions. - Wrong software permission checks on memory in mlx5 - Memory leak in vmw_pvrdma during driver remove - Several bug fixes for hns driver features merged into 5.1" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/hfi1: Do not flush send queue in the TID RDMA second leg RDMA/hns: Bugfix for SCC hem free RDMA/hns: Fix bug that caused srq creation to fail RDMA/vmw_pvrdma: Fix memory leak on pvrdma_pci_remove IB/mlx5: Reset access mask when looping inside page fault handler IB/hfi1: Fix the allocation of RSM table IB/hfi1: Eliminate opcode tests on mr deref IB/hfi1: Clear the IOWAIT pending bits when QP is put into error state IB/hfi1: Failed to drain send queue when QP is put into error state
2019-04-10Merge branch 'ibmvnic-features'David S. Miller
Thomas Falcon says: ==================== ibmvnic: Fix netdev features settings on reset In its current state, a driver reset clobbers any feature settings a user may have toggled and will disable GRO as it is not explicitly enabled in the driver. This patch set enables GRO and tries to retain user settings after a reset. If the underlying carrier changes, however, the driver will disable features unsupported by the new carrier. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10ibmvnic: Fix netdev feature clobbering during a resetThomas Falcon
While determining offload capabilities of backing hardware during a device reset, the driver is clobbering current feature settings. Update hw_features on reset instead of features unless a feature is enabled that is no longer supported on the current backing device. Also enable features that were not supported prior to the reset but were previously enabled or requested by the user. This can occur if the reset is the result of a carrier change, such as a device failover or partition migration. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10ibmvnic: Enable GROThomas Falcon
Enable Generic Receive Offload in the ibmvnic driver. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10Merge branch 'net-sched-move-back-qlen-to-per-CPU-accounting'David S. Miller
Paolo Abeni says: ==================== net: sched: move back qlen to per CPU accounting The commit 46b1c18f9deb ("net: sched: put back q.qlen into a single location") introduced some measurable regression in the contended scenarios for lock qdisc. As Eric suggested we could replace q.qlen access with calls to qdisc_is_empty() in the datapath and revert the above commit. The TC subsystem updates qdisc->is_empty in a somewhat loose way: notably 'is_empty' is set only when the qdisc dequeue() calls return a NULL ptr. That is, the invocation after the last packet is dequeued. The above is good enough for BYPASS implementation - the only downside is that we end up avoiding the optimization for a very small time-frame - but will break hard things when internal structures consistency for classful qdisc relies on child qdisc_is_empty(). A more strict 'is_empty' update adds a relevant complexity to its life-cycle, so this series takes a different approach: we allow lockless qdisc to switch from per CPU accounting to global stats accounting when the NOLOCK bit is cleared. Since most pieces of infrastructure are already in place, this requires very little changes to the pfifo_fast qdisc, and any later NOLOCK qdisc can hook there with little effort - no need to maintain two different implementations. The first 2 patches removes direct qlen access from non core TC code, the 3rd and 4th patches place and use the infrastructure to allow stats account switching and the 5th patch is the actual revert. v1 -> v2: - fixed build issues - more descriptive commit message for patch 5/5 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10Revert: "net: sched: put back q.qlen into a single location"Paolo Abeni
This revert commit 46b1c18f9deb ("net: sched: put back q.qlen into a single location"). After the previous patch, when a NOLOCK qdisc is enslaved to a locking qdisc it switches to global stats accounting. As a consequence, when a classful qdisc accesses directly a child qdisc's qlen, such qdisc is not doing per CPU accounting and qlen value is consistent. In the control path nobody uses directly qlen since commit e5f0e8f8e45 ("net: sched: introduce and use qdisc tree flush/purge helpers"), so we can remove the contented atomic ops from the datapath. v1 -> v2: - complete the qdisc_qstats_atomic_qlen_dec() -> qdisc_qstats_cpu_qlen_dec() replacement, fix build issue - more descriptive commit message Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, tooPaolo Abeni
Since stats updating is always consistent with TCQ_F_CPUSTATS flag, we can disable it at qdisc creation time flipping such bit. In my experiments, if the NOLOCK flag is cleared, per CPU stats accounting does not give any measurable performance gain, but it waste some memory. Let's clear TCQ_F_CPUSTATS together with NOLOCK, when enslaving a NOLOCK qdisc to 'lock' one. Use stats update helper inside pfifo_fast, to cope correctly with TCQ_F_CPUSTATS flag change. As a side effect, q.qlen value for any child qdiscs is always consistent for all lock classfull qdiscs. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net: sched: always do stats accounting according to TCQ_F_CPUSTATSPaolo Abeni
The core sched implementation checks independently for NOLOCK flag to acquire/release the root spin lock and for qdisc_is_percpu_stats() to account per CPU values in many places. This change update the last few places checking the TCQ_F_NOLOCK to do per CPU stats accounting according to qdisc_is_percpu_stats() value. The above allows to clean dev_requeue_skb() implementation a bit and makes stats update always consistent with a single flag. v1 -> v2: - do not move qdisc_is_empty definition, fix build issue Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net: sched: prefer qdisc_is_empty() over direct qlen accessPaolo Abeni
When checking for root qdisc queue length, do not access directly q.qlen. In the following patches we will move back qlen accounting to per CPU values for NOLOCK qdiscs. Instead, prefer the qdisc_is_empty() helper usage. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net: caif: avoid using qdisc_qlen()Paolo Abeni
Such helper does not cope correctly with NOLOCK qdiscs. In the following patches we will move back qlen to per CPU values for such qdiscs, so qdisc_qlen_sum() is not an option, too. Instead, use qlen only for lock qdiscs, and always set flow off for NOLOCK qdiscs with a not empty tx queue. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10Merge branch 'mlxsw-Various-fixes'David S. Miller
Ido Schimmel says: ==================== mlxsw: Various fixes This patchset contains various small fixes for mlxsw. Patch #1 fixes a warning generated by switchdev core when the driver fails to insert an MDB entry in the commit phase. Patches #2-#4 fix a warning in check_flush_dependency() that can be triggered when a work item in a WQ_MEM_RECLAIM workqueue tries to flush a non-WQ_MEM_RECLAIM workqueue. It seems that the semantics of the WQ_MEM_RECLAIM flag are not very clear [1] and that various patches have been sent to remove it from various workqueues throughout the kernel [2][3][4] in order to silence the warning. These patches do the same for the workqueues created by mlxsw that probably should not have been created with this flag in the first place. Patch #5 fixes a regression where an IP address cannot be assigned to a VRF upper due to erroneous MAC validation check. Patch #6 adds a test case. Patch #7 adjusts Spectrum-2 shared buffer configuration to be compatible with Spectrum-1. The problem and fix are described in detail in the commit message. Please consider patches #1-#5 for 5.0.y. I verified they apply cleanly. [1] https://patchwork.kernel.org/patch/10791315/ [2] Commit ce162bfbc0b6 ("mac80211_hwsim: don't use WQ_MEM_RECLAIM") [3] Commit 39baf10310e6 ("IB/core: Fix use workqueue without WQ_MEM_RECLAIM") [4] Commit 75215e5bb22c ("iwcm: Don't allocate iwcm workqueue with WQ_MEM_RECLAIM") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10mlxsw: spectrum_buffers: Add a multicast pool for Spectrum-2Ido Schimmel
In Spectrum-1, when a multicast packet is admitted to the shared buffer it increases the quotas of all the ports and {port, TC} to which it is forwarded to. The above means that multicast packets are accounted multiple times in the shared buffer and can therefore cause the associated shared buffer pool to fill up very quickly. To work around this issue, commit e83c045e53d7 ("mlxsw: spectrum_buffers: Configure MC pool") added a dedicated multicast pool in which multicast packets are accounted. The issue is not present in Spectrum-2, but in order to be backward compatible with Spectrum-1, its default behavior is to allow a multicast packet to increase multiple egress quotas instead of one. Until the new (non-backward compatible) mode is supported, configure a dedicated multicast pool as in Spectrum-1. Fixes: fe099bf682ab ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10selftests: mlxsw: Test VRF MAC vetoingIdo Schimmel
Test that it is possible to set an IP address on a VRF and that it is not vetoed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10mlxsw: spectrum_router: Do not check VRF MAC addressIdo Schimmel
Commit 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses") enabled the driver to veto router interface (RIF) MAC addresses that it cannot support. This check should only be performed for interfaces for which the driver actually configures a RIF. A VRF upper is not one of them, so ignore it. Without this patch it is not possible to set an IP address on the VRF device and use it as a loopback. Fixes: 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Tested-by: Alexander Petrovskiy <alexpe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw workqueueIdo Schimmel
The workqueue is used to periodically update the networking stack about activity / statistics of various objects such as neighbours and TC actions. It should not be called as part of memory reclaim path, so remove the WQ_MEM_RECLAIM flag. Fixes: 3d5479e92087 ("mlxsw: core: Remove deprecated create_workqueue") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw ordered workqueueIdo Schimmel
The ordered workqueue is used to offload various objects such as routes and neighbours in the order they are notified. It should not be called as part of memory reclaim path, so remove the WQ_MEM_RECLAIM flag. This can also result in a warning [1], if a worker tries to flush a non-WQ_MEM_RECLAIM workqueue. [1] [97703.542861] workqueue: WQ_MEM_RECLAIM mlxsw_core_ordered:mlxsw_sp_router_fib6_event_work [mlxsw_spectrum] is flushing !WQ_MEM_RECLAIM events:rht_deferred_worker [97703.542884] WARNING: CPU: 1 PID: 32492 at kernel/workqueue.c:2605 check_flush_dependency+0xb5/0x130 ... [97703.542988] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018 [97703.543049] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib6_event_work [mlxsw_spectrum] [97703.543061] RIP: 0010:check_flush_dependency+0xb5/0x130 ... [97703.543071] RSP: 0018:ffffb3f08137bc00 EFLAGS: 00010086 [97703.543076] RAX: 0000000000000000 RBX: ffff96e07740ae00 RCX: 0000000000000000 [97703.543080] RDX: 0000000000000094 RSI: ffffffff82dc1934 RDI: 0000000000000046 [97703.543084] RBP: ffffb3f08137bc20 R08: ffffffff82dc18a0 R09: 00000000000225c0 [97703.543087] R10: 0000000000000000 R11: 0000000000007eec R12: ffffffff816e4ee0 [97703.543091] R13: ffff96e06f6a5c00 R14: ffff96e077ba7700 R15: ffffffff812ab0c0 [97703.543097] FS: 0000000000000000(0000) GS:ffff96e077a80000(0000) knlGS:0000000000000000 [97703.543101] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [97703.543104] CR2: 00007f8cd135b280 CR3: 00000001e860e003 CR4: 00000000003606e0 [97703.543109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [97703.543112] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [97703.543115] Call Trace: [97703.543129] __flush_work+0xbd/0x1e0 [97703.543137] ? __cancel_work_timer+0x136/0x1b0 [97703.543145] ? pwq_dec_nr_in_flight+0x49/0xa0 [97703.543154] __cancel_work_timer+0x136/0x1b0 [97703.543175] ? mlxsw_reg_trans_bulk_wait+0x145/0x400 [mlxsw_core] [97703.543184] cancel_work_sync+0x10/0x20 [97703.543191] rhashtable_free_and_destroy+0x23/0x140 [97703.543198] rhashtable_destroy+0xd/0x10 [97703.543254] mlxsw_sp_fib_destroy+0xb1/0xf0 [mlxsw_spectrum] [97703.543310] mlxsw_sp_vr_put+0xa8/0xc0 [mlxsw_spectrum] [97703.543364] mlxsw_sp_fib_node_put+0xbf/0x140 [mlxsw_spectrum] [97703.543418] ? mlxsw_sp_fib6_entry_destroy+0xe8/0x110 [mlxsw_spectrum] [97703.543475] mlxsw_sp_router_fib6_event_work+0x6cd/0x7f0 [mlxsw_spectrum] [97703.543484] process_one_work+0x1fd/0x400 [97703.543493] worker_thread+0x34/0x410 [97703.543500] kthread+0x121/0x140 [97703.543507] ? process_one_work+0x400/0x400 [97703.543512] ? kthread_park+0x90/0x90 [97703.543523] ret_from_fork+0x35/0x40 Fixes: a3832b31898f ("mlxsw: core: Create an ordered workqueue for FIB offload") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Semion Lisyansky <semionl@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10mlxsw: core: Do not use WQ_MEM_RECLAIM for EMAD workqueueIdo Schimmel
The EMAD workqueue is used to handle retransmission of EMAD packets that contain configuration data for the device's firmware. Given the workers need to allocate these packets and that the code is not called as part of memory reclaim path, remove the WQ_MEM_RECLAIM flag. Fixes: d965465b60ba ("mlxsw: core: Fix possible deadlock") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10mlxsw: spectrum_switchdev: Add MDB entries in prepare phaseIdo Schimmel
The driver cannot guarantee in the prepare phase that it will be able to write an MDB entry to the device. In case the driver returned success during the prepare phase, but then failed to add the entry in the commit phase, a WARNING [1] will be generated by the switchdev core. Fix this by doing the work in the prepare phase instead. [1] [ 358.544486] swp12s0: Commit of object (id=2) failed. [ 358.550061] WARNING: CPU: 0 PID: 30 at net/switchdev/switchdev.c:281 switchdev_port_obj_add_now+0x9b/0xe0 [ 358.560754] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 5.0.0-custom-13382-gf2449babf221 #1350 [ 358.570472] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ 358.580582] Workqueue: events switchdev_deferred_process_work [ 358.587001] RIP: 0010:switchdev_port_obj_add_now+0x9b/0xe0 ... [ 358.614109] RSP: 0018:ffffa6b900d6fe18 EFLAGS: 00010286 [ 358.619943] RAX: 0000000000000000 RBX: ffff8b00797ff000 RCX: 0000000000000000 [ 358.627912] RDX: ffff8b00b7a1d4c0 RSI: ffff8b00b7a152e8 RDI: ffff8b00b7a152e8 [ 358.635881] RBP: ffff8b005c3f5bc0 R08: 000000000000022b R09: 0000000000000000 [ 358.643850] R10: 0000000000000000 R11: ffffa6b900d6fcc8 R12: 0000000000000000 [ 358.651819] R13: dead000000000100 R14: ffff8b00b65a23c0 R15: 0ffff8b00b7a2200 [ 358.659790] FS: 0000000000000000(0000) GS:ffff8b00b7a00000(0000) knlGS:0000000000000000 [ 358.668820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 358.675228] CR2: 00007f00aad90de0 CR3: 00000001ca80d000 CR4: 00000000001006f0 [ 358.683188] Call Trace: [ 358.685918] switchdev_port_obj_add_deferred+0x13/0x60 [ 358.691655] switchdev_deferred_process+0x6b/0xf0 [ 358.696907] switchdev_deferred_process_work+0xa/0x10 [ 358.702548] process_one_work+0x1f5/0x3f0 [ 358.707022] worker_thread+0x28/0x3c0 [ 358.711099] ? process_one_work+0x3f0/0x3f0 [ 358.715768] kthread+0x10d/0x130 [ 358.719369] ? __kthread_create_on_node+0x180/0x180 [ 358.724815] ret_from_fork+0x35/0x40 Fixes: 3a49b4fde2a1 ("mlxsw: Adding layer 2 multicast support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Kushnarov <alexanderk@mellanox.com> Tested-by: Alex Kushnarov <alexanderk@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecsHans Holmberg
The introduction of multipage bio vectors broke pblk's partial read logic due to it not being prepared for multipage bio vectors. Use bio vector iterators instead of direct bio vector indexing. Fixes: 07173c3ec276 ("block: enable multipage bvecs") Reported-by: Klaus Jensen <klaus.jensen@cnexlabs.com> Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Updated description. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-10IB/hfi1: Do not flush send queue in the TID RDMA second legKaike Wan
When a QP is put into error state, the send queue will be flushed. This mechanism is implemented in both the first and the second leg of the send engine. Since the second leg is only responsible for data transactions in the KDETH space for the TID RDMA WRITE request, it should not perform the flushing of the send queue. This patch removes the flushing function of the second leg, but still keeps the bailing out of the QP if it is put into error state. Fixes: 70dcb2e3dc6a ("IB/hfi1: Add the TID second leg send packet builder") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Several fixes, add more reviewers to the list" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio: Honour 'may_reduce_num' in vring_create_virtqueue MAiNTAINERS: add Paolo, Stefan for virtio blk/scsi virtio_pci: fix a NULL pointer reference in vp_del_vqs
2019-04-10RISC-V: Fix Maximum Physical Memory 2GiB option for 64bit systemsAnup Patel
The Maximum Physical Memory 2GiB option for 64bit systems is currently broken because kernel hangs at boot-time when this option is enabled and the underlying system has more than 2GiB memory. This issue can be easily reproduced on SiFive Unleashed board where we have 8GiB of memory. This patch fixes above issue by removing unusable memory region in setup_bootmem(). Signed-off-by: Anup Patel <anup.patel@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2019-04-10ASoC: wcd9335: Fix missing regmap requirementMarc Gonzalez
wcd9335.c: undefined reference to 'devm_regmap_add_irq_chip' Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-04-10drm/i915/dp: revert back to max link rate and lane count on eDPJani Nikula
Commit 7769db588384 ("drm/i915/dp: optimize eDP 1.4+ link config fast and narrow") started to optize the eDP 1.4+ link config, both per spec and as preparation for display stream compression support. Sadly, we again face panels that flat out fail with parameters they claim to support. Revert, and go back to the drawing board. v2: Actually revert to max params instead of just wide-and-slow. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109959 Fixes: 7769db588384 ("drm/i915/dp: optimize eDP 1.4+ link config fast and narrow") Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Manasi Navare <manasi.d.navare@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Atwood <matthew.s.atwood@intel.com> Cc: "Lee, Shawn C" <shawn.c.lee@intel.com> Cc: Dave Airlie <airlied@gmail.com> Cc: intel-gfx@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.0+ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Manasi Navare <manasi.d.navare@intel.com> Tested-by: Albert Astals Cid <aacid@kde.org> # v5.0 backport Tested-by: Emanuele Panigati <ilpanich@gmail.com> # v5.0 backport Tested-by: Matteo Iervasi <matteoiervasi@gmail.com> # v5.0 backport Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190405075220.9815-1-jani.nikula@intel.com (cherry picked from commit f11cb1c19ad0563b3c1ea5eb16a6bac0e401f428) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2019-04-10drm/i915/icl: Fix port disable sequence for mipi-dsiVandita Kulkarni
Re-enable clock gating of DDI clocks. v2: Fix the default ddi clk state for mipi-dsi (Imre) Fixes: 1026bea00381 ("drm/i915/icl: Ungate DSI clocks") Signed-off-by: Vandita Kulkarni <vandita.kulkarni@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1553513202-13863-2-git-send-email-vandita.kulkarni@intel.com (cherry picked from commit 942d1cf48eae3fcd7e973cfb708d5c4860f0c713) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>