summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-12-20Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM fix from Russell King: "Just one fix for a problem in the csum_partial_copy_from_user() implementation when software PAN is enabled" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch
2017-12-20Merge tag 'acpi-4.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix a recently introduced issue in the ACPI CPPC driver and an obscure error hanling bug in the APEI code. Specifics: - Fix an error handling issue in the ACPI APEI implementation of the >read callback in struct pstore_info (Takashi Iwai). - Fix a possible out-of-bounds arrar read in the ACPI CPPC driver (Colin Ian King)" * tag 'acpi-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: APEI / ERST: Fix missing error handling in erst_reader() ACPI: CPPC: remove initial assignment of pcc_ss_data
2017-12-20Merge tag 'pm-4.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a regression in the ondemand and conservative cpufreq governors that was introduced during the 4.13 cycle, a recent regression in the imx6q cpufreq driver and a regression in the PCI handling of hibernation from the 4.14 cycle. Specifics: - Fix an issue in the PCI handling of the "thaw" transition during hibernation (after creating an image), introduced by a bug fix from the 4.13 cycle and exposed by recent changes in the IRQ subsystem, that caused pci_restore_state() to be called for devices in low-power states in some cases which is incorrect and breaks MSI management on some systems (Rafael Wysocki). - Fix a recent regression in the imx6q cpufreq driver that broke speed grading on i.MX6 QuadPlus by omitting checks causing invalid operating performance points (OPPs) to be disabled on that SoC as appropriate (Lucas Stach). - Fix a regression introduced during the 4.14 cycle in the ondemand and conservative cpufreq governors that causes the sampling interval used by them to be shorter than the tick period in some cases which leads to incorrect decisions (Rafael Wysocki)" * tag 'pm-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: governor: Ensure sufficiently large sampling intervals cpufreq: imx6q: fix speed grading regression on i.MX6 QuadPlus PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
2017-12-20Merge tag 'spi-fix-v4.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A bunch of really small fixes here, all driver specific and mostly in error handling and remove paths. The most important fixes are for the a3700 clock configuration and a fix for a nasty stall which could potentially cause data corruption with the xilinx driver" * tag 'spi-fix-v4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: atmel: fixed spin_lock usage inside atmel_spi_remove spi: sun4i: disable clocks in the remove function spi: rspi: Do not set SPCR_SPE in qspi_set_config_register() spi: Fix double "when" spi: a3700: Fix clk prescaling for coefficient over 15 spi: xilinx: Detect stall with Unknown commands spi: imx: Update device tree binding documentation
2017-12-20Merge tag 'mfd-fixes-4.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MDF bugfixes from Lee Jones: - Fix message timing issues and report correct state when an error occurs in cros_ec_spi - Reorder enums used for Power Management in rtsx_pci - Use correct OF helper for obtaining child nodes in twl4030-audio and twl6040 * tag 'mfd-fixes-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: Fix RTS5227 (and others) powermanagement mfd: cros ec: spi: Fix "in progress" error signaling mfd: twl6040: Fix child-node lookup mfd: twl4030-audio: Fix sibling-node lookup mfd: cros ec: spi: Don't send first message too soon
2017-12-20Merge tag 'sound-4.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "All stable fixes here: - a regression fix of USB-audio for the previous hardening patch - a potential UAF fix in rawmidi - HD-audio and USB-audio quirks, the missing new ID" * tag 'sound-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU ALSA: hda/realtek - Fix Dell AIO LineOut issue ALSA: rawmidi: Avoid racy info ioctl via ctl device ALSA: hda - Add vendor id for Cannonlake HDMI codec ALSA: usb-audio: Add native DSD support for Esoteric D-05X
2017-12-20null_blk: unalign call_single_dataJens Axboe
Commit 966a967116e6 randomly added alignment to this structure, but it's actually detrimental to performance of null_blk. Test case: Running on both the home and remote node shows a ~5% degradation in performance. While in there, move blk_status_t to the hole after the integer tag in the nullb_cmd structure. After this patch, we shrink the size from 192 to 152 bytes. Fixes: 966a967116e69 ("smp: Avoid using two cache lines for struct call_single_data") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-20block: unalign call_single_data in struct requestJens Axboe
A previous change blindly added massive alignment to the call_single_data structure in struct request. This ballooned it in size from 296 to 320 bytes on my setup, for no valid reason at all. Use the unaligned struct __call_single_data variant instead. Fixes: 966a967116e69 ("smp: Avoid using two cache lines for struct call_single_data") Cc: stable@vger.kernel.org # v4.14 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-20ipv4: Fix use-after-free when flushing FIB tablesIdo Schimmel
Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the local table uses the same trie allocated for the main table when custom rules are not in use. When a net namespace is dismantled, the main table is flushed and freed (via an RCU callback) before the local table. In case the callback is invoked before the local table is iterated, a use-after-free can occur. Fix this by iterating over the FIB tables in reverse order, so that the main table is always freed after the local table. v3: Reworded comment according to Alex's suggestion. v2: Add a comment to make the fix more explicit per Dave's and Alex's feedback. Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20s390/qeth: fix error handling in checksum cmd callbackJulian Wiedmann
Make sure to check both return code fields before processing the response. Otherwise we risk operating on invalid data. Fixes: c9475369bd2b ("s390/qeth: rework RX/TX checksum offload") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20tipc: remove joining group member from congested listJon Maloy
When we receive a JOIN message from a peer member, the message may contain an advertised window value ADV_IDLE that permits removing the member in question from the tipc_group::congested list. However, since the removal has been made conditional on that the advertised window is *not* ADV_IDLE, we miss this case. This has the effect that a sender sometimes may enter a state of permanent, false, broadcast congestion. We fix this by unconditinally removing the member from the congested list before calling tipc_member_update(), which might potentially sort it into the list again. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20selftests: net: Adding config fragment CONFIG_NUMA=yNaresh Kamboju
kernel config fragement CONFIG_NUMA=y is need for reuseport_bpf_numa. Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20Merge tag 'mlx5-fixes-2017-12-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: =================== Mellanox, mlx5 fixes 2017-12-19 The follwoing series includes some fixes for mlx5 core and etherent driver. Please pull and let me know if there is any problem. This series doesn't introduce any conflict with the ongoing mlx5 for-next submission. For -stable: kernels >= v4.7.y ("net/mlx5e: Fix possible deadlock of VXLAN lock") ("net/mlx5e: Add refcount to VXLAN structure") ("net/mlx5e: Prevent possible races in VXLAN control flow") ("net/mlx5e: Fix features check of IPv6 traffic") kernels >= v4.9.y ("net/mlx5: Fix error flow in CREATE_QP command") ("net/mlx5: Fix rate limit packet pacing naming and struct") kernels >= v4.13.y ("net/mlx5: FPGA, return -EINVAL if size is zero") kernels >= v4.14.y ("Revert "mlx5: move affinity hints assignments to generic code") All above patches apply and compile with no issues on corresponding -stable. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20block-throttle: avoid double chargeShaohua Li
If a bio is throttled and split after throttling, the bio could be resubmited and enters the throttling again. This will cause part of the bio to be charged multiple times. If the cgroup has an IO limit, the double charge will significantly harm the performance. The bio split becomes quite common after arbitrary bio size change. To fix this, we always set the BIO_THROTTLED flag if a bio is throttled. If the bio is cloned/split, we copy the flag to new bio too to avoid a double charge. However, cloned bio could be directed to a new disk, keeping the flag be a problem. The observation is we always set new disk for the bio in this case, so we can clear the flag in bio_set_dev(). This issue exists for a long time, arbitrary bio size change just makes it worse, so this should go into stable at least since v4.2. V1-> V2: Not add extra field in bio based on discussion with Tejun Cc: Vivek Goyal <vgoyal@redhat.com> Cc: stable@vger.kernel.org Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-20Merge branch 'cls_bpf-fix-offload-state-tracking-with-block-callbacks'David S. Miller
Jakub Kicinski says: =================== cls_bpf: fix offload state tracking with block callbacks After introduction of block callbacks classifiers can no longer track offload state. cls_bpf used to do that in an attempt to move common code from drivers to the core. Remove that functionality and fix drivers. The user-visible bug this is fixing is that trying to offload a second filter would trigger a spurious DESTROY and in turn disable the already installed one. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20nfp: bpf: keep track of the offloaded programJakub Kicinski
After TC offloads were converted to callbacks we have no choice but keep track of the offloaded filter in the driver. The check for nn->dp.bpf_offload_xdp was a stop gap solution to make sure failed TC offload won't disable XDP, it's no longer necessary. nfp_net_bpf_offload() will return -EBUSY on TC vs XDP conflicts. Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20cls_bpf: fix offload assumptions after callback conversionJakub Kicinski
cls_bpf used to take care of tracking what offload state a filter is in, i.e. it would track if offload request succeeded or not. This information would then be used to issue correct requests to the driver, e.g. requests for statistics only on offloaded filters, removing only filters which were offloaded, using add instead of replace if previous filter was not added etc. This tracking of offload state no longer functions with the new callback infrastructure. There could be multiple entities trying to offload the same filter. Throw out all the tracking and corresponding commands and simply pass to the drivers both old and new bpf program. Drivers will have to deal with offload state tracking by themselves. Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: Fix double free and memory corruption in get_net_ns_by_id()Eric W. Biederman
(I can trivially verify that that idr_remove in cleanup_net happens after the network namespace count has dropped to zero --EWB) Function get_net_ns_by_id() does not check for net::count after it has found a peer in netns_ids idr. It may dereference a peer, after its count has already been finaly decremented. This leads to double free and memory corruption: put_net(peer) rtnl_lock() atomic_dec_and_test(&peer->count) [count=0] ... __put_net(peer) get_net_ns_by_id(net, id) spin_lock(&cleanup_list_lock) list_add(&net->cleanup_list, &cleanup_list) spin_unlock(&cleanup_list_lock) queue_work() peer = idr_find(&net->netns_ids, id) | get_net(peer) [count=1] | ... | (use after final put) v ... cleanup_net() ... spin_lock(&cleanup_list_lock) ... list_replace_init(&cleanup_list, ..) ... spin_unlock(&cleanup_list_lock) ... ... ... ... put_net(peer) ... atomic_dec_and_test(&peer->count) [count=0] ... spin_lock(&cleanup_list_lock) ... list_add(&net->cleanup_list, &cleanup_list) ... spin_unlock(&cleanup_list_lock) ... queue_work() ... rtnl_unlock() rtnl_lock() ... for_each_net(tmp) { ... id = __peernet2id(tmp, peer) ... spin_lock_irq(&tmp->nsid_lock) ... idr_remove(&tmp->netns_ids, id) ... ... ... net_drop_ns() ... net_free(peer) ... } ... | v cleanup_net() ... (Second free of peer) Also, put_net() on the right cpu may reorder with left's cpu list_replace_init(&cleanup_list, ..), and then cleanup_list will be corrupted. Since cleanup_net() is executed in worker thread, while put_net(peer) can happen everywhere, there should be enough time for concurrent get_net_ns_by_id() to pick the peer up, and the race does not seem to be unlikely. The patch fixes the problem in standard way. (Also, there is possible problem in peernet2id_alloc(), which requires check for net::count under nsid_lock and maybe_get_net(peer), but in current stable kernel it's used under rtnl_lock() and it has to be safe. Openswitch begun to use peernet2id_alloc(), and possibly it should be fixed too. While this is not in stable kernel yet, so I'll send a separate message to netdev@ later). Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Fixes: 0c7aecd4bde4 "netns: add rtnl cmd to add and get peer netns ids" Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20Merge branch 'mvneta-fixes'David S. Miller
Gregory CLEMENT says: ==================== Few mvneta fixes here it is a small series of fixes found on the mvneta driver. They had been already used in the vendor kernel and are now ported to mainline. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: mvneta: eliminate wrong call to handle rx descriptor errorYelena Krivosheev
There are few reasons in mvneta_rx_swbm() function when received packet is dropped. mvneta_rx_error() should be called only if error bit [16] is set in rx descriptor. [gregory.clement@free-electrons.com: add fixes tag] Cc: stable@vger.kernel.org Fixes: dc35a10f68d3 ("net: mvneta: bm: add support for hardware buffer management") Signed-off-by: Yelena Krivosheev <yelena@marvell.com> Tested-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: mvneta: use proper rxq_number in loop on rx queuesYelena Krivosheev
When adding the RX queue association with each CPU, a typo was made in the mvneta_cleanup_rxqs() function. This patch fixes it. [gregory.clement@free-electrons.com: add commit log and fixes tag] Cc: stable@vger.kernel.org Fixes: 2dcf75e2793c ("net: mvneta: Associate RX queues with each CPU") Signed-off-by: Yelena Krivosheev <yelena@marvell.com> Tested-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: mvneta: clear interface link status on port disableYelena Krivosheev
When port connect to PHY in polling mode (with poll interval 1 sec), port and phy link status must be synchronize in order don't loss link change event. [gregory.clement@free-electrons.com: add fixes tag] Cc: <stable@vger.kernel.org> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Signed-off-by: Yelena Krivosheev <yelena@marvell.com> Tested-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20Merge branch 'acpi-cppc'Rafael J. Wysocki
* acpi-cppc: ACPI: CPPC: remove initial assignment of pcc_ss_data
2017-12-20Merge branch 'pm-pci'Rafael J. Wysocki
* pm-pci: PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
2017-12-19Do not hash userspace addresses in fault handlersKees Cook
The hashing of %p was designed to restrict kernel addresses. There is no reason to hash the userspace values seen during a segfault report, so switch these to %px. (Some architectures already use %lx.) Fixes: ad67b74d2469d9b8 ("printk: hash addresses printed with %p") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-20bpf: Fix tools and testing build.David Miller
I'm getting various build failures on sparc64. The key is usually that the userland tools get built 32-bit. 1) clock_gettime() is in librt, so that must be added to the link libraries. 2) "sizeof(x)" must be printed with "%Z" printf prefix. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-19net/mlx5: Stay in polling mode when command EQ destroy failsMoshe Shemesh
During unload, on mlx5_stop_eqs we move command interface from events mode to polling mode, but if command interface EQ destroy fail we move back to events mode. That's wrong since even if we fail to destroy command interface EQ, we do release its irq, so no interrupts will be received. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5: Cleanup IRQs in case of unload failureMoshe Shemesh
When mlx5_stop_eqs fails to destroy any of the eqs it returns with an error. In such failure flow the function will return without releasing all EQs irqs and then pci_free_irq_vectors will fail. Fix by only warn on destroy EQ failure and continue to release other EQs and their irqs. It fixes the following kernel trace: kernel: kernel BUG at drivers/pci/msi.c:352! ... ... kernel: Call Trace: kernel: pci_disable_msix+0xd3/0x100 kernel: pci_free_irq_vectors+0xe/0x20 kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core] Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5: Fix steering memory leakMaor Gottlieb
Flow steering priority and namespace are software only objects that didn't have the proper destructors and were not freed during steering cleanup. Fix it by adding destructor functions for these objects. Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5e: Prevent possible races in VXLAN control flowGal Pressman
When calling add/remove VXLAN port, a lock must be held in order to prevent race scenarios when more than one add/remove happens at the same time. Fix by holding our state_lock (mutex) as done by all other parts of the driver. Note that the spinlock protecting the radix-tree is still needed in order to synchronize radix-tree access from softirq context. Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5e: Add refcount to VXLAN structureGal Pressman
A refcount mechanism must be implemented in order to prevent unwanted scenarios such as: - Open an IPv4 VXLAN interface - Open an IPv6 VXLAN interface (different socket) - Remove one of the interfaces With current implementation, the UDP port will be removed from our VXLAN database and turn off the offloads for the other interface, which is still active. The reference count mechanism will only allow UDP port removals once all consumers are gone. Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5e: Fix possible deadlock of VXLAN lockGal Pressman
mlx5e_vxlan_lookup_port is called both from mlx5e_add_vxlan_port (user context) and mlx5e_features_check (softirq), but the lock acquired does not disable bottom half and might result in deadlock. Fix it by simply replacing spin_lock() with spin_lock_bh(). While at it, replace all unnecessary spin_lock_irq() to spin_lock_bh(). lockdep's WARNING: inconsistent lock state [ 654.028136] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 654.028229] swapper/5/0 [HC0[0]:SC1[9]:HE1:SE0] takes: [ 654.028321] (&(&vxlan_db->lock)->rlock){+.?.}, at: [<ffffffffa06e7f0e>] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core] [ 654.028528] {SOFTIRQ-ON-W} state was registered at: [ 654.028607] _raw_spin_lock+0x3c/0x70 [ 654.028689] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core] [ 654.028794] mlx5e_vxlan_add_port+0x2e/0x120 [mlx5_core] [ 654.028878] process_one_work+0x1e9/0x640 [ 654.028942] worker_thread+0x4a/0x3f0 [ 654.029002] kthread+0x141/0x180 [ 654.029056] ret_from_fork+0x24/0x30 [ 654.029114] irq event stamp: 579088 [ 654.029174] hardirqs last enabled at (579088): [<ffffffff818f475a>] ip6_finish_output2+0x49a/0x8c0 [ 654.029309] hardirqs last disabled at (579087): [<ffffffff818f470e>] ip6_finish_output2+0x44e/0x8c0 [ 654.029446] softirqs last enabled at (579030): [<ffffffff810b3b3d>] irq_enter+0x6d/0x80 [ 654.029567] softirqs last disabled at (579031): [<ffffffff810b3c05>] irq_exit+0xb5/0xc0 [ 654.029684] other info that might help us debug this: [ 654.029781] Possible unsafe locking scenario: [ 654.029868] CPU0 [ 654.029908] ---- [ 654.029947] lock(&(&vxlan_db->lock)->rlock); [ 654.030045] <Interrupt> [ 654.030090] lock(&(&vxlan_db->lock)->rlock); [ 654.030162] *** DEADLOCK *** Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5: Fix error flow in CREATE_QP commandMoni Shoua
In error flow, when DESTROY_QP command should be executed, the wrong mailbox was set with data, not the one that is written to hardware, Fix that. Fixes: 09a7d9eca1a6 '{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc' Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5: Fix misspelling in the error message and commentEugenia Emantayev
Fix misspelling in word syndrome. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5e: Fix defaulting RX ring size when not neededEugenia Emantayev
Fixes the bug when turning on/off CQE compression mechanism resets the RX rings size to default value when it is not needed. Fixes: 2fc4bfb7250d ("net/mlx5e: Dynamic RQ type infrastructure") Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5e: Fix features check of IPv6 trafficGal Pressman
The assumption that the next header field contains the transport protocol is wrong for IPv6 packets with extension headers. Instead, we should look the inner-most next header field in the buffer. This will fix TSO offload for tunnels over IPv6 with extension headers. Performance testing: 19.25x improvement, cool! Measuring bandwidth of 16 threads TCP traffic over IPv6 GRE tap. CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] TSO: Enabled Before: 4,926.24 Mbps Now : 94,827.91 Mbps Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5e: Fix ETS BW checkHuy Nguyen
Fix bug that allows ets bw sum to be 0% when ets tc type exists. Fixes: 08fb1dacdd76 ('net/mlx5e: Support DCBNL IEEE ETS') Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5: Fix rate limit packet pacing naming and structEran Ben Elisha
In mlx5_ifc, struct size was not complete, and thus driver was sending garbage after the last defined field. Fixed it by adding reserved field to complete the struct size. In addition, rename all set_rate_limit to set_pp_rate_limit to be compliant with the Firmware <-> Driver definition. Fixes: 7486216b3a0b ("{net,IB}/mlx5: mlx5_ifc updates") Fixes: 1466cc5b23d1 ("net/mlx5: Rate limit tables support") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19Revert "mlx5: move affinity hints assignments to generic code"Saeed Mahameed
Before the offending commit, mlx5 core did the IRQ affinity itself, and it seems that the new generic code have some drawbacks and one of them is the lack for user ability to modify irq affinity after the initial affinity values got assigned. The issue is still being discussed and a solution in the new generic code is required, until then we need to revert this patch. This fixes the following issue: echo <new affinity> > /proc/irq/<x>/smp_affinity fails with -EIO This reverts commit a435393acafbf0ecff4deb3e3cb554b34f0d0664. Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since it is used in mlx5_ib driver. Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code") Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jes Sorensen <jsorensen@fb.com> Reported-by: Jes Sorensen <jsorensen@fb.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5: FPGA, return -EINVAL if size is zeroKamal Heib
Currently, if a size of zero is passed to mlx5_fpga_mem_{read|write}_i2c() the "err" return value will not be initialized, which triggers gcc warnings: [..]/mlx5/core/fpga/sdk.c:87 mlx5_fpga_mem_read_i2c() error: uninitialized symbol 'err'. [..]/mlx5/core/fpga/sdk.c:115 mlx5_fpga_mem_write_i2c() error: uninitialized symbol 'err'. fix that. Fixes: a9956d35d199 ('net/mlx5: FPGA, Add SBU infrastructure') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19ipv4: fib: Fix metrics match when deleting a routePhil Sutter
The recently added fib_metrics_match() causes a regression for routes with both RTAX_FEATURES and RTAX_CC_ALGO if the latter has TCP_CONG_NEEDS_ECN flag set: | # ip link add d0 type dummy | # ip link set d0 up | # ip route add 172.29.29.0/24 dev d0 features ecn congctl dctcp | # ip route del 172.29.29.0/24 dev d0 features ecn congctl dctcp | RTNETLINK answers: No such process During route insertion, fib_convert_metrics() detects that the given CC algo requires ECN and hence sets DST_FEATURE_ECN_CA bit in RTAX_FEATURES. During route deletion though, fib_metrics_match() compares stored RTAX_FEATURES value with that from userspace (which obviously has no knowledge about DST_FEATURE_ECN_CA) and fails. Fixes: 5f9ae3d9e7e4a ("ipv4: do metrics match when looking up and deleting a route") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19net: stmmac: Fix bad RX timestamp extractionFredrik Hallenberg
As noted in dwmac4_wrback_get_rx_timestamp_status the timestamp is found in the context descriptor following the current descriptor. However the current code looks for the context descriptor in the current descriptor, which will always fail. Signed-off-by: Fredrik Hallenberg <megahallon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19net: stmmac: Fix TX timestamp calculationFredrik Hallenberg
When using GMAC4 the value written in PTP_SSIR should be shifted however the shifted value is also used in subsequent calculations which results in a bad timestamp value. Signed-off-by: Fredrik Hallenberg <megahallon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19tipc: fix list sorting bug in function tipc_group_update_member()Jon Maloy
When, during a join operation, or during message transmission, a group member needs to be added to the group's 'congested' list, we sort it into the list in ascending order, according to its current advertised window size. However, we miss the case when the member is already on that list. This will have the result that the member, after the window size has been decremented, might be at the wrong position in that list. This again may have the effect that we during broadcast and multicast transmissions miss the fact that a destination is not yet ready for reception, and we end up sending anyway. From this point on, the behavior during the remaining session is unpredictable, e.g., with underflowing window sizes. We now correct this bug by unconditionally removing the member from the list before (re-)sorting it in. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19ip6_tunnel: get the min mtu properly in ip6_tnl_xmitXin Long
Now it's using IPV6_MIN_MTU as the min mtu in ip6_tnl_xmit, but IPV6_MIN_MTU actually only works when the inner packet is ipv6. With IPV6_MIN_MTU for ipv4 packets, the new pmtu for inner dst couldn't be set less than 1280. It would cause tx_err and the packet to be dropped when the outer dst pmtu is close to 1280. Jianlin found it by running ipv4 traffic with the topo: (client) gre6 <---> eth1 (route) eth2 <---> gre6 (server) After changing eth2 mtu to 1300, the performance became very low, or the connection was even broken. The issue also affects ip4ip6 and ip6ip6 tunnels. So if the inner packet is ipv4, 576 should be considered as the min mtu. Note that for ip4ip6 and ip6ip6 tunnels, the inner packet can only be ipv4 or ipv6, but for gre6 tunnel, it may also be ARP. This patch using 576 as the min mtu for non-ipv6 packet works for all those cases. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19ip6_gre: remove the incorrect mtu limit for ipgre tapXin Long
The same fix as the patch "ip_gre: remove the incorrect mtu limit for ipgre tap" is also needed for ip6_gre. Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19ip_gre: remove the incorrect mtu limit for ipgre tapXin Long
ipgre tap driver calls ether_setup(), after commit 61e84623ace3 ("net: centralize net_device min/max MTU checking"), the range of mtu is [min_mtu, max_mtu], which is [68, 1500] by default. It causes the dev mtu of the ipgre tap device to not be greater than 1500, this limit value is not correct for ipgre tap device. Besides, it's .change_mtu already does the right check. So this patch is just to set max_mtu as 0, and leave the check to it's .change_mtu. Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19vxlan: update skb dst pmtu on tx pathXin Long
Unlike ip tunnels, now vxlan doesn't do any pmtu update for upper dst pmtu, even if it doesn't match the lower dst pmtu any more. The problem can be reproduced when reducing the vxlan lower dev's pmtu when running netperf. In jianlin's testing, the performance went to 1/7 of the previous. This patch is to update the upper dst pmtu to match the lower dst pmtu on tx path so that packets can be sent out even when lower dev's pmtu has been changed. It also works for metadata dst. Note that this patch doesn't process any pmtu icmp packet. But even in the future, the support for pmtu icmp packets process of udp tunnels will also needs this. The same thing will be done for geneve in another patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19net: arc_emac: restart stalled EMACAlexander Kochetkov
Under certain conditions EMAC stop reception of incoming packets and continuously increment R_MISS register instead of saving data into provided buffer. The commit implement workaround for such situation. Then the stall detected EMAC will be restarted. On device the stall looks like the device lost it's dynamic IP address. ifconfig shows that interface error counter rapidly increments. At the same time on the DHCP server we can see continues DHCP-requests from device. In real network stalls happen really rarely. To make them frequent the broadcast storm[1] should be simulated. For simulation it is necessary to make following connections: 1. connect radxarock to 1st port of switch 2. connect some PC to 2nd port of switch 3. connect two other free ports together using standard ethernet cable, in order to make a switching loop. After that, is necessary to make a broadcast storm. For example, running on PC 'ping' to some IP address triggers ARP-request storm. After some time (~10sec), EMAC on rk3188 will stall. Observed and tested on rk3188 radxarock. [1] https://en.wikipedia.org/wiki/Broadcast_radiation Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19net: arc_emac: fix arc_emac_rx() error pathsAlexander Kochetkov
arc_emac_rx() has some issues found by code review. In case netdev_alloc_skb_ip_align() or dma_map_single() failure rx fifo entry will not be returned to EMAC. In case dma_map_single() failure previously allocated skb became lost to driver. At the same time address of newly allocated skb will not be provided to EMAC. Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>