summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-06-15Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a bug on sparc where we may dereference freed stack memory" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: Work around deallocated stack frame reference gcc bug on sparc.
2017-06-15Merge tag 'acpi-4.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These revert an ACPICA commit from the 4.11 cycle that causes problems to happen on some systems and add a protection against possible kernel crashes due to table reference counter imbalance. Specifics: - Revert a 4.11 ACPICA change that made assumptions which are not satisfied on some systems and caused the enumeration of resources to fail on them (Rafael Wysocki). - Add a mechanism to prevent tables from being unmapped prematurely due to reference counter overflows (Lv Zheng)" * tag 'acpi-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPICA: Tables: Mechanism to handle late stage acpi_get_table() imbalance Revert "ACPICA: Disassembler: Enhance resource descriptor detection"
2017-06-15Merge tag 'pm-4.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These revert a recent cpufreq schedutil governor change that turned out to be problematic and fix a few minor issues in cpufreq, cpuidle and the Exynos devfreq drivers. Specifics: - Revert a recent cpufreq schedutil governor change that caused some systems to behave undesirably (Rafael Wysocki). - Fix a cpufreq conservative governor issue introduced during the 3.10 cycle that prevents it from working as expected in some situations (Tomasz Wilczyński). - Fix an error code path in the generic cpuidle driver for DT-based systems (Christophe Jaillet). - Fix three minor issues in devfreq drivers for Exynos (Arvind Yadav, Krzysztof Kozlowski)" * tag 'pm-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpuidle: dt: Add missing 'of_node_put()' cpufreq: conservative: Allow down_threshold to take values from 1 to 10 Revert "cpufreq: schedutil: Reduce frequencies slower" PM / devfreq: exynos-ppmu: Staticize event list PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable
2017-06-15Merge branch 'for-4.12/driver-matching-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fix from Jiri Kosina: - ifdef-based bandaid for a long-standing issue with HID driver matching, avoiding regressions in cases where specific driver is not enabled in kernel .config, from Jiri Kosina * 'for-4.12/driver-matching-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: let generic driver yield control iff specific driver has been enabled
2017-06-15Merge tag 'media/v4.12-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - some build dependency issues at CEC core with randconfigs - fix an off by one error at vb2 - a race fix at cec core - driver fixes at tc358743, sir_ir and rainshadow-cec * tag 'media/v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] media/cec.h: use IS_REACHABLE instead of IS_ENABLED [media] cec: race fix: don't return -ENONET in cec_receive() [media] sir_ir: infinite loop in interrupt handler [media] cec-notifier.h: handle unreachable CONFIG_CEC_CORE [media] cec: improve MEDIA_CEC_RC dependencies [media] vb2: Fix an off by one error in 'vb2_plane_vaddr' [media] rainshadow-cec: Fix missing spin_lock_init() [media] tc358743: fix register i2c_rd/wr function fix
2017-06-15ufs_truncate_blocks(): fix the case when size is in the last direct blockAl Viro
The logics when deciding whether we need to do anything with direct blocks is broken when new size is within the last direct block. It's better to find the path to the last byte _not_ to be removed and use that instead of the path to the beginning of the first block to be freed... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-15KVM: PPC: Book3S HV: Preserve userspace HTM state properlyPaul Mackerras
If userspace attempts to call the KVM_RUN ioctl when it has hardware transactional memory (HTM) enabled, the values that it has put in the HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by guest values. To fix this, we detect this condition and save those SPR values in the thread struct, and disable HTM for the task. If userspace goes to access those SPRs or the HTM facility in future, a TM-unavailable interrupt will occur and the handler will reload those SPRs and re-enable HTM. If userspace has started a transaction and suspended it, we would currently lose the transactional state in the guest entry path and would almost certainly get a "TM Bad Thing" interrupt, which would cause the host to crash. To avoid this, we detect this case and return from the KVM_RUN ioctl with an EINVAL error, with the KVM exit reason set to KVM_EXIT_FAIL_ENTRY. Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-15KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exitPaul Mackerras
This restores several special-purpose registers (SPRs) to sane values on guest exit that were missed before. TAR and VRSAVE are readable and writable by userspace, and we need to save and restore them to prevent the guest from potentially affecting userspace execution (not that TAR or VRSAVE are used by any known program that run uses the KVM_RUN ioctl). We save/restore these in kvmppc_vcpu_run_hv() rather than on every guest entry/exit. FSCR affects userspace execution in that it can prohibit access to certain facilities by userspace. We restore it to the normal value for the task on exit from the KVM_RUN ioctl. IAMR is normally 0, and is restored to 0 on guest exit. However, with a radix host on POWER9, it is set to a value that prevents the kernel from executing user-accessible memory. On POWER9, we save IAMR on guest entry and restore it on guest exit to the saved value rather than 0. On POWER8 we continue to set it to 0 on guest exit. PSPB is normally 0. We restore it to 0 on guest exit to prevent userspace taking advantage of the guest having set it non-zero (which would allow userspace to set its SMT priority to high). UAMOR is normally 0. We restore it to 0 on guest exit to prevent the AMR from being used as a covert channel between userspace processes, since the AMR is not context-switched at present. Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-15ufs: more deadlock prevention on tail unpackingAl Viro
->s_lock is not needed for ufs_change_blocknr() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-15ufs: avoid grabbing ->truncate_mutex if possibleAl Viro
tail unpacking is done in a wrong place; the deadlocks galore is best dealt with by doing that in ->write_iter() (and switching to iomap, while we are at it), but that's rather painful to backport. The trouble comes from grabbing pages that cover the beginning of tail from inside of ufs_new_fragments(); ongoing pageout of any of those is going to deadlock on ->truncate_mutex with process that got around to extending the tail holding that and waiting for page to get unlocked, while ->writepage() on that page is waiting on ->truncate_mutex. The thing is, we don't need ->truncate_mutex when the fragment we are trying to map is within the tail - the damn thing is allocated (tail can't contain holes). Let's do a plain lookup and if the fragment is present, we can just pretend that we'd won the race in almost all cases. The only exception is a fragment between the end of tail and the end of block containing tail. Protect ->i_lastfrag with ->meta_lock - read_seqlock_excl() is sufficient. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-14i40e: Fix a sleep-in-atomic bugJia-Ju Bai
The driver may sleep under a spin lock, and the function call path is: i40e_ndo_set_vf_port_vlan (acquire the lock by spin_lock_bh) i40e_vsi_remove_pvid i40e_vlan_stripping_disable i40e_aq_update_vsi_params i40e_asq_send_command mutex_lock --> may sleep To fixed it, the spin lock is released before "i40e_vsi_remove_pvid", and the lock is acquired again after this function. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14ufs_get_locked_page(): make sure we have buffer_headsAl Viro
callers rely upon that, but find_lock_page() racing with attempt of page eviction by memory pressure might have left us with * try_to_free_buffers() successfully done * __remove_mapping() failed, leaving the page in our mapping * find_lock_page() returning an uptodate page with no buffer_heads attached. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-15Merge branch 'acpica-fixes'Rafael J. Wysocki
* acpica-fixes: ACPICA: Tables: Mechanism to handle late stage acpi_get_table() imbalance Revert "ACPICA: Disassembler: Enhance resource descriptor detection"
2017-06-15Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-devfreq'Rafael J. Wysocki
* pm-cpufreq: cpufreq: conservative: Allow down_threshold to take values from 1 to 10 Revert "cpufreq: schedutil: Reduce frequencies slower" * pm-cpuidle: cpuidle: dt: Add missing 'of_node_put()' * pm-devfreq: PM / devfreq: exynos-ppmu: Staticize event list PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable
2017-06-14Merge tag 'sunxi-clk-fixes-for-4.12' of ↵Stephen Boyd
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-fixes Allwinner clock fixes for 4.12 Some fixes that fix some bindings that went in 4.12, fix a few reset and clock offsets and a build error fix * tag 'sunxi-clk-fixes-for-4.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: clk: sunxi-ng: a64: Export PLL_PERIPH0 clock for the PRCM clk: sunxi-ng: h3: Export PLL_PERIPH0 clock for the PRCM dt-bindings: clock: sunxi-ccu: Add pll-periph to PRCM's needed clocks clk: sunxi-ng: enable SUNXI_CCU_MP for PRCM clk: sunxi-ng: v3s: Fix usb otg device reset bit clk: sunxi-ng: a31: Correct lcd1-ch1 clock register offset
2017-06-14ufs: fix s_size/s_dsize usersAl Viro
For UFS2 we need 64bit variants; we even store them in uspi, but use 32bit ones instead. One wrinkle is in handling of reserved space - recalculating it every time had been stupid all along, but now it would become really ugly. Just calculate it once... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-14ufs: fix reserved blocks checkAl Viro
a) honour ->s_minfree; don't just go with default (5) b) don't bother with capability checks until we know we'll need them Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-14rxrpc: Cache the congestion window settingDavid Howells
Cache the congestion window setting that was determined during a call's transmission phase when it finishes so that it can be used by the next call to the same peer, thereby shortcutting the slow-start algorithm. The value is stored in the rxrpc_peer struct and is accessed without locking. Each call takes the value that happens to be there when it starts and just overwrites the value when it finishes. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14liquidio: fix VF driver off-by-one bug when setting ethtool -C ethX rx-framesWeilin Chang
Signed-off-by: Weilin Chang <weilin.chang@cavium.com> Signed-off-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14ufs: make ufs_freespace() return signedAl Viro
as it is, checking that its return value is <= 0 is useless and that's how it's being used. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-14net: don't global ICMP rate limit packets originating from loopbackJesper Dangaard Brouer
Florian Weimer seems to have a glibc test-case which requires that loopback interfaces does not get ICMP ratelimited. This was broken by commit c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited"). An ICMP response will usually be routed back-out the same incoming interface. Thus, take advantage of this and skip global ICMP ratelimit when the incoming device is loopback. In the unlikely event that the outgoing it not loopback, due to strange routing policy rules, ICMP rate limiting still works via peer ratelimiting via icmpv4_xrlim_allow(). Thus, we should still comply with RFC1812 (section 4.3.2.8 "Rate Limiting"). This seems to fix the reproducer given by Florian. While still avoiding to perform expensive and unneeded outgoing route lookup for rate limited packets (in the non-loopback case). Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited") Reported-by: Florian Weimer <fweimer@redhat.com> Reported-by: "H.J. Lu" <hjl.tools@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14net/mlxfw: fix a NULL dereferenceDan Carpenter
If we hit this error path we end up returning ERR_PTR(0) which is NULL. The caller is not expecting that so it results in a NULL dereference. Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14block: Fix a blk_exit_rl() regressionBart Van Assche
Avoid that the following complaint is reported: BUG: sleeping function called from invalid context at kernel/workqueue.c:2790 in_atomic(): 1, irqs_disabled(): 0, pid: 41, name: rcuop/3 1 lock held by rcuop/3/41: #0: (rcu_callback){......}, at: [<ffffffff8111f9a2>] rcu_nocb_kthread+0x282/0x500 Call Trace: dump_stack+0x86/0xcf ___might_sleep+0x174/0x260 __might_sleep+0x4a/0x80 flush_work+0x7e/0x2e0 __cancel_work_timer+0x143/0x1c0 cancel_work_sync+0x10/0x20 blk_throtl_exit+0x25/0x60 blkcg_exit_queue+0x35/0x40 blk_release_queue+0x42/0x130 kobject_put+0xa9/0x190 This happens since we invoke callbacks that need to block from the queue release handler. Fix this by pushing the final release to a workqueue. Reported-by: Ross Zwisler <zwisler@gmail.com> Fixes: commit b425e5049258 ("block: Avoid that blk_exit_rl() triggers a use-after-free") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Updated changelog Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-14rdma/cxgb4: Fix memory leaks during module exitRaju Rangoju
Fix memory leaks of iw_cxgb4 module in the exit path Signed-off-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14net/act_pedit: fix an error codeDan Carpenter
I'm reviewing static checker warnings where we do ERR_PTR(0), which is the same as NULL. I'm pretty sure we intended to return ERR_PTR(-EINVAL) here. Sometimes these bugs lead to a NULL dereference but I don't immediately see that problem here. Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14net: use skb_unref() in napi_consume_skb()Paolo Abeni
The commit 83ada39bb79d ("net: factor out a helper to decrement the skb refcount") provided and used a helper for decrementing skb usage, but I missed at least a spot for it. This change remove some more duplicated code reusing skb_unref() in napi_consume_skb(), too. The helper uses an additional, unneeded unlikely(!skb) test - napi_consume_skb() already check it a few lines above - but the compiler is smart enough to optimize the duplicated test out. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-06-14 Here's another batch of Bluetooth patches for the 4.13 kernel: - Fix for Broadcom controllers not supporting Event Mask Page 2 - New QCA ROME USB ID for btusb - Fix for Security Manager Protocol to use constant-time memcmp - Improved support for TI WiLink chips Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14qed: Fix an off by one bugDan Carpenter
The p_l2_info->pp_qid_usage[] array has "p_l2_info->queues" elements so the > here should be a >= or we write beyond the end of the array. Fixes: bbe3f233ec5e ("qed: Assign a unique per-queue index to queue-cid") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14ufs: fix logics in "ufs: make fsck -f happy"Al Viro
Storing stats _only_ at new locations is wrong for UFS1; old locations should always be kept updated. The check for "has been converted to use of new locations" is also wrong - it should be "->fs_maxbsize is equal to ->fs_bsize". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-14Merge branch 'mlxsw-Add-support-for-cable-info-access'David S. Miller
Jiri Pirko says: ==================== mlxsw: Add support for cable info access Add support for cable info access via ethtool. This is done by accessing the SFP+/QSFP internal EEPROM. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14mlxsw: spectrum: Add support for access cable info via ethtoolArkadi Sharshevsky
Add support for access cable info via ethtool. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14mlxsw: reg: Add MCIA register for cable info accessArkadi Sharshevsky
The MCIA register is used to access the SFP+ and QSFP connector's EPROM. It will be used to query the cable info. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14IB/ipoib: Fix memory leak in create child syscallFeras Daoud
The flow of creating a new child goes through ipoib_vlan_add which allocates a new interface and checks the rtnl_lock. If the lock is taken, restart_syscall will be called to restart the system call again. In this case we are not releasing the already allocated interface, causing a leak. Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14IB/ipoib: Fix access to un-initialized napi structAlex Vesker
There is no need to re-enable napi since we set the initialized flag before calling ipoib_ib_dev_stop which will disable napi, disabling napi twice is harmless in case it was already disabled. One more reason for this fix is that when using IPoIB new device driver napi is not added to priv, this can lead to kernel panic when rn_ops ndo_open fails. [ 289.755840] invalid opcode: 0000 [#1] SMP [ 289.757111] task: ffff880036964440 ti: ffff880178ee8000 task.ti: ffff880178ee8000 [ 289.757111] RIP: 0010:[<ffffffffa05368d6>] [<ffffffffa05368d6>] napi_enable.part.24+0x4/0x6 [ib_ipoib] [ 289.757111] RSP: 0018:ffff880178eeb6d8 EFLAGS: 00010246 [ 289.757111] RAX: 0000000000000000 RBX: ffff880177a80010 RCX: 000000007fffffff [ 289.757111] RDX: ffffffff81d5f118 RSI: 0000000000000000 RDI: ffff880177a80010 [ 289.757111] RBP: ffff880178eeb6d8 R08: 0000000000000082 R09: 0000000000000283 [ 289.757111] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880175a00000 [ 289.757111] R13: ffff880177a80080 R14: 0000000000000000 R15: 0000000000000001 [ 289.757111] FS: 00007fe2ee346880(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000 [ 289.757111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 289.757111] CR2: 00007fffca979020 CR3: 00000001792e4000 CR4: 00000000000006f0 [ 289.757111] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 289.757111] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 289.757111] Stack: [ 289.796027] ffff880178eeb6f0 ffffffffa05251f5 ffff880177a80000 ffff880178eeb718 [ 289.796027] ffffffffa0528505 ffff880175a00000 ffff880177a80000 0000000000000000 [ 289.796027] ffff880178eeb748 ffffffffa051f0ab ffff880175a00000 ffffffffa0537d60 [ 289.796027] Call Trace: [ 289.796027] [<ffffffffa05251f5>] napi_enable+0x25/0x30 [ib_ipoib] [ 289.796027] [<ffffffffa0528505>] ipoib_ib_dev_open+0x175/0x190 [ib_ipoib] [ 289.796027] [<ffffffffa051f0ab>] ipoib_open+0x4b/0x160 [ib_ipoib] [ 289.796027] [<ffffffff814fe33f>] _dev_open+0xbf/0x130 [ 289.796027] [<ffffffff814fe62d>] __dev_change_flags+0x9d/0x170 [ 289.796027] [<ffffffff814fe729>] dev_change_flags+0x29/0x60 [ 289.796027] [<ffffffff8150caf7>] do_setlink+0x397/0xa40 Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14IB/ipoib: Delete napi in device uninit defaultAlex Vesker
This patch mekas init_default and uninit_default symmetric with a call to delete napi. Additionally, the uninit_default gained delete napi call in case of init_default fails. Fixes: 515ed4f3aab4 ('IB/IPoIB: Separate control and data related initializations') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14IB/ipoib: Limit call to free rdma_netdev for capable devicesAlex Vesker
Limit calls to free_rdma_netdev() for capable devices only. Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14IB/ipoib: Fix memory leaks for child interfaces privAlex Vesker
There is a need to free priv explicitly and not just to release the device, child priv is freed explicitly on remove flow and this patch also includes priv free on error flow in P_key creation and also in add_port. Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14net: update undefined ->ndo_change_mtu() commentMagnus Damm
Update ->ndo_change_mtu() callback comment to remove text about returning error in case of undefined callback. This change makes the comment match the existing code behavior. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14Merge branch 'bpf-MIPS-infra'David S. Miller
David Daney says: ==================== bpf: Changes needed (or desired) for MIPS support This is a grab bag of changes to the bpf testing infrastructure I developed working on MIPS eBPF JIT support. The change to bpf_jit_disasm is probably universally beneficial, the others are more MIPS specific. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14samples/bpf: Fix tracex5 to work with MIPS syscalls.David Daney
There are two problems: 1) In MIPS the __NR_* macros expand to an expression, this causes the sections of the object file to be named like: . . . [ 5] kprobe/(5000 + 1) PROGBITS 0000000000000000 000160 ... [ 6] kprobe/(5000 + 0) PROGBITS 0000000000000000 000258 ... [ 7] kprobe/(5000 + 9) PROGBITS 0000000000000000 000348 ... . . . The fix here is to use the "asm_offsets" trick to evaluate the macros in the C compiler and generate a header file with a usable form of the macros. 2) MIPS syscall numbers start at 5000, so we need a bigger map to hold the sub-programs. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14bpf: Add MIPS support to samples/bpf.David Daney
Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14test_bpf: Add test to make conditional jump cross a large number of insns.David Daney
On MIPS, conditional branches can only span 32k instructions. To exceed this limit in the JIT with the BPF maximum of 4k insns, we need to choose eBPF insns that expand to more than 8 machine instructions. Use BPF_LD_ABS as it is quite complex. This forces the JIT to invert the sense of the branch to branch around a long jump to the end. This (somewhat) verifies that the branch inversion logic and target address calculation of the long jumps are done correctly. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14tools: bpf_jit_disasm: Handle large images.David Daney
Dynamically allocate memory so that JIT images larger than the size of the statically allocated array can be handled. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14Merge branch 'bpf-ctx-narrow'David S. Miller
Yonghong Song says: ==================== bpf: permit bpf program narrower loads for ctx fields Today, if users try to access a ctx field through a narrower load, e.g., __be16 prot = __sk_buff->protocol, verifier will fail. This set contains the verifier change to permit such loads for certain ctx fields as well as the new test cases in selftests/bpf. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14selftests/bpf: Add test cases to test narrower ctx field loadsYonghong Song
Add test cases in test_verifier and test_progs. Negative tests are added in test_verifier as well. The test in test_progs will compare the value of narrower ctx field load result vs. the masked value of normal full-field load result, and will fail if they are not the same. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14bpf: permits narrower load from bpf program context fieldsYonghong Song
Currently, verifier will reject a program if it contains an narrower load from the bpf context structure. For example, __u8 h = __sk_buff->hash, or __u16 p = __sk_buff->protocol __u32 sample_period = bpf_perf_event_data->sample_period which are narrower loads of 4-byte or 8-byte field. This patch solves the issue by: . Introduce a new parameter ctx_field_size to carry the field size of narrower load from prog type specific *__is_valid_access validator back to verifier. . The non-zero ctx_field_size for a memory access indicates (1). underlying prog type specific convert_ctx_accesses supporting non-whole-field access (2). the current insn is a narrower or whole field access. . In verifier, for such loads where load memory size is less than ctx_field_size, verifier transforms it to a full field load followed by proper masking. . Currently, __sk_buff and bpf_perf_event_data->sample_period are supporting narrowing loads. . Narrower stores are still not allowed as typical ctx stores are just normal stores. Because of this change, some tests in verifier will fail and these tests are removed. As a bonus, rename some out of bound __sk_buff->cb access to proper field name and remove two redundant "skb cb oob" tests. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14perf tools: Fix build with ARCH=x86_64Jiada Wang
With commit: 0a943cb10ce78 (tools build: Add HOSTARCH Makefile variable) when building for ARCH=x86_64, ARCH=x86_64 is passed to perf instead of ARCH=x86, so the perf build process searchs header files from tools/arch/x86_64/include, which doesn't exist. The following build failure is seen: In file included from util/event.c:2:0: tools/include/uapi/linux/mman.h:4:27: fatal error: uapi/asm/mman.h: No such file or directory compilation terminated. Fix this issue by using SRCARCH instead of ARCH in perf, just like the main kernel Makefile and tools/objtool's. Signed-off-by: Jiada Wang <jiada_wang@mentor.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Eugeniu Rosca <erosca@de.adit-jv.com> Cc: Jan Stancek <jstancek@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Rui Teng <rui.teng@linux.vnet.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 0a943cb10ce7 ("tools build: Add HOSTARCH Makefile variable") Link: http://lkml.kernel.org/r/1491793357-14977-2-git-send-email-jiada_wang@mentor.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-14perf evsel: Fix probing of precise_ip level for default cycles eventArnaldo Carvalho de Melo
Since commit 18e7a45af91a ("perf/x86: Reject non sampling events with precise_ip") returns -EINVAL for sys_perf_event_open() with an attribute with (attr.precise_ip > 0 && attr.sample_period == 0), just like is done in the routine used to probe the max precise level when no events were passed to 'perf record' or 'perf top', i.e.: perf_evsel__new_cycles() perf_event_attr__set_max_precise_ip() The x86 code, in x86_pmu_hw_config(), which is called all the way from sys_perf_event_open() did, starting with the aforementioned commit: /* There's no sense in having PEBS for non sampling events: */ if (!is_sampling_event(event)) return -EINVAL; Which makes it fail for cycles:ppp, cycles:pp and cycles:p, always using just the non precise cycles variant. To make sure that this is the case, I tested it, before this patch, with: # perf probe -L x86_pmu_hw_config <x86_pmu_hw_config@/home/acme/git/linux/arch/x86/events/core.c:0> 0 int x86_pmu_hw_config(struct perf_event *event) 1 { 2 if (event->attr.precise_ip) { <SNIP> 17 if (event->attr.precise_ip > precise) 18 return -EOPNOTSUPP; /* There's no sense in having PEBS for non sampling events: */ 21 if (!is_sampling_event(event)) 22 return -EINVAL; } <SNIP> # perf probe x86_pmu_hw_config:22 Added new events: probe:x86_pmu_hw_config (on x86_pmu_hw_config:22) probe:x86_pmu_hw_config_1 (on x86_pmu_hw_config:22) You can now use it in all perf tools, such as: perf record -e probe:x86_pmu_hw_config_1 -aR sleep 1 # perf trace -e perf_event_open,probe:x86_pmu_hwconfig*/max-stack=16/ perf record usleep 1 0.000 ( 0.015 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1 ) ... 0.015 ( ): probe:x86_pmu_hw_config:(ffffffff9c0065e1)) x86_pmu_hw_config ([kernel.kallsyms]) hsw_hw_config ([kernel.kallsyms]) x86_pmu_event_init ([kernel.kallsyms]) perf_try_init_event ([kernel.kallsyms]) perf_event_alloc ([kernel.kallsyms]) SYSC_perf_event_open ([kernel.kallsyms]) sys_perf_event_open ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) return_from_SYSCALL_64 ([kernel.kallsyms]) syscall (/usr/lib64/libc-2.24.so) perf_event_attr__set_max_precise_ip (/home/acme/bin/perf) perf_evsel__new_cycles (/home/acme/bin/perf) perf_evlist__add_default (/home/acme/bin/perf) cmd_record (/home/acme/bin/perf) run_builtin (/home/acme/bin/perf) handle_internal_command (/home/acme/bin/perf) 0.000 ( 0.021 ms): perf/4150 ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument 0.023 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1 ) ... 0.025 ( ): probe:x86_pmu_hw_config:(ffffffff9c0065e1)) x86_pmu_hw_config ([kernel.kallsyms]) hsw_hw_config ([kernel.kallsyms]) x86_pmu_event_init ([kernel.kallsyms]) perf_try_init_event ([kernel.kallsyms]) perf_event_alloc ([kernel.kallsyms]) SYSC_perf_event_open ([kernel.kallsyms]) sys_perf_event_open ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) return_from_SYSCALL_64 ([kernel.kallsyms]) syscall (/usr/lib64/libc-2.24.so) perf_event_attr__set_max_precise_ip (/home/acme/bin/perf) perf_evsel__new_cycles (/home/acme/bin/perf) perf_evlist__add_default (/home/acme/bin/perf) cmd_record (/home/acme/bin/perf) run_builtin (/home/acme/bin/perf) handle_internal_command (/home/acme/bin/perf) 0.023 ( 0.004 ms): perf/4150 ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument 0.028 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1 ) ... 0.030 ( ): probe:x86_pmu_hw_config:(ffffffff9c0065e1)) x86_pmu_hw_config ([kernel.kallsyms]) hsw_hw_config ([kernel.kallsyms]) x86_pmu_event_init ([kernel.kallsyms]) perf_try_init_event ([kernel.kallsyms]) perf_event_alloc ([kernel.kallsyms]) SYSC_perf_event_open ([kernel.kallsyms]) sys_perf_event_open ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) return_from_SYSCALL_64 ([kernel.kallsyms]) syscall (/usr/lib64/libc-2.24.so) perf_event_attr__set_max_precise_ip (/home/acme/bin/perf) perf_evsel__new_cycles (/home/acme/bin/perf) perf_evlist__add_default (/home/acme/bin/perf) cmd_record (/home/acme/bin/perf) run_builtin (/home/acme/bin/perf) handle_internal_command (/home/acme/bin/perf) 0.028 ( 0.004 ms): perf/4150 ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument 41.018 ( 0.012 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8b5dd0, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 41.065 ( 0.011 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 41.080 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 41.103 ( 0.010 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4 41.115 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5 41.122 ( 0.004 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6 41.128 ( 0.008 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ] # I.e. that return -EINVAL in x86_pmu_hw_config() is hit three times. So fix it by just setting attr.sample_period Now, after this patch: # perf trace --max-stack=2 -e perf_event_open,probe:x86_pmu_hw_config* perf record usleep 1 [ perf record: Woken up 1 times to write data ] 0.000 ( 0.017 ms): perf/8469 perf_event_open(attr_uptr: 0x7ffe36c27d10, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 4 syscall (/usr/lib64/libc-2.24.so) perf_event_open_cloexec_flag (/home/acme/bin/perf) 0.050 ( 0.031 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 syscall (/usr/lib64/libc-2.24.so) perf_evlist__config (/home/acme/bin/perf) 0.092 ( 0.040 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 syscall (/usr/lib64/libc-2.24.so) perf_evlist__config (/home/acme/bin/perf) 0.143 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, cpu: -1, group_fd: -1 ) = 4 syscall (/usr/lib64/libc-2.24.so) perf_event_attr__set_max_precise_ip (/home/acme/bin/perf) 0.161 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4 syscall (/usr/lib64/libc-2.24.so) perf_evsel__open (/home/acme/bin/perf) 0.171 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5 syscall (/usr/lib64/libc-2.24.so) perf_evsel__open (/home/acme/bin/perf) 0.180 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6 syscall (/usr/lib64/libc-2.24.so) perf_evsel__open (/home/acme/bin/perf) 0.190 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8 syscall (/usr/lib64/libc-2.24.so) perf_evsel__open (/home/acme/bin/perf) [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ] # The probe one called from perf_event_attr__set_max_precise_ip() works the first time, with attr.precise_ip = 3, wit hthe next ones being the per cpu ones for the cycles:ppp event. And here is the text from a report and alternative proposed patch by Thomas-Mich Richter: --- On s390 the counter and sampling facility do not support a precise IP skid level and sometimes returns EOPNOTSUPP when structure member precise_ip in struct perf_event_attr is not set to zero. On s390 commnd 'perf record -- true' fails with error EOPNOTSUPP. This happens only when no events are specified on command line. The functions called are ... --> perf_evlist__add_default --> perf_evsel__new_cycles --> perf_event_attr__set_max_precise_ip The last function determines the value of structure member precise_ip by invoking the perf_event_open() system call and checking the return code. The first successful open is the value for precise_ip. However the value is determined without setting member sample_period and indicates no sampling. On s390 the counter facility and sampling facility are different. The above procedure determines a precise_ip value of 3 using the counter facility. Later it uses the sampling facility with a value of 3 and fails with EOPNOTSUPP. --- v2: Older compilers (e.g. gcc 4.4.7) don't support referencing members of unnamed union members in the container struct initialization, so move from: struct perf_event_attr attr = { ... .sample_period = 1, }; to right after it as: struct perf_event_attr attr = { ... }; attr.sample_period = 1; v3: We need to reset .sample_period to 0 to let the users of perf_evsel__new_cycles() to properly setup attr.sample_period or attr.sample_freq. Reported by Ingo Molnar. Reported-and-Acked-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 18e7a45af91a ("perf/x86: Reject non sampling events with precise_ip") Link: http://lkml.kernel.org/n/tip-yv6nnkl7tzqocrm0hl3x7vf1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-14net_sched: move tcf_lock down after gen_replace_estimator()WANG Cong
Laura reported a sleep-in-atomic kernel warning inside tcf_act_police_init() which calls gen_replace_estimator() with spinlock protection. It is not necessary in this case, we already have RTNL lock here so it is enough to protect concurrent writers. For the reader, i.e. tcf_act_police(), it needs to make decision based on this rate estimator, in the worst case we drop more/less packets than necessary while changing the rate in parallel, it is still acceptable. Reported-by: Laura Abbott <labbott@redhat.com> Reported-by: Nick Huber <nicholashuber@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14macvlan: propagate the mac address change status for lowerdevZhang Shengju
The macvlan dev should propagate the return value of mac address change for lower device in the passthru mode, instead of always return 0. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>