summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2017-07-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security layer fixes from James Morris: "Bugfixes for TPM and SELinux" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: IB/core: Fix static analysis warning in ib_policy_change_task IB/core: Fix uninitialized variable use in check_qp_port_pkey_settings tpm: do not suspend/resume if power stays on tpm: use tpm2_pcr_read() in tpm2_do_selftest() tpm: use tpm_buf functions in tpm2_pcr_read() tpm_tis: make ilb_base_addr static tpm: consolidate the TPM startup code tpm: Enable CLKRUN protocol for Braswell systems tpm/tpm_crb: fix priv->cmd_size initialisation tpm: fix a kernel memory leak in tpm-sysfs.c tpm: Issue a TPM2_Shutdown for TPM2 devices. Add "shutdown" to "struct class".
2017-07-07IB/core: Fix static analysis warning in ib_policy_change_taskDaniel Jurgens
ib_get_cached_subnet_prefix can technically fail, but the only way it could is not possible based on the loop conditions. Check the return value before using the variable sp to resolve a static analysis warning. -v1: - Fix check to !ret. Paul Moore Fixes: 8f408ab64be6 ("selinux lsm IB/core: Implement LSM notification system") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-07IB/core: Fix uninitialized variable use in check_qp_port_pkey_settingsDaniel Jurgens
Check the return value from get_pkey_and_subnet_prefix to prevent using uninitialized variables. Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-06Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma update from Doug Ledford: "This includes two bugs against the newly added opa vnic that were found by turning on the debug kernel options: - sleeping while holding a lock, so a one line fix where they switched it from GFP_KERNEL allocation to a GFP_ATOMIC allocation - a case where they had an isolated caller of their code that could call them in an atomic context so they had to switch their use of a mutex to a spinlock to be safe, so this was considerably more lines of diff because all uses of that lock had to be switched In addition, the bug that was discussed with you already about an out of bounds array access in ib_uverbs_modify_qp and ib_uverbs_create_ah and is only seven lines of diff. And finally, one fix to an earlier fix in the -rc cycle that broke hfi1 and qib in regards to IPoIB (this one is, unfortunately, larger than I would like for a -rc7 submission, but fixing the problem required that we not treat all devices as though they had allocated a netdev universally because it isn't true, and it took 70 lines of diff to resolve the issue, but the final patch has been vetted by Intel and Mellanox and they've both given their approval to the fix). Summary: - Two fixes for OPA found by debug kernel - Fix for user supplied input causing kernel problems - Fix for the IPoIB fixes submitted around -rc4" [ Doug sent this having not noticed the 4.12 release, so I guess I'll be getting another rdma pull request with the actuakl merge window updates and not just fixes. Oh well - it would have been nice if this small update had been the merge window one. - Linus ] * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/core, opa_vnic, hfi1, mlx5: Properly free rdma_netdev RDMA/uverbs: Check port number supplied by user verbs cmds IB/opa_vnic: Use spinlock instead of mutex for stats_lock IB/opa_vnic: Use GFP_ATOMIC while sending trap
2017-07-05IB/core, opa_vnic, hfi1, mlx5: Properly free rdma_netdevNiranjana Vishwanathapura
IPOIB is calling free_rdma_netdev even though alloc_rdma_netdev has returned -EOPNOTSUPP. Move free_rdma_netdev from ib_device structure to rdma_netdev structure thus ensuring proper cleanup function is called for the rdma net device. Fix the following trace: ib0: Failed to modify QP to ERROR state BUG: unable to handle kernel paging request at 0000000000001d20 IP: hfi1_vnic_free_rn+0x26/0xb0 [hfi1] Call Trace: ipoib_remove_one+0xbe/0x160 [ib_ipoib] ib_unregister_device+0xd0/0x170 [ib_core] rvt_unregister_device+0x29/0x90 [rdmavt] hfi1_unregister_ib_device+0x1a/0x100 [hfi1] remove_one+0x4b/0x220 [hfi1] pci_device_remove+0x39/0xc0 device_release_driver_internal+0x141/0x200 driver_detach+0x3f/0x80 bus_remove_driver+0x55/0xd0 driver_unregister+0x2c/0x50 pci_unregister_driver+0x2a/0xa0 hfi1_mod_cleanup+0x10/0xf65 [hfi1] SyS_delete_module+0x171/0x250 do_syscall_64+0x67/0x150 entry_SYSCALL64_slow_path+0x25/0x25 Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Reasonably busy this cycle, but perhaps not as busy as in the 4.12 merge window: 1) Several optimizations for UDP processing under high load from Paolo Abeni. 2) Support pacing internally in TCP when using the sch_fq packet scheduler for this is not practical. From Eric Dumazet. 3) Support mutliple filter chains per qdisc, from Jiri Pirko. 4) Move to 1ms TCP timestamp clock, from Eric Dumazet. 5) Add batch dequeueing to vhost_net, from Jason Wang. 6) Flesh out more completely SCTP checksum offload support, from Davide Caratti. 7) More plumbing of extended netlink ACKs, from David Ahern, Pablo Neira Ayuso, and Matthias Schiffer. 8) Add devlink support to nfp driver, from Simon Horman. 9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa Prabhu. 10) Add stack depth tracking to BPF verifier and use this information in the various eBPF JITs. From Alexei Starovoitov. 11) Support XDP on qed device VFs, from Yuval Mintz. 12) Introduce BPF PROG ID for better introspection of installed BPF programs. From Martin KaFai Lau. 13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann. 14) For loads, allow narrower accesses in bpf verifier checking, from Yonghong Song. 15) Support MIPS in the BPF selftests and samples infrastructure, the MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David Daney. 16) Support kernel based TLS, from Dave Watson and others. 17) Remove completely DST garbage collection, from Wei Wang. 18) Allow installing TCP MD5 rules using prefixes, from Ivan Delalande. 19) Add XDP support to Intel i40e driver, from Björn Töpel 20) Add support for TC flower offload in nfp driver, from Simon Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub Kicinski, and Bert van Leeuwen. 21) IPSEC offloading support in mlx5, from Ilan Tayari. 22) Add HW PTP support to macb driver, from Rafal Ozieblo. 23) Networking refcount_t conversions, From Elena Reshetova. 24) Add sock_ops support to BPF, from Lawrence Brako. This is useful for tuning the TCP sockopt settings of a group of applications, currently via CGROUPs" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits) net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap cxgb4: Support for get_ts_info ethtool method cxgb4: Add PTP Hardware Clock (PHC) support cxgb4: time stamping interface for PTP nfp: default to chained metadata prepend format nfp: remove legacy MAC address lookup nfp: improve order of interfaces in breakout mode net: macb: remove extraneous return when MACB_EXT_DESC is defined bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case bpf: fix return in load_bpf_file mpls: fix rtm policy in mpls_getroute net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t net, ax25: convert ax25_route.refcount from atomic_t to refcount_t net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t ...
2017-07-05Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security layer updates from James Morris: - a major update for AppArmor. From JJ: * several bug fixes and cleanups * the patch to add symlink support to securityfs that was floated on the list earlier and the apparmorfs changes that make use of securityfs symlinks * it introduces the domain labeling base code that Ubuntu has been carrying for several years, with several cleanups applied. And it converts the current mediation over to using the domain labeling base, which brings domain stacking support with it. This finally will bring the base upstream code in line with Ubuntu and provide a base to upstream the new feature work that Ubuntu carries. * This does _not_ contain any of the newer apparmor mediation features/controls (mount, signals, network, keys, ...) that Ubuntu is currently carrying, all of which will be RFC'd on top of this. - Notable also is the Infiniband work in SELinux, and the new file:map permission. From Paul: "While we're down to 21 patches for v4.13 (it was 31 for v4.12), the diffstat jumps up tremendously with over 2k of line changes. Almost all of these changes are the SELinux/IB work done by Daniel Jurgens; some other noteworthy changes include a NFS v4.2 labeling fix, a new file:map permission, and reporting of policy capabilities on policy load" There's also now genfscon labeling support for tracefs, which was lost in v4.1 with the separation from debugfs. - Smack incorporates a safer socket check in file_receive, and adds a cap_capable call in privilege check. - TPM as usual has a bunch of fixes and enhancements. - Multiple calls to security_add_hooks() can now be made for the same LSM, to allow LSMs to have hook declarations across multiple files. - IMA now supports different "ima_appraise=" modes (eg. log, fix) from the boot command line. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (126 commits) apparmor: put back designators in struct initialisers seccomp: Switch from atomic_t to recount_t seccomp: Adjust selftests to avoid double-join seccomp: Clean up core dump logic IMA: update IMA policy documentation to include pcr= option ima: Log the same audit cause whenever a file has no signature ima: Simplify policy_func_show. integrity: Small code improvements ima: fix get_binary_runtime_size() ima: use ima_parse_buf() to parse template data ima: use ima_parse_buf() to parse measurements headers ima: introduce ima_parse_buf() ima: Add cgroups2 to the defaults list ima: use memdup_user_nul ima: fix up #endif comments IMA: Correct Kconfig dependencies for hash selection ima: define is_ima_appraise_enabled() ima: define Kconfig IMA_APPRAISE_BOOTPARAM option ima: define a set of appraisal rules requiring file signatures ima: extend the "ima_policy" boot command line to support multiple policies ...
2017-07-03Merge tag 'driver-core-4.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big driver core update for 4.13-rc1. The large majority of this is a lot of cleanup of old fields in the driver core structures and their remaining usages in random drivers. All of those fixes have been reviewed by the various subsystem maintainers. There's also some small firmware updates in here, a new kobject uevent api interface that makes userspace interaction easier, and a few other minor things. All of these have been in linux-next for a long while with no reported issues" * tag 'driver-core-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (56 commits) arm: mach-rpc: ecard: fix build error zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO() driver-core: remove struct bus_type.dev_attrs powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type powerpc: vio: use dev_groups and not dev_attrs for bus_type USB: usbip: convert to use DRIVER_ATTR_RW s390: drivers: convert to use DRIVER_ATTR_RO/WO platform: thinkpad_acpi: convert to use DRIVER_ATTR_RO/RW pcmcia: ds: convert to use DRIVER_ATTR_RO wireless: ipw2x00: convert to use DRIVER_ATTR_RW net: ehea: convert to use DRIVER_ATTR_RO net: caif: convert to use DRIVER_ATTR_RO TTY: hvc: convert to use DRIVER_ATTR_RW PCI: pci-driver: convert to use DRIVER_ATTR_WO IB: nes: convert to use DRIVER_ATTR_RW HID: hid-core: convert to use DRIVER_ATTR_RO and drv_groups arm: ecard: fix dev_groups patch typo tty: serdev: use dev_groups and not dev_attrs for bus_type sparc: vio: use dev_groups and not dev_attrs for bus_type hid: intel-ish-hid: use dev_groups and not dev_attrs for bus_type ...
2017-07-03RDMA/uverbs: Check port number supplied by user verbs cmdsBoris Pismenny
The ib_uverbs_create_ah() ind ib_uverbs_modify_qp() calls receive the port number from user input as part of its attributes and assumes it is valid. Down on the stack, that parameter is used to access kernel data structures. If the value is invalid, the kernel accesses memory it should not. To prevent this, verify the port number before using it. BUG: KASAN: use-after-free in ib_uverbs_create_ah+0x6d5/0x7b0 Read of size 4 at addr ffff880018d67ab8 by task syz-executor/313 BUG: KASAN: slab-out-of-bounds in modify_qp.isra.4+0x19d0/0x1ef0 Read of size 4 at addr ffff88006c40ec58 by task syz-executor/819 Fixes: 67cdb40ca444 ("[IB] uverbs: Implement more commands") Fixes: 189aba99e70 ("IB/uverbs: Extend modify_qp and support packet pacing") Cc: <stable@vger.kernel.org> # v2.6.14+ Cc: <security@kernel.org> Cc: Yevgeny Kliteynik <kliteyn@mellanox.com> Cc: Tziporet Koren <tziporet@mellanox.com> Cc: Alex Polak <alexpo@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-01net: convert sk_buff.users from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29IB/opa_vnic: Use spinlock instead of mutex for stats_lockVishwanathapura, Niranjana
Stats can be read from atomic context, hence make stats_lock as a spinlock. Fix the following trace with debug kernel. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 in_atomic(): 1, irqs_disabled(): 0, pid: 6487, name: sadc Call Trace: dump_stack+0x63/0x90 ___might_sleep+0xda/0x130 __might_sleep+0x4a/0x90 mutex_lock+0x20/0x50 opa_vnic_get_stats64+0x56/0x140 [opa_vnic] dev_get_stats+0x74/0x130 dev_seq_printf_stats+0x37/0x120 dev_seq_show+0x14/0x30 seq_read+0x26d/0x3d0 Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-29IB/opa_vnic: Use GFP_ATOMIC while sending trapVishwanathapura, Niranjana
Pass GFP_ATOMIC flag to ib_create_send_mad() while sending trap as it can be triggered from the atomic context. Fix the following trace with debug kernel. BUG: sleeping function called from invalid context at mm/slab.h:432 in_atomic(): 1, irqs_disabled(): 0, pid: 1771, name: NetworkManager Call Trace: dump_stack+0x63/0x90 ___might_sleep+0xda/0x130 __might_sleep+0x4a/0x90 __kmalloc+0x19e/0x220 ? ib_create_send_mad+0xea/0x390 [ib_core] ib_create_send_mad+0xea/0x390 [ib_core] opa_vnic_vema_send_trap+0x17b/0x460 [opa_vnic] opa_vnic_vema_report_event+0x57/0x80 [opa_vnic] opa_vnic_mac_send_event+0xaa/0xf0 [opa_vnic] opa_vnic_set_rx_mode+0x17/0x30 [opa_vnic] __dev_set_rx_mode+0x52/0x90 dev_set_rx_mode+0x26/0x40 __dev_open+0xe8/0x140 Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/mlx5: Respect mlx5_core reserved GIDsIlan Tayari
Reserved gids are taken by the mlx5_core, report smaller GID table size to IB core. Set mlx5_query_roce_port's return value back to int. In case of error, return an indication. This rolls back some of the change in commit 50f22fd8ecf9 ("IB/mlx5: Set mlx5_query_roce_port's return value to void") Change set_roce_addr to use gid_set function, instead of directly sending the command. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.changelinkMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.newlinkMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two entries being added at the same time to the IFLA policy table, whilst parallel bug fixes to decnet routing dst handling overlapping with the dst gc removal in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21qed*: Rename qed_roce_if.h to qed_rdma_if.hKalderon, Michal
Rename the qed_roce_if file to qed_rdma_if as it represents a common interface for RoCE and iWARP. this commit affects RDMA/qedr as well. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20net: introduce __skb_put_[zero, data, u8]yuan linyu
follow Johannes Berg, semantic patch file as below, @@ identifier p, p2; expression len; expression skb; type t, t2; @@ ( -p = __skb_put(skb, len); +p = __skb_put_zero(skb, len); | -p = (t)__skb_put(skb, len); +p = __skb_put_zero(skb, len); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, len); | -memset(p, 0, len); ) @@ identifier p; expression len; expression skb; type t; @@ ( -t p = __skb_put(skb, len); +t p = __skb_put_zero(skb, len); ) ... when != p ( -memset(p, 0, len); ) @@ type t, t2; identifier p, p2; expression skb; @@ t *p; ... ( -p = __skb_put(skb, sizeof(t)); +p = __skb_put_zero(skb, sizeof(t)); | -p = (t *)__skb_put(skb, sizeof(t)); +p = __skb_put_zero(skb, sizeof(t)); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, sizeof(*p)); | -memset(p, 0, sizeof(*p)); ) @@ expression skb, len; @@ -memset(__skb_put(skb, len), 0, len); +__skb_put_zero(skb, len); @@ expression skb, len, data; @@ -memcpy(__skb_put(skb, len), data, len); +__skb_put_data(skb, data, len); @@ expression SKB, C, S; typedef u8; identifier fn = {__skb_put}; fresh identifier fn2 = fn ## "_u8"; @@ - *(u8 *)fn(SKB, S) = C; + fn2(SKB, C); Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20qed*: Set rdma generic functions prefixMichal Kalderon
Rename the functions common to both iWARP and RoCE to have a prefix of _rdma_ instead of _roce_. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20qed*: qede_roce.[ch] -> qede_rdma.[ch]Michal Kalderon
Once we have iWARP support, the qede portion of the qedr<->qede would serve all the RDMA protocols - so rename the file to be appropriate to its function. While we're at it, we're also moving a couple of inclusions to it into .h files and adding includes to make sure it contains all type definitions it requires. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20qed: Chain support for external PBLMintz, Yuval
iWARP would require the chains to allocate/free their PBL memory independently, so add the infrastructure to provide it externally. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20sched/wait: Rename wait_queue_t => wait_queue_entry_tIngo Molnar
Rename: wait_queue_t => wait_queue_entry_t 'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue", but in reality it's a queue *entry*. The 'real' queue is the wait queue head, which had to carry the name. Start sorting this out by renaming it to 'wait_queue_entry_t'. This also allows the real structure name 'struct __wait_queue' to lose its double underscore and become 'struct wait_queue_entry', which is the more canonical nomenclature for such data types. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-16Merge tag 'mlx5-updates-2017-06-16' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox mlx5 updates and cleanups 2017-06-16 mlx5-updates-2017-06-16 This series provide some updates and cleanups for mlx5 core and netdevice driver. From Eli Cohen, add a missing event string. From Or Gerlitz, some checkpatch cleanups. From Moni, Disalbe HW level LAG when SRIOV is enabled. From Tariq, A code reuse cleanup in aRFS flow. From Itay Aveksis, Typo fix. From Gal Pressman, ethtool statistics updates and "update stats" deferred work optimizations. From Majd Dibbiny, Fast unload support on kernel shutdown. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: make skb_push & __skb_push return void pointersJohannes Berg
It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) @@ expression SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - fn(SKB, LEN)[0] + *(u8 *)fn(SKB, LEN) Note that the last part there converts from push(...)[0] to the more idiomatic *(u8 *)push(...). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: make skb_put & friends return void pointersJohannes Berg
It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions (skb_put, __skb_put and pskb_put) return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_put, __skb_put }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_put, __skb_put }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) which actually doesn't cover pskb_put since there are only three users overall. A handful of stragglers were converted manually, notably a macro in drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many instances in net/bluetooth/hci_sock.c. In the former file, I also had to fix one whitespace problem spatch introduced. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: convert many more places to skb_put_zero()Johannes Berg
There were many places that my previous spatch didn't find, as pointed out by yuan linyu in various patches. The following spatch found many more and also removes the now unnecessary casts: @@ identifier p, p2; expression len; expression skb; type t, t2; @@ ( -p = skb_put(skb, len); +p = skb_put_zero(skb, len); | -p = (t)skb_put(skb, len); +p = skb_put_zero(skb, len); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, len); | -memset(p, 0, len); ) @@ type t, t2; identifier p, p2; expression skb; @@ t *p; ... ( -p = skb_put(skb, sizeof(t)); +p = skb_put_zero(skb, sizeof(t)); | -p = (t *)skb_put(skb, sizeof(t)); +p = skb_put_zero(skb, sizeof(t)); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, sizeof(*p)); | -memset(p, 0, sizeof(*p)); ) @@ expression skb, len; @@ -memset(skb_put(skb, len), 0, len); +skb_put_zero(skb, len); Apply it to the tree (with one manual fixup to keep the comment in vxlan.c, which spatch removed.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "I had thought at the time of the last pull request that there wouldn't be much more to go, but several things just kept trickling in over the last week. Instead of just the six patches to bnxt_re that I had anticipated, there are another five IPoIB patches, two qedr patches, and a few other miscellaneous patches. The bnxt_re patches are more lines of diff than I like to submit this late in the game. That's mostly because of the first two patches in the series of six. I almost dropped them just because of the lines of churn, but on a close review, a lot of the churn came from removing duplicated code sections and consolidating them into callable routines. I felt like this made the number of lines of change more acceptable, and they address problems, so I left them. The remainder of the patches are all small, well contained, and well understood. These have passed 0day testing, but have not been submitted to linux-next (but a local merge test with your current master was without any conflicts). Summary: - A fix for fix eea40b8f624 ("infiniband: call ipv6 route lookup via the stub interface") - Six patches against bnxt_re...the first two are considerably larger than I would like, but as they address real issues I went ahead and submitted them (it also helped that a good deal of the churn was removing code repeated in multiple places and consolidating it to one common function) - Two fixes against qedr that just came in - One fix against rxe that took a few revisions to get right plus time to get the proper reviews - Five late breaking IPoIB fixes - One late cxgb4 fix" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: rdma/cxgb4: Fix memory leaks during module exit IB/ipoib: Fix memory leak in create child syscall IB/ipoib: Fix access to un-initialized napi struct IB/ipoib: Delete napi in device uninit default IB/ipoib: Limit call to free rdma_netdev for capable devices IB/ipoib: Fix memory leaks for child interfaces priv rxe: Fix a sleep-in-atomic bug in post_one_send RDMA/qedr: Add 64KB PAGE_SIZE support to user-space queues RDMA/qedr: Initialize byte_len in WC of READ and SEND commands RDMA/bnxt_re: Remove FMR support RDMA/bnxt_re: Fix RQE posting logic RDMA/bnxt_re: Add HW workaround for avoiding stall for UD QPs RDMA/bnxt_re: Dereg MR in FW before freeing the fast_reg_page_list RDMA/bnxt_re: HW workarounds for handling specific conditions RDMA/bnxt_re: Fixing the Control path command and response handling IB/addr: Fix setting source address in addr6_resolve()
2017-06-16net/mlx5: Fix some spelling mistakesOr Gerlitz
Fixed few places where endianness was misspelled and one spot whwere output was: CHECK: 'endianess' may be misspelled - perhaps 'endianness'? CHECK: 'ouput' may be misspelled - perhaps 'output'? Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The conflicts were two cases of overlapping changes in batman-adv and the qed driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a bug on sparc where we may dereference freed stack memory" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: Work around deallocated stack frame reference gcc bug on sparc.
2017-06-14rdma/cxgb4: Fix memory leaks during module exitRaju Rangoju
Fix memory leaks of iw_cxgb4 module in the exit path Signed-off-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14IB/ipoib: Fix memory leak in create child syscallFeras Daoud
The flow of creating a new child goes through ipoib_vlan_add which allocates a new interface and checks the rtnl_lock. If the lock is taken, restart_syscall will be called to restart the system call again. In this case we are not releasing the already allocated interface, causing a leak. Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14IB/ipoib: Fix access to un-initialized napi structAlex Vesker
There is no need to re-enable napi since we set the initialized flag before calling ipoib_ib_dev_stop which will disable napi, disabling napi twice is harmless in case it was already disabled. One more reason for this fix is that when using IPoIB new device driver napi is not added to priv, this can lead to kernel panic when rn_ops ndo_open fails. [ 289.755840] invalid opcode: 0000 [#1] SMP [ 289.757111] task: ffff880036964440 ti: ffff880178ee8000 task.ti: ffff880178ee8000 [ 289.757111] RIP: 0010:[<ffffffffa05368d6>] [<ffffffffa05368d6>] napi_enable.part.24+0x4/0x6 [ib_ipoib] [ 289.757111] RSP: 0018:ffff880178eeb6d8 EFLAGS: 00010246 [ 289.757111] RAX: 0000000000000000 RBX: ffff880177a80010 RCX: 000000007fffffff [ 289.757111] RDX: ffffffff81d5f118 RSI: 0000000000000000 RDI: ffff880177a80010 [ 289.757111] RBP: ffff880178eeb6d8 R08: 0000000000000082 R09: 0000000000000283 [ 289.757111] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880175a00000 [ 289.757111] R13: ffff880177a80080 R14: 0000000000000000 R15: 0000000000000001 [ 289.757111] FS: 00007fe2ee346880(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000 [ 289.757111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 289.757111] CR2: 00007fffca979020 CR3: 00000001792e4000 CR4: 00000000000006f0 [ 289.757111] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 289.757111] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 289.757111] Stack: [ 289.796027] ffff880178eeb6f0 ffffffffa05251f5 ffff880177a80000 ffff880178eeb718 [ 289.796027] ffffffffa0528505 ffff880175a00000 ffff880177a80000 0000000000000000 [ 289.796027] ffff880178eeb748 ffffffffa051f0ab ffff880175a00000 ffffffffa0537d60 [ 289.796027] Call Trace: [ 289.796027] [<ffffffffa05251f5>] napi_enable+0x25/0x30 [ib_ipoib] [ 289.796027] [<ffffffffa0528505>] ipoib_ib_dev_open+0x175/0x190 [ib_ipoib] [ 289.796027] [<ffffffffa051f0ab>] ipoib_open+0x4b/0x160 [ib_ipoib] [ 289.796027] [<ffffffff814fe33f>] _dev_open+0xbf/0x130 [ 289.796027] [<ffffffff814fe62d>] __dev_change_flags+0x9d/0x170 [ 289.796027] [<ffffffff814fe729>] dev_change_flags+0x29/0x60 [ 289.796027] [<ffffffff8150caf7>] do_setlink+0x397/0xa40 Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14IB/ipoib: Delete napi in device uninit defaultAlex Vesker
This patch mekas init_default and uninit_default symmetric with a call to delete napi. Additionally, the uninit_default gained delete napi call in case of init_default fails. Fixes: 515ed4f3aab4 ('IB/IPoIB: Separate control and data related initializations') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14IB/ipoib: Limit call to free rdma_netdev for capable devicesAlex Vesker
Limit calls to free_rdma_netdev() for capable devices only. Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14IB/ipoib: Fix memory leaks for child interfaces privAlex Vesker
There is a need to free priv explicitly and not just to release the device, child priv is freed explicitly on remove flow and this patch also includes priv free on error flow in P_key creation and also in add_port. Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14rxe: Fix a sleep-in-atomic bug in post_one_sendJia-Ju Bai
The driver may sleep under a spin lock, and the function call path is: post_one_send (acquire the lock by spin_lock_irqsave) init_send_wqe copy_from_user --> may sleep There is no flow that makes "qp->is_user" true, and copy_from_user may cause bug when a non-user pointer is used. So the lines of copy_from_user and check of "qp->is_user" are removed. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14RDMA/qedr: Add 64KB PAGE_SIZE support to user-space queuesRam Amrani
Add 64KB PAGE_SIZE support to user-space CQ, SQ and RQ queues. De-facto it means that code was added to translate 64KB pages to smaller 4KB pages that the FW can handle. Otherwise, the FW would wrap (or jump to the next page) when reaching 4KB while the user space library will continue on the same large page. Note that MR code remains as is since the FW supports larger pages for MRs. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14RDMA/qedr: Initialize byte_len in WC of READ and SEND commandsMichal Kalderon
Initialize byte_len in work completion of RDMA_READ and RDMA_SEND. Exposed by uDAPL application. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14RDMA/bnxt_re: Remove FMR supportSelvin Xavier
Some issues observed with FMR implementation while running stress traffic. So removing the FMR verbs support for now. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14RDMA/bnxt_re: Fix RQE posting logicDevesh Sharma
This patch adds code to ring RQ Doorbell aggressively so that the adapter can DMA RQ buffers sooner, instead of DMA all WQEs in the post_recv WR list together at the end of the post_recv verb. Also use spinlock to serialize RQ posting Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14RDMA/bnxt_re: Add HW workaround for avoiding stall for UD QPsSomnath Kotur
HW stalls out after 0x800000 WQEs are posted for UD QPs. To workaround this problem, driver will send a modify_qp cmd to the HW at around the halfway mark(0x400000) so that FW can accordingly modify the QP context in the HW to prevent this stall. This workaround needs to be done for UD, QP1 and Raw Ethertype packets. Added a counter to keep track of WQEs posted during post_send. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14RDMA/bnxt_re: Dereg MR in FW before freeing the fast_reg_page_listSelvin Xavier
If the host buffers are freed before destroying MR in HW, HW could try accessing these buffers. This could cause a host crash. Fixing the code to avoid this condition. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14RDMA/bnxt_re: HW workarounds for handling specific conditionsEddie Wai
This patch implements the following HW workarounds 1. The SQ depth needs to be augmented by 128 + 1 to avoid running into an Out of order CQE issue 2. Workaround to handle the problem where the HW fast path engine continues to access DMA memory in retranmission mode even after the WQE has already been completed. If the HW reports this condition, driver detects it and posts a Fence WQE. The driver stops reporting the completions to stack until it receives completion for Fence WQE. Signed-off-by: Eddie Wai <eddie.wai@broadcom.com> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-13RDMA/bnxt_re: Fixing the Control path command and response handlingDevesh Sharma
Fixing a concurrency issue with creq handling. Each caller was given a globally managed crsq element, which was accessed outside a lock. This could result in corruption, if lot of applications are simultaneously issuing Control Path commands. Now, each caller will provide its own response buffer and the responses will be copied under a lock. Also, Fixing the queue full condition check for the CMDQ. As a part of these changes, the control path code is refactored to remove the code replication in the response status checking. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-12IB: nes: convert to use DRIVER_ATTR_RWGreg Kroah-Hartman
We are trying to get rid of DRIVER_ATTR(), and all of the nes.c driver attributes can be trivially changed to use DRIVER_ATTR_RW(), making the code smaller and easier to manage over time. Cc: Faisal Latif <faisal.latif@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: <linux-rdma@vger.kernel.org> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09qed*: LL2 callback operationsMichal Kalderon
LL2 today is interrupt driven - when tx/rx completion arrives [or any other indication], qed needs to operate on the connection and pass the information to the protocol-driver [or internal qed consumer]. Since we have several flavors of ll2 employeed by the driver, each handler needs to do an if-else to determine the right functionality to use based on the connection type. In order to make things more scalable [given that we're going to add additional types of ll2 flavors] move the infrastrucutre into using a callback-based approach - the callbacks would be provided as part of the connection's initialization parameters. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08tcp: add a struct net parameter to tcp_parse_options()Eric Dumazet
We want to move some TCP sysctls to net namespaces in the future. tcp_window_scaling, tcp_sack and tcp_timestamps being fetched from tcp_parse_options(), we need to pass an extra parameter. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08crypto: Work around deallocated stack frame reference gcc bug on sparc.David Miller
On sparc, if we have an alloca() like situation, as is the case with SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack memory. The result can be that the value is clobbered if a trap or interrupt arrives at just the right instruction. It only occurs if the function ends returning a value from that alloca() area and that value can be placed into the return value register using a single instruction. For example, in lib/libcrc32c.c:crc32c() we end up with a return sequence like: return %i7+8 lduw [%o5+16], %o0 ! MEM[(u32 *)__shash_desc.1_10 + 16B], %o5 holds the base of the on-stack area allocated for the shash descriptor. But the return released the stack frame and the register window. So if an intererupt arrives between 'return' and 'lduw', then the value read at %o5+16 can be corrupted. Add a data compiler barrier to work around this problem. This is exactly what the gcc fix will end up doing as well, and it absolutely should not change the code generated for other cpus (unless gcc on them has the same bug :-) With crucial insight from Eric Sandeen. Cc: <stable@vger.kernel.org> Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-07IB/mlx4: Bump driver versionTariq Toukan
Remove date and bump version for mlx4_ib driver. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>