summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-10-01Merge branch 'udp-fix-early-demux-for-mcast-packets'David S. Miller
Paolo Abeni says: ==================== udp: fix early demux for mcast packets Currently the early demux callbacks do not perform source address validation. This is not an issue for TCP or UDP unicast, where the early demux is only allowed for connected sockets and the source address is validated for the first packet and never change. The UDP protocol currently allows early demux also for unconnected multicast sockets, and we are not currently doing any validation for them, after that the first packet lands on the socket: beyond ignoring the rp_filter - if enabled - any kind of martian sources are also allowed. This series addresses the issue allowing the early demux callback to return an error code, and performing the proper checks for unconnected UDP multicast sockets before leveraging the rx dst cache. Alternatively we could disable the early demux for unconnected mcast sockets, but that would cause relevant performance regression - around 50% - while with this series, with full rp_filter in place, we keep the regression to a more moderate level. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01udp: perform source validation for mcast early demuxPaolo Abeni
The UDP early demux can leverate the rx dst cache even for multicast unconnected sockets. In such scenario the ipv4 source address is validated only on the first packet in the given flow. After that, when we fetch the dst entry from the socket rx cache, we stop enforcing the rp_filter and we even start accepting any kind of martian addresses. Disabling the dst cache for unconnected multicast socket will cause large performace regression, nearly reducing by half the max ingress tput. Instead we factor out a route helper to completely validate an skb source address for multicast packets and we call it from the UDP early demux for mcast packets landing on unconnected sockets, after successful fetching the related cached dst entry. This still gives a measurable, but limited performance regression: rp_filter = 0 rp_filter = 1 edmux disabled: 1182 Kpps 1127 Kpps edmux before: 2238 Kpps 2238 Kpps edmux after: 2037 Kpps 2019 Kpps The above figures are on top of current net tree. Applying the net-next commit 6e617de84e87 ("net: avoid a full fib lookup when rp_filter is disabled.") the delta with rp_filter == 0 will decrease even more. Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01IPv4: early demux can return an error codePaolo Abeni
Currently no error is emitted, but this infrastructure will used by the next patch to allow source address validation for mcast sockets. Since early demux can do a route lookup and an ipv4 route lookup can return an error code this is consistent with the current ipv4 route infrastructure. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx pathXin Long
Now when updating mtu in tx path, it doesn't consider ARPHRD_ETHER tunnel device, like ip6gre_tap tunnel, for which it should also subtract ether header to get the correct mtu. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip6_gre: ip6gre_tap device should keep dstXin Long
The patch 'ip_gre: ipgre_tap device should keep dst' fixed a issue that ipgre_tap mtu couldn't be updated in tx path. The same fix is needed for ip6gre_tap as well. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip_gre: ipgre_tap device should keep dstXin Long
Without keeping dst, the tunnel will not update any mtu/pmtu info, since it does not have a dst on the skb. Reproducer: client(ipgre_tap1 - eth1) <-----> (eth1 - ipgre_tap1)server After reducing eth1's mtu on client, then perforamnce became 0. This patch is to netif_keep_dst in gre_tap_init, as ipgre does. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30Merge tag 'mtd/fixes-for-4.14-rc3' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull mtd fixes from Boris Brezillon: - Fix partition alignment check in mtdcore.c - Fix a buffer overflow in the Atmel NAND driver * tag 'mtd/fixes-for-4.14-rc3' of git://git.infradead.org/linux-mtd: mtd: nand: atmel: fix buffer overflow in atmel_pmecc_user mtd: Fix partition alignment check on multi-erasesize devices
2017-09-30Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Eight mostly minor fixes for recently discovered issues in drivers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ILLEGAL REQUEST + ASC==27 => target failure scsi: aacraid: Add a small delay after IOP reset scsi: scsi_transport_fc: Also check for NOTPRESENT in fc_remote_port_add() scsi: scsi_transport_fc: set scsi_target_id upon rescan scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly scsi: aacraid: error: testing array offset 'bus' after use scsi: lpfc: Don't return internal MBXERR_ERROR code from probe function scsi: aacraid: Fix 2T+ drives on SmartIOC-2000
2017-09-30netlink: do not proceed if dump's start() errsJason A. Donenfeld
Drivers that use the start method for netlink dumping rely on dumpit not being called if start fails. For example, ila_xlat.c allocates memory and assigns it to cb->args[0] in its start() function. It might fail to do that and return -ENOMEM instead. However, even when returning an error, dumpit will be called, which, in the example above, quickly dereferences the memory in cb->args[0], which will OOPS the kernel. This is but one example of how this goes wrong. Since start() has always been a function with an int return type, it therefore makes sense to use it properly, rather than ignoring it. This patch thus returns early and does not call dumpit() when start() fails. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29Merge tag 'platform-drivers-x86-v4.14-2' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform drivers fix from Darren Hart: "Newly discovered species of fujitsu laptops break some assumptions about ACPI device pairings. fujitsu-laptop: Don't oops when FUJ02E3 is not present" * tag 'platform-drivers-x86-v4.14-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: fujitsu-laptop: Don't oops when FUJ02E3 is not presnt
2017-09-29Merge tag 'led_fixes-4.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED fixes from Jacek Anaszewski: "Four fixes for the as3645a LED flash controller and one update to MAINTAINERS" * tag 'led_fixes-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: MAINTAINERS: Add entry for MediaTek PMIC LED driver as3645a: Unregister indicator LED on device unbind as3645a: Use integer numbers for parsing LEDs dt: bindings: as3645a: Use LED number to refer to LEDs as3645a: Use ams,input-max-microamp as documented in DT bindings
2017-09-29Merge tag 'v4.14-rockchip-clkfixes-1' of ↵Stephen Boyd
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into clk-fixes Pull Rockchip clk driver fixes from Heiko Stuebner: Some smallish fixes for the rk3128 clock support including some register errors and some clocks that should be critical for safe usage. * tag 'v4.14-rockchip-clkfixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: clk: rockchip: add sclk_timer5 as critical clock on rk3128 clk: rockchip: fix up rk3128 pvtm and mipi_24m gate regs error clk: rockchip: add pclk_pmu as critical clock on rk3128
2017-09-29clk: Export clk_bulk_prepare()Bjorn Andersson
Allow clk_bulk_prepare() to be referenced by kernel modules by adding the missing EXPORT_SYMBOL_GPL(). Fixes: 266e4e9d9150 ("clk: add clk_bulk_get accessories") Reported-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2017-09-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull waitid fix from Al Viro: "Fix infoleak in waitid()" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix infoleak in waitid(2)
2017-09-29Merge branch 'for-4.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "We've collected a bunch of isolated fixes, for crashes, user-visible behaviour or missing bits from other subsystem cleanups from the past. The overall number is not small but I was not able to make it significantly smaller. Most of the patches are supposed to go to stable" * 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: log csums for all modified extents Btrfs: fix unexpected result when dio reading corrupted blocks btrfs: Report error on removing qgroup if del_qgroup_item fails Btrfs: skip checksum when reading compressed data if some IO have failed Btrfs: fix kernel oops while reading compressed data Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block Btrfs: do not backup tree roots when fsync btrfs: remove BTRFS_FS_QUOTA_DISABLING flag btrfs: propagate error to btrfs_cmp_data_prepare caller btrfs: prevent to set invalid default subvolid Btrfs: send: fix error number for unknown inode types btrfs: fix NULL pointer dereference from free_reloc_roots() btrfs: finish ordered extent cleaning if no progress is found btrfs: clear ordered flag on cleaning up ordered extents Btrfs: fix incorrect {node,sector}size endianness from BTRFS_IOC_FS_INFO Btrfs: do not reset bio->bi_ops while writing bio Btrfs: use the new helper wbc_to_write_flags
2017-09-29Merge tag 'md/4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds
Pull MD fixes from Shaohua Li: "A few fixes for MD. Mainly fix a problem introduced in 4.13, which we retry bio for some code paths but not all in some situations" * tag 'md/4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md/raid5: cap worker count dm-raid: fix a race condition in request handling md: fix a race condition for flush request handling md: separate request handling
2017-09-29Merge tag 'pci-v4.14-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - fix CONFIG_PCI=n build error (introduced in v4.14-rc1) (Geert Uytterhoeven) - fix a race in sysfs driver_override store/show (Nicolai Stange) * tag 'pci-v4.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Fix race condition with driver_override PCI: Add dummy pci_acs_enabled() for CONFIG_PCI=n build
2017-09-29Merge tag 'drm-fixes-for-v4.14-rc3' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Regular fixes pull, some amdkfd, amdgpu, etnaviv, sun4i, qxl, tegra fixes. I've got an outstanding pull for i915 but it wasn't on an rc2 base so I wanted to ship these out first, I might get to it before rc3 or I might not" * tag 'drm-fixes-for-v4.14-rc3' of git://people.freedesktop.org/~airlied/linux: drm/tegra: trace: Fix path to include qxl: fix framebuffer unpinning drm/sun4i: cec: Enable back CEC-pin framework drm/amdkfd: Print event limit messages only once per process drm/amdkfd: Fix kernel-queue wrapping bugs drm/amdkfd: Fix incorrect destroy_mqd parameter drm/radeon: disable hard reset in hibernate for APUs drm/amdgpu: revert tile table update for oland etnaviv: fix gem object list corruption etnaviv: fix submit error path qxl: fix primary surface handling drm/amdkfd: check for null dev to avoid a null pointer dereference
2017-09-29Merge tag 'iommu-fixes-v4.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: - A comment fix for 'struct iommu_ops' - Format string fixes for AMD IOMMU, unfortunatly I missed that during review. - Limit mediatek physical addresses to 32 bit for v7s to fix a warning triggered in io-page-table code. - Fix dma-sync in io-pgtable-arm-v7s code * tag 'iommu-fixes-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu: Fix comment for iommu_ops.map_sg iommu/amd: pr_err() strings should end with newlines iommu/mediatek: Limit the physical address in 32bit for v7s iommu/io-pgtable-arm-v7s: Need dma-sync while there is no QUIRK_NO_DMA
2017-09-29Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - SPsel register initialisation on reset as the architecture defines its state as unknown - Use READ_ONCE when dereferencing pmd_t pointers to avoid race conditions in page_vma_mapped_walk() (or fast GUP) with concurrent modifications of the page table - Avoid invoking the mm fault handling code for kernel addresses (check against TASK_SIZE) which would otherwise result in calling might_sleep() in atomic context * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: fault: Route pte translation faults via do_translation_fault arm64: mm: Use READ_ONCE when dereferencing pointer to pte table arm64: Make sure SPsel is always set
2017-09-29Merge tag 'for-linus-4.14c-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - avoid a warning when compiling with clang - consider read-only bits in xen-pciback when writing to a BAR - fix a boot crash of pv-domains * tag 'for-linus-4.14c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/mmu: Call xen_cleanhighmap() with 4MB aligned for page tables mapping xen-pciback: relax BAR sizing write value check x86/xen: clean up clang build warning
2017-09-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Mixed bugfixes. Perhaps the most interesting one is a latent bug that was finally triggered by PCID support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm/x86: Handle async PF in RCU read-side critical sections KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume KVM: VMX: use cmpxchg64 KVM: VMX: simplify and fix vmx_vcpu_pi_load KVM: VMX: avoid double list add with VT-d posted interrupts KVM: VMX: extract __pi_post_block KVM: PPC: Book3S HV: Check for updated HDSISR on P9 HDSI exception KVM: nVMX: fix HOST_CR3/HOST_CR4 cache
2017-09-29fix infoleak in waitid(2)Al Viro
kernel_waitid() can return a PID, an error or 0. rusage is filled in the first case and waitid(2) rusage should've been copied out exactly in that case, *not* whenever kernel_waitid() has not returned an error. Compat variant shares that braino; none of kernel_wait4() callers do, so the below ought to fix it. Reported-and-tested-by: Alexander Potapenko <glider@google.com> Fixes: ce72a16fa705 ("wait4(2)/waitid(2): separate copying rusage to userland") Cc: stable@vger.kernel.org # v4.13 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-29x86/asm: Use register variable to get stack pointer valueAndrey Ryabinin
Currently we use current_stack_pointer() function to get the value of the stack pointer register. Since commit: f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang") ... we have a stack register variable declared. It can be used instead of current_stack_pointer() function which allows to optimize away some excessive "mov %rsp, %<dst>" instructions: -mov %rsp,%rdx -sub %rdx,%rax -cmp $0x3fff,%rax -ja ffffffff810722fd <ist_begin_non_atomic+0x2d> +sub %rsp,%rax +cmp $0x3fff,%rax +ja ffffffff810722fa <ist_begin_non_atomic+0x2a> Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer and use it instead of the removed function. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29x86/mm: Disable branch profiling in mem_encrypt.cTom Lendacky
Some routines in mem_encrypt.c are called very early in the boot process, e.g. sme_encrypt_kernel(). When CONFIG_TRACE_BRANCH_PROFILING=y is defined the resulting branch profiling associated with the check to see if SME is active results in a kernel crash. Disable branch profiling for mem_encrypt.c by defining DISABLE_BRANCH_PROFILING before including any header files. Reported-by: kernel test robot <lkp@01.org> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170929162419.6016.53390.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29Merge tag 'perf-urgent-for-mingo-4.14-20170928' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fix syscalltbl build failure (Akemi Yagi) - Fix attr.exclude_kernel setting for default cycles:p, this time for !root with kernel.perf_event_paranoid = -1 (Arnaldo Carvalho de Melo) - Sync kernel ABI headers with tooling headers (Ingo Molnar) - Remove misleading debug messages with --call-graph option (Mengting Zhang) - Revert vmlinux symbol resolution patches for s390x (Thomas Richter) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29Merge branch 'fixes-v4.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull keys fixes from James Morris: "Notable here is a rewrite of big_key crypto by Jason Donenfeld to address some issues in the original code. From Jason's commit log: "This started out as just replacing the use of crypto/rng with get_random_bytes_wait, so that we wouldn't use bad randomness at boot time. But, upon looking further, it appears that there were even deeper underlying cryptographic problems, and that this seems to have been committed with very little crypto review. So, I rewrote the whole thing, trying to keep to the conventions introduced by the previous author, to fix these cryptographic flaws." There has been positive review of the new code by Eric Biggers and Herbert Xu, and it passes basic testing via the keyutils test suite. Eric also manually tested it. Generally speaking, we likely need to improve the amount of crypto review for kernel crypto users including keys (I'll post a note separately to ksummit-discuss)" * 'fixes-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security/keys: rewrite all of big_key crypto security/keys: properly zero out sensitive key material in big_key KEYS: use kmemdup() in request_key_auth_new() KEYS: restrict /proc/keys by credentials at open time KEYS: reset parent each time before searching key_user_tree KEYS: prevent KEYCTL_READ on negative key KEYS: prevent creating a different user's keyrings KEYS: fix writing past end of user-supplied buffer in keyring_read() KEYS: fix key refcount leak in keyctl_read_key() KEYS: fix key refcount leak in keyctl_assume_authority() KEYS: don't revoke uninstantiated key in request_key_auth_new() KEYS: fix cred refcount leak in request_key_auth_new()
2017-09-29RDMA/hns: Fix calltrace for sleeping in atomicLijun Ou
We replace usleep_range that was excessively long anyway with udelay to avoid using usleep_range function in spin_lock_bh spin region, thereby avoiding this calltrace: BUG: scheduling while atomic: insmod/1428/0x00000002 Modules linked in: hns-roce-hw-v2(+) hns_roce rdma_ucm rdma_cm iw_cm ib_uverbs ib_cm ib_core CPU: 0 PID: 1428 Comm: insmod Not tainted 4.12.0-rc1-00677-g252e8fd-dirty #43 Hardware name: (null) (DT) Call trace: [<ffff000008089d20>] dump_backtrace+0x0/0x274 [<ffff00000808a068>] show_stack+0x20/0x28 [<ffff00000844ea58>] dump_stack+0x94/0xb4 [<ffff0000080f975c>] __schedule_bug+0x68/0x84 [<ffff000008a988d4>] __schedule+0x5fc/0x70c [<ffff000008a98a24>] schedule+0x40/0xa4 [<ffff000008a9c6f0>] schedule_hrtimeout_range_clock+0x98/0xfc [<ffff000008a9c788>] schedule_hrtimeout_range+0x34/0x40 [<ffff000008a9c098>] usleep_range+0x6c/0x80 [<ffff000000b9ae68>] hns_roce_cmd_send+0xe4/0x264 [hns-roce-hw-v2] [<ffff000000b9b748>] hns_roce_cmd_query_hw_info+0x40/0x60 [hns-roce-hw-v2] [<ffff000000b9b790>] hns_roce_v2_profile+0x28/0x668 [hns-roce-hw-v2] [<ffff000000b6b1f4>] hns_roce_init+0x6c/0x948 [hns-roce-hw-v2] Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Shaobo Xu <xushaobo2@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/hns: Don't unregister a callback we didn't registerLijun Ou
The driver doesn't actually register an inetaddr notifier function, so there is no need to unregister it on shutdown. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Shaobo Xu <xushaobo2@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/hns: Avoid NULL pointer exceptionWei Hu(Xavier)
After the loop in hns_roce_v1_mr_free_work_fn function, it is possible that all qps will have been freed (in which case ne will be 0). If that happens, then later in the function when we dereference hr_qp we will get an exception. Check ne is not 0 to make sure we actually have an hr_qp left to work on. This patch fixes the smatch error as below: drivers/infiniband/hw/hns/hns_roce_hw_v1.c:1009 hns_roce_v1_mr_free_work_fn() error: we previously assumed 'hr_qp' could be null Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Shaobo Xu <xushaobo2@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/hns: Set rdma_ah_attr type for querying qpLijun Ou
When querying qp, It needs to return RoCE device ah_attr type that may be specific to RoCE devices. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Shaobo Xu <xushaobo2@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/hns: Only assign dest_qp if IB_QP_DEST_QPN bit is setLijun Ou
Only when the IB_QP_DEST_QPN flag of attr_mask is set is it valid to assign the dest_qp_num into the dest_qp field of qp context. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Shaobo Xu <xushaobo2@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/hns: Check return value of kzallocWei Hu(Xavier)
When lp_qp_work is NULL, we should return ENOMEM. In order to do so, we had to make some upper layer functions return a value instead of being void type so we can propagate the error up the stack. This patch fixes the smatch error as below: drivers/infiniband/hw/hns/hns_roce_hw_v1.c:918 hns_roce_v1_recreate_lp_qp() error: potential null dereference 'lp_qp_work'. (kzalloc returns null) Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Shaobo Xu <xushaobo2@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/hns: Refactor code for readabilityLijun Ou
Put the code for checking the send doorbell status into a separate function and call it from check_qp_db_process_status to improve indenting and readability. It fixes the warning from static checker: drivers/infiniband/hw/hns/hns_roce_hw_v1.c:3562 check_qp_db_process_status() warn: inconsistent indenting. Fixes: 5f110ac4bed8 ("IB/hns: Fix for checkpatch.pl comment style) Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Shaobo Xu <xushaobo2@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/hns: Modify the value with rd&dest_rd of qp_attrLijun Ou
The value of max_rd_atomic and max_dest_rd_atomic in query_qp are incorrect. It should be assigned by left shifting of the bit in hip06 SoC. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29arm64: fault: Route pte translation faults via do_translation_faultWill Deacon
We currently route pte translation faults via do_page_fault, which elides the address check against TASK_SIZE before invoking the mm fault handling code. However, this can cause issues with the path walking code in conjunction with our word-at-a-time implementation because load_unaligned_zeropad can end up faulting in kernel space if it reads across a page boundary and runs into a page fault (e.g. by attempting to read from a guard region). In the case of such a fault, load_unaligned_zeropad has registered a fixup to shift the valid data and pad with zeroes, however the abort is reported as a level 3 translation fault and we dispatch it straight to do_page_fault, despite it being a kernel address. This results in calling a sleeping function from atomic context: BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313 in_atomic(): 0, irqs_disabled(): 0, pid: 10290 Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [...] [<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144 [<ffffff8e016cd158>] __might_sleep+0x7c/0x8c [<ffffff8e016977f0>] do_page_fault+0x140/0x330 [<ffffff8e01681328>] do_mem_abort+0x54/0xb0 Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0) [...] [<ffffff8e016844fc>] el1_da+0x18/0x78 [<ffffff8e017f399c>] path_parentat+0x44/0x88 [<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8 [<ffffff8e017f5044>] filename_create+0x4c/0x128 [<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8 [<ffffff8e01684e30>] el0_svc_naked+0x24/0x28 Code: 36380080 d5384100 f9400800 9402566d (d4210000) ---[ end trace 2d01889f2bca9b9f ]--- Fix this by dispatching all translation faults to do_translation_faults, which avoids invoking the page fault logic for faults on kernel addresses. Cc: <stable@vger.kernel.org> Reported-by: Ankit Jain <ankijain@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-29arm64: mm: Use READ_ONCE when dereferencing pointer to pte tableWill Deacon
On kernels built with support for transparent huge pages, different CPUs can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk and they must take care to use READ_ONCE to avoid value tearing or caching of stale values by the compiler. Unfortunately, these functions call into our pgtable macros, which don't use READ_ONCE, and compiler caching has been observed to cause the following crash during ext4 writeback: PC is at check_pte+0x20/0x170 LR is at page_vma_mapped_walk+0x2e0/0x540 [...] Process doio (pid: 2463, stack limit = 0xffff00000f2e8000) Call trace: [<ffff000008233328>] check_pte+0x20/0x170 [<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540 [<ffff000008234adc>] page_mkclean_one+0xac/0x278 [<ffff000008234d98>] rmap_walk_file+0xf0/0x238 [<ffff000008236e74>] rmap_walk+0x64/0xa0 [<ffff0000082370c8>] page_mkclean+0x90/0xa8 [<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8 [<ffff00000832f984>] mpage_submit_page+0x34/0x98 [<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170 [<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8 [<ffff00000833530c>] ext4_writepages+0x484/0xe30 [<ffff0000081f6ab4>] do_writepages+0x44/0xe8 [<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110 [<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8 [<ffff000008324310>] ext4_sync_file+0x80/0x4b8 [<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0 [<ffff0000082332b4>] SyS_msync+0x194/0x1e8 This is because page_vma_mapped_walk loads the PMD twice before calling pte_offset_map: the first time without READ_ONCE (where it gets all zeroes due to a concurrent pmdp_invalidate) and the second time with READ_ONCE (where it sees a valid table pointer due to a concurrent pmd_populate). However, the compiler inlines everything and caches the first value in a register, which is subsequently used in pte_offset_phys which returns a junk pointer that is later dereferenced when attempting to access the relevant pte. This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure that a stale value is not used. Whilst this is a point fix for a known failure (and simple to backport), a full fix moving all of our page table accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in page_vma_mapped_walk is in the works for a future kernel release. Cc: Jon Masters <jcm@redhat.com> Cc: Timur Tabi <timur@codeaurora.org> Cc: <stable@vger.kernel.org> Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()") Tested-by: Richard Ruigrok <rruigrok@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-29iw_cxgb4: add referencing to wait objectsSteve Wise
For messages sent from the host to fw that solicit a reply from fw, the c4iw_wr_wait struct pointer is passed in the host->fw message, and included in the fw->host fw6_msg reply. This allows the sender to wait until the reply is received, and the code processing the ingress reply to wake up the sender. If c4iw_wait_for_reply() times out, however, we need to keep the c4iw_wr_wait object around in case the reply eventually does arrive. Otherwise we have touch-after-free bugs in the wake_up paths. This was hit due to a bad kernel driver that blocked ingress processing of cxgb4 for a long time, causing iw_cxgb4 timeouts, but eventually resuming ingress processing and thus hitting the touch-after-free bug. So I want to fix iw_cxgb4 such that we'll at least keep the wait object around until the reply comes. If it never comes we leak a small amount of memory, but if it does come late, we won't potentially crash the system. So add a kref struct in the c4iw_wr_wait struct, and take a reference before sending a message to FW that will generate a FW6 reply. And remove the reference (and potentially free the wait object) when the reply is processed. The ep code also uses the wr_wait for non FW6 CPL messages and doesn't embed the c4iw_wr_wait object in the message sent to firmware. So for those cases we add c4iw_wake_up_noref(). The mr/mw, cq, and qp object create/destroy paths do need this reference logic. For these paths, c4iw_ref_send_wait() is introduced to take the wr_wait reference, send the msg to fw, and then wait for the reply. So going forward, iw_cxgb4 either uses c4iw_ofld_send(), c4iw_wait_for_reply() and c4iw_wake_up_noref() like is done in the some of the endpoint logic, or c4iw_ref_send_wait() and c4iw_wake_up_deref() (formerly c4iw_wake_up()) when sending messages with the c4iw_wr_wait object pointer embedded in the message and resulting FW6 reply. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29iw_cxgb4: allocate wait object for each ep objectSteve Wise
Remove the embedded c4iw_wr_wait object in preparation for correctly handling timeouts. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29iw_cxgb4: allocate wait object for each qp objectSteve Wise
Remove the local stack allocated c4iw_wr_wait object in preparation for correctly handling timeouts. Also cleaned up some error path unwind logic to make it more readable. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29iw_cxgb4: allocate wait object for each cq objectSteve Wise
Remove the local stack allocated c4iw_wr_wait object in preparation for correctly handling timeouts. Also cleaned up some error path unwind logic to make it more readable. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29iw_cxgb4: allocate wait object for each memory objectSteve Wise
Remove the local stack allocated c4iw_wr_wait object in preparation for correctly handling timeouts. Also refactored some code to simplify it and make errpath unwinding more readable. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/iwpm: Properly mark end of NL messagesShiraz Saleem
Commit 1a1c116f3dcf removes nlmsg_len calculation in ibnl_put_attr causing netlink messages to be rejected due to incorrect length. Add nlmsg_end after all attributes are appended to calculate the nlmsg_len. Fixes: 1a1c116f3dcf ("RDMA/netlink: Simplify the put_msg and put_attr") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/hns: remove redundant assignment to variable jColin Ian King
Variable j is being assigned to loop_j and then later being assigned to a new value in for loops. The first initialization is therefore redundant and can be removed. Cleans up clang warning: warning: Value stored to 'j' is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/hns: make various function static, fixes warningsColin Ian King
The functions hns_roce_table_mhop_get, hns_roce_table_mhop_put, hns_roce_cleanup_mhop_hem_table, hns_roce_v1_post_mbox, hns_roce_cmq_setup_basic_desc, hns_roce_cmq_send, hns_roce_cmq_query_hw_info are all local to the source and do not need to be in global scope, so make them static. Cleans up sparse warnings: symbol 'hns_roce_table_mhop_get' was not declared. Should it be static? symbol 'hns_roce_table_mhop_put' was not declared. Should it be static? symbol 'hns_roce_cleanup_mhop_hem_table' was not declared. Should it be static? symbol 'hns_roce_v1_post_mbox' was not declared. Should it be static? symbol 'hns_roce_cmq_setup_basic_desc' was not declared. Should it be static? symbol 'hns_roce_cmq_send' was not declared. Should it be static? symbol 'hns_roce_cmq_query_hw_info' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29i40iw: delete some stray tabsDan Carpenter
These lines were indented too far by mistake. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29IB/{ipoib, iser}: Consistent print format of vendor errorAjaykumar Hotchandani
Vendor error print should be consistent across protocols to avoid any confusion. This patch corrects that. Suggested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29IB/qib: Use setup_timer and mod_timerHimanshu Jha
Use setup_timer and mod_timer API instead of structure assignments. This is done using Coccinelle and semantic patch used for this as follows: @@ expression x,y,z,a,b; @@ -init_timer (&x); +setup_timer (&x, y, z); +mod_timer (&a, b); -x.function = y; -x.data = z; -x.expires = b; -add_timer(&a); Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/qedr: Fix rdma_type initializationKalderon, Michal
Initialize the rdma_type (iWARP or RoCE) which is set according to device configuration in qed. Fixes: e6a38c54faf ("RDMA/qedr: Add support for registering an iWARP device") Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29Merge branch 'vnic' into k.o/for-nextDoug Ledford
Signed-off-by: Doug Ledford <dledford@redhat.com>