summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-03-15Merge tag 'irqchip-fixes-5.6-2' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - Add workaround for Cavium/Marvell ThunderX unimplemented GIC registers
2020-03-15geneve: move debug check after netdev unregisterFlorian Westphal
The debug check must be done after unregister_netdevice_many() call -- the list_del() for this is done inside .ndo_stop. Fixes: 2843a25348f8 ("geneve: speedup geneve tunnels dismantle") Reported-and-tested-by: <syzbot+68a8ed58e3d17c700de5@syzkaller.appspotmail.com> Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15net/packet: tpacket_rcv: avoid a producer race conditionWillem de Bruijn
PACKET_RX_RING can cause multiple writers to access the same slot if a fast writer wraps the ring while a slow writer is still copying. This is particularly likely with few, large, slots (e.g., GSO packets). Synchronize kernel thread ownership of rx ring slots with a bitmap. Writers acquire a slot race-free by testing tp_status TP_STATUS_KERNEL while holding the sk receive queue lock. They release this lock before copying and set tp_status to TP_STATUS_USER to release to userspace when done. During copying, another writer may take the lock, also see TP_STATUS_KERNEL, and start writing to the same slot. Introduce a new rx_owner_map bitmap with a bit per slot. To acquire a slot, test and set with the lock held. To release race-free, update tp_status and owner bit as a transaction, so take the lock again. This is the one of a variety of discussed options (see Link below): * instead of a shadow ring, embed the data in the slot itself, such as in tp_padding. But any test for this field may match a value left by userspace, causing deadlock. * avoid the lock on release. This leaves a small race if releasing the shadow slot before setting TP_STATUS_USER. The below reproducer showed that this race is not academic. If releasing the slot after tp_status, the race is more subtle. See the first link for details. * add a new tp_status TP_KERNEL_OWNED to avoid the transactional store of two fields. But, legacy applications may interpret all non-zero tp_status as owned by the user. As libpcap does. So this is possible only opt-in by newer processes. It can be added as an optional mode. * embed the struct at the tail of pg_vec to avoid extra allocation. The implementation proved no less complex than a separate field. The additional locking cost on release adds contention, no different than scaling on multicore or multiqueue h/w. In practice, below reproducer nor small packet tcpdump showed a noticeable change in perf report in cycles spent in spinlock. Where contention is problematic, packet sockets support mitigation through PACKET_FANOUT. And we can consider adding opt-in state TP_KERNEL_OWNED. Easy to reproduce by running multiple netperf or similar TCP_STREAM flows concurrently with `tcpdump -B 129 -n greater 60000`. Based on an earlier patchset by Jon Rosen. See links below. I believe this issue goes back to the introduction of tpacket_rcv, which predates git history. Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg237222.html Suggested-by: Jon Rosen <jrosen@cisco.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jon Rosen <jrosen@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15net: ip_gre: Separate ERSPAN newlink / changelink callbacksPetr Machata
ERSPAN shares most of the code path with GRE and gretap code. While that helps keep the code compact, it is also error prone. Currently a broken userspace can turn a gretap tunnel into a de facto ERSPAN one by passing IFLA_GRE_ERSPAN_VER. There has been a similar issue in ip6gretap in the past. To prevent these problems in future, split the newlink and changelink code paths. Split the ERSPAN code out of ipgre_netlink_parms() into a new function erspan_netlink_parms(). Extract a piece of common logic from ipgre_newlink() and ipgre_changelink() into ipgre_newlink_encap_setup(). Add erspan_newlink() and erspan_changelink(). Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15cxgb4: fix delete filter entry fail in unload pathShahjada Abul Husain
Currently, the hardware TID index is assumed to start from index 0. However, with the following changeset, commit c21939998802 ("cxgb4: add support for high priority filters") hardware TID index can start after the high priority region, which has introduced a regression resulting in remove filters entry failure for cxgb4 unload path. This patch fix that. Fixes: c21939998802 ("cxgb4: add support for high priority filters") Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14net: stmmac: platform: Fix misleading interrupt error msgMarkus Fuchs
Not every stmmac based platform makes use of the eth_wake_irq or eth_lpi interrupts. Use the platform_get_irq_byname_optional variant for these interrupts, so no error message is displayed, if they can't be found. Rather print an information to hint something might be wrong to assist debugging on platforms which use these interrupts. Signed-off-by: Markus Fuchs <mklntf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14net/bpfilter: fix dprintf usage for /dev/kmsgBruno Meneguele
The bpfilter UMH code was recently changed to log its informative messages to /dev/kmsg, however this interface doesn't support SEEK_CUR yet, used by dprintf(). As result dprintf() returns -EINVAL and doesn't log anything. However there already had some discussions about supporting SEEK_CUR into /dev/kmsg interface in the past it wasn't concluded. Since the only user of that from userspace perspective inside the kernel is the bpfilter UMH (userspace) module it's better to correct it here instead waiting a conclusion on the interface. Fixes: 36c4357c63f3 ("net: bpfilter: print umh messages to /dev/kmsg") Signed-off-by: Bruno Meneguele <bmeneg@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14net_sched: keep alloc_hash updated after hash allocationCong Wang
In commit 599be01ee567 ("net_sched: fix an OOB access in cls_tcindex") I moved cp->hash calculation before the first tcindex_alloc_perfect_hash(), but cp->alloc_hash is left untouched. This difference could lead to another out of bound access. cp->alloc_hash should always be the size allocated, we should update it after this tcindex_alloc_perfect_hash(). Reported-and-tested-by: syzbot+dcc34d54d68ef7d2d53d@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+c72da7b9ed57cde6fca2@syzkaller.appspotmail.com Fixes: 599be01ee567 ("net_sched: fix an OOB access in cls_tcindex") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14net_sched: hold rtnl lock in tcindex_partial_destroy_work()Cong Wang
syzbot reported a use-after-free in tcindex_dump(). This is due to the lack of RTNL in the deferred rcu work. We queue this work with RTNL in tcindex_change(), later, tcindex_dump() is called: fh = tp->ops->get(tp, t->tcm_handle); ... err = tp->ops->change(..., &fh, ...); tfilter_notify(..., fh, ...); but there is nothing to serialize the pending tcindex_partial_destroy_work() with tcindex_dump(). Fix this by simply holding RTNL in tcindex_partial_destroy_work(), so that it won't be called until RTNL is released after tc_new_tfilter() is completed. Reported-and-tested-by: syzbot+653090db2562495901dc@syzkaller.appspotmail.com Fixes: 3d210534cc93 ("net_sched: fix a race condition in tcindex_destroy()") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN}Pavel Begunkov
Processing links, io_submit_sqe() prepares requests, drops sqes, and passes them with sqe=NULL to io_queue_sqe(). There IOSQE_DRAIN and/or IOSQE_ASYNC requests will go through the same prep, which doesn't expect sqe=NULL and fail with NULL pointer deference. Always do full prepare including io_alloc_async_ctx() for linked requests, and then it can skip the second preparation. Cc: stable@vger.kernel.org # 5.5 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-14Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "I2C has quite some regression fixes this time. One is also related to watchdogs, we have proper acks from Guenter for them" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: acpi: put device when verifying client fails misc: eeprom: at24: fix regulator underflow i2c: gpio: suppress error on probe defer macintosh: windfarm: fix MODINFO regression i2c: designware-pci: Fix BUG_ON during device removal i2c: i801: Do not add ICH_RES_IO_SMI for the iTCO_wdt device watchdog: iTCO_wdt: Make ICH_RES_IO_SMI optional watchdog: iTCO_wdt: Export vendorsupport
2020-03-14Merge tag 'arc-5.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - Fix __ALIGN_STR and __ALIGN to not use default junk padding - Misc Kconfig cleanups, header updates * tag 'arc-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: define __ALIGN_STR and __ALIGN symbols for ARC ARC: show_regs: reduce lines of output ARC: Replace <linux/clk-provider.h> by <linux/of_clk.h> ARC: fpu: fix randconfig build error reported by 0-day test service ARC: fix some Kconfig typos ARC: Cleanup old Kconfig IO scheduler options
2020-03-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Bugfixes for x86 and s390" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1 KVM: s390: Also reset registers in sync regs for initial cpu reset KVM: fix Kconfig menu text for -Werror KVM: x86: remove stale comment from struct x86_emulate_ctxt KVM: x86: clear stale x86_emulate_ctxt->intercept value KVM: SVM: Fix the svm vmexit code for WRMSR KVM: X86: Fix dereference null cpufreq policy
2020-03-14iommu/vt-d: Populate debugfs if IOMMUs are detectedMegha Dey
Currently, the intel iommu debugfs directory(/sys/kernel/debug/iommu/intel) gets populated only when DMA remapping is enabled (dmar_disabled = 0) irrespective of whether interrupt remapping is enabled or not. Instead, populate the intel iommu debugfs directory if any IOMMUs are detected. Cc: Dan Carpenter <dan.carpenter@oracle.com> Fixes: ee2636b8670b1 ("iommu/vt-d: Enable base Intel IOMMU debugfs support") Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A small collection of fixes. I'll make another sweep soon to look for more fixes for this -rc series. - Mark device node const in of_clk_get_parent APIs to ease landing changes in users later - Fix flag for Qualcomm SC7180 video clocks where we thought it would never turn off but actually hardware takes care of it - Remove disp_cc_mdss_rscc_ahb_clk on Qualcomm SC7180 SoCs because this clk is always on anyway - Correct some bad dt-binding numbers for i.MX8MN SoCs" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: imx8mn: Fix incorrect clock defines clk: qcom: dispcc: Remove support of disp_cc_mdss_rscc_ahb_clk clk: qcom: videocc: Update the clock flag for video_cc_vcodec0_core_clk of: clk: Make of_clk_get_parent_{count,name}() parameter const
2020-03-14Merge branch 'kvm-null-pointer-fix' into kvm-masterPaolo Bonzini
2020-03-14KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAsVitaly Kuznetsov
When an EVMCS enabled L1 guest on KVM will tries doing enlightened VMEnter with EVMCS GPA = 0 the host crashes because the evmcs_gpa != vmx->nested.hv_evmcs_vmptr condition in nested_vmx_handle_enlightened_vmptrld() will evaluate to false (as nested.hv_evmcs_vmptr is zeroed after init). The crash will happen on vmx->nested.hv_evmcs pointer dereference. Another problematic EVMCS ptr value is '-1' but it only causes host crash after nested_release_evmcs() invocation. The problem is exactly the same as with '0', we mistakenly think that the EVMCS pointer hasn't changed and thus nested.hv_evmcs_vmptr is valid. Resolve the issue by adding an additional !vmx->nested.hv_evmcs check to nested_vmx_handle_enlightened_vmptrld(), this way we will always be trying kvm_vcpu_map() when nested.hv_evmcs is NULL and this is supposed to catch all invalid EVMCS GPAs. Also, initialize hv_evmcs_vmptr to '0' in nested_release_evmcs() to be consistent with initialization where we don't currently set hv_evmcs_vmptr to '-1'. Cc: stable@vger.kernel.org Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14Merge tag 'kvm-s390-master-5.6-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: Fully do the CPU resets as intended With 7de3f1423ff9 ("KVM: s390: Add new reset vcpu API") we clarified the meaning of the reset ioctl to fully reset the CPU and not only the parts that can not be handled by userspace. Turns out that we missed some parts.
2020-03-14irqchip/gic-v3: Workaround Cavium erratum 38539 when reading GICD_TYPER2Marc Zyngier
Despite the architecture spec requiring that reserved registers in the GIC distributor memory map are RES0 (and thus are not allowed to generate an exception), the Cavium ThunderX (aka TX1) SoC explodes as such: [ 0.000000] GICv3: GIC: Using split EOI/Deactivate mode [ 0.000000] GICv3: 128 SPIs implemented [ 0.000000] GICv3: 0 Extended SPIs implemented [ 0.000000] Internal error: synchronous external abort: 96000210 [#1] SMP [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc4-00035-g3cf6a3d5725f #7956 [ 0.000000] Hardware name: cavium,thunder-88xx (DT) [ 0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO) [ 0.000000] pc : __raw_readl+0x0/0x8 [ 0.000000] lr : gic_init_bases+0x110/0x560 [ 0.000000] sp : ffff800011243d90 [ 0.000000] x29: ffff800011243d90 x28: 0000000000000000 [ 0.000000] x27: 0000000000000018 x26: 0000000000000002 [ 0.000000] x25: ffff8000116f0000 x24: ffff000fbe6a2c80 [ 0.000000] x23: 0000000000000000 x22: ffff010fdc322b68 [ 0.000000] x21: ffff800010a7a208 x20: 00000000009b0404 [ 0.000000] x19: ffff80001124dad0 x18: 0000000000000010 [ 0.000000] x17: 000000004d8d492b x16: 00000000f67eb9af [ 0.000000] x15: ffffffffffffffff x14: ffff800011249908 [ 0.000000] x13: ffff800091243ae7 x12: ffff800011243af4 [ 0.000000] x11: ffff80001126e000 x10: ffff800011243a70 [ 0.000000] x9 : 00000000ffffffd0 x8 : ffff80001069c828 [ 0.000000] x7 : 0000000000000059 x6 : ffff8000113fb4d1 [ 0.000000] x5 : 0000000000000001 x4 : 0000000000000000 [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 [ 0.000000] x1 : 0000000000000000 x0 : ffff8000116f000c [ 0.000000] Call trace: [ 0.000000] __raw_readl+0x0/0x8 [ 0.000000] gic_of_init+0x188/0x224 [ 0.000000] of_irq_init+0x200/0x3cc [ 0.000000] irqchip_init+0x1c/0x40 [ 0.000000] init_IRQ+0x160/0x1d0 [ 0.000000] start_kernel+0x2ec/0x4b8 [ 0.000000] Code: a8c47bfd d65f03c0 d538d080 d65f03c0 (b9400000) when reading the GICv4.1 GICD_TYPER2 register, which is unexpected... Work around it by adding a new quirk for the following variants: ThunderX: CN88xx OCTEON TX: CN83xx, CN81xx OCTEON TX2: CN93xx, CN96xx, CN98xx, CNF95xx* and use this flag to avoid accessing GICD_TYPER2. Note that all reserved registers (including redistributors and ITS) are impacted by this erratum, but that only GICD_TYPER2 has to be worked around so far. Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Robert Richter <rrichter@marvell.com> Tested-by: Mark Salter <msalter@redhat.com> Tested-by: Tim Harvey <tharvey@gateworks.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Robert Richter <rrichter@marvell.com> Link: https://lore.kernel.org/r/20191027144234.8395-11-maz@kernel.org Link: https://lore.kernel.org/r/20200311115649.26060-1-maz@kernel.org
2020-03-14KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirectNitesh Narayan Lal
Previously all fields of structure kvm_lapic_irq were not initialized before it was passed to kvm_bitmap_or_dest_vcpus(). Which will cause an issue when any of those fields are used for processing a request. For example not initializing the msi_redir_hint field before passing to the kvm_bitmap_or_dest_vcpus(), may lead to a misbehavior of kvm_apic_map_get_dest_lapic(). This will specifically happen when the kvm_lowest_prio_delivery() returns TRUE due to a non-zero garbage value of msi_redir_hint, which should not happen as the request belongs to APIC fixed delivery mode and we do not want to deliver the interrupt only to the lowest priority candidate. This patch initializes all the fields of kvm_lapic_irq based on the values of ioapic redirect_entry object before passing it on to kvm_bitmap_or_dest_vcpus(). Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> [Set level to false since the value doesn't really matter. Suggested by Vitaly Kuznetsov. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1Sean Christopherson
Enable ENCLS-exiting (and thus set vmcs.ENCLS_EXITING_BITMAP) only if the CPU supports SGX1. Per Intel's SDM, all ENCLS leafs #UD if SGX1 is not supported[*], i.e. intercepting ENCLS to inject a #UD is unnecessary. Avoiding ENCLS-exiting even when it is reported as supported by the CPU works around a reported issue where SGX is "hard" disabled after an S3 suspend/resume cycle, i.e. CPUID.0x7.SGX=0 and the VMCS field/control are enumerated as unsupported. While the root cause of the S3 issue is unknown, it's definitely _not_ a KVM (or kernel) bug, i.e. this is a workaround for what is most likely a hardware or firmware issue. As a bonus side effect, KVM saves a VMWRITE when first preparing vmcs01 and vmcs02. Note, SGX must be disabled in BIOS to take advantage of this workaround [*] The additional ENCLS CPUID check on SGX1 exists so that SGX can be globally "soft" disabled post-reset, e.g. if #MC bits in MCi_CTL are cleared. Soft disabled meaning disabling SGX without clearing the primary CPUID bit (in leaf 0x7) and without poking into non-SGX CPU paths, e.g. for the VMCS controls. Fixes: 0b665d304028 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest") Reported-by: Toni Spets <toni.spets@iki.fi> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTESuravee Suthikulpanit
Commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") accidentally left out the ir_data pointer when calling modity_irte_ga(), which causes the function amd_iommu_update_ga() to return prematurely due to struct amd_ir_data.ref is NULL and the "is_run" bit of IRTE does not get updated properly. This results in bad I/O performance since IOMMU AVIC always generate GA Log entry and notify IOMMU driver and KVM when it receives interrupt from the PCI pass-through device instead of directly inject interrupt to the vCPU. Fixes by passing ir_data when calling modify_irte_ga() as done previously. Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14iommu/vt-d: Ignore devices with out-of-spec domain numberDaniel Drake
VMD subdevices are created with a PCI domain ID of 0x10000 or higher. These subdevices are also handled like all other PCI devices by dmar_pci_bus_notifier(). However, when dmar_alloc_pci_notify_info() take records of such devices, it will truncate the domain ID to a u16 value (in info->seg). The device at (e.g.) 10000:00:02.0 is then treated by the DMAR code as if it is 0000:00:02.0. In the unlucky event that a real device also exists at 0000:00:02.0 and also has a device-specific entry in the DMAR table, dmar_insert_dev_scope() will crash on:   BUG_ON(i >= devices_cnt); That's basically a sanity check that only one PCI device matches a single DMAR entry; in this case we seem to have two matching devices. Fix this by ignoring devices that have a domain number higher than what can be looked up in the DMAR table. This problem was carefully diagnosed by Jian-Hong Pan. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Fixes: 59ce0515cdaf3 ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens") Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14iommu/vt-d: Fix the wrong printing in RHSA parsingZhenzhong Duan
When base address in RHSA structure doesn't match base address in each DRHD structure, the base address in last DRHD is printed out. This doesn't make sense when there are multiple DRHD units, fix it by printing the buggy RHSA's base address. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Fixes: fd0c8894893cb ("intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables") Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14kbuild: Disable -Wpointer-to-enum-castNathan Chancellor
Clang's -Wpointer-to-int-cast deviates from GCC in that it warns when casting to enums. The kernel does this in certain places, such as device tree matches to set the version of the device being used, which allows the kernel to avoid using a gigantic union. https://elixir.bootlin.com/linux/v5.5.8/source/drivers/ata/ahci_brcm.c#L428 https://elixir.bootlin.com/linux/v5.5.8/source/drivers/ata/ahci_brcm.c#L402 https://elixir.bootlin.com/linux/v5.5.8/source/include/linux/mod_devicetable.h#L264 To avoid a ton of false positive warnings, disable this particular part of the warning, which has been split off into a separate diagnostic so that the entire warning does not need to be turned off for clang. It will be visible under W=1 in case people want to go about fixing these easily and enabling the warning treewide. Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/887 Link: https://github.com/llvm/llvm-project/commit/2a41b31fcdfcb67ab7038fc2ffb606fd50b83a84 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-13Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes, both in drivers: ipr and ufs" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ipr: Fix softlockup when rescanning devices in petitboot scsi: ufs: Fix possible unclocked access to auto hibern8 timer register
2020-03-13afs: Fix client call Rx-phase signal handlingDavid Howells
Fix the handling of signals in client rxrpc calls made by the afs filesystem. Ignore signals completely, leaving call abandonment or connection loss to be detected by timeouts inside AF_RXRPC. Allowing a filesystem call to be interrupted after the entire request has been transmitted and an abort sent means that the server may or may not have done the action - and we don't know. It may even be worse than that for older servers. Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Signed-off-by: David Howells <dhowells@redhat.com>
2020-03-13afs: Fix handling of an abort from a service handlerDavid Howells
When an AFS service handler function aborts a call, AF_RXRPC marks the call as complete - which means that it's not going to get any more packets from the receiver. This is a problem because reception of the final ACK is what triggers afs_deliver_to_call() to drop the final ref on the afs_call object. Instead, aborted AFS service calls may then just sit around waiting for ever or until they're displaced by a new call on the same connection channel or a connection-level abort. Fix this by calling afs_set_call_complete() to finalise the afs_call struct representing the call. However, we then need to drop the ref that stops the call from being deallocated. We can do this in afs_set_call_complete(), as the work queue is holding a separate ref of its own, but then we shouldn't do it in afs_process_async_call() and afs_delete_async_call(). call->drop_ref is set to indicate that a ref needs dropping for a call and this is dealt with when we transition a call to AFS_CALL_COMPLETE. But then we also need to get rid of the ref that pins an asynchronous client call. We can do this by the same mechanism, setting call->drop_ref for an async client call too. We can also get rid of call->incoming since nothing ever sets it and only one thing ever checks it (futilely). A trace of the rxrpc_call and afs_call struct ref counting looks like: <idle>-0 [001] ..s5 164.764892: rxrpc_call: c=00000002 SEE u=3 sp=rxrpc_new_incoming_call+0x473/0xb34 a=00000000442095b5 <idle>-0 [001] .Ns5 164.766001: rxrpc_call: c=00000002 QUE u=4 sp=rxrpc_propose_ACK+0xbe/0x551 a=00000000442095b5 <idle>-0 [001] .Ns4 164.766005: rxrpc_call: c=00000002 PUT u=3 sp=rxrpc_new_incoming_call+0xa3f/0xb34 a=00000000442095b5 <idle>-0 [001] .Ns7 164.766433: afs_call: c=00000002 WAKE u=2 o=11 sp=rxrpc_notify_socket+0x196/0x33c kworker/1:2-1810 [001] ...1 164.768409: rxrpc_call: c=00000002 SEE u=3 sp=rxrpc_process_call+0x25/0x7ae a=00000000442095b5 kworker/1:2-1810 [001] ...1 164.769439: rxrpc_tx_packet: c=00000002 e9f1a7a8:95786a88:00000008:09c5 00000001 00000000 02 22 ACK CallAck kworker/1:2-1810 [001] ...1 164.769459: rxrpc_call: c=00000002 PUT u=2 sp=rxrpc_process_call+0x74f/0x7ae a=00000000442095b5 kworker/1:2-1810 [001] ...1 164.770794: afs_call: c=00000002 QUEUE u=3 o=12 sp=afs_deliver_to_call+0x449/0x72c kworker/1:2-1810 [001] ...1 164.770829: afs_call: c=00000002 PUT u=2 o=12 sp=afs_process_async_call+0xdb/0x11e kworker/1:2-1810 [001] ...2 164.771084: rxrpc_abort: c=00000002 95786a88:00000008 s=0 a=1 e=1 K-1 kworker/1:2-1810 [001] ...1 164.771461: rxrpc_tx_packet: c=00000002 e9f1a7a8:95786a88:00000008:09c5 00000002 00000000 04 00 ABORT CallAbort kworker/1:2-1810 [001] ...1 164.771466: afs_call: c=00000002 PUT u=1 o=12 sp=SRXAFSCB_ProbeUuid+0xc1/0x106 The abort generated in SRXAFSCB_ProbeUuid(), labelled "K-1", indicates that the local filesystem/cache manager didn't recognise the UUID as its own. Fixes: 2067b2b3f484 ("afs: Fix the CB.ProbeUuid service handler to reply correctly") Signed-off-by: David Howells <dhowells@redhat.com>
2020-03-13afs: Fix some tracing detailsDavid Howells
Fix a couple of tracelines to indicate the usage count after the atomic op, not the usage count before it to be consistent with other afs and rxrpc trace lines. Change the wording of the afs_call_trace_work trace ID label from "WORK" to "QUEUE" to reflect the fact that it's queueing work, not doing work. Fixes: 341f741f04be ("afs: Refcount the afs_call struct") Signed-off-by: David Howells <dhowells@redhat.com>
2020-03-13rxrpc: Fix sendmsg(MSG_WAITALL) handlingDavid Howells
Fix the handling of sendmsg() with MSG_WAITALL for userspace to round the timeout for when a signal occurs up to at least two jiffies as a 1 jiffy timeout may end up being effectively 0 if jiffies wraps at the wrong time. Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Signed-off-by: David Howells <dhowells@redhat.com>
2020-03-13rxrpc: Fix call interruptibility handlingDavid Howells
Fix the interruptibility of kernel-initiated client calls so that they're either only interruptible when they're waiting for a call slot to come available or they're not interruptible at all. Either way, they're not interruptible during transmission. This should help prevent StoreData calls from being interrupted when writeback is in progress. It doesn't, however, handle interruption during the receive phase. Userspace-initiated calls are still interruptable. After the signal has been handled, sendmsg() will return the amount of data copied out of the buffer and userspace can perform another sendmsg() call to continue transmission. Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Signed-off-by: David Howells <dhowells@redhat.com>
2020-03-13rxrpc: Abstract out the calculation of whether there's Tx spaceDavid Howells
Abstract out the calculation of there being sufficient Tx buffer space. This is reproduced several times in the rxrpc sendmsg code. Signed-off-by: David Howells <dhowells@redhat.com>
2020-03-13Merge tag 'nfs-for-5.6-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Anna Schumaker: "These are mostly fscontext fixes, but there is also one that fixes collisions seen in fscache: - Ensure the fs_context has the correct fs_type when mounting and submounting - Fix leaking of ctx->nfs_server.hostname - Add minor version to fscache key to prevent collisions" * tag 'nfs-for-5.6-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: nfs: add minor version to nfs_server_key for fscache NFS: Fix leak of ctx->nfs_server.hostname NFS: Don't hard-code the fs_type when submounting NFS: Ensure the fs_context has the correct fs_type before mounting
2020-03-13Merge tag 'fuse-fixes-5.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fix from Miklos Szeredi: "Fix an Oops introduced in v5.4" * tag 'fuse-fixes-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix stack use after return
2020-03-13Merge tag 'ovl-fixes-5.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Fix three bugs introduced in this cycle" * tag 'ovl-fixes-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix lockdep warning for async write ovl: fix some xino configurations ovl: fix lock in ovl_llseek()
2020-03-13btrfs: fix log context list corruption after rename whiteout errorFilipe Manana
During a rename whiteout, if btrfs_whiteout_for_rename() returns an error we can end up returning from btrfs_rename() with the log context object still in the root's log context list - this happens if 'sync_log' was set to true before we called btrfs_whiteout_for_rename() and it is dangerous because we end up with a corrupt linked list (root->log_ctxs) as the log context object was allocated on the stack. After btrfs_rename() returns, any task that is running btrfs_sync_log() concurrently can end up crashing because that linked list is traversed by btrfs_sync_log() (through btrfs_remove_all_log_ctxs()). That results in the same issue that commit e6c617102c7e4 ("Btrfs: fix log context list corruption after rename exchange operation") fixed. Fixes: d4682ba03ef618 ("Btrfs: sync log after logging new name") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-13Merge tag 'pm-5.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix cpupower utility build failures with -fno-common enabled (Mike Gilbert)" * tag 'pm-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpupower: avoid multiple definition with gcc -fno-common
2020-03-13Merge tag 'io_uring-5.6-2020-03-13' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fix from Jens Axboe: "Just a single fix here, improving the RCU callback ordering from last week. After a bit more perusing by Paul, he poked a hole in the original" * tag 'io_uring-5.6-2020-03-13' of git://git.kernel.dk/linux-block: io_uring: ensure RCU callback ordering with rcu_barrier()
2020-03-13Merge tag 'block-5.6-2020-03-13' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes that should go into this release. This contains: - Fix for a corruption issue with the s390 dasd driver (Stefan) - Fixup/improvement for the flush insertion change that we had in this series (Ming) - Fix for the partition suppor for host aware zoned devices (Shin'ichiro) - Fix incorrect blk-iocost comparison (Tejun) The diffstat looks large, but that's a) mostly dasd, and b) the flush fix from Ming adds a big comment" * tag 'block-5.6-2020-03-13' of git://git.kernel.dk/linux-block: block: Fix partition support for host aware zoned block devices blk-mq: insert flush request to the front of dispatch queue s390/dasd: fix data corruption for thin provisioned devices blk-iocost: fix incorrect vtime comparison in iocg_is_idle()
2020-03-13Merge tag 'mmc-v5.6-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Fix HW busy detection support for host controllers requiring the MMC_RSP_BUSY response flag (R1B) to be set for the command. In particular for CMD6 (eMMC), erase/trim/discard (SD/eMMC) and CMD5 (eMMC sleep). MMC host: - sdhci-omap|tegra: Fix support for HW busy detection" * tag 'mmc-v5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for eMMC sleep command mmc: sdhci-tegra: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY mmc: sdhci-omap: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for erase/trim/discard mmc: core: Allow host controllers to require R1B for CMD6
2020-03-13Merge tag 'wireless-drivers-2020-03-13' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.6 Third, and hopefully last, set of fixes for v5.6. iwlwifi * fix a locking issue in time events handling * a fix in rate-scaling * fix for a potential NULL pointer deref * enable antenna diversity in some devices that were erroneously not doing it * allow FW dumps to continue when the FW is stuck * a fix in the HE capabilities handling * another fix for FW dumps where we were reading wrong addresses * fix link in MAINTAINERS file rtlwifi * fix regression causing connect issues in v5.4 wlcore * remove merge damage which luckily didn't have any impact on functionality ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2020-03-12 The following pull-request contains BPF updates for your *net* tree. We've added 12 non-merge commits during the last 8 day(s) which contain a total of 12 files changed, 161 insertions(+), 15 deletions(-). The main changes are: 1) Andrii fixed two bugs in cgroup-bpf. 2) John fixed sockmap. 3) Luke fixed x32 jit. 4) Martin fixed two issues in struct_ops. 5) Yonghong fixed bpf_send_signal. 6) Yoshiki fixed BTF enum. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13afs: Use kfree_rcu() instead of casting kfree() to rcu_callback_tJann Horn
afs_put_addrlist() casts kfree() to rcu_callback_t. Apart from being wrong in theory, this might also blow up when people start enforcing function types via compiler instrumentation, and it means the rcu_head has to be first in struct afs_addr_list. Use kfree_rcu() instead, it's simpler and more correct. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-13Merge tag 'at24-fixes-for-v5.6-rc6' of ↵Wolfram Sang
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current at24 fixes for v5.6-rc6 - fix regulator underflow bug introduced during the v5.6 merge window
2020-03-13ovl: fix lockdep warning for async writeMiklos Szeredi
Lockdep reports "WARNING: lock held when returning to user space!" due to async write holding freeze lock over the write. Apparently aio.c already deals with this by lying to lockdep about the state of the lock. Do the same here. No need to check for S_IFREG() here since these file ops are regular-only. Reported-by: syzbot+9331a354f4f624a52a55@syzkaller.appspotmail.com Fixes: 2406a307ac7d ("ovl: implement async IO routines") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-13ovl: fix some xino configurationsAmir Goldstein
Fix up two bugs in the coversion to xino_mode: 1. xino=off does not always end up in disabled mode 2. xino=auto on 32bit arch should end up in disabled mode Take a proactive approach to disabling xino on 32bit kernel: 1. Disable XINO_AUTO config during build time 2. Disable xino with a warning on mount time As a by product, xino=on on 32bit arch also ends up in disabled mode. We never intended to enable xino on 32bit arch and this will make the rest of the logic simpler. Fixes: 0f831ec85eda ("ovl: simplify ovl_same_sb() helper") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-13ARM: dts: dra7: Add bus_dma_limit for L3 busRoger Quadros
The L3 interconnect's memory map is from 0x0 to 0xffffffff. Out of this, System memory (SDRAM) can be accessed from 0x80000000 to 0xffffffff (2GB) DRA7 does support 4GB of SDRAM but upper 2GB can only be accessed by the MPU subsystem. Add the dma-ranges property to reflect the physical address limit of the L3 bus. Issues ere observed only with SATA on DRA7-EVM with 4GB RAM and CONFIG_ARM_LPAE enabled. This is because the controller supports 64-bit DMA and its driver sets the dma_mask to 64-bit thus resulting in DMA accesses beyond L3 limit of 2G. Setting the correct bus_dma_limit fixes the issue. Signed-off-by: Roger Quadros <rogerq@ti.com> Cc: stable@kernel.org Signed-off-by: Tony Lindgren <tony@atomide.com>
2020-03-13drm/bochs: downgrade pci_request_region failure from error to warningGerd Hoffmann
Shutdown of firmware framebuffer has a bunch of problems. Because of this the framebuffer region might still be reserved even after drm_fb_helper_remove_conflicting_pci_framebuffers() returned. Don't consider pci_request_region() failure for the framebuffer region as fatal error to workaround this issue. Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Link: http://patchwork.freedesktop.org/patch/msgid/20200313084152.2734-1-kraxel@redhat.com
2020-03-13IB/rdmavt: Free kernel completion queue when doneKaike Wan
When a kernel ULP requests the rdmavt to create a completion queue, it allocated the queue and set cq->kqueue to point to it. However, when the completion queue is destroyed, cq->queue is freed instead, leading to a memory leak: https://lore.kernel.org/r/215235485.15264050.1583334487658.JavaMail.zimbra@redhat.com unreferenced object 0xffffc90006639000 (size 12288): comm "kworker/u128:0", pid 8, jiffies 4295777598 (age 589.085s) hex dump (first 32 bytes): 4d 00 00 00 4d 00 00 00 00 c0 08 ac 8b 88 ff ff M...M........... 00 00 00 00 80 00 00 00 00 00 00 00 10 00 00 00 ................ backtrace: [<0000000035a3d625>] __vmalloc_node_range+0x361/0x720 [<000000002942ce4f>] __vmalloc_node.constprop.30+0x63/0xb0 [<00000000f228f784>] rvt_create_cq+0x98a/0xd80 [rdmavt] [<00000000b84aec66>] __ib_alloc_cq_user+0x281/0x1260 [ib_core] [<00000000ef3764be>] nvme_rdma_cm_handler+0xdb7/0x1b80 [nvme_rdma] [<00000000936b401c>] cma_cm_event_handler+0xb7/0x550 [rdma_cm] [<00000000d9c40b7b>] addr_handler+0x195/0x310 [rdma_cm] [<00000000c7398a03>] process_one_req+0xdd/0x600 [ib_core] [<000000004d29675b>] process_one_work+0x920/0x1740 [<00000000efedcdb5>] worker_thread+0x87/0xb40 [<000000005688b340>] kthread+0x327/0x3f0 [<0000000043a168d6>] ret_from_fork+0x3a/0x50 This patch fixes the issue by freeing cq->kqueue instead. Fixes: 239b0e52d8aa ("IB/hfi1: Move rvt_cq_wc struct into uapi directory") Link: https://lore.kernel.org/r/20200313123957.14343.43879.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> # 5.4.x Reported-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13x86/vector: Remove warning on managed interrupt migrationPeter Xu
The vector management code assumes that managed interrupts cannot be migrated away from an online CPU. free_moved_vector() has a WARN_ON_ONCE() which triggers when a managed interrupt vector association on a online CPU is cleared. The CPU offline code uses a different mechanism which cannot trigger this. This assumption is not longer correct because the new CPU isolation feature which affects the placement of managed interrupts must be able to move a managed interrupt away from an online CPU. There are two reasons why this can happen: 1) When the interrupt is activated the affinity mask which was established in irq_create_affinity_masks() is handed in to the vector allocation code. This mask contains all CPUs to which the interrupt can be made affine to, but this does not take the CPU isolation 'managed_irq' mask into account. When the interrupt is finally requested by the device driver then the affinity is checked again and the CPU isolation 'managed_irq' mask is taken into account, which moves the interrupt to a non-isolated CPU if possible. 2) The interrupt can be affine to an isolated CPU because the non-isolated CPUs in the calculated affinity mask are not online. Once a non-isolated CPU which is in the mask comes online the interrupt is migrated to this non-isolated CPU In both cases the regular online migration mechanism is used which triggers the WARN_ON_ONCE() in free_moved_vector(). Case #1 could have been addressed by taking the isolation mask into account, but that would require a massive code change in the activation logic and the eventual migration event was accepted as a reasonable tradeoff when the isolation feature was developed. But even if #1 would be addressed, #2 would still trigger it. Of course the warning in free_moved_vector() was overlooked at that time and the above two cases which have been discussed during patch review have obviously never been tested before the final submission. So keep it simple and remove the warning. [ tglx: Rewrote changelog and added a comment to free_moved_vector() ] Fixes: 11ea68f553e2 ("genirq, sched/isolation: Isolate from handling managed interrupts") Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lkml.kernel.org/r/20200312205830.81796-1-peterx@redhat.com