summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2017-08-10perf/x86: Fix RDPMC vs. mm_struct trackingPeter Zijlstra
Vince reported the following rdpmc() testcase failure: > Failing test case: > > fd=perf_event_open(); > addr=mmap(fd); > exec() // without closing or unmapping the event > fd=perf_event_open(); > addr=mmap(fd); > rdpmc() // GPFs due to rdpmc being disabled The problem is of course that exec() plays tricks with what is current->mm, only destroying the old mappings after having installed the new mm. Fix this confusion by passing along vma->vm_mm instead of relying on current->mm. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 1e0fb9ec679c ("perf: Add pmu callbacks to track event mapping and unmapping") Link: http://lkml.kernel.org/r/20170802173930.cstykcqefmqt7jau@hirez.programming.kicks-ass.net [ Minor cleanups. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10nvmet_fc: add defer_req callback for deferment of cmd buffer returnJames Smart
At queue creation, the transport allocates a local job struct (struct nvmet_fc_fcp_iod) for each possible element of the queue. When a new CMD is received from the wire, a jobs struct is allocated from the queue and then used for the duration of the command. The job struct contains buffer space for the wire command iu. Thus, upon allocation of the job struct, the cmd iu buffer is copied to the job struct and the LLDD may immediately free/reuse the CMD IU buffer passed in the call. However, in some circumstances, due to the packetized nature of FC and the api of the FC LLDD which may issue a hw command to send the wire response, but the LLDD may not get the hw completion for the command and upcall the nvmet_fc layer before a new command may be asynchronously received on the wire. In other words, its possible for the initiator to get the response from the wire, thus believe a command slot free, and send a new command iu. The new command iu may be received by the LLDD and passed to the transport before the LLDD had serviced the hw completion and made the teardown calls for the original job struct. As such, there is no available job struct available for the new io. E.g. it appears like the host sent more queue elements than the queue size. It didn't based on it's understanding. Rather than treat this as a hard connection failure queue the new request until the job struct does free up. As the buffer isn't copied as there's no job struct, a special return value must be returned to the LLDD to signify to hold off on recycling the cmd iu buffer. And later, when a job struct is allocated and the buffer copied, a new LLDD callback is introduced to notify the LLDD and allow it to recycle it's command iu buffer. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-08-10Merge tag 'drm-misc-fixes-2017-08-08' of ↵Dave Airlie
git://anongit.freedesktop.org/git/drm-misc into drm-fixes Core Changes: - dma-buf: Allow multiple sync_files to wrap a single dma-fence (Chris) Driver Changes: - rockchip: misc fixes to vop driver from the downstream rockchip tree (Mark) - Error path cleanups to tc358767 & host1x (Lucas & Paul, respectively) * tag 'drm-misc-fixes-2017-08-08' of git://anongit.freedesktop.org/git/drm-misc: drm/rockchip: vop: report error when check resource error drm/rockchip: vop: round_up pitches to word align drm/rockchip: vop: fix NV12 video display error drm/rockchip: vop: fix iommu page fault when resume dma-buf/sync_file: Allow multiple sync_files to wrap a single dma-fence drm/bridge: tc358767: fix probe without attached output node
2017-08-09Merge tag 'pinctrl-v4.13-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "These are the pin control fixes I have gathered since the return from my vacation. They boiled in -next a while so let's get them in. Apart from the documentation build it is purely driver fixes. Which is nice. The Intel fixes seem kind of important. - Fix the documentation build as the docs were moved - Correct the UART pin list on the Intel Merrifield - Fix pin assignment and number of pins on the Marvell Armada 37xx pin controller - Cover the Setzer models in the Chromebook DMI quirk in the Intel cheryview driver so they start working - Add the missing "sim" function to the sunxi driver - Fix USB pin definitions on Uniphier Pro4 - Smatch fix for invalid reference in the zx pin control driver" * tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: generic: update references to Documentation/pinctrl.txt pinctrl: intel: merrifield: Correct UART pin lists pinctrl: armada-37xx: Fix number of pin in south bridge pinctrl: armada-37xx: Fix the pin 23 on south bridge pinctrl: cherryview: Add Setzer models to the Chromebook DMI quirk pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver pinctrl: uniphier: fix USB3 pin assignment for Pro4 pinctrl: zte: fix dereference of 'data' in zx_set_mux()
2017-08-09Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "The main thing is to allow empty id_tables for ACPI to make some drivers get probed again. It looks a bit bigger than usual because it needs some internal renaming, too. Other than that, there is a fix for broken DSTDs, a super simple enablement for ARM MPS, and two documentation fixes which I'd like to see in v4.13 already" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: rephrase explanation of I2C_CLASS_DEPRECATED i2c: allow i2c-versatile for ARM MPS platforms i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz i2c: designware: Print clock freq on invalid clock freq error i2c: core: Allow empty id_table in ACPI case as well i2c: mux: pinctrl: mention correct module name in Kconfig help text
2017-08-09md/raid6: implement recovery using ARM NEON intrinsicsArd Biesheuvel
Provide a NEON accelerated implementation of the recovery algorithm, which supersedes the default byte-by-byte one. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "The pull requests are getting smaller, that's progress I suppose :-) 1) Fix infinite loop in CIPSO option parsing, from Yujuan Qi. 2) Fix remote checksum handling in VXLAN and GUE tunneling drivers, from Koichiro Den. 3) Missing u64_stats_init() calls in several drivers, from Florian Fainelli. 4) TCP can set the congestion window to an invalid ssthresh value after congestion window reductions, from Yuchung Cheng. 5) Fix BPF jit branch generation on s390, from Daniel Borkmann. 6) Correct MIPS ebpf JIT merge, from David Daney. 7) Correct byte order test in BPF test_verifier.c, from Daniel Borkmann. 8) Fix various crashes and leaks in ASIX driver, from Dean Jenkins. 9) Handle SCTP checksums properly in mlx4 driver, from Davide Caratti. 10) We can potentially enter tcp_connect() with a cached route already, due to fastopen, so we have to explicitly invalidate it. 11) skb_warn_bad_offload() can bark in legitimate situations, fix from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits) net: avoid skb_warn_bad_offload false positives on UFO qmi_wwan: fix NULL deref on disconnect ppp: fix xmit recursion detection on ppp channels rds: Reintroduce statistics counting tcp: fastopen: tcp_connect() must refresh the route net: sched: set xt_tgchk_param par.net properly in ipt_init_target net: dsa: mediatek: add adjust link support for user ports net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()' hysdn: fix to a race condition in put_log_buffer s390/qeth: fix L3 next-hop in xmit qeth hdr asix: Fix small memory leak in ax88772_unbind() asix: Ensure asix_rx_fixup_info members are all reset asix: Add rx->ax_skb = NULL after usbnet_skb_return() bpf: fix selftest/bpf/test_pkt_md_access on s390x netvsc: fix race on sub channel creation bpf: fix byte order test in test_verifier xgene: Always get clk source, but ignore if it's missing for SGMII ports MIPS: Add missing file for eBPF JIT. bpf, s390: fix build for libbpf and selftest suite ...
2017-08-08block: Add rdma affinity based queue mapping helperSagi Grimberg
Like pci and virtio, we add a rdma helper for affinity spreading. This achieves optimal mq affinity assignments according to the underlying rdma device affinity maps. Reviewed-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08mlx5: move affinity hints assignments to generic codeSagi Grimberg
generic api takes care of spreading affinity similar to what mlx5 open coded (and even handles better asymmetric configurations). Ask the generic API to spread affinity for us, and feed him pre_vectors that do not participate in affinity settings (which is an improvement to what we had before). The affinity assignments should match what mlx5 tried to do earlier but now we do not set affinity to async, cmd and pages dedicated vectors. Also, remove mlx5e_get_cpu and introduce mlx5e_get_node (used for allocation purposes) and mlx5_get_vector_affinity (for indirection table construction) as they provide the needed information. Luckily, we have generic helpers to get cpumask and node given a irq vector. mlx5_get_vector_affinity will be used by mlx5_ib in a subsequent patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08mlx5: convert to generic pci_alloc_irq_vectorsSagi Grimberg
Now that we have a generic code to allocate an array of irq vectors and even correctly spread their affinity, correctly handle cpu hotplug events and more, were much better off using it. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "Third set of -rc fixes for 4.13 cycle - small set of miscellanous fixes - a reasonably sizable set of IPoIB fixes that deal with multiple long standing issues" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/hns: checking for IS_ERR() instead of NULL RDMA/mlx5: Fix existence check for extended address vector IB/uverbs: Fix device cleanup RDMA/uverbs: Prevent leak of reserved field IB/core: Fix race condition in resolving IP to MAC IB/ipoib: Notify on modify QP failure only when relevant Revert "IB/core: Allow QP state transition from reset to error" IB/ipoib: Remove double pointer assigning IB/ipoib: Clean error paths in add port IB/ipoib: Add get statistics support to SRIOV VF IB/ipoib: Add multicast packets statistics IB/ipoib: Set IPOIB_NEIGH_TBL_FLUSH after flushed completion initialization IB/ipoib: Prevent setting negative values to max_nonsrq_conn_qp IB/ipoib: Make sure no in-flight joins while leaving that mcast IB/ipoib: Use cancel_delayed_work_sync when needed IB/ipoib: Fix race between light events and interface restart
2017-08-08Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes, one re-fix of a previous fix and five patches sorting out hotplug in the bnx2X class of drivers. The latter is rather involved, but necessary because these drivers have started dropping lockdep recursion warnings on the hotplug lock because of its conversion to a percpu rwsem" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sg: only check for dxfer_len greater than 256M scsi: aacraid: reading out of bounds scsi: qedf: Limit number of CQs scsi: bnx2i: Simplify cpu hotplug code scsi: bnx2fc: Simplify CPU hotplug code scsi: bnx2i: Prevent recursive cpuhotplug locking scsi: bnx2fc: Prevent recursive cpuhotplug locking scsi: bnx2fc: Plug CPU hotplug race
2017-08-08cpufreq: Simplify cpufreq_can_do_remote_dvfs()Rafael J. Wysocki
The if () in cpufreq_can_do_remote_dvfs() is superfluous, so drop it and simply return the value of the expression under it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-07Merge tag 'for-linus-20170807' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD fixes from Brian Norris: "I missed getting these out for rc4, but here are some MTD fixes. Just NAND fixes (in both the core handling, and a few drivers). Notes stolen from Boris: Core fixes: - fix data interface setup for ONFI NANDs that do not support the SET FEATURES command - fix a kernel doc header - fix potential integer overflow when retrieving timing information from the parameter page - fix wrong OOB layout for small page NANDs Driver fixes: - fix potential division-by-zero bug - fix backward compat with old atmel-nand DT bindings - fix ->setup_data_interface() in the atmel NAND driver" * tag 'for-linus-20170807' of git://git.infradead.org/linux-mtd: mtd: nand: atmel: Fix EDO mode check mtd: nand: Declare tBERS, tR and tPROG as u64 to avoid integer overflow mtd: nand: Fix timing setup for NANDs that do not support SET FEATURES mtd: nand: Fix a docs build warning mtd: nand: sunxi: fix potential divide-by-zero error nand: fix wrong default oob layout for small pages using soft ecc mtd: nand: atmel: Fix DT backward compatibility in pmecc.c
2017-08-07pinctrl: generic: update references to Documentation/pinctrl.txtLudovic Desroches
Update deprecated references to Documentation/pinctrl.txt since it has been moved to Documentation/driver-api/pinctl.rst. Signed-off-by: Ludovic Desroches <ludovic.desroches@o2linux.fr> Fixes: 5a9b73832e9e ("pinctrl.txt: move it to the driver-api book") Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-08-05ACPI / PM: Prefer suspend-to-idle over S3 on some systemsRafael J. Wysocki
Modify the ACPI system sleep support setup code to select suspend-to-idle as the default system sleep state if (1) the ACPI_FADT_LOW_POWER_S0 flag is set in the FADT and (2) the Low Power Idle S0 _DSM interface has been discovered and (3) the default sleep state was not selected from the kernel command line. The main motivation for this change is that systems where the (1) and (2) conditions are met typically ship with OSes that don't exercise the S3 path in the platform firmware which remains untested and turns out to be non-functional at least in some cases. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Mario Limonciello <mario.limonciello@dell.com>
2017-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - Yet another race with VM destruction plugged - A set of small vgic fixes x86: - Preserve pending INIT - RCU fixes in paravirtual async pf, VM teardown, and VMXOFF emulation - nVMX interrupt injection and dirty tracking fixes - initialize to make UBSAN happy" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: arm/arm64: vgic: Use READ_ONCE fo cmpxchg KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit" KVM: nVMX: mark vmcs12 pages dirty on L2 exit kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown KVM: nVMX: do not pin the VMCS12 KVM: avoid using rcu_dereference_protected KVM: X86: init irq->level in kvm_pv_kick_cpu_op KVM: X86: Fix loss of pending INIT due to race KVM: async_pf: make rcu irq exit if not triggered from idle task KVM: nVMX: fixes to nested virt interrupt injection KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12 KVM: arm/arm64: Handle hva aging while destroying the vm KVM: arm/arm64: PMU: Fix overflow interrupt injection KVM: arm/arm64: Fix bug in advertising KVM_CAP_MSI_DEVID capability
2017-08-04RDMA/mlx5: Fix existence check for extended address vectorLeon Romanovsky
The extended address vector is the highest bit in be32 variable, but it was compared with the lowest. This patch fixes the endianness of that check and removes already declared define. Fixes: 17d2f88f92ce ("IB/mlx5: Add ODP atomics support") Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04Merge tag 'ceph-for-4.13-rc4' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A bunch of fixes and follow-ups for -rc1 Luminous patches: issues with ->reencode_message() and last minute RADOS semantic changes in v12.1.2" * tag 'ceph-for-4.13-rc4' of git://github.com/ceph/ceph-client: libceph: make RECOVERY_DELETES feature create a new interval libceph: upmap semantic changes crush: assume weight_set != null imples weight_set_size > 0 libceph: fallback for when there isn't a pool-specific choose_arg libceph: don't call ->reencode_message() more than once per message libceph: make encode_request_*() work with r_mempool requests
2017-08-03Merge tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO fixes from Alex Williamson: - SPAPR/EEH config build fix (Murilo Opsfelder Araujo) - Fix possible device lock deadlock (Alex Williamson) - Correctly size integrated endpoint PCIe capabilities (Alex Williamson) * tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio: vfio/pci: Fix handling of RC integrated endpoint PCIe capability size vfio/pci: Use pci_try_reset_function() on initial open include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH
2017-08-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "15 fixes" [ This does not merge the "fortify: use WARN instead of BUG for now" patch, which needs a bit of extra work to build cleanly with all configurations. Arnd is on it. - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: ocfs2: don't clear SGID when inheriting ACLs mm: allow page_cache_get_speculative in interrupt context userfaultfd: non-cooperative: flush event_wqh at release time ipc: add missing container_of()s for randstruct cpuset: fix a deadlock due to incomplete patching of cpusets_enabled() userfaultfd_zeropage: return -ENOSPC in case mm has gone mm: take memory hotplug lock within numa_zonelist_order_handler() mm/page_io.c: fix oops during block io poll in swapin path zram: do not free pool->size_class kthread: fix documentation build warning kasan: avoid -Wmaybe-uninitialized warning userfaultfd: non-cooperative: notify about unmap of destination during mremap mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries pid: kill pidhash_size in pidhash_init() mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors
2017-08-03HID: multitouch: Support Asus T304UA media keysJoão Paulo Rechi Vita
The Asus T304UA convertible sports a magnetic detachable keyboard with touchpad, which is connected over USB. Most of the keyboard hotkeys are exposed through the same USB interface as the touchpad, defined in the report descriptor as follows: 0x06, 0x31, 0xFF, // Usage Page (Vendor Defined 0xFF31) 0x09, 0x76, // Usage (0x76) 0xA1, 0x01, // Collection (Application) 0x05, 0xFF, // Usage Page (Reserved 0xFF) 0x85, 0x5A, // Report ID (90) 0x19, 0x00, // Usage Minimum (0x00) 0x2A, 0xFF, 0x00, // Usage Maximum (0xFF) 0x15, 0x00, // Logical Minimum (0) 0x26, 0xFF, 0x00, // Logical Maximum (255) 0x75, 0x08, // Report Size (8) 0x95, 0x0F, // Report Count (15) 0xB1, 0x02, // Feature (Data,Var,Abs,No Wrap,Linear,Preferred State,No Null Position,Non-volatile) 0x05, 0xFF, // Usage Page (Reserved 0xFF) 0x85, 0x5A, // Report ID (90) 0x19, 0x00, // Usage Minimum (0x00) 0x2A, 0xFF, 0x00, // Usage Maximum (0xFF) 0x15, 0x00, // Logical Minimum (0) 0x26, 0xFF, 0x00, // Logical Maximum (255) 0x75, 0x08, // Report Size (8) 0x95, 0x02, // Report Count (2) 0x81, 0x02, // Input (Data,Var,Abs,No Wrap,Linear,Preferred State,No Null Position) 0xC0, // End Collection This UsagePage is declared as a variable, but we need to treat it as an array to be able to map each Usage we care about to its corresponding input key. Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-08-02Merge tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "Two fixes from Trond this time, now that he's back from his vacation. The first is a stable fix for the EXCHANGE_ID issue on the mailing list, and the other fixes a double-free situation that he found at the same time. Stable fix: - Fix EXCHANGE_ID corrupt verifier issue Other fix: - Fix double frees in nfs4_test_session_trunk()" * tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4: Fix double frees in nfs4_test_session_trunk() NFSv4: Fix EXCHANGE_ID corrupt verifier issue
2017-08-02mm: allow page_cache_get_speculative in interrupt contextKan Liang
Kernel panic when calling the IRQ-safe __get_user_pages_fast in NMI handler. The bug was introduced by commit 2947ba054a4d ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"). The original x86 __get_user_page_fast used plain get_page() or page_ref_add(). However, the generic __get_user_page_fast uses page_cache_get_speculative(), which has VM_BUG_ON(in_interrupt()). There is no reason to prevent page_cache_get_speculative from using in interrupt context. According to the author, putting a BUG_ON there is just because the code is not verifying correctness of interrupt races. I did some tests in interrupt context. There is no issue found. Removing VM_BUG_ON(in_interrupt()) for page_cache_get_speculative(). Link: http://lkml.kernel.org/r/1501609146-59730-1-git-send-email-kan.liang@intel.com Fixes: 2947ba054a4d ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation") Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Jens Axboe <axboe@fb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ying Huang <ying.huang@intel.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()Dima Zavin
In codepaths that use the begin/retry interface for reading mems_allowed_seq with irqs disabled, there exists a race condition that stalls the patch process after only modifying a subset of the static_branch call sites. This problem manifested itself as a deadlock in the slub allocator, inside get_any_partial. The loop reads mems_allowed_seq value (via read_mems_allowed_begin), performs the defrag operation, and then verifies the consistency of mem_allowed via the read_mems_allowed_retry and the cookie returned by xxx_begin. The issue here is that both begin and retry first check if cpusets are enabled via cpusets_enabled() static branch. This branch can be rewritted dynamically (via cpuset_inc) if a new cpuset is created. The x86 jump label code fully synchronizes across all CPUs for every entry it rewrites. If it rewrites only one of the callsites (specifically the one in read_mems_allowed_retry) and then waits for the smp_call_function(do_sync_core) to complete while a CPU is inside the begin/retry section with IRQs off and the mems_allowed value is changed, we can hang. This is because begin() will always return 0 (since it wasn't patched yet) while retry() will test the 0 against the actual value of the seq counter. The fix is to use two different static keys: one for begin (pre_enable_key) and one for retry (enable_key). In cpuset_inc(), we first bump the pre_enable key to ensure that cpuset_mems_allowed_begin() always return a valid seqcount if are enabling cpusets. Similarly, when disabling cpusets via cpuset_dec(), we first ensure that callers of cpuset_mems_allowed_retry() will start ignoring the seqcount value before we let cpuset_mems_allowed_begin() return 0. The relevant stack traces of the two stuck threads: CPU: 1 PID: 1415 Comm: mkdir Tainted: G L 4.9.36-00104-g540c51286237 #4 Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017 task: ffff8817f9c28000 task.stack: ffffc9000ffa4000 RIP: smp_call_function_many+0x1f9/0x260 Call Trace: smp_call_function+0x3b/0x70 on_each_cpu+0x2f/0x90 text_poke_bp+0x87/0xd0 arch_jump_label_transform+0x93/0x100 __jump_label_update+0x77/0x90 jump_label_update+0xaa/0xc0 static_key_slow_inc+0x9e/0xb0 cpuset_css_online+0x70/0x2e0 online_css+0x2c/0xa0 cgroup_apply_control_enable+0x27f/0x3d0 cgroup_mkdir+0x2b7/0x420 kernfs_iop_mkdir+0x5a/0x80 vfs_mkdir+0xf6/0x1a0 SyS_mkdir+0xb7/0xe0 entry_SYSCALL_64_fastpath+0x18/0xad ... CPU: 2 PID: 1 Comm: init Tainted: G L 4.9.36-00104-g540c51286237 #4 Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017 task: ffff8818087c0000 task.stack: ffffc90000030000 RIP: int3+0x39/0x70 Call Trace: <#DB> ? ___slab_alloc+0x28b/0x5a0 <EOE> ? copy_process.part.40+0xf7/0x1de0 __slab_alloc.isra.80+0x54/0x90 copy_process.part.40+0xf7/0x1de0 copy_process.part.40+0xf7/0x1de0 kmem_cache_alloc_node+0x8a/0x280 copy_process.part.40+0xf7/0x1de0 _do_fork+0xe7/0x6c0 _raw_spin_unlock_irq+0x2d/0x60 trace_hardirqs_on_caller+0x136/0x1d0 entry_SYSCALL_64_fastpath+0x5/0xad do_syscall_64+0x27/0x350 SyS_clone+0x19/0x20 do_syscall_64+0x60/0x350 entry_SYSCALL64_slow_path+0x25/0x25 Link: http://lkml.kernel.org/r/20170731040113.14197-1-dmitriyz@waymo.com Fixes: 46e700abc44c ("mm, page_alloc: remove unnecessary taking of a seqlock when cpusets are disabled") Signed-off-by: Dima Zavin <dmitriyz@waymo.com> Reported-by: Cliff Spradlin <cspradlin@waymo.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christopher Lameter <cl@linux.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02kthread: fix documentation build warningJonathan Corbet
The kerneldoc comment for kthread_create() had an incorrect argument name, leading to a warning in the docs build. Correct it, and make one more small step toward a warning-free build. Link: http://lkml.kernel.org/r/20170724135916.7f486c6f@lwn.net Signed-off-by: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02mm, mprotect: flush TLB if potentially racing with a parallel reclaim ↵Mel Gorman
leaving stale TLB entries Nadav Amit identified a theoritical race between page reclaim and mprotect due to TLB flushes being batched outside of the PTL being held. He described the race as follows: CPU0 CPU1 ---- ---- user accesses memory using RW PTE [PTE now cached in TLB] try_to_unmap_one() ==> ptep_get_and_clear() ==> set_tlb_ubc_flush_pending() mprotect(addr, PROT_READ) ==> change_pte_range() ==> [ PTE non-present - no flush ] user writes using cached RW PTE ... try_to_unmap_flush() The same type of race exists for reads when protecting for PROT_NONE and also exists for operations that can leave an old TLB entry behind such as munmap, mremap and madvise. For some operations like mprotect, it's not necessarily a data integrity issue but it is a correctness issue as there is a window where an mprotect that limits access still allows access. For munmap, it's potentially a data integrity issue although the race is massive as an munmap, mmap and return to userspace must all complete between the window when reclaim drops the PTL and flushes the TLB. However, it's theoritically possible so handle this issue by flushing the mm if reclaim is potentially currently batching TLB flushes. Other instances where a flush is required for a present pte should be ok as either the page lock is held preventing parallel reclaim or a page reference count is elevated preventing a parallel free leading to corruption. In the case of page_mkclean there isn't an obvious path that userspace could take advantage of without using the operations that are guarded by this patch. Other users such as gup as a race with reclaim looks just at PTEs. huge page variants should be ok as they don't race with reclaim. mincore only looks at PTEs. userfault also should be ok as if a parallel reclaim takes place, it will either fault the page back in or read some of the data before the flush occurs triggering a fault. Note that a variant of this patch was acked by Andy Lutomirski but this was for the x86 parts on top of his PCID work which didn't make the 4.13 merge window as expected. His ack is dropped from this version and there will be a follow-on patch on top of PCID that will include his ack. [akpm@linux-foundation.org: tweak comments] [akpm@linux-foundation.org: fix spello] Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.de Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: <stable@vger.kernel.org> [v4.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02KVM: avoid using rcu_dereference_protectedPaolo Bonzini
During teardown, accesses to memslots and buses are using rcu_dereference_protected with an always-true condition because these accesses are done outside the usual mutexes. This is because the last reference is gone and there cannot be any concurrent modifications, but rcu_dereference_protected is ugly and unobvious. Instead, check the refcount in kvm_get_bus and __kvm_memslots. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) supportInbar Karmy
Currently when WoL is supported but disabled, ethtool reports: "Supports Wake-on: d". Fix the indication of Wol support, so that the indication remains "g" all the time if the NIC supports WoL. Tested: As accepted, when NIC supports WoL- ethtool reports: Supports Wake-on: g Wake-on: d when NIC doesn't support WoL- ethtool reports: Supports Wake-on: d Wake-on: d Fixes: 14c07b1358ed ("mlx4: Wake on LAN support") Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02HID: input: map digitizer battery usageDmitry Torokhov
We already mapped battery strength reports from the generic device control page, but we did not update capacity from input reports, nor we mapped the battery strength report from the digitizer page, so let's implement this now. Batteries driven by the input reports will now start in "unknown" state, and will get updated once we receive first report containing battery strength from the device. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-08-02mtd: nand: Declare tBERS, tR and tPROG as u64 to avoid integer overflowBoris Brezillon
All timings in nand_sdr_timings are expressed in picoseconds but some of them may not fit in an u32. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Fixes: 204e7ecd47e2 ("mtd: nand: Add a few more timings to nand_sdr_timings") Reported-by: Alexander Dahl <ada@thorsis.com> Cc: <stable@vger.kernel.org> Reviewed-by: Alexander Dahl <ada@thorsis.com> Tested-by: Alexander Dahl <ada@thorsis.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2017-08-01PCI: Add pci_reset_function_locked()Marc Zyngier
The implementation of PCI workarounds may require that the device is reset from its probe function. This implies that the PCI device lock is already held, and makes calling pci_reset_function() impossible (since it will itself try to take that lock). Add pci_reset_function_locked(), which is the equivalent of pci_reset_function(), except that it requires the PCI device lock to be already held by the caller. Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> [bhelgaas: folded in fix for conflict with 52354b9d1f46 ("PCI: Remove __pci_dev_reset() and pci_dev_reset()")] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # 4.11: 52354b9d1f46: PCI: Remove __pci_dev_reset() and pci_dev_reset() Cc: stable@vger.kernel.org # 4.11
2017-08-01ptp: introduce ptp auxiliary workerGrygorii Strashko
Many PTP drivers required to perform some asynchronous or periodic work, like periodically handling PHC counter overflow or handle delayed timestamp for RX/TX network packets. In most of the cases, such work is implemented using workqueues. Unfortunately, Kernel workqueues might introduce significant delay in work scheduling under high system load and on -RT, which could cause misbehavior of PTP drivers due to internal counter overflow, for example, and there is no way to tune its execution policy and priority manuallly. Hence, The kthread_worker can be used insted of workqueues, as it create separte named kthread for each worker and its its execution policy and priority can be configured using chrt tool. This prblem was reported for two drivers TI CPSW CPTS and dp83640, so instead of modifying each of these driver it was proposed to add PTP auxiliary worker to the PHC subsystem. The patch adds PTP auxiliary worker in PHC subsystem using kthread_worker and kthread_delayed_work and introduces two new PHC subsystem APIs: - long (*do_aux_work)(struct ptp_clock_info *ptp) callback in ptp_clock_info structure, which driver should assign if it require to perform asynchronous or periodic work. Driver should return the delay of the PTP next auxiliary work scheduling time (>=0) or negative value in case further scheduling is not required. - int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay) which allows schedule PTP auxiliary work. The name of kthread_worker thread corresponds PTP PHC device name "ptp%d". Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01x86/intel_rdt/cqm: Add tasks file supportVikas Shivappa
The root directory, ctrl_mon and monitor groups are populated with a read/write file named "tasks". When read, it shows all the task IDs assigned to the resource group. Tasks can be added to groups by writing the PID to the file. A task can be present in one "ctrl_mon" group "and" one "monitor" group. IOW a PID_x can be seen in a ctrl_mon group and a monitor group at the same time. When a task is added to a ctrl_mon group, it is automatically removed from the previous ctrl_mon group where it belonged. Similarly if a task is moved to a monitor group it is removed from the previous monitor group . Also since the monitor groups can only have subset of tasks of parent ctrl_mon group, a task can be moved to a monitor group only if its already present in the parent ctrl_mon group. Task membership is indicated by a new field in the task_struct "u32 rmid" which holds the RMID for the task. RMID=0 is reserved for the default root group where the tasks belong to at mount. [tony: zero the rmid if rdtgroup was deleted when task was being moved] Signed-off-by: Tony Luck <tony.luck@linux.intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-16-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01x86/intel_rdt: Change closid type from int to u32Vikas Shivappa
OS associates a CLOSid(Class of service id) to a task by writing the high 32 bits of per CPU IA32_PQR_ASSOC MSR when a task is scheduled in. CPUID.(EAX=10H, ECX=1):EDX[15:0] enumerates the max CLOSID supported and it is zero indexed. Hence change the type to u32 from int. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-15-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01x86/intel_rdt: Introduce a common compile option for RDTVikas Shivappa
We currently have a CONFIG_RDT_A which is for RDT(Resource directory technology) allocation based resctrl filesystem interface. As a preparation to add support for RDT monitoring as well into the same resctrl filesystem, change the config option to be CONFIG_RDT which would include both RDT allocation and monitoring code. No functional change. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-4-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01x86/perf/cqm: Wipe out perf based cqmVikas Shivappa
'perf cqm' never worked due to the incompatibility between perf infrastructure and cqm hardware support. The hardware uses RMIDs to track the llc occupancy of tasks and these RMIDs are per package. This makes monitoring a hierarchy like cgroup along with monitoring of tasks separately difficult and several patches sent to lkml to fix them were NACKed. Further more, the following issues in the current perf cqm make it almost unusable: 1. No support to monitor the same group of tasks for which we do allocation using resctrl. 2. It gives random and inaccurate data (mostly 0s) once we run out of RMIDs due to issues in Recycling. 3. Recycling results in inaccuracy of data because we cannot guarantee that the RMID was stolen from a task when it was not pulling data into cache or even when it pulled the least data. Also for monitoring llc_occupancy, if we stop using an RMID_x and then start using an RMID_y after we reclaim an RMID from an other event, we miss accounting all the occupancy that was tagged to RMID_x at a later perf_count. 2. Recycling code makes the monitoring code complex including scheduling because the event can lose RMID any time. Since MBM counters count bandwidth for a period of time by taking snap shot of total bytes at two different times, recycling complicates the way we count MBM in a hierarchy. Also we need a spin lock while we do the processing to account for MBM counter overflow. We also currently use a spin lock in scheduling to prevent the RMID from being taken away. 4. Lack of support when we run different kind of event like task, system-wide and cgroup events together. Data mostly prints 0s. This is also because we can have only one RMID tied to a cpu as defined by the cqm hardware but a perf can at the same time tie multiple events during one sched_in. 5. No support of monitoring a group of tasks. There is partial support for cgroup but it does not work once there is a hierarchy of cgroups or if we want to monitor a task in a cgroup and the cgroup itself. 6. No support for monitoring tasks for the lifetime without perf overhead. 7. It reported the aggregate cache occupancy or memory bandwidth over all sockets. But most cloud and VMM based use cases want to know the individual per-socket usage. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-2-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01NFSv4: Fix EXCHANGE_ID corrupt verifier issueTrond Myklebust
The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was changed to be asynchronous by commit 8d89bd70bc939. If we interrrupt the call to rpc_wait_for_completion_task(), we can therefore end up transmitting random stack contents in lieu of the verifier. Fixes: 8d89bd70bc939 ("NFS setup async exchange_id") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-01libceph: make RECOVERY_DELETES feature create a new intervalIlya Dryomov
This is needed so that the OSDs can regenerate the missing set at the start of a new interval where support for recovery deletes changed. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2017-08-01libceph: fallback for when there isn't a pool-specific choose_argIlya Dryomov
There is now a fallback to a choose_arg index of -1 if there isn't a pool-specific choose_arg set. If you create a per-pool weight-set, that works for that pool. Otherwise we try the compat/default one. If that doesn't exist either, then we use the normal CRUSH weights. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2017-08-01futex: Allow for compiling out PI supportNicolas Pitre
This makes it possible to preserve basic futex support and compile out the PI support when RT mutexes are not available. Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708010024190.5981@knanqh.ubzr
2017-08-01cpufreq: Process remote callbacks from any CPU if the platform permitsViresh Kumar
On many platforms, CPUs can do DVFS across cpufreq policies. i.e CPU from policy-A can change frequency of CPUs belonging to policy-B. This is quite common in case of ARM platforms where we don't configure any per-cpu register. Add a flag to identify such platforms and update cpufreq_can_do_remote_dvfs() to allow remote callbacks if this flag is set. Also enable the flag for cpufreq-dt driver which is used only on ARM platforms currently. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-01sched: cpufreq: Allow remote cpufreq callbacksViresh Kumar
With Android UI and benchmarks the latency of cpufreq response to certain scheduling events can become very critical. Currently, callbacks into cpufreq governors are only made from the scheduler if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run the cpufreq governor for some time. One testcase to show this behavior is where a task starts running on CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the system is configured such that the new tasks should receive maximum demand initially, this should result in CPU0 increasing frequency immediately. But because of the above mentioned limitation though, this does not occur. This patch updates the scheduler core to call the cpufreq callbacks for remote CPUs as well. The schedutil, ondemand and conservative governors are updated to process cpufreq utilization update hooks called for remote CPUs where the remote CPU is managed by the cpufreq policy of the local CPU. The intel_pstate driver is updated to always reject remote callbacks. This is tested with couple of usecases (Android: hackbench, recentfling, galleryfling, vellamo, Ubuntu: hackbench) on ARM hikey board (64 bit octa-core, single policy). Only galleryfling showed minor improvements, while others didn't had much deviation. The reason being that this patch only targets a corner case, where following are required to be true to improve performance and that doesn't happen too often with these tests: - Task is migrated to another CPU. - The task has high demand, and should take the target CPU to higher OPPs. - And the target CPU doesn't call into the cpufreq governor until the next tick. Based on initial work from Steve Muckle. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-01HID: Remove the semaphore driver_lockBinoy Jayan
The semaphore 'driver_lock' is used as a simple mutex, and also unnecessary as suggested by Arnd. Hence removing it, as the concurrency between the probe and remove is already handled in the driver core. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Handle notifier registry failures properly in tun/tap driver, from Tonghao Zhang. 2) Fix bpf verifier handling of subtraction bounds and add a testcase for this, from Edward Cree. 3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt. 4) Fix use after free in prd_retire_rx_blk_timer_exired() in AF_PACKET, from Cong Wang. 5) Fix SElinux regression due to recent UDP optimizations, from Paolo Abeni. 6) We accidently increment IPSTATS_MIB_FRAGFAILS in the ipv6 code paths, fix from Stefano Brivio. 7) Fix some mem leaks in dccp, from Xin Long. 8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd Bergmann. 9) Mac address length check and buffer size fixes from Cong Wang. 10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni. 11) Fix return value when copy_from_user() fails in bpf_prog_get_info_by_fd(), from Daniel Borkmann. 12) Handle PHY_HALTED properly in phy library state machine, from Florian Fainelli. 13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel. 14) Fix truesize calculation in virtio_net which led to performance regressions, from Michael S Tsirkin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits) samples/bpf: fix bpf tunnel cleanup udp6: fix jumbogram reception ppp: Fix a scheduling-while-atomic bug in del_chan Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config" virtio_net: fix truesize for mergeable buffers mv643xx_eth: fix of_irq_to_resource() error check MAINTAINERS: Add more files to the PHY LIBRARY section ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev() net: phy: Correctly process PHY_HALTED in phy_stop_machine() sunhme: fix up GREG_STAT and GREG_IMASK register offsets bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len tcp: avoid bogus gcc-7 array-bounds warning net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt" bpf: don't indicate success when copy_from_user fails udp6: fix socket leak on early demux net: thunderx: Fix BGX transmit stall due to underflow Revert "vhost: cache used event for better performance" team: use a larger struct for mac address net: check dev->addr_len for dev_set_mac_address() phy: bcm-ns-usb3: fix MDIO_BUS dependency ...
2017-07-31udp6: fix jumbogram receptionPaolo Abeni
Since commit 67a51780aebb ("ipv6: udp: leverage scratch area helpers") udp6_recvmsg() read the skb len from the scratch area, to avoid a cache miss. But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their length exceeds the 16 bits available in the scratch area. As a side effect the length returned by recvmsg() is: <ingress datagram len> % (1<<16) This commit addresses the issue allocating one more bit in the IP6CB flags field and setting it for incoming jumbograms. Such field is still in the first cacheline, so at recvmsg() time we can check it and fallback to access skb->len if required, without a measurable overhead. Fixes: 67a51780aebb ("ipv6: udp: leverage scratch area helpers") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31Merge branch 'for-4.13-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "Two notable fixes. - While adding NUMA affinity support to unbound workqueues, the assumption that an unbound workqueue with max_active == 1 is ordered was broken. The plan was to use explicit alloc_ordered_workqueue() for those cases. Unfortunately, I forgot to update the documentation properly and we grew a handful of use cases which depend on that assumption. While we want to convert them to alloc_ordered_workqueue(), we don't really lose anything by enforcing ordered execution on unbound max_active == 1 workqueues and it doesn't make sense to risk subtle bugs. Restore the assumption. - Workqueue assumes that CPU <-> NUMA node mapping remains static. This is a general assumption - we don't have any synchronization mechanism around CPU <-> node mapping. Unfortunately, powerpc may change the mapping dynamically leading to crashes. Michael added a workaround so that we at least don't crash while powerpc hotplug code gets updated" * 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Work around edge cases for calc of pool's cpumask workqueue: implicit ordered attribute should be overridable workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
2017-07-31Merge branch 'for-4.13-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "Dan found a really old bug where libata hotplug code wasn't sanitizing index value from userland and may end up indexing with a negative number. It is scary but fortunately can only be triggered by root. Other than that, minor fixes" * 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: libata: fix a couple of doc build warnings libata: array underflow in ata_find_dev() ata: sata_rcar: add gen[23] fallback compatibility strings libata: remove unused rc in ata_eh_handle_port_resume libata: Cleanup ata_read_log_page() ata: fix gemini Kconfig dependencies
2017-07-31i2c: rephrase explanation of I2C_CLASS_DEPRECATEDWolfram Sang
Hopefully making clear that it is not needed for new drivers. Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-07-31dma-buf/sync_file: Allow multiple sync_files to wrap a single dma-fenceChris Wilson
Up until recently sync_file were create to export a single dma-fence to userspace, and so we could canabalise a bit insie dma-fence to mark whether or not we had enable polling for the sync_file itself. However, with the advent of syncobj, we do allow userspace to create multiple sync_files for a single dma-fence. (Similarly, that the sw-sync validation framework also started returning multiple sync-files wrapping a single dma-fence for a syncpt also triggering the problem.) This patch reverts my suggestion in commit e24165537312 ("dma-buf/sync_file: only enable fence signalling on poll()") to use a single bit in the shared dma-fence and restores the sync_file->flags for tracking the bits individually. Reported-by: Gustavo Padovan <gustavo.padovan@collabora.com> Fixes: f1e8c67123cf ("dma-buf/sw-sync: Use an rbtree to sort fences in the timeline") Fixes: e9083420bbac ("drm: introduce sync objects (v4)") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Sean Paul <seanpaul@chromium.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: dri-devel@lists.freedesktop.org Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.13-rc1+ Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170728212951.7818-1-chris@chris-wilson.co.uk (cherry picked from commit db1fc97ca0c0d3fdeeadf314d99a26188438940a)