summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-02-12ARC: mm: Introduce explicit super page size supportVineet Gupta
MMUv4 supports 2 concurrent page sizes: Normal and Super [4K to 16M] So far Linux supported a single super page size for a given Normal page, depending on the software page walking address split. e.g. we had 11:8:13 address split for 8K page, which meant super page was 2 ^(8+13) = 2M (given that THP size has to be PMD_SHIFT) Now we turn this around, by allowing multiple Super Pages in Kconfig (currently 2M and 16M only) and forcing page walker address split to PGDIR_SHIFT and PAGE_SHIFT For configs without Super page, things are same as before and PGDIR_SHIFT can be hacked to get non default address split The motivation for this change is a customer who needs 16M super page and a 8K Normal page combo. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-02-12ext4: remove unused parameter "newblock" in convert_initialized_extent()Eryu Guan
The "newblock" parameter is not used in convert_initialized_extent(), remove it. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-12ext4: don't read blocks from disk after extents being swappedEryu Guan
I notice ext4/307 fails occasionally on ppc64 host, reporting md5 checksum mismatch after moving data from original file to donor file. The reason is that move_extent_per_page() calls __block_write_begin() and block_commit_write() to write saved data from original inode blocks to donor inode blocks, but __block_write_begin() not only maps buffer heads but also reads block content from disk if the size is not block size aligned. At this time the physical block number in mapped buffer head is pointing to the donor file not the original file, and that results in reading wrong data to page, which get written to disk in following block_commit_write call. This also can be reproduced by the following script on 1k block size ext4 on x86_64 host: mnt=/mnt/ext4 donorfile=$mnt/donor testfile=$mnt/testfile e4compact=~/xfstests/src/e4compact rm -f $donorfile $testfile # reserve space for donor file, written by 0xaa and sync to disk to # avoid EBUSY on EXT4_IOC_MOVE_EXT xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile # create test file written by 0xbb xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile # compute initial md5sum md5sum $testfile | tee md5sum.txt # drop cache, force e4compact to read data from disk echo 3 > /proc/sys/vm/drop_caches # test defrag echo "$testfile" | $e4compact -i -v -f $donorfile # check md5sum md5sum -c md5sum.txt Fix it by creating & mapping buffer heads only but not reading blocks from disk, because all the data in page is guaranteed to be up-to-date in mext_page_mkuptodate(). Cc: stable@vger.kernel.org Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-12ext4: fix potential integer overflowInsu Yun
Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data), integer overflow could be happened. Therefore, need to fix integer overflow sanitization. Cc: stable@vger.kernel.org Signed-off-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-11Merge remote-tracking branch 'mkp-scsi/4.5/scsi-fixes' into fixesJames Bottomley
2016-02-11scsi: fix soft lockup in scsi_remove_target() on module removalJames Bottomley
This softlockup is currently happening: [ 444.088002] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:29] [ 444.088002] Modules linked in: lpfc(-) qla2x00tgt(O) qla2xxx_scst(O) scst_vdisk(O) scsi_transport_fc libcrc32c scst(O) dlm configfs nfsd lockd grace nfs_acl auth_rpcgss sunrpc ed d snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device dm_mod iTCO_wdt snd_hda_codec_realtek snd_hda_codec_generic gpio_ich iTCO_vendor_support ppdev snd_hda_intel snd_hda_codec snd_hda _core snd_hwdep tg3 snd_pcm snd_timer libphy lpc_ich parport_pc ptp acpi_cpufreq snd pps_core fjes parport i2c_i801 ehci_pci tpm_tis tpm sr_mod cdrom soundcore floppy hwmon sg 8250_ fintek pcspkr i915 drm_kms_helper uhci_hcd ehci_hcd drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit usbcore button video usb_common fan ata_generic ata_piix libata th ermal [ 444.088002] CPU: 1 PID: 29 Comm: kworker/1:1 Tainted: G O 4.4.0-rc5-2.g1e923a3-default #1 [ 444.088002] Hardware name: FUJITSU SIEMENS ESPRIMO E /D2164-A1, BIOS 5.00 R1.10.2164.A1 05/08/2006 [ 444.088002] Workqueue: fc_wq_4 fc_rport_final_delete [scsi_transport_fc] [ 444.088002] task: f6266ec0 ti: f6268000 task.ti: f6268000 [ 444.088002] EIP: 0060:[<c07e7044>] EFLAGS: 00000286 CPU: 1 [ 444.088002] EIP is at _raw_spin_unlock_irqrestore+0x14/0x20 [ 444.088002] EAX: 00000286 EBX: f20d3800 ECX: 00000002 EDX: 00000286 [ 444.088002] ESI: f50ba800 EDI: f2146848 EBP: f6269ec8 ESP: f6269ec8 [ 444.088002] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 444.088002] CR0: 8005003b CR2: 08f96600 CR3: 363ae000 CR4: 000006d0 [ 444.088002] Stack: [ 444.088002] f6269eec c066b0f7 00000286 f2146848 f50ba808 f50ba800 f50ba800 f2146a90 [ 444.088002] f2146848 f6269f08 f8f0a4ed f3141000 f2146800 f2146a90 f619fa00 00000040 [ 444.088002] f6269f40 c026cb25 00000001 166c6392 00000061 f6757140 f6136340 00000004 [ 444.088002] Call Trace: [ 444.088002] [<c066b0f7>] scsi_remove_target+0x167/0x1c0 [ 444.088002] [<f8f0a4ed>] fc_rport_final_delete+0x9d/0x1e0 [scsi_transport_fc] [ 444.088002] [<c026cb25>] process_one_work+0x155/0x3e0 [ 444.088002] [<c026cde7>] worker_thread+0x37/0x490 [ 444.088002] [<c027214b>] kthread+0x9b/0xb0 [ 444.088002] [<c07e72c1>] ret_from_kernel_thread+0x21/0x40 What appears to be happening is that something has pinned the target so it can't go into STARGET_DEL via final release and the loop in scsi_remove_target spins endlessly until that happens. The fix for this soft lockup is to not keep looping over a device that we've called remove on but which hasn't gone into DEL state. This patch will retain a simplistic memory of the last target and not keep looping over it. Reported-by: Sebastian Herbszt <herbszt@gmx.de> Tested-by: Sebastian Herbszt <herbszt@gmx.de> Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38 Cc: stable@vger.kernel.org Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2016-02-12ext4: add a line break for proc mb_groups displayHuaitong Han
This patch adds a line break for proc mb_groups display. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-02-11ext4: ioctl: fix erroneous return valueAnton Protopopov
The ext4_ioctl_setflags() function which is used in the ioctls EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR may return the positive value EPERM instead of -EPERM in case of error. This bug was introduced by a recent commit 9b7365fc. The following program can be used to illustrate the wrong behavior: #include <sys/types.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <fcntl.h> #include <err.h> #define FS_IOC_GETFLAGS _IOR('f', 1, long) #define FS_IOC_SETFLAGS _IOW('f', 2, long) #define FS_IMMUTABLE_FL 0x00000010 int main(void) { int fd; long flags; fd = open("file", O_RDWR|O_CREAT, 0600); if (fd < 0) err(1, "open"); if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) err(1, "ioctl: FS_IOC_GETFLAGS"); flags |= FS_IMMUTABLE_FL; if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0) err(1, "ioctl: FS_IOC_SETFLAGS"); warnx("ioctl returned no error"); return 0; } Running it gives the following result: $ strace -e ioctl ./test ioctl(3, FS_IOC_GETFLAGS, 0x7ffdbd8bfd38) = 0 ioctl(3, FS_IOC_SETFLAGS, 0x7ffdbd8bfd38) = 1 test: ioctl returned no error +++ exited with 0 +++ Running the program on a kernel with the bug fixed gives the proper result: $ strace -e ioctl ./test ioctl(3, FS_IOC_GETFLAGS, 0x7ffdd2768258) = 0 ioctl(3, FS_IOC_SETFLAGS, 0x7ffdd2768258) = -1 EPERM (Operation not permitted) test: ioctl: FS_IOC_SETFLAGS: Operation not permitted +++ exited with 1 +++ Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-11ext4: fix scheduling in atomic on group checksum failureJan Kara
When block group checksum is wrong, we call ext4_error() while holding group spinlock from ext4_init_block_bitmap() or ext4_init_inode_bitmap() which results in scheduling while in atomic. Fix the issue by calling ext4_error() later after dropping the spinlock. CC: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-02-11Merge tag 'phy-for-4.5-rc' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy into usb-linus Kishon writes: phy: for 4.5-rc *) Fix error handling code in phy core [phy_power_on()] *) phy-twl4030-usb fixes for unloading the module *) Restrict phy-hi6220-usb to HiSilicon arm64 Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2016-02-11arch/x86/Kconfig: CONFIG_X86_UV should depend on CONFIG_EFIAndrew Morton
arch/x86/built-in.o: In function `uv_bios_call': (.text+0xeba00): undefined reference to `efi_call' Reported-by: kbuild test robot <fengguang.wu@intel.com> Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11mm: fix pfn_t vs highmemDan Williams
The pfn_t type uses an unsigned long to store a pfn + flags value. On a 64-bit platform the upper 12 bits of an unsigned long are never used for storing the value of a pfn. However, this is not true on highmem platforms, all 32-bits of a pfn value are used to address a 44-bit physical address space. A pfn_t needs to store a 64-bit value. Link: https://bugzilla.kernel.org/show_bug.cgi?id=112211 Fixes: 01c8f1c44b83 ("mm, dax, gpu: convert vm_insert_mixed to pfn_t") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Stuart Foster <smf.linux@ntlworld.com> Reported-by: Julian Margetson <runaway@candw.ms> Tested-by: Julian Margetson <runaway@candw.ms> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11kernel/locking/lockdep.c: convert hash tables to hlistsAndrew Morton
Mike said: : CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i. e : kernel with CONFIG_UBSAN_ALIGNMENT fails to load without even any error : message. : : The problem is that ubsan callbacks use spinlocks and might be called : before lockdep is initialized. Particularly this line in the : reserve_ebda_region function causes problem: : : lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); : : If i put lockdep_init() before reserve_ebda_region call in : x86_64_start_reservations kernel loads well. Fix this ordering issue permanently: change lockdep so that it uses hlists for the hash tables. Unlike a list_head, an hlist_head is in its initialized state when it is all-zeroes, so lockdep is ready for operation immediately upon boot - lockdep_init() need not have run. The patch will also save some memory. lockdep_init() and lockdep_initialized can be done away with now - a 4.6 patch has been prepared to do this. Reported-by: Mike Krinkin <krinkin.m.u@gmail.com> Suggested-by: Mike Krinkin <krinkin.m.u@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11mm,thp: fix spellos in describing __HAVE_ARCH_FLUSH_PMD_TLB_RANGEVineet Gupta
[akpm@linux-foundation.org: s/threshhold/threshold/] Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11mm,thp: khugepaged: call pte flush at the time of collapseVineet Gupta
This showed up on ARC when running LMBench bw_mem tests as Overlapping TLB Machine Check Exception triggered due to STLB entry (2M pages) overlapping some NTLB entry (regular 8K page). bw_mem 2m touches a large chunk of vaddr creating NTLB entries. In the interim khugepaged kicks in, collapsing the contiguous ptes into a single pmd. pmdp_collapse_flush()->flush_pmd_tlb_range() is called to flush out NTLB entries for the ptes. This for ARC (by design) can only shootdown STLB entries (for pmd). The stray NTLB entries cause the overlap with the subsequent STLB entry for collapsed page. So make pmdp_collapse_flush() call pte flush interface not pmd flush. Note that originally all thp flush call sites in generic code called flush_tlb_range() leaving it to architecture to implement the flush for pte and/or pmd. Commit 12ebc1581ad11454 changed this by calling a new opt-in API flush_pmd_tlb_range() which made the semantics more explicit but failed to distinguish the pte vs pmd flush in generic code, which is what this patch fixes. Note that ARC can fixed w/o touching the generic pmdp_collapse_flush() by defining a ARC version, but that defeats the purpose of generic version, plus sementically this is the right thing to do. Fixes STAR 9000961194: LMBench on AXS103 triggering duplicate TLB exceptions with super pages Fixes: 12ebc1581ad11454 ("mm,thp: introduce flush_pmd_tlb_range") Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.4] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11mm/backing-dev.c: fix error path in wb_init()Rasmus Villemoes
We need to use post-decrement to get percpu_counter_destroy() called on &wb->stat[0]. Moreover, the pre-decremebt would cause infinite out-of-bounds accesses if the setup code failed at i==0. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11mm, dax: check for pmd_none() after split_huge_pmd()Kirill A. Shutemov
DAX implements split_huge_pmd() by clearing pmd. This simple approach reduces memory overhead, as we don't need to deposit page table on huge page mapping to make split_huge_pmd() never-fail. PTE table can be allocated and populated later on page fault from backing store. But one side effect is that have to check if pmd is pmd_none() after split_huge_pmd(). In most places we do this already to deal with parallel MADV_DONTNEED. But I found two call sites which is not affected by MADV_DONTNEED (due down_write(mmap_sem)), but need to have the check to work with DAX properly. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11vsprintf: kptr_restrict is okay in IRQ when 2Jason A. Donenfeld
The kptr_restrict flag, when set to 1, only prints the kernel address when the user has CAP_SYSLOG. When it is set to 2, the kernel address is always printed as zero. When set to 1, this needs to check whether or not we're in IRQ. However, when set to 2, this check is unneccessary, and produces confusing results in dmesg. Thus, only make sure we're not in IRQ when mode 1 is used, but not mode 2. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11mm: fix filemap.c kernel doc warningRandy Dunlap
Add missing kernel-doc notation for function parameter 'gfp_mask' to fix kernel-doc warning. mm/filemap.c:1898: warning: No description found for parameter 'gfp_mask' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11ubsan: cosmetic fix to Kconfig textYang Shi
When enabling UBSAN_SANITIZE_ALL, the kernel image size gets increased significantly (~3x). So, it sounds better to have some note in Kconfig. And, fixed a typo. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11of/irq: Fix msi-map calculation for nonzero rid-baseRobin Murphy
The existing msi-map code is fine for shifting the entire RID space upwards, but attempting finer-grained remapping reveals a bug. It turns out that we are mistakenly treating the msi-base part as an offset, not as a new base to remap onto, so things get squiffy when rid-base is nonzero. Fix this, and at the same time add a sanity check against having msi-map-mask clash with a nonzero rid-base, as that's another thing one can easily get wrong. CC: <stable@vger.kernel.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Stuart Yoder <stuart.yoder@nxp.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: Rob Herring <robh@kernel.org>
2016-02-11NVMe: Allow request mergesKeith Busch
It is generally more efficient to submit larger IO. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-11NVMe: Fix io incapable return valuesKeith Busch
The function returns true when the controller can't handle IO. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-11blk-mq: End unstarted requests on dying queueKeith Busch
Go directly to ending a request if it wasn't started. Previously, completing a request may invoke a driver callback for a request it didn't initialize. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-11block: Initialize max_dev_sectors to 0Keith Busch
The new queue limit is not used by the majority of block drivers, and should be initialized to 0 for the driver's requested settings to be used. Signed-off-by: Keith Busch <keith.busch@intel.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-11Merge tag 'gpio-v4.5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: - Probe errorpath fix for the Altera - irqchip ofnode pointer added to the DaVinci driver - controller instance number correction for DaVinci * tag 'gpio-v4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: davinci: Fix the number of controllers allocated gpio: davinci: Add the missing of-node pointer gpio: gpio-altera: Remove gpiochip on probe failure.
2016-02-11Merge tag 'platform-drivers-x86-v4.5-3' of ↵Linus Torvalds
git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull x86 platform driver fixes from Darren Hart: "Just two small fixes for the 4.5-rc cycle: intel_scu_ipcutil: - underflow in scu_reg_access() intel-hid: - fix incorrect entries in intel_hid_keymap" * tag 'platform-drivers-x86-v4.5-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: intel_scu_ipcutil: underflow in scu_reg_access() intel-hid: fix incorrect entries in intel_hid_keymap
2016-02-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix BPF handling of branch offset adjustmnets on backjumps, from Daniel Borkmann. 2) Make sure selinux knows about SOCK_DESTROY netlink messages, from Lorenzo Colitti. 3) Fix openvswitch tunnel mtu regression, from David Wragg. 4) Fix ICMP handling of TCP sockets in syn_recv state, from Eric Dumazet. 5) Fix SCTP user hmacid byte ordering bug, from Xin Long. 6) Fix recursive locking in ipv6 addrconf, from Subash Abhinov Kasiviswanathan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: bpf: fix branch offset adjustment on backjumps after patching ctx expansion vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices geneve: Relax MTU constraints vxlan: Relax MTU constraints flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen of: of_mdio: Add marvell, 88e1145 to whitelist of PHY compatibilities. selinux: nlmsgtab: add SOCK_DESTROY to the netlink mapping tables sctp: translate network order to host order when users get a hmacid enic: increment devcmd2 result ring in case of timeout tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs net:Add sysctl_max_skb_frags tcp: do not drop syn_recv on all icmp reports ipv6: fix a lockdep splat unix: correctly track in-flight fds in sending process user_struct update be2net maintainers' email addresses dwc_eth_qos: Reset hardware before PHY start ipv6: addrconf: Fix recursive spin lock call
2016-02-11Merge branch 'net-mitigate-kmem_free-slowpath'David S. Miller
Jesper Dangaard Brouer says: ==================== net: mitigating kmem_cache free slowpath This patchset is the first real use-case for kmem_cache bulk _free_. The use of bulk _alloc_ is NOT included in this patchset. The full use have previously been posted here [1]. The bulk free side have the largest benefit for the network stack use-case, because network stack is hitting the kmem_cache/SLUB slowpath when freeing SKBs, due to the amount of outstanding SKBs. This is solved by using the new API kmem_cache_free_bulk(). Introduce new API napi_consume_skb(), that hides/handles bulk freeing for the caller. The drivers simply need to use this call when freeing SKBs in NAPI context, e.g. replacing their calles to dev_kfree_skb() / dev_consume_skb_any(). Driver ixgbe is the first user of this new API. [1] http://thread.gmane.org/gmane.linux.network/384302/focus=397373 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11ixgbe: bulk free SKBs during TX completion cleanup cycleJesper Dangaard Brouer
There is an opportunity to bulk free SKBs during reclaiming of resources after DMA transmit completes in ixgbe_clean_tx_irq. Thus, bulk freeing at this point does not introduce any added latency. Simply use napi_consume_skb() which were recently introduced. The napi_budget parameter is needed by napi_consume_skb() to detect if it is called from netpoll. Benchmarking IPv4-forwarding, on CPU i7-4790K @4.2GHz (no turbo boost) Single CPU/flow numbers: before: 1982144 pps -> after : 2064446 pps Improvement: +82302 pps, -20 nanosec, +4.1% (SLUB and GCC version 5.1.1 20150618 (Red Hat 5.1.1-4)) Joint work with Alexander Duyck. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: bulk free SKBs that were delay free'ed due to IRQ contextJesper Dangaard Brouer
The network stack defers SKBs free, in-case free happens in IRQ or when IRQs are disabled. This happens in __dev_kfree_skb_irq() that writes SKBs that were free'ed during IRQ to the softirq completion queue (softnet_data.completion_queue). These SKBs are naturally delayed, and cleaned up during NET_TX_SOFTIRQ in function net_tx_action(). Take advantage of this a use the skb defer and flush API, as we are already in softirq context. For modern drivers this rarely happens. Although most drivers do call dev_kfree_skb_any(), which detects the situation and calls __dev_kfree_skb_irq() when needed. This due to netpoll can call from IRQ context. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: bulk free infrastructure for NAPI context, use napi_consume_skbJesper Dangaard Brouer
Discovered that network stack were hitting the kmem_cache/SLUB slowpath when freeing SKBs. Doing bulk free with kmem_cache_free_bulk can speedup this slowpath. NAPI context is a bit special, lets take advantage of that for bulk free'ing SKBs. In NAPI context we are running in softirq, which gives us certain protection. A softirq can run on several CPUs at once. BUT the important part is a softirq will never preempt another softirq running on the same CPU. This gives us the opportunity to access per-cpu variables in softirq context. Extend napi_alloc_cache (before only contained page_frag_cache) to be a struct with a small array based stack for holding SKBs. Introduce a SKB defer and flush API for accessing this. Introduce napi_consume_skb() as replacement for e.g. dev_consume_skb_any() when running in NAPI context. A small trick to handle/detect if we are called from netpoll is to see if budget is 0. In that case, we need to invoke dev_consume_skb_irq(). Joint work with Alexander Duyck. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11Merge branch 'virtio_net-ethtool-validation'David S. Miller
Nikolay Aleksandrov says: ==================== virtio_net: better ethtool setting validation This small set is a follow-up for the recent patches that added ethtool get/set settings. Patch 1 changes the speed validation routine to check if the speed is between 0 and INT_MAX (or SPEED_UNKNOWN) and patch 2 adds port validation to virtio_net and better validation comment. This set is on top of Michael's patch which explains that speeds from 0 to INT_MAX are valid: http://patchwork.ozlabs.org/patch/578911/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11virtio_net: validate ethtool port setting and explain the user validationNikolay Aleksandrov
We should validate the port setting that we got from the user and check if it's what we've set it to (PORT_OTHER), also add explanation that ignoring advertising is good as long as we don't have autonegotiation. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11ethtool: make validate_speed accept all speeds between 0 and INT_MAXNikolay Aleksandrov
Devices these days can have any speed and as was recently pointed out any speed from 0 to INT_MAX is valid so adjust speed validation to accept such values. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11Merge branch 'dp83848-TLK10x'David S. Miller
Andrew F. Davis says: ==================== net: phy: dp83848: Add support for TI TLK10x Ethernet PHYs This series is [0] split into its logical components. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: phy: dp83848: Add comments for register definitionsAndrew F. Davis
Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: phy: dp83848: Add support for TI TLK10x Ethernet PHYsAndrew F. Davis
The TI TLK10x Ethernet PHYs are similar in the interrupt relevant registers and so are compatible with the DP83848x devices already supported. Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: phy: dp83848: Reorganize code for readability and safetyAndrew F. Davis
Reorganize code by moving the desired interrupt mask definition out of function. Also rearrange the enable/disable interrupt function to prevent accidental over-writing of values in registers. Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: phy: dp83848: Add PHY ID for TI version of DP83848CAndrew F. Davis
After acquiring National Semiconductor, TI appears to have changed the Vendor Model Number for the DP83848C PHYs, add this new ID to supported IDs. Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: phy: dp83848: Add macro for dp83848 compatible devicesAndrew F. Davis
Add a helper macro for defining dp83848 compatible phy devices. Update copyright info. Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11be2net: don't report EVB for older chipsets when SR-IOV is disabledIvan Vecera
The EVB (virtual bridge) functionality should be disabled on older BE3 and Lancer chips if SR-IOV is disabled in the NIC's BIOS. This setting is identified by the zero value of total VFs reported by the card. The GET_HSW_CONFIG command cannot be used as it is not supported by these older chipset's FW. v2: added the comment Cc: Sathya Perla <sathya.perla@broadcom.com> Cc: Ajit Khaparde <ajit.khaparde@broadcom.com> Cc: Padmanabh Ratnakar <padmanabh.ratnakar@broadcom.com> Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Cc: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11Merge branch 'spi_ks8995'David S. Miller
Helmut Buchsbaum says: ==================== Add support for MICREL KSZ8795CLX 5-port switch This patch series refactors the spi-ks8995 driver to finally add support for the MICREL KSZ8795CLX. Additionally support for controlling a GPIO line for resetting the switch is added. Helmut Changes since v2: - use GPIO_ACTIVE_LOW according to Andrew's remark. - use ePAPR compliant node name in example, thanks to Sergei for pointing out Changes since v1: - removed initializing registers from Device Tree following Florian's advice - fixed GPIO handling for reset according to Andrew's remark. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11dt-bindings: net: ks8995: add bindings documentation for ks8995Helmut Buchsbaum
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: phy: spi_ks8995: add support for MICREL KSZ8795CLXHelmut Buchsbaum
Add support for MICREL KSZ8795CLX Integrated 5-Port, 10-/100-Managed Ethernet Switch with Gigabit GMII/RGMII and MII/RMII interfaces. Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: phy: spi_ks8995: generalize creation of SPI commandsHelmut Buchsbaum
Prepare creating SPI reads and writes for other switch families. The KS8995 family uses the straight forward <8bit CMD><8bit ADDR> sequence. To be able to support KSZ8795 family, which uses <3bit CMD><12bit ADDR><1 bit TR> make the SPI command creation chip variant dependent. Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: phy: spi_ks8995: add support for resetting switch using GPIOHelmut Buchsbaum
When using device tree it is no more possible to reset the PHY at board level. Furthermore, doing in the driver allows to power down the switch when it is not used any more. The patch introduces a new optional property "reset-gpios" denoting an appropriate GPIO handle, e.g.: reset-gpios = <&gpio0 46 GPIO_ACTIVE_LOW> Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: phy: spi_ks8995: verify chip and determine revisionHelmut Buchsbaum
Since the chip variant is now determined by spi_device_id, verify family and chip id and determine the revision id. Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: phy: spi_ks8995: introduce spi_device_id tableHelmut Buchsbaum
Refactor to use spi_device_id table to facilitate easy extendability. Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11Merge branch 'thunderx-irq-hints'David S. Miller
Sunil Goutham says: ==================== net: thunderx: Setting IRQ affinity hints and other optimizations This patch series contains changes - To add support for virtual function's irq affinity hint - Replace napi_schedule() with napi_schedule_irqoff() - Reduce page allocation overhead by allocating pages of higher order when pagesize is 4KB. - Add couple of stats which helps in debugging - Some miscellaneous changes to BGX driver. Changes from v1: - As suggested changed MAC address invalid log message to dev_err() instead of dev_warn(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>