linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2016-01-05	net/mlx5e: Do not modify the TX SKB	Achiad Shochat
	If the SKB is cloned, or has an elevated users count, someone else can be looking at it at the same time. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05	Merge remote-tracking branches 'regmap/topic/mmio', 'regmap/topic/rbtree' ↵	Mark Brown
	and 'regmap/topic/seq' into regmap-next
2016-01-05	Merge remote-tracking branches 'regmap/topic/64bit' and ↵	Mark Brown
	'regmap/topic/irq-type' into regmap-next
2016-01-05	Merge remote-tracking branch 'regmap/topic/core' into regmap-next	Mark Brown

2016-01-05	Merge remote-tracking branch 'regmap/topic/cache' into regmap-next	Mark Brown

2016-01-05	regmap: debugfs: Use seq_file for the access map	Mark Brown
	Unlike the registers file we don't have any substantial performance concerns rendering the entire file (it involves no device accesses) so just use seq_printf() to simplify the code. Signed-off-by: Mark Brown <broonie@kernel.org>
2016-01-05	regmap: irq: add support for configuration of trigger type	Laxman Dewangan
	Some of devices supports the trigger level for interrupt like rising/falling edge specially for GPIOs. The interrupt support of such devices may have uses the generic regmap irq framework for implementation. Add support to configure the trigger type device interrupt register via regmap-irq framework. The regmap-irq framework configures the trigger register only if the details of trigger type registers are provided. [Fixed use of terery operator for legibility -- broonie] Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-01-05	Merge branch 'sctp-transport-rhashtable'	David S. Miller
	Xin Long says: ==================== sctp: use transport hashtable to replace association's with rhashtable for telecom center, the usual case is that a server is connected by thousands of clients. but if the server with only one enpoint(udp style) use the same sport and dport to communicate with every clients, and every assoc in server will be hashed in the same chain of global assoc hashtable due to currently we choose dport and sport as the hash key. when a packet is received, sctp_rcv try to find the assoc with sport and dport, since that chain is too long to find it fast, it make the performance turn to very low, some test data is as follow: in server: $./ss [start a udp style server there] in client: $./cc [start 2500 sockets to connect server with same port and different ip, and use one of them to send data to server] ===== test on net-next -- perf top server: 55.73% [kernel] [k] sctp_assoc_is_match 6.80% [kernel] [k] sctp_assoc_lookup_paddr 4.81% [kernel] [k] sctp_v4_cmp_addr 3.12% [kernel] [k] _raw_spin_unlock_irqrestore 1.94% [kernel] [k] sctp_cmp_addr_exact client: 46.01% [kernel] [k] sctp_endpoint_lookup_assoc 5.55% libc-2.17.so [.] __libc_calloc 5.39% libc-2.17.so [.] _int_free 3.92% libc-2.17.so [.] _int_malloc 3.23% [kernel] [k] __memset -- spent time time is 487s, send pkt is 10000000 we need to change the way to calculate the hash key, to use lport + rport + paddr as the hash key can avoid this issue. besides, this patchset will use transport hashtable to replace association hashtable to lookup with rhashtable api. get transport first then get association by t->asoc. and also it will make tcp style work better. ===== test with this patchset: -- perf top server: 15.98% [kernel] [k] _raw_spin_unlock_irqrestore 9.92% [kernel] [k] __pv_queued_spin_lock_slowpath 7.22% [kernel] [k] copy_user_generic_string 2.38% libpthread-2.17.so [.] __recvmsg_nocancel 1.88% [kernel] [k] sctp_recvmsg client: 11.90% [kernel] [k] sctp_hash_cmp 8.52% [kernel] [k] rht_deferred_worker 4.94% [kernel] [k] __pv_queued_spin_lock_slowpath 3.95% [kernel] [k] sctp_bind_addr_match 2.49% [kernel] [k] __memset -- spent time time is 22s, send pkt is 10000000 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05	sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc	Xin Long
	sctp_endpoint_lookup_assoc is called in the protection of sock lock there is no need to call local_bh_disable in this function. so remove them. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05	sctp: drop the old assoc hashtable of sctp	Xin Long
	transport hashtable will replace the association hashtable, so association hashtable is not used in sctp any more, so drop the codes about that. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05	sctp: apply rhashtable api to sctp procfs	Xin Long
	Traversal the transport rhashtable, get the association only once through the condition assoc->peer.primary_path != transport. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05	sctp: apply rhashtable api to send/recv path	Xin Long
	apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc and __sctp_lookup_association, it's invoked in the protection of sock lock, it will be safe, but sctp_lookup_association need to call rcu_read_lock() and to detect the t->dead to protect it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05	sctp: add the rhashtable apis for sctp global transport hashtable	Xin Long
	tranport hashtbale will replace the association hashtable to do the lookup for transport, and then get association by t->assoc, rhashtable apis will be used because of it's resizable, scalable and using rcu. lport + rport + paddr will be the base hashkey to locate the chain, with net to protect one netns from another, then plus the laddr to compare to get the target. this patch will provider the lookup functions: - sctp_epaddr_lookup_transport - sctp_addrs_lookup_transport hash/unhash functions: - sctp_hash_transport - sctp_unhash_transport init/destroy functions: - sctp_transport_hashtable_init - sctp_transport_hashtable_destroy Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05	mmc: dw_mmc: remove the unused quirks	Jaehoon Chung
	Removed the unused quirks. These quirks don't used anywhere. Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-01-05	mmc: sdhci-pci: use to_pci_dev()	Geliang Tang
	Use to_pci_dev() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-01-05	mmc: cb710: use to_platform_device()	Geliang Tang
	Use to_platform_device() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-01-05	mmc: tegra: use correct accessor for misc ctrl register	Lucas Stach
	The misc control register is 32bit wide, the used readw/writew accessors only mainipulate the low 16bit of this register. It currently doesn't matter as all the bit changed are located in the lower half, but together with the u32 variable used to hold the contents of the register it is seriously confusing. Switch to 32bit accessors to avoid any future breakage. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-01-05	mmc: tegra: enable UHS-I modes	Lucas Stach
	Keep the quirk bits, as Tegra30 and Tegra114 host have different levels of support for UHS-I modes and so need different spare bits to be set, but change the logic to be positive. Tegra210 needs a different tuning sequence than Tegra30+. Disable UHS modes until support for this is properly added. Signed-off-by: Lucas Stach <dev@lynxeye.de> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-01-05	Bluetooth: Add support for Start Limited Discovery command	Johan Hedberg
	This patch implements the mgmt Start Limited Discovery command. Most of existing Start Discovery code is reused since the only difference is the presence of a 'limited' flag as part of the discovery state. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-01-05	Bluetooth: Change eir_has_data_type() to more generic eir_get_data()	Johan Hedberg
	To make the EIR parsing helper more general purpose, make it return the found data and its length rather than just saying whether the data was present or not. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-01-05	arm64: mm: move pgd_cache initialisation to pgtable_cache_init	Will Deacon
	Initialising the suppport for EFI runtime services requires us to allocate a pgd off the back of an early_initcall. On systems where the PGD_SIZE is smaller than PAGE_SIZE (e.g. 64k pages and 48-bit VA), the pgd_cache isn't initialised at this stage, and we panic with a NULL dereference during boot: Unable to handle kernel NULL pointer dereference at virtual address 00000000 __create_mapping.isra.5+0x84/0x350 create_pgd_mapping+0x20/0x28 efi_create_mapping+0x5c/0x6c arm_enable_runtime_services+0x154/0x1e4 do_one_initcall+0x8c/0x190 kernel_init_freeable+0x84/0x1ec kernel_init+0x10/0xe0 ret_from_fork+0x10/0x50 This patch fixes the problem by initialising the pgd_cache earlier, in the pgtable_cache_init callback, which sounds suspiciously like what it was intended for. Reported-by: Dennis Chen <dennis.chen@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-01-05	tile: provide CONFIG_PAGE_SIZE_64KB etc for tilepro	Chris Metcalf
	This allows the build system to know that it can't attempt to configure the Lustre virtual block device, for example, when tilepro is using 64KB pages (as it does by default). The tilegx build already provided those symbols. Previously we required that the tilepro hypervisor be rebuilt with a different hardcoded page size in its headers, and then Linux be rebuilt using the updated hypervisor header. Now we allow each of the hypervisor and Linux to be built independently. We still check at boot time to ensure that the page size provided by the hypervisor matches what Linux expects. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: stable@vger.kernel.org [3.19+]
2016-01-05	pinctrl: lantiq: 2 pins have the wrong mux list	John Crispin
	The latest vendor SDK contained this patch. Signed-off-by: John Crispin <blogic@openwrt.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-01-05	ASoC: Intel: Skylake: Fix the memory leak	Vinod Koul
	This provide the fix for firmware memory by freeing the pointer in driver remove where it is safe to do so Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-01-05	ASoC: Intel: Skylake: Revert previous broken fix memory leak fix	Vinod Koul
	This reverts commit 87b5ed8ecb9fe05a696e1c0b53c7a49ea66432c1 ("ASoC: Intel: Skylake: fix memory leak") as it causes regression on Skylake devices The SKL drivers can be deferred probe. The topology file based widgets can have references to topology file so this can't be freed until card is fully created, so revert this patch for now [ 66.682767] BUG: unable to handle kernel paging request at ffffc900001363fc [ 66.690735] IP: [<ffffffff806c94dd>] strnlen+0xd/0x40 [ 66.696509] PGD 16e035067 PUD 16e036067 PMD 16e038067 PTE 0 [ 66.702925] Oops: 0000 [#1] PREEMPT SMP [ 66.768390] CPU: 3 PID: 57 Comm: kworker/u16:3 Tainted: G O 4.4.0-rc7-skl #62 [ 66.778869] Hardware name: Intel Corporation Skylake Client platform [ 66.793201] Workqueue: deferwq deferred_probe_work_func [ 66.799173] task: ffff88008b700f40 ti: ffff88008b704000 task.ti: ffff88008b704000 [ 66.807692] RIP: 0010:[<ffffffff806c94dd>] [<ffffffff806c94dd>] strnlen+0xd/0x40 [ 66.816243] RSP: 0018:ffff88008b707878 EFLAGS: 00010286 [ 66.822293] RAX: ffffffff80e60a82 RBX: 000000000000000e RCX: fffffffffffffffe [ 66.830406] RDX: ffffc900001363fc RSI: ffffffffffffffff RDI: ffffc900001363fc [ 66.838520] RBP: ffff88008b707878 R08: 000000000000ffff R09: 000000000000ffff [ 66.846649] R10: 0000000000000001 R11: ffffffffa01c6368 R12: ffffc900001363fc [ 66.854765] R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000000 [ 66.862910] FS: 0000000000000000(0000) GS:ffff88016ecc0000(0000) knlGS:0000000000000000 [ 66.872150] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 66.878696] CR2: ffffc900001363fc CR3: 0000000002c09000 CR4: 00000000003406e0 [ 66.886820] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 66.894938] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 66.903052] Stack: [ 66.905346] ffff88008b7078b0 ffffffff806cb1db 000000000000000e 0000000000000000 [ 66.913854] ffff88008b707928 ffffffffa00d1050 ffffffffa00d104e ffff88008b707918 [ 66.922353] ffffffff806ccbd6 ffff88008b707948 0000000000000046 ffff88008b707940 [ 66.930855] Call Trace: [ 66.933646] [<ffffffff806cb1db>] string.isra.4+0x3b/0xd0 [ 66.939793] [<ffffffff806ccbd6>] vsnprintf+0x116/0x540 [ 66.945742] [<ffffffff806d02f0>] kvasprintf+0x40/0x80 [ 66.951591] [<ffffffff806d0370>] kasprintf+0x40/0x50 [ 66.957359] [<ffffffffa00c085f>] dapm_create_or_share_kcontrol+0x1cf/0x300 [snd_soc_core] [ 66.966771] [<ffffffff8057dd1e>] ? __kmalloc+0x16e/0x2a0 [ 66.972931] [<ffffffffa00c0dab>] snd_soc_dapm_new_widgets+0x41b/0x4b0 [snd_soc_core] [ 66.981857] [<ffffffffa00be8c0>] ? snd_soc_dapm_add_routes+0xb0/0xd0 [snd_soc_core] [ 67.007828] [<ffffffffa00b92ed>] soc_probe_component+0x23d/0x360 [snd_soc_core] [ 67.016244] [<ffffffff80b14e69>] ? mutex_unlock+0x9/0x10 [ 67.022405] [<ffffffffa00ba02f>] snd_soc_instantiate_card+0x47f/0xd10 [snd_soc_core] [ 67.031329] [<ffffffff8049eeb2>] ? debug_mutex_init+0x32/0x40 [ 67.037973] [<ffffffffa00baa92>] snd_soc_register_card+0x1d2/0x2b0 [snd_soc_core] [ 67.046619] [<ffffffffa00c8b54>] devm_snd_soc_register_card+0x44/0x80 [snd_soc_core] [ 67.055539] [<ffffffffa01c303b>] skylake_audio_probe+0x1b/0x20 [snd_soc_skl_rt286] [ 67.064292] [<ffffffff808aa887>] platform_drv_probe+0x37/0x90 Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-01-05	arm64: module: avoid undefined shift behavior in reloc_data()	Ard Biesheuvel
	Compilers may engage the improbability drive when encountering shifts by a distance that is a multiple of the size of the operand type. Since the required bounds check is very simple here, we can get rid of all the fuzzy masking, shifting and comparing, and use the documented bounds directly. Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-01-05	arm64: module: fix relocation of movz instruction with negative immediate	Ard Biesheuvel
	The test whether a movz instruction with a signed immediate should be turned into a movn instruction (i.e., when the immediate is negative) is flawed, since the value of imm is always positive. Also, the subsequent bounds check is incorrect since the limit update never executes, due to the fact that the imm_type comparison will always be false for negative signed immediates. Let's fix this by performing the sign test on sval directly, and replacing the bounds check with a simple comparison against U16_MAX. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [will: tidied up use of sval, renamed MOVK enum value to MOVKZ] Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-01-05	Merge branches 'misc' and 'misc-rc6' into for-linus	Russell King

2016-01-05	configfs: add myself as co-maintainer, updated git tree	Christoph Hellwig
	Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2016-01-05	x86/mm/pat: Change free_memtype() to support shrinking case	Toshi Kani
	Using mremap() to shrink the map size of a VM_PFNMAP range causes the following error message, and leaves the pfn range allocated. x86/PAT: test:3493 freeing invalid memtype [mem 0x483200000-0x4863fffff] This is because rbt_memtype_erase(), called from free_memtype() with spin_lock held, only supports to free a whole memtype node in memtype_rbroot. Therefore, this patch changes rbt_memtype_erase() to support a request that shrinks the size of a memtype node for mremap(). memtype_rb_exact_match() is renamed to memtype_rb_match(), and is enhanced to support EXACT_MATCH and END_MATCH in @match_type. Since the memtype_rbroot tree allows overlapping ranges, rbt_memtype_erase() checks with EXACT_MATCH first, i.e. free a whole node for the munmap case. If no such entry is found, it then checks with END_MATCH, i.e. shrink the size of a node from the end for the mremap case. On the mremap case, rbt_memtype_erase() proceeds in two steps, 1) remove the node, and then 2) insert the updated node. This allows proper update of augmented values, subtree_max_end, in the tree. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: stsp@list.ru Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1450832064-10093-3-git-send-email-toshi.kani@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-05	x86/mm/pat: Add untrack_pfn_moved for mremap	Toshi Kani
	mremap() with MREMAP_FIXED on a VM_PFNMAP range causes the following WARN_ON_ONCE() message in untrack_pfn(). WARNING: CPU: 1 PID: 3493 at arch/x86/mm/pat.c:985 untrack_pfn+0xbd/0xd0() Call Trace: [<ffffffff817729ea>] dump_stack+0x45/0x57 [<ffffffff8109e4b6>] warn_slowpath_common+0x86/0xc0 [<ffffffff8109e5ea>] warn_slowpath_null+0x1a/0x20 [<ffffffff8106a88d>] untrack_pfn+0xbd/0xd0 [<ffffffff811d2d5e>] unmap_single_vma+0x80e/0x860 [<ffffffff811d3725>] unmap_vmas+0x55/0xb0 [<ffffffff811d916c>] unmap_region+0xac/0x120 [<ffffffff811db86a>] do_munmap+0x28a/0x460 [<ffffffff811dec33>] move_vma+0x1b3/0x2e0 [<ffffffff811df113>] SyS_mremap+0x3b3/0x510 [<ffffffff817793ee>] entry_SYSCALL_64_fastpath+0x12/0x71 MREMAP_FIXED moves a pfnmap from old vma to new vma. untrack_pfn() is called with the old vma after its pfnmap page table has been removed, which causes follow_phys() to fail. The new vma has a new pfnmap to the same pfn & cache type with VM_PAT set. Therefore, we only need to clear VM_PAT from the old vma in this case. Add untrack_pfn_moved(), which clears VM_PAT from a given old vma. move_vma() is changed to call this function with the old vma when VM_PFNMAP is set. move_vma() then calls do_munmap(), and untrack_pfn() is a no-op since VM_PAT is cleared. Reported-by: Stas Sergeev <stsp@list.ru> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1450832064-10093-2-git-send-email-toshi.kani@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-04	af_unix: Fix splice-bind deadlock	Rainer Weikusat
	On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice system call and AF_UNIX sockets, http://lists.openwall.net/netdev/2015/11/06/24 The situation was analyzed as (a while ago) A: socketpair() B: splice() from a pipe to /mnt/regular_file does sb_start_write() on /mnt C: try to freeze /mnt wait for B to finish with /mnt A: bind() try to bind our socket to /mnt/new_socket_name lock our socket, see it not bound yet decide that it needs to create something in /mnt try to do sb_start_write() on /mnt, block (it's waiting for C). D: splice() from the same pipe to our socket lock the pipe, see that socket is connected try to lock the socket, block waiting for A B: get around to actually feeding a chunk from pipe to file, try to lock the pipe. Deadlock. on 2015/11/10 by Al Viro, http://lists.openwall.net/netdev/2015/11/10/4 The patch fixes this by removing the kern_path_create related code from unix_mknod and executing it as part of unix_bind prior acquiring the readlock of the socket in question. This means that A (as used above) will sb_start_write on /mnt before it acquires the readlock, hence, it won't indirectly block B which first did a sb_start_write and then waited for a thread trying to acquire the readlock. Consequently, A being blocked by C waiting for B won't cause a deadlock anymore (effectively, both A and B acquire two locks in opposite order in the situation described above). Dmitry Vyukov(<dvyukov@google.com>) tested the original patch. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	net: Propagate lookup failure in l3mdev_get_saddr to caller	David Ahern
	Commands run in a vrf context are not failing as expected on a route lookup: root@kenny:~# ip ro ls table vrf-red unreachable default root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254 ping: Warning: source address might be selected on device other than vrf-red. PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data. --- 10.100.1.254 ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 999ms Since the vrf table does not have a route for 10.100.1.254 the ping should have failed. The saddr lookup causes a full VRF table lookup. Propogating a lookup failure to the user allows the command to fail as expected: root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254 connect: No route to host Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	Merge branch 'faster-soreuseport'	David S. Miller
	Craig Gallek says: ==================== Faster SO_REUSEPORT This series contains two optimizations for the SO_REUSEPORT feature: Faster lookup when selecting a socket for an incoming packet and the ability to select the socket from the group using a BPF program. This series only includes the UDP path. I plan to submit a follow-up including the TCP path if the implementation in this series is acceptable. Changes in v4: - pskb_may_pull is unnecessary with pskb_pull (per Alexei Starovoitov) Changes in v3: - skb_pull_inline -> pskb_pull (per Alexei Starovoitov) - reuseport_attach* -> sk_reuseport_attach* and simple return statement syntax change (per Daniel Borkmann) Changes in v2: - Fix ARM build; remove unnecessary include. - Handle case where protocol header is not in linear section (per Alexei Starovoitov). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	soreuseport: BPF selection functional test	Craig Gallek
	This program will build classic and extended BPF programs and validate the socket selection logic when used with SO_ATTACH_REUSEPORT_CBPF and SO_ATTACH_REUSEPORT_EBPF. It also validates the re-programing flow and several edge cases. Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF	Craig Gallek
	Expose socket options for setting a classic or extended BPF program for use when selecting sockets in an SO_REUSEPORT group. These options can be used on the first socket to belong to a group before bind or on any socket in the group after bind. This change includes refactoring of the existing sk_filter code to allow reuse of the existing BPF filter validation checks. Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	soreuseport: fast reuseport UDP socket selection	Craig Gallek
	Include a struct sock_reuseport instance when a UDP socket binds to a specific address for the first time with the reuseport flag set. When selecting a socket for an incoming UDP packet, use the information available in sock_reuseport if present. This required adding an additional field to the UDP source address equality function to differentiate between exact and wildcard matches. The original use case allowed wildcard matches when checking for existing port uses during bind. The new use case of adding a socket to a reuseport group requires exact address matching. Performance test (using a machine with 2 CPU sockets and a total of 48 cores): Create reuseport groups of varying size. Use one socket from this group per user thread (pinning each thread to a different core) calling recvmmsg in a tight loop. Record number of messages received per second while saturating a 10G link. 10 sockets: 18% increase (~2.8M -> 3.3M pkts/s) 20 sockets: 14% increase (~2.9M -> 3.3M pkts/s) 40 sockets: 13% increase (~3.0M -> 3.4M pkts/s) This work is based off a similar implementation written by Ying Cai <ycai@google.com> for implementing policy-based reuseport selection. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	soreuseport: define reuseport groups	Craig Gallek
	struct sock_reuseport is an optional shared structure referenced by each socket belonging to a reuseport group. When a socket is bound to an address/port not yet in use and the reuseport flag has been set, the structure will be allocated and attached to the newly bound socket. When subsequent calls to bind are made for the same address/port, the shared structure will be updated to include the new socket and the newly bound socket will reference the group structure. Usually, when an incoming packet was destined for a reuseport group, all sockets in the same group needed to be considered before a dispatching decision was made. With this structure, an appropriate socket can be found after looking up just one socket in the group. This shared structure will also allow for more complicated decisions to be made when selecting a socket (eg a BPF filter). This work is based off a similar implementation written by Ying Cai <ycai@google.com> for implementing policy-based reuseport selection. Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	Merge branch 'mlxsw-fixes'	David S. Miller
	Jiri Pirko says: ==================== mlxsw: couple of fixes Couple of fixes from Ido. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	mlxsw: spectrum: Change bridge port attributes only when bridged	Ido Schimmel
	Bridge port attributes are offloaded to hardware when invoked with SELF flag set, but it really makes no sense to reflect them when port is not bridged. Allow a user to change these attribute only when port is bridged and initialize them correctly when joining or leaving a bridge. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	mlxsw: spectrum: Set bridge status in appropriate functions	Ido Schimmel
	Set the bridge status of physical ports in the appropriate functions, to be consistent with LAG join/leave and vPorts joining/leaving bridge. Also, remove the error messages in these two functions, as we already emit errors in both the single functions they call. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	mlxsw: spectrum: Return NOTIFY_BAD on bridge failure	Ido Schimmel
	It is possible for us to fail when joining or leaving a bridge, so let the user know about that by returning NOTIFY_BAD, as already done for LAG join/leave and 802.1D bridges. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	mlxsw: spectrum: Initialize PVID only once	Ido Schimmel
	We set PVID to 1 in mlxsw_sp_port_vlan_init(), so we can remove this statement. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	r8152: add reset_resume function	hayeswang
	When the reset_resume() is called, the flag of SELECTIVE_SUSPEND should be cleared and reinitialize the device, whether the SELECTIVE_SUSPEND is set or not. If reset_resume() is called, it means the power supply is cut or the device is reset. That is, the device wouldn't be in runtime suspend state and the reinitialization is necessary. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	chelsio: constify cphy_ops structures	Julia Lawall
	The cphy_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	fsl/fman: allow modular build	Arnd Bergmann
	ARM allmodconfig fails because of the addition of the FMAN driver: drivers/built-in.o: In function `dtsec_restart_autoneg': binder.c:(.text+0x173328): undefined reference to `mdiobus_read' binder.c:(.text+0x173348): undefined reference to `mdiobus_write' drivers/built-in.o: In function `dtsec_config': binder.c:(.text+0x173d24): undefined reference to `of_phy_find_device' drivers/built-in.o: In function `init_phy': binder.c:(.text+0x1763b0): undefined reference to `of_phy_connect' drivers/built-in.o: In function `stop': binder.c:(.text+0x176014): undefined reference to `phy_stop' drivers/built-in.o: In function `start': binder.c:(.text+0x176078): undefined reference to `phy_start' The reason is that the driver uses PHYLIB, but that is a loadable module here, and fman itself is built-in. This patch makes it possible to configure fman as a module as well so we don't change the status of PHYLIB in an allmodconfig kernel, and it adds a 'select PHYLIB' statement to ensure that phylib is always built-in when fman is. The driver uses "builtin_platform_driver(fman_driver);", which means it cannot be unloaded, but it's still possible to have it as a loadable module that gets loaded once and never removed. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 5adae51a64b8 ("fsl/fman: Add FMan MURAM support") Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	net: make ip6tunnel_xmit definition conditional	Arnd Bergmann
	Moving the caller of iptunnel_xmit_stats causes a build error in randconfig builds that disable CONFIG_INET: In file included from ../net/xfrm/xfrm_input.c:17:0: ../include/net/ip6_tunnel.h: In function 'ip6tunnel_xmit': ../include/net/ip6_tunnel.h:93:2: error: implicit declaration of function 'iptunnel_xmit_stats' [-Werror=implicit-function-declaration] iptunnel_xmit_stats(dev, pkt_len); The reason is that the iptunnel_xmit_stats definition is hidden inside #ifdef CONFIG_INET but the caller is not. We can change one or the other to fix it, and this patch adds a second #ifdef around ip6tunnel_xmit() to avoid seeing the invalid call. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()") Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	Merge tag 'nfc-next-4.5-1' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.5 pull request This is the first NFC pull request for 4.5 and it brings: - A new driver for the STMicroelectronics ST95HF NFC chipset. The ST95HF is an NFC digital transceiver with an embedded analog front-end and as such relies on the Linux NFC digital implementation. This is the 3rd user of the NFC digital stack. - ACPI support for the ST st-nci and st21nfca drivers. - A small improvement for the nfcsim driver, as we can now tune the Rx delay through sysfs. - A bunch of minor cleanups and small fixes from Christophe Ricard, for a few drivers and the NFC core code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	connector: bump skb->users before callback invocation	Florian Westphal
	Dmitry reports memleak with syskaller program. Problem is that connector bumps skb usecount but might not invoke callback. So move skb_get to where we invoke the callback. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04	udp: properly support MSG_PEEK with truncated buffers	Eric Dumazet
	Backport of this upstream commit into stable kernels : 89c22d8c3b27 ("net: Fix skb csum races when peeking") exposed a bug in udp stack vs MSG_PEEK support, when user provides a buffer smaller than skb payload. In this case, skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov); returns -EFAULT. This bug does not happen in upstream kernels since Al Viro did a great job to replace this into : skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg); This variant is safe vs short buffers. For the time being, instead reverting Herbert Xu patch and add back skb->ip_summed invalid changes, simply store the result of udp_lib_checksum_complete() so that we avoid computing the checksum a second time, and avoid the problematic skb_copy_and_csum_datagram_iovec() call. This patch can be applied on recent kernels as it avoids a double checksumming, then backported to stable kernels as a bug fix. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>