git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2016-05-05	Merge tag 'iio-fixes-for-4.6d' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus Jonathan writes: Fourth set of IIO fixes for the 4.6 cycle. This last minute set is concerned with a regression in the mpu6050 driver. The regression causes a null pointer dereference on any ACPI device that has one of these present such as the ASUS T100TA Baytrail/T. The issue was known but thought (i.e. missunderstood by me) to only be a possible with no reports, so was routed via the normal merge window. Turns out this was wrong (thanks to Alan for reporting the crash). The pull is just for the null dereference fix and a followup fix that also stops the reported name of the device being NULL. * mpu6050 - Fix a 'possible' NULL dereference introduced as part of splitting the driver to allow both i2c and spi to be supported. The issue affects ACPI systems with this device. - Fix a follow up issue where the name and chip id both get set to null if the device driver instance is instantiated from ACPI tables.
2016-05-05	Merge tag 'fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "Here are a couple last-minute fixes for ARM SoCs. Most of them are for the OMAP platforms, the rest are all for different platforms. OMAP: All dts fixes, mostly affecting voltages and pinctrl for various device drivers: - Regulator minimum voltage fixes for omap5 - ISP syscon register offset fix for omap3 - Fix regulator initial modes for n900 - Fix omap5 pinctrl wkup instance size Allwinner: Remove incorrect constraints from a dcdc1 regulator Alltera SoCFPGA: Fix compilation in thumb2 mode Samsung exynos: Fix a potential oops in the pm-domain error handling Davinci: Avoid a link error if NVMEM is disabled Renesas: Do not mark an external uart clock as disabled, to allow probing the uarts" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: davinci: only use NVMEM when available ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel ARM: dts: omap5: fix range of permitted wakeup pinmux registers ARM: dts: omap3-n900: Specify peripherals LDO regulators initial mode ARM: dts: omap3: Fix ISP syscon register offset ARM: dts: omap5-cm-t54: fix ldo1_reg and ldo4_reg ranges ARM: dts: omap5-board-common: fix ldo1_reg and ldo4_reg ranges arm64: dts: r8a7795: Don't disable referenced optional scif clock ARM: EXYNOS: Properly skip unitialized parent clock in power domain on ARM: dts: sun8i-q8-common: Do not set constraints on dc1sw regulator
2016-05-05	maintainers: update rmk's email address(es)	Russell King
	Update my email and web addresses in the kernel maintainers file. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-05	writeback: Fix performance regression in wb_over_bg_thresh()	Howard Cochran
	Commit 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations") unintentionally changed this function's meaning from "are there more dirty pages than the background writeback threshold" to "are there more dirty pages than the writeback threshold". The background writeback threshold is typically half of the writeback threshold, so this had the effect of raising the number of dirty pages required to cause a writeback worker to perform background writeout. This can cause a very severe performance regression when a BDI uses BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker can now disagree on whether writeback should be initiated. For example, in a system having 1GB of RAM, a single spinning disk, and a "pass-through" FUSE filesystem mounted over the disk, application code mmapped a 128MB file on the disk and was randomly dirtying pages in that mapping. Because FUSE uses strictlimit and has a default max_ratio of only 1%, in balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the dirty_freerun_ceiling is the average of those, ~150. So, it pauses the dirtying processes when we have 151 dirty pages and wakes up a background writeback worker. But the worker tests the wrong threshold (200 instead of 100), so it does not initiate writeback and just returns. Thus, balance_dirty_pages keeps looping, sleeping and then waking up the worker who will do nothing. It remains stuck in this state until the few dirty pages that we have finally expire and we write them back for that reason. Then the whole process repeats, resulting in near-zero throughput through the FUSE BDI. The fix is to call the parameterized variant of wb_calc_thresh, so that the worker will do writeback if the bg_thresh is exceeded which was the behavior before the referenced commit. Fixes: 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations") Signed-off-by: Howard Cochran <hcochran@kernelspring.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # v4.2+ Tested-by Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-05	dm ioctl: drop use of __GFP_REPEAT in copy_params()'s __vmalloc() call	Michal Hocko
	copy_params()'s use of __GFP_REPEAT for the __vmalloc() call doesn't make much sense because vmalloc doesn't rely on costly high order allocations. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05	dm stats: fix spelling mistake in Documentation	Eric Engestrom
	Signed-off-by: Eric Engestrom <eric@engestrom.ch> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05	dm cache: update cache-policies.txt now that mq is an alias for smq	Mike Snitzer
	Also fix some typos and make all "smq" and "mq" references consistently lowercase. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05	dm mpath: eliminate use of spinlock in IO fast-paths	Mike Snitzer
	The primary motivation of this commit is to improve the scalability of DM multipath on large NUMA systems where m->lock spinlock contention has been proven to be a serious bottleneck on really fast storage. The ability to atomically read a pointer, using lockless_dereference(), is leveraged in this commit. But all pointer writes are still protected by the m->lock spinlock (which is fine since these all now occur in the slow-path). The following functions no longer require the m->lock spinlock in their fast-path: multipath_busy(), __multipath_map(), and do_end_io() And choose_pgpath() is modified to _not_ update m->current_pgpath unless it also switches the path-group. This is done to avoid needing to take the m->lock everytime __multipath_map() calls choose_pgpath(). But m->current_pgpath will be reset if it is failed via fail_path(). Suggested-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05	dm mpath: move trigger_event member to the end of 'struct multipath'	Mike Snitzer
	Allows the 'work_mutex' member to no longer cross a cacheline. Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05	dm mpath: use atomic_t for counting members of 'struct multipath'	Mike Snitzer
	The use of atomic_t for nr_valid_paths, pg_init_in_progress and pg_init_count will allow relaxing the use of the m->lock spinlock. Suggested-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05	dm mpath: switch to using bitops for state flags	Mike Snitzer
	Mechanical change that doesn't make any real effort to reduce the use of m->lock; that will come later (once atomics are used for counters, etc). Suggested-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05	dm thin: Remove return statement from void function	Amitoj Kaur Chawla
	Return statement at the end of a void function is useless. The Coccinelle semantic patch used to make this change is as follows: //<smpl> @@ identifier f; expression e; @@ void f(...) { <... - return e; ...> } //</smpl> Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05	dm: remove unused mapped_device argument from free_tio()	Mike Snitzer
	Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05	Merge remote-tracking branch 'jens/for-4.7/core' into dm-4.7	Mike Snitzer
	Needed in order to update the DM thinp code to use the new async __blkdev_issue_discard() interface.
2016-05-05	block: make bio_inc_remaining() interface accessible again	Mike Snitzer
	Commit 326e1dbb57 ("block: remove management of bi_remaining when restoring original bi_end_io") made bio_inc_remaining() private to bio.c because the only use-case that made sense was confined to the bio_chain() interface. Since that time DM thinp went on to use bio_chain() in its relatively complex implementation of async discard support. That implementation, even when converted over to use the new async __blkdev_issue_discard() interface, depends on deferred completion of the original discard bio -- which is most appropriately implemented using bio_inc_remaining(). DM thinp foolishly duplicated bio_inc_remaining(), local to dm-thin.c as __bio_inc_remaining(), so re-exporting bio_inc_remaining() allows us to put an end to that foolishness. All said, bio_inc_remaining() should really only be used in conjunction with bio_chain(). It isn't intended for generic bio reference counting. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-05	block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard	Mike Snitzer
	Commit 38f25255330 ("block: add __blkdev_issue_discard") incorrectly disallowed the early return of -EOPNOTSUPP if the device doesn't support discard (or secure discard). This early return of -EOPNOTSUPP has always been part of blkdev_issue_discard() interface so there isn't a good reason to break that behaviour -- especially when it can be easily reinstated. The nuance of allowing early return of -EOPNOTSUPP vs disallowing late return of -EOPNOTSUPP is: if the overall device never advertised support for discards and one is issued to the device it is beneficial to inform the caller that discards are not supported via -EOPNOTSUPP. But if a device advertises discard support it means that at least a subset of the device does have discard support -- but it could be that discards issued to some regions of a stacked device will not be supported. In that case the late return of -EOPNOTSUPP must be disallowed. Fixes: 38f25255330 ("block: add __blkdev_issue_discard") Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-05	ARM: 8573/1: domain: move {set,get}_domain under config guard	Vladimir Murzin
	Recursive undefined instrcution falut is seen with R-class taking an exception. The reson for that is __show_regs() tries to get domain information, but domains is not available on !MMU cores, like R/M class. Fix it by puting {set,get}_domain functions under CONFIG_CPU_CP15_MMU guard and providing stubs for the case where domains is not supported. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-05-05	ARM: 8572/1: nommu: change memory reserve for the vectors	Jean-Philippe Brucker
	Commit 19accfd3 (ARM: move vector stubs) moved the vector stubs in an additional page above the base vector one. This change wasn't taken into account by the nommu memreserve. This patch ensures that the kernel won't overwrite any vector stub on nommu. [changed the MPU side too] Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-05-05	ARM: 8571/1: nommu: fix PMSAv7 setup	Jean-Philippe Brucker
	Commit 1c2f87c (ARM: 8025/1: Get rid of meminfo) broke the support for MPU on ARMv7-R. This patch adapts the code inside CONFIG_ARM_MPU to use memblocks appropriately. MPU initialisation only uses the first memory region, and removes all subsequent ones. Because looping over all regions that need removal is inefficient, and memblock_remove already handles memory ranges, we can flatten the 'for_each_memblock' part. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-05-05	IB/iser: Fix max_sectors calculation	Christoph Hellwig
	iSER currently has a couple places that set max_sectors in either the host template or SCSI host, and all of them get it wrong. This patch instead uses a single assignment that (hopefully) gets it right: the max_sectors value must be derived from the number of segments in the FR or FMR structure, but actually be one lower than the page size multiplied by the number of sectors, as it has to handle the case of non-aligned I/O. Without this I get trivial to reproduce hangs when running xfstests (on XFS) over iSER to Linux targets. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-05	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns fix from Eric Biederman: "This contains just a single fix for a nasty oops" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: propogate_mnt: Handle the first propogated copy being a slave
2016-05-05	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds
	Pull virtio/qemu fixes from Michael Tsirkin: "A couple of fixes for virtio and for the new QEMU fw cfg driver" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio: Silence uninitialized variable warning firmware: qemu_fw_cfg.c: potential unintialized variable
2016-05-05	drm/radeon: fix PLL sharing on DCE6.1 (v2)	Lucas Stach
	On DCE6.1 PPLL2 is exclusively available to UNIPHYA, so it should not be taken into consideration when looking for an already enabled PLL to be shared with other outputs. This fixes the broken VGA port (TRAVIS DP->VGA bridge) on my Richland based laptop, where the internal display is connected to UNIPHYA through a TRAVIS DP->LVDS bridge. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=78987 v2: agd: add check in radeon_get_shared_nondp_ppll as well, drop extra parameter. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-05-05	drm/radeon: fix DP link training issue with second 4K monitor	Arindam Nath
	There is an issue observed when we hotplug a second DP 4K monitor to the system. Sometimes, the link training fails for the second monitor after HPD interrupt generation. The issue happens when some queued or deferred transactions are already present on the AUX channel when we initiate a new transcation to (say) get DPCD or during link training. We set AUX_IGNORE_HPD_DISCON bit in the AUX_CONTROL register so that we can ignore any such deferred transactions when a new AUX transaction is initiated. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-05-05	irqchip/alpine-msi: Don't use <asm-generic/msi.h>	Christoph Hellwig
	Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: marc.zyngier@arm.com Cc: antoine.tenart@free-electrons.com Cc: jason@lakedaemon.net Cc: tsahee@annapurnalabs.com Link: http://lkml.kernel.org/r/1462459265-20974-1-git-send-email-hch@lst.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-05	propogate_mnt: Handle the first propogated copy being a slave	Eric W. Biederman
	When the first propgated copy was a slave the following oops would result: > BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 > IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > PGD bacd4067 PUD bac66067 PMD 0 > Oops: 0000 [#1] SMP > Modules linked in: > CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523 > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000 > RIP: 0010:[<ffffffff811fba4e>] [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > RSP: 0018:ffff8800bac3fd38 EFLAGS: 00010283 > RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010 > RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480 > RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000 > R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00 > FS: 00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0 > Stack: > ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85 > ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40 > 0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0 > Call Trace: > [<ffffffff811fbf85>] propagate_mnt+0x105/0x140 > [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0 > [<ffffffff811f1ec3>] graft_tree+0x63/0x70 > [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100 > [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0 > [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70 > [<ffffffff811f3a45>] SyS_mount+0x75/0xc0 > [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0 > [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25 > Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30 > RIP [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > RSP <ffff8800bac3fd38> > CR2: 0000000000000010 > ---[ end trace 2725ecd95164f217 ]--- This oops happens with the namespace_sem held and can be triggered by non-root users. An all around not pleasant experience. To avoid this scenario when finding the appropriate source mount to copy stop the walk up the mnt_master chain when the first source mount is encountered. Further rewrite the walk up the last_source mnt_master chain so that it is clear what is going on. The reason why the first source mount is special is that it it's mnt_parent is not a mount in the dest_mnt propagation tree, and as such termination conditions based up on the dest_mnt mount propgation tree do not make sense. To avoid other kinds of confusion last_dest is not changed when computing last_source. last_dest is only used once in propagate_one and that is above the point of the code being modified, so changing the global variable is meaningless and confusing. Cc: stable@vger.kernel.org fixes: f2ebb3a921c1ca1e2ddd9242e95a1989a50c4c68 ("smarter propagate_mnt()") Reported-by: Tycho Andersen <tycho.andersen@canonical.com> Reviewed-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-05-05	netfilter: nfnetlink_acct: validate NFACCT_QUOTA parameter	Phil Turnbull
	If a quota bit is set in NFACCT_FLAGS but the NFACCT_QUOTA parameter is missing then a NULL pointer dereference is triggered. CAP_NET_ADMIN is required to trigger the bug. Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: nf_tables: allow set names up to 32 bytes	Pablo Neira Ayuso
	Currently, we support set names of up to 16 bytes, get this aligned with the maximum length we can use in ipset to make it easier when considering migration to nf_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: x_tables: get rid of old and inconsistent debugging	Pablo Neira Ayuso
	The dprintf() and duprintf() functions are enabled at compile time, these days we have better runtime debugging through pr_debug() and static keys. On top of this, this debugging is so old that I don't expect anyone using this anymore, so let's get rid of this. IP_NF_ASSERT() is still left in place, although this needs that NETFILTER_DEBUG is enabled, I think these assertions provide useful context information when reading the code. Note that ARP_NF_ASSERT() has been removed as there is no user of this. Kill also DEBUG_ALLOW_ALL and a couple of pr_error() and pr_debug() spots that are inconsistently placed in the code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	openvswitch: __nf_ct_l{3,4}proto_find() always return a valid pointer	Pablo Neira Ayuso
	If the protocol is not natively supported, this assigns generic protocol tracker so we can always assume a valid pointer after these calls. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Joe Stringer <joe@ovn.org>
2016-05-05	netfilter: conntrack: introduce clash resolution on insertion race	Pablo Neira Ayuso
	This patch introduces nf_ct_resolve_clash() to resolve race condition on conntrack insertions. This is particularly a problem for connection-less protocols such as UDP, with no initial handshake. Two or more packets may race to insert the entry resulting in packet drops. Another problematic scenario are packets enqueued to userspace via NFQUEUE after the raw table, that make it easier to trigger this race. To resolve this, the idea is to reset the conntrack entry to the one that won race. Packet and bytes counters are also merged. The 'insert_failed' stats still accounts for this situation, after this patch, the drop counter is bumped whenever we drop packets, so we can watch for unresolved clashes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: conntrack: introduce nf_ct_acct_update()	Pablo Neira Ayuso
	Introduce a helper function to update conntrack counters. __nf_ct_kill_acct() was unnecessarily subtracting skb_network_offset() that is expected to be zero from the ipv4/ipv6 hooks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: conntrack: __nf_ct_l4proto_find() always returns valid pointer	Pablo Neira Ayuso
	Remove unnecessary check for non-nul pointer in destroy_conntrack() given that __nf_ct_l4proto_find() returns the generic protocol tracker if the protocol is not supported. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: conntrack: consider ct netns in early_drop logic	Florian Westphal
	When iterating, skip conntrack entries living in a different netns. We could ignore netns and kill some other non-assured one, but it has two problems: - a netns can kill non-assured conntracks in other namespace - we would start to 'over-subscribe' the affected/overlimit netns. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: conntrack: use a single hashtable for all namespaces	Florian Westphal
	We already include netns address in the hash and compare the netns pointers during lookup, so even if namespaces have overlapping addresses entries will be spread across the table. Assuming 64k bucket size, this change saves 0.5 mbyte per namespace on a 64bit system. NAT bysrc and expectation hash is still per namespace, those will changed too soon. Future patch will also make conntrack object slab cache global again. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: conntrack: make netns address part of hash	Florian Westphal
	Once we place all conntracks into a global hash table we want them to be spread across entire hash table, even if namespaces have overlapping ip addresses. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: conntrack: check netns when comparing conntrack objects	Florian Westphal
	Once we place all conntracks in the same hash table we must also compare the netns pointer to skip conntracks that belong to a different namespace. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: conntrack: small refactoring of conntrack seq_printf	Florian Westphal
	The iteration process is lockless, so we test if the conntrack object is eligible for printing (e.g. is AF_INET) after obtaining the reference count. Once we put all conntracks into same hash table we might see more entries that need to be skipped. So add a helper and first perform the test in a lockless fashion for fast skip. Once we obtain the reference count, just repeat the check. Note that this refactoring also includes a missing check for unconfirmed conntrack entries due to slab rcu object re-usage, so they need to be skipped since they are not part of the listing. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: conntrack: use nf_ct_key_equal() in more places	Florian Westphal
	This prepares for upcoming change that places all conntracks into a single, global table. For this to work we will need to also compare net pointer during lookup. To avoid open-coding such check use the nf_ct_key_equal helper and then later extend it to also consider net_eq. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: conntrack: don't attempt to iterate over empty table	Florian Westphal
	Once we place all conntracks into same table iteration becomes more costly because the table contains conntracks that we are not interested in (belonging to other netns). So don't bother scanning if the current namespace has no entries. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: conntrack: fix lookup race during hash resize	Florian Westphal
	When resizing the conntrack hash table at runtime via echo 42 > /sys/module/nf_conntrack/parameters/hashsize, we are racing with the conntrack lookup path -- reads can happen in parallel and nothing prevents readers from observing a the newly allocated hash but the old size (or vice versa). So access to hash[bucket] can trigger OOB read access in case the table got expanded and we saw the new size but the old hash pointer (or it got shrunk and we got new hash ptr but the size of the old and larger table): kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.6.0-rc2+ #107 [..] Call Trace: [<ffffffff822c3d6a>] ? nf_conntrack_tuple_taken+0x12a/0xe90 [<ffffffff822c3ac1>] ? nf_ct_invert_tuplepr+0x221/0x3a0 [<ffffffff8230e703>] get_unique_tuple+0xfb3/0x2760 Use generation counter to obtain the address/length of the same table. Also add a synchronize_net before freeing the old hash. AFAICS, without it we might access ct_hash[bucket] after ct_hash has been freed, provided that lockless reader got delayed by another event: CPU1 CPU2 seq_begin seq_retry <delay> resize occurs free oldhash for_each(oldhash[size]) Note that resize is only supported in init_netns, it took over 2 minutes of constant resizing+flooding to produce the warning, so this isn't a big problem in practice. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: conntrack: keep BH enabled during lookup	Florian Westphal
	No need to disable BH here anymore: stats are switched to _ATOMIC variant (== this_cpu_inc()), which nowadays generates same code as the non _ATOMIC NF_STAT, at least on x86. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	netfilter: nftables: add connlabel set support	Florian Westphal
	Conntrack labels are currently sized depending on the iptables ruleset, i.e. if we're asked to test or set bits 1, 2, and 65 then we would allocate enough room to store at least bit 65. However, with nft, the input is just a register with arbitrary runtime content. We therefore ask for the upper ceiling we currently have, which is enough room to store 128 bits. Alternatively, we could alter nf_connlabel_replace to increase net->ct.label_words at run time, but since 128 bits is not that big we'd only save sizeof(long) so it doesn't seem worth it for now. This follows a similar approach that xtables 'connlabel' match uses, so when user inputs ct label set bar then we will set the bit used by the 'bar' label and leave the rest alone. This is done by passing the sreg content to nf_connlabels_replace as both value and mask argument. Labels (bits) already set thus cannot be re-set to zero, but this is not supported by xtables connlabel match either. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05	x86/sysfb_efi: Fix valid BAR address range check	Wang YanQing
	The code for checking whether a BAR address range is valid will break out of the loop when a start address of 0x0 is encountered. This behaviour is wrong since by breaking out of the loop we may miss the BAR that describes the EFI frame buffer in a later iteration. Because of this bug I can't use video=efifb: boot parameter to get efifb on my new ThinkPad E550 for my old linux system hard disk with 3.10 kernel. In 3.10, efifb is the only choice due to DRM/I915 not supporting the GPU. This patch also add a trivial optimization to break out after we find the frame buffer address range without testing later BARs. Signed-off-by: Wang YanQing <udknight@gmail.com> [ Rewrote changelog. ] Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Peter Jones <pjones@redhat.com> Cc: <stable@vger.kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1462454061-21561-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05	ACPI / sysfs: fix error code in get_status()	Dan Carpenter
	The problem with ornamental, do-nothing gotos is that they lead to "forgot to set the error code" bugs. We should be returning -EINVAL here but we don't. It leads to an uninitalized variable in counter_show(): drivers/acpi/sysfs.c:603 counter_show() error: uninitialized symbol 'status'. Fixes: 1c8fce27e275 (ACPI: introduce drivers/acpi/sysfs.c) Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05	ACPICA: Update version to 20160422	Bob Moore
	ACPICA commit a2327ba410e19c2aabaf34b711dbadf7d1dcf346 Version 20160422. Link: https://github.com/acpica/acpica/commit/a2327ba4 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05	ACPICA: Move all ASCII utilities to a common file	Bob Moore
	ACPICA commit ba60e4500053010bf775d58f6f61febbdb94d817 New file is utascii.c Link: https://github.com/acpica/acpica/commit/ba60e450 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05	ACPICA: ACPI 2.0, Hardware: Add access_width/bit_offset support for ↵	Lv Zheng
	acpi_hw_write() ACPICA commit 48eea5e7993ccb7189bd63cd726e02adafee6057 This patch adds access_width/bit_offset support in acpi_hw_write(). Lv Zheng. Link: https://github.com/acpica/acpica/commit/48eea5e7 Link: https://bugs.acpica.org/show_bug.cgi?id=1240 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05	ACPICA: ACPI 2.0, Hardware: Add access_width/bit_offset support in ↵	Lv Zheng
	acpi_hw_read() ACPICA commit 96ece052d4d073aae4f935f0ff0746646aea1174 ACPICA commit 3d8583a054e410f2ea4d73b48986facad9cfc0d4 This patch adds access_width/bit_offset support in acpi_hw_read(). This also enables GAS definition where bit_width is not a power of two. Lv Zheng. Link: https://github.com/acpica/acpica/commit/96ece052 Link: https://github.com/acpica/acpica/commit/3d8583a0 Link: https://bugs.acpica.org/show_bug.cgi?id=1240 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05	ACPICA: Executer: Introduce a set of macros to handle bit width mask generation	Lv Zheng
	ACPICA commit c23034a3a09d5ed79f1827d51f43cfbccf68ab64 A regression was reported to the shift offset >= width of type. This patch fixes this issue. BZ 1270. This is a part of the fix because the order of the patches are modified for Linux upstream, containing the cleanups for the old code. Lv Zheng. Link: https://github.com/acpica/acpica/commit/c23034a3 Link: https://bugs.acpica.org/show_bug.cgi?id=1270 Reported-by: Sascha Wildner <swildner@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>