Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
Fourth set of IIO fixes for the 4.6 cycle.
This last minute set is concerned with a regression in the mpu6050 driver.
The regression causes a null pointer dereference on any ACPI device
that has one of these present such as the ASUS T100TA Baytrail/T.
The issue was known but thought (i.e. missunderstood by me)
to only be a possible with no reports, so was routed via the normal merge
window. Turns out this was wrong (thanks to Alan for reporting the crash).
The pull is just for the null dereference fix and a followup fix
that also stops the reported name of the device being NULL.
* mpu6050
- Fix a 'possible' NULL dereference introduced as part of splitting the
driver to allow both i2c and spi to be supported. The issue affects ACPI
systems with this device.
- Fix a follow up issue where the name and chip id both get set to null if
the device driver instance is instantiated from ACPI tables.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Here are a couple last-minute fixes for ARM SoCs. Most of them are
for the OMAP platforms, the rest are all for different platforms.
OMAP:
All dts fixes, mostly affecting voltages and pinctrl for various
device drivers:
- Regulator minimum voltage fixes for omap5
- ISP syscon register offset fix for omap3
- Fix regulator initial modes for n900
- Fix omap5 pinctrl wkup instance size
Allwinner:
Remove incorrect constraints from a dcdc1 regulator
Alltera SoCFPGA:
Fix compilation in thumb2 mode
Samsung exynos:
Fix a potential oops in the pm-domain error handling
Davinci:
Avoid a link error if NVMEM is disabled
Renesas:
Do not mark an external uart clock as disabled, to allow probing
the uarts"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: davinci: only use NVMEM when available
ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
ARM: dts: omap5: fix range of permitted wakeup pinmux registers
ARM: dts: omap3-n900: Specify peripherals LDO regulators initial mode
ARM: dts: omap3: Fix ISP syscon register offset
ARM: dts: omap5-cm-t54: fix ldo1_reg and ldo4_reg ranges
ARM: dts: omap5-board-common: fix ldo1_reg and ldo4_reg ranges
arm64: dts: r8a7795: Don't disable referenced optional scif clock
ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
ARM: dts: sun8i-q8-common: Do not set constraints on dc1sw regulator
|
|
Update my email and web addresses in the kernel maintainers file.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use
wb_domain aware operations") unintentionally changed this function's
meaning from "are there more dirty pages than the background writeback
threshold" to "are there more dirty pages than the writeback threshold".
The background writeback threshold is typically half of the writeback
threshold, so this had the effect of raising the number of dirty pages
required to cause a writeback worker to perform background writeout.
This can cause a very severe performance regression when a BDI uses
BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker
can now disagree on whether writeback should be initiated.
For example, in a system having 1GB of RAM, a single spinning disk, and a
"pass-through" FUSE filesystem mounted over the disk, application code
mmapped a 128MB file on the disk and was randomly dirtying pages in that
mapping.
Because FUSE uses strictlimit and has a default max_ratio of only 1%, in
balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the
dirty_freerun_ceiling is the average of those, ~150. So, it pauses the
dirtying processes when we have 151 dirty pages and wakes up a background
writeback worker. But the worker tests the wrong threshold (200 instead of
100), so it does not initiate writeback and just returns.
Thus, balance_dirty_pages keeps looping, sleeping and then waking up the
worker who will do nothing. It remains stuck in this state until the few
dirty pages that we have finally expire and we write them back for that
reason. Then the whole process repeats, resulting in near-zero throughput
through the FUSE BDI.
The fix is to call the parameterized variant of wb_calc_thresh, so that the
worker will do writeback if the bg_thresh is exceeded which was the
behavior before the referenced commit.
Fixes: 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations")
Signed-off-by: Howard Cochran <hcochran@kernelspring.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # v4.2+
Tested-by Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
copy_params()'s use of __GFP_REPEAT for the __vmalloc() call doesn't make much
sense because vmalloc doesn't rely on costly high order allocations.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Also fix some typos and make all "smq" and "mq" references consistently
lowercase.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
The primary motivation of this commit is to improve the scalability of
DM multipath on large NUMA systems where m->lock spinlock contention has
been proven to be a serious bottleneck on really fast storage.
The ability to atomically read a pointer, using lockless_dereference(),
is leveraged in this commit. But all pointer writes are still protected
by the m->lock spinlock (which is fine since these all now occur in the
slow-path).
The following functions no longer require the m->lock spinlock in their
fast-path: multipath_busy(), __multipath_map(), and do_end_io()
And choose_pgpath() is modified to _not_ update m->current_pgpath unless
it also switches the path-group. This is done to avoid needing to take
the m->lock everytime __multipath_map() calls choose_pgpath().
But m->current_pgpath will be reset if it is failed via fail_path().
Suggested-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Allows the 'work_mutex' member to no longer cross a cacheline.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
The use of atomic_t for nr_valid_paths, pg_init_in_progress and
pg_init_count will allow relaxing the use of the m->lock spinlock.
Suggested-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Mechanical change that doesn't make any real effort to reduce the use of
m->lock; that will come later (once atomics are used for counters, etc).
Suggested-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Return statement at the end of a void function is useless.
The Coccinelle semantic patch used to make this change is as follows:
//<smpl>
@@
identifier f;
expression e;
@@
void f(...) {
<...
- return
e;
...>
}
//</smpl>
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Needed in order to update the DM thinp code to use the new async
__blkdev_issue_discard() interface.
|
|
Commit 326e1dbb57 ("block: remove management of bi_remaining when
restoring original bi_end_io") made bio_inc_remaining() private to bio.c
because the only use-case that made sense was confined to the
bio_chain() interface.
Since that time DM thinp went on to use bio_chain() in its relatively
complex implementation of async discard support. That implementation,
even when converted over to use the new async __blkdev_issue_discard()
interface, depends on deferred completion of the original discard bio --
which is most appropriately implemented using bio_inc_remaining().
DM thinp foolishly duplicated bio_inc_remaining(), local to dm-thin.c as
__bio_inc_remaining(), so re-exporting bio_inc_remaining() allows us to
put an end to that foolishness.
All said, bio_inc_remaining() should really only be used in conjunction
with bio_chain(). It isn't intended for generic bio reference counting.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Commit 38f25255330 ("block: add __blkdev_issue_discard") incorrectly
disallowed the early return of -EOPNOTSUPP if the device doesn't support
discard (or secure discard). This early return of -EOPNOTSUPP has
always been part of blkdev_issue_discard() interface so there isn't a
good reason to break that behaviour -- especially when it can be easily
reinstated.
The nuance of allowing early return of -EOPNOTSUPP vs disallowing late
return of -EOPNOTSUPP is: if the overall device never advertised support
for discards and one is issued to the device it is beneficial to inform
the caller that discards are not supported via -EOPNOTSUPP. But if a
device advertises discard support it means that at least a subset of the
device does have discard support -- but it could be that discards issued
to some regions of a stacked device will not be supported. In that case
the late return of -EOPNOTSUPP must be disallowed.
Fixes: 38f25255330 ("block: add __blkdev_issue_discard")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Recursive undefined instrcution falut is seen with R-class taking an
exception. The reson for that is __show_regs() tries to get domain
information, but domains is not available on !MMU cores, like R/M
class.
Fix it by puting {set,get}_domain functions under CONFIG_CPU_CP15_MMU
guard and providing stubs for the case where domains is not supported.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Commit 19accfd3 (ARM: move vector stubs) moved the vector stubs in an
additional page above the base vector one. This change wasn't taken into
account by the nommu memreserve.
This patch ensures that the kernel won't overwrite any vector stub on
nommu.
[changed the MPU side too]
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Commit 1c2f87c (ARM: 8025/1: Get rid of meminfo) broke the support for
MPU on ARMv7-R. This patch adapts the code inside CONFIG_ARM_MPU to use
memblocks appropriately.
MPU initialisation only uses the first memory region, and removes all
subsequent ones. Because looping over all regions that need removal is
inefficient, and memblock_remove already handles memory ranges, we can
flatten the 'for_each_memblock' part.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
iSER currently has a couple places that set max_sectors in either the host
template or SCSI host, and all of them get it wrong.
This patch instead uses a single assignment that (hopefully) gets it right:
the max_sectors value must be derived from the number of segments in the
FR or FMR structure, but actually be one lower than the page size multiplied
by the number of sectors, as it has to handle the case of non-aligned I/O.
Without this I get trivial to reproduce hangs when running xfstests
(on XFS) over iSER to Linux targets.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns fix from Eric Biederman:
"This contains just a single fix for a nasty oops"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
propogate_mnt: Handle the first propogated copy being a slave
|
|
Pull virtio/qemu fixes from Michael Tsirkin:
"A couple of fixes for virtio and for the new QEMU fw cfg driver"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio: Silence uninitialized variable warning
firmware: qemu_fw_cfg.c: potential unintialized variable
|
|
On DCE6.1 PPLL2 is exclusively available to UNIPHYA, so it should not
be taken into consideration when looking for an already enabled PLL
to be shared with other outputs.
This fixes the broken VGA port (TRAVIS DP->VGA bridge) on my Richland
based laptop, where the internal display is connected to UNIPHYA through
a TRAVIS DP->LVDS bridge.
Bug:
https://bugs.freedesktop.org/show_bug.cgi?id=78987
v2: agd: add check in radeon_get_shared_nondp_ppll as well, drop
extra parameter.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
There is an issue observed when we hotplug a second DP
4K monitor to the system. Sometimes, the link training
fails for the second monitor after HPD interrupt
generation.
The issue happens when some queued or deferred transactions
are already present on the AUX channel when we initiate
a new transcation to (say) get DPCD or during link training.
We set AUX_IGNORE_HPD_DISCON bit in the AUX_CONTROL
register so that we can ignore any such deferred
transactions when a new AUX transaction is initiated.
Signed-off-by: Arindam Nath <arindam.nath@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: marc.zyngier@arm.com
Cc: antoine.tenart@free-electrons.com
Cc: jason@lakedaemon.net
Cc: tsahee@annapurnalabs.com
Link: http://lkml.kernel.org/r/1462459265-20974-1-git-send-email-hch@lst.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When the first propgated copy was a slave the following oops would result:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
> PGD bacd4067 PUD bac66067 PMD 0
> Oops: 0000 [#1] SMP
> Modules linked in:
> CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000
> RIP: 0010:[<ffffffff811fba4e>] [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
> RSP: 0018:ffff8800bac3fd38 EFLAGS: 00010283
> RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010
> RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480
> RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000
> R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00
> FS: 00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0
> Stack:
> ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85
> ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40
> 0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0
> Call Trace:
> [<ffffffff811fbf85>] propagate_mnt+0x105/0x140
> [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0
> [<ffffffff811f1ec3>] graft_tree+0x63/0x70
> [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100
> [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0
> [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70
> [<ffffffff811f3a45>] SyS_mount+0x75/0xc0
> [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0
> [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25
> Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30
> RIP [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
> RSP <ffff8800bac3fd38>
> CR2: 0000000000000010
> ---[ end trace 2725ecd95164f217 ]---
This oops happens with the namespace_sem held and can be triggered by
non-root users. An all around not pleasant experience.
To avoid this scenario when finding the appropriate source mount to
copy stop the walk up the mnt_master chain when the first source mount
is encountered.
Further rewrite the walk up the last_source mnt_master chain so that
it is clear what is going on.
The reason why the first source mount is special is that it it's
mnt_parent is not a mount in the dest_mnt propagation tree, and as
such termination conditions based up on the dest_mnt mount propgation
tree do not make sense.
To avoid other kinds of confusion last_dest is not changed when
computing last_source. last_dest is only used once in propagate_one
and that is above the point of the code being modified, so changing
the global variable is meaningless and confusing.
Cc: stable@vger.kernel.org
fixes: f2ebb3a921c1ca1e2ddd9242e95a1989a50c4c68 ("smarter propagate_mnt()")
Reported-by: Tycho Andersen <tycho.andersen@canonical.com>
Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
If a quota bit is set in NFACCT_FLAGS but the NFACCT_QUOTA parameter is
missing then a NULL pointer dereference is triggered. CAP_NET_ADMIN is
required to trigger the bug.
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Currently, we support set names of up to 16 bytes, get this aligned
with the maximum length we can use in ipset to make it easier when
considering migration to nf_tables.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The dprintf() and duprintf() functions are enabled at compile time,
these days we have better runtime debugging through pr_debug() and
static keys.
On top of this, this debugging is so old that I don't expect anyone
using this anymore, so let's get rid of this.
IP_NF_ASSERT() is still left in place, although this needs that
NETFILTER_DEBUG is enabled, I think these assertions provide useful
context information when reading the code.
Note that ARP_NF_ASSERT() has been removed as there is no user of
this.
Kill also DEBUG_ALLOW_ALL and a couple of pr_error() and pr_debug()
spots that are inconsistently placed in the code.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
If the protocol is not natively supported, this assigns generic protocol
tracker so we can always assume a valid pointer after these calls.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Joe Stringer <joe@ovn.org>
|
|
This patch introduces nf_ct_resolve_clash() to resolve race condition on
conntrack insertions.
This is particularly a problem for connection-less protocols such as
UDP, with no initial handshake. Two or more packets may race to insert
the entry resulting in packet drops.
Another problematic scenario are packets enqueued to userspace via
NFQUEUE after the raw table, that make it easier to trigger this
race.
To resolve this, the idea is to reset the conntrack entry to the one
that won race. Packet and bytes counters are also merged.
The 'insert_failed' stats still accounts for this situation, after
this patch, the drop counter is bumped whenever we drop packets, so we
can watch for unresolved clashes.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Introduce a helper function to update conntrack counters.
__nf_ct_kill_acct() was unnecessarily subtracting skb_network_offset()
that is expected to be zero from the ipv4/ipv6 hooks.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Remove unnecessary check for non-nul pointer in destroy_conntrack()
given that __nf_ct_l4proto_find() returns the generic protocol tracker
if the protocol is not supported.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When iterating, skip conntrack entries living in a different netns.
We could ignore netns and kill some other non-assured one, but it
has two problems:
- a netns can kill non-assured conntracks in other namespace
- we would start to 'over-subscribe' the affected/overlimit netns.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We already include netns address in the hash and compare the netns pointers
during lookup, so even if namespaces have overlapping addresses entries
will be spread across the table.
Assuming 64k bucket size, this change saves 0.5 mbyte per namespace on a
64bit system.
NAT bysrc and expectation hash is still per namespace, those will
changed too soon.
Future patch will also make conntrack object slab cache global again.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Once we place all conntracks into a global hash table we want them to be
spread across entire hash table, even if namespaces have overlapping ip
addresses.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Once we place all conntracks in the same hash table we must also compare
the netns pointer to skip conntracks that belong to a different namespace.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The iteration process is lockless, so we test if the conntrack object is
eligible for printing (e.g. is AF_INET) after obtaining the reference
count.
Once we put all conntracks into same hash table we might see more
entries that need to be skipped.
So add a helper and first perform the test in a lockless fashion
for fast skip.
Once we obtain the reference count, just repeat the check.
Note that this refactoring also includes a missing check for unconfirmed
conntrack entries due to slab rcu object re-usage, so they need to be
skipped since they are not part of the listing.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This prepares for upcoming change that places all conntracks into a
single, global table. For this to work we will need to also compare
net pointer during lookup. To avoid open-coding such check use the
nf_ct_key_equal helper and then later extend it to also consider net_eq.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Once we place all conntracks into same table iteration becomes more
costly because the table contains conntracks that we are not interested
in (belonging to other netns).
So don't bother scanning if the current namespace has no entries.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When resizing the conntrack hash table at runtime via
echo 42 > /sys/module/nf_conntrack/parameters/hashsize, we are racing with
the conntrack lookup path -- reads can happen in parallel and nothing
prevents readers from observing a the newly allocated hash but the old
size (or vice versa).
So access to hash[bucket] can trigger OOB read access in case the table got
expanded and we saw the new size but the old hash pointer (or it got shrunk
and we got new hash ptr but the size of the old and larger table):
kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN
CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.6.0-rc2+ #107
[..]
Call Trace:
[<ffffffff822c3d6a>] ? nf_conntrack_tuple_taken+0x12a/0xe90
[<ffffffff822c3ac1>] ? nf_ct_invert_tuplepr+0x221/0x3a0
[<ffffffff8230e703>] get_unique_tuple+0xfb3/0x2760
Use generation counter to obtain the address/length of the same table.
Also add a synchronize_net before freeing the old hash.
AFAICS, without it we might access ct_hash[bucket] after ct_hash has been
freed, provided that lockless reader got delayed by another event:
CPU1 CPU2
seq_begin
seq_retry
<delay> resize occurs
free oldhash
for_each(oldhash[size])
Note that resize is only supported in init_netns, it took over 2 minutes
of constant resizing+flooding to produce the warning, so this isn't a
big problem in practice.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
No need to disable BH here anymore:
stats are switched to _ATOMIC variant (== this_cpu_inc()), which
nowadays generates same code as the non _ATOMIC NF_STAT, at least on x86.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Conntrack labels are currently sized depending on the iptables
ruleset, i.e. if we're asked to test or set bits 1, 2, and 65 then we
would allocate enough room to store at least bit 65.
However, with nft, the input is just a register with arbitrary runtime
content.
We therefore ask for the upper ceiling we currently have, which is
enough room to store 128 bits.
Alternatively, we could alter nf_connlabel_replace to increase
net->ct.label_words at run time, but since 128 bits is not that
big we'd only save sizeof(long) so it doesn't seem worth it for now.
This follows a similar approach that xtables 'connlabel'
match uses, so when user inputs
ct label set bar
then we will set the bit used by the 'bar' label and leave the rest alone.
This is done by passing the sreg content to nf_connlabels_replace
as both value and mask argument.
Labels (bits) already set thus cannot be re-set to zero, but
this is not supported by xtables connlabel match either.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The code for checking whether a BAR address range is valid will break
out of the loop when a start address of 0x0 is encountered.
This behaviour is wrong since by breaking out of the loop we may miss
the BAR that describes the EFI frame buffer in a later iteration.
Because of this bug I can't use video=efifb: boot parameter to get
efifb on my new ThinkPad E550 for my old linux system hard disk with
3.10 kernel. In 3.10, efifb is the only choice due to DRM/I915 not
supporting the GPU.
This patch also add a trivial optimization to break out after we find
the frame buffer address range without testing later BARs.
Signed-off-by: Wang YanQing <udknight@gmail.com>
[ Rewrote changelog. ]
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Peter Jones <pjones@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462454061-21561-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The problem with ornamental, do-nothing gotos is that they lead to
"forgot to set the error code" bugs. We should be returning -EINVAL
here but we don't. It leads to an uninitalized variable in
counter_show():
drivers/acpi/sysfs.c:603 counter_show()
error: uninitialized symbol 'status'.
Fixes: 1c8fce27e275 (ACPI: introduce drivers/acpi/sysfs.c)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
ACPICA commit a2327ba410e19c2aabaf34b711dbadf7d1dcf346
Version 20160422.
Link: https://github.com/acpica/acpica/commit/a2327ba4
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
ACPICA commit ba60e4500053010bf775d58f6f61febbdb94d817
New file is utascii.c
Link: https://github.com/acpica/acpica/commit/ba60e450
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
acpi_hw_write()
ACPICA commit 48eea5e7993ccb7189bd63cd726e02adafee6057
This patch adds access_width/bit_offset support in acpi_hw_write().
Lv Zheng.
Link: https://github.com/acpica/acpica/commit/48eea5e7
Link: https://bugs.acpica.org/show_bug.cgi?id=1240
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
acpi_hw_read()
ACPICA commit 96ece052d4d073aae4f935f0ff0746646aea1174
ACPICA commit 3d8583a054e410f2ea4d73b48986facad9cfc0d4
This patch adds access_width/bit_offset support in acpi_hw_read().
This also enables GAS definition where bit_width is not a power of
two. Lv Zheng.
Link: https://github.com/acpica/acpica/commit/96ece052
Link: https://github.com/acpica/acpica/commit/3d8583a0
Link: https://bugs.acpica.org/show_bug.cgi?id=1240
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
ACPICA commit c23034a3a09d5ed79f1827d51f43cfbccf68ab64
A regression was reported to the shift offset >= width of type.
This patch fixes this issue. BZ 1270.
This is a part of the fix because the order of the patches are modified for
Linux upstream, containing the cleanups for the old code. Lv Zheng.
Link: https://github.com/acpica/acpica/commit/c23034a3
Link: https://bugs.acpica.org/show_bug.cgi?id=1270
Reported-by: Sascha Wildner <swildner@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|