Age | Commit message (Collapse) | Author |
|
One CPU is doing ctnetlink_change_helper(), while another CPU is doing
unhelp() at the same time. So even if help->helper is not NULL at first,
the later statement strcmp(help->helper->name, ...) may still access
the NULL pointer.
So we must use rcu_read_lock and rcu_dereference to avoid such _bad_
thing happen.
Fixes: f95d7a46bc57 ("netfilter: ctnetlink: Fix regression in CTA_HELP processing")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When invoke __nf_conntrack_helper_find, it needs the rcu lock to
protect the helper module which would not be unloaded.
Now there are two caller nf_conntrack_helper_try_module_get and
ctnetlink_create_expect which don't hold rcu lock. And the other
callers left like ctnetlink_change_helper, ctnetlink_create_conntrack,
and ctnetlink_glue_attach_expect, they already hold the rcu lock
or spin_lock_bh.
Remove the rcu lock in functions nf_ct_helper_expectfn_find_by_name
and nf_ct_helper_expectfn_find_by_symbol. Because they return one pointer
which needs rcu lock, so their caller should hold the rcu lock, not in
these two functions.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Otherwise, creating a new conntrack via nfnetlink:
# conntrack -I -p udp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20
will emit the wrong ct events(where UPDATE should be NEW):
# conntrack -E
[UPDATE] udp 17 10 src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20
[UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Denys provided an awesome KASAN report pointing to an use
after free in xt_TCPMSS
I have provided three patches to fix this issue, either in xt_TCPMSS or
in xt_tcpudp.c. It seems xt_TCPMSS patch has the smallest possible
impact.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pull block fixes from Jens Axboe:
"Here's a pull request for 4.11-rc, fixing a set of issues mostly
centered around the new scheduling framework. These have been brewing
for a while, but split up into what we absolutely need in 4.11, and
what we can defer until 4.12. These are well tested, on both single
queue and multiqueue setups, and with and without shared tags. They
fix several hangs that have happened in testing.
This is obviously larger than I would have preferred at this point in
time, but I don't think we can shave much off this and still get the
desired results.
In detail, this pull request contains:
- a set of five fixes for NVMe, mostly from Christoph and one from
Roland.
- a series from Bart, fixing issues with dm-mq and SCSI shared tags
and scheduling. Note that one of those patches commit messages may
read like an optimization, but it is in fact an important fix for
queue restarts in particular.
- a series from Omar, most importantly fixing a hang with multiple
hardware queues when we fail to get a driver tag. Another important
fix in there is for resizing hardware queues, which nbd does when
handling multiple sockets for one connection.
- fixing an imbalance in putting the ctx for hctx request allocations
from Minchan"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: Restart a single queue if tag sets are shared
dm rq: Avoid that request processing stalls sporadically
scsi: Avoid that SCSI queues get stuck
blk-mq: Introduce blk_mq_delay_run_hw_queue()
blk-mq: remap queues when adding/removing hardware queues
blk-mq-sched: fix crash in switch error path
blk-mq-sched: set up scheduler tags when bringing up new queues
blk-mq-sched: refactor scheduler initialization
blk-mq: use the right hctx when getting a driver tag fails
nvmet: fix byte swap in nvmet_parse_io_cmd
nvmet: fix byte swap in nvmet_execute_write_zeroes
nvmet: add missing byte swap in nvmet_get_smart_log
nvme: add missing byte swap in nvme_setup_discard
nvme: Correct NVMF enum values to match NVMe-oF rev 1.0
block: do not put mq context in blk_mq_alloc_request_hctx
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fix from Linus Walleij:
"This late fix for pin control is hopefully the last I send this cycle.
The problem was detected early in the v4.11 release cycle and there
has been some back and forth on how to solve it. Sadly the proper fix
arrives late, but at least not too late.
An issue was detected with pin control on the Freescale i.MX after the
refactorings for more general group and function handling.
We now have the proper fix for this"
* tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 4.11:
Headed to stable:
- disable HFSCR[TM] if TM is not supported, fixes a potential host
kernel crash triggered by a hostile guest, but only in
configurations that no one uses
- don't try to fix up misaligned load-with-reservation instructions
- fix flush_(d|i)cache_range() called from modules on little endian
kernels
- add missing global TLB invalidate if cxl is active
- fix missing preempt_disable() in crc32c-vpmsum
And a fix for selftests build changes that went in this release:
- selftests/powerpc: Fix standalone powerpc build
Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran,
Paul Mackerras"
* tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
powerpc/mm: Add missing global TLB invalidate if cxl is active
powerpc/64: Fix flush_(d|i)cache_range() called from modules
powerpc: Don't try to fix up misaligned load-with-reservation instructions
powerpc: Disable HFSCR[TM] if TM is not supported
selftests/powerpc: Fix standalone powerpc build
|
|
In the case that compat_get_bitmap fails we do not want to copy the
bitmap to the user as it will contain uninitialized stack data and leak
sensitive data.
Signed-off-by: Chris Salls <salls@cs.ucsb.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, inputting the following command will succeed but actually the
value will be truncated:
# echo 0x12ffffffff > /proc/sys/net/ipv4/tcp_notsent_lowat
This is not friendly to the user, so instead, we should report error
when the value is larger than UINT_MAX.
Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Separating discards and zeroout operations allows us to remove the LBPRZ
block zeroing constraints from discards and honor the device preferences
for UNMAP commands.
If supported by the device, we'll also choose UNMAP over one of the
WRITE SAME variants for discards.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Now that zeroout and discards are distinct operations we need to
separate the policy of choosing the appropriate command. Create a
zeroing_mode which can be one of:
write: Zeroout assist not present, use regular WRITE
writesame: Allow WRITE SAME(10/16) with a zeroed payload
writesame_16_unmap: Allow WRITE SAME(16) with UNMAP
writesame_10_unmap: Allow WRITE SAME(10) with UNMAP
The last two are conditional on the device being thin provisioned with
LBPRZ=1 and LBPWS=1 or LBPWS10=1 respectively.
Whether to set the UNMAP bit or not depends on the REQ_NOUNMAP flag. And
if none of the _unmap variants are supported, regular WRITE SAME will be
used if the device supports it.
The zeroout_mode is exported in sysfs and the detected mode for a given
device can be overridden using the string constants above.
With this change in place we can now issue WRITE SAME(16) with UNMAP set
for block zeroing applications that require hard guarantees and
logical_block_size granularity. And at the same time use the UNMAP
command with the device's preferred granulary and alignment for discard
operations.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
It seems like DRBD assumes its on the wire TRIM request always zeroes data.
Use that fact to implement REQ_OP_WRITE_ZEROES.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
drbd always wants its discard wire operations to zero the blocks, so
use blkdev_issue_zeroout with the BLKDEV_ZERO_UNMAP flag instead of
reinventing it poorly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Now that we have REQ_OP_WRITE_ZEROES implemented for all devices that
support efficient zeroing, we can remove the call to blkdev_issue_discard.
This means we only have two ways of zeroing left and can simplify the
code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
mmc only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
rsxx only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
rbd only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
It's just a in-driver reimplementation of writing zeroes to the pages,
which fails if the discards aren't page aligned.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
It's identical to discard as hole punches will always leave us with
zeroes on reads.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Just the same as discard if the block size equals the system page size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
But now for the real NVMe Write Zeroes yet, just to get rid of the
discard abuse for zeroing. Also rename the quirk flag to be a bit
more self-explanatory.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Try to use a write same with unmap bit variant if the device supports it
and the caller allows for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This gets us support for non-discard efficient write of zeroes (e.g. NVMe)
and prepares for removing the discard_zeroes_data flag.
Also remove a pointless discard support check, which is done in
blkdev_issue_discard already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This avoids fallbacks to explicit zeroing in (__)blkdev_issue_zeroout if
the caller doesn't want them.
Also clean up the convoluted check for the return condition that this
new flag is added to.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If this flag is set logical provisioning capable device should
release space for the zeroed blocks if possible, if it is not set
devices should keep the blocks anchored.
Also remove an out of sync kerneldoc comment for a static function
that would have become even more out of data with this change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Turn the existing discard flag into a new BLKDEV_ZERO_UNMAP flag with
similar semantics, but without referring to diѕcard.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
We'll always use the WRITE ZEROES code for zeroing now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
It seems like the code currently passes whatever it was using for writes
to WRITE SAME. Just switch it to WRITE ZEROES, although that doesn't
need any payload.
Untested, and confused by the code, maybe someone who understands it
better than me can help..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Copy & paste from the REQ_OP_WRITE_SAME code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Fix up do_region to not allocate a bio_vec for discards. We've
got rid of the discard payload allocated by the caller years ago.
Obviously this wasn't actually harmful given how long it's been
there, but it's still good to avoid the pointless allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Copy & paste from the REQ_OP_WRITE_SAME code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Copy and past the REQ_OP_WRITE_SAME code to prepare to implementations
that limit the write zeroes size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Make life easy for implementations that needs to send a data buffer
to the device (e.g. SCSI) by numbering it as a data out command.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Split sd_setup_discard_cmnd into one function per provisioning type. While
this creates some very slight duplication of boilerplate code it keeps the
code modular for additions of new provisioning types, and for reusing the
write same functions for the upcoming scsi implementation of the Write Zeroes
operation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Separate out kernfs from driver core and add myself as a
co-maintainer.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2017-04-04
this is a pull request of two patches for net/master.
The first patch by Markus Marb fixes a register read access in the ifi driver.
The second patch by Geert Uytterhoeven for the rcar driver remove the printing
of a kernel virtual address.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ops->show() can return a negative error code.
Commit 65da3484d9be ("sysfs: correctly handle short reads on PREALLOC attrs.")
(in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors
would look like large numbers.
As a result, if an error is returned, sysfs_kf_read() will return the
value of 'count', typically 4096.
Commit 17d0774f8068 ("sysfs: correctly handle read offset on PREALLOC attrs")
(in v4.8) extended this error to use the unsigned large 'len' as a size for
memmove().
Consequently, if ->show returns an error, then the first read() on the
sysfs file will return 4096 and could return uninitialized memory to
user-space.
If the application performs a subsequent read, this will trigger a memmove()
with extremely large count, and is likely to crash the machine is bizarre ways.
This bug can currently only be triggered by reading from an md
sysfs attribute declared with __ATTR_PREALLOC() during the
brief period between when mddev_put() deletes an mddev from
the ->all_mddevs list, and when mddev_delayed_delete() - which is
scheduled on a workqueue - completes.
Before this, an error won't be returned by the ->show()
After this, the ->show() won't be called.
I can reproduce it reliably only by putting delay like
usleep_range(500000,700000);
early in mddev_delayed_delete(). Then after creating an
md device md0 run
echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state
The bug can be triggered without the usleep.
Fixes: 65da3484d9be ("sysfs: correctly handle short reads on PREALLOC attrs.")
Fixes: 17d0774f8068 ("sysfs: correctly handle read offset on PREALLOC attrs")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A patch documenting how to specify which kernels a particular fix should
be backported to (seemingly) inadvertently added a minus sign after the
kernel version. This particular stable-tag format had never been used
prior to this patch, and was neither present when the patch in question
was first submitted (it was added in v2 without any comment).
Drop the minus sign to avoid any confusion.
Fixes: fdc81b7910ad ("stable_kernel_rules: Add clause about specification of kernel versions to patch.")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Because TCP_MIB_OUTRSTS is an important count, so always increase it
whatever send it successfully or not.
Now move the increment of TCP_MIB_OUTRSTS to the top of
tcp_send_active_reset to make sure it is increased always even though
fail to alloc skb.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Guillaume Nault says:
====================
l2tp: fix error handling of PPPoL2TP socket options
Fix pppol2tp_[gs]etsockopt() so that they don't ignore errors returned
by their helper functions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
pppol2tp_getsockopt() doesn't take into account the error code returned
by pppol2tp_tunnel_getsockopt() or pppol2tp_session_getsockopt(). If
error occurs there, pppol2tp_getsockopt() continues unconditionally and
reports erroneous values.
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
pppol2tp_setsockopt() unconditionally overwrites the error value
returned by pppol2tp_tunnel_setsockopt() or
pppol2tp_session_setsockopt(), thus hiding errors from userspace.
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
vfs_llseek will check whether the file mode has
FMODE_LSEEK, no return failure. But ashmem can be
lseek, so add FMODE_LSEEK to ashmem file.
Comment From Greg Hackmann:
ashmem_llseek() passes the llseek() call through to the backing
shmem file. 91360b02ab48 ("ashmem: use vfs_llseek()") changed
this from directly calling the file's llseek() op into a VFS
layer call. This also adds a check for the FMODE_LSEEK bit, so
without that bit ashmem_llseek() now always fails with -ESPIPE.
Fixes: 91360b02ab48 ("ashmem: use vfs_llseek()")
Signed-off-by: Shuxiao Zhang <zhangshuxiao@xiaomi.com>
Tested-by: Greg Hackmann <ghackmann@google.com>
Cc: stable <stable@vger.kernel.org> # 3.18+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull sparc fixes from David Miller:
"Several fixes here, mostly having to due with either build errors or
memory corruptions depending upon whether you have THP enabled or not"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: remove unused wp_works_ok macro
sparc32: Export vac_cache_size to fix build error
sparc64: Fix memory corruption when THP is enabled
sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
arch/sparc: Avoid DCTI Couples
sparc64: kern_addr_valid regression
sparc64: Add support for 2G hugepages
sparc64: Fix size check in huge_pte_alloc
|
|
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix a problem with GICv3 userspace save/restore
- Clarify GICv2 userspace save/restore ABI
- Be more careful in clearing GIC LRs
- Add missing synchronization primitive to our MMU handling code
PPC:
- Check for a NULL return from kzalloc
s390:
- Prevent translation exception errors on valid page tables for the
instruction-exection-protection support
x86:
- Fix Page-Modification Logging when running a nested guest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
KVM: nVMX: initialize PML fields in vmcs02
KVM: nVMX: do not leak PML full vmexit to L1
KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI
KVM: arm64: Ensure LRs are clear when they should be
kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
KVM: s390: remove change-recording override support
arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
|
|
Pull audit cleanup from Paul Moore:
"A week later than I had hoped, but as promised, here is the audit
uninline-fix we talked about during the last audit pull request.
The patch is slightly different than what we originally discussed as
it made more sense to keep the audit_signal_info() function in
auditsc.c rather than move it and bunch of other related
variables/definitions into audit.c/audit.h.
At some point in the future I need to look at how the audit code is
organized across kernel/audit*, I suspect we could do things a bit
better, but it doesn't seem like a -rc release is a good place for
that ;)
Regardless, this patch passes our tests without problem and looks good
for v4.11"
* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
audit: move audit_signal_info() into kernel/auditsc.c
|
|
Merge misc fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: move pcp and lru-pcp draining into single wq
mailmap: update Yakir Yang email address
mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
dax: fix radix tree insertion race
mm, thp: fix setting of defer+madvise thp defrag mode
ptrace: fix PTRACE_LISTEN race corrupting task->state
vmlinux.lds: add missing VMLINUX_SYMBOL macros
mm/page_alloc.c: fix print order in show_free_areas()
userfaultfd: report actual registered features in fdinfo
mm: fix page_vma_mapped_walk() for ksm pages
|
|
We currently have 2 specific WQ_RECLAIM workqueues in the mm code.
vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain
per cpu lru caches. This seems more than necessary because both can run
on a single WQ. Both do not block on locks requiring a memory
allocation nor perform any allocations themselves. We will save one
rescuer thread this way.
On the other hand drain_all_pages() queues work on the system wq which
doesn't have rescuer and so this depend on memory allocation (when all
workers are stuck allocating and new ones cannot be created).
Initially we thought this would be more of a theoretical problem but
Hugh Dickins has reported:
: 4.11-rc has been giving me hangs after hours of swapping load. At
: first they looked like memory leaks ("fork: Cannot allocate memory");
: but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
: before looking at /proc/meminfo one time, and the stat_refresh stuck
: in D state, waiting for completion of flush_work like many kworkers.
: kthreadd waiting for completion of flush_work in drain_all_pages().
This worker should be using WQ_RECLAIM as well in order to guarantee a
forward progress. We can reuse the same one as for lru draining and
vmstat.
Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Yang Li <pku.leo@gmail.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|