Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Nothing too interesting. All are device specific additions and
workarounds"
* 'for-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata/sata_fsl.c: add ATA_FLAG_NO_LOG_PAGE to blacklist the controller for log page reads
libata-eh.c: Introduce new ata port flag for controller which lockup on read log page
sata_sil: disable trim
AHCI: Fix softreset failed issue of Port Multiplier
sata/mvebu: use #ifdef around suspend/resume code
ahci: Order SATA device IDs for codename Lewisburg
ahci: Add Device ID for Intel Sunrise Point PCH
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This tree includes four core perf fixes for misc bugs, three fixes to
x86 PMU drivers, and two updates to old email addresses"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Do not send exit event twice
perf/x86/intel: Fix INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA macro
perf/x86/intel: Make L1D_PEND_MISS.FB_FULL not constrained on Haswell
perf: Fix PERF_EVENT_IOC_PERIOD deadlock
treewide: Remove old email address
perf/x86: Fix LBR call stack save/restore
perf: Update email address in MAINTAINERS
perf/core: Robustify the perf_cgroup_from_task() RCU checks
perf/core: Fix RCU problem with cgroup context switching code
|
|
mlx4 devices (ConnectX-2, ConnectX-3) has a limitation
where rdma read work queue entries cannot exceed 512 bytes.
A rdma_read wqe needs to fit in 512 bytes:
- wqe control segment (16 bytes)
- rdma segment (16 bytes)
- scatter elements (16 bytes each)
So max_sge_rd should be: (512 - 16 - 16) / 16 = 30.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
'list.2015.12.04b' and 'torture.2015.12.05a' into HEAD
doc.2015.12.05a: Documentation updates
exp.2015.12.07a: Expedited grace-period updates
fixes.2015.12.07a: Miscellaneous fixes
list.2015.12.04b: Linked-list updates
torture.2015.12.05a: Torture-test updates
|
|
Although list_for_each_entry_rcu() can in theory be used anywhere
preemption is disabled, it can result in calls to lockdep, which cannot
be used in certain constrained execution environments, such as exception
handlers that do not map the entire kernel into their address spaces.
This commit therefore adds list_entry_lockless() and
list_for_each_entry_lockless(), which never invoke lockdep and can
therefore safely be used from these constrained environments, but only
as long as those environments are non-preemptible (or items are never
deleted from the list).
Use synchronize_sched(), call_rcu_sched(), or synchronize_sched_expedited()
in updates for the needed grace periods. Of course, if items are never
deleted from the list, there is no need to wait for grace periods.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
rcu_dereference_raw() calls indirectly rcu_read_lock_held() while
rcu_dereference_raw_notrace() does not so fix the comment about the latter.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
This commit replaces a local_irq_save()/local_irq_restore() pair with
a lockdep assertion that interrupts are already disabled. This should
remove the corresponding overhead from the interrupt entry/exit fastpaths.
This change was inspired by the fact that Iftekhar Ahmed's mutation
testing showed that removing rcu_irq_enter()'s call to local_ird_restore()
had no effect, which might indicate that interrupts were always enabled
anyway.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The rcu_expedited, rcu_normal, and rcu_normal_after_boot kernel boot
parameters are pointless in the case of TINY_RCU because in that case
synchronous grace periods, both expedited and normal, are no-ops.
However, these three symbols contribute several hundred bytes of bloat.
This commit therefore uses CPP directives to avoid compiling this code
in TINY_RCU kernels.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Concurrent non-blocking slowpath ramrods can be completed
out-of-order on the completion chain. Recycling completed elements,
while previously sent elements are still completion pending,
can lead to overriding of active elements on the chain. Furthermore,
sending pending slowpath ramrods currently lacks the update of the
chain element physical pointer.
This patch:
* Ensures that ramrods are sent to the FW with
consecutive echo values.
* Handles out-of-order completions by freeing only first
successive completed entries.
* Updates the chain element physical pointer when copying
a pending element into a free element for sending.
Signed-off-by: Tomer Tayar <Tomer.Tayar@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The amount of chain next pointer elements between the producer
and the consumer indices depends on which pages they currently
point to. The current calculation is based only on their difference,
and it can lead to a number of free elements which is higher by 1
than the actual value.
Signed-off-by: Tomer Tayar <Tomer.Tayar@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the case where a request queue is passed to the low lever lightnvm
device drive integration, the device driver might pass its admin
commands through another queue. Instead pass nvm_dev, and let the
low level drive the appropriate queue.
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
It is not obvious what NVM_IO_* and NVM_BLK_T_* are used for. Make sure
to comment them appropriately as the other constants.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
log page
Some controller lockup on a ata_read_log_page.
Add new ata port flag ATA_FLAG_NO_LOG_PAGE which can used
to blacklist a controller.
If this flag is set, any attempt to read a log page returns an error
without actually issuing the command.
Signed-off-by: Andreas Werner <andreas.werner@men.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The following commit which went into mainline through networking tree
3b13758f51de ("cgroups: Allow dynamically changing net_classid")
conflicts in net/core/netclassid_cgroup.c with the following pending
fix in cgroup/for-4.4-fixes.
1f7dd3e5a6e4 ("cgroup: fix handling of multi-destination migration from subtree_control enabling")
The former separates out update_classid() from cgrp_attach() and
updates it to walk all fds of all tasks in the target css so that it
can be used from both migration and config change paths. The latter
drops @css from cgrp_attach().
Resolve the conflict by making cgrp_attach() call update_classid()
with the css from the first task. We can revive @tset walking in
cgrp_attach() but given that net_cls is v1 only where there always is
only one target css during migration, this is fine.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Nina Schiff <ninasc@fb.com>
|
|
This semicolon causes a build error if the function call is wrapped in
parentheses.
Fixes: aabc92bbe3cf ("net: add __netdev_alloc_pcpu_stats() to indicate gfp flags")
Reported-by: Imre Kaloz <kaloz@openwrt.org>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a file on tmpfs has an ACL or a Default ACL, listxattr should include the
corresponding xattr name.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Use the VFS xattr handler infrastructure and get rid of similar code in
the filesystem. For implementing shmem_xattr_handler_set, we need a
version of simple_xattr_set which removes the attribute when value is
NULL. Use this to implement kernfs_iop_removexattr as well.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Add an additional "name" field to struct xattr_handler. When the name
is set, the handler matches attributes with exactly that name. When the
prefix is set instead, the handler matches attributes with the given
prefix and with a non-empty suffix.
This patch should avoid bugs like the one fixed in commit c361016a in
the future.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Remove POSIX_ACL_XATTR_{ACCESS,DEFAULT} and GFS2_POSIX_ACL_{ACCESS,DEFAULT}
and replace them with the definitions in <include/uapi/linux/xattr.h>.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This function was only briefly used in security/integrity/evm, between
commits 66dbc325 and 15647eb3.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is quite a bumper crop of fixes: three from Arnd correcting
various build issues in some configurations, a lock recursion in
qla2xxx. Two potentially exploitable issues in hpsa and mvsas, a
potential null deref in st, a revert of a bdi registration fix that
turned out to cause even more problems, a set of fixes to allow people
who only defined MPT2SAS to still work after the mpt2/mpt3sas merger
and a couple of fixes for issues turned up by the hyper-v storvsc
driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
mpt3sas: fix Kconfig dependency problem for mpt2sas back compatibility
Revert "scsi: Fix a bdi reregistration race"
mpt3sas: Add dummy Kconfig option for backwards compatibility
Fix a memory leak in scsi_host_dev_release()
block/sd: Fix device-imposed transfer length limits
scsi_debug: fix prevent_allow+verify regressions
MAINTAINERS: Add myself as co-maintainer of the SCSI subsystem.
sd: Make discard granularity match logical block size when LBPRZ=1
scsi: hpsa: select CONFIG_SCSI_SAS_ATTR
scsi: advansys needs ISA dma api for ISA support
scsi_sysfs: protect against double execution of __scsi_remove_device()
st: fix potential null pointer dereference.
scsi: report 'INQUIRY result too short' once per host
advansys: fix big-endian builds
qla2xxx: Fix rwlock recursion
hpsa: logical vs bitwise AND typo
mvsas: don't allow negative timeouts
mpt3sas: Fix use sas_is_tlr_enabled API before enabling MPI2_SCSIIO_CONTROL_TLR_ON flag
|
|
Code that does lockless emptiness testing of non-RCU lists is relying
on INIT_LIST_HEAD() to write the list head's ->next pointer atomically,
particularly when INIT_LIST_HEAD() is invoked from list_del_init().
This commit therefore adds WRITE_ONCE() to this function's pointer stores
that could affect the head's ->next pointer.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The list_splice_init_rcu() can be used as a stack onto which full lists
are pushed, but queue-like behavior is now needed by some security
policies. This requires a list_splice_tail_init_rcu().
This commit therefore supplies a list_splice_tail_init_rcu() by
pulling code common it and to list_splice_init_rcu() into a new
__list_splice_init_rcu() function. This new function is based on the
existing list_splice_init_rcu() implementation.
Signed-off-by: Petko Manolov <petkan@mip-labs.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
We need the scheduler's fastpaths to be, well, fast, and unnecessarily
disabling and re-enabling interrupts is not necessarily consistent with
this goal. Especially given that there are regions of the scheduler that
already have interrupts disabled.
This commit therefore moves the call to rcu_note_context_switch()
to one of the interrupts-disabled regions of the scheduler, and
removes the now-redundant disabling and re-enabling of interrupts from
rcu_note_context_switch() and the functions it calls.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested
by Peter Zijlstra. ]
|
|
Although expedited grace periods can be quite useful, and although their
OS jitter has been greatly reduced, they can still pose problems for
extreme real-time workloads. This commit therefore adds a rcu_normal
kernel boot parameter (which can also be manipulated via sysfs)
to suppress expedited grace periods, that is, to treat requests for
expedited grace periods as if they were requests for normal grace periods.
If both rcu_expedited and rcu_normal are specified, rcu_normal wins.
This means that if you are relying on expedited grace periods to speed up
boot, you will want to specify rcu_expedited on the kernel command line,
and then specify rcu_normal via sysfs once boot completes.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Thomas and Phil observed that under stress rhashtable insertion
sometimes failed with EBUSY, even though this error should only
ever been seen when we're under attack and our hash chain length
has grown to an unacceptable level, even after a rehash.
It turns out that the logic for detecting whether there is an
existing rehash is faulty. In particular, when two threads both
try to grow the same table at the same time, one of them may see
the newly grown table and thus erroneously conclude that it had
been rehashed. This is what leads to the EBUSY error.
This patch fixes this by remembering the current last table we
used during insertion so that rhashtable_insert_rehash can detect
when another thread has also done a resize/rehash. When this is
detected we will give up our resize/rehash and simply retry the
insertion with the new table.
Reported-by: Thomas Graf <tgraf@suug.ch>
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These fix a recent regression in the ACPI PCI host bridge
initialization code, clean up some recent changes (generic power
domains framework, ACPI AML debugger support), fix three older but
annoying bugs (PCI power management. generic power domains framework,
cpufreq) and a build problem (device properties framework), and update
a stale MAINTAINERS entry (ACPI backlight driver).
Specifics:
- Fix a regression in the ACPI PCI host bridge initialization code
introduced by the recent consolidation of the host bridge handling
on x86 and ia64 that forgot to take one special piece of code
related to NUMA on x86 into account (Liu Jiang).
- Improve the Kconfig help description of the new ACPI AML debugger
support option to avoid possible confusion (Peter Zijlstra).
- Remove a piece of code in the generic power domains framework that
should have been removed by one of the recent commits modifying
that code (Ulf Hansson).
- Reduce the log level of a PCI PM message that generates a lot of
false-positive log noise for some drivers and improve the message
itself while at it (Imre Deak).
- Fix the OF-based domain lookup code in the generic power domains
framework to make it drop references to DT nodes correctly (Eric
Anholt).
- Prevent the cpufreq core from setting the policy back to the
default after a CPU offline/online cycle for cpufreq drivers
providing the ->setpolicy callback (Srinivas Pandruvada).
- Fix a build problem for CONFIG_ACPI unset in the device properties
framework (Hanjun Guo).
- Fix a stale file path in the ACPI backlight driver entry in
MAINTAINERS (Dan Carpenter)"
* tag 'pm+acpi-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Domains: Fix bad of_node_put() in failure paths of genpd_dev_pm_attach()
cpufreq: use last policy after online for drivers with ->setpolicy
PCI / PM: Tune down retryable runtime suspend error messages
PM / Domains: Validate cases of a non-bound driver in genpd governor
MAINTAINERS: ACPI / video: update a file name in drivers/acpi/
ACPI / property: fix compile error for acpi_node_get_property_reference() when CONFIG_ACPI=n
x86/PCI/ACPI: Fix regression caused by commit 4d6b4e69a245
ACPI: Better describe ACPI_DEBUGGER
|
|
Revert commit 033291eccbdb ("vfio: Include No-IOMMU mode") due to lack
of a user. This was originally intended to fill a need for the DPDK
driver, but uptake has been slow so rather than support an unproven
kernel interface revert it and revisit when userspace catches up.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
* pm-domains:
PM / Domains: Fix bad of_node_put() in failure paths of genpd_dev_pm_attach()
PM / Domains: Validate cases of a non-bound driver in genpd governor
* pm-cpufreq:
cpufreq: use last policy after online for drivers with ->setpolicy
|
|
* acpica:
ACPI: Better describe ACPI_DEBUGGER
* acpi-video:
MAINTAINERS: ACPI / video: update a file name in drivers/acpi/
* device-properties:
ACPI / property: fix compile error for acpi_node_get_property_reference() when CONFIG_ACPI=n
|
|
Pull networking fixes from David Miller:
"A lot of Thanksgiving turkey leftovers accumulated, here goes:
1) Fix bluetooth l2cap_chan object leak, from Johan Hedberg.
2) IDs for some new iwlwifi chips, from Oren Givon.
3) Fix rtlwifi lockups on boot, from Larry Finger.
4) Fix memory leak in fm10k, from Stephen Hemminger.
5) We have a route leak in the ipv6 tunnel infrastructure, fix from
Paolo Abeni.
6) Fix buffer pointer handling in arm64 bpf JIT,f rom Zi Shen Lim.
7) Wrong lockdep annotations in tcp md5 support, fix from Eric
Dumazet.
8) Work around some middle boxes which prevent proper handling of TCP
Fast Open, from Yuchung Cheng.
9) TCP repair can do huge kmalloc() requests, build paged SKBs
instead. From Eric Dumazet.
10) Fix msg_controllen overflow in scm_detach_fds, from Daniel
Borkmann.
11) Fix device leaks on ipmr table destruction in ipv4 and ipv6, from
Nikolay Aleksandrov.
12) Fix use after free in epoll with AF_UNIX sockets, from Rainer
Weikusat.
13) Fix double free in VRF code, from Nikolay Aleksandrov.
14) Fix skb leaks on socket receive queue in tipc, from Ying Xue.
15) Fix ifup/ifdown crach in xgene driver, from Iyappan Subramanian.
16) Fix clearing of persistent array maps in bpf, from Daniel
Borkmann.
17) In TCP, for the cross-SYN case, we don't initialize tp->copied_seq
early enough. From Eric Dumazet.
18) Fix out of bounds accesses in bpf array implementation when
updating elements, from Daniel Borkmann.
19) Fill gaps in RCU protection of np->opt in ipv6 stack, from Eric
Dumazet.
20) When dumping proxy neigh entries, we have to accomodate NULL
device pointers properly, from Konstantin Khlebnikov.
21) SCTP doesn't release all ipv6 socket resources properly, fix from
Eric Dumazet.
22) Prevent underflows of sch->q.qlen for multiqueue packet
schedulers, also from Eric Dumazet.
23) Fix MAC and unicast list handling in bnxt_en driver, from Jeffrey
Huang and Michael Chan.
24) Don't actively scan radar channels, from Antonio Quartulli"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (110 commits)
net: phy: reset only targeted phy
bnxt_en: Setup uc_list mac filters after resetting the chip.
bnxt_en: enforce proper storing of MAC address
bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
net: lpc_eth: remove irq > NR_IRQS check from probe()
net_sched: fix qdisc_tree_decrease_qlen() races
openvswitch: fix hangup on vxlan/gre/geneve device deletion
ipv4: igmp: Allow removing groups from a removed interface
ipv6: sctp: implement sctp_v6_destroy_sock()
arm64: bpf: add 'store immediate' instruction
ipv6: kill sk_dst_lock
ipv6: sctp: add rcu protection around np->opt
net/neighbour: fix crash at dumping device-agnostic proxy entries
sctp: use GFP_USER for user-controlled kmalloc
sctp: convert sack_needed and sack_generation to bits
ipv6: add complete rcu protection around np->opt
bpf: fix allocation warnings in bpf maps and integer overflow
mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0
net: mvneta: enable setting custom TX IP checksum limit
net: mvneta: fix error path for building skb
...
|
|
Pull block fixes from Jens Axboe:
"A collection of fixes from this series. The most important here is a
regression fix for an issue that some folks would hit in blk-merge.c,
and the NVMe queue depth limit for the screwed up Apple "nvme"
controller.
In more detail, this pull request contains:
- a set of fixes for null_blk, including a fix for a few corner cases
where we could hang the device. From Arianna and Paolo.
- lightnvm:
- A build improvement from Keith.
- Update the qemu pci id detection from Matias.
- Error handling fixes for leaks and other little fixes from
Sudip and Wenwei.
- fix from Eric where BLKRRPART would not return EBUSY for whole
device mounts, only when partitions were mounted.
- fix from Jan Kara, where EOF O_DIRECT reads would return
negatively.
- remove check for rq_mergeable() when checking limits for cloned
requests. The check doesn't make any sense. It's assuming that
since NOMERGE is set on the request that we don't have to
recalculate limits since the request didn't change, but that's not
true if the request has been redirected. From Hannes.
- correctly get the bio front segment value set for single segment
bio's, fixing a BUG() in blk-merge. From Ming"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: temporary fix for Apple controller reset
null_blk: change type of completion_nsec to unsigned long
null_blk: guarantee device restart in all irq modes
null_blk: set a separate timer for each command
blk-merge: fix computing bio->bi_seg_front_size in case of single segment
direct-io: Fix negative return from dio read beyond eof
block: Always check queue limits for cloned requests
lightnvm: missing nvm_lock acquire
lightnvm: unconverted ppa returned in get_bb_tbl
lightnvm: refactor and change vendor id for qemu
lightnvm: do device max sectors boundary check first
lightnvm: fix ioctl memory leaks
lightnvm: free memory when gennvm register fails
lightnvm: Simplify config when disabled
Return EBUSY from BLKRRPART for mounted whole-dev fs
|
|
|
|
enabling
Consider the following v2 hierarchy.
P0 (+memory) --- P1 (-memory) --- A
\- B
P0 has memory enabled in its subtree_control while P1 doesn't. If
both A and B contain processes, they would belong to the memory css of
P1. Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter. IOW, enabling controllers
can cause atomic migrations into different csses.
The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses. pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.
WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
Modules linked in:
CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
...
ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
Call Trace:
[<ffffffff81551ffc>] dump_stack+0x4e/0x82
[<ffffffff810de202>] warn_slowpath_common+0x82/0xc0
[<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20
[<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40
[<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0
[<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330
[<ffffffff81188e05>] cgroup_migrate+0xf5/0x190
[<ffffffff81189016>] cgroup_attach_task+0x176/0x200
[<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460
[<ffffffff81189684>] cgroup_procs_write+0x14/0x20
[<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0
[<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190
[<ffffffff81265f88>] __vfs_write+0x28/0xe0
[<ffffffff812666fc>] vfs_write+0xac/0x1a0
[<ffffffff81267019>] SyS_write+0x49/0xb0
[<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76
This patch fixes the bug by removing @css parameter from the three
migration methods, ->can_attach, ->cancel_attach() and ->attach() and
updating cgroup_taskset iteration helpers also return the destination
css in addition to the task being migrated. All controllers are
updated accordingly.
* Controllers which don't care whether there are one or multiple
target csses can be converted trivially. cpu, io, freezer, perf,
netclassid and netprio fall in this category.
* cpuset's current implementation assumes that there's single source
and destination and thus doesn't support v2 hierarchy already. The
only change made by this patchset is how that single destination css
is obtained.
* memory migration path already doesn't do anything on v2. How the
single destination css is obtained is updated and the prep stage of
mem_cgroup_can_attach() is reordered to accomodate the change.
* pids is the only controller which was affected by this bug. It now
correctly handles multi-destination migrations and no longer causes
counter underflow from incorrect accounting.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
|
|
This patch addresses multiple problems :
UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.
Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())
This patch adds full RCU protection to np->opt
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For cpufreq drivers which use setpolicy interface, after offline->online
the policy is set to default. This can be reproduced by setting the
default policy of intel_pstate or longrun to ondemand and then change to
"performance". After offline and online, the setpolicy will be called with
the policy=ondemand.
For drivers using governors this condition is handled by storing
last_governor, during offline and restoring during online. The same should
be done for drivers using setpolicy interface. Storing last_policy during
offline and restoring during online.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
when CONFIG_ACPI=n
In commit 60ba032ed76e ("ACPI / property: Drop size_prop from
acpi_dev_get_property_reference()"), the argument "const char *cells_name"
was dropped, but forgot to update the stub function in no-ACPI case,
it will lead to compile error when CONFIG_ACPI=n, easliy remove
"const char *cells_name" to fix it.
Fixes: 60ba032ed76e "ACPI / property: Drop size_prop from acpi_dev_get_property_reference()"
Reported-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Dmitry provided a syzkaller (http://github.com/google/syzkaller)
triggering a fault in sock_wake_async() when async IO is requested.
Said program stressed af_unix sockets, but the issue is generic
and should be addressed in core networking stack.
The problem is that by the time sock_wake_async() is called,
we should not access the @flags field of 'struct socket',
as the inode containing this socket might be freed without
further notice, and without RCU grace period.
We already maintain an RCU protected structure, "struct socket_wq"
so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it
is the safe route.
It also reduces number of cache lines needing dirtying, so might
provide a performance improvement anyway.
In followup patches, we might move remaining flags (SOCK_NOSPACE,
SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket'
being mostly read and let it being shared between cpus.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch is a cleanup to make following patch easier to
review.
Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()
To ease backports, we rename both constants.
Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int net, struct sock *sk) are added so that
following patch can change their implementation.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 9c7077622dd91 ("packet: make packet_snd fail on len smaller
than l2 header") added validation for the packet size in packet_snd.
This change enforces that every packet needs a header (with at least
hard_header_len bytes) plus a payload with at least one byte. Before
this change the payload was optional.
This fixes PPPoE connections which do not have a "Service" or
"Host-Uniq" configured (which is violating the spec, but is still
widely used in real-world setups). Those are currently failing with the
following message: "pppd: packet size is too short (24 <= 24)"
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a cloned request is retried on other queues it always needs
to be checked against the queue limits of that queue.
Otherwise the calculations for nr_phys_segments might be wrong,
leading to a crash in scsi_init_sgtable().
To clarify this the patch renames blk_rq_check_limits()
to blk_cloned_rq_check_limits() and removes the symbol
export, as the new function should only be used for
cloned requests and never exported.
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Ewan Milne <emilne@redhat.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Fixes: e2a60da74 ("block: Clean up special command handling logic")
Cc: stable@vger.kernel.org # 3.7+
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The get_bb_tbl function takes ppa as a generic address, which is
converted to the ppa device address within the device driver. When
the update_bbtbl callback is called from get_bb_tbl, the device
specific ppa is used, instead of the generic ppa.
Make sure to pass the generic ppa.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Pull SCSI target fixes from Nicholas Bellinger:
- fix tcm-user backend driver expired cmd time processing (agrover)
- eliminate kref_put_spinlock_irqsave() for I/O completion (bart)
- fix iscsi login kthread failure case hung task regression (nab)
- fix COMPARE_AND_WRITE completion use-after-free race (nab)
- fix COMPARE_AND_WRITE with SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC non zero
SGL offset data corruption. (Jan + Doug)
- fix >= v4.4-rc1 regression for tcm_qla2xxx enable configfs attribute
(Himanshu + HCH)
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target/stat: print full t10_wwn.model buffer
target: fix COMPARE_AND_WRITE non zero SGL offset data corruption
qla2xxx: Fix regression introduced by target configFS changes
kref: Remove kref_put_spinlock_irqsave()
target: Invoke release_cmd() callback without holding a spinlock
target: Fix race for SCF_COMPARE_AND_WRITE_POST checking
iscsi-target: Fix rx_login_comp hang after login failure
iscsi-target: return -ENOMEM instead of -1 in case of failed kmalloc()
target/user: Do not set unused fields in tcmu_ops
target/user: Fix time calc in expired cmd processing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui:
"Specifics:
- several fixes and cleanups on Rockchip thermal drivers.
- add the missing support of RK3368 SoCs in Rockchip driver.
- small fixes on of-thermal, power_allocator, rcar driver, IMX, and
QCOM drivers, and also compilation fixes, on thermal.h, when thermal
is not selected"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
imx: thermal: use CPU temperature grade info for thresholds
thermal: fix thermal_zone_bind_cooling_device prototype
Revert "thermal: qcom_spmi: allow compile test"
thermal: rcar_thermal: remove redundant operation
thermal: of-thermal: Reduce log level for message when can't fine thermal zone
thermal: power_allocator: Use temperature reading from tz
thermal: rockchip: Support the RK3368 SoCs in thermal driver
thermal: rockchip: consistently use int for temperatures
thermal: rockchip: Add the sort mode for adc value increment or decrement
thermal: rockchip: improve the conversion function
thermal: rockchip: trivial: fix typo in commit
thermal: rockchip: better to compatible the driver for different SoCs
dt-bindings: rockchip-thermal: Support the RK3368 SoCs compatible
|
|
The last user is gone. Hence remove this function.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Here are a few fixes I'd like to have in v4.4: a generic one for sysfs
and three for HiSilicon and DesignWare host controllers.
Summary:
NUMA:
- Prevent out of bounds access in numa_node override (Mathias Krause)
HiSilicon host bridge driver:
- Fix deferred probing (Arnd Bergmann)
Synopsys DesignWare host bridge driver:
- Remove incorrect io_base assignment (Stanimir Varbanov)
- Move align_resource function pointer to pci_host_bridge structure
(Gabriele Paoloni)"
* tag 'pci-v4.4-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
ARM/PCI: Move align_resource function pointer to pci_host_bridge structure
PCI: hisi: Fix deferred probing
PCI: designware: Remove incorrect io_base assignment
PCI: Prevent out of bounds access in numa_node override
|
|
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable patches:
- Fix a NFSv4 callback identifier leak that was also causing client
crashes
- Fix NFSv4 callback decoding issues when incoming requests are
truncated
- Don't declare the attribute cache valid when we call
nfs_update_inode with an empty attribute structure.
- Resend LAYOUTGET when there is a race that changes the seqid
Bugfixes:
- Fix a number of issues with the NFSv4.2 CLONE ioctl()
- Properly set NFS v4.2 NFSDBG_FACILITY
- NFSv4 referrals are broken; Cleanup FATTR4_WORD0_FS_LOCATIONS after
decoding success
- Use sliding delay when LAYOUTGET gets NFS4ERR_DELAY
- Ensure that attrcache is revalidated after a SETATTR"
* tag 'nfs-for-4.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs4: resend LAYOUTGET when there is a race that changes the seqid
nfs: if we have no valid attrs, then don't declare the attribute cache valid
nfs: ensure that attrcache is revalidated after a SETATTR
nfs4: limit callback decoding to received bytes
nfs4: start callback_ident at idr 1
nfs: use sliding delay when LAYOUTGET gets NFS4ERR_DELAY
NFS4: Cleanup FATTR4_WORD0_FS_LOCATIONS after decoding success
NFS: Properly set NFS v4.2 NFSDBG_FACILITY
nfs: reduce the amount of ifdefs for v4.2 in nfs4file.c
nfs: use btrfs ioctl defintions for clone
nfs: allow intra-file CLONE
nfs: offer native ioctls even if CONFIG_COMPAT is set
nfs: pass on count for CLONE operations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"There is a small backlog of at91 patches here, the most significant is
the addition of some sama5d2 Xplained nodes that were waiting on an
MFD include file to get merged through another tree.
We normally try to sort those out before the merge window opens, but
the maintainer wasn't aware of that here and I decided to merge the
changes this time as an exception.
On OMAP a series of audio changes for dra7 missed the merge window but
turned out to be necessary to fix a boot time imprecise external abort
error and to get audio working.
The other changes are the usual simple changes, here is a list sorted
by platform:
at91:
removal of a useless defconfig option
removal of some legacy DT pieces
use of the proper watchdog compatible string
update of the MAINTAINERS entries for some Atmel drivers
drivers/scpi:
hide get_scpi_ops in module from built-in code
imx:
add missing .irq_set_type for i.MX GPC irq_chip.
fix the wrong spi-num-chipselects settings for Vybrid DSPI devices.
fix a merge error in Vybrid dts regarding to ADC device property
keystone:
fix the optional PDSP firmware loading
fix linking RAM setup for QMs
fix crash with clk_ignore_unused
mediatek:
Enable SCPSYS power domain driver by default
mvebu:
fix QNAP TS219 power-off in dts
fix legacy get_irqnr_and_base for dove and orion5x
omap:
fix l4 related boot time errors for dm81xx
use lockless cldm/pwrdm api in omap4_boot_secondary
remove t410 abort handler to avoid hiding other critical errors
mark cpuidle tracepoints as _rcuidle
fix module alias for omap-ocp2scp
pxa:
palm: Fix typos in PWM lookup table code
renesas:
missing __initconst annotation for r8a7793_boards_compat_dt
rockchip:
disable mmc-tuning on the veyron-minnie board
adding the init state for the over-temperature-protection
zx:
only build power domain code when CONFIG_PM=y"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (31 commits)
ARM: OMAP4+: SMP: use lockless clkdm/pwrdm api in omap4_boot_secondary
arm: omap2+: add missing HWMOD_NO_IDLEST in 81xx hwmod data
ARM: orion5x: Fix legacy get_irqnr_and_base
ARM: dove: Fix legacy get_irqnr_and_base
soc: Mediatek: Enable SCPSYS power domain driver by default
ARM: dts: vfxxx: Fix dspi[01] spi-num-chipselects.
ARM: dts: keystone: k2l: fix kernel crash when clk_ignore_unused is not in bootargs
soc: ti: knav_qmss_queue: Fix linking RAM setup for queue managers
soc: ti: use request_firmware_direct() as acc firmware is optional
ARM: imx: add platform irq type setting in gpc
ARM: dts: vfxxx: Fix erroneous property in esdhc0 node
ARM: shmobile: r8a7793: proper constness with __initconst
scpi: hide get_scpi_ops in module from built-in code
ARM: zx: only build power domain code when CONFIG_PM=y
ARM: pxa: palm: Fix typos in PWM lookup table code
ARM: dts: Kirkwood: Fix QNAP TS219 power-off
ARM: dts: rockchip: Add OTP gpio pinctrl to rk3288 tsadc node
ARM: dts: rockchip: temporarily remove emmc hs200 speed from rk3288 minnie
MAINTAINERS: Atmel drivers: change NAND and ISI entries
ARM: at91/dt: sama5d2 Xplained: add several devices
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Build fix when !CONFIG_UID16 (the patch is touching generic files but
it only affects arm64 builds; submitted by Arnd Bergmann)
- EFI fixes to deal with early_memremap() returning NULL and correctly
mapping run-time regions
- Fix CPUID register extraction of unsigned fields (not to be
sign-extended)
- ASID allocator fix to deal with long-running tasks over multiple
generation roll-overs
- Revert support for marking page ranges as contiguous PTEs (it leads
to TLB conflicts and requires additional non-trivial kernel changes)
- Proper early_alloc() failure check
- Disable KASan for 48-bit VA and 16KB page configuration (the pgd is
larger than the KASan shadow memory)
- Update the fault_info table (original descriptions based on early
engineering spec)
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: efi: fix initcall return values
arm64: efi: deal with NULL return value of early_memremap()
arm64: debug: Treat the BRPs/WRPs as unsigned
arm64: cpufeature: Track unsigned fields
arm64: cpufeature: Add helpers for extracting unsigned values
Revert "arm64: Mark kernel page ranges contiguous"
arm64: mm: keep reserved ASIDs in sync with mm after multiple rollovers
arm64: KASAN depends on !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
arm64: efi: correctly map runtime regions
arm64: mm: fix fault_info table xFSC decoding
arm64: fix building without CONFIG_UID16
arm64: early_alloc: Fix check for allocation failure
|
|
Commit 4f258a46346c ("sd: Fix maximum I/O size for BLOCK_PC requests")
had the unfortunate side-effect of removing an implicit clamp to
BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer
code. This caused problems for some SMR drives.
Debugging this issue revealed a few problems with the existing
infrastructure since the block layer didn't know how to deal with
device-imposed limits, only limits set by the I/O controller.
- Introduce a new queue limit, max_dev_sectors, which is used by the
ULD to signal the maximum sectors for a REQ_TYPE_FS request.
- Ensure that max_dev_sectors is correctly stacked and taken into
account when overriding max_sectors through sysfs.
- Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer
values for later processing.
- In sd_revalidate() set the queue's max_dev_sectors based on the
MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value
is not reported, fall back to a cap based on the CDB TRANSFER LENGTH
field size.
- In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits
VPD--if reported and sane--to signal the preferred device transfer
size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS.
- blk_limits_max_hw_sectors() is no longer used and can be removed.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: sweeneygj@gmx.com
Tested-by: Arzeets <anatol.pomozov@gmail.com>
Tested-by: David Eisner <david.eisner@oriel.oxon.org>
Tested-by: Mario Kicherer <dev@kicherer.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|