Age | Commit message (Collapse) | Author |
|
The mpt2sas driver was recently folded into mpt3sas to reduce code
duplication.
To avoid problems for people that only have CONFIG_SCSI_MPT2SAS in their
.config we introduce a dummy option that will select MPT3SAS if MPT2SAS
was previously enabled.
This is a temporary measure and we will deprecate this config option in
4.6.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Bottomley <James.Bottomley@hansenpartnership.com>
CC: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Avoid that kmemleak reports the following memory leak if a
SCSI LLD calls scsi_host_alloc() and scsi_host_put() but neither
scsi_host_add() nor scsi_host_remove(). The following shell
command triggers that scenario:
for ((i=0; i<2; i++)); do
srp_daemon -oac |
while read line; do
echo $line >/sys/class/infiniband_srp/srp-mlx4_0-1/add_target
done
done
unreferenced object 0xffff88021b24a220 (size 8):
comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s)
hex dump (first 8 bytes):
68 6f 73 74 35 38 00 a5 host58..
backtrace:
[<ffffffff8151014a>] kmemleak_alloc+0x7a/0xc0
[<ffffffff81165c1e>] __kmalloc_track_caller+0xfe/0x160
[<ffffffff81260d2b>] kvasprintf+0x5b/0x90
[<ffffffff81260e2d>] kvasprintf_const+0x8d/0xb0
[<ffffffff81254b0c>] kobject_set_name_vargs+0x3c/0xa0
[<ffffffff81337e3c>] dev_set_name+0x3c/0x40
[<ffffffff81355757>] scsi_host_alloc+0x327/0x4b0
[<ffffffffa03edc8e>] srp_create_target+0x4e/0x8a0 [ib_srp]
[<ffffffff8133778b>] dev_attr_store+0x1b/0x20
[<ffffffff811f27fa>] sysfs_kf_write+0x4a/0x60
[<ffffffff811f1e8e>] kernfs_fop_write+0x14e/0x180
[<ffffffff81176eef>] __vfs_write+0x2f/0xf0
[<ffffffff811771e4>] vfs_write+0xa4/0x100
[<ffffffff81177c64>] SyS_write+0x54/0xc0
[<ffffffff8151b257>] entry_SYSCALL_64_fastpath+0x12/0x6f
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Signed-off-by: Dmitry V. Krivenok <krivenok.dmitry@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
Avoid the sticky preferred mode bit by using the no-merge version of the
function (this allows gnome-shell to resize to lower resolutions than
the default resolution)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
for_each_available_child_of_node performs an of_node_get on each iteration,
so a break out of the loop requires an of_node_put.
The semantic patch that fixes this problem is as follows
(http://coccinelle.lip6.fr):
// <smpl>
@@
expression root,e;
local idexpression child;
@@
for_each_available_child_of_node(root, child) {
... when != of_node_put(child)
when != e = child
(
return child;
|
+ of_node_put(child);
? return ...;
)
...
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull mn10300 fix from Guenter Roeck:
"A single patch to fix mn10300 build failures"
* tag 'mn10300-for-linus-v4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
mn10300: Select CONFIG_HAVE_UID16 to fix build failure
|
|
Remove the "not" before "cannot".
I am fixing the comment block style while I am here.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Signed-off-by: Dmitry V. Krivenok <krivenok.dmitry@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
In order to check for overlapping reserved memory regions, we first need
to sort the array of memory regions. This is implemented using sort(),
and a custom comparison function __rmem_cmp().
Unfortunatley __rmem_cmp() doesn't work in all cases. Because the two
base values are phys_addr_t, they may be u64 on some platforms, in which
case subtracting one from the other and then (implicitly) casting to int
does not give us the -ve/0/+ve value we need.
This leads to incorrect reports about overlaps, eg:
ibm,slw-image@1ffe600000 (0x0000001ffe600000--0x0000001ffe700000) overlaps with
ibm,firmware-allocs-memory@1000000000 (0x0000001000000000--0x0000001000dc0200)
Fix it by just doing the standard double if and return 0 logic.
Fixes: ae1add247bf8 ("of: Check for overlap in reserved memory regions")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"I found two minor bugs while doing development on the ring buffer
code.
The first is something that's been there since its creation. If a
reader reads a page out of the ring buffer before there's any events
on it, it can get an out of date timestamp for that event. It may be
off by a few microseconds, more if the first event gets discarded.
The fix was to only update the reader time stamp when it actually sees
an event on the page, instead of just reading the timestamp from the
page even if it has no events on it. That timestamp is still volatile
until an event is present.
The second bug is more recent. Instead of passing around parameters a
descriptor was made and the parameters are passed via a single
descriptor. This simplified the code a bit. But there was one place
that expected the parameter to be passed by value not reference (which
a descriptor now does). And it added to the length of the event,
which may be ignored later, but the length should not have been
increased. The only real problem with this bug is that it may
allocate more than was needed for the event"
* tag 'trace-v4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Put back the length if crossed page with add_timestamp
ring-buffer: Update read stamp with first real commit on page
|
|
When support for _FIT was added, the code presumed that the data
returned by the _FIT method is identical to the NFIT table, which
starts with an acpi_table_header. However, the _FIT is defined
to return a data in the format of a series of NFIT type structure
entries and as a method, has an acpi_object header rather tahn
an acpi_table_header.
To address the differences, explicitly save the acpi_table_header
from the NFIT, since it is accessible through /sys, and change
the nfit pointer in the acpi_desc structure to point to the
table entries rather than the headers.
Reported-by: Jeff Moyer (jmoyer@redhat.com>
Signed-off-by: Linda Knippers <linda.knippers@hpe.com>
Acked-by: Vishal Verma <vishal.l.verma@intel.com>
[vishal: fix up unit test for new header assumptions]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Missed previously due to a lack of test coverage on a platform that
provided an valid response to _FIT.
Signed-off-by: Linda Knippers <linda.knippers@hpe.com>
Acked-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Enable REGULATOR_FIXED_VOLTAGE for all OMAP2+ platforms
otherwise system can't boot from SD-card when kernel is
built for single SoC (for example, with CONFIG_SOC_DRA7XX=y only).
It's also required for almost all TI SoC's platforms.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
The size of NFIT tables don't necessarily match the size of the
data structures that we use for them. For example, the NVDIMM
Control Region Structure table is shorter for a device with
no block control windows than for a device with block control windows.
Other tables, such as Flush Hint Address Structure and the Interleave
Structure are variable length by definition.
Account for the size difference when comparing table entries by
using the actual table size from the table header if it's less
than the structure size.
Signed-off-by: Linda Knippers <linda.knippers@hpe.com>
Acked-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
git://anongit.freedesktop.org/drm-intel into drm-fixes
few i915 fixes.
* tag 'drm-intel-fixes-2015-11-30' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Don't override output type for DDI HDMI
drm/i915: Don't compare has_drrs strictly in pipe config
drm/i915: Mark uneven memory banks on gen4 desktop as unknown swizzling
|
|
Hi,
For a brief moment I was tricked into thinking that:
In-kernel debugger (EXPERIMENTAL) (ACPI_DEBUGGER) [N/y/?] (NEW)
might be something useful. Better describe the feature to reduce
such confusion.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Because interrupt endpoints usually transmit such
small amounts of data, it seems pointless to prestart
transfers and try to get speed improvements. This
patch also sorts out a problem with CDC ECM function
where its notification endpoint gets stuck in busy
state and we continuously issue Update Transfer
commands.
Fixes: 8a1a9c9e4503 ("usb: dwc3: gadget: start transfer on XFER_COMPLETE")
Signed-off-by: Felipe Balbi <balbi@ti.com>
|
|
This platform driver has a OF device ID table but the OF module
alias information is not created so module autoloading won't work.
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
|
|
If get_pll_div() fails we exited by returning NULL but we missed
releasing hwc.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Fixes: 0dfc86b3173f ("clk: qoriq: Move chip-specific knowledge into driver")
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
|
|
do_div() is meant to be used with an unsigned dividend.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
|
|
do_div() is meant to be used with an unsigned dividend.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
|
|
Dmitry provided a syzkaller (http://github.com/google/syzkaller)
generated program that triggers the WARNING at
net/ipv4/tcp.c:1729 in tcp_recvmsg() :
WARN_ON(tp->copied_seq != tp->rcv_nxt &&
!(flags & (MSG_PEEK | MSG_TRUNC)));
His program is specifically attempting a Cross SYN TCP exchange,
that we support (for the pleasure of hackers ?), but it looks we
lack proper tcp->copied_seq initialization.
Thanks again Dmitry for your report and testings.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
https://github.com/t-kristo/linux-pm into clk-fixes
Pull TI clock driver fixes from Tero Kristo:
* 'for-4.4-rc/ti-clk-fixes' of https://github.com/t-kristo/linux-pm:
clk: ti: drop locking code from mux/divider drivers
clk: ti816x: Add missing dmtimer clkdev entries
clk: ti: fapll: fix wrong do_div() usage
clk: ti: clkt_dpll: fix wrong do_div() usage
|
|
The gianfar driver has recently been enabled on arm64 but fails to build
since it check the return value of platform_get_irq() against NO_IRQ. Fix
this by instead checking for a negative error code.
Even on ARM where this code was previously being built this check was
incorrect since platform_get_irq() returns a negative error code which
may not be exactly the (unsigned int)(-1) that NO_IRQ is defined to be.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This driver can be built on arm64 but relies on NO_IRQ to check the return
value of irq_of_parse_and_map() which fails to build on arm64 because the
architecture does not provide a NO_IRQ. Fix this to correctly check the
return value of irq_of_parse_and_map().
Even on ARM systems where the driver was previously used the check was
broken since on ARM NO_IRQ is -1 but irq_of_parse_and_map() returns 0 on
error.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sendpage did not care about credentials at all. This could lead to
situations in which because of fd passing between processes we could
append data to skbs with different scm data. It is illegal to splice those
skbs together. Instead we have to allocate a new skb and if requested
fill out the scm details.
Fixes: 869e7c62486ec ("net: af_unix: implement stream sendpage support")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When bio has only one physical segment, we should set bio's
bi_seg_front_size as the real(final) size of the single segment.
Fixes: 02e707424c2ea(blk-merge: fix blk_bio_segment_split)
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Giuseppe Cavallaro says:
====================
Spare stmmac fixes
These are some fixes for the stmmac d.d. tested on STi platforms.
They are for some part of the PM, STi glue and rx path when test
Jumbo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The receive skb buffers can be preallocated when the link is opened
according to mtu size.
While testing on a network environment with not standard MTU (e.g. 3000),
a panic occurred if an incoming packet had a length greater than rx skb
buffer size. This is because the HW is programmed to copy, from the DMA,
an Jumbo frame and the Sw must check if the allocated buffer is enough to
store the frame.
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When stmmac_mdio_reset, was called from stmmac_resume, it was not
resetting the PHY due to which MAC was not getting reset properly and
hence ethernet interface not was resumed properly.
The issue was currently only reproducible on stih301-b2204.
Signed-off-by: Pankaj Dev <pankaj.dev@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case of the st,tx-retime-src is missing from device-tree
(it's an optional field) the driver will invoke the strcasecmp to check
which clock has been selected and this is a bug; the else condition
is needed.
In the dwmac_setup, the "rs" variable, passed to the strcasecmp, was not
initialized and the compiler, depending on the options adopted, could
take it in some different part of the stack generating the hang in such
configuration.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch is to fix the csr clock in case of 300MHz is provided.
Reported-by: Kent Borg <Kent.Borg@csr.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When resume the HW is re-configured but some settings can be lost.
For example, the MAC Address_X High/Low Registers used for VLAN tagging..
So, while resuming, the set_filter callback needs to be invoked to
re-program perfect and hash-table registers.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Thinkpad T40p needs agpmode 1.
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We use the reservation object of the page directory for the page tables as
well, because of this the page directory should be freed last. Ensure that
by keeping a reference from the page tables to the directory.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
That got messed up while porting it from Radeon.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add the missing SPI controller DMA handler in the dm816x DT
node, only properties for the two channels on four were present.
Cc: Brian Hutchinson <b.hutchman@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Add missing #mbox-cells for dm816x mbox DT node.
Cc: Brian Hutchinson <b.hutchman@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Assume a filesystem with 4KB blocks. When a file has size 1000 bytes and
we issue direct IO read at offset 1024, blockdev_direct_IO() reads the
tail of the last block and the logic for handling short DIO reads in
dio_complete() results in a return value -24 (1000 - 1024) which
obviously confuses userspace.
Fix the problem by bailing out early once we sample i_size and can
reliably check that direct IO read starts beyond i_size.
Reported-by: Avi Kivity <avi@scylladb.com>
Fixes: 9fe55eea7e4b444bafc42fa0000cc2d1d2847275
CC: stable@vger.kernel.org
CC: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If there are no persistent memory ranges present then don't bother
creating the platform device. Otherwise, it loads the full libnvdimm
sub-system only to discover no resources present.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
When I connect an Intel SSD to SATA SIL controller (PCI ID 1095:3114), any
TRIM command results in I/O errors being reported in the log. There is
other similar error reported with TRIM and the SIL controller:
https://bugs.centos.org/view.php?id=5880
Apparently the controller doesn't support TRIM commands. This patch
disables TRIM support on the SATA SIL controller.
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata7.00: BMDMA2 stat 0x50001
ata7.00: failed command: DATA SET MANAGEMENT
ata7.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 0 dma 512 out
res 51/04:01:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
ata7.00: status: { DRDY ERR }
ata7.00: error: { ABRT }
ata7.00: device reported invalid CHS sector 0
sd 8:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 8:0:0:0: [sdb] tag#0 Sense Key : Illegal Request [current] [descriptor]
sd 8:0:0:0: [sdb] tag#0 Add. Sense: Unaligned write command
sd 8:0:0:0: [sdb] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 21 95 88 00 20 00 00 00 00
blk_update_request: I/O error, dev sdb, sector 2200968
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
mn10300 builds fail with
fs/stat.c: In function 'cp_old_stat':
fs/stat.c:163:2: error: 'old_uid_t' undeclared
ipc/util.c: In function 'ipc64_perm_to_ipc_perm':
ipc/util.c:540:2: error: 'old_uid_t' undeclared
Select CONFIG_HAVE_UID16 and remove local definition of CONFIG_UID16
to fix the problem.
Fixes: fbc416ff8618 ("arm64: fix building without CONFIG_UID16")
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Current code doesn't update port value of Port Multiplier(PM) when
sending FIS of softreset to device, command will fail if FBS is
enabled.
There are two ways to fix the issue: the first is to disable FBS
before sending softreset command to PM device and the second is
to update port value of PM when sending command.
For the first way, i can't find any related rule in AHCI Spec. The
second way can avoid disabling FBS and has better performance.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
|
|
I2C controller used in Keystone SoC has an undocumented peculiarity which
results in SDA-SCL margins being dependent on module clock. Driving high
capacity bus near its limits can result in STOP condition sometimes being
understood as REPEATED-START by slaves (or NACK instead of ACK, etc...).
Driving the module with higher clocks increases the margin between SDA and SCL
transitions, making the operations with higher bus rates more robust. Therefore,
target the module clock to 12MHz instead of 7MHz, still staying within
the specification limits.
Before the change STOP timing looked like this on 400kHz:
SDA ----------+ +----
\ /
\ /
+----+
(1)
SCL --+ +------------
\ /
\ /
+----+
(2)
While only point (1) signals STOP, point (2) could be incorrectly recognized as
repeated-START (almost no margin between SDA and SCL transitions).
After the change there is at least 600ns margin measured between SCL fall and
SDA fall during STOP generation:
SDA ------+ +----
\ /
\ /
+----+
SCL --+ +--------
\ /
\ /
+----+
->| |<- 600ns
->| |<- tSUSTO
So called tSUSTO (setup time for STOP condition) is still slightly higher than
600ns, so no problem here.
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
According to the datasheets the n factor for dividing the tclk is
2 to the power n on Allwinner SoCs, not 2 to the power n + 1 as it is
on other mv64xxx implementations.
I've contacted Allwinner about this and they have confirmed that the
datasheet is correct.
This commit fixes the clk-divider calculations for Allwinner SoCs
accordingly.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Tested-by: Olliver Schinagl <oliver@schinagl.nl>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
|
|
Now that we know that the forking task can't migrate amd the child is always
moved to the same cgroup by cgroup_post_fork()->css_set_move_task() we can
change pids_can_fork() and pids_cancel_fork() to just use task_css(current).
And since we no longer need to pin this css, we can remove pid_fork().
Note: the patch uses task_css_check(true), perhaps it makes sense to add a
helper or change task_css_set_check() to take cgroup_threadgroup_rwsem into
account.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
If the new child migrates to another cgroup before cgroup_post_fork() calls
subsys->fork(), then both pids_can_attach() and pids_fork() will do the same
pids_uncharge(old_pids) + pids_charge(pids) sequence twice.
Change copy_process() to call threadgroup_change_begin/threadgroup_change_end
unconditionally. percpu_down_read() is cheap and this allows other cleanups,
see the next changes.
Also, this way we can unify cgroup_threadgroup_rwsem and dup_mmap_sem.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
A css_set represents the relationship between a set of tasks and
css's. css_set never pinned the associated css's. This was okay
because tasks used to always disassociate immediately (in RCU sense) -
either a task is moved to a different css_set or exits and never
accesses css_set again.
Unfortunately, afcf6c8b7544 ("cgroup: add cgroup_subsys->free() method
and use it to fix pids controller") and patches leading up to it made
a zombie hold onto its css_set and deref the associated css's on its
release. Nothing pins the css's after exit and it might have already
been freed leading to use-after-free.
general protection fault: 0000 [#1] PREEMPT SMP
task: ffffffff81bf2500 ti: ffffffff81be4000 task.ti: ffffffff81be4000
RIP: 0010:[<ffffffff810fa205>] [<ffffffff810fa205>] pids_cancel.constprop.4+0x5/0x40
...
Call Trace:
<IRQ>
[<ffffffff810fb02d>] ? pids_free+0x3d/0xa0
[<ffffffff810f8893>] cgroup_free+0x53/0xe0
[<ffffffff8104ed62>] __put_task_struct+0x42/0x130
[<ffffffff81053557>] delayed_put_task_struct+0x77/0x130
[<ffffffff810c6b34>] rcu_process_callbacks+0x2f4/0x820
[<ffffffff810c6af3>] ? rcu_process_callbacks+0x2b3/0x820
[<ffffffff81056e54>] __do_softirq+0xd4/0x460
[<ffffffff81057369>] irq_exit+0x89/0xa0
[<ffffffff81876212>] smp_apic_timer_interrupt+0x42/0x50
[<ffffffff818747f4>] apic_timer_interrupt+0x84/0x90
<EOI>
...
Code: 5b 5d c3 48 89 df 48 c7 c2 c9 f9 ae 81 48 c7 c6 91 2c ae 81 e8 1d 94 0e 00 31 c0 5b 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <f0> 48 83 87 e0 00 00 00 ff 78 01 c3 80 3d 08 7a c1 00 00 74 02
RIP [<ffffffff810fa205>] pids_cancel.constprop.4+0x5/0x40
RSP <ffff88001fc03e20>
---[ end trace 89a4a4b916b90c49 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt
Fix it by making css_set pin the associate css's until its release.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Link: http://lkml.kernel.org/g/20151120041836.GA18390@codemonkey.org.uk
Link: http://lkml.kernel.org/g/5652D448.3080002@bmw-carit.de
Fixes: afcf6c8b7544 ("cgroup: add cgroup_subsys->free() method and use it to fix pids controller")
|
|
Some ACPI node's _STA will touch operation region field, since the
evaluation of _STA in acpi_bus_type_and_status is very early, the
operation region handler is not ready yet. Instead of fail that function
and not creating the acpi_device node consequently, set status to 0 so
that later when the driver for that device is probing, it can find
the acpi_device node and proceed normally. And at that time, the
handler for the operation region is ready and its _STA evaluation will
succeed, its present status can be checked there.
Even there will be no driver using this node later, it doesn't seem
hurt to have one more acpi_device node created with status set to 0.
This happens on Microsoft Surface 3, where the SPI device node NTRG's
_STA touches GPIO fields and the SPI core driver will only enumerate SPI
devices from ACPI if the acpi_device node is 1: created; 2: _STA
indicates it's present.
Note that due to another problem in SPI driver, for NTRG to be actually
enumerated, some changes have to be made in the SPI layer, which is
addressed by Mika(not send out yet):
https://bugzilla.kernel.org/show_bug.cgi?id=104291#c23
Link: https://bugzilla.kernel.org/show_bug.cgi?id=104291
Reported-by: Bastien Nocera <bugzilla@hadess.net>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There are two common expectations among several subsystems/drivers that
deploys runtime PM support, but which isn't met by the driver core.
Expectation 1)
At ->probe() the subsystem/driver expects the runtime PM status of the
device to be RPM_SUSPENDED, which is the initial status being assigned at
device registration.
This expectation is especially common among some of those subsystems/
drivers that manages devices with an attached PM domain, as those requires
the ->runtime_resume() callback at the PM domain level to be invoked
during ->probe().
Moreover these subsystems/drivers entirely relies on runtime PM resources
being managed at the PM domain level, thus don't implement their own set
of runtime PM callbacks.
These are two scenarios that suffers from this unmet expectation.
i) A failed ->probe() sequence requests probe deferral:
->probe()
...
pm_runtime_enable()
pm_runtime_get_sync()
...
err:
pm_runtime_put()
pm_runtime_disable()
...
As there are no guarantees that such sequence turns the runtime PM status
of the device into RPM_SUSPENDED, the re-trying ->probe() may start with
the status in RPM_ACTIVE.
In such case the runtime PM core won't invoke the ->runtime_resume()
callback because of a pm_runtime_get_sync(), as it considers the device to
be already runtime resumed.
ii) A driver re-bind sequence:
At driver unbind, the subsystem/driver's >remove() callback invokes a
sequence of runtime PM APIs, to undo actions during ->probe() and to put
the device into low power state.
->remove()
...
pm_runtime_put()
pm_runtime_disable()
...
Similar as in the failing ->probe() case, this sequence don't guarantee
the runtime PM status of the device to turn into RPM_SUSPENDED.
Trying to re-bind the driver thus causes the same issue as when re-trying
->probe(), in the probe deferral scenario.
Expectation 2)
Drivers that invokes the pm_runtime_irq_safe() API during ->probe(),
triggers the runtime PM core to increase the usage count for the device's
parent and permanently make it runtime resumed.
The usage count is only dropped at device removal, which also allows it to
be runtime suspended again.
A re-trying ->probe() repeats the call to pm_runtime_irq_safe() and thus
once more triggers the usage count of the device's parent to be increased.
This leads to not only an imbalance issue of the usage count of the
device's parent, but also to keep it runtime resumed permanently even if
->probe() fails.
To address these issues, let's change the policy of the driver core to
meet these expectations. More precisely, at ->probe() failures and driver
unbind, restore the initial states of runtime PM.
Although to still allow subsystem's to control PM for devices that doesn't
->probe() successfully, don't restore the initial states unless runtime PM
is disabled.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|