Age | Commit message (Collapse) | Author |
|
Now that {cpu|edmac}_to_{edmac|cpu}() functions boiled down to the mere
{cpu|le32}_to_{le32|cpu}() calls, there's no need for these functions
anymore, so just get rid of them.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 71557a37adb5 ("[netdrvr] sh_eth: Add SH7619 support") added support
for the big-endian EDMAC descriptors. However, it was never used and never
worked right until the recent driver fixes. I think we now can just remove
this support, it was only burdening the driver from the start. It should be
easy to do without disturbing the SH platform code, at least for now...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of re-creating the array on the stack each time
is_cmos_rtc_device() gets called, make the array 'static const'.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
|
|
|
|
acpi_penalize_isa_irq() can be written in fewer lines of code,
so do that. No functional change.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Works-for: Andy Shevchenko <andy.shevchenko@gmail.com>
|
|
Use to_delayed_work() instead of open-coding it.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit f06147f9fbf1 (ACPICA: Hardware: Enable firmware waking vector
for both 32-bit and 64-bit FACS) added three functions that aren't
present in upstream ACPICA, acpi_hw_set_firmware_waking_vectors(),
acpi_set_firmware_waking_vectors() and acpi_set_firmware_waking_vector64(),
to allow Linux to use the previously existing API for setting the
platform firmware waking vector.
However, that wasn't necessary, since the ACPI sleep support code
in Linux can be modified to use the upstream ACPICA's API easily
and the additional functions may be dropped which reduces the code
size and puts the kernel's ACPICA code more in line with the upstream.
Make the changes as per the above. While at it, make the relevant
function desctiption comments reflect the upstream ACPICA's ones.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Lv Zheng <lv.zheng@intel.com>
|
|
Michael Chan says:
====================
bnxt_en: Support combined and rx/tx channels.
The bnxt hardware uses a completion ring for rx and tx events. The driver
has to process the completion ring entries sequentially for the events.
The current code only supports an rx/tx ring pair for each completion ring.
This patch series add support for using a dedicated completion ring for
rx only or tx only as an option configuarble using ethtool -L.
The benefits for using dedicated completion rings are:
1. A burst of rx packets can cause delay in processing tx events if the
completion ring is shared. If tx queue is stopped by BQL, this can cause
delay in re-starting the tx queue.
2. A completion ring is sized according to the rx and tx ring size rounded
up to the nearest power of 2. When the completion ring is shared, it is
sized by adding the rx and tx ring sizes and then rounded to the next power
of 2, often with a lot of wasted space.
3. Using dedicated completion ring, we can adjust the tx and rx coalescing
parameters independently for rx and tx.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver can support either all combined or all rx/tx rings. The
default is combined, but the user can now select rx/tx rings.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Modify ring memory allocation and MSIX setup to support shared or
non shared rings and do the proper mapping. Default is still to
use shared rings.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add logic to calculate how many shared or non shared rings can be
supported. Default is to use shared rings.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to support dedicated or shared completion rings, the ring
indexing and mapping are re-structured as below:
1. bp->grp_info[] array index is 1:1 with bp->bnapi[] array index and
completion ring index.
2. rx rings 0 to n will be mapped to completion rings 0 to n.
3. If tx and rx rings share completion rings, then tx rings 0 to m will
be mapped to completion rings 0 to m.
4. If tx and rx rings use dedicated completion rings, then tx rings 0 to
m will be mapped to completion rings n + 1 to n + m.
5. Each tx or rx ring will use the corresponding completion ring index
for doorbell mapping and MSIX mapping.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Each bnxt_napi structure may no longer be having both an rx ring and
a tx ring. Check for a valid ring before using it.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, an rx and a tx ring are always paired with a completion ring.
We want to restructure it so that it is possible to have a dedicated
completion ring for tx or rx only.
The bnxt hardware uses a completion ring for rx and tx events. The driver
has to process the completion ring entries sequentially for the rx and tx
events. Using a dedicated completion ring for rx only or tx only has these
benefits:
1. A burst of rx packets can cause delay in processing tx events if the
completion ring is shared. If tx queue is stopped by BQL, this can cause
delay in re-starting the tx queue.
2. A completion ring is sized according to the rx and tx ring size rounded
up to the nearest power of 2. When the completion ring is shared, it is
sized by adding the rx and tx ring sizes and then rounded to the next power
of 2, often with a lot of wasted space.
3. Using dedicated completion ring, we can adjust the tx and rx coalescing
parameters independently for rx and tx.
The first step is to separate the rx and tx ring structures from the
bnxt_napi struct.
In this patch, an rx ring and a tx ring will point to the same bnxt_napi
struct to share the same completion ring. No change in ring assignment
and mapping yet.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
By adding 3 separate functions to dump the different ring states.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
XFS now uses CRC verification over a limited section of the log to
detect torn writes prior to a crash. This is difficult to test directly
due to the timing and hardware requirements to cause a short write.
Add a mechanism to inject CRC errors into log records to facilitate
testing torn write detection during log recovery. This mechanism is
dangerous and can result in filesystem corruption. Thus, it is only
available in DEBUG mode for testing/development purposes. Set a non-zero
value to the following sysfs entry to enable error injection:
/sys/fs/xfs/<dev>/log/log_badcrc_factor
Once enabled, XFS intentionally writes an invalid CRC to a log record at
some random point in the future based on the provided frequency. The
filesystem immediately shuts down once the record has been written to
the physical log to prevent metadata writeback (e.g., AIL insertion)
once the log write completes. This helps reasonably simulate a torn
write to the log as the affected record must be safe to discard. The
next mount after the intentional shutdown requires log recovery and
should detect and recover from the torn write.
Note again that this _will_ result in data loss or worse. For testing
and development purposes only!
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Certain types of storage, such as persistent memory, do not provide
sector atomicity for writes. This means that if a crash occurs while XFS
is writing log records, only part of those records might make it to the
storage. This is problematic because log recovery uses the cycle value
packed at the top of each log block to locate the head/tail of the log.
This can lead to CRC verification failures during log recovery and an
unmountable fs for a filesystem that is otherwise consistent.
Update log recovery to incorporate log record CRC verification as part
of the head/tail discovery process. Once the head is located via the
traditional algorithm, run a CRC-only pass over the records up to the
head of the log. If CRC verification fails, assume that the records are
torn as a matter of policy and trim the head block back to the start of
the first bad record.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
When we do cat /sys/kernel/debug/tracing/printk_formats, we hit kernel
panic at t_show.
general protection fault: 0000 [#1] PREEMPT SMP
CPU: 0 PID: 2957 Comm: sh Tainted: G W O 3.14.55-x86_64-01062-gd4acdc7 #2
RIP: 0010:[<ffffffff811375b2>]
[<ffffffff811375b2>] t_show+0x22/0xe0
RSP: 0000:ffff88002b4ebe80 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1
RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec
R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0
R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570
FS: 0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0
Call Trace:
[<ffffffff811dc076>] seq_read+0x2f6/0x3e0
[<ffffffff811b749b>] vfs_read+0x9b/0x160
[<ffffffff811b7f69>] SyS_read+0x49/0xb0
[<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13
---[ end trace 5bd9eb630614861e ]---
Kernel panic - not syncing: Fatal exception
When the first time find_next calls find_next_mod_format, it should
iterate the trace_bprintk_fmt_list to find the first print format of
the module. However in current code, start_index is smaller than *pos
at first, and code will not iterate the list. Latter container_of will
get the wrong address with former v, which will cause mod_fmt be a
meaningless object and so is the returned mod_fmt->fmt.
This patch will fix it by correcting the start_index. After fixed,
when the first time calls find_next_mod_format, start_index will be
equal to *pos, and code will iterate the trace_bprintk_fmt_list to
get the right module printk format, so is the returned mod_fmt->fmt.
Link: http://lkml.kernel.org/r/5684B900.9000309@intel.com
Cc: stable@vger.kernel.org # 3.12+
Fixes: 102c9323c35a8 "tracing: Add __tracepoint_string() to export string pointers"
Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Add the arch-specific code to support jump label for TILE-Gx. This
code shares NOP instruction with ftrace, so we move it to a common
header file.
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Signed-off-by: Zhigang Lu <zlu@ezchip.com>
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
|
|
It is used by kgdb, ftrace, kprobe and jump label, so we factor
this out into a helper routine.
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Signed-off-by: Zhigang Lu <zlu@ezchip.com>
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
|
|
The flags entry is there to tell the user that some
optional information is available.
Since we report the iova_pgsizes signal it to the user
by setting the flags to VFIO_IOMMU_INFO_PGSIZES.
Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
The ieee802154_llsec_ops structure is never modified, so declare it as
const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
There's apparently a serial number woven into both input and output
packets; neglecting to specify a valid serial number causes the controller
to ignore the rumble packets.
The scale of the rumble was also apparently halved in the packets.
The initialization packet had to be changed to allow force feedback to
work.
see https://github.com/paroj/xpad/issues/7 for details.
Signed-off-by: Pavel Rojtberg <rojtberg@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Track the status of the irq_out URB to prevent submission iof new requests
while current one is active. Failure to do so results in the "URB submitted
while active" warning/stack trace.
Store pending brightness and FF effect in the driver structure and replace
it with the latest requests until the device is ready to process next
request. Alternate serving LED vs FF requests to make sure one does not
starve another. See [1] for discussion. Inspired by patch of Sarah Bessmer
[2].
[1]: http://www.spinics.net/lists/linux-input/msg40708.html
[2]: http://www.spinics.net/lists/linux-input/msg31450.html
Signed-off-by: Pavel Rojtberg <rojtberg@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Add device HID AMDI0510 to match the I2C controlers on AMD Seattle platform
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
These multi-lines comments do not follow the standard kernel coding
style. In fact, they are not useful comments, so get rid of them.
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
Commit 807f16d4db95 ("mtd: core: set some defaults when dev.parent is
set") attempted to provide some default settings for MTDs that
(a) assign the parent device and
(b) don't provide their own name or owner
However, this isn't a perfect drop-in replacement for the boilerplate
found in some drivers, because the MTD name is used by partition
parsers like cmdlinepart, but the name isn't set until add_mtd_device(),
after the parsing is completed. This means cmdlinepart sees a NULL name
and therefore will not work properly.
Fix this by moving the default name and owner assignment to be first in
the MTD registration process.
[Note: this does not fix all reported issues, particularly with NAND
drivers. Will require an additional fix for drivers/mtd/nand/]
Fixes: 807f16d4db95 ("mtd: core: set some defaults when dev.parent is set")
Reported-by: Heiko Schocher <hs@denx.de>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Heiko Schocher <hs@denx.de>
Cc: Frans Klaver <fransklaver@gmail.com>
|
|
It's been observed that when bluetooth driver fails to
activate the firmware, below hung task warning dump is
displayed after 120 seconds.
[ 36.461022] Bluetooth: vendor=0x2df, device=0x912e, class=255, fn=2
[ 56.512128] Bluetooth: FW failed to be active in time!
[ 56.517264] Bluetooth: Downloading firmware failed!
[ 240.252176] INFO: task kworker/3:2:129 blocked for more than 120 seconds.
[ 240.258931] Not tainted 3.18.0 #254
[ 240.262972] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.270751] kworker/3:2 D ffffffc000205760 0 129 2 0x00000000
[ 240.277825] Workqueue: events request_firmware_work_func
[ 240.283134] Call trace:
[ 240.285581] [<ffffffc000205760>] __switch_to+0x80/0x8c
[ 240.290693] [<ffffffc00088dae0>] __schedule+0x540/0x7b8
[ 240.295921] [<ffffffc00088ddd0>] schedule+0x78/0x84
[ 240.300764] [<ffffffc0006dfd48>] __mmc_claim_host+0xe8/0x1c8
[ 240.306395] [<ffffffc0006edd6c>] sdio_claim_host+0x74/0x84
[ 240.311840] [<ffffffbffc163d08>] 0xffffffbffc163d08
[ 240.316685] [<ffffffbffc165104>] 0xffffffbffc165104
[ 240.321524] [<ffffffbffc130cf8>] mwifiex_dnld_fw+0x98/0x110 [mwifiex]
[ 240.327918] [<ffffffbffc12ee88>] mwifiex_remove_card+0x2c4/0x5fc [mwifiex]
[ 240.334741] [<ffffffc000596780>] request_firmware_work_func+0x44/0x80
[ 240.341127] [<ffffffc00023b934>] process_one_work+0x2ec/0x50c
[ 240.346831] [<ffffffc00023c6a0>] worker_thread+0x350/0x470
[ 240.352272] [<ffffffc0002419bc>] kthread+0xf0/0xfc
[ 240.357019] 2 locks held by kworker/3:2/129:
[ 240.361248] #0: ("events"){.+.+.+}, at: [<ffffffc00023b840>] process_one_work+0x1f8/0x50c
[ 240.369562] #1: ((&fw_work->work)){+.+.+.}, at: [<ffffffc00023b840>] process_one_work+0x1f8/0x50c
[ 240.378589] task PC stack pid father
[ 240.384501] kworker/1:1 D ffffffc000205760 0 40 2 0x00000000
[ 240.391524] Workqueue: events mtk_atomic_work
[ 240.395884] Call trace:
[ 240.398317] [<ffffffc000205760>] __switch_to+0x80/0x8c
[ 240.403448] [<ffffffc00027279c>] lock_acquire+0x128/0x164
[ 240.408821] kworker/3:2 D ffffffc000205760 0 129 2 0x00000000
[ 240.415867] Workqueue: events request_firmware_work_func
[ 240.421138] Call trace:
[ 240.423589] [<ffffffc000205760>] __switch_to+0x80/0x8c
[ 240.428688] [<ffffffc00088dae0>] __schedule+0x540/0x7b8
[ 240.433886] [<ffffffc00088ddd0>] schedule+0x78/0x84
[ 240.438732] [<ffffffc0006dfd48>] __mmc_claim_host+0xe8/0x1c8
[ 240.444361] [<ffffffc0006edd6c>] sdio_claim_host+0x74/0x84
[ 240.449801] [<ffffffbffc163d08>] 0xffffffbffc163d08
[ 240.454649] [<ffffffbffc165104>] 0xffffffbffc165104
[ 240.459486] [<ffffffbffc130cf8>] mwifiex_dnld_fw+0x98/0x110 [mwifiex]
[ 240.465882] [<ffffffbffc12ee88>] mwifiex_remove_card+0x2c4/0x5fc [mwifiex]
[ 240.472705] [<ffffffc000596780>] request_firmware_work_func+0x44/0x80
[ 240.479090] [<ffffffc00023b934>] process_one_work+0x2ec/0x50c
[ 240.484794] [<ffffffc00023c6a0>] worker_thread+0x350/0x470
[ 240.490231] [<ffffffc0002419bc>] kthread+0xf0/0xfc
This patch adds missing sdio_release_host() call so that wlan driver
thread can claim sdio host.
Fixes: 4863e4cc31d647e1 ("Bluetooth: btmrvl: release sdio bus after firmware is up")
Signed-off-by: Chin-Ran Lo <crlo@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Constify the ACPI device ID array, no need to have it writable at
runtime. Also drop the unused RT5645_INIT_REG_LEN define.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Bard Liao <bardliao@realtek.com>
Cc: Oder Chiou <oder_chiou@realtek.com>
Cc: John Lin <john.lin@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
These are used at least by Acer with BCM43241.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
The IDs should all be for Broadcom BCM43241 module, and
hci_bcm is now the proper driver for them. This removes one
of two different ways of handling PM with the module.
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
* pnfs_generic:
NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments
NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures
NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()
NFSv4.1/pNFS: Fix a race in initiate_file_draining()
NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout
NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode
NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids
NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()
NFS: Relax requirements in nfs_flush_incompatible
NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid
NFS: Allow multiple commit requests in flight per file
NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs
NFSv4: List stateid information in the callback tracepoints
NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL
NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1
pNFS: If we have to delay the layout callback, mark the layout for return
NFSv4.1/pNFS: Add a helper to mark the layout as returned
pNFS: Ensure nfs4_layoutget_prepare returns the correct error
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Make it more obvious what we're returning...
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Peng Tao points out that the call to pnfs_mark_matching_lsegs_return()
could race with pnfs_put_lseg(), in which case the layout segment is
cleared, but no layoutreturn will be sent.
Fix is to replace the call to pnfs_mark_matching_lsegs_invalid().
Reported-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Fix a bug whereby if all the layout segments could be immediately freed,
the call to pnfs_error_mark_layout_for_return() would never result in
a layoutreturn.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If pnfs_mark_matching_lsegs_return() needs to mark a layout segment for
return, then it must also set the return iomode.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
A stateid is a structure, pass it as a pointer.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Unsigned integers can never be negative, so drop this check.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We have split the setting up of all the resources in two steps:
1) talk_to_blkback - which figures out the num_ring_pages (from
the default value of zero), sets up shadow and so
2) blkfront_connect - does the real part of filling out the
internal structures.
The problem is if we bypass the 1) step and go straight to 2)
and call blkfront_setup_indirect where we use the macro
BLK_RING_SIZE - which returns an negative value (because
sz is zero - since num_ring_pages is zero - since it has never
been set).
We can fix this by making sure that we always have called
talk_to_blkback before going to blkfront_connect.
Or we could set in blkfront_probe info->nr_ring_pages = 1
to have a default value. But that looks odd - as we haven't
actually negotiated any ring size.
This patch changes XenbusStateConnected state to detect if
we haven't done the initial handshake - and if so continue
on as if were in XenbusStateInitWait state.
We also roll the error recovery (freeing the structure) into
talk_to_blkback error path - which is safe since that function
is only called from blkback_changed.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
This patch fixs two memleaks:
backtrace:
[<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50
[<ffffffff81205e3b>] kmem_cache_alloc+0xbb/0x1d0
[<ffffffff81534028>] xen_blkbk_probe+0x58/0x230
[<ffffffff8146adb6>] xenbus_dev_probe+0x76/0x130
[<ffffffff81511716>] driver_probe_device+0x166/0x2c0
[<ffffffff815119bc>] __device_attach_driver+0xac/0xb0
[<ffffffff8150fa57>] bus_for_each_drv+0x67/0x90
[<ffffffff81511ab7>] __device_attach+0xc7/0x120
[<ffffffff81511b23>] device_initial_probe+0x13/0x20
[<ffffffff8151059a>] bus_probe_device+0x9a/0xb0
[<ffffffff8150f0a1>] device_add+0x3b1/0x5c0
[<ffffffff8150f47e>] device_register+0x1e/0x30
[<ffffffff8146a9e8>] xenbus_probe_node+0x158/0x170
[<ffffffff8146abaf>] xenbus_dev_changed+0x1af/0x1c0
[<ffffffff8146b1bb>] backend_changed+0x1b/0x20
[<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160
unreferenced object 0xffff880007ba8ef8 (size 224):
backtrace:
[<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50
[<ffffffff81205c73>] __kmalloc+0xd3/0x1e0
[<ffffffff81534d87>] frontend_changed+0x2c7/0x580
[<ffffffff8146af12>] xenbus_otherend_changed+0xa2/0xb0
[<ffffffff8146b2c0>] frontend_changed+0x10/0x20
[<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160
[<ffffffff810d3e97>] kthread+0xd7/0xf0
[<ffffffff817c4a9f>] ret_from_fork+0x3f/0x70
[<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff8800048dcd38 (size 224):
The first leak is caused by not put() the be->blkif reference
which we had gotten in xen_blkif_alloc(), while the second is
us not freeing blkif->rings in the right place.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Make st_* statistics per ring and the VBD sysfs would iterate over all the
rings.
Note: xenvbd_sysfs_delif() is called in xen_blkbk_remove() before all rings
are torn down, so it's safe.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: Aligned the variables on the same column.
|
|
The minimal size of request in the block framework is always PAGE_SIZE.
It means that when 64KB guest is support, the request will at least be
64KB.
Although, if the backend doesn't support indirect descriptor (such as QDISK
in QEMU), a ring request is only able to accommodate 11 segments of 4KB
(i.e 44KB).
The current frontend is assuming that an I/O request will always fit in
a ring request. This is not true any more when using 64KB page
granularity and will therefore crash during boot.
On ARM64, the ABI is completely neutral to the page granularity used by
the domU. The guest has the choice between different page granularity
supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB).
This can't be enforced by the hypervisor and therefore it's possible to
run guests using different page granularity.
So we can't mandate the block backend to support indirect descriptor
when the frontend is using 64KB page granularity and have to fix it
properly in the frontend.
The solution exposed below is based on modifying directly the frontend
guest rather than asking the block framework to support smaller size
(i.e < PAGE_SIZE). This is because the change is the block framework are
not trivial as everything seems to relying on a struct *page (see [1]).
Although, it may be possible that someone succeed to do it in the future
and we would therefore be able to use it.
Given that a block request may not fit in a single ring request, a
second request is introduced for the data that cannot fit in the first
one. This means that the second ring request should never be used on
Linux if the page size is smaller than 44KB.
To achieve the support of the extra ring request, the block queue size
is divided by two. Therefore, the ring will always contain enough space
to accommodate 2 ring requests. While this will reduce the overall
performance, it will make the implementation more contained. The way
forward to get better performance is to implement in the backend either
indirect descriptor or multiple grants ring.
Note that the parameters blk_queue_max_* helpers haven't been updated.
The block code will set the mimimum size supported and we may be able
to support directly any change in the block framework that lower down
the minimal size of a request.
[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
The code to get a request is always the same. Therefore we can factorize
it in a single function.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
xen_blkif_schedule() kthread calls try_to_freeze() at the beginning of
every attempt to purge the LRU. This operation can't ever succeed though,
as the kthread hasn't marked itself as freezable.
Before (hopefully eventually) kthread freezing gets converted to fileystem
freezing, we'd rather mark xen_blkif_schedule() freezable (as it can
generate I/O during suspend).
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
With the multi-queue support we could fail at setting up
some of the rings and fail the connection. That meant that
all resources tied to rings[0..n-1] (where n is the ring
that failed to be setup). Eventually the frontend will switch
to the states and we will call xen_blkif_disconnect.
However we do not want to be at the mercy of the frontend
deciding when to change states. This allows us to do the
cleanup right away and freeing resources.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Lets return sensible values instead of -1.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|