Age | Commit message (Collapse) | Author |
|
Currently, a number of structure and function definitions related
to the broadcast functionality are unnecessarily exposed in the file
bcast.h. This obscures the fact that the external interface towards
the broadcast link in fact is very narrow, and causes unnecessary
recompilations of other files when anything changes in those
definitions.
In this commit, we move as many of those definitions as is currently
possible to the file bcast.c.
We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
since the name does not correctly describe the contents of this
struct, and will do so even less in the future, and because we want
to use the term 'link' more appropriately in the functionality
introduced later in this series.
Finally, we rename a couple of functions, such as tipc_bclink_xmit()
and others that will be kept in the future, to include the term 'bcast'
instead.
There are no functional changes in this commit.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
net/ipv6/xfrm6_output.c
net/openvswitch/flow_netlink.c
net/openvswitch/vport-gre.c
net/openvswitch/vport-vxlan.c
net/openvswitch/vport.c
net/openvswitch/vport.h
The openvswitch conflicts were overlapping changes. One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.
The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-10-22
Here's probably the last bluetooth-next pull request for 4.4. Among
several other changes it contains the rest of the fixes & cleanups from
the Bluetooth UnplugFest (that didn't need to be hurried to 4.3).
- Refactoring & cleanups to 6lowpan code
- New USB ids for two Atheros controllers and BCM43142A0 from Broadcom
- Fix (quirk) for broken Broadcom BCM2045 controllers
- Support for latest Apple controllers
- Improvements to the vendor diagnostic message support
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are still four devices which are currently supported both by the new
rtl8xxxu driver and rtlwifi. To not break existing setups enable the support
for these four devices only when CONFIG_RTL8XXXU_UNTESTED is turned on.
Once rtl8xxxu support is found to be good enough the devices can be removed
from rtlwifi and enabled by default in rtl8xxxu.
Reported-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
This is the log recovery support. The process is quite straightforward.
We scan the log and read all valid meta/data/parity into memory. If a
stripe's data/parity checksum is correct, the stripe will be recoveried.
Otherwise, it's discarded and we don't scan the log further. The reclaim
process guarantees stripe which starts to be flushed raid disks has
completed data/parity and has correct checksum. To recovery a stripe, we
just copy its data/parity to corresponding raid disks.
The trick thing is superblock update after recovery. we can't let
superblock point to last valid meta block. The log might look like:
| meta 1| meta 2| meta 3|
meta 1 is valid, meta 2 is invalid. meta 3 could be valid. If superblock
points to meta 1, we write a new valid meta 2n. If crash happens again,
new recovery will start from meta 1. Since meta 2n is valid, recovery
will think meta 3 is valid, which is wrong. The solution is we create a
new meta in meta2 with its seq == meta 1's seq + 10 and let superblock
points to meta2. recovery will not think meta 3 is a valid meta,
because its seq is wrong
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
This is the reclaim support for raid5 log. A stripe write will have
following steps:
1. reconstruct the stripe, read data/calculate parity. ops_run_io
prepares to write data/parity to raid disks
2. hijack ops_run_io. stripe data/parity is appending to log disk
3. flush log disk cache
4. ops_run_io run again and do normal operation. stripe data/parity is
written in raid array disks. raid core can return io to upper layer.
5. flush cache of all raid array disks
6. update super block
7. log disk space used by the stripe can be reused
In practice, several stripes consist of an io_unit and we will batch
several io_unit in different steps, but the whole process doesn't
change.
It's possible io return just after data/parity hit log disk, but then
read IO will need read from log disk. For simplicity, IO return happens
at step 4, where read IO can directly read from raid disks.
Currently reclaim run if there is specific reclaimable space (1/4 disk
size or 10G) or we are out of space. Reclaim is just to free log disk
spaces, it doesn't impact data consistency. The size based force reclaim
is to make sure log isn't too big, so recovery doesn't scan log too
much.
Recovery make sure raid disks and log disk have the same data of a
stripe. If crash happens before 4, recovery might/might not recovery
stripe's data/parity depending on if data/parity and its checksum
matches. In either case, this doesn't change the syntax of an IO write.
After step 3, stripe is guaranteed recoverable, because stripe's
data/parity is persistent in log disk. In some cases, log disk content
and raid disks content of a stripe are the same, but recovery will still
copy log disk content to raid disks, this doesn't impact data
consistency. space reuse happens after superblock update and cache
flush.
There is one situation we want to avoid. A broken meta in the middle of
a log causes recovery can't find meta at the head of log. If operations
require meta at the head persistent in log, we must make sure meta
before it persistent in log too. The case is stripe data/parity is in
log and we start write stripe to raid disks (before step 4). stripe
data/parity must be persistent in log before we do the write to raid
disks. The solution is we restrictly maintain io_unit list order. In
this case, we only write stripes of an io_unit to raid disks till the
io_unit is the first one whose data/parity is in log.
The io_unit list order is important for other cases too. For example,
some io_unit are reclaimable and others not. They can be mixed in the
list, we shouldn't reuse space of an unreclaimable io_unit.
Includes fixes to problems which were...
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
This introduces a simple log for raid5. Data/parity writing to raid
array first writes to the log, then write to raid array disks. If
crash happens, we can recovery data from the log. This can speed up
raid resync and fix write hole issue.
The log structure is pretty simple. Data/meta data is stored in block
unit, which is 4k generally. It has only one type of meta data block.
The meta data block can track 3 types of data, stripe data, stripe
parity and flush block. MD superblock will point to the last valid
meta data block. Each meta data block has checksum/seq number, so
recovery can scan the log correctly. We store a checksum of stripe
data/parity to the metadata block, so meta data and stripe data/parity
can be written to log disk together. otherwise, meta data write must
wait till stripe data/parity is finished.
For stripe data, meta data block will record stripe data sector and
size. Currently the size is always 4k. This meta data record can be made
simpler if we just fix write hole (eg, we can record data of a stripe's
different disks together), but this format can be extended to support
caching in the future, which must record data address/size.
For stripe parity, meta data block will record stripe sector. It's
size should be 4k (for raid5) or 8k (for raid6). We always store p
parity first. This format should work for caching too.
flush block indicates a stripe is in raid array disks. Fixing write
hole doesn't need this type of meta data, it's for caching extension.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
When a stripe finishes construction, we write the stripe to raid in
ops_run_io normally. With log, we do a bunch of other operations before
the stripe is written to raid. Mainly write the stripe to log disk,
flush disk cache and so on. The operations are still driven by raid5d
and run in the stripe state machine. We introduce a new state for such
stripe (trapped into log). The stripe is in this state from the time it
first enters ops_run_io (finish construction) to the time it is written
to raid. Since we know the state is only for log, we bypass other
check/operation in handle_stripe.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
Next several patches use some raid5 functions, rename them with raid5
prefix and export out.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
Journal device stores data in a log structure. We need record the log
start. Here we override md superblock recovery_offset for this purpose.
This field of a journal device is meaningless otherwise.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
Next patches will use a disk as raid5/6 journaling. We need a new disk
role to present the journal device and add MD_FEATURE_JOURNAL to
feature_map for backward compability.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
Add the following two macros for special roles: spare and faulty
MD_DISK_ROLE_SPARE 0xffff
MD_DISK_ROLE_FAULTY 0xfffe
Add MD_DISK_ROLE_MAX 0xff00 as the maximal possible regular role,
and minimal value of special role.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
To incorporate --grow feature executed on one node, other nodes need to
acknowledge the change in number of disks. Call update_raid_disks()
to update internal data structures.
This leads to call check_reshape() -> md_allow_write() -> md_update_sb(),
this results in a deadlock. This is done so it can safely allocate memory
(which might trigger writeback which might write to raid1). This is
not required for md with a bitmap.
In the clustered case, we don't perform md_update_sb() in md_allow_write(),
but in do_md_run(). Also we disable safemode for clustered mode.
mddev->recovery_cp need not be set in check_sb_changes() because this
is required only when a node reads another node's bitmap. mddev->recovery_cp
(which is read from sb->resync_offset), is set only if mddev is in_sync.
Since we disabled safemode, in_sync is set to zero.
In a clustered environment, the MD may not be in sync because another
node could be writing to it. So make sure that in_sync is not set in
case of clustered node in __md_stop_writes().
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
The arg isn't used, so its presence is only confusing.
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
It is common practice in the kernel to leave out this case.
It isn't needed and adds little if any value.
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
This patches fixes sparse warnings like incorrect type in assignment
(different base types), cast to restricted __le64.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
In Linux 3.9 we introduce a new 'far' layout for RAID10 which was
supposed to rotate the replicas differently and so provide better
resilience. In particular it could survive more combinations of 2
drive failures.
Unfortunately. due to a coding error, this some did what was wanted,
sometimes improved less than we hoped, and sometimes - in very
unlikely circumstances - put multiple replicas on the same device so
the redundancy was harmed.
No public user-space tool has created arrays using this layout so it
is very unlikely that zero-redundancy arrays actually exist. Probably
no arrays using any form of the new layout exist. But we cannot be
certain.
So use another bit in the 'layout' number and introduce a bug-fixed
version of the layout.
Also when assembling an array, if it has a zero-redundancy layout,
give a warning.
Reported-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data. If
this succeeds then it is OK clear the relevant bitmap-bit as
no further 'sync' of the block is needed.
However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit. Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced. This leads to data corruption.
We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.
So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.
This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.
Backports will require at least
Commit: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.")
as well. I'll send that to 'stable' separately.
Note that of the two tests of R10BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed. However doing it this way makes the
patch more obviously correct. I will tidy the code up in a
future merge window.
Reported-by: Nate Dailey <nate.dailey@stratus.com>
Fixes: bd870a16c594 ("md/raid10: Handle write errors by updating badblock log.")
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data. If
this succeeds then it is OK clear the relevant bitmap-bit as
no further 'sync' of the block is needed.
However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit. Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced. This leads to data corruption.
We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.
So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.
This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.
Backports will require at least
Commit: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.")
as well. I'll send that to 'stable' separately.
Note that of the two tests of R1BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed. However doing it this way makes the
patch more obviously correct. I will tidy the code up in a
future merge window.
Reported-and-tested-by: Nate Dailey <nate.dailey@stratus.com>
Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
Fixes: cd5ff9a16f08 ("md/raid1: Handle write errors by updating badblock log.")
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
Some tools have a lot of options, so, providing a way to show help just
for some of them may come handy:
$ perf report -h --tui
Usage: perf report [<options>]
--tui Use the TUI interface
$ perf report -h --tui --showcpuutilization -b -c
Usage: perf report [<options>]
-b, --branch-stack use branch records for per branch histogram filling
-c, --comms <comm[,comm...]>
only consider symbols in these comms
--showcpuutilization
Show sample percentage for different cpu modes
--tui Use the TUI interface
$
Using it with perf bash completion is also handy, just make sure you
source the needed file:
$ . ~/git/linux/tools/perf/perf-completion.sh
Then press tab/tab after -- to see a list of options, put them after -h
and only the options chosen will have its help presented:
$ perf report -h --
--asm-raw --demangle-kernel --group
--kallsyms --pretty --stdio
--branch-history --disassembler-style --gtk
--max-stack --showcpuutilization --symbol-filter
--branch-stack --dsos --header
--mem-mode --show-info --symbols
--call-graph --dump-raw-trace --header-only
--modules --show-nr-samples --symfs
--children --exclude-other --hide-unresolved
--objdump --show-ref-call-graph --threads
--column-widths --fields --ignore-callees
--parent --show-total-period --tid
--comms --field-separator --input
--percentage --socket-filter --tui
--cpu --force --inverted
--percent-limit --sort --verbose
--demangle --full-source-path --itrace
--pid --source --vmlinux
$ perf report -h --socket-filter
Usage: perf report [<options>]
--socket-filter <n>
only show processor socket that match with this filter
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-83mcdd3wj0379jcgea8w0fxa@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When asking for a listing of the options, be it using -h or when an
unknown option is passed, order it by one-letter options, then the ones
having just long names.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-41qh68t35n4ehrpsuazp1dx8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
struct bnxt_pf_info needs to be always defined. Move bnxt_update_vf_mac()
to bnxt_sriov.c and add some missing #ifdef CONFIG_BNXT_SRIOV.
Reported-by: Jim Hull <jim.hull@hpe.com>
Tested-by: Jim Hull <jim.hull@hpe.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Upon registering a FPGA Manager low level driver, FPGA Manager
core overwrites the platform drvdata pointer. Prior to this commit
zynq-fpga falsely relied on this pointer to still be valid at remove()
time.
Reported-by: Alan Tull <atull@opensource.altera.com>
Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com>
Acked-by: Alan Tull <atull@opensource.altera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Remove unnecessary null pointer checks. We want the caller of
these functions to do their own pointer checks. Add some
comments to document this.
Signed-off-by: Alan Tull <atull@opensource.altera.com>
Reviewed-by: Moritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Ensure device and driver lifetime from of_fpga_mgr_get() to
fpga_mgr_put().
* Don't put_device() in of_fpga_mgr_get, do it in fpga_mgr_put().
(still do put_device if there is an error).
* Do module_get on the low level driver.
* Don't need to module_get(THIS_MODULE) since we won't be allowed
to unload the fpga manager core without unloading low level
driver first.
* Remove unnedessary null check for node pointer.
Signed-off-by: Alan Tull <atull@opensource.altera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This gets rid of the code to strip away the header and byteswap,
as well as the check for the sync word.
Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com>
Reviewed-by: Josh Cartwright <joshc@ni.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This commit fixes the unbalanced clock handling, where
a failed probe would leave the clock with an enable count of -1.
Reported-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are three xhci driver fixes for reported issues for 4.3-rc7
All have been in linux-next for a while with no problems"
* tag 'usb-4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: Add spurious wakeup quirk for LynxPoint-LP controllers
xhci: handle no ping response error properly
xhci: don't finish a TD if we get a short transfer event mid TD
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are two fixes that resolve reported issues, one with the 8250
driver, and the other with the generic fbcon driver.
Both have been in linux-next for a while"
* tag 'tty-4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
fbcon: initialize blink interval before calling fb_set_par
Revert "serial: 8250_dma: don't bother DMA with small transfers"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are four iio driver fixes for 4.3-rc7, fixing some reported
issues. All of these have been in linux-next for a while"
* tag 'staging-4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: mxs-lradc: Fix temperature offset
iio: accel: sca3000: memory corruption in sca3000_read_first_n_hw_rb()
iio: st_accel: fix interrupt handling on LIS3LV02
iio: adc: twl4030: Fix ADC[3:6] readings
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull infiniband fixes from Doug Ledford:
"It's late in the game, I know, but these fixes seemed important enough
to warrant a late pull request. They all involve oopses or use after
frees or corruptions.
Six serious fixes:
- Hold the mutex around the find and corresponding update of our gid
- The ifa list is rcu protected, copy its contents under rcu to avoid
using a freed structure
- On error, netdev might be null, so check it before trying to
release it
- On init, if workqueue alloc fails, fail init
- The new demux patches exposed a bug in mlx5 and ipath drivers, we
need to use the payload P_Key to determine the P_Key the packet
arrived on because the hardware doesn't tell us the truth
- Due to a couple convoluted error flows, it is possible for the CM
to trigger a use_after_free and a double_free of rb nodes. Add two
checks to prevent that. This code has worked for 10+ years. It is
likely that some of the recent changes have caused this issue to
surface. The current patch will protect us from nasty events for
now while we track down why this is just now showing up"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/cm: Fix rb-tree duplicate free and use-after-free
IB/cma: Use inner P_Key to determine netdev
IB/ucma: check workqueue allocation before usage
IB/cma: Potential NULL dereference in cma_id_from_event
IB/core: Fix use after free of ifa
IB/core: Fix memory corruption in ib_cache_gid_set_default_gid
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Three stable fixes (two in btree code used by DM thinp and one to
properly store flags in DM cache metadata's superblock)"
* tag 'dm-4.3-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: the CLEAN_SHUTDOWN flag was not being set
dm btree: fix leak of bufio-backed block in btree_split_beneath error path
dm btree remove: fix a bug when rebalancing nodes after removal
|
|
Pull block layer fixes from Jens Axboe:
"A final set of fixes for 4.3.
It is (again) bigger than I would have liked, but it's all been
through the testing mill and has been carefully reviewed by multiple
parties. Each fix is either a regression fix for this cycle, or is
marked stable. You can scold me at KS. The pull request contains:
- Three simple fixes for NVMe, fixing regressions since 4.3. From
Arnd, Christoph, and Keith.
- A single xen-blkfront fix from Cathy, fixing a NULL dereference if
an error is returned through the staste change callback.
- Fixup for some bad/sloppy code in nbd that got introduced earlier
in this cycle. From Markus Pargmann.
- A blk-mq tagset use-after-free fix from Junichi.
- A backing device lifetime fix from Tejun, fixing a crash.
- And finally, a set of regression/stable fixes for cgroup writeback
from Tejun"
* 'for-linus' of git://git.kernel.dk/linux-block:
writeback: remove broken rbtree_postorder_for_each_entry_safe() usage in cgwb_bdi_destroy()
NVMe: Fix memory leak on retried commands
block: don't release bdi while request_queue has live references
nvme: use an integer value to Linux errno values
blk-mq: fix use-after-free in blk_mq_free_tag_set()
nvme: fix 32-bit build warning
writeback: fix incorrect calculation of available memory for memcg domains
writeback: memcg dirty_throttle_control should be initialized with wb->memcg_completions
writeback: bdi_writeback iteration must not skip dying ones
writeback: fix bdi_writeback iteration in wakeup_dirtytime_writeback()
writeback: laptop_mode_timer_fn() needs rcu_read_lock() around bdi_writeback iteration
nbd: Add locking for tasks
xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"Two fixes.
One is a stopgap to prevent a stack blowout when users have a deep
chain of image clones. (We'll rewrite this code to be non-recursive
for the next window, but in the meantime this is a simple fix that
avoids a crash.)
The second fixes a refcount underflow"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: prevent kernel stack blow up on rbd map
rbd: don't leak parent_spec in rbd_dev_probe_parent()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"I have two more small fixes this week:
Qu's fix avoids unneeded COW during fallocate, and Christian found a
memory leak in the error handling of an earlier fix"
* 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix possible leak in btrfs_ioctl_balance()
btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size
|
|
The driver can not be used on a platform with common clock framework
until clk_prepare/clk_unprepare calls are added, otherwise clk_enable
calls will fail and a WARN is generated.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
If the CLEAN_SHUTDOWN flag is not set when a cache is loaded then all cache
blocks are marked as dirty and a full writeback occurs.
__commit_transaction() is responsible for setting/clearing
CLEAN_SHUTDOWN (based the flags_mutator that is passed in).
Fix this issue, of the cache's on-disk flags being wrong, by making sure
__commit_transaction() does not reset the flags after the mutator has
altered the flags in preparation for them being serialized to disk.
before:
sb_flags = mutator(le32_to_cpu(disk_super->flags));
disk_super->flags = cpu_to_le32(sb_flags);
disk_super->flags = cpu_to_le32(cmd->flags);
after:
disk_super->flags = cpu_to_le32(cmd->flags);
sb_flags = mutator(le32_to_cpu(disk_super->flags));
disk_super->flags = cpu_to_le32(sb_flags);
Reported-by: Bogdan Vasiliev <bogdan.vasiliev@gmail.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
|
|
btree_split_beneath()'s error path had an outstanding FIXME that speaks
directly to the potential for _not_ cleaning up a previously allocated
bufio-backed block.
Fix this by releasing the previously allocated bufio block using
unlock_block().
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <thornber@redhat.com>
Cc: stable@vger.kernel.org
|
|
Commit 4c7e309340ff ("dm btree remove: fix bug in redistribute3") wasn't
a complete fix for redistribute3().
The redistribute3 function takes 3 btree nodes and shares out the entries
evenly between them. If the three nodes in total contained
(MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting
rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in
the center.
Fix this issue by being more careful about calculating the target number
of entries for the left and right nodes.
Unit tested in userspace using this program:
https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
|
|
Mapping an image with a long parent chain (e.g. image foo, whose parent
is bar, whose parent is baz, etc) currently leads to a kernel stack
overflow, due to the following recursion in the reply path:
rbd_osd_req_callback()
rbd_obj_request_complete()
rbd_img_obj_callback()
rbd_img_parent_read_callback()
rbd_obj_request_complete()
...
Limit the parent chain to 16 images, which is ~5K worth of stack. When
the above recursion is eliminated, this limit can be lifted.
Fixes: http://tracker.ceph.com/issues/12538
Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
|
|
Currently we leak parent_spec and trigger a "parent reference
underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails.
The problem is we take the !parent out_err branch and that only drops
refcounts; parent_spec that would've been freed had we called
rbd_dev_unparent() remains and triggers rbd_warn() in
rbd_dev_parent_put() - at that point we have parent_spec != NULL and
parent_ref == 0, so counter ends up being -1 after the decrement.
Redo rbd_dev_probe_parent() to fix this.
Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
Fixes an error on resume caused by:
fa022a9b65d2886486a022fd66b20c823cd76ad9
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Fixes a harmless error message caused by:
51a4726b04e880fdd9b4e0e58b13f70b0a68a7f5
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
reservation_object_get_fences_rcu already takes the references.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Jammy Zhou <Jammy.Zhou@amd.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-10-23
This series contains updates to i40e, i40evf, if_link, ixgbe and ixgbevf.
Anjali adds a workaround to drop any flow control frames from being
transmitted from any VSI, so that a malicious VF cannot send flow control
or PFC packets out on the wire. Also fixed a bug in debugfs by grabbing
the filter list lock before adding or deleting a filter.
Akeem fixes an issue where we were unconditionally returning VEB bridge
mode before allowing LB in the add VSI routine, resolve by checking if
the bridge is actually in VEB mode first.
Mitch fixed an issue where the incorrect structure was being used for
VLAN filter list, which meant the VLAN filter list did not get
processed correctly and VLAN filters would not be re-enabled after any
kind of reset.
Helin fixed a problem of possibly getting inconsistent flow control
status after a PF reset. The issue was requested_mode was being set
with a default value during probe, but the hardware state could be a
different value from this mode.
Carolyn fixed a problem where the driver output of the OEM version
string varied from the other tools.
Jean Sacren fixes up kernel documentation by fixing function header
comments to match actual variables used in the functions. Also
cleaned up variable initialization, when the variable would be
over-written immediately.
Hiroshi Shimanoto provides three patches to add "trusted" VF by adding
netlink directives and an NDO entry. Then implement these new controls
in ixgbe and ixgbevf. This series has gone through several iterations
to address all the suggested community changes and concerns.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes: two KASAN fixes, two EFI boot fixes, two boot-delay
optimization fixes, and a fix for a IRQ handling hang observed on
virtual platforms"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm, kasan: Silence KASAN warnings in get_wchan()
compiler, atomics, kasan: Provide READ_ONCE_NOCHECK()
x86, kasan: Fix build failure on KASAN=y && KMEMCHECK=y kernels
x86/smpboot: Fix CPU #1 boot timeout
x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior
x86/ioapic: Disable interrupts when re-routing legacy IRQs
x86/setup: Extend low identity map to cover whole kernel range
x86/efi: Fix multiple GOP device support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc fixes all around the map: an instrumentation fix, a nohz
usability fix, a lockdep annotation fix and two task group scheduling
fixes"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Add missing lockdep_unpin() annotations
sched/deadline: Fix migration of SCHED_DEADLINE tasks
nohz: Revert "nohz: Set isolcpus when nohz_full is set"
sched/fair: Update task group's load_avg after task migration
sched/fair: Fix overly small weight for interactive group entities
sched, tracing: Stop/start critical timings around the idle=poll idle loop
|
|
Roopa Prabhu says:
====================
mpls: multipath support
This patch adds support for MPLS multipath routes.
Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.
- struct mpls_nh represents a mpls nexthop label forwarding entry
- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib
- In the process of restructuring, this patch also consistently changes all
labels to u8
$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1
nexthop as to 700 via inet 10.1.1.6 dev swp2
nexthop as to 800 via inet 40.1.1.2 dev swp3
====================
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change the selection of a multipath route to use a flow-based
hash. This more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN) and whilst still allowing a good distribution
of traffic given enough flows.
Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
payload, if present.
Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|