Age | Commit message (Collapse) | Author |
|
Currently, blkg_[rw]stat_recursive_sum() assume that the target
counter is located in pd (blkg_policy_data); however, some counters
are planned to be moved to blkg (blkcg_gq).
This patch updates blkg_[rw]stat_recursive_sum() to take blkg and
blkg_policy pointers instead of pd. If policy is NULL, it indexes
into blkg. If non-NULL, into the blkg's pd of the policy.
The existing usages are updated to maintain the current behaviors.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
blkcg_[rw]stat are used as stat counters for blkcg policies. It isn't
per-cpu by itself and blk-throttle makes it per-cpu by wrapping around
it. This patch makes blkcg_[rw]stat per-cpu and drop the ad-hoc
per-cpu wrapping in blk-throttle.
* blkg_[rw]stat->cnt is replaced with cpu_cnt which is struct
percpu_counter. This makes syncp unnecessary as remote accesses are
handled by percpu_counter itself.
* blkg_[rw]stat_init() can now fail due to percpu allocation failure
and thus are updated to return int.
* percpu_counters need explicit freeing. blkg_[rw]stat_exit() added.
* As blkg_rwstat->cpu_cnt[] can't be read directly anymore, reading
and summing results are stored in ->aux_cnt[] instead.
* Custom per-cpu stat implementation in blk-throttle is removed.
This makes all blkcg stat counters per-cpu without complicating policy
implmentations.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
cgroup stats are local to each cgroup and doesn't propagate to
ancestors by default. When recursive stats are necessary, the sum is
calculated over all the descendants. This initially was for backward
compatibility to support both group-local and recursive stats but this
mode of operation makes general sense as stat update is much hotter
thafn reporting those stats.
This however ends up losing recursive stats when a child is removed.
To work around this, cfq-iosched adds its stats to its parent
cfq_group->dead_stats which is summed up together when calculating
recursive stats.
It's planned that the core stats will be moved to blkcg_gq, so we want
to move the mechanism for keeping track of the stats of dead children
from cfq to blkcg core. This patch adds blkg_[rw]stat->aux_cnt which
are atomic64_t's keeping track of auxiliary counts which are excluded
when reading local counts but included for recursive.
blkg_[rw]stat_merge() which were used by cfq to implement dead_stats
are replaced by blkg_[rw]stat_add_aux(), and cfq now forwards stats of
a dead cgroup to the aux counts of parent->stats instead of separate
->dead_stats.
This will also help making blkg_[rw]stats per-cpu.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
blkg (blkcg_gq) currently is created by blkcg policies invoking
blkg_lookup_create() which ends up repeating about the same code in
different policies. Theoretically, this can avoid the overhead of
looking and/or creating blkg's if blkcg is enabled but no policy is in
use; however, the cost of blkg lookup / creation is very low
especially if only the root blkcg is in use which is highly likely if
no blkcg policy is in active use - it boils down to a single very
predictable conditional and surrounding RCU protection.
This patch consolidates blkg creation to a new function
blkcg_bio_issue_check() which is called during bio issue from
generic_make_request_checks(). blkcg_bio_issue_check() is now the
only function which tries to create missing blkg's. The subsequent
policy and request_list operations just perform blkg_lookup() and if
missing falls back to the root.
* blk_get_rl() no longer tries to create blkg. It uses blkg_lookup()
instead of blkg_lookup_create().
* blk_throtl_bio() is now called from blkcg_bio_issue_check() with rcu
read locked and blkg already looked up. Both throtl_lookup_tg() and
throtl_lookup_create_tg() are dropped.
* cfq is similarly updated. cfq_lookup_create_cfqg() is replaced with
cfq_lookup_cfqg()which uses blkg_lookup().
This consolidates blkg handling and avoids unnecessary blkg creation
retries under memory pressure. In addition, this provides a common
bio entry point into blkcg where things like common accounting can be
performed.
v2: Build fixes for !CONFIG_CFQ_GROUP_IOSCHED and
!CONFIG_BLK_DEV_THROTTLING.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
__blkg_lookup()
Currently, both throttle and cfq policies implement their own root
blkg (blkcg_gq) lookup fast path. This patch moves root blkg
optimization from throtl_lookup_tg() to __blkg_lookup(). cfq-iosched
currently doesn't use blkg_lookup() but will be converted and drop the
optimization too.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
blkg_lookup() checks whether the target queue is bypassing and, if
not, calls __blkg_lookup() which first checks the lookup hint and then
performs radix tree walk. The operations upto hint checking are
trivial and there are many users of this function. This patch inlines
blkg_lookup() and the fast path part of __blkg_lookup(). The radix
tree lookup and hint update are now in blkg_lookup_slowpath().
This will help consolidating blkg handling by easing moving root blkcg
short-circuit to inlined lookup fast path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Each active policy has a cpd (blkcg_policy_data) on each blkcg. The
cpd's were allocated by blkcg core and each policy could request to
allocate extra space at the end by setting blkcg_policy->cpd_size
larger than the size of cpd.
This is a bit unusual but blkg (blkcg_gq) policy data used to be
handled this way too so it made sense to be consistent; however, blkg
policy data switched to alloc/free callbacks.
This patch makes similar changes to cpd handling.
blkcg_policy->cpd_alloc/free_fn() are added to replace ->cpd_size. As
cpd allocation is now done from policy side, it can simply allocate a
larger area which embeds cpd at the beginning.
As ->cpd_alloc_fn() may be able to perform all necessary
initializations, this patch makes ->cpd_init_fn() optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
* Rename blkcg->pd[] to blkcg->cpd[] so that cpd is consistently used
for blkcg_policy_data.
* Make blkcg_policy->cpd_init_fn() take blkcg_policy_data instead of
blkcg. This makes it consistent with blkg_policy_data methods and
to-be-added cpd alloc/free methods.
* blkcg_policy_data->blkcg and cpd_to_blkcg() added so that
cpd_init_fn() can determine the associated blkcg from
blkcg_policy_data.
v2: blkcg_policy_data->blkcg initializations were missing. Added.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The newly added ->pd_alloc_fn() and ->pd_free_fn() deal with pd
(blkg_policy_data) while the older ones use blkg (blkcg_gq). As using
blkg doesn't make sense for ->pd_alloc_fn() and after allocation pd
can always be mapped to blkg and given that these are policy-specific
methods, it makes sense to converge on pd.
This patch makes all methods deal with pd instead of blkg. Most
conversions are trivial. In blk-cgroup.c, a couple method invocation
sites now test whether pd exists instead of policy state for
consistency. This shouldn't cause any behavioral differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
With the recent addition of alloc and free methods, things became
messier. This patch reorganizes them according to the followings.
* ->pd_alloc_fn()
Responsible for allocation and static initializations - the ones
which can be done independent of where the pd might be attached.
* ->pd_init_fn()
Initializations which require the knowledge of where the pd is
attached.
* ->pd_free_fn()
The counter part of pd_alloc_fn(). Static de-init and freeing.
This leaves ->pd_exit_fn() without any users. Removed.
While at it, collapse an one liner function throtl_pd_exit(), which
has only one user, into its user.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
A blkg (blkcg_gq) represents the relationship between a cgroup and
request_queue. Each active policy has a pd (blkg_policy_data) on each
blkg. The pd's were allocated by blkcg core and each policy could
request to allocate extra space at the end by setting
blkcg_policy->pd_size larger than the size of pd.
This is a bit unusual but was done this way mostly to simplify error
handling and all the existing use cases could be handled this way;
however, this is becoming too restrictive now that percpu memory can
be allocated without blocking.
This introduces two new mandatory blkcg_policy methods - pd_alloc_fn()
and pd_free_fn() - which are used to allocate and release pd for a
given policy. As pd allocation is now done from policy side, it can
simply allocate a larger area which embeds pd at the beginning. This
change makes ->pd_size pointless. Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When a policy gets activated, it needs to allocate and install its
policy data on all existing blkg's (blkcg_gq's). Because blkg
iteration is protected by a spinlock, it currently counts the total
number of blkg's in the system, allocates the matching number of
policy data on a list and installs them during a single iteration.
This can be simplified by using speculative GFP_NOWAIT allocations
while iterating and falling back to a preallocated policy data on
failure. If the preallocated one has already been consumed, it
releases the lock, preallocate with GFP_KERNEL and then restarts the
iteration. This can be a bit more expensive than before but policy
activation is a very cold path and shouldn't matter.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Since ec13b1d6f0a0 ("blkcg: always create the blkcg_gq for the root
blkcg"), a request_list always has its blkg associated. Drop
unnecessary rl->blkg NULL test from blk_put_rl().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The following tracepoints are updated to report the cgroup used during
cgroup writeback.
* writeback_write_inode[_start]
* writeback_queue
* writeback_exec
* writeback_start
* writeback_written
* writeback_wait
* writeback_nowork
* writeback_wake_background
* wbc_writepage
* writeback_queue_io
* bdi_dirty_ratelimit
* balance_dirty_pages
* writeback_sb_inodes_requeue
* writeback_single_inode[_start]
Note that writeback_bdi_register is separated out from writeback_class
as reporting cgroup doesn't make sense to it. Tracepoints which take
bdi are updated to take bdi_writeback instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Add a function to determine the path length of a kernfs node. This
for now will be used by writeback tracepoint updates.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
wb's (bdi_writeback's) are currently keyed by memcg ID; however, in an
earlier implementation, wb's were keyed by blkcg ID.
bdi_for_each_wb() walks bdi->cgwb_tree in the ascending ID order and
allows iterations to start from an arbitrary ID which is used to
interrupt and resume iterations.
Unfortunately, while changing wb to be keyed by memcg ID instead of
blkcg, bdi_for_each_wb() was missed and is still assuming that wb's
are keyed by blkcg ID. This doesn't affect iterations which don't get
interrupted but bdi_split_work_to_wbs() makes use of iteration
resuming on allocation failures and thus may incorrectly skip or
repeat wb's.
Fix it by changing bdi_for_each_wb() to take memcg IDs instead of
blkcg IDs and updating bdi_split_work_to_wbs() accordingly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup into for-4.3/blkcg
|
|
If the user selected the precise_timestamps or histogram options, report
it in the @stats_list message output.
If the user didn't select these options, no extra tokens are reported,
thus it is backward compatible with old software that doesn't know about
precise timestamps and histogram.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.2
|
|
Add an inline helper for determining is a port is a DSA port.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This allows cgroup subsystems to use a different name on the unified
hierarchy. cgroup_subsys->name is used on the unified hierarchy,
->legacy_name elsewhere. If ->legacy_name is not explicitly set, it's
automatically set to ->name and the userland visible behavior remains
unchanged.
v2: Make parse_cgroupfs_options() only consider ->legacy_name as mount
options are used only on legacy hierarchies. Suggested by Li
Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/dt
Fourth Round of Renesas ARM Based SoC DT Updates for v4.3
* Enable Clock Domain support of the Clock Pulse Generator (CPG)
Module Stop (MSTP) Clocks driver.
* tag 'renesas-dt4-for-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: shmobile: r8a7794 dtsi: Add CPG/MSTP Clock Domain
ARM: shmobile: r8a7793 dtsi: Add CPG/MSTP Clock Domain
ARM: shmobile: r8a7791 dtsi: Add CPG/MSTP Clock Domain
ARM: shmobile: r8a7790 dtsi: Add CPG/MSTP Clock Domain
ARM: shmobile: r8a7779 dtsi: Add CPG/MSTP Clock Domain
ARM: shmobile: r8a7778 dtsi: Add CPG/MSTP Clock Domain
ARM: shmobile: r7s72100 dtsi: Add CPG/MSTP Clock Domain
clk: shmobile: rz: Add CPG/MSTP Clock Domain support
clk: shmobile: rcar-gen2: Add CPG/MSTP Clock Domain support
clk: shmobile: r8a7779: Add CPG/MSTP Clock Domain support
clk: shmobile: r8a7778: Add CPG/MSTP Clock Domain support
clk: shmobile: Add CPG/MSTP Clock Domain support
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/drivers
Renesas ARM Based SoC CPG/MSTP Clock Driver Updates for v4.3
* Add Clock Domain support to the Clock Pulse Generator
(CPG) Module Stop (MSTP) Clocks driver using the generic PM Domain.
* tag 'renesas-clk-for-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
clk: shmobile: rz: Add CPG/MSTP Clock Domain support
clk: shmobile: rcar-gen2: Add CPG/MSTP Clock Domain support
clk: shmobile: r8a7779: Add CPG/MSTP Clock Domain support
clk: shmobile: r8a7778: Add CPG/MSTP Clock Domain support
clk: shmobile: Add CPG/MSTP Clock Domain support
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
A value of 2560 (1280k) will accommodate a 10-data-disk stripe
write with chunk size 128k. In the testing I've done using
iozone, fio, and aio-stress across a number of different storage
devices, a value of 1280 does not show a big performance
difference from 512, but will hopefully help software RAID
setups using SATA disks, as reported by Christoph.
NOTE: drivers/block/aoe/aoeblk.c sets its own max_hw_sectors_kb to
BLK_DEF_MAX_SECTORS. So, this patch essentially changes aeoblk to
Use a larger maximum sector size, and I did not test this.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc.
That commit caused performance regressions for streaming I/O
workloads on a number of different storage devices, from
SATA disks to external RAID arrays. It also managed to
trip up some buggy firmware in at least one drive, causing
data corruption.
The next patch will bump the default max_sectors_kb value to
1280, which will accommodate a 10-data-disk stripe write
with chunk size 128k. In the testing I've done using iozone,
fio, and aio-stress, a value of 1280 does not show a big
performance difference from 512. This will hopefully still
help the software RAID setup that Christoph saw the original
performance gains with while still not regressing other
storage configurations.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into next/dt
The i.MX device tree updates for 4.3:
- Add audio and eTSEC device support and update dspi node for LS1021A.
- Add initial i.MX6UL and imx6ul-14x14-evk board support, and enable
a bunch of device support for i.MX6UL, including RTC, power key, USB,
QSPI, and dual FEC.
- Enable HDMI and LVDS dual display support for a few imx6qdl boards.
- Support of imx6sl-warp board rev1.12, the version which will be
publicly available for the customers.
- A few i.MX7D device additions, watchdog, cortex-a7 coresight
components, RTC, power key, power off.
- Some Vybrid updates: add device support for I2C, QSPI, eSDHC etc.,
update ADC node, and define stdout-path property.
- A few random updates for i.MX27 and i.MX53 devices.
* tag 'imx-dt-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: (44 commits)
ARM: dts: imx6ul: add snvs power key support
ARM: dts: imx6ul: add RTC support
ARM: dts: imx6ul: enable GPC as extended interrupt controller
ARM: dts: imx6sx: correct property name for wakeup source
ARM: dts: add property for maximum ADC clock frequencies
ARM: dts: imx7d: enable snvs rtc, onoffkey and power off
ARM: dts: imx6ul-14x14-evk: add fec1 and fec2 support
ARM: dts: imx: add fec1 and fec2 nodes for SOC i.MX6UL
ARM: dts: imx27: add support of internal rtc
ARM: dts: vf-colibri: define stdout-path property
ARM: dts: ls1021a: Enable the eTSEC ports on QDS and TWR
ARM: dts: ls1021a: Add the eTSEC controller nodes
ARM: dts: imx6ul: add qspi support
ARM: dts: imx6ul: fix low case define in imx6ul-pinfunc.h
ARM: dts: imx6ul: add usb host and function support
ARM: dts: vfxxx: Add io-channel-cells property for ADC node
ARM: dts: ls1021a: Add dts nodes for audio on LS1021A
ARM: imx6qdl-sabreauto.dtsi: enable USB support
ARM: dts: imx: update snvs to use syscon access register
ARM: dts: imx: add imx6ul and imx6ul evk board support
...
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into next/soc
The i.MX SoC changes for 4.3:
- Add i.MX6 Ultralite SoC support, which is the newest addition to
i.MX6 family. It integrates a single Cortex-A7 core and a power
management module that reduces the complexity of external power
supply and simplifies power sequencing.
- Change SNVS RTC driver to use syscon interface for register access,
and add SNVS power key driver support.
- Add a second clock for mxc rtc driver, and support device tree probe
for the driver.
- Add FEC MAC reference clock and phy fixup initialization for i.MX6UL
platform.
* tag 'imx-soc-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
rtc: snvs: select option REGMAP_MMIO
ARM: imx6ul: add fec MAC refrence clock and phy fixup init
ARM: imx6ul: add fec bits to GPR syscon definition
rtc: mxc: add support of device tree
dt-binding: document the binding for mxc rtc
rtc: mxc: use a second rtc clock
input: snvs_pwrkey: use "wakeup-source" as deivce tree property name
Document: devicetree: input: imx: i.mx snvs power device tree bindings
input: keyboard: imx: add snvs power key driver
Document: dt: fsl: snvs: change support syscon
rtc: snvs: use syscon to access register
ARM: imx: add low-level debug support for i.mx6ul
ARM: imx: add i.mx6ul msl support
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
Controllers can perform optional subsystem resets as introduced in NVMe
1.1. This patch adds an IOCTL to trigger the subsystem reset by writing
"NVMe" to the NSSR register.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Controllers part of an NVMe subsystem may be reset by any other controller
in the subsystem. If the device is capable of subsystem resets, this
patch adds detection for such events and performs appropriate controller
initialization upon subsystem reset detection.
The register bit is a RW1C type, so the driver needs to write a 1 to the
status bit to clear the subsystem reset occured bit during initialization.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This reverts commit 484ebaedecc5ddf778a30ee1efab367cbee27030 as the
signed-off-by address is invalid.
Cc: Stefan Koch <stefan.koch10@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The header just includes definitions of hardware-specific numbers which
can be written directly in the device tree, there's no need for a public
header containing these definitions.
Signed-off-by: Alex Smith <alex.smith@imgtec.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
Cc: dmaengine@vger.kernel.org
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
|
|
authorization"
This reverts commit 1d958bef45030acfc5578263e9de3bb07032b8da as the
signed-off-by address is invalid.
Cc: Stefan Koch <stefan.koch10@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
authorization"
This reverts commit 3cf1fc80655d3af7083ea4b3615e5f8532543be7 as the
signed-off-by address is invalid.
Cc: Stefan Koch <stefan.koch10@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Since the topology API is still in sufficient flux for changes to be
identified disable the use of the userspace ABI by adding #error
statements to the code, ensuring that nobody relies on the headers as
currently defined. It is expected that this change will be reverted for
v4.3.
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Pull drm fixes from Dave Airlie:
"These came in late last week, I wanted to look over the mst one before
forwarding, but it seems good.
Just three i915 and one MST fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Commit planes on each crtc separately.
drm/i915: calculate primary visibility changes instead of calling from set_config
drm/i915: Only dither on 6bpc panels
drm/dp/mst: Remove port after removing connector.
|
|
iMX6 devices suffer from an errata (ERR005174) where the audio FIFO can
be emptied while it is partially full, resulting in misalignment of the
audio samples.
To prevent this, the errata workaround recommends writing N as zero
until the audio FIFO has been loaded by DMA. Writing N=0 prevents the
HDMI bridge from reading from the audio FIFO, effectively disabling
audio.
This means we need to provide the audio driver with a pair of functions
to enable/disable audio. These are dw_hdmi_audio_enable() and
dw_hdmi_audio_disable().
A spinlock is introduced to ensure that setting the CTS/N values can't
race, ensuring that the audio driver calling the enable/disable
functions (which are called in an atomic context) can't race with a
modeset.
Tested-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Introduce dw_hdmi_set_sample_rate(), which allows us to configure the
audio sample rate, setting the CTS/N values appropriately.
Tested-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Add a function to find the start of the SADs in the ELD. This
complements the helper to retrieve the SAD count.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
userspace programs using cxl currently have to use two strategies for
dealing with MMIO errors simultaneously. They have to check every read
for a return of all Fs in case the adapter has gone away and the kernel
has not yet noticed, and they have to deal with SIGBUS in case the
kernel has already noticed, invalidated the mapping and marked the
context as failed.
In order to simplify things, this patch adds an alternative approach
where the kernel will return a page filled with Fs instead of delivering
a SIGBUS. This allows userspace to only need to deal with one of these
two error paths, and is intended for use in libraries that use cxl
transparently and may not be able to safely install a signal handler.
This approach will only work if certain constraints are met. Namely, if
the application is both reading and writing to an address in the problem
state area it cannot assume that a non-FF read is OK, as it may just be
reading out a value it has previously written. Further - since only one
page is used per context a write to a given offset would be visible when
reading the same offset from a different page in the mapping (this only
applies within a single context, not between contexts).
An application could deal with this by e.g. making sure it also reads
from a read-only offset after any reads to a read/write offset.
Due to these constraints, this functionality must be explicitly
requested by userspace when starting the context by passing in the
CXL_START_WORK_ERR_FF flag.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Conflicts:
arch/x86/entry/entry_64_compat.S
arch/x86/math-emu/get_address.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This switches the BCMA GPIO driver to use GPIOLIB_IRQCHIP to
handle its interrupts instead of rolling its own copy of the
irqdomain handling etc.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Allow the topology code to be compiled out so that users who don't need
topology don't need to havve the code compiled in, saving them some
memory.
Some more configuration could be added to remove some of the hooks into
the core data structures but that is probably best done with some
refactoring to use functions to do the updates of the data structures
rather than ifdefing in the code as we'd need to do at the minute.
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
iwlwifi needs new mac80211 patches so merge mac80211-next.git to
wireless-drivers-next.git.
|
|
Adding new module name ila. This implements ILA translation. Light
weight tunnel redirection is used to perform the translation in
the data path. This is configured by the "ip -6 route" command
using the "encap ila <locator>" option, where <locator> is the
value to set in destination locator of the packet. e.g.
ip -6 route add 3333:0:0:1:5555:0:1:0/128 \
encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0
Sets a route where 3333:0:0:1 will be overwritten by
2001:0:0:1 on output.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This function updates a checksum field value and skb->csum based on
a value which is the difference between the old and new checksum.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates
the checksum field carries a pseudo header. This argument should be a
boolean instead of an int.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the capability to redirect dst input in the same way
that dst output is redirected by LWT.
Also, save the original dst.input and and dst.out when setting up
lwtunnel redirection. These can be called by the client as a pass-
through.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There's a small consistency problem between the inode and writeback
naming. Writeback calls the "for IO" inode queues b_io and
b_more_io, but the inode calls these the "writeback list" or
i_wb_list. This makes it hard to an new "under writeback" list to
the inode, or call it an "under IO" list on the bdi because either
way we'll have writeback on IO and IO on writeback and it'll just be
confusing. I'm getting confused just writing this!
So, rename the inode "for IO" list variable to i_io_list so we can
add a new "writeback list" in a subsequent patch.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
|
|
This work adds the possibility of deriving the zone id from the skb->mark
field in a scalable manner. This allows for having only a single template
serving hundreds/thousands of different zones, for example, instead of the
need to have one match for each zone as an extra CT jump target.
Note that we'd need to have this information attached to the template as at
the time when we're trying to lookup a possible ct object, we already need
to know zone information for a possible match when going into
__nf_conntrack_find_get(). This work provides a minimal implementation for
a possible mapping.
In order to not add/expose an extra ct->status bit, the zone structure has
been extended to carry a flag for deriving the mark.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.
The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.
As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.
Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone->id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.
If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).
Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.
Below a minimal, simplified collision example (script in [2]) with
netperf sessions:
+--- tenant-1 ---+ mark := 1
| netperf |--+
+----------------+ | CT zone := mark [ORIGINAL]
[ip,sport] := X +--------------+ +--- gateway ---+
| mark routing |--| SNAT |-- ... +
+--------------+ +---------------+ |
+--- tenant-2 ---+ | ~~~|~~~
| netperf |--+ +-----------+ |
+----------------+ mark := 2 | netserver |------ ... +
[ip,sport] := X +-----------+
[ip,port] := Y
On the gateway netns, example:
iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
conntrack dump from gateway netns:
netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.
I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.
[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Three minor device-specific fixes and revert of NCQ autosense added
during this -rc1.
It turned out that NCQ autosense as currently implemented interferes
with the usual error handling behavior. It will be revisited in the
near future"
* 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: ahci_brcmstb: Fix misuse of IS_ENABLED
sata_sx4: Check return code from pdc20621_i2c_read()
Revert "libata: Implement NCQ autosense"
Revert "libata: Implement support for sense data reporting"
Revert "libata-eh: Set 'information' field for autosense"
ata: ahci_brcmstb: Fix warnings with CONFIG_PM_SLEEP=n
|