path: root/include
Age | Commit message | Author
2015-08-18 | blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq | Tejun Heo
Currently, blkg_[rw]stat_recursive_sum() assume that the target counter is located in pd (blkg_policy_data); however, some counters are planned to be moved to blkg (blkcg_gq). This patch updates blkg_[rw]stat_recursive_sum() to take blkg and blkg_policy pointers instead of pd. If policy is NULL, it indexes into blkg. If non-NULL, into the blkg's pd of the policy. The existing usages are updated to maintain the current behaviors. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
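The NULL-versus-policy indexing rule described above can be sketched in plain C. This is only an illustration under stated assumptions: the structures are hypothetical, heavily simplified stand-ins for the kernel ones, and stat_at() is an invented helper name, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct blkg_stat { uint64_t cnt; };

struct blkg_policy_data { struct blkg_stat served; };

struct blkcg_gq {
	struct blkg_stat bytes;           /* a counter living in the blkg itself */
	struct blkg_policy_data *pd[2];   /* one pd slot per policy */
};

struct blkcg_policy { int plid; };

/*
 * Resolve a counter from a byte offset. If pol is NULL the offset is
 * relative to the blkg; otherwise it is relative to that policy's pd.
 */
static struct blkg_stat *stat_at(struct blkcg_gq *blkg,
				 const struct blkcg_policy *pol, size_t off)
{
	char *base = pol ? (char *)blkg->pd[pol->plid] : (char *)blkg;

	return (struct blkg_stat *)(base + off);
}
```

A caller would pass offsetof(struct blkcg_gq, bytes) with a NULL policy, or offsetof(struct blkg_policy_data, served) together with a policy pointer.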
2015-08-18 | blkcg: make blkcg_[rw]stat per-cpu | Tejun Heo
blkcg_[rw]stat are used as stat counters for blkcg policies. They aren't per-cpu by themselves and blk-throttle makes them per-cpu by wrapping around them. This patch makes blkcg_[rw]stat per-cpu and drops the ad-hoc per-cpu wrapping in blk-throttle.
* blkg_[rw]stat->cnt is replaced with cpu_cnt which is struct percpu_counter. This makes syncp unnecessary as remote accesses are handled by percpu_counter itself.
* blkg_[rw]stat_init() can now fail due to percpu allocation failure and thus are updated to return int.
* percpu_counters need explicit freeing. blkg_[rw]stat_exit() added.
* As blkg_rwstat->cpu_cnt[] can't be read directly anymore, reading and summing results are stored in ->aux_cnt[] instead.
* Custom per-cpu stat implementation in blk-throttle is removed.
This makes all blkcg stat counters per-cpu without complicating policy implementations. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it | Tejun Heo
cgroup stats are local to each cgroup and don't propagate to ancestors by default. When recursive stats are necessary, the sum is calculated over all the descendants. This initially was for backward compatibility to support both group-local and recursive stats but this mode of operation makes general sense as stat update is much hotter than reporting those stats. This however ends up losing recursive stats when a child is removed. To work around this, cfq-iosched adds its stats to its parent cfq_group->dead_stats which is summed up together when calculating recursive stats. It's planned that the core stats will be moved to blkcg_gq, so we want to move the mechanism for keeping track of the stats of dead children from cfq to blkcg core. This patch adds blkg_[rw]stat->aux_cnt which are atomic64_t's keeping track of auxiliary counts which are excluded when reading local counts but included for recursive. blkg_[rw]stat_merge() which were used by cfq to implement dead_stats are replaced by blkg_[rw]stat_add_aux(), and cfq now forwards stats of a dead cgroup to the aux counts of parent->stats instead of separate ->dead_stats. This will also help making blkg_[rw]stats per-cpu. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
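As a rough model of the local-versus-recursive split described above, the following userspace sketch uses plain integers where the kernel uses atomic64_t. The struct and function names are hypothetical and only mirror the idea of the aux counts.

```c
#include <stdint.h>

/* Hypothetical model: plain integers instead of atomic64_t. */
struct stat_sketch {
	uint64_t cnt;      /* counts charged directly to this group */
	uint64_t aux_cnt;  /* counts inherited from removed children */
};

/* Local reads ignore the auxiliary count. */
static uint64_t stat_read_local(const struct stat_sketch *s)
{
	return s->cnt;
}

/* This group's contribution to a recursive sum includes it. */
static uint64_t stat_read_for_recursive(const struct stat_sketch *s)
{
	return s->cnt + s->aux_cnt;
}

/* On child removal, fold everything the child holds into the parent's aux. */
static void stat_add_aux(struct stat_sketch *to, const struct stat_sketch *from)
{
	to->aux_cnt += from->cnt + from->aux_cnt;
}
```

With this split, a parent's local numbers stay unchanged when a child goes away, while recursive totals keep the child's history.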
2015-08-18 | blkcg: consolidate blkg creation in blkcg_bio_issue_check() | Tejun Heo
blkg (blkcg_gq) currently is created by blkcg policies invoking blkg_lookup_create() which ends up repeating about the same code in different policies. Theoretically, this can avoid the overhead of looking up and/or creating blkg's if blkcg is enabled but no policy is in use; however, the cost of blkg lookup / creation is very low especially if only the root blkcg is in use which is highly likely if no blkcg policy is in active use - it boils down to a single very predictable conditional and surrounding RCU protection. This patch consolidates blkg creation to a new function blkcg_bio_issue_check() which is called during bio issue from generic_make_request_checks(). blkcg_bio_issue_check() is now the only function which tries to create missing blkg's. The subsequent policy and request_list operations just perform blkg_lookup() and, if the blkg is missing, fall back to the root.
* blk_get_rl() no longer tries to create blkg. It uses blkg_lookup() instead of blkg_lookup_create().
* blk_throtl_bio() is now called from blkcg_bio_issue_check() with rcu read locked and blkg already looked up. Both throtl_lookup_tg() and throtl_lookup_create_tg() are dropped.
* cfq is similarly updated. cfq_lookup_create_cfqg() is replaced with cfq_lookup_cfqg() which uses blkg_lookup().
This consolidates blkg handling and avoids unnecessary blkg creation retries under memory pressure. In addition, this provides a common bio entry point into blkcg where things like common accounting can be performed. v2: Build fixes for !CONFIG_CFQ_GROUP_IOSCHED and !CONFIG_BLK_DEV_THROTTLING. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup() | Tejun Heo
Currently, both throttle and cfq policies implement their own root blkg (blkcg_gq) lookup fast path. This patch moves root blkg optimization from throtl_lookup_tg() to __blkg_lookup(). cfq-iosched currently doesn't use blkg_lookup() but will be converted and drop the optimization too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | blkcg: inline [__]blkg_lookup() | Tejun Heo
blkg_lookup() checks whether the target queue is bypassing and, if not, calls __blkg_lookup() which first checks the lookup hint and then performs radix tree walk. The operations up to hint checking are trivial and there are many users of this function. This patch inlines blkg_lookup() and the fast path part of __blkg_lookup(). The radix tree lookup and hint update are now in blkg_lookup_slowpath(). This will help consolidating blkg handling by easing moving root blkcg short-circuit to inlined lookup fast path. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
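The fast-path/slow-path split described above might look roughly like the following userspace sketch. All types and names here are hypothetical, and a linear array scan stands in for the radix tree walk; RCU and locking are left out entirely.

```c
#include <stddef.h>

struct queue_sketch;

struct blkg_sketch {
	int blkcg_id;
	struct queue_sketch *q;
};

struct queue_sketch {
	int bypassing;
	struct blkg_sketch *blkgs;   /* stand-in for the radix tree */
	size_t nr;
};

struct blkcg_sketch {
	int id;
	struct blkg_sketch *hint;    /* last successful lookup */
};

/* Out-of-line part: full search plus hint update. */
static struct blkg_sketch *lookup_slowpath(struct blkcg_sketch *blkcg,
					   struct queue_sketch *q)
{
	for (size_t i = 0; i < q->nr; i++) {
		if (q->blkgs[i].blkcg_id == blkcg->id) {
			blkcg->hint = &q->blkgs[i];
			return &q->blkgs[i];
		}
	}
	return NULL;
}

/* Inlined part: bypass check and hint check only. */
static inline struct blkg_sketch *lookup(struct blkcg_sketch *blkcg,
					 struct queue_sketch *q)
{
	if (q->bypassing)
		return NULL;
	if (blkcg->hint && blkcg->hint->q == q)
		return blkcg->hint;
	return lookup_slowpath(blkcg, q);
}
```

Only the two cheap checks are paid at every call site; the search and hint update stay in one out-of-line function.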
2015-08-18 | blkcg: replace blkcg_policy->cpd_size with ->cpd_alloc/free_fn() methods | Tejun Heo
Each active policy has a cpd (blkcg_policy_data) on each blkcg. The cpd's were allocated by blkcg core and each policy could request to allocate extra space at the end by setting blkcg_policy->cpd_size larger than the size of cpd. This is a bit unusual but blkg (blkcg_gq) policy data used to be handled this way too so it made sense to be consistent; however, blkg policy data switched to alloc/free callbacks. This patch makes similar changes to cpd handling. blkcg_policy->cpd_alloc/free_fn() are added to replace ->cpd_size. As cpd allocation is now done from policy side, it can simply allocate a larger area which embeds cpd at the beginning. As ->cpd_alloc_fn() may be able to perform all necessary initializations, this patch makes ->cpd_init_fn() optional. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | blkcg: minor updates around blkcg_policy_data | Tejun Heo
* Rename blkcg->pd[] to blkcg->cpd[] so that cpd is consistently used for blkcg_policy_data. * Make blkcg_policy->cpd_init_fn() take blkcg_policy_data instead of blkcg. This makes it consistent with blkg_policy_data methods and to-be-added cpd alloc/free methods. * blkcg_policy_data->blkcg and cpd_to_blkcg() added so that cpd_init_fn() can determine the associated blkcg from blkcg_policy_data. v2: blkcg_policy_data->blkcg initializations were missing. Added. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | blkcg: make blkcg_policy methods take a pointer to blkcg_policy_data | Tejun Heo
The newly added ->pd_alloc_fn() and ->pd_free_fn() deal with pd (blkg_policy_data) while the older ones use blkg (blkcg_gq). As using blkg doesn't make sense for ->pd_alloc_fn() and after allocation pd can always be mapped to blkg and given that these are policy-specific methods, it makes sense to converge on pd. This patch makes all methods deal with pd instead of blkg. Most conversions are trivial. In blk-cgroup.c, a couple method invocation sites now test whether pd exists instead of policy state for consistency. This shouldn't cause any behavioral differences. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | blk-throttle: clean up blkg_policy_data alloc/init/exit/free methods | Tejun Heo
With the recent addition of alloc and free methods, things became messier. This patch reorganizes them according to the following.
* ->pd_alloc_fn() Responsible for allocation and static initializations - the ones which can be done independent of where the pd might be attached.
* ->pd_init_fn() Initializations which require the knowledge of where the pd is attached.
* ->pd_free_fn() The counterpart of pd_alloc_fn(). Static de-init and freeing.
This leaves ->pd_exit_fn() without any users. Removed. While at it, collapse a one-liner function throtl_pd_exit(), which has only one user, into its user. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | blkcg: replace blkcg_policy->pd_size with ->pd_alloc/free_fn() methods | Tejun Heo
A blkg (blkcg_gq) represents the relationship between a cgroup and request_queue. Each active policy has a pd (blkg_policy_data) on each blkg. The pd's were allocated by blkcg core and each policy could request to allocate extra space at the end by setting blkcg_policy->pd_size larger than the size of pd. This is a bit unusual but was done this way mostly to simplify error handling and all the existing use cases could be handled this way; however, this is becoming too restrictive now that percpu memory can be allocated without blocking. This introduces two new mandatory blkcg_policy methods - pd_alloc_fn() and pd_free_fn() - which are used to allocate and release pd for a given policy. As pd allocation is now done from policy side, it can simply allocate a larger area which embeds pd at the beginning. This change makes ->pd_size pointless. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
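The "allocate a larger area which embeds pd at the beginning" pattern can be sketched in userspace C as below. The type and function names are hypothetical, and calloc()/free() stand in for the kernel allocators.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for blkg_policy_data. */
struct pd_sketch { int plid; };

/* Policy-private data with the generic pd embedded at the start. */
struct throtl_sketch {
	struct pd_sketch pd;
	unsigned long bps_limit;
};

/* Map the generic pd back to the enclosing policy structure. */
static struct throtl_sketch *pd_to_tg(struct pd_sketch *pd)
{
	return (struct throtl_sketch *)((char *)pd -
					offsetof(struct throtl_sketch, pd));
}

/* Allocation callback: allocate the larger object, hand back the pd. */
static struct pd_sketch *sketch_pd_alloc(void)
{
	struct throtl_sketch *tg = calloc(1, sizeof(*tg));

	if (!tg)
		return NULL;
	tg->bps_limit = ULONG_MAX;   /* static init, independent of attachment */
	return &tg->pd;
}

/* Free callback: release the whole enclosing object. */
static void sketch_pd_free(struct pd_sketch *pd)
{
	free(pd_to_tg(pd));
}
```

Because the policy owns the allocation, it can size the object however it likes, which is what makes a separate size field unnecessary.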
2015-08-18 | blkcg: restructure blkg_policy_data allocation in blkcg_activate_policy() | Tejun Heo
When a policy gets activated, it needs to allocate and install its policy data on all existing blkg's (blkcg_gq's). Because blkg iteration is protected by a spinlock, it currently counts the total number of blkg's in the system, allocates the matching number of policy data on a list and installs them during a single iteration. This can be simplified by using speculative GFP_NOWAIT allocations while iterating and falling back to a preallocated policy data on failure. If the preallocated one has already been consumed, it releases the lock, preallocates with GFP_KERNEL and then restarts the iteration. This can be a bit more expensive than before but policy activation is a very cold path and shouldn't matter. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
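The speculate, fall back, restart loop described above can be modelled in userspace as follows. This is a sketch under heavy assumptions: alloc_nowait() and alloc_blocking() are invented stand-ins for GFP_NOWAIT and GFP_KERNEL allocations, a global budget simulates non-blocking failures, and the lock is represented only by comments.

```c
#include <stdlib.h>

#define NR_BLKGS 4

/* Simulated: how many non-blocking allocations will still succeed. */
static int nowait_budget;

static void *alloc_nowait(void)
{
	return nowait_budget-- > 0 ? malloc(1) : NULL;
}

static void *alloc_blocking(void)
{
	return malloc(1);
}

/* Returns the number of restarts needed, or -1 on allocation failure. */
static int activate(void *pd[NR_BLKGS])
{
	void *prealloc = NULL;
	int restarts = 0;

restart:
	/* lock would be taken here */
	for (int i = 0; i < NR_BLKGS; i++) {
		void *cur;

		if (pd[i])
			continue;              /* installed in an earlier pass */
		cur = alloc_nowait();          /* speculative, may fail */
		if (!cur) {
			cur = prealloc;        /* fall back to the spare */
			prealloc = NULL;
		}
		if (!cur) {
			/* lock would be dropped here before blocking */
			prealloc = alloc_blocking();
			if (!prealloc)
				return -1;
			restarts++;
			goto restart;
		}
		pd[i] = cur;
	}
	/* lock would be released here */
	free(prealloc);
	return restarts;
}
```

In the worst case each missing entry costs one restart, which matches the remark that this may be more expensive than before but only on a cold path.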
2015-08-18 | blkcg: remove unnecessary request_list->blkg NULL test in blk_put_rl() | Tejun Heo
Since ec13b1d6f0a0 ("blkcg: always create the blkcg_gq for the root blkcg"), a request_list always has its blkg associated. Drop unnecessary rl->blkg NULL test from blk_put_rl(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | writeback: update writeback tracepoints to report cgroup | Tejun Heo
The following tracepoints are updated to report the cgroup used during cgroup writeback. * writeback_write_inode[_start] * writeback_queue * writeback_exec * writeback_start * writeback_written * writeback_wait * writeback_nowork * writeback_wake_background * wbc_writepage * writeback_queue_io * bdi_dirty_ratelimit * balance_dirty_pages * writeback_sb_inodes_requeue * writeback_single_inode[_start] Note that writeback_bdi_register is separated out from writeback_class as reporting cgroup doesn't make sense to it. Tracepoints which take bdi are updated to take bdi_writeback instead. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | kernfs: implement kernfs_path_len() | Tejun Heo
Add a function to determine the path length of a kernfs node. This for now will be used by writeback tracepoint updates. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | writeback: bdi_for_each_wb() iteration is memcg ID based not blkcg | Tejun Heo
wb's (bdi_writeback's) are currently keyed by memcg ID; however, in an earlier implementation, wb's were keyed by blkcg ID. bdi_for_each_wb() walks bdi->cgwb_tree in the ascending ID order and allows iterations to start from an arbitrary ID which is used to interrupt and resume iterations. Unfortunately, while changing wb to be keyed by memcg ID instead of blkcg, bdi_for_each_wb() was missed and is still assuming that wb's are keyed by blkcg ID. This doesn't affect iterations which don't get interrupted but bdi_split_work_to_wbs() makes use of iteration resuming on allocation failures and thus may incorrectly skip or repeat wb's. Fix it by changing bdi_for_each_wb() to take memcg IDs instead of blkcg IDs and updating bdi_split_work_to_wbs() accordingly. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | Merge branch 'for-4.3-unified-base' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup into for-4.3/blkcg | Jens Axboe
2015-08-18 | dm stats: report precise_timestamps and histogram in @stats_list output | Mikulas Patocka
If the user selected the precise_timestamps or histogram options, report it in the @stats_list message output. If the user didn't select these options, no extra tokens are reported, thus it is backward compatible with old software that doesn't know about precise timestamps and histogram. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.2
2015-08-18 | net: dsa: Add dsa_is_dsa_port() helper | Andrew Lunn
Add an inline helper for determining if a port is a DSA port. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18 | cgroup: introduce cgroup_subsys->legacy_name | Tejun Heo
This allows cgroup subsystems to use a different name on the unified hierarchy. cgroup_subsys->name is used on the unified hierarchy, ->legacy_name elsewhere. If ->legacy_name is not explicitly set, it's automatically set to ->name and the userland visible behavior remains unchanged. v2: Make parse_cgroupfs_options() only consider ->legacy_name as mount options are used only on legacy hierarchies. Suggested by Li Zefan. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: cgroups@vger.kernel.org
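The defaulting rule above is small enough to show directly. The struct and helpers below are hypothetical simplifications, not the kernel's cgroup_subsys definition.

```c
#include <stddef.h>

/* Hypothetical, reduced view of a subsystem descriptor. */
struct subsys_sketch {
	const char *name;         /* used on the unified hierarchy */
	const char *legacy_name;  /* used everywhere else */
};

/* If no legacy name was given, reuse ->name so nothing changes for users. */
static void subsys_init(struct subsys_sketch *ss)
{
	if (!ss->legacy_name)
		ss->legacy_name = ss->name;
}

/* Pick the name visible on a given hierarchy type. */
static const char *subsys_visible_name(const struct subsys_sketch *ss,
				       int on_unified)
{
	return on_unified ? ss->name : ss->legacy_name;
}
```

A subsystem that sets both fields shows different names on the two hierarchy types; one that sets only name behaves exactly as before.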
2015-08-18 | Merge tag 'renesas-dt4-for-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/dt | Olof Johansson
Fourth Round of Renesas ARM Based SoC DT Updates for v4.3
* Enable Clock Domain support of the Clock Pulse Generator (CPG) Module Stop (MSTP) Clocks driver.
* tag 'renesas-dt4-for-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  ARM: shmobile: r8a7794 dtsi: Add CPG/MSTP Clock Domain
  ARM: shmobile: r8a7793 dtsi: Add CPG/MSTP Clock Domain
  ARM: shmobile: r8a7791 dtsi: Add CPG/MSTP Clock Domain
  ARM: shmobile: r8a7790 dtsi: Add CPG/MSTP Clock Domain
  ARM: shmobile: r8a7779 dtsi: Add CPG/MSTP Clock Domain
  ARM: shmobile: r8a7778 dtsi: Add CPG/MSTP Clock Domain
  ARM: shmobile: r7s72100 dtsi: Add CPG/MSTP Clock Domain
  clk: shmobile: rz: Add CPG/MSTP Clock Domain support
  clk: shmobile: rcar-gen2: Add CPG/MSTP Clock Domain support
  clk: shmobile: r8a7779: Add CPG/MSTP Clock Domain support
  clk: shmobile: r8a7778: Add CPG/MSTP Clock Domain support
  clk: shmobile: Add CPG/MSTP Clock Domain support
Signed-off-by: Olof Johansson <olof@lixom.net>
2015-08-18 | Merge tag 'renesas-clk-for-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/drivers | Olof Johansson
Renesas ARM Based SoC CPG/MSTP Clock Driver Updates for v4.3
* Add Clock Domain support to the Clock Pulse Generator (CPG) Module Stop (MSTP) Clocks driver using the generic PM Domain.
* tag 'renesas-clk-for-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  clk: shmobile: rz: Add CPG/MSTP Clock Domain support
  clk: shmobile: rcar-gen2: Add CPG/MSTP Clock Domain support
  clk: shmobile: r8a7779: Add CPG/MSTP Clock Domain support
  clk: shmobile: r8a7778: Add CPG/MSTP Clock Domain support
  clk: shmobile: Add CPG/MSTP Clock Domain support
Signed-off-by: Olof Johansson <olof@lixom.net>
2015-08-18 | block: bump BLK_DEF_MAX_SECTORS to 2560 | Jeff Moyer
A value of 2560 (1280k) will accommodate a 10-data-disk stripe write with chunk size 128k. In the testing I've done using iozone, fio, and aio-stress across a number of different storage devices, a value of 1280 does not show a big performance difference from 512, but will hopefully help software RAID setups using SATA disks, as reported by Christoph. NOTE: drivers/block/aoe/aoeblk.c sets its own max_hw_sectors_kb to BLK_DEF_MAX_SECTORS. So, this patch essentially changes aoeblk to use a larger maximum sector size, and I did not test this. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
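The message quotes both 2560 and 1280; the two numbers describe the same limit in different units. A short check of the arithmetic, assuming the conventional 512-byte sector (the helper names are made up for illustration):

```c
#define SECTOR_SIZE         512u
#define BLK_DEF_MAX_SECTORS 2560u

/* Convert a sector count to KiB: 2560 * 512 / 1024 = 1280. */
static unsigned int sectors_to_kib(unsigned int sectors)
{
	return sectors * SECTOR_SIZE / 1024u;
}

/* Size of one full stripe across the data disks: 10 * 128 = 1280. */
static unsigned int stripe_kib(unsigned int data_disks, unsigned int chunk_kib)
{
	return data_disks * chunk_kib;
}
```

So 2560 is the limit in sectors and 1280 is the same limit expressed in KiB, which is exactly one full stripe of ten 128k chunks.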
2015-08-18 | Revert "block: remove artifical max_hw_sectors cap" | Jeff Moyer
This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc. That commit caused performance regressions for streaming I/O workloads on a number of different storage devices, from SATA disks to external RAID arrays. It also managed to trip up some buggy firmware in at least one drive, causing data corruption. The next patch will bump the default max_sectors_kb value to 1280, which will accommodate a 10-data-disk stripe write with chunk size 128k. In the testing I've done using iozone, fio, and aio-stress, a value of 1280 does not show a big performance difference from 512. This will hopefully still help the software RAID setup that Christoph saw the original performance gains with while still not regressing other storage configurations. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | Merge tag 'imx-dt-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into next/dt | Olof Johansson
The i.MX device tree updates for 4.3:
- Add audio and eTSEC device support and update dspi node for LS1021A.
- Add initial i.MX6UL and imx6ul-14x14-evk board support, and enable a bunch of device support for i.MX6UL, including RTC, power key, USB, QSPI, and dual FEC.
- Enable HDMI and LVDS dual display support for a few imx6qdl boards.
- Support of imx6sl-warp board rev1.12, the version which will be publicly available for the customers.
- A few i.MX7D device additions, watchdog, cortex-a7 coresight components, RTC, power key, power off.
- Some Vybrid updates: add device support for I2C, QSPI, eSDHC etc., update ADC node, and define stdout-path property.
- A few random updates for i.MX27 and i.MX53 devices.
* tag 'imx-dt-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: (44 commits)
  ARM: dts: imx6ul: add snvs power key support
  ARM: dts: imx6ul: add RTC support
  ARM: dts: imx6ul: enable GPC as extended interrupt controller
  ARM: dts: imx6sx: correct property name for wakeup source
  ARM: dts: add property for maximum ADC clock frequencies
  ARM: dts: imx7d: enable snvs rtc, onoffkey and power off
  ARM: dts: imx6ul-14x14-evk: add fec1 and fec2 support
  ARM: dts: imx: add fec1 and fec2 nodes for SOC i.MX6UL
  ARM: dts: imx27: add support of internal rtc
  ARM: dts: vf-colibri: define stdout-path property
  ARM: dts: ls1021a: Enable the eTSEC ports on QDS and TWR
  ARM: dts: ls1021a: Add the eTSEC controller nodes
  ARM: dts: imx6ul: add qspi support
  ARM: dts: imx6ul: fix low case define in imx6ul-pinfunc.h
  ARM: dts: imx6ul: add usb host and function support
  ARM: dts: vfxxx: Add io-channel-cells property for ADC node
  ARM: dts: ls1021a: Add dts nodes for audio on LS1021A
  ARM: imx6qdl-sabreauto.dtsi: enable USB support
  ARM: dts: imx: update snvs to use syscon access register
  ARM: dts: imx: add imx6ul and imx6ul evk board support
  ...
Signed-off-by: Olof Johansson <olof@lixom.net>
2015-08-18 | Merge tag 'imx-soc-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into next/soc | Olof Johansson
The i.MX SoC changes for 4.3:
- Add i.MX6 Ultralite SoC support, which is the newest addition to i.MX6 family. It integrates a single Cortex-A7 core and a power management module that reduces the complexity of external power supply and simplifies power sequencing.
- Change SNVS RTC driver to use syscon interface for register access, and add SNVS power key driver support.
- Add a second clock for mxc rtc driver, and support device tree probe for the driver.
- Add FEC MAC reference clock and phy fixup initialization for i.MX6UL platform.
* tag 'imx-soc-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  rtc: snvs: select option REGMAP_MMIO
  ARM: imx6ul: add fec MAC refrence clock and phy fixup init
  ARM: imx6ul: add fec bits to GPR syscon definition
  rtc: mxc: add support of device tree
  dt-binding: document the binding for mxc rtc
  rtc: mxc: use a second rtc clock
  input: snvs_pwrkey: use "wakeup-source" as deivce tree property name
  Document: devicetree: input: imx: i.mx snvs power device tree bindings
  input: keyboard: imx: add snvs power key driver
  Document: dt: fsl: snvs: change support syscon
  rtc: snvs: use syscon to access register
  ARM: imx: add low-level debug support for i.mx6ul
  ARM: imx: add i.mx6ul msl support
Signed-off-by: Olof Johansson <olof@lixom.net>
2015-08-18 | NVMe: Add nvme subsystem reset IOCTL | Jon Derrick
Controllers can perform optional subsystem resets as introduced in NVMe 1.1. This patch adds an IOCTL to trigger the subsystem reset by writing "NVMe" to the NSSR register. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
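Writing "NVMe" to the register means writing the four ASCII characters packed into one 32-bit value. A minimal sketch of that packing, with a hypothetical helper name:

```c
#include <stdint.h>

/* Pack the ASCII bytes 'N' 'V' 'M' 'e' into a 32-bit value, 'N' highest. */
static uint32_t nssr_magic(void)
{
	return ((uint32_t)'N' << 24) | ((uint32_t)'V' << 16) |
	       ((uint32_t)'M' << 8)  |  (uint32_t)'e';
}
```

The result is 0x4E564D65. The register write itself is left out here because it depends on how the driver maps the controller registers.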
2015-08-18 | NVMe: Add nvme subsystem reset support | Keith Busch
Controllers that are part of an NVMe subsystem may be reset by any other controller in the subsystem. If the device is capable of subsystem resets, this patch adds detection for such events and performs appropriate controller initialization upon subsystem reset detection. The register bit is a RW1C type, so the driver needs to write a 1 to the status bit to clear the subsystem reset occurred bit during initialization. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
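RW1C ("read/write 1 to clear") semantics can be easy to get backwards, so here is a small model of how such a bit behaves. The register is simulated with an ordinary variable, rw1c_write() is an invented helper, and the bit position (bit 4) is stated here as an assumption about the status register layout.

```c
#include <stdint.h>

/* Assumed position of the "subsystem reset occurred" status bit. */
#define CSTS_NSSRO (1u << 4)

/*
 * Simulated hardware behaviour of a RW1C register: every bit written
 * as 1 is cleared, every bit written as 0 is left untouched.
 */
static void rw1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}
```

Writing 0 to the bit therefore does nothing, which is why the driver has to write a 1 to acknowledge the event.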
2015-08-18 | Revert "usb: interface authorization: Declare authorized attribute" | Greg Kroah-Hartman
This reverts commit 484ebaedecc5ddf778a30ee1efab367cbee27030 as the signed-off-by address is invalid. Cc: Stefan Koch <stefan.koch10@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-18 | dmaengine: jz4780: Remove request type number definitions header | Alex Smith
The header just includes definitions of hardware-specific numbers which can be written directly in the device tree, there's no need for a public header containing these definitions. Signed-off-by: Alex Smith <alex.smith@imgtec.com> Cc: Vinod Koul <vinod.koul@intel.com> Cc: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com> Cc: dmaengine@vger.kernel.org Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-08-18 | Revert "usb: interface authorization: Introduces the default interface authorization" | Greg Kroah-Hartman
This reverts commit 1d958bef45030acfc5578263e9de3bb07032b8da as the signed-off-by address is invalid. Cc: Stefan Koch <stefan.koch10@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-18 | Revert "usb: interface authorization: Use a flag for the default device authorization" | Greg Kroah-Hartman
This reverts commit 3cf1fc80655d3af7083ea4b3615e5f8532543be7 as the signed-off-by address is invalid. Cc: Stefan Koch <stefan.koch10@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-18 | ASoC: topology: Disable use from userspace | Mark Brown
Since the topology API is still in sufficient flux for changes to be identified, disable the use of the userspace ABI by adding #error statements to the code, ensuring that nobody relies on the headers as currently defined. It is expected that this change will be reverted for v4.3. Signed-off-by: Mark Brown <broonie@kernel.org>
2015-08-18 | Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux | Linus Torvalds
Pull drm fixes from Dave Airlie: "These came in late last week, I wanted to look over the mst one before forwarding, but it seems good. Just three i915 and one MST fix" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/i915: Commit planes on each crtc separately. drm/i915: calculate primary visibility changes instead of calling from set_config drm/i915: Only dither on 6bpc panels drm/dp/mst: Remove port after removing connector.
2015-08-18 | drm: bridge/dw_hdmi: introduce interfaces to enable and disable audio | Russell King
iMX6 devices suffer from an errata (ERR005174) where the audio FIFO can be emptied while it is partially full, resulting in misalignment of the audio samples. To prevent this, the errata workaround recommends writing N as zero until the audio FIFO has been loaded by DMA. Writing N=0 prevents the HDMI bridge from reading from the audio FIFO, effectively disabling audio. This means we need to provide the audio driver with a pair of functions to enable/disable audio. These are dw_hdmi_audio_enable() and dw_hdmi_audio_disable(). A spinlock is introduced to ensure that setting the CTS/N values can't race, ensuring that the audio driver calling the enable/disable functions (which are called in an atomic context) can't race with a modeset. Tested-by: Yakir Yang <ykk@rock-chips.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-18 | drm: bridge/dw_hdmi: introduce interface to setting sample rate | Russell King
Introduce dw_hdmi_set_sample_rate(), which allows us to configure the audio sample rate, setting the CTS/N values appropriately. Tested-by: Yakir Yang <ykk@rock-chips.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-18 | drm/edid: add function to help find SADs | Russell King
Add a function to find the start of the SADs in the ELD. This complements the helper to retrieve the SAD count. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-18 | cxl: Add alternate MMIO error handling | Ian Munsie
userspace programs using cxl currently have to use two strategies for dealing with MMIO errors simultaneously. They have to check every read for a return of all Fs in case the adapter has gone away and the kernel has not yet noticed, and they have to deal with SIGBUS in case the kernel has already noticed, invalidated the mapping and marked the context as failed. In order to simplify things, this patch adds an alternative approach where the kernel will return a page filled with Fs instead of delivering a SIGBUS. This allows userspace to only need to deal with one of these two error paths, and is intended for use in libraries that use cxl transparently and may not be able to safely install a signal handler. This approach will only work if certain constraints are met. Namely, if the application is both reading and writing to an address in the problem state area it cannot assume that a non-FF read is OK, as it may just be reading out a value it has previously written. Further - since only one page is used per context a write to a given offset would be visible when reading the same offset from a different page in the mapping (this only applies within a single context, not between contexts). An application could deal with this by e.g. making sure it also reads from a read-only offset after any reads to a read/write offset. Due to these constraints, this functionality must be explicitly requested by userspace when starting the context by passing in the CXL_START_WORK_ERR_FF flag. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
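The workaround suggested above (follow any read of a read/write offset with a read of a read-only offset) could be wrapped in a helper along these lines. This is a userspace sketch with invented names, and it assumes the chosen read-only register never legitimately reads as all Fs.

```c
#include <stdbool.h>
#include <stdint.h>

#define ALL_FS UINT64_MAX

/*
 * Read a read/write register, then confirm the adapter is still there by
 * reading a read-only register. The value from the read/write register
 * alone cannot be trusted, since it may simply be something the
 * application wrote earlier into the substituted page.
 *
 * Assumption: ro_reg never holds all Fs while the adapter is healthy.
 */
static bool mmio_read_checked(const volatile uint64_t *rw_reg,
			      const volatile uint64_t *ro_reg,
			      uint64_t *out)
{
	*out = *rw_reg;
	return *ro_reg != ALL_FS;
}
```

The helper returns false when the read-only check sees all Fs, in which case the caller should treat the value in *out as invalid.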
2015-08-18 | Merge branch 'x86/urgent' into x86/asm to fix up conflicts and to pick up fixes | Ingo Molnar
Conflicts: arch/x86/entry/entry_64_compat.S arch/x86/math-emu/get_address.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-18 | bcma: switch GPIO portions to use GPIOLIB_IRQCHIP | Linus Walleij
This switches the BCMA GPIO driver to use GPIOLIB_IRQCHIP to handle its interrupts instead of rolling its own copy of the irqdomain handling etc. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-08-17 | ASoC: topology: Add Kconfig option for topology | Mark Brown
Allow the topology code to be compiled out so that users who don't need topology don't need to have the code compiled in, saving them some memory. Some more configuration could be added to remove some of the hooks into the core data structures but that is probably best done with some refactoring to use functions to do the updates of the data structures rather than ifdefing in the code as we'd need to do at the minute. Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-08-18 | Merge tag 'mac80211-next-for-davem-2015-08-14' mac80211-next.git | Kalle Valo
iwlwifi needs new mac80211 patches so merge mac80211-next.git to wireless-drivers-next.git.
2015-08-17 | net: Identifier Locator Addressing module | Tom Herbert
Adding a new module named ila. This implements ILA translation. Lightweight tunnel redirection is used to perform the translation in the data path. This is configured by the "ip -6 route" command using the "encap ila <locator>" option, where <locator> is the value to set in the destination locator of the packet. e.g.
    ip -6 route add 3333:0:0:1:5555:0:1:0/128 \
        encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0
sets a route where 3333:0:0:1 will be overwritten by 2001:0:0:1 on output. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
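In the example route, the locator is the upper 64 bits of the IPv6 destination address and the lower 64 bits are left alone. A minimal sketch of that overwrite, using a hypothetical address type and helper name; any checksum handling the real module needs is deliberately left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit IPv6 address in network byte order. */
struct in6_sketch { uint8_t b[16]; };

/* Overwrite the upper 64 bits (the locator); keep the identifier. */
static void ila_set_locator(struct in6_sketch *dst, const uint8_t locator[8])
{
	memcpy(dst->b, locator, 8);
}
```

Applied to 3333:0:0:1:5555:0:1:0 with locator 2001:0:0:1, this yields 2001:0:0:1:5555:0:1:0.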
2015-08-17 | net: Add inet_proto_csum_replace_by_diff utility function | Tom Herbert
This function updates a checksum field value and skb->csum based on a value which is the difference between the old and new checksum. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
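Updating a checksum from a difference relies on one's-complement arithmetic. The sketch below works on 16-bit words in userspace and uses invented helper names; it shows the general technique only and is not the kernel function, which also adjusts skb->csum.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold carries back in until the sum fits in 16 bits. */
static uint16_t fold(uint32_t s)
{
	while (s >> 16)
		s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* Full checksum: complement of the one's-complement sum of all words. */
static uint16_t csum_words(const uint16_t *w, size_t n)
{
	uint32_t s = 0;

	for (size_t i = 0; i < n; i++)
		s += w[i];
	return (uint16_t)~fold(s);
}

/* One's-complement difference between an old and a new word. */
static uint16_t csum_diff(uint16_t old_word, uint16_t new_word)
{
	return fold((uint32_t)(uint16_t)~old_word + new_word);
}

/* Apply a precomputed difference to an existing checksum field. */
static uint16_t csum_apply_diff(uint16_t check, uint16_t diff)
{
	return (uint16_t)~fold((uint32_t)(uint16_t)~check + diff);
}
```

The benefit is that a caller which already knows the difference, for example after rewriting part of an address, does not have to walk the whole packet again.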
2015-08-17 | net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool | Tom Herbert
inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates the checksum field carries a pseudo header. This argument should be a boolean instead of an int. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 | lwt: Add support to redirect dst.input | Tom Herbert
This patch adds the capability to redirect dst input in the same way that dst output is redirected by LWT. Also, save the original dst.input and dst.out when setting up lwtunnel redirection. These can be called by the client as a pass-through. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17inode: rename i_wb_list to i_io_listDave Chinner
There's a small consistency problem between the inode and writeback naming. Writeback calls the "for IO" inode queues b_io and b_more_io, but the inode calls these the "writeback list" or i_wb_list. This makes it hard to add a new "under writeback" list to the inode, or call it an "under IO" list on the bdi because either way we'll have writeback on IO and IO on writeback and it'll just be confusing. I'm getting confused just writing this! So, rename the inode "for IO" list variable to i_io_list so we can add a new "writeback list" in a subsequent patch. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-18netfilter: nf_conntrack: add efficient mark to zone mappingDaniel Borkmann
This work adds the possibility of deriving the zone id from the skb->mark field in a scalable manner. This allows for having only a single template serving hundreds/thousands of different zones, for example, instead of the need to have one match for each zone as an extra CT jump target. Note that we'd need to have this information attached to the template as at the time when we're trying to look up a possible ct object, we already need to know zone information for a possible match when going into __nf_conntrack_find_get(). This work provides a minimal implementation for a possible mapping. In order to not add/expose an extra ct->status bit, the zone structure has been extended to carry a flag for deriving the mark. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-18netfilter: nf_conntrack: add direction support for zonesDaniel Borkmann
This work adds a direction parameter to netfilter zones, so identity separation can be performed only in original/reply or both directions (default). This basically opens up the possibility of doing NAT with conflicting IP address/port tuples from multiple, isolated tenants on a host (e.g. from a netns) without requiring each tenant to NAT twice or to use its own dedicated IP address to SNAT to, meaning overlapping tuples can be made unique with the zone identifier in original direction, where the NAT engine will then allocate a unique tuple in the commonly shared default zone for the reply direction. In some restricted, local DNAT cases, port redirection could also be used for making the reply traffic unique w/o requiring SNAT. The consensus we've reached and discussed at NFWS and since the initial implementation [1] was to directly integrate the direction metadata into the existing zones infrastructure, as opposed to the ct->mark approach we proposed initially. As we pass the nf_conntrack_zone object directly around, we don't have to touch all call-sites, but only those that contain equality checks of zones. Thus, based on the current direction (original or reply), we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID. CT expectations are direction-agnostic entities when expectations are being compared among themselves, so we can only use the identifier in this case. Note that zone identifiers cannot be included into the hash mix anymore as they don't contain a "stable" value that would be equal for both directions at all times, e.g. if only zone->id would unconditionally be xor'ed into the table slot hash, then replies won't find the corresponding conntracking entry anymore. If no particular direction is specified when configuring zones, the behaviour is exactly as we expect currently (both directions).
Support has been added for the CT netlink interface as well as the x_tables raw CT target, which both already offer existing interfaces to user space for the configuration of zones.

Below a minimal, simplified collision example (script in [2]) with netperf sessions:

  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y

On the gateway netns, example:

  iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
  iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
  iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
  iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark

conntrack dump from gateway netns:

  netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns

  tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024 [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
  tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
  tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1 src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438 [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
  tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2 src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2

Taking this further, test script in [2] creates 200 tenants and runs original-tuple colliding netperf sessions each. A conntrack -L dump in the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED state as expected. I also did run various other tests with some permutations of the script, to mention some: SNAT in random/random-fully/persistent mode, no zones (no overlaps), static zones (original, reply, both directions), etc.

[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-17Merge branch 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libataLinus Torvalds
Pull libata fixes from Tejun Heo: "Three minor device-specific fixes and revert of NCQ autosense added during this -rc1. It turned out that NCQ autosense as currently implemented interferes with the usual error handling behavior. It will be revisited in the near future"

* 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ata: ahci_brcmstb: Fix misuse of IS_ENABLED
  sata_sx4: Check return code from pdc20621_i2c_read()
  Revert "libata: Implement NCQ autosense"
  Revert "libata: Implement support for sense data reporting"
  Revert "libata-eh: Set 'information' field for autosense"
  ata: ahci_brcmstb: Fix warnings with CONFIG_PM_SLEEP=n