summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-08-27writeback, memcg: Implement foreign dirty flushingTejun Heo
There's an inherent mismatch between memcg and writeback. The former trackes ownership per-page while the latter per-inode. This was a deliberate design decision because honoring per-page ownership in the writeback path is complicated, may lead to higher CPU and IO overheads and deemed unnecessary given that write-sharing an inode across different cgroups isn't a common use-case. Combined with inode majority-writer ownership switching, this works well enough in most cases but there are some pathological cases. For example, let's say there are two cgroups A and B which keep writing to different but confined parts of the same inode. B owns the inode and A's memory is limited far below B's. A's dirty ratio can rise enough to trigger balance_dirty_pages() sleeps but B's can be low enough to avoid triggering background writeback. A will be slowed down without a way to make writeback of the dirty pages happen. This patch implements foreign dirty recording and foreign mechanism so that when a memcg encounters a condition as above it can trigger flushes on bdi_writebacks which can clean its pages. Please see the comment on top of mem_cgroup_track_foreign_dirty_slowpath() for details. A reproducer follows. write-range.c:: #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <sys/types.h> static const char *usage = "write-range FILE START SIZE\n"; int main(int argc, char **argv) { int fd; unsigned long start, size, end, pos; char *endp; char buf[4096]; if (argc < 4) { fprintf(stderr, usage); return 1; } fd = open(argv[1], O_WRONLY); if (fd < 0) { perror("open"); return 1; } start = strtoul(argv[2], &endp, 0); if (*endp != '\0') { fprintf(stderr, usage); return 1; } size = strtoul(argv[3], &endp, 0); if (*endp != '\0') { fprintf(stderr, usage); return 1; } end = start + size; while (1) { for (pos = start; pos < end; ) { long bread, bwritten = 0; if (lseek(fd, pos, SEEK_SET) < 0) { perror("lseek"); return 1; } bread = read(0, buf, sizeof(buf) < end - pos ? sizeof(buf) : end - pos); if (bread < 0) { perror("read"); return 1; } if (bread == 0) return 0; while (bwritten < bread) { long this; this = write(fd, buf + bwritten, bread - bwritten); if (this < 0) { perror("write"); return 1; } bwritten += this; pos += bwritten; } } } } repro.sh:: #!/bin/bash set -e set -x sysctl -w vm.dirty_expire_centisecs=300000 sysctl -w vm.dirty_writeback_centisecs=300000 sysctl -w vm.dirtytime_expire_seconds=300000 echo 3 > /proc/sys/vm/drop_caches TEST=/sys/fs/cgroup/test A=$TEST/A B=$TEST/B mkdir -p $A $B echo "+memory +io" > $TEST/cgroup.subtree_control echo $((1<<30)) > $A/memory.high echo $((32<<30)) > $B/memory.high rm -f testfile touch testfile fallocate -l 4G testfile echo "Starting B" (echo $BASHPID > $B/cgroup.procs pv -q --rate-limit 70M < /dev/urandom | ./write-range testfile $((2<<30)) $((2<<30))) & echo "Waiting 10s to ensure B claims the testfile inode" sleep 5 sync sleep 5 sync echo "Starting A" (echo $BASHPID > $A/cgroup.procs pv < /dev/urandom | ./write-range testfile 0 $((2<<30))) v2: Added comments explaining why the specific intervals are being used. v3: Use 0 @nr when calling cgroup_writeback_by_id() to use best-effort flushing while avoding possible livelocks. v4: Use get_jiffies_64() and time_before/after64() instead of raw jiffies_64 and arthimetic comparisons as suggested by Jan. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27writeback, memcg: Implement cgroup_writeback_by_id()Tejun Heo
Implement cgroup_writeback_by_id() which initiates cgroup writeback from bdi and memcg IDs. This will be used by memcg foreign inode flushing. v2: Use wb_get_lookup() instead of wb_get_create() to avoid creating spurious wbs. v3: Interpret 0 @nr as 1.25 * nr_dirty to implement best-effort flushing while avoding possible livelocks. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27writeback: Separate out wb_get_lookup() from wb_get_create()Tejun Heo
Separate out wb_get_lookup() which doesn't try to create one if there isn't already one from wb_get_create(). This will be used by later patches. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27bdi: Add bdi->idTejun Heo
There currently is no way to universally identify and lookup a bdi without holding a reference and pointer to it. This patch adds an non-recycling bdi->id and implements bdi_get_by_id() which looks up bdis by their ids. This will be used by memcg foreign inode flushing. I left bdi_list alone for simplicity and because while rb_tree does support rcu assignment it doesn't seem to guarantee lossless walk when walk is racing aginst tree rebalance operations. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27writeback: Generalize and expose wb_completionTejun Heo
wb_completion is used to track writeback completions. We want to use it from memcg side for foreign inode flushes. This patch updates it to remember the target waitq instead of assuming bdi->wb_waitq and expose it outside of fs-writeback.c. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-26firmware: ti_sci: Allow for device shared and exclusive requestsLokesh Vutla
Sysfw provides an option for requesting exclusive access for a device using the flags MSG_FLAG_DEVICE_EXCLUSIVE. If this flag is not used, the device is meant to be shared across hosts. Once a device is requested from a host with this flag set, any request to this device from a different host will be nacked by sysfw. Current tisci driver enables this flag for every device requests. But this may not be true for all the devices. So provide a separate commands in driver for exclusive and shared device requests. Reviewed-by: Nishanth Menon <nm@ti.com> Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2019-08-26rcu: Don't include <linux/ktime.h> in rcutiny.hChristoph Hellwig
The kbuild reported a built failure due to a header loop when RCUTINY is enabled with my pending riscv-nommu port. Switch rcutiny.h to only include the minimal required header to get HZ instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-26Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"Trond Myklebust
This reverts commit a79f194aa4879e9baad118c3f8bb2ca24dbef765. The mechanism for aborting I/O is racy, since we are not guaranteed that the request is asleep while we're changing both task->tk_status and task->tk_action. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v5.1
2019-08-26bus: ti-sysc: Add module enable quirk for SGX on omap36xxTony Lindgren
Add module enable quirk for SGX needed on omap36xx. Cc: Adam Ford <aford173@gmail.com> Cc: Filip Matijević <filip.matijevic.pz@gmail.com> Cc: "H. Nikolaus Schaller" <hns@goldelico.com> Cc: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Cc: moaz korena <moaz@korena.xyz> Cc: Merlijn Wajer <merlijn@wizzup.org> Cc: Paweł Chmiel <pawel.mikolaj.chmiel@gmail.com> Cc: Philipp Rossak <embed3d@gmail.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
2019-08-26Merge remote-tracking branch 'mlx5-next/mlx5-next' into for-nextDoug Ledford
Bring in the lastest mlx5-next branch as the RDMA RX RoCE Steering Support patch series requires it (first two patches are in mlx5-next, final patch in RDMA tree). Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-26device property: Remove duplicate test for NULLAndy Shevchenko
There is no need to check twice for a NULL in fwnode_call_bool_op(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-26software node: Add software_node_find_by_name()Heikki Krogerus
Function that searches software nodes by node name. Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-26thunderbolt: Add support for Intel Ice LakeMika Westerberg
The Thunderbolt controller is integrated into the Ice Lake CPU itself and requires special flows to power it on and off using force power bit in NHI VSEC registers. Runtime PM (RTD3) and Sx flows also differ from the discrete solutions. Now the firmware notifies the driver whether RTD3 entry or exit are possible. The driver is responsible of sending Go2Sx command through link controller mailbox when system enters Sx states (suspend-to-mem/disk). Rest of the ICM firwmare flows follow Titan Ridge. Signed-off-by: Raanan Avargil <raanan.avargil@intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com> Tested-by: Mario Limonciello <mario.limonciello@dell.com>
2019-08-26Merge tag 'pullreq201908' of ↵Rafael J. Wysocki
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq Pull devfreq changes for v5.4 from MyungJoo Ham: "This series include: - Tegra driver fixes - new Tegra devices - Rockchip fixes - Exynos-bus fixes - Event: add "event type" - Passive: notifier updates" * tag 'pullreq201908' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq: (27 commits) PM / devfreq: passive: Use non-devm notifiers PM / devfreq: exynos-bus: Convert to use dev_pm_opp_set_rate() PM / devfreq: exynos-bus: Correct clock enable sequence PM / devfreq: Correct devm_devfreq_remove_device() documentation PM / devfreq: events: extend events by type of counted data PM / devfreq: exynos-events: change matching code during probe PM / devfreq: tegra20: add COMMON_CLK dependency PM / devfreq: events: add Exynos PPMU new events PM / devfreq: Fix kernel oops on governor module load PM / devfreq: rk3399_dmc: Fix spelling typo PM / devfreq: Fix spelling typo PM / devfreq: Introduce driver for NVIDIA Tegra20 PM / devfreq: tegra: Rename tegra-devfreq.c to tegra30-devfreq.c PM / devfreq: tegra: Enable COMPILE_TEST for the driver PM / devfreq: tegra: Support Tegra30 PM / devfreq: tegra: Reconfigure hardware on governor's restart PM / devfreq: tegra: Move governor registration to driver's probe PM / devfreq: tegra: Mark ACTMON's governor as immutable PM / devfreq: tegra: Avoid inconsistency of current frequency value PM / devfreq: tegra: Clean up driver's probe / remove ...
2019-08-26mtd: nand: fix typo, s/erasablocks/eraseblocksTudor Ambarus
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-08-26mtd: rawnand: sharpsl: add include guard to linux/mtd/sharpsl.hMasahiro Yamada
Add a header include guard just in case. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-08-25Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping fix from Thomas Gleixner: "A single fix for a regression caused by the generic VDSO implementation where a math overflow causes CLOCK_BOOTTIME to become a random number generator" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping/vsyscall: Prevent math overflow in BOOTTIME update
2019-08-25PM / devfreq: events: extend events by type of counted dataLukasz Luba
This patch adds posibility to choose what type of data should be counted by the PPMU counter. Now the type comes from DT where the event has been defined. When there is no 'event-data-type' the default value is used, which is 'read+write data in bytes'. It is needed when you want to know not only read+write data bytes but i.e. only write data in byte, or number of read requests, etc. Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> [Updated property by MyungJoo. data_type --> event_type] Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
2019-08-24Merge tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fixes from Christoph Hellwig: "Two fixes for regressions in this merge window: - select the Kconfig symbols for the noncoherent dma arch helpers on arm if swiotlb is selected, not just for LPAE to not break then Xen build, that uses swiotlb indirectly through swiotlb-xen - fix the page allocator fallback in dma_alloc_contiguous if the CMA allocation fails" * tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping: dma-direct: fix zone selection after an unaddressable CMA allocation arm: select the dma-noncoherent symbols for all swiotlb builds
2019-08-24Merge tag 'gpio-v5.3-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "Here is a (hopefully last) set of GPIO fixes for the v5.3 kernel cycle. Two are pretty core: - Fix not reporting open drain/source lines to userspace as "input" - Fix a minor build error found in randconfigs - Fix a chip select quirk on the Freescale SPI - Fix the irqchip initialization semantic order to reflect what it was using the old API" * tag 'gpio-v5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: Fix irqchip initialization order gpio: of: fix Freescale SPI CS quirk handling gpio: Fix build error of function redefinition gpiolib: never report open-drain/source lines as 'input' to user-space
2019-08-24net: use unlikely for dql_avail casexiaolinkui
This is an unlikely case, use unlikely() on it seems logical. Signed-off-by: xiaolinkui <xiaolinkui@kylinos.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-23Merge tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "Three important fixes tagged for stable (an indefinite hang, a crash on an assert and a NULL pointer dereference) plus a small series from Luis fixing instances of vfree() under spinlock" * tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client: libceph: fix PG split vs OSD (re)connect race ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply ceph: clear page dirty before invalidate page ceph: fix buffer free while holding i_ceph_lock in fill_inode() ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob() ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr() libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer
2019-08-23Merge branch 'for-joerg/arm-smmu/updates' of ↵Joerg Roedel
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu
2019-08-23fdt: add support for rng-seedHsin-Yi Wang
Introducing a chosen node, rng-seed, which is an entropy that can be passed to kernel called very early to increase initial device randomness. Bootloader should provide this entropy and the value is read from /chosen/rng-seed in DT. Obtain of_fdt_crc32 for CRC check after early_init_dt_scan_nodes(), since early_init_dt_scan_chosen() would modify fdt to erase rng-seed. Add a new interface add_bootloader_randomness() for rng-seed use case. Depends on whether the seed is trustworthy, rng seed would be passed to add_hwgenerator_randomness(). Otherwise it would be passed to add_device_randomness(). Decision is controlled by kernel config RANDOM_TRUST_BOOTLOADER. Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Theodore Ts'o <tytso@mit.edu> # drivers/char/random.c Signed-off-by: Will Deacon <will@kernel.org>
2019-08-23f2fs: include charset encoding information in the superblockDaniel Rosenberg
Add charset encoding to f2fs to support casefolding. It is modeled after the same feature introduced in commit c83ad55eaa91 ("ext4: include charset encoding information in the superblock") Currently this is not compatible with encryption, similar to the current ext4 imlpementation. This will change in the future. >From the ext4 patch: """ The s_encoding field stores a magic number indicating the encoding format and version used globally by file and directory names in the filesystem. The s_encoding_flags defines policies for using the charset encoding, like how to handle invalid sequences. The magic number is mapped to the exact charset table, but the mapping is specific to ext4. Since we don't have any commitment to support old encodings, the only encoding I am supporting right now is utf8-12.1.0. The current implementation prevents the user from enabling encoding and per-directory encryption on the same filesystem at the same time. The incompatibility between these features lies in how we do efficient directory searches when we cannot be sure the encryption of the user provided fname will match the actual hash stored in the disk without decrypting every directory entry, because of normalization cases. My quickest solution is to simply block the concurrent use of these features for now, and enable it later, once we have a better solution. """ Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-08-23f2fs: fix panic of IO alignment featureChao Yu
Since 07173c3ec276 ("block: enable multipage bvecs"), one bio vector can store multi pages, so that we can not calculate max IO size of bio as PAGE_SIZE * bio->bi_max_vecs. However IO alignment feature of f2fs always has that assumption, so finally, it may cause panic during IO submission as below stack. kernel BUG at fs/f2fs/data.c:317! RIP: 0010:__submit_merged_bio+0x8b0/0x8c0 Call Trace: f2fs_submit_page_write+0x3cd/0xdd0 do_write_page+0x15d/0x360 f2fs_outplace_write_data+0xd7/0x210 f2fs_do_write_data_page+0x43b/0xf30 __write_data_page+0xcf6/0x1140 f2fs_write_cache_pages+0x3ba/0xb40 f2fs_write_data_pages+0x3dd/0x8b0 do_writepages+0xbb/0x1e0 __writeback_single_inode+0xb6/0x800 writeback_sb_inodes+0x441/0x910 wb_writeback+0x261/0x650 wb_workfn+0x1f9/0x7a0 process_one_work+0x503/0x970 worker_thread+0x7d/0x820 kthread+0x1ad/0x210 ret_from_fork+0x35/0x40 This patch adds one extra condition to check left space in bio while trying merging page to bio, to avoid panic. This bug was reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204043 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-08-23soc: mediatek: cmdq: change the type of input parameterBibby Hsieh
According to the cmdq hardware design, the subsys is u8, the offset is u16 and the event id is u16. This patch changes the type of subsys, offset and event id to the correct type. Signed-off-by: Bibby Hsieh <bibby.hsieh@mediatek.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
2019-08-23soc: mediatek: cmdq: reorder the parameterBibby Hsieh
The order of gce instructions is [subsys offset value] so reorder the parameter of cmdq_pkt_write_mask and cmdq_pkt_write function. Signed-off-by: Bibby Hsieh <bibby.hsieh@mediatek.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
2019-08-23gpio: Move gpiochip_lock/unlock_as_irq to gpio/driver.hYueHaibing
If CONFIG_GPIOLIB is not, gpiochip_lock/unlock_as_irq will conflict as this: In file included from sound/soc/codecs/wm5100.c:18:0: ./include/linux/gpio.h:224:19: error: static declaration of gpiochip_lock_as_irq follows non-static declaration static inline int gpiochip_lock_as_irq(struct gpio_chip *chip, ^~~~~~~~~~~~~~~~~~~~ In file included from sound/soc/codecs/wm5100.c:17:0: ./include/linux/gpio/driver.h:494:5: note: previous declaration of gpiochip_lock_as_irq was here int gpiochip_lock_as_irq(struct gpio_chip *chip, unsigned int offset); ^~~~~~~~~~~~~~~~~~~~ In file included from sound/soc/codecs/wm5100.c:18:0: ./include/linux/gpio.h:231:20: error: static declaration of gpiochip_unlock_as_irq follows non-static declaration static inline void gpiochip_unlock_as_irq(struct gpio_chip *chip, ^~~~~~~~~~~~~~~~~~~~~~ In file included from sound/soc/codecs/wm5100.c:17:0: ./include/linux/gpio/driver.h:495:6: note: previous declaration of gpiochip_unlock_as_irq was here void gpiochip_unlock_as_irq(struct gpio_chip *chip, unsigned int offset); ^~~~~~~~~~~~~~~~~~~~~~ Move them to gpio/driver.h and use CONFIG_GPIOLIB guard this. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: d74be6dfea1b ("gpio: remove gpiod_lock/unlock_as_irq()") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20190822031817.32888-1-yuehaibing@huawei.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-08-23pinctrl/gpio: Take MUX usage into accountStefan Wahren
The user space like gpioinfo only see the GPIO usage but not the MUX usage (e.g. I2C or SPI usage) of a pin. As a user we want to know which pin is free/safe to use. So take the MUX usage of strict pinmux controllers into account to get a more realistic view for ioctl GPIO_GET_LINEINFO_IOCTL. Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Tested-by: Ramon Fried <rfried.dev@gmail.com> Signed-off-by: Ramon Fried <rfried.dev@gmail.com> Link: https://lore.kernel.org/r/20190814110035.13451-1-ramon.fried@linux.intel.com Acked-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-08-23iommu: Add helpers to set/get default domain typeJoerg Roedel
Add a couple of functions to allow changing the default domain type from architecture code and a function for iommu drivers to request whether the default domain is passthrough. Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-08-23soundwire: intel: handle disabled linksPierre-Louis Bossart
On most hardware platforms, SoundWire interfaces are pin-muxed with other interfaces (typically DMIC or I2S) and the status of each link needs to be checked at boot time. For Intel platforms, the BIOS provides a menu to enable/disable the links separately, and the information is provided to the OS with an Intel-specific _DSD property. The same capability will be added to revisions of the MIPI DisCo specification. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20190821185821.12690-5-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-08-23soundwire: add debugfs supportPierre-Louis Bossart
Add base debugfs mechanism for SoundWire bus by creating soundwire root and master-N and slave-x hierarchy. Also add SDW Slave SCP, DP0 and DP-N register debug file. Registers not implemented will print as "XX" Credits: this patch is based on an earlier internal contribution by Vinod Koul, Sanyog Kale, Shreyas Nc and Hardik Shah. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Sanyog Kale <sanyog.r.kale@intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20190821185821.12690-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-08-23timekeeping/vsyscall: Prevent math overflow in BOOTTIME updateThomas Gleixner
The VDSO update for CLOCK_BOOTTIME has a overflow issue as it shifts the nanoseconds based boot time offset left by the clocksource shift. That overflows once the boot time offset becomes large enough. As a consequence CLOCK_BOOTTIME in the VDSO becomes a random number causing applications to misbehave. Fix it by storing a timespec64 representation of the offset when boot time is adjusted and add that to the MONOTONIC base time value in the vdso data page. Using the timespec64 representation avoids a 64bit division in the update code. Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation") Reported-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Chris Clayton <chris2553@googlemail.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908221257580.1983@nanos.tec.linutronix.de
2019-08-22Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM changes from Paul E. McKenney: - A few more RCU flavor consolidation cleanups. - Miscellaneous fixes. - Updates to RCU's list-traversal macros improving lockdep usability. - Torture-test updates. - Forward-progress improvements for no-CBs CPUs: Avoid ignoring incoming callbacks during grace-period waits. - Forward-progress improvements for no-CBs CPUs: Use ->cblist structure to take advantage of others' grace periods. - Also added a small commit that avoids needlessly inflicting scheduler-clock ticks on callback-offloaded CPUs. - Forward-progress improvements for no-CBs CPUs: Reduce contention on ->nocb_lock guarding ->cblist. - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass list to further reduce contention on ->nocb_lock guarding ->cblist. - LKMM updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-22driver core: initialize a default DMA mask for platform deviceChristoph Hellwig
We still treat devices without a DMA mask as defaulting to 32-bits for both mask, but a few releases ago we've started warning about such cases, as they require special cases to work around this sloppyness. Add a dma_mask field to struct platform_device so that we can initialize the dma_mask pointer in struct device and initialize both masks to 32-bits by default, replacing similar functionality in m68k and powerpc. The arch_setup_pdev_archdata hooks is now unused and removed. Note that the code looks a little odd with the various conditionals because we have to support platform_device structures that are statically allocated. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20190816062435.881-7-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-22libceph: allow ceph_buffer_put() to receive a NULL ceph_bufferLuis Henriques
Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-08-22net/mlx5: Add HV VHCA infrastructureEran Ben Elisha
HV VHCA is a layer which provides PF to VF communication channel based on HyperV PCI config channel. It implements Mellanox's Inter VHCA control communication protocol. The protocol contains control block in order to pass messages between the PF and VF drivers, and data blocks in order to pass actual data. The infrastructure is agent based. Each agent will be responsible of contiguous buffer blocks in the VHCA config space. This infrastructure will bind agents to their blocks, and those agents can only access read/write the buffer blocks assigned to them. Each agent will provide three callbacks (control, invalidate, cleanup). Control will be invoked when block-0 is invalidated with a command that concerns this agent. Invalidate callback will be invoked if one of the blocks assigned to this agent was invalidated. Cleanup will be invoked before the agent is being freed in order to clean all of its open resources or deferred works. Block-0 serves as the control block. All execution commands from the PF will be written by the PF over this block. VF will ack on those by writing on block-0 as well. Its format is described by struct mlx5_hv_vhca_control_block layout. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-22PCI: hv: Add a Hyper-V PCI interface driver for software backchannel interfaceHaiyang Zhang
This interface driver is a helper driver allows other drivers to have a common interface with the Hyper-V PCI frontend driver. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-22PCI: hv: Add a paravirtual backchannel in softwareDexuan Cui
Windows SR-IOV provides a backchannel mechanism in software for communication between a VF driver and a PF driver. These "configuration blocks" are similar in concept to PCI configuration space, but instead of doing reads and writes in 32-bit chunks through a very slow path, packets of up to 128 bytes can be sent or received asynchronously. Nearly every SR-IOV device contains just such a communications channel in hardware, so using this one in software is usually optional. Using the software channel, however, allows driver implementers to leverage software tools that fuzz the communications channel looking for vulnerabilities. The usage model for these packets puts the responsibility for reading or writing on the VF driver. The VF driver sends a read or a write packet, indicating which "block" is being referred to by number. If the PF driver wishes to initiate communication, it can "invalidate" one or more of the first 64 blocks. This invalidation is delivered via a callback supplied by the VF driver by this driver. No protocol is implied, except that supplied by the PF and VF drivers. Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-22crypto: sha256 - Move lib/sha256.c to lib/cryptoHans de Goede
Generic crypto implementations belong under lib/crypto not directly in lib, likewise the header should be in include/crypto, not include/linux. Note that the code in lib/crypto/sha256.c is not yet available for generic use after this commit, it is still only used by the s390 and x86 purgatory code. Making it suitable for generic use is done in further patches in this series. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-21mm/mmu_notifiers: remove unregister_no_releaseJason Gunthorpe
mmu_notifier_unregister_no_release() and mmu_notifier_call_srcu() no longer have any users, they have all been converted to use mmu_notifier_put(). So delete this difficult to use interface. Link: https://lore.kernel.org/r/20190806231548.25242-12-jgg@ziepe.ca Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Tested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21Merge branch 'odp_fixes' into hmm.gitJason Gunthorpe
From rdma.git Jason Gunthorpe says: ==================== This is a collection of general cleanups for ODP to clarify some of the flows around umem creation and use of the interval tree. ==================== The branch is based on v5.3-rc5 due to dependencies, and is being taken into hmm.git due to dependencies in the next patches. * odp_fixes: RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address RDMA/core: Make invalidate_range a device operation RDMA/odp: Use kvcalloc for the dma_list and page_list RDMA/odp: Check for overflow when computing the umem_odp end RDMA/odp: Provide ib_umem_odp_release() to undo the allocs RDMA/odp: Split creating a umem_odp from ib_umem_get RDMA/odp: Make the three ways to create a umem_odp clear RMDA/odp: Consolidate umem_odp initialization RDMA/odp: Make it clearer when a umem is an implicit ODP umem RDMA/odp: Iterate over the whole rbtree directly RDMA/odp: Use the common interval tree library instead of generic RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21posix-cpu-timers: Remove tsk argument from run_posix_cpu_timers()Thomas Gleixner
It's always current. Don't give people wrong ideas. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190819143801.945469967@linutronix.de
2019-08-21ARM: s3c64xx: squash samsung_usb_phy.h into setup-usb-phy.cMasahiro Yamada
This is only used by arch/arm/mach-s3c64xx/setup-usb-phy.c $ git grep samsung_usb_phy_type include/linux/usb/samsung_usb_phy.h:enum samsung_usb_phy_type { $ git grep USB_PHY_TYPE_DEVICE arch/arm/mach-s3c64xx/setup-usb-phy.c: if (type == USB_PHY_TYPE_DEVICE) arch/arm/mach-s3c64xx/setup-usb-phy.c: if (type == USB_PHY_TYPE_DEVICE) include/linux/usb/samsung_usb_phy.h: USB_PHY_TYPE_DEVICE, $ git grep USB_PHY_TYPE_HOST include/linux/usb/samsung_usb_phy.h: USB_PHY_TYPE_HOST, Actually, 'enum samsung_usb_phy_type' is unused; the 'type' parameter has 'int' type. Anyway, there is no need to declare this enum in the globally visible header. Squash the header. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2019-08-21Merge branch 'odp_fixes' into rdma.git for-nextJason Gunthorpe
Jason Gunthorpe says: ==================== This is a collection of general cleanups for ODP to clarify some of the flows around umem creation and use of the interval tree. ==================== The branch is based on v5.3-rc5 due to dependencies * odp_fixes: RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address RDMA/core: Make invalidate_range a device operation RDMA/odp: Use kvcalloc for the dma_list and page_list RDMA/odp: Check for overflow when computing the umem_odp end RDMA/odp: Provide ib_umem_odp_release() to undo the allocs RDMA/odp: Split creating a umem_odp from ib_umem_get RDMA/odp: Make the three ways to create a umem_odp clear RMDA/odp: Consolidate umem_odp initialization RDMA/odp: Make it clearer when a umem is an implicit ODP umem RDMA/odp: Iterate over the whole rbtree directly RDMA/odp: Use the common interval tree library instead of generic RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21dma-mapping: remove is_device_dma_capableChristoph Hellwig
No users left. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20190816062435.881-6-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-21usb: add a HCD_DMA flag instead of guestimating DMA capabilitiesChristoph Hellwig
The usb core is the only major place in the kernel that checks for a non-NULL device dma_mask to see if a device is DMA capable. This is generally a bad idea, as all major busses always set up a DMA mask, even if the device is not DMA capable - in fact bus layers like PCI can't even know if a device is DMA capable at enumeration time. This leads to lots of workaround in HCD drivers, and also prevented us from setting up a DMA mask for platform devices by default last time we tried. Replace this guess with an explicit HCD_DMA that is set by drivers that appear to have DMA support. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20190816062435.881-4-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-21net/mlx5: Create bypass and loopback flow steering namespaces for RDMA RXMark Zhang
Use different namespaces for bypass and switchdev loopback because they have different priorities and default table miss action requirement: 1. bypass: with multiple priorities support, and MLX5_FLOW_TABLE_MISS_ACTION_DEF as the default table miss action; 2. switchdev loopback: with single priority support, and MLX5_FLOW_TABLE_MISS_ACTION_SWITCH_DOMAIN as the default table miss action. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-08-21extable: Add function to search only kernel exception tableSantosh Sivaraj
Certain architecture specific operating modes (e.g., in powerpc machine check handler that is unable to access vmalloc memory), the search_exception_tables cannot be called because it also searches the module exception tables if entry is not found in the kernel exception table. Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820081352.8641-5-santosh@fossix.org