summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-12-15x86/tsc: Force TSC_ADJUST register to value >= zeroThomas Gleixner
Roland reported that his DELL T5810 sports a value add BIOS which completely wreckages the TSC. The squirmware [(TM) Ingo Molnar] boots with random negative TSC_ADJUST values, different on all CPUs. That renders the TSC useless because the sycnchronization check fails. Roland tested the new TSC_ADJUST mechanism. While it manages to readjust the TSCs he needs to disable the TSC deadline timer, otherwise the machine just stops booting. Deeper investigation unearthed that the TSC deadline timer is sensitive to the TSC_ADJUST value. Writing TSC_ADJUST to a negative value results in an interrupt storm caused by the TSC deadline timer. This does not make any sense and it's hard to imagine what kind of hardware wreckage is behind that misfeature, but it's reliably reproducible on other systems which have TSC_ADJUST and TSC deadline timer. While it would be understandable that a big enough negative value which moves the resulting TSC readout into the negative space could have the described effect, this happens even with a adjust value of -1, which keeps the TSC readout definitely in the positive space. The compare register for the TSC deadline timer is set to a positive value larger than the TSC, but despite not having reached the deadline the interrupt is raised immediately. If this happens on the boot CPU, then the machine dies silently because this setup happens before the NMI watchdog is armed. Further experiments showed that any other adjustment of TSC_ADJUST works as expected as long as it stays in the positive range. The direction of the adjustment has no influence either. See the lkml link for further analysis. Yet another proof for the theory that timers are designed by janitors and the underlying (obviously undocumented) mechanisms which allow BIOSes to wreckage them are considered a feature. Well done Intel - NOT! To address this wreckage add the following sanity measures: - If the TSC_ADJUST value on the boot cpu is not 0, set it to 0 - If the TSC_ADJUST value on any cpu is negative, set it to 0 - Prevent the cross package synchronization mechanism from setting negative TSC_ADJUST values. Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Allen Hung <allen_hung@dell.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161213131211.397588033@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-15x86/tsc: Validate TSC_ADJUST after resumeThomas Gleixner
Some 'feature' BIOSes fiddle with the TSC_ADJUST register during suspend/resume which renders the TSC unusable. Add sanity checks into the resume path and restore the original value if it was adjusted. Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Allen Hung <allen_hung@dell.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161213131211.317654500@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-15Merge branch 'patchwork' into v4l_for_linusMauro Carvalho Chehab
* patchwork: (496 commits) [media] v4l: tvp5150: Add missing break in set control handler [media] v4l: tvp5150: Don't inline the tvp5150_selmux() function [media] v4l: tvp5150: Compile tvp5150_link_setup out if !CONFIG_MEDIA_CONTROLLER [media] em28xx: don't store usb_device at struct em28xx [media] em28xx: use usb_interface for dev_foo() calls [media] em28xx: don't change the device's name [media] mn88472: fix chip id check on probe [media] mn88473: fix chip id check on probe [media] lirc: fix error paths in lirc_cdev_add() [media] s5p-mfc: Add support for MFC v8 available in Exynos 5433 SoCs [media] s5p-mfc: Rework clock handling [media] s5p-mfc: Don't keep clock prepared all the time [media] s5p-mfc: Kill all IS_ERR_OR_NULL in clocks management code [media] s5p-mfc: Remove dead conditional code [media] s5p-mfc: Ensure that clock is disabled before turning power off [media] s5p-mfc: Remove special clock rate management [media] s5p-mfc: Use printk_ratelimited for reporting ioctl errors [media] s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES [media] vivid: Set color_enc on HSV formats [media] v4l2-tpg: Init hv_enc field with a valid value ...
2016-12-15ACPI/NUMA: Do not map pxm to node when NUMA is turned offBoris Ostrovsky
acpi_map_pxm_to_node() unconditially maps nodes even when NUMA is turned off. So acpi_get_node() might return a node > 0, which is fatal when NUMA is disabled as the rest of the kernel assumes that only node 0 exists. Expose numa_off to the acpi code and return NUMA_NO_NODE when it's set. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: linux-ia64@vger.kernel.org Cc: catalin.marinas@arm.com Cc: rjw@rjwysocki.net Cc: will.deacon@arm.com Cc: linux-acpi@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1481602709-18260-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-15x86/acpi: Use proper macro for invalid nodeBoris Ostrovsky
Use NUMA_NO_NODE instead of -1. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: pavel@ucw.cz Link: http://lkml.kernel.org/r/1481570993-13941-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-15x86/smpboot: Prevent false positive out of bounds cpumask access warningThomas Gleixner
prefill_possible_map() reinitializes the cpu_possible_map by setting the possible cpu bits and clearing all other bits up to NR_CPUS. This is technically always correct because cpu_possible_map is statically allocated and sized NR_CPUS. With CPUMASK_OFFSTACK and DEBUG_PER_CPU_MAPS enabled the bounds check of cpu masks happens on nr_cpu_ids. nr_cpu_ids is initialized to NR_CPUS and only limited after the set/clear bit loops have been executed. But if the system was booted with "nr_cpus=N" on the command line, where N is < NR_CPUS then nr_cpu_ids is limited in the parameter parsing function before prefill_possible_map() is invoked. As a consequence the cpumask bounds check triggers when clearing the bits past nr_cpu_ids. Add a helper which allows to reset cpu_possible_map w/o the bounds check and then set only the possible bits which are well inside bounds. Reported-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: 0x7f454c46@gmail.com Cc: Jan Beulich <JBeulich@novell.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612131836050.3415@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-15mac80211: fix legacy and invalid rx-rate reportBen Greear
This fixes obtaining the rate info via sta_set_sinfo when the rx rate is invalid (for instance, on IBSS interface that has received no frames from one of its peers). Also initialize rinfo->flags for legacy rates, to not rely on the whole sinfo being initialized to zero. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15Merge branches 'work.namei', 'work.dcache' and 'work.iov_iter' into for-linusAl Viro
2016-12-14Merge tag 'xfs-for-linus-4.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs updates from Dave Chinner: "There is quite a varied bunch of stuff in this update, and some of it you will have already merged through the ext4 tree which imported the dax-4.10-iomap-pmd topic branch from the XFS tree. There is also a new direct IO implementation that uses the iomap infrastructure. It's much simpler, faster, and has lower IO latency than the existing direct IO infrastructure. Summary: - DAX PMD faults via iomap infrastructure - Direct-io support in iomap infrastructure - removal of now-redundant XFS inode iolock, replaced with VFS i_rwsem - synchronisation with fixes and changes in userspace libxfs code - extent tree lookup helpers - lots of little corruption detection improvements to verifiers - optimised CRC calculations - faster buffer cache lookups - deprecation of barrier/nobarrier mount options - we always use REQ_FUA/REQ_FLUSH where appropriate for data integrity now - cleanups to speculative preallocation - miscellaneous minor bug fixes and cleanups" * tag 'xfs-for-linus-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (63 commits) xfs: nuke unused tracepoint definitions xfs: use GPF_NOFS when allocating btree cursors xfs: use xfs_vn_setattr_size to check on new size xfs: deprecate barrier/nobarrier mount option xfs: Always flush caches when integrity is required xfs: ignore leaf attr ichdr.count in verifier during log replay xfs: use rhashtable to track buffer cache xfs: optimise CRC updates xfs: make xfs btree stats less huge xfs: don't cap maximum dedupe request length xfs: don't allow di_size with high bit set xfs: error out if trying to add attrs and anextents > 0 xfs: don't crash if reading a directory results in an unexpected hole xfs: complain if we don't get nextents bmap records xfs: check for bogus values in btree block headers xfs: forbid AG btrees with level == 0 xfs: several xattr functions can be void xfs: handle cow fork in xfs_bmap_trace_exlist xfs: pass state not whichfork to trace_xfs_extlist xfs: Move AGI buffer type setting to xfs_read_agi ...
2016-12-14printk: remove console flushing special cases for partial buffered linesLinus Torvalds
It actively hurts proper merging, and makes for a lot of special cases. There was a good(ish) reason for doing it originally, but it's getting too painful to maintain. And most of the original reasons for it are long gone. So instead of having special code to flush partial lines to the console (as opposed to the record buffers), do _all_ the console writing from the record buffer, and be done with it. If an oops happens (or some other synchronous event), we will flush the partial lines due to the oops printing activity, so this does not affect that. It does mean that if you have a completely hung machine, a partial preceding line may not have been printed out. That was some of the original reason for this complexity, in fact, back when we used to test for the historical i386 "halt" instruction problem by doing pr_info("Checking 'hlt' instruction... "); if (!boot_cpu_data.hlt_works_ok) { pr_cont("disabled\n"); return; } halt(); halt(); halt(); halt(); pr_cont("OK\n"); and that model no longer works (it the 'hlt' instruction kills the machine, the partial line won't have been flushed, so you won't even see it). Of course, that was also back in the days when people actually had textual console output rather than a graphical splash-screen at bootup. How times change.. Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Steven Rostedt <rostedt@goodmis.org> Tested-by: Petr Mladek <pmladek@suse.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14printk: remove games with previous record flagsLinus Torvalds
The record logging code looks at the previous record flags in various ways, and they are all wrong. You can't use the previous record flags to determine anything about the next record, because they may simply not be related. In particular, the reason the previous record was a continuation record may well be exactly _because_ the new record was printed by a different process, which is why the previous record was flushed. So all those games are simply wrong, and make the code hard to understand (because the code fundamentally cdoes not make sense). So remove it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15vsock/virtio: fix src/dst cid formatMichael S. Tsirkin
These fields are 64 bit, using le32_to_cpu and friends on these will not do the right thing. Fix this up. Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15vsock/virtio: mark an internal function staticMichael S. Tsirkin
virtio_transport_alloc_pkt is only used locally, make it static. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15vsock/virtio: add a missing __le annotationMichael S. Tsirkin
guest cid is read from config space, therefore it's in little endian format and is treated as such, annotate it accordingly. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15vhost: add missing __user annotationsMichael S. Tsirkin
Several vhost functions were missing __user annotations on pointers, causing sparse warnings. Fix this up. sparse also warns about vhost_process_iotlb_msg which is local and should be static. Fix that up as well. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15vhost: make interval tree static inlineMichael S. Tsirkin
vhost_umem_interval_tree is only used locally within vhost.c, mark it static. As some functions generated go unused, this triggers warnings unless we also mark it inline. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15drm/virtio: annotate virtio_gpu_queue_ctrl_buffer_lockedMichael S. Tsirkin
virtio_gpu_queue_ctrl_buffer_locked is called with ctrlq.qlock taken, it releases and acquires this lock. This causes a sparse warning. Add appropriate annotations for sparse context checking. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15drm/virtio: fix lock context imbalanceMichael S. Tsirkin
When virtio_gpu_free_vbufs exits due to list empty, it does not drop the free_vbufs lock that it took. list empty is not expected to happen anyway, but it can't hurt to fix this and drop the lock. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15drm/virtio: fix endianness in primary_plane_updateMichael S. Tsirkin
virtio_gpu_cmd_transfer_to_host_2d expects x and y parameters in LE, but virtio_gpu_primary_plane_update passes in the CPU format instead. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15virtio_console: drop unused config fieldsMichael S. Tsirkin
struct ports_device includes a config field including the whole virtio_console_config, but only max_nr_ports in there is ever updated or used. The rest is unused and in fact does not even mirror the device config. Drop everything except max_nr_ports, saving some memory. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2016-12-14Merge tag 'for-linus-4.10' of git://git.code.sf.net/p/openipmi/linux-ipmiLinus Torvalds
Pull IPMI updates from Corey Minyard: "Various small fixes for IPMI. Cleanups in the documentation and convertion printk() to pr_xxx() and removal of an unused module parameter. Some small bug fixes and enhancements. This also adds a post softdep from the IPMI core module to the IPMI device interface. Many people have complained that the device interface isn't automatically avaiable when IPMI is loaded. I don't want to make the device interface mandatory, though, plenty of people use IPMI internally (like with ACPI) and don't need a device interface or the added possible security entry. A softdep should make it work 'out of the box' but allow people to not have it if they don't want it" * tag 'for-linus-4.10' of git://git.code.sf.net/p/openipmi/linux-ipmi: ipmi: create hardware-independent softdep for ipmi_devintf ipmi: Fix sequence number handling ipmi: Pick up slave address from SMBIOS on an ACPI device ipmi_si: Clean up printks Move platform device creation earlier in the initialization ipmi: Update documentation ipmi_ssif: Remove an unused module parameter ipmi: Periodically check for events, not messages
2016-12-14logfs: remove from treeChristoph Hellwig
Logfs was introduced to the kernel in 2009, and hasn't seen any non drive-by changes since 2012, while having lots of unsolved issues including the complete lack of error handling, with more and more issues popping up without any fixes. The logfs.org domain has been bouncing from a mail, and the maintainer on the non-logfs.org domain hasn't repsonded to past queries either. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-14Merge tag 'dmaengine-4.10-rc1' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull dmaengine updates from Vinod Koul: "Fairly routine update this time around with all changes specific to drivers: - New driver for STMicroelectronics FDMA - Memory-to-memory transfers on dw dmac - Support for slave maps on pl08x devices - Bunch of driver fixes to use dma_pool_zalloc - Bunch of compile and warning fixes spread across drivers" [ The ST FDMA driver already came in earlier through the remoteproc tree ] * tag 'dmaengine-4.10-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (68 commits) dmaengine: sirf-dma: remove unused ‘sdesc’ dmaengine: pl330: remove unused ‘regs’ dmaengine: s3c24xx: remove unused ‘cdata’ dmaengine: stm32-dma: remove unused ‘src_addr’ dmaengine: stm32-dma: remove unused ‘dst_addr’ dmaengine: stm32-dma: remove unused ‘sfcr’ dmaengine: pch_dma: remove unused ‘cookie’ dmaengine: mic_x100_dma: remove unused ‘data’ dmaengine: img-mdc: remove unused ‘prev_phys’ dmaengine: usb-dmac: remove unused ‘uchan’ dmaengine: ioat: remove unused ‘res’ dmaengine: ioat: remove unused ‘ioat_dma’ dmaengine: ioat: remove unused ‘is_raid_device’ dmaengine: pl330: do not generate unaligned access dmaengine: k3dma: move to dma_pool_zalloc dmaengine: at_hdmac: move to dma_pool_zalloc dmaengine: at_xdmac: don't restore unsaved status dmaengine: ioat: set error code on failures dmaengine: ioat: set error code on failures dmaengine: DW DMAC: add multi-block property to device tree ...
2016-12-15virtio_ring: fix complaint by sparseGonglei
# make C=2 CF="-D__CHECK_ENDIAN__" ./drivers/virtio/ drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types) drivers/virtio/virtio_ring.c:423:19: expected unsigned int [unsigned] [assigned] i drivers/virtio/virtio_ring.c:423:19: got restricted __virtio16 [usertype] next drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types) drivers/virtio/virtio_ring.c:423:19: expected unsigned int [unsigned] [assigned] i drivers/virtio/virtio_ring.c:423:19: got restricted __virtio16 [usertype] next drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types) drivers/virtio/virtio_ring.c:423:19: expected unsigned int [unsigned] [assigned] i drivers/virtio/virtio_ring.c:423:19: got restricted __virtio16 [usertype] next drivers/virtio/virtio_ring.c:604:39: warning: incorrect type in initializer (different base types) drivers/virtio/virtio_ring.c:604:39: expected unsigned short [unsigned] [usertype] nextflag drivers/virtio/virtio_ring.c:604:39: got restricted __virtio16 drivers/virtio/virtio_ring.c:612:33: warning: restricted __virtio16 degrades to integer Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15virtio_pci_modern: fix complaint by sparseGonglei
drivers/virtio/virtio_pci_modern.c:66:40: warning: incorrect type in argument 2 (different base types) drivers/virtio/virtio_pci_modern.c:66:40: expected unsigned int [noderef] [usertype] <asn:2>*addr drivers/virtio/virtio_pci_modern.c:66:40: got restricted __le32 [noderef] [usertype] <asn:2>*lo drivers/virtio/virtio_pci_modern.c:67:33: warning: incorrect type in argument 2 (different base types) drivers/virtio/virtio_pci_modern.c:67:33: expected unsigned int [noderef] [usertype] <asn:2>*addr drivers/virtio/virtio_pci_modern.c:67:33: got restricted __le32 [noderef] [usertype] <asn:2>*hi drivers/virtio/virtio_pci_modern.c:150:32: warning: incorrect type in argument 2 (different base types) drivers/virtio/virtio_pci_modern.c:150:32: expected unsigned int [noderef] [usertype] <asn:2>*addr drivers/virtio/virtio_pci_modern.c:150:32: got restricted __le32 [noderef] <asn:2>*<noident> drivers/virtio/virtio_pci_modern.c:151:39: warning: incorrect type in argument 1 (different base types) drivers/virtio/virtio_pci_modern.c:151:39: expected unsigned int [noderef] [usertype] <asn:2>*addr drivers/virtio/virtio_pci_modern.c:151:39: got restricted __le32 [noderef] <asn:2>*<noident> drivers/virtio/virtio_pci_modern.c:152:32: warning: incorrect type in argument 2 (different base types) drivers/virtio/virtio_pci_modern.c:152:32: expected unsigned int [noderef] [usertype] <asn:2>*addr Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-14Merge tag 'modules-for-v4.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: "Summary of modules changes for the 4.10 merge window: - The rodata= cmdline parameter has been extended to additionally apply to module mappings - Fix a hard to hit race between module loader error/clean up handling and ftrace registration - Some code cleanups, notably panic.c and modules code use a unified taint_flags table now. This is much cleaner than duplicating the taint flag code in modules.c" * tag 'modules-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: fix DEBUG_SET_MODULE_RONX typo module: extend 'rodata=off' boot cmdline parameter to module mappings module: Fix a comment above strong_try_module_get() module: When modifying a module's text ignore modules which are going away too module: Ensure a module's state is set accordingly during module coming cleanup code module: remove trailing whitespace taint/module: Clean up global and module taint flags handling modpost: free allocated memory
2016-12-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - a few misc things - kexec updates - DMA-mapping updates to better support networking DMA operations - IPC updates - various MM changes to improve DAX fault handling - lots of radix-tree changes, mainly to the test suite. All leading up to reimplementing the IDA/IDR code to be a wrapper layer over the radix-tree. However the final trigger-pulling patch is held off for 4.11. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) radix tree test suite: delete unused rcupdate.c radix tree test suite: add new tag check radix-tree: ensure counts are initialised radix tree test suite: cache recently freed objects radix tree test suite: add some more functionality idr: reduce the number of bits per level from 8 to 6 rxrpc: abstract away knowledge of IDR internals tpm: use idr_find(), not idr_find_slowpath() idr: add ida_is_empty radix tree test suite: check multiorder iteration radix-tree: fix replacement for multiorder entries radix-tree: add radix_tree_split_preload() radix-tree: add radix_tree_split radix-tree: add radix_tree_join radix-tree: delete radix_tree_range_tag_if_tagged() radix-tree: delete radix_tree_locate_item() radix-tree: improve multiorder iterators btrfs: fix race in btrfs_free_dummy_fs_info() radix-tree: improve dump output radix-tree: make radix_tree_find_next_bit more useful ...
2016-12-14Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block IO fixes from Jens Axboe: "A few fixes that I collected as post-merge. I was going to wait a bit with sending this out, but the O_DIRECT fix should really go in sooner rather than later" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: Fix failed allocation path when mapping queues blk-mq: Avoid memory reclaim when remapping queues block_dev: don't update file access position for sync direct IO nvme/pci: Log PCI_STATUS when the controller dies block_dev: don't test bdev->bd_contains when it is not stable
2016-12-14Merge branch 'for-4.10/fs-unmap' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull fs meta data unmap optimization from Jens Axboe: "A series from Jan Kara, providing a more efficient way for unmapping meta data from in the buffer cache than doing it block-by-block. Provide a general helper that existing callers can use" * 'for-4.10/fs-unmap' of git://git.kernel.dk/linux-block: fs: Remove unmap_underlying_metadata fs: Add helper to clean bdev aliases under a bh and use it ext2: Use clean_bdev_aliases() instead of iteration ext4: Use clean_bdev_aliases() instead of iteration direct-io: Use clean_bdev_aliases() instead of handmade iteration fs: Provide function to unmap metadata for a range of blocks
2016-12-14docs: add back 'Documentation/Changes' file (as symlink)Linus Torvalds
Jaegeuk Kim reports that the debian kernel package build gets confused by the lack of Documentation/Changes file. We also refer to that path name in ver_linux and various how-to files and Kconfig files. The file got renamed away in commit 186128f75392 ("docs-rst: add documents to development-process"), and as Jaegeuk Kim points out, the commit message for that change says "use symlinks instead of renames", but then the commit itself actually does renames after all. Maybe we should do the other files too, but for now this just adds the minimal symlink back to the historical name, so that people looking for Documentation/Changes will actually find what they are looking for, and the debian scripts continue to work. Reported-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix tree test suite: delete unused rcupdate.cMatthew Wilcox
This file was used to implement call_rcu() before liburcu implemented that function. It hasn't even been compiled since before the test suite was added to the kernel. Remove it to reduce confusion. Link: http://lkml.kernel.org/r/1481667692-14500-5-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix tree test suite: add new tag checkMatthew Wilcox
We have a check that setting a tag on a single entry at root succeeds, but we were missing a check that clearing a tag on that same entry also succeeds. Link: http://lkml.kernel.org/r/1481667692-14500-4-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix-tree: ensure counts are initialisedMatthew Wilcox
radix_tree_join() was freeing nodes with a non-zero ->exceptional count, and radix_tree_split() wasn't zeroing ->exceptional when it allocated the new node. Fix this by making all callers of radix_tree_node_alloc() pass in the new counts (and some other always-initialised fields), which will prevent the problem recurring if in future we decide to do something similar. Link: http://lkml.kernel.org/r/1481667692-14500-3-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix tree test suite: cache recently freed objectsMatthew Wilcox
The kmem_cache_alloc implementation simply allocates new memory from malloc() and calls the ctor, which zeroes out the entire object. This means it cannot spot bugs where the object isn't properly reinitialised before being freed. Add a small (11 objects) cache before freeing objects back to malloc. This is enough to let us write a test to catch it, although the memory allocator is now aware of the structure of the radix tree node, since it chains free objects through ->private_data (like the percpu cache does). Link: http://lkml.kernel.org/r/1481667692-14500-2-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix tree test suite: add some more functionalityMatthew Wilcox
IDR needs more functionality from the kernel: kmalloc()/kfree(), and xchg(). Link: http://lkml.kernel.org/r/1480369871-5271-67-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14idr: reduce the number of bits per level from 8 to 6Matthew Wilcox
In preparation for merging the IDR and radix tree, reduce the fanout at each level from 256 to 64. If this causes a performance problem then a bisect will point to this commit, and we'll have a better idea about what we might do to fix it. Link: http://lkml.kernel.org/r/1480369871-5271-66-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14rxrpc: abstract away knowledge of IDR internalsMatthew Wilcox
Add idr_get_cursor() / idr_set_cursor() APIs, and remove the reference to IDR_SIZE. Link: http://lkml.kernel.org/r/1480369871-5271-65-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: David Howells <dhowells@redhat.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14tpm: use idr_find(), not idr_find_slowpath()Matthew Wilcox
idr_find_slowpath() is not intended to be part of the public API, it's an implementation detail. There's no reason to skip straight to the slowpath here. Link: http://lkml.kernel.org/r/1480369871-5271-64-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Peter Huewe <peterhuewe@gmx.de> Cc: Marcel Selhorst <tpmdd@selhorst.net> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14idr: add ida_is_emptyMatthew Wilcox
Two of the USB Gadgets were poking around in the internals of struct ida in order to determine if it is empty. Add the appropriate abstraction. Link: http://lkml.kernel.org/r/1480369871-5271-63-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Felipe Balbi <balbi@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix tree test suite: check multiorder iterationMatthew Wilcox
The random iteration test only inserts order-0 entries currently. Update it to insert entries of order between 7 and 0. Also make the maximum index configurable, make some variables static, make the test duration variable, remove some useless spinning, and add a fifth thread which calls tag_tagged_items(). Link: http://lkml.kernel.org/r/1480369871-5271-62-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix-tree: fix replacement for multiorder entriesMatthew Wilcox
When replacing an entry with NULL, we need to delete any sibling entries. Also account deleting exceptional entries properly. Also fix a bug with radix_tree_iter_replace() where we would fail to remove entirely freed nodes. Also fix accounting bug when switching between normal and exceptional entries with replace_slot. Also add testcases for all these bugs. Link: http://lkml.kernel.org/r/1480369871-5271-61-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix-tree: add radix_tree_split_preload()Matthew Wilcox
Calculate how many nodes we need to allocate to split an old_order entry into multiple entries, each of size new_order. The test suite checks that we allocated exactly the right number of nodes; neither too many (checked by rtp->nr == 0), nor too few (checked by comparing nr_allocated before and after the call to radix_tree_split()). Link: http://lkml.kernel.org/r/1480369871-5271-60-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix-tree: add radix_tree_splitMatthew Wilcox
This new function splits a larger multiorder entry into smaller entries (potentially multi-order entries). These entries are initialised to RADIX_TREE_RETRY to ensure that RCU walkers who see this state aren't confused. The caller should then call radix_tree_for_each_slot() and radix_tree_replace_slot() in order to turn these retry entries into the intended new entries. Tags are replicated from the original multiorder entry into each new entry. Link: http://lkml.kernel.org/r/1480369871-5271-59-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix-tree: add radix_tree_joinMatthew Wilcox
This new function allows for the replacement of many smaller entries in the radix tree with one larger multiorder entry. From the point of view of an RCU walker, they may see a mixture of the smaller entries and the large entry during the same walk, but they will never see NULL for an index which was populated before the join. Link: http://lkml.kernel.org/r/1480369871-5271-58-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix-tree: delete radix_tree_range_tag_if_tagged()Matthew Wilcox
This is an exceptionally complicated function with just one caller (tag_pages_for_writeback). We devote a large portion of the runtime of the test suite to testing this one function which has one caller. By introducing the new function radix_tree_iter_tag_set(), we can eliminate all of the complexity while keeping the performance. The caller can now use a fairly standard radix_tree_for_each() loop, and it doesn't need to worry about tricksy things like 'start' wrapping. The test suite continues to spend a large amount of time investigating this function, but now it's testing the underlying primitives such as radix_tree_iter_resume() and the radix_tree_for_each_tagged() iterator which are also used by other parts of the kernel. Link: http://lkml.kernel.org/r/1480369871-5271-57-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix-tree: delete radix_tree_locate_item()Matthew Wilcox
This rather complicated function can be better implemented as an iterator. It has only one caller, so move the functionality to the only place that needs it. Update the test suite to follow the same pattern. Link: http://lkml.kernel.org/r/1480369871-5271-56-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix-tree: improve multiorder iteratorsMatthew Wilcox
This fixes several interlinked problems with the iterators in the presence of multiorder entries. 1. radix_tree_iter_next() would only advance by one slot, which would result in the iterators returning the same entry more than once if there were sibling entries. 2. radix_tree_next_slot() could return an internal pointer instead of a user pointer if a tagged multiorder entry was immediately followed by an entry of lower order. 3. radix_tree_next_slot() expanded to a lot more code than it used to when multiorder support was compiled in. And I wasn't comfortable with entry_to_node() being in a header file. Fixing radix_tree_iter_next() for the presence of sibling entries necessarily involves examining the contents of the radix tree, so we now need to pass 'slot' to radix_tree_iter_next(), and we need to change the calling convention so it is called *before* dropping the lock which protects the tree. Also rename it to radix_tree_iter_resume(), as some people thought it was necessary to call radix_tree_iter_next() each time around the loop. radix_tree_next_slot() becomes closer to how it looked before multiorder support was introduced. It only checks to see if the next entry in the chunk is a sibling entry or a pointer to a node; this should be rare enough that handling this case out of line is not a performance impact (and such impact is amortised by the fact that the entry we just processed was a multiorder entry). Also, radix_tree_next_slot() used to force a new chunk lookup for untagged entries, which is more expensive than the out of line sibling entry skipping. Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14btrfs: fix race in btrfs_free_dummy_fs_info()Matthew Wilcox
We drop the lock which protects the radix tree, so we must call radix_tree_iter_next() in order to avoid a modification to the tree invalidating the iterator state. Link: http://lkml.kernel.org/r/1480369871-5271-54-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix-tree: improve dump outputMatthew Wilcox
Print the indices of the entries as unsigned (instead of signed) integers and print the parent node of each entry to help navigate around larger trees where the layout is not quite so obvious. Print the indices covered by a node. Rearrange the order of fields printed so the indices and parents line up for each type of entry. Link: http://lkml.kernel.org/r/1480369871-5271-53-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14radix-tree: make radix_tree_find_next_bit more usefulMatthew Wilcox
Since this function is specialised to the radix tree, pass in the node and tag to calculate the address of the bitmap in radix_tree_find_next_bit() instead of the caller. Likewise, there is no need to pass in the size of the bitmap. Link: http://lkml.kernel.org/r/1480369871-5271-52-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>