linux.git - Linus' kernel tree

Age	Commit message	Author
2019-09-06	x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large ↵	Steve Wahl
	to fix kexec relocation errors The last change to this Makefile caused relocation errors when loading a kdump kernel. Restore -mcmodel=large (not -mcmodel=kernel), -ffreestanding, and -fno-zero-initialized-bss, without reverting to the former practice of resetting KBUILD_CFLAGS. Purgatory.ro is a standalone binary that is not linked against the rest of the kernel. Its image is copied into an array that is linked to the kernel, and from there kexec relocates it wherever it desires. With the previous change to compiler flags, the error "kexec: Overflow in relocation type 11 value 0x11fffd000" was encountered when trying to load the crash kernel. This is from kexec code trying to relocate the purgatory.ro object. From the error message, relocation type 11 is R_X86_64_32S. The x86_64 ABI says: "The R_X86_64_32 and R_X86_64_32S relocations truncate the computed value to 32-bits. The linker must verify that the generated value for the R_X86_64_32 (R_X86_64_32S) relocation zero-extends (sign-extends) to the original 64-bit value." This type of relocation doesn't work when kexec chooses to place the purgatory binary in memory that is not reachable with 32 bit addresses. The compiler flag -mcmodel=kernel allows those types of relocations to be emitted, so revert to using -mcmodel=large as was done before. Also restore the -ffreestanding and -fno-zero-initialized-bss flags because they are appropriate for a stand alone piece of object code which doesn't explicitly zero the bss, and one other report has said undefined symbols are encountered without -ffreestanding. These identical compiler flag changes need to happen for every object that becomes part of the purgatory.ro object, so gather them together first into PURGATORY_CFLAGS_REMOVE and PURGATORY_CFLAGS, and then apply them to each of the objects that have C source. Do not apply any of these flags to kexec-purgatory.o, which is not part of the standalone object but part of the kernel proper.
Tested-by: Vaibhav Rustagi <vaibhavrustagi@google.com> Tested-by: Andreas Smas <andreas@lonelycoder.com> Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: None Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: clang-built-linux@googlegroups.com Cc: dimitri.sivanich@hpe.com Cc: mike.travis@hpe.com Cc: russ.anderson@hpe.com Fixes: b059f801a937 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS") Link: https://lkml.kernel.org/r/20190905202346.GA26595@swahl-linux Signed-off-by: Ingo Molnar <mingo@kernel.org>
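The range rule quoted from the ABI can be checked with a few lines of C. This is only a userspace sketch of the arithmetic, not the kexec relocation code; the function names are made up for illustration. It shows why the reported value 0x11fffd000 cannot be encoded by either 32-bit relocation type.

```c
#include <stdbool.h>
#include <stdint.h>

/* R_X86_64_32: the 64-bit value must zero-extend from its low 32 bits. */
static bool fits_r_x86_64_32(uint64_t value)
{
	return value <= UINT32_MAX;
}

/*
 * R_X86_64_32S: the 64-bit value must sign-extend from its low 32 bits,
 * i.e. lie in [-2^31, 2^31 - 1] when read as a signed number. Adding 2^31
 * with unsigned wraparound maps exactly that range onto [0, 2^32 - 1].
 */
static bool fits_r_x86_64_32s(uint64_t value)
{
	return value + UINT64_C(0x80000000) <= UINT32_MAX;
}
```

With these helpers, `fits_r_x86_64_32s(0x11fffd000)` is false because the value needs 33 bits, which matches the overflow that kexec reported.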
2019-09-06	Merge branch 'etnaviv/next' of https://git.pengutronix.de/git/lst/linux into ↵	Dave Airlie
	drm-next single etnaviv fix for an error path. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas Stach <l.stach@pengutronix.de> Link: https://patchwork.freedesktop.org/patch/msgid/4ae00cfb47c8e6fffca5dbb45ae9370cd4e5eaf4.camel@pengutronix.de
2019-09-06	Merge tag 'drm-next-5.4-2019-08-30' of ↵	Dave Airlie
	git://people.freedesktop.org/~agd5f/linux into drm-next drm-next-5.4-2019-08-30: amdgpu: - Add DC support for Renoir - Add some GPUVM hw bug workarounds - add support for the smu11 i2c controller - GPU reset vram lost bug fixes - Navi1x powergating fixes - Navi12 power fixes - Renoir power fixes - Misc bug fixes and cleanups Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190830212650.5055-1-alexander.deucher@amd.com
2019-09-06	Merge tag 'drm-misc-fixes-2019-09-05' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.3 final: - Make ingenic panel type DPI instead of unknown. - Fixes for command line parser modes. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/606d87b2-1840-c893-eb30-d6c471c9e50a@linux.intel.com
2019-09-06	Merge branch 'vmwgfx-fixes-5.3' of ↵	Dave Airlie
	git://people.freedesktop.org/~thomash/linux into drm-fixes Single vmwgfx double free fix. Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-09-06	perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization	Mark-PK Tsai
	If the compiler's auto-initialization feature is disabled, i.e. if -fplugin-arg-structleak_plugin-byref or -ftrivial-auto-var-init=pattern are disabled, arch_hw_breakpoint may be used before initialization after: 9a4903dde2c86 ("perf/hw_breakpoint: Split attribute parse and commit") On our ARM platform, the struct step_ctrl in arch_hw_breakpoint, which used to be zero-initialized by kzalloc(), may be used in arch_install_hw_breakpoint() without initialization. Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alix Wu <alix.wu@mediatek.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: YJ Chiang <yj.chiang@mediatek.com> Link: https://lkml.kernel.org/r/20190906060115.9460-1-mark-pk.tsai@mediatek.com [ Minor edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
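The class of bug described here can be illustrated in userspace. The sketch below uses hypothetical stand-in structures (the real ones live in the architecture code) and shows one common remedy, an explicit zero initializer on the on-stack object, so that members the parse step does not touch start at zero as they did under kzalloc(). It is not claimed to be the exact change made by the patch.

```c
/* Hypothetical stand-ins for the arch structures named in the commit. */
struct step_ctrl_sketch {
	int enabled;
};

struct arch_hw_breakpoint_sketch {
	unsigned long address;
	struct step_ctrl_sketch step_ctrl;
};

/* Returns the step_ctrl.enabled value seen after a partial "parse". */
static int step_enabled_after_parse(unsigned long addr)
{
	/* "= { 0 }" zeroes every member, including nested structures. */
	struct arch_hw_breakpoint_sketch hw = { 0 };

	/* The parse step fills in only some members. */
	hw.address = addr;

	/* Without the initializer above, this read would be indeterminate. */
	return hw.step_ctrl.enabled;
}
```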
2019-09-06	iio: hid-sensor-attributes: Fix divisions for 32-bit platforms	Andy Shevchenko
	The commit 473d12f7638c ("iio: hid-sensor-attributes: Convert to use int_pow()") converted to use generic int_pow() helper. However, the generic helper returns a 64-bit value and, when it is used as a divisor, it forces the compiler to emit a 64-bit division. In order to fix this, introduce a temporary 32-bit variable to hold the result of int_pow() and use it as divisor afterwards. In a couple of cases, replace int_pow() with predefined unit factors for time and frequency. Fixes: 473d12f7638c ("iio: hid-sensor-attributes: Convert to use int_pow()") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20190905112759.13035-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
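The pattern of narrowing the divisor before dividing can be sketched as follows. `int_pow_sketch()` is a userspace stand-in written for this example, and `scale_down()` is a hypothetical caller; neither is the driver code. The bound on `exp` is an assumption added so that the narrowing is lossless.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for a helper that returns base^exp as a 64-bit value. */
static uint64_t int_pow_sketch(uint64_t base, unsigned int exp)
{
	uint64_t result = 1;

	while (exp--)
		result *= base;
	return result;
}

/* Divide a 32-bit value by 10^exp using only a 32-bit division. */
static uint32_t scale_down(uint32_t value, unsigned int exp)
{
	uint32_t divisor;

	/* 10^9 is the largest power of ten that fits in 32 bits. */
	assert(exp <= 9);

	/* Hold the 64-bit result in a 32-bit temporary before dividing. */
	divisor = (uint32_t)int_pow_sketch(10, exp);
	return value / divisor;
}
```

Writing `value / int_pow_sketch(10, exp)` directly would promote the whole expression to 64 bits, which is the division the commit avoids on 32-bit platforms.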
2019-09-06	drm/i915/gvt: update RING_START reg of vGPU when the context is submitted to ↵	Weinan Li
	i915 The guest may use this register to identify the running state of one context. Emulate it as the value in context image as if the context runs on the GPU hardware. Signed-off-by: Weinan Li <weinan.z.li@intel.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2019-09-06	drm/i915/gvt: update vgpu workload head pointer correctly	Xiaolin Zhang
	When creating a vGPU workload, the guest context head pointer should be updated correctly by comparing with the existing workloads in the guest workload queue, including the current running context. In some situations, there is a running context A and then 2 new vGPU workloads, context B and A, are received. In the new workload context A, its head pointer should be updated with the running context A's tail. v2: walk through guest workload list in backward way. Cc: stable@vger.kernel.org Signed-off-by: Xiaolin Zhang <xiaolin.zhang@intel.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
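The backward walk from the v2 note can be sketched with a plain array standing in for the guest workload queue. All names are hypothetical, and the fallback to the guest-provided head when no earlier workload of the same context exists is an assumption of this sketch rather than something the commit message states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a queued vGPU workload. */
struct workload_sketch {
	uint32_t ctx_id;   /* which guest context the workload belongs to */
	uint32_t rb_tail;  /* ring buffer tail when the workload was queued */
};

/*
 * Choose the ring buffer head for a new workload of context ctx_id.
 * queue[0..n-1] is ordered oldest to newest and includes the running
 * workload. Walking backward finds the most recent workload of the same
 * context, whose tail is where the new workload must start.
 */
static uint32_t pick_head(const struct workload_sketch *queue, size_t n,
			  uint32_t ctx_id, uint32_t guest_head)
{
	for (size_t i = n; i-- > 0; ) {
		if (queue[i].ctx_id == ctx_id)
			return queue[i].rb_tail;
	}

	/* Assumed fallback: no earlier workload for this context. */
	return guest_head;
}
```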
2019-09-06	Merge tag 'misc-habanalabs-next-2019-09-05' of ↵	Greg Kroah-Hartman
	git://people.freedesktop.org/~gabbayo/linux into char-misc-next Oded writes: This tag contains the following changes for kernel 5.4: - Create an additional char device per PCI device. The new char device allows any application to query the device for stats, information, idle state and more. This is needed to support system/monitoring applications, while also allowing the deep-learning application to send work to the ASIC through the main (original) char device. - Fix possible kernel crash in case user supplies a smaller-than-required buffer to the DEBUG IOCTL. - Expose the device to userspace only after initialization was done, to prevent a race between the initialization and user submitting workloads. - Add uapi, as part of INFO IOCTL, to allow user to query the device utilization rate. - Add uapi, as part of INFO IOCTL, to allow user to retrieve aggregate H/W events, i.e. counting H/W events from the loading of the driver. - Register to the HWMON subsystem with the board's name, to allow the user to prepare a custom sensor file per board. - Use correct macros for endian swapping. - Improve error printing in multiple places. - Small bug fixes. 
* tag 'misc-habanalabs-next-2019-09-05' of git://people.freedesktop.org/~gabbayo/linux: (30 commits) habanalabs: correctly cast variable to __le32 habanalabs: show correct id in error print habanalabs: stop using the acronym KMD habanalabs: display card name as sensors header habanalabs: add uapi to retrieve aggregate H/W events habanalabs: add uapi to retrieve device utilization habanalabs: Make the Coresight timestamp perpetual habanalabs: explicitly set the queue-id enumerated numbers habanalabs: print to kernel log when reset is finished habanalabs: replace __le32_to_cpu with le32_to_cpu habanalabs: replace __cpu_to_le32/64 with cpu_to_le32/64 habanalabs: Handle HW_IP_INFO if device disabled or in reset habanalabs: Expose devices after initialization is done habanalabs: improve security in Debug IOCTL habanalabs: use default structure for user input in Debug IOCTL habanalabs: Add descriptive name to PSOC app status register habanalabs: Add descriptive names to PSOC scratch-pad registers habanalabs: create two char devices per ASIC habanalabs: change device_setup_cdev() to be more generic habanalabs: maintain a list of file private data objects ...
2019-09-06	x86/platform/uv: Fix kmalloc() NULL check routine	Austin Kim
	The result of kmalloc() should have been checked before the statement below: pqp = (struct bau_pq_entry *)vp; Move BUG_ON(!vp) before that statement. Signed-off-by: Austin Kim <austindh.kim@gmail.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Travis <mike.travis@hpe.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Steve Wahl <steve.wahl@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: allison@lohutok.net Cc: andy@infradead.org Cc: armijn@tjaldur.nl Cc: bp@alien8.de Cc: dvhart@infradead.org Cc: gregkh@linuxfoundation.org Cc: hpa@zytor.com Cc: kjlu@umn.edu Cc: platform-driver-x86@vger.kernel.org Link: https://lkml.kernel.org/r/20190905232951.GA28779@LGEARND20B15 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06	Merge tag 'v5.3-rc7' into x86/platform, to refresh the branch	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06	x86/cpu: Update init data for new Airmont CPU model	Rahul Tanwar
	Update properties for newly added Airmont CPU variant. Signed-off-by: Rahul Tanwar <rahul.tanwar@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Gayatri Kammela <gayatri.kammela@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190905193020.14707-5-tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06	x86/cpu: Add new Airmont variant to Intel family	Rahul Tanwar
	Add new Airmont variant CPU model to Intel family. Signed-off-by: Rahul Tanwar <rahul.tanwar@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Gayatri Kammela <gayatri.kammela@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190905193020.14707-4-tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06	x86/cpu: Add Elkhart Lake to Intel family	Gayatri Kammela
	Add the model number/CPUID of atom based Elkhart Lake to the Intel family. Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rahul Tanwar <rahul.tanwar@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190905193020.14707-3-tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06	x86/cpu: Add Tiger Lake to Intel family	Gayatri Kammela
	Add the model numbers/CPUIDs of Tiger Lake mobile and desktop to the Intel family. Suggested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rahul Tanwar <rahul.tanwar@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190905193020.14707-2-tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06	Merge branch 'x86/cleanups' into x86/cpu, to pick up dependent changes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-05	xfs: push the grant head when the log head moves forward	Dave Chinner
	When the log fills up, we can get into the state where the outstanding items in the CIL being committed and aggregated are larger than the range that the reservation grant head tail pushing will attempt to clean. This can result in the tail pushing range being trimmed back to the log head (l_last_sync_lsn) and so may not actually move the push target at all. When the iclogs associated with the CIL commit finally land, the log head moves forward, and this removes the restriction on the AIL push target. However, if we already have transactions sleeping on the grant head, and there's nothing in the AIL still to flush from the current push target, then nothing will move the tail of the log and trigger a log reservation wakeup. Hence there is nothing that will trigger xlog_grant_push_ail() to recalculate the AIL push target and start pushing on the AIL again to write back the metadata objects that pin the tail of the log and hence free up space and allow the transaction reservations to be woken and make progress. Hence we need to push on the grant head when we move the log head forward, as this may be the only trigger we have that can move the AIL push target forwards in this situation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-09-05	xfs: push iclog state cleaning into xlog_state_clean_log	Dave Chinner
	xlog_state_clean_log() is only called from one place, and it occurs when an iclog is transitioning back to ACTIVE. Prior to calling xlog_state_clean_log, the iclog we are processing has a hard coded state check to DIRTY so that xlog_state_clean_log() processes it correctly. We also have a hard coded wakeup after xlog_state_clean_log() to ensure log force waiters on that iclog are woken correctly. Both of these things are operations required to finish processing an iclog and return it to the ACTIVE state again, so they make little sense to be separated from the rest of the clean state transition code. Hence push these things inside xlog_state_clean_log(), document the behaviour and rename it xlog_state_clean_iclog() to indicate that it's being driven by an iclog state change and does the iclog state change work itself. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-09-05	xfs: factor iclog state processing out of xlog_state_do_callback()	Dave Chinner
	The iclog IO completion state processing is somewhat complex, and because it's inside two nested loops it is highly indented and very hard to read. Factor it out, flatten the logic flow and clean up the comments so that it is much easier to see what the code is doing both in processing the individual iclogs and in the overall xlog_state_do_callback() operation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-09-05	xfs: factor callbacks out of xlog_state_do_callback()	Dave Chinner
	Simplify the code flow by lifting the iclog callback work out of the main iclog iteration loop. This isolates the log juggling and callbacks from the iclog state change logic in the loop. Note that the loopdidcallbacks variable is not actually tracking whether callbacks are actually run - it is tracking whether the icloglock was dropped during the loop and so determines if we completed the entire iclog scan loop atomically. Hence we know for certain there are either no more ordered completions to run or that the next completion will run the remaining ordered iclog completions. Hence rename that variable appropriately for its function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-09-05	xfs: factor debug code out of xlog_state_do_callback()	Dave Chinner
	Start making this function readable by lifting the debug code into a conditional function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-09-05	xfs: prevent CIL push holdoff in log recovery	Dave Chinner
	generic/530 on a machine with enough ram and a non-preemptible kernel can run the AGI processing phase of log recovery entirely out of cache. This means it never blocks on locks, never waits for IO and runs entirely through the unlinked lists until it either completes or blocks and hangs because it has run out of log space. It runs out of log space because the background CIL push is scheduled but never runs. queue_work() queues the CIL work on the current CPU that is busy, and the workqueue code will not run it on any other CPU. Hence if the unlinked list processing never yields the CPU voluntarily, the push work is delayed indefinitely. This results in the CIL aggregating changes until all the log space is consumed. When the log recovery processing eventually blocks, the CIL flushes but because the last iclog isn't submitted for IO because it isn't full, the CIL flush never completes and nothing ever moves the log head forwards, or indeed inserts anything into the tail of the log, and hence nothing is able to get the log moving again and recovery hangs. There are several problems here, but the two obvious ones from the trace are that: a) log recovery does not yield the CPU for over 4 seconds, b) binding CIL pushes to a single CPU is a really bad idea. This patch addresses just these two aspects of the problem, and is suitable for backporting to work around any issues in older kernels. The more fundamental problem of preventing the CIL from consuming more than 50% of the log without committing will take more invasive and complex work, so will be done as followup work. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-09-05	xfs: fix missed wakeup on l_flush_wait	Rik van Riel
	The code in xlog_wait uses the spinlock to make adding the task to the wait queue, and setting the task state to UNINTERRUPTIBLE atomic with respect to the waker. Doing the wakeup after releasing the spinlock opens up the following race condition, in this order: task 1 adds the task to the wait queue; task 2 wakes up the task; task 1 sets the task state to UNINTERRUPTIBLE. This issue was found through code inspection as a result of kworkers being observed stuck in UNINTERRUPTIBLE state with an empty wait queue. It is rare and largely unreproducible. Simply moving the spin_unlock to after the wake_up_all results in the waker not being able to see a task on the waitqueue before it has set its state to UNINTERRUPTIBLE. This bug dates back to the conversion of this code to generic waitqueue infrastructure from a counting semaphore back in 2008 which didn't place the wakeups consistently w.r.t. the relevant spin locks. [dchinner: Also fix a similar issue in the shutdown path on xc_commit_wait. Update commit log with more details of the issue.] Fixes: d748c62367eb ("[XFS] Convert l_flushsema to a sv_t") Reported-by: Chris Mason <clm@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-09-05	xfs: push the AIL in xlog_grant_head_wake	Dave Chinner
	In the situation where the log is full and the CIL has not recently flushed, the AIL push threshold is throttled back to where the last write of the head of the log was completed. This is stored in log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space pinned by flushes and/or aggregation in progress, we can get the situation where the head of the log lags a long way behind the reservation grant head. When this happens, the AIL push target is trimmed back from where the reservation grant head wants to push the log tail to, back to where the head of the log currently is. This means the push target doesn't reach far enough into the log to actually move the tail before the transaction reservation goes to sleep. When the CIL push completes, it moves the log head forward such that the AIL push target can now be moved, but that has no mechanism for pushing the log tail. Further, if the next tail movement of the log is not large enough to wake the waiter (i.e. still not enough space for it to have a reservation granted), we don't wake anything up, and hence we do not update the AIL push target to take into account the head of the log moving and allowing the push target to be moved forwards. To avoid this particular condition, if we fail to wake the first waiter on the grant head because we don't have enough space, push on the AIL again. This will pick up any movement of the log head and allow the push target to move forward due to completion of CIL pushing. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-09-05	xfs: Use WARN_ON_ONCE for bailout mount-operation	Austin Kim
	If CONFIG_BUG is enabled, BUG() is executed and the system crashes, so the bailout path for the mount never proceeds. Using WARN_ON_ONCE rather than BUG prevents this situation. Signed-off-by: Austin Kim <austindh.kim@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
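The difference in control flow can be sketched in userspace. The helper below only imitates the "warn once, then carry on" behaviour with an explicit flag; it is not the kernel macro, and the function names, message text and the -1 return value are placeholders chosen for this example.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Report a condition the first time it is seen and return it, so the
 * caller can still take its error path. Unlike a BUG()-style abort,
 * execution continues after the warning.
 */
static bool warn_on_once_sketch(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical mount-time check that bails out instead of crashing. */
static int mount_check_sketch(bool bad_state)
{
	static bool warned;

	if (warn_on_once_sketch(bad_state, &warned,
				"unexpected state during mount"))
		return -1; /* placeholder error code: the bailout still runs */
	return 0;
}
```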
2019-09-05	sd: Set ELEVATOR_F_ZBD_SEQ_WRITE for ZBC disks	Damien Le Moal
	Using the helper blk_queue_required_elevator_features(), set the elevator feature ELEVATOR_F_ZBD_SEQ_WRITE as required for the request queue of SCSI ZBC disks. This feature requirement can always be satisfied as the mq-deadline elevator is always selected for in-kernel compilation when CONFIG_BLK_DEV_ZONED (zoned block device support) is enabled. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05	block: Set ELEVATOR_F_ZBD_SEQ_WRITE for nullblk zoned disks	Damien Le Moal
	Using the helper blk_queue_required_elevator_features(), set the elevator feature ELEVATOR_F_ZBD_SEQ_WRITE as required for the request queue of null_blk devices created with zoned mode enabled. This feature requirement can always be satisfied as the mq-deadline elevator is always selected for in-kernel compilation when CONFIG_BLK_DEV_ZONED (zoned block device support) is enabled. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05	block: Delay default elevator initialization	Damien Le Moal
	When elevator_init_mq() is called from blk_mq_init_allocated_queue(), the only information known about the device is the number of hardware queues as the block device scan by the device driver is not completed yet for most drivers. The device type and elevator required features are not set yet, which prevents correct selection of the default elevator most suitable for the device. This currently affects all multi-queue zoned block devices which default to the "none" elevator instead of the required "mq-deadline" elevator. These drives currently include host-managed SMR disks connected to a smartpqi HBA and null_blk block devices with zoned mode enabled. Upcoming NVMe Zoned Namespace devices will also be affected. Fix this by adding the boolean elevator_init argument to blk_mq_init_allocated_queue() to control the execution of elevator_init_mq(). Two cases exist: 1) elevator_init = false is used for calls to blk_mq_init_allocated_queue() within blk_mq_init_queue(). In this case, a call to elevator_init_mq() is added to __device_add_disk(), resulting in the delayed initialization of the queue elevator after the device driver finished probing the device information. This effectively allows elevator_init_mq() access to more information about the device. 2) elevator_init = true preserves the current behavior of initializing the elevator directly from blk_mq_init_allocated_queue(). This case is used for the special request based DM devices where the device gendisk is created before the queue initialization and device information (e.g. queue limits) is already known when the queue initialization is executed. Additionally, to make sure that the elevator initialization is never done while requests are in-flight (there should be none when the device driver calls device_add_disk()), freeze and quiesce the device request queue before calling blk_mq_init_sched() in elevator_init_mq().
Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05	block: Improve default elevator selection	Damien Le Moal
	For block devices that do not specify required features, preserve the current default elevator selection (mq-deadline for single queue devices, none for multi-queue devices). However, for devices specifying required features (e.g. zoned block devices ELEVATOR_F_ZBD_SEQ_WRITE feature), select the first available elevator providing the required features. In all cases, default to "none" if no elevator is available or if the initialization of the default elevator fails. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
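The selection policy described above can be written down as a small decision function. This is a sketch under stated assumptions: the structure, the function name and the bit value given to ELEVATOR_F_ZBD_SEQ_WRITE are invented for illustration, and the fallback taken when an elevator fails to initialize is not modelled.

```c
#include <stddef.h>
#include <string.h>

/* Flag name from the commit message; the bit value here is assumed. */
#define ELEVATOR_F_ZBD_SEQ_WRITE (1U << 0)

/* Hypothetical description of one available elevator. */
struct elevator_sketch {
	const char *name;
	unsigned int features;
};

static const char *pick_default_elevator(const struct elevator_sketch *avail,
					 size_t n, unsigned int nr_hw_queues,
					 unsigned int required)
{
	size_t i;

	if (required) {
		/* First available elevator providing every required feature. */
		for (i = 0; i < n; i++) {
			if ((avail[i].features & required) == required)
				return avail[i].name;
		}
		return "none";
	}

	/* No requirements: keep the existing defaults. */
	if (nr_hw_queues != 1)
		return "none";

	for (i = 0; i < n; i++) {
		if (strcmp(avail[i].name, "mq-deadline") == 0)
			return avail[i].name;
	}
	return "none";
}
```

A multi-queue zoned device that sets the required flag gets the first matching elevator instead of "none", which is the case the series sets out to fix.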
2019-09-05	block: Introduce elevator features	Damien Le Moal
	Introduce the definition of elevator features through the elevator_features flags in the elevator_type structure. Each flag can represent a feature supported by an elevator. The first feature defined by this patch is support for zoned block device sequential write constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented by the mq-deadline elevator using zone write locking. Other possible features are IO priorities, write hints, latency targets or single-LUN dual-actuator disks (for which the elevator could maintain one LBA ordered list per actuator). The required_elevator_features field is also added to the request_queue structure to allow a device driver to specify elevator feature flags that an elevator must support for the correct operation of the device (e.g. device drivers for zoned block devices can have the ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature). The helper function blk_queue_required_elevator_features() is defined for setting this new field. With these two new fields in place, the elevator functions elevator_match() and elevator_find() are modified to allow a user to set only an elevator with a set of features that satisfies the device required features. Elevators not matching the device requirements are not shown in the device sysfs queue/scheduler file to prevent their use. The "none" elevator can always be selected as before. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
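The two new fields and the matching rule can be sketched as a bitmask superset test. The structures below are simplified stand-ins that reuse the field names from the commit message; the bit values, the second flag and the helper names are made up for illustration and do not reproduce the kernel's definitions.

```c
#include <stdbool.h>

/* Flag name from the commit message; bit values here are assumed. */
#define ELEVATOR_F_ZBD_SEQ_WRITE   (1U << 0)
#define ELEVATOR_F_HYPOTHETICAL    (1U << 1) /* invented second feature */

/* Simplified stand-ins for the structures that gain the new fields. */
struct elevator_type_sketch {
	const char *elevator_name;
	unsigned int elevator_features;
};

struct request_queue_sketch {
	unsigned int required_elevator_features;
};

/* Driver side: record what the device needs from its elevator. */
static void set_required_features(struct request_queue_sketch *q,
				  unsigned int features)
{
	q->required_elevator_features = features;
}

/* An elevator is usable only if it offers every required feature. */
static bool elevator_satisfies(const struct request_queue_sketch *q,
			       const struct elevator_type_sketch *e)
{
	unsigned int required = q->required_elevator_features;

	return (e->elevator_features & required) == required;
}
```

A queue with no required features accepts any elevator, since the mask test is trivially true when `required` is zero.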
2019-09-05	block: Change elevator_init_mq() to always succeed	Damien Le Moal
	If the default elevator chosen is mq-deadline, elevator_init_mq() may return an error if mq-deadline initialization fails, leading to blk_mq_init_allocated_queue() returning an error, which in turn will cause the block device initialization to fail and the device not to be exposed. Instead of taking such an extreme measure, handle mq-deadline initialization failures in the same manner as when mq-deadline is not available (no module to load), that is, default to the "none" scheduler. With this change, elevator_init_mq() return type can be changed to void. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05	block: Cleanup elevator_init_mq() use	Damien Le Moal
	Instead of checking a queue tag_set BLK_MQ_F_NO_SCHED flag before calling elevator_init_mq() to make sure that the queue supports IO scheduling, use the elevator.c function elv_support_iosched() in elevator_init_mq(). This does not introduce any functional change but ensures that elevator_init_mq() does the right thing based on the queue settings. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05	Input: sidewinder - make array seq static const, makes object smaller	Colin Ian King
	Don't populate the array seq on the stack but instead make it static const. Makes the object code smaller by 30 bytes. Before: text 22284, data 3184, bss 0, dec 25468, hex 637c (drivers/input/joystick/sidewinder.o). After: text 22158, data 3280, bss 0, dec 25438, hex 635e (drivers/input/joystick/sidewinder.o). (gcc version 9.2.1, amd64) Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
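The two forms being compared look like this in isolation. The byte values are placeholders, since the log does not show the driver's real sequence, and the summing functions exist only to give the arrays a use. In the reported numbers, text shrinks by 126 bytes and data grows by 96, which gives the net saving of 30 bytes.

```c
#include <stddef.h>

/* Placeholder bytes; the driver's real sequence is not shown in the log. */
#define SEQ_INIT { 0x01, 0x02, 0x03, 0x04 }

/* Before: an automatic array, populated on the stack on every call. */
static unsigned int sum_seq_on_stack(void)
{
	unsigned char seq[] = SEQ_INIT;
	unsigned int sum = 0;

	for (size_t i = 0; i < sizeof(seq); i++)
		sum += seq[i];
	return sum;
}

/* After: a static const array, emitted once as read-only data. */
static unsigned int sum_seq_static(void)
{
	static const unsigned char seq[] = SEQ_INIT;
	unsigned int sum = 0;

	for (size_t i = 0; i < sizeof(seq); i++)
		sum += seq[i];
	return sum;
}
```

Both functions compute the same result; only where the array lives and when it is initialized differ.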
2019-09-05	Input: reset device timestamp on sync	Dmitry Torokhov
	We need to reset the input device's timestamp on input_sync(), otherwise drivers not using input_set_timestamp() will end up with a stale timestamp after their clients consume the first input event. Fixes: 3b51c44bd693 ("Input: allow drivers specify timestamp for input events") Reported-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-09-05	libnvdimm: Use PAGE_SIZE instead of SZ_4K for align check	Aneesh Kumar K.V
	Architectures may have a page size other than 4K. Use PAGE_SIZE to make sure ranges are correctly aligned. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20190905154603.10349-7-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
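To illustrate why a hard-coded 4K check is not enough, here is a small userspace sketch. `range_is_page_aligned()` is a hypothetical helper and the page size is passed in as a parameter; the kernel code would use its own PAGE_SIZE constant instead.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when both the start address and the length of a range are
 * multiples of the given page size. Hypothetical helper for illustration.
 */
static bool range_is_page_aligned(uint64_t start, uint64_t size,
                                  uint64_t page_size)
{
    return (start % page_size == 0) && (size % page_size == 0);
}
```

A range that starts at 0x1000 passes with a 4K page size but fails with a 64K page size, which is the case a fixed SZ_4K check would miss.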
2019-09-05	libnvdimm/label: Remove the dpa align check	Aneesh Kumar K.V
	There is no strict requirement for slot_valid() to check for page alignment, and the check would seem to actively hurt cross-page-size compatibility. Let's delete the check and rely on checksum validation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20190905154603.10349-6-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-09-05	libnvdimm/pfn_dev: Add page size and struct page size to pfn superblock	Aneesh Kumar K.V
	This is needed so that pmem probe doesn't wrongly initialize a namespace which doesn't have enough space reserved for holding struct pages with the current kernel. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20190905154603.10349-5-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-09-05	libnvdimm/pfn_dev: Add a build check to make sure we notice when struct page size change	Aneesh Kumar K.V
	Namespaces created with PFN_MODE_PMEM mode store struct page in the reserve block area. We need to make sure we account for the right struct page size while doing this. Instead of directly depending on sizeof(struct page), which can change based on different kernel config options, use the max struct page size (64) while calculating the reserve block area. This makes sure a pmem device can be used across kernels built with different configs. If the above assumption of max struct page size changes, we need to update the reserve block allocation space for new namespaces created. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20190905154603.10349-4-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
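The idea of sizing the reserve area from a fixed maximum and failing the build if the real structure outgrows it can be sketched as follows. The kernel has its own compile-time assertion macro (BUILD_BUG_ON); this userspace sketch uses C11 `_Static_assert` as a stand-in. `MAX_PAGE_STRUCT_SIZE`, `struct page_like` and `reserve_bytes()` are hypothetical names; only the value 64 comes from the commit message.

```c
#include <stdint.h>

/* Upper bound on the per-page metadata size (64, per the commit message). */
#define MAX_PAGE_STRUCT_SIZE 64

/* Hypothetical stand-in for the real per-page structure. */
struct page_like {
    uint64_t flags;
    void *mapping;
    void *private_data;
    uint64_t refcount;
};

/* Compile-time check: the build fails if the structure outgrows the bound. */
_Static_assert(sizeof(struct page_like) <= MAX_PAGE_STRUCT_SIZE,
               "page_like is larger than the reserved per-page slot");

/*
 * Size the reserve area from the fixed maximum, not from sizeof(), so the
 * on-media layout does not depend on the configuration of the kernel that
 * created it.
 */
static uint64_t reserve_bytes(uint64_t npfns)
{
    return npfns * MAX_PAGE_STRUCT_SIZE;
}
```

Because the reserve is computed from the constant, a namespace created by one build still has enough room when probed by a build with a larger (but still at most 64-byte) structure.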
2019-09-05	libnvdimm/pmem: Advance namespace seed for specific probe errors	Aneesh Kumar K.V
	In order to support marking namespaces with unsupported features/versions as disabled, the nvdimm core should advance the namespace seed on these probe failures. Otherwise, these failed namespaces will be considered a seed namespace and will be wrongly used while creating new namespaces. Add -EOPNOTSUPP as a return value from the pmem probe callback to indicate a namespace initialization failure due to a pfn superblock feature/version mismatch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20190905154603.10349-3-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-09-05	libnvdimm/region: Rewrite _probe_success() to _advance_seeds()	Dan Williams
	The nd_region_probe_success() helper conflates seed management with nvdimm->busy tracking. Given that the 'busy' increment is handled internally by the nd_region driver 'probe' path, move the decrement to the 'remove' path. With that cleanup, the routine can be renamed to the more descriptive nd_region_advance_seeds(). The change is prompted by an incoming need to optionally advance the seeds on other events besides 'probe' success. Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20190905154603.10349-2-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-09-05	PCI: Add ACS quirk for iProc PAXB	Abhinav Ratna
	iProc PAXB Root Ports don't advertise an ACS capability, but they do not allow peer-to-peer transactions between Root Ports. Add an ACS quirk so each Root Port can be in a separate IOMMU group. [bhelgaas: commit log, comment, use common implementation style] Link: https://lore.kernel.org/r/1566275985-25670-1-git-send-email-srinath.mannam@broadcom.com Signed-off-by: Abhinav Ratna <abhinav.ratna@broadcom.com> Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Scott Branden <scott.branden@broadcom.com>
2019-09-06	parisc: Save some bytes in dino driver	Helge Deller
	Signed-off-by: Helge Deller <deller@gmx.de>
2019-09-05	PCI: Force trailing new line to resource_alignment_param in sysfs	Logan Gunthorpe
	When 'pci=resource_alignment=' is specified on the command line, there is no trailing newline. Then, when it's read through the corresponding sysfs attribute, there will be no newline and the output of a cat command will not display correctly in a shell. If the parameter is set through sysfs, a newline will be stored and it will 'cat' correctly. To solve this, append a newline character in the show function if one does not already exist. Link: https://lore.kernel.org/r/20190822161013.5481-4-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
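A userspace sketch of the "append a newline only if it is missing" step follows. `show_with_newline()` is a hypothetical helper, not the kernel's show function, and it assumes a caller-supplied buffer with an explicit capacity. An empty parameter is left empty in this sketch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy `param` into `buf` (capacity `size`) and make sure a non-empty
 * result ends in '\n' when there is room for it. Returns the number of
 * bytes written, not counting the terminating NUL.
 */
static size_t show_with_newline(char *buf, size_t size, const char *param)
{
    size_t len;
    int n;

    if (size == 0)
        return 0;

    n = snprintf(buf, size, "%s", param);
    if (n < 0) {
        buf[0] = '\0';
        return 0;
    }

    len = (size_t)n;
    if (len >= size)
        len = size - 1;          /* output was truncated */

    /* Append '\n' only if it is missing and there is room for it. */
    if (len > 0 && buf[len - 1] != '\n' && len + 1 < size) {
        buf[len++] = '\n';
        buf[len] = '\0';
    }
    return len;
}
```

A value that already ends in a newline, as stored by a write through sysfs, is passed through unchanged.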
2019-09-05	PCI: Move pci_[get\|set]_resource_alignment_param() into their callers	Logan Gunthorpe
	The functions pci_get_resource_alignment_param() and pci_set_resource_alignment_param() are now each called in only one place: resource_alignment_show() and resource_alignment_store() respectively. There is no value in this extra set of functions, so move both into their callers. [bhelgaas: fold in "GFP_KERNEL while atomic" fix from Christoph Hellwig <hch@infradead.org> https://lore.kernel.org/r/20190902075006.GB754@infradead.org] Link: https://lore.kernel.org/r/20190822161013.5481-3-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-09-05	net/mlx5e: Add port buffer's congestion counters	Aya Levin
	Add 3 counters per priority to ethtool using PPCNT:
	1) rx_prio[p]_buf_discard - the number of packets discarded by device due to lack of per host receive buffers
	2) rx_prio[p]_cong_discard - the number of packets discarded by device due to per host congestion
	3) rx_prio[p]_marked - the number of packets ECN marked by device due to per host congestion
	Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-05	net/mlx5: Expose HW capability bits for port buffer per priority congestion counters	Aya Levin
	Map the capability bit indicating that the HCA supports the port buffer's congestion counters. Also map registers with the corresponding counters. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-05	net/mlx5: DR, Remove redundant dev_name print from err log	Saeed Mahameed
	mlx5_core_err already prints the name of the device. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-05	net/mlx5: DR, Fix error return code in dr_domain_init_resources()	Wei Yongjun
	Fix to return negative error code -ENOMEM from the error handling case instead of 0, as done elsewhere in this function. Fixes: 4ec9e7b02697 ("net/mlx5: DR, Expose steering domain functionality") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
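The bug class fixed here, a cleanup path that returns a stale 0, can be sketched in userspace C. Everything below is hypothetical and only mirrors the pattern: `struct domain`, `alloc_fn`, `domain_init_resources()` and the `alloc_once()` test allocator are illustration names, with `malloc()`-style allocation standing in for the kernel allocator.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocator hook so a failure can be forced. */
typedef void *(*alloc_fn)(size_t size);

struct domain {
    void *pd;
    void *pool;
};

/* Returns 0 on success or a negative errno value on failure. */
static int domain_init_resources(struct domain *d, alloc_fn alloc)
{
    int ret = 0;

    d->pool = NULL;
    d->pd = alloc(16);
    if (!d->pd)
        return -ENOMEM;

    d->pool = alloc(64);
    if (!d->pool) {
        /* Without this assignment the cleanup path would return 0. */
        ret = -ENOMEM;
        goto free_pd;
    }
    return 0;

free_pd:
    free(d->pd);
    d->pd = NULL;
    return ret;
}

/* Test allocator: the first call succeeds, every later call fails. */
static int alloc_calls;
static void *alloc_once(size_t size)
{
    return alloc_calls++ == 0 ? malloc(size) : NULL;
}
```

Callers that check only for a non-zero return would otherwise treat the half-initialized structure as valid.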
2019-09-05	net/mlx5: DR, Remove useless set memory to zero use memset()	Wei Yongjun
	The memory returned by kzalloc() has already been set to zero, so remove the useless memset(0). Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
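The same redundancy can be shown in userspace, using `calloc()` as a stand-in for kzalloc() since both return zero-filled memory. `struct table` and `table_alloc()` are hypothetical names for illustration.

```c
#include <stdlib.h>

struct table {
    int entries[8];
    size_t count;
};

/*
 * calloc() returns zero-filled memory, so a following
 * memset(t, 0, sizeof(*t)) would only repeat work already done.
 */
static struct table *table_alloc(void)
{
    return calloc(1, sizeof(struct table));
}
```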