Age | Commit message (Collapse) | Author |
|
There is no need for this function as all arches have to implement
kvm_arch_create_vcpu_debugfs() no matter what. A #define symbol
let us actually simplify the code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a
five years old bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs
on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting
in the VMs after stress testing:
INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
Call Trace:
flush_tlb_mm_range+0x68/0x140
tlb_flush_mmu.part.75+0x37/0xe0
tlb_finish_mmu+0x55/0x60
zap_page_range+0x142/0x190
SyS_madvise+0x3cd/0x9c0
system_call_fastpath+0x1c/0x21
swait_active() sustains to be true before finish_swait() is called in
kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account
by kvm_vcpu_on_spin() loop greatly increases the probability condition
kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv
is enabled the yield-candidate vCPU's VMCS RVI field leaks(by
vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current
VMCS.
This patch fixes it by checking conservatively a subset of events.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: stable@vger.kernel.org
Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a header include guard just in case.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Link: https://lore.kernel.org/r/20190720115858.7015-1-yamada.masahiro@socionext.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Align in-parameter names for the declarations of pm_genpd_add|
remove_subdomain() and of_genpd_add_subdomain() according to their
implementations, as to improve consistency.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Complete the abstraction of the ww_mutex inside the reservation object.
This allows us to add more handling and debugging to the reservation
object in the future.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/320761/
|
|
SCSI maintains its own driver private data hooked off of each SCSI
request, and the pridate data won't be freed after scsi_queue_rq()
returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. An upper layer driver
(e.g. dm-rq) may need to retry these SCSI requests, before SCSI has
fully dispatched them, due to a lower level SCSI driver's resource
limitation identified in scsi_queue_rq(). Currently SCSI's per-request
private data is leaked when the upper layer driver (dm-rq) frees and
then retries these requests in response to BLK_STS_RESOURCE or
BLK_STS_DEV_RESOURCE returns from scsi_queue_rq().
This usecase is so specialized that it doesn't warrant training an
existing blk-mq interface (e.g. blk_mq_free_request) to allow SCSI to
account for freeing its driver private data -- doing so would add an
extra branch for handling a special case that all other consumers of
SCSI (and blk-mq) won't ever need to worry about.
So the most pragmatic way forward is to delegate freeing SCSI driver
private data to the upper layer driver (dm-rq). Do so by adding
new .cleanup_rq callback and calling a new blk_mq_cleanup_rq() method
from dm-rq. A following commit will implement the .cleanup_rq() hook
in scsi_mq_ops.
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: <stable@vger.kernel.org>
Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch introduces a new request operation REQ_OP_ZONE_RESET_ALL.
This is useful for the applications like mkfs where it needs to reset
all the zones present on the underlying block device. As part for this
patch we also introduce new QUEUE_FLAG_ZONE_RESETALL which indicates the
queue zone reset all capability and corresponding helper macro.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
See also commit 8f4236d9008b ("block: remove QUEUE_FLAG_BYPASS and ->bypass") # v5.0.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Make it clear to the compiler and also to humans that the functions
that query request queue properties do not modify any member of the
request_queue data structure.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
blk_mq_tagset_wait_completed_request() has been applied for waiting
for completed request's fn, so not necessary to use
blk_mq_complete_request_sync() any more.
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
blk-mq may schedule to call queue's complete function on remote CPU via
IPI, but doesn't provide any way to synchronize the request's complete
fn. The current queue freeze interface can't provide the synchonization
because aborted requests stay at blk-mq queues during EH.
In some driver's EH(such as NVMe), hardware queue's resource may be freed &
re-allocated. If the completed request's complete fn is run finally after the
hardware queue's resource is released, kernel crash will be triggered.
Prepare for fixing this kind of issue by introducing
blk_mq_tagset_wait_completed_request().
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
NVMe needs this function to decide if one request to be aborted has
been completed in normal IO path already.
So introduce it.
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove the "reserved_at_40" field to match the device specification.
Fixes: 84df61ebc69b ("net/mlx5: Add HW interfaces used by LAG")
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
There are a lot of those warnings with GCC8+ 64-bit,
In file included from ./include/linux/sctp.h:42,
from net/core/skbuff.c:47:
./include/uapi/linux/sctp.h:395:1: warning: alignment 4 of 'struct
sctp_paddr_change' is less than 8 [-Wpacked-not-aligned]
} __attribute__((packed, aligned(4)));
^
./include/uapi/linux/sctp.h:728:1: warning: alignment 4 of 'struct
sctp_setpeerprim' is less than 8 [-Wpacked-not-aligned]
} __attribute__((packed, aligned(4)));
^
./include/uapi/linux/sctp.h:727:26: warning: 'sspp_addr' offset 4 in
'struct sctp_setpeerprim' isn't aligned to 8 [-Wpacked-not-aligned]
struct sockaddr_storage sspp_addr;
^~~~~~~~~
./include/uapi/linux/sctp.h:741:1: warning: alignment 4 of 'struct
sctp_prim' is less than 8 [-Wpacked-not-aligned]
} __attribute__((packed, aligned(4)));
^
./include/uapi/linux/sctp.h:740:26: warning: 'ssp_addr' offset 4 in
'struct sctp_prim' isn't aligned to 8 [-Wpacked-not-aligned]
struct sockaddr_storage ssp_addr;
^~~~~~~~
./include/uapi/linux/sctp.h:792:1: warning: alignment 4 of 'struct
sctp_paddrparams' is less than 8 [-Wpacked-not-aligned]
} __attribute__((packed, aligned(4)));
^
./include/uapi/linux/sctp.h:784:26: warning: 'spp_address' offset 4 in
'struct sctp_paddrparams' isn't aligned to 8 [-Wpacked-not-aligned]
struct sockaddr_storage spp_address;
^~~~~~~~~~~
./include/uapi/linux/sctp.h:905:1: warning: alignment 4 of 'struct
sctp_paddrinfo' is less than 8 [-Wpacked-not-aligned]
} __attribute__((packed, aligned(4)));
^
./include/uapi/linux/sctp.h:899:26: warning: 'spinfo_address' offset 4
in 'struct sctp_paddrinfo' isn't aligned to 8 [-Wpacked-not-aligned]
struct sockaddr_storage spinfo_address;
^~~~~~~~~~~~~~
This is because the commit 20c9c825b12f ("[SCTP] Fix SCTP socket options
to work with 32-bit apps on 64-bit kernels.") added "packed, aligned(4)"
GCC attributes to some structures but one of the members, i.e, "struct
sockaddr_storage" in those structures has the attribute,
"aligned(__alignof__ (struct sockaddr *)" which is 8-byte on 64-bit
systems, so the commit overwrites the designed alignments for
"sockaddr_storage".
To fix this, "struct sockaddr_storage" needs to be aligned to 4-byte as
it is only used in those packed sctp structure which is part of UAPI,
and "struct __kernel_sockaddr_storage" is used in some other
places of UAPI that need not to change alignments in order to not
breaking userspace.
Use an implicit alignment for "struct __kernel_sockaddr_storage" so it
can keep the same alignments as a member in both packed and un-packed
structures without breaking UAPI.
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After introduce "mss_encode" field in the synproxy_options struct the field
"mss" is a little confusing. It has been renamed to "mss_option".
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Merge misc fixes from Andrew Morton:
"17 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
drivers/acpi/scan.c: document why we don't need the device_hotplug_lock
memremap: move from kernel/ to mm/
lib/test_meminit.c: use GFP_ATOMIC in RCU critical section
asm-generic: fix -Wtype-limits compiler warnings
cgroup: kselftest: relax fs_spec checks
mm/memory_hotplug.c: remove unneeded return for void function
mm/migrate.c: initialize pud_entry in migrate_vma()
coredump: split pipe command whitespace before expanding template
page flags: prioritize kasan bits over last-cpuid
ubsan: build ubsan.c more conservatively
kasan: remove clang version check for KASAN_STACK
mm: compaction: avoid 100% CPU usage during compaction when a task is killed
mm: migrate: fix reference check race between __find_get_block() and migration
mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
ocfs2: remove set but not used variable 'last_hash'
Revert "kmemleak: allow to coexist with fault injection"
kernel/signal.c: fix a kernel-doc markup
|
|
Commit d66acc39c7ce ("bitops: Optimise get_order()") introduced a
compilation warning because "rx_frag_size" is an "ushort" while
PAGE_SHIFT here is 16.
The commit changed the get_order() to be a multi-line macro where
compilers insist to check all statements in the macro even when
__builtin_constant_p(rx_frag_size) will return false as "rx_frag_size"
is a module parameter.
In file included from ./arch/powerpc/include/asm/page_64.h:107,
from ./arch/powerpc/include/asm/page.h:242,
from ./arch/powerpc/include/asm/mmu.h:132,
from ./arch/powerpc/include/asm/lppaca.h:47,
from ./arch/powerpc/include/asm/paca.h:17,
from ./arch/powerpc/include/asm/current.h:13,
from ./include/linux/thread_info.h:21,
from ./arch/powerpc/include/asm/processor.h:39,
from ./include/linux/prefetch.h:15,
from drivers/net/ethernet/emulex/benet/be_main.c:14:
drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_rx_cqs_create':
./include/asm-generic/getorder.h:54:9: warning: comparison is always
true due to limited range of data type [-Wtype-limits]
(((n) < (1UL << PAGE_SHIFT)) ? 0 : \
^
drivers/net/ethernet/emulex/benet/be_main.c:3138:33: note: in expansion
of macro 'get_order'
adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE;
^~~~~~~~~
Fix it by moving all of this multi-line macro into a proper function,
and killing __get_order() off.
[akpm@linux-foundation.org: remove __get_order() altogether]
[cai@lca.pw: v2]
Link: http://lkml.kernel.org/r/1564000166-31428-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/1563914986-26502-1-git-send-email-cai@lca.pw
Fixes: d66acc39c7ce ("bitops: Optimise get_order()")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Bill Wendling <morbo@google.com>
Cc: James Y Knight <jyknight@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ARM64 randdconfig builds regularly run into a build error, especially
when NUMA_BALANCING and SPARSEMEM are enabled but not SPARSEMEM_VMEMMAP:
#error "KASAN: not enough bits in page flags for tag"
The last-cpuid bits are already contitional on the available space, so
the result of the calculation is a bit random on whether they were
already left out or not.
Adding the kasan tag bits before last-cpuid makes it much more likely to
end up with a successful build here, and should be reliable for
randconfig at least, as long as that does not randomize NR_CPUS or
NODES_SHIFT but uses the defaults.
In order for the modified check to not trigger in the x86 vdso32 code
where all constants are wrong (building with -m32), enclose all the
definitions with an #ifdef.
[arnd@arndb.de: build fix]
Link: http://lkml.kernel.org/r/CAK8P3a3Mno1SWTcuAOT0Wa9VS15pdU6EfnkxLbDpyS55yO04+g@mail.gmail.com
Link: http://lkml.kernel.org/r/20190722115520.3743282-1-arnd@arndb.de
Link: https://lore.kernel.org/lkml/20190618095347.3850490-1-arnd@arndb.de/
Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
i.MX8QXP contains Hifi4 DSP. There are four clocks
associated with DSP:
* dsp_lpcg_core_clk
* dsp_lpcg_ipg_clk
* dsp_lpcg_adb_aclk
* ocram_lpcg_ipg_clk
Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com>
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Add the clock binding doc for i.MX8MN.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Reviewed-by: Maxime Ripard <maxime.ripard@bootlin.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
Pull more drm fixes from Daniel Vetter:
"Dave sends his pull, everyone realizes they've been asleep at the
wheel and hits send on their own pulls :-/
Normally I'd just ignore these all because w/e for me and Dave. But
this time around the latecomers also included drm-intel-fixes, which
failed to send out a -fixes pull thus far for this release (screwed up
vacation coverage, despite that 2/3 maintainers were around ... they
all look appropriately guilty), and that really is overdue to get
landed.
And since I had to do a pull request anyway I pulled the other two
late ones too.
intel fixes (didn't have any ever since the main merge window pull):
- gvt fixes (2 cc: stable)
- fix gpu reset vs mm-shrinker vs wakeup fun (needed a few patches)
- two gem locking fixes (one cc: stable)
- pile of misc fixes all over with minor impact, 6 cc: stable, others
from this window
exynos:
- misc minor fixes
misc:
- some build/Kconfig fixes
- regression fix for vm scalability perf test which seems to mostly
exercise dmesg/console logging ...
- the vgem cache flush fix for arm64 broke the world on x86, so
that's reverted again
* tag 'drm-fixes-2019-08-02-1' of git://anongit.freedesktop.org/drm/drm: (42 commits)
Revert "drm/vgem: fix cache synchronization on arm/arm64"
drm/exynos: fix missing decrement of retry counter
drm/exynos: add CONFIG_MMU dependency
drm/exynos: remove redundant assignment to pointer 'node'
drm/exynos: using dev_get_drvdata directly
drm/bochs: Use shadow buffer for bochs framebuffer console
drm/fb-helper: Instanciate shadow FB if configured in device's mode_config
drm/fb-helper: Map DRM client buffer only when required
drm/client: Support unmapping of DRM client buffers
drm/i915: Only recover active engines
drm/i915: Add a wakeref getter for iff the wakeref is already active
drm/i915: Lift intel_engines_resume() to callers
drm/vgem: fix cache synchronization on arm/arm64
drm/i810: Use CONFIG_PREEMPTION
drm/bridge: tc358764: Fix build error
drm/bridge: lvds-encoder: Fix build error while CONFIG_DRM_KMS_HELPER=m
drm/i915/gvt: Adding ppgtt to GVT GEM context after shadow pdps settled.
drm/i915/gvt: grab runtime pm first for forcewake use
drm/i915/gvt: fix incorrect cache entry for guest page mapping
drm/i915/gvt: Checking workload's gma earlier
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- a small cleanup
- a fix for a build error on ARM with some configs
- a fix of a patch for the Xen gntdev driver
- three patches for fixing a potential problem in the swiotlb-xen
driver which Konrad was fine with me carrying them through the Xen
tree
* tag 'for-linus-5.3a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/swiotlb: remember having called xen_create_contiguous_region()
xen/swiotlb: simplify range_straddles_page_boundary()
xen/swiotlb: fix condition for calling xen_destroy_contiguous_region()
xen: avoid link error on ARM
xen/gntdev.c: Replace vm_map_pages() with vm_map_pages_zero()
xen/pciback: remove set but not used variable 'old_state'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Seven fixes to four drivers with no core changes.
The mpt3sas one is theoretical until we get a CPU that goes up to 64
bits physical, the qla2xxx one fixes an oops in a driver
initialization error leg and the others are mostly cosmetic"
[ The fcoe patches may be worth highlighting - they may be "just"
cleanups, but they simplify and fix the odd fc_rport_priv structure
handling rules so that the new gcc-9 warnings about memset crossing
structure boundaries are gone.
The old code was hard for humans to understand too, and really
confused the compiler sanity checks - Linus ]
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix possible fcport null-pointer dereferences
scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA
scsi: hpsa: remove printing internal cdb on tag collision
scsi: hpsa: correct scsi command status issue after reset
scsi: fcoe: pass in fcoe_rport structure instead of fc_rport_priv
scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure
scsi: libfc: Whitespace cleanup in libfc.h
|
|
Pull block fixes from Jens Axboe:
"Here's a small collection of fixes that should go into this series.
This contains:
- io_uring potential use-after-free fix (Jackie)
- loop regression fix (Jan)
- O_DIRECT fragmented bio regression fix (Damien)
- Mark Denis as the new floppy maintainer (Denis)
- ataflop switch fall-through annotation (Gustavo)
- libata zpodd overflow fix (Kees)
- libata ahci deferred probe fix (Miquel)
- nbd invalidation BUG_ON() fix (Munehisa)
- dasd endless loop fix (Stefan)"
* tag 'for-linus-20190802' of git://git.kernel.dk/linux-block:
s390/dasd: fix endless loop after read unit address configuration
block: Fix __blkdev_direct_IO() for bio fragments
MAINTAINERS: floppy: take over maintainership
nbd: replace kill_bdev() with __invalidate_device() again
ata: libahci: do not complain in case of deferred probe
io_uring: fix KASAN use after free in io_sq_wq_submit_work
loop: Fix mount(2) failure due to race with LOOP_SET_FD
libata: zpodd: Fix small read overflow in zpodd_get_mech_type()
ataflop: Mark expected switch fall-through
|
|
Pull rdma fixes from Doug Ledford:
"Here's our second -rc pull request. Nothing particularly special in
this one. The client removal deadlock fix is kindy tricky, but we had
multiple eyes on it and no one could find a fault in it. A couple
Spectre V1 fixes too. Otherwise, all just normal -rc fodder:
- A couple Spectre V1 fixes (umad, hfi1)
- Fix a tricky deadlock in the rdma core code with refcounting
instead of locks (client removal patches)
- Build errors (hns)
- Fix a scheduling while atomic issue (mlx5)
- Use after free fix (mad)
- Fix error path return code (hns)
- Null deref fix (siw_crypto_hash)
- A few other misc. minor fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Fix error return code in hns_roce_v1_rsv_lp_qp()
RDMA/mlx5: Release locks during notifier unregister
IB/hfi1: Fix Spectre v1 vulnerability
IB/mad: Fix use-after-free in ib mad completion handling
RDMA/restrack: Track driver QP types in resource tracker
IB/mlx5: Fix MR registration flow to use UMR properly
RDMA/devices: Remove the lock around remove_client_context
RDMA/devices: Do not deadlock during client removal
IB/core: Add mitigation for Spectre V1
Do not dereference 'siw_crypto_shash' before checking
RDMA/qedr: Fix the hca_type and hca_rev returned in device attributes
RDMA/hns: Fix build error
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A few fixes for code that came in during the merge window or that
started getting exercised differently this time around:
- Select regmap MMIO kconfig in spreadtrum driver to avoid compile
errors
- Complete kerneldoc on devm_clk_bulk_get_optional()
- Register an essential clk earlier on mediatek mt8183 SoCs so the
clocksource driver can use it
- Fix divisor math in the at91 driver
- Plug a race in Renesas reset control logic"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: renesas: cpg-mssr: Fix reset control race condition
clk: sprd: Select REGMAP_MMIO to avoid compile errors
clk: mediatek: mt8183: Register 13MHz clock earlier for clocksource
clk: Add missing documentation of devm_clk_bulk_get_optional() argument
clk: at91: generated: Truncate divisor to GENERATED_MAX_DIV + 1
|
|
Add asic type.
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This memory allocation flag will be used to indicate BOs containing
sensitive data that should not be leaked to other processes.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This notifies the driver that a BO is about to be released.
Releasing a BO also invokes the move_notify callback from
ttm_bo_cleanup_memtype_use, but that happens too late for anything
that would add fences to the BO and require a delayed delete.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
dev_groups added to struct driver
Persistent tag for others to pull this branch from
This is the first patch in a longer series that adds the ability for the
driver core to create and remove a list of attribute groups
automatically when the device is bound/unbound from a specific driver.
See:
https://lore.kernel.org/r/20190731124349.4474-2-gregkh@linuxfoundation.org
for details on this patch, and examples of how to use it in other
drivers.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add the ability for the driver core to create and remove a list of
attribute groups automatically when the device is bound/unbound from a
specific driver.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Tested-by: Richard Gong <richard.gong@linux.intel.com>
Link: https://lore.kernel.org/r/20190731124349.4474-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add a header include guard just in case.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
While looking at CONFIG_PREEMPT dependencies treewide the #ifdef in
crypto_yield() matched.
CONFIG_PREEMPT and CONFIG_PREEMPT_VOLUNTARY are mutually exclusive so the
extra !CONFIG_PREEMPT conditional is redundant.
cond_resched() has only an effect when CONFIG_PREEMPT_VOLUNTARY is set,
otherwise it's a stub which the compiler optimizes out.
Remove the whole conditional.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-crypto@vger.kernel.org
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Function definitions in headers are usually marked as 'static inline'.
Since 'inline' is missing for crypto_reportstat(), if it were not
referenced from a .c file that includes this header, it would produce
a warning.
Also, 'struct crypto_user_alg' is not declared in this header.
I included <linux/crytouser.h> instead of adding the forward declaration
as suggested [1].
Detected by compile-testing this header as a standalone unit:
./include/crypto/internal/cryptouser.h:6:44: warning: ‘struct crypto_user_alg’ declared inside parameter list will not be visible outside of this definition or declaration
struct crypto_alg *crypto_alg_match(struct crypto_user_alg *p, int exact);
^~~~~~~~~~~~~~~
./include/crypto/internal/cryptouser.h:11:12: warning: ‘crypto_reportstat’ defined but not used [-Wunused-function]
static int crypto_reportstat(struct sk_buff *in_skb, struct nlmsghdr *in_nlh, struct nlattr **attrs)
^~~~~~~~~~~~~~~~~
[1] https://lkml.org/lkml/2019/6/13/1121
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add header include guards in case they are included multiple times.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
AES GCM encryption allows for authsize values of 4, 8, and 12-16 bytes.
Validate the requested authsize, and retain it to save in the request
context.
Fixes: 36cf515b9bbe2 ("crypto: ccp - Enable support for AES GCM on v5 CCPs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The rcu_dereference_raw_notrace() API name is confusing. It is equivalent
to rcu_dereference_raw() except that it also does sparse pointer checking.
There are only a few users of rcu_dereference_raw_notrace(). This patches
renames all of them to be rcu_dereference_raw_check() with the "_check()"
indicating sparse checking.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Fix checkpatch warnings about parentheses. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
|
This adds the P_PIDFD type to waitid().
One of the last remaining bits for the pidfd api is to make it possible
to wait on pidfds. With P_PIDFD added to waitid() the parts of userspace
that want to use the pidfd api to exclusively manage processes can do so
now.
One of the things this will unblock in the future is the ability to make
it possible to retrieve the exit status via waitid(P_PIDFD) for
non-parent processes if handed a _suitable_ pidfd that has this feature
set. This is similar to what you can do on FreeBSD with kqueue(). It
might even end up being possible to wait on a process as a non-parent if
an appropriate property is enabled on the pidfd.
With P_PIDFD no scoping of the process identified by the pidfd is
possible, i.e. it explicitly blocks things such as wait4(-1), wait4(0),
waitid(P_ALL), waitid(P_PGID) etc. It only allows for semantics
equivalent to wait4(pid), waitid(P_PID). Users that need scoping should
rely on pid-based wait*() syscalls for now.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20190727222229.6516-2-christian@brauner.io
|
|
Add a pool of flow counters, based on flow counter bulks, removing the
need to allocate a new counter via a costly FW command during the flow
creation process. The time it takes to acquire/release a flow counter
is cut from ~50 [us] to ~50 [ns].
The pool is part of the mlx5 driver instance, and provides flow
counters for aging flows. mlx5_fc_create() was modified to provide
counters for aging flows from the pool by default, and
mlx5_destroy_fc() was modified to release counters back to the pool
for later reuse. If bulk allocation is not supported or fails, and for
non-aging flows, the fallback behavior is to allocate and free
individual counters.
The pool is comprised of three lists of flow counter bulks, one of
fully used bulks, one of partially used bulks, and one of unused
bulks. Counters are provided from the partially used bulks first, to
help limit bulk fragmentation.
The pool maintains a threshold, and strives to maintain the amount of
available counters below it. The pool is increased in size when a
counter acquisition request is made and there are no available
counters, and it is decreased in size when the last counter in a bulk
is released and there are more available counters than the threshold.
All pool size changes are done in the context of the
acquiring/releasing process.
The value of the threshold is directly correlated to the amount of
used counters the pool is providing, while constrained by a hard
maximum, and is recalculated every time a bulk is allocated/freed.
This ensures that the pool only consumes large amounts of memory for
available counters if the pool is being used heavily. When fully
populated and at the hard maximum, the buffer of available counters
consumes ~40 [MB].
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Misc updates from mlx5-next branch.
1) Eli improves the handling of the support for QoS element type
2) Gavi refactors and prepares mlx5 flow counters for bulk allocation
support
3) Parav, refactors and improves E-Switch load/unload flows
4) Saeed, two misc cleanups
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Timer deletion on PREEMPT_RT is prone to priority inversion and live
locks. The hrtimer code has a synchronization mechanism for this. Posix CPU
timers will grow one.
But that mechanism cannot be invoked while holding the k_itimer lock
because that can deadlock against the running timer callback. So the lock
must be dropped which allows the timer to be freed.
The timer free can be prevented by taking RCU readlock before dropping the
lock, but because the rcu_head is part of the 'it' union a concurrent free
will overwrite the hrtimer on which the task is trying to synchronize.
Move the rcu_head out of the union to prevent this.
[ tglx: Fixed up kernel-doc. Rewrote changelog ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190730223828.965541887@linutronix.de
|
|
When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling del_timer_sync() can lead to two issues:
- If the caller is on a remote CPU then it has to spin wait for the timer
handler to complete. This can result in unbound priority inversion.
- If the caller originates from the task which preempted the timer
handler on the same CPU, then spin waiting for the timer handler to
complete is never going to end.
To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If del_timer_sync() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.
This addresses both the priority inversion and the life lock issues.
This mechanism is not used for timers which are marked IRQSAFE as for those
preemption is disabled accross the callback and therefore this situation
cannot happen. The callbacks for such timers need to be individually
audited for RT compliance.
The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
del_timer_sync() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.
As the softirq thread can be preempted with PREEMPT_RT=y, the SMP variant
of del_timer_sync() needs to be used on UP as well.
[ tglx: Refactored it for mainline ]
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185753.832418500@linutronix.de
|
|
When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling hrtimer_cancel() can lead to two issues:
- If the caller is on a remote CPU then it has to spin wait for the timer
handler to complete. This can result in unbound priority inversion.
- If the caller originates from the task which preempted the timer
handler on the same CPU, then spin waiting for the timer handler to
complete is never going to end.
To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If hrtimer_cancel() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.
This addresses both the priority inversion and the life lock issues.
The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
hrtimer_cancel() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.
[ tglx: Refactored it for mainline ]
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185753.737767218@linutronix.de
|
|
hrtimer_start_range_ns() has a WARN_ONCE() which verifies that a timer
which is marker for softirq expiry is not queued in the hard interrupt base
and vice versa.
When PREEMPT_RT is enabled, timers which are not explicitely marked to
expire in hard interrupt context are deferrred to the soft interrupt. So
the regular check would trigger.
Change the check, so when PREEMPT_RT is enabled, it is verified that the
timers marked for hard interrupt expiry are not tried to be queued for soft
interrupt expiry or any of the unmarked and softirq marked is tried to be
expired in hard interrupt context.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Check if firmware supports the requested element type before
attempting to create the element type.
In addition, explicitly specify the request element type and tsar type.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
First reserved field is off by one instead of reserved_at_1 it should be
reserved_at_2, fix that.
Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add a handle to invoke the new FW capability of allocating a bulk of
flow counters.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Towards introducing the ability to allocate bulks of flow counters,
refactor the flow counter bulk query process, removing functions and
structs whose names indicated being used for flow counter bulk
allocation FW commands, despite them actually only being used to
support bulk querying, and migrate their functionality to correctly
named functions in their natural location, fs_counters.c.
Additionally, optimize the bulk query process by:
* Extracting the memory used for the query to mlx5_fc_stats so
that it is only allocated once, and not for each bulk query.
* Querying all the counters in one function call.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Due to the complexity of client->remove() callbacks it is desirable to not
hold any locks while calling them. Remove the last one by tracking only
the highest client ID and running backwards from there over the xarray.
Since the only purpose of that lock was to protect the linked list, we can
drop the lock.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731081841.32345-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
lockdep reports:
WARNING: possible circular locking dependency detected
modprobe/302 is trying to acquire lock:
0000000007c8919c ((wq_completion)ib_cm){+.+.}, at: flush_workqueue+0xdf/0x990
but task is already holding lock:
000000002d3d2ca9 (&device->client_data_rwsem){++++}, at: remove_client_context+0x79/0xd0 [ib_core]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&device->client_data_rwsem){++++}:
down_read+0x3f/0x160
ib_get_net_dev_by_params+0xd5/0x200 [ib_core]
cma_ib_req_handler+0x5f6/0x2090 [rdma_cm]
cm_process_work+0x29/0x110 [ib_cm]
cm_req_handler+0x10f5/0x1c00 [ib_cm]
cm_work_handler+0x54c/0x311d [ib_cm]
process_one_work+0x4aa/0xa30
worker_thread+0x62/0x5b0
kthread+0x1ca/0x1f0
ret_from_fork+0x24/0x30
-> #1 ((work_completion)(&(&work->work)->work)){+.+.}:
process_one_work+0x45f/0xa30
worker_thread+0x62/0x5b0
kthread+0x1ca/0x1f0
ret_from_fork+0x24/0x30
-> #0 ((wq_completion)ib_cm){+.+.}:
lock_acquire+0xc8/0x1d0
flush_workqueue+0x102/0x990
cm_remove_one+0x30e/0x3c0 [ib_cm]
remove_client_context+0x94/0xd0 [ib_core]
disable_device+0x10a/0x1f0 [ib_core]
__ib_unregister_device+0x5a/0xe0 [ib_core]
ib_unregister_device+0x21/0x30 [ib_core]
mlx5_ib_stage_ib_reg_cleanup+0x9/0x10 [mlx5_ib]
__mlx5_ib_remove+0x3d/0x70 [mlx5_ib]
mlx5_ib_remove+0x12e/0x140 [mlx5_ib]
mlx5_remove_device+0x144/0x150 [mlx5_core]
mlx5_unregister_interface+0x3f/0xf0 [mlx5_core]
mlx5_ib_cleanup+0x10/0x3a [mlx5_ib]
__x64_sys_delete_module+0x227/0x350
do_syscall_64+0xc3/0x6a4
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Which is due to the read side of the client_data_rwsem being obtained
recursively through a work queue flush during cm client removal.
The lock is being held across the remove in remove_client_context() so
that the function is a fence, once it returns the client is removed. This
is required so that the two callers do not proceed with destruction until
the client completes removal.
Instead of using client_data_rwsem use the existing device unregistration
refcount and add a similar client unregistration (client->uses) refcount.
This will fence the two unregistration paths without holding any locks.
Cc: <stable@vger.kernel.org>
Fixes: 921eab1143aa ("RDMA/devices: Re-organize device.c locking")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731081841.32345-2-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
|