|
The DISCONTIGMEM support was marked as deprecated in v5.2 and since there
were no complaints about it for almost 5 releases it can be completely
removed.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200223094322.15206-1-rppt@kernel.org
|
|
Having __load_guest_stage2 in kvm_hyp.h is quickly going to trigger
a circular include problem. In order to avoid this, let's move
it to kvm_mmu.h, where it will be a better fit anyway.
In the process, drop the __hyp_text annotation, which doesn't help
as the function is marked as __always_inline.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
With ARMv8.5-GTG, the hardware (or more likely a hypervisor) can
advertise the supported Stage-2 page sizes.
Let's check this at boot time.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
AMD's next generation of EPYC processors support the MPK (Memory
Protection Keys) feature. Update the dependency and documentation.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/159068199556.26992.17733929401377275140.stgit@naples-babu.amd.com
|
|
Commit 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to
archs where they work") made the bpf_probe_read{, str}() functions no
longer available on architectures where the same logical address might
have different content in the kernel and user memory mappings. These
architectures should use the probe_read_{user,kernel}_str helpers
instead.
For backward compatibility, the problematic functions are still available
on architectures where the user and kernel address spaces do not overlap.
This is guarded by CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
At the moment, these backward compatible functions are enabled only on
x86_64, arm, and arm64. Enable them on powerpc as well, since it also has
non-overlapping address spaces.
Fixes: 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/lkml/20200527122844.19524-1-pmladek@suse.com
|
|
The Baikal-T1 SoC provides an embedded process, voltage and temperature
sensor to monitor the internal SoC environment (chip temperature, supply
voltage and process monitor) and to detect critical situations in time,
which may cause system instability or even damage. The IP block is based
on the Analog Bits PVT sensor, but is equipped with a dedicated control
wrapper, which provides MMIO register-based access to the sensor core
functionality (APB3-bus based) and exposes additional functions like
thresholds/data-ready interrupts, their status and masks, and the
measurements timeout. All of this is used to create the hwmon driver
added to the kernel by this commit.
The driver implements support for the hardware monitoring capabilities
of the Baikal-T1 process, voltage and temperature sensors. The PVT IP
core consists of one temperature and four voltage sensors, each of which
is implemented as a dedicated hwmon channel config.
The driver can optionally provide hwmon alarms for each sensor the PVT
controller supports. The alarms functionality is made compile-time
configurable due to a peculiarity of the hardware interface: data can be
converted from only one sensor at a time. An additional limitation is
that the controller performs the threshold checking synchronously with
the data conversion procedure. Due to these limitations, in order to
have the hwmon alarms detected automatically, the driver code must
switch from one sensor to another, read the converted data and manually
check the threshold status bits. Depending on the measurement timeout
settings, this design may place an additional burden on system
performance. By default, if the alarms kernel config option is disabled,
the data conversion is performed by the driver on demand, when a read
operation is requested via the corresponding _input file.
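As a rough illustration, such a one-temperature/four-voltage layout could
map onto hwmon channel configs like the sketch below (identifiers and
flag choices are an assumption, not the driver's actual table):
  /* hedged sketch: one temp channel and four voltage channels, each
   * optionally exposing an alarm attribute
   */
  static const struct hwmon_channel_info *pvt_channel_info[] = {
          HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT | HWMON_T_ALARM),
          HWMON_CHANNEL_INFO(in,
                             HWMON_I_INPUT | HWMON_I_ALARM,
                             HWMON_I_INPUT | HWMON_I_ALARM,
                             HWMON_I_INPUT | HWMON_I_ALARM,
                             HWMON_I_INPUT | HWMON_I_ALARM),
          NULL
  };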
Co-developed-by: Maxim Kaurkin <maxim.kaurkin@baikalelectronics.ru>
Signed-off-by: Maxim Kaurkin <maxim.kaurkin@baikalelectronics.ru>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
For hwmon drivers using the hwmon_device_register_with_info() API, it
is desirable to have a generic notification mechanism available. This
mechanism can be used to notify userspace as well as the thermal
subsystem if the driver experiences any events, such as warning or
critical alarms.
Implement hwmon_notify_event() to provide this mechanism. The function
generates a sysfs event and a udev event. If the device is registered
with the thermal subsystem and the event is associated with a temperature
sensor, also notify the thermal subsystem that a thermal event occurred.
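A minimal usage sketch (the device pointer and channel index are
illustrative):
  /* e.g. from a threshold interrupt handler: generates the sysfs/udev
   * event and, for temperature channels, notifies the thermal subsystem
   */
  hwmon_notify_event(hwmon_dev, hwmon_temp, hwmon_temp_max_alarm, 0);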
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Maxim Kaurkin <Maxim.Kaurkin@baikalelectronics.ru>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
The Baikal-T1 SoC is equipped with an embedded process, voltage and
temperature sensor to monitor the chip-internal environment like
temperature, supply voltage and transistor performance.
These bindings describe the external Baikal-T1 PVT control interfaces
like the MMIO register space, interrupt request number and clock source.
They are then used by the corresponding hwmon device driver to implement
sysfs-file-based access to the sensor functionality.
Co-developed-by: Maxim Kaurkin <maxim.kaurkin@baikalelectronics.ru>
Signed-off-by: Maxim Kaurkin <maxim.kaurkin@baikalelectronics.ru>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-mips@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Pull NVMe poll fix from Christoph.
* 'nvme-5.7' of git://git.infradead.org/nvme:
nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll()
|
|
<yibin.gong@nxp.com>:
There is the ecspi erratum ERR009165 on the i.mx6/7 SoC family, which
causes a FIFO transfer to be sent twice in DMA mode. Please get more
information from: https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf. The
workaround is to add a new sdma RAM script which works in XCH mode as PIO
inside sdma instead of SMC mode; meanwhile, 'TX_THRESHOLD' should be 0.
The issue exists on all legacy i.mx6/7 SoC family members before i.mx6ul.
NXP fixed this design issue starting with i.mx6ul, so newer chips
including i.mx6ul/6ull/6sll do not need this workaround anymore. All
other i.mx6/7/8 chips still need it. This patch set adds a new
'fsl,imx6ul-ecspi' compatible for the ecspi driver and 'ecspi_fixed' in
the sdma driver to choose whether the erratum workaround is needed.
The first two reverted patches should address the same issue, though it
seems to have been 'fixed' by switching to another shp script. Hopefully
Sean or Sascha will have a chance to test this patch set to see whether
it fixes their issues.
Besides, enable sdma support for i.mx8mm/8mq and fix ecspi1 not working
on i.mx8mm because its event id is zero.
PS:
Please get the sdma firmware from the linux-firmware repository below and
copy it to your local rootfs under /lib/firmware/imx/sdma:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/imx/sdma
v2:
1. Add commit logs for the reverted patches.
2. Add a comment for 'ecspi_fixed' in the sdma driver.
3. Add the 'fsl,imx6sll-ecspi' compatible in addition to
'fsl,imx6ul-ecspi' rather than removing it.
v3:
1. Confirmed with the design team that ERR009165 is fixed on
i.mx6ul/i.mx6ull/i.mx6sll and not fixed on i.mx8m/8mm and the other
legacy i.mx6/7 chips. Corrected the related dts patches from v2.
2. Cleaned up the errata information in the binding doc and added a new
'tx_glitch_fixed' flag in the spi-imx driver to state whether ERR009165
is fixed.
3. Enlarged the tx burst size to the FIFO size, since tx_wml is set to 0
in the errata workaround, thus improving performance as much as possible.
v4:
1. Add Ack tags from Mark and Vinod.
2. Remove the check for 'event_id1' being zero, as was done for
'event_id0'.
v5:
1. Add the last patch for compatibility with the current uart driver,
which uses the rom script, so that both the uart RAM script and rom
script are supported in the latest firmware; by default the uart rom
script is used. The UART driver would be broken without this patch.
v6:
1. Resend after rebasing on the latest next branch.
2. Remove the following No.13~No.15 patches of v5 because they were
merged:
ARM: dts: imx6ul: add dma support on ecspi
ARM: dts: imx6sll: correct sdma compatible
arm64: defconfig: Enable SDMA on i.mx8mq/8mm
3. Revert "dmaengine: imx-sdma: fix context cache" since
'context_loaded' was removed.
v7:
1. Move the last patch 13/13 'Revert "dmaengine: imx-sdma: fix context
cache"' ahead of 03/13 'Revert "dmaengine: imx-sdma: refine to load
context only once"' so that no build warning comes out during bisect.
2. Address Sascha's comments, including eliminating any i.mx6sx from this
series, adding a new 'is_imx6ul_ecspi()' instead of imx in imx51, and
taking care of the SMC bit for PIO.
3. Add back the missing 'Reviewed-by' tag on 08/15(v5):09/13(v7)
'spi: imx: add new i.mx6ul compatible name in binding doc'.
v8:
1. Remove 0003-Revert-dmaengine-imx-sdma-fix-context-cache.patch and
merge it into 04/13 of v7.
2. Add 0005-spi-imx-fallback-to-PIO-if-dma-setup-failure.patch so that
no ecspi function is broken even if the sdma firmware is not updated.
3. Merge the 'tx.dst_maxburst' changes of the two consecutive patches
into one patch to avoid confusion.
4. Fix the typo 'duplicated'.
Robin Gong (13):
Revert "ARM: dts: imx6q: Use correct SDMA script for SPI5 core"
Revert "ARM: dts: imx6: Use correct SDMA script for SPI cores"
Revert "dmaengine: imx-sdma: refine to load context only once"
dmaengine: imx-sdma: remove duplicated sdma_load_context
spi: imx: fallback to PIO if dma setup failure
dmaengine: imx-sdma: add mcu_2_ecspi script
spi: imx: fix ERR009165
spi: imx: remove ERR009165 workaround on i.mx6ul
spi: imx: add new i.mx6ul compatible name in binding doc
dmaengine: imx-sdma: remove ERR009165 on i.mx6ul
dma: imx-sdma: add i.mx6ul compatible name
dmaengine: imx-sdma: fix ecspi1 rx dma not work on i.mx8mm
dmaengine: imx-sdma: add uart rom script
.../devicetree/bindings/dma/fsl-imx-sdma.txt | 1 +
.../devicetree/bindings/spi/fsl-imx-cspi.txt | 1 +
arch/arm/boot/dts/imx6q.dtsi | 2 +-
arch/arm/boot/dts/imx6qdl.dtsi | 8 +-
drivers/dma/imx-sdma.c | 67 ++++++++++------
drivers/spi/spi-imx.c | 92 +++++++++++++++++++---
include/linux/platform_data/dma-imx-sdma.h | 8 +-
7 files changed, 135 insertions(+), 44 deletions(-)
--
2.7.4
|
|
pm_runtime_get_sync() increments the runtime PM usage counter even
when it returns an error code. Thus a pairing decrement is needed on
the error handling path to keep the counter balanced.
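The canonical shape of such a fix looks roughly like this (a sketch only;
the exact label/cleanup flow depends on the driver):
  ret = pm_runtime_get_sync(dev);
  if (ret < 0) {
          /* the usage counter was bumped even though resume failed */
          pm_runtime_put_noidle(dev);
          return ret;
  }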
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20200523124758.28604-1-dinghao.liu@zju.edu.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
pm_runtime_get_sync() increments the runtime PM usage counter even
when it returns an error code. Thus a pairing decrement is needed on
the error handling path to keep the counter balanced.
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20200523122909.25247-1-dinghao.liu@zju.edu.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
pm_runtime_get_sync() increments the runtime PM usage counter even
when it returns an error code. Thus a pairing decrement is needed on
the error handling path to keep the counter balanced.
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20200523125704.30300-1-dinghao.liu@zju.edu.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Fall back to PIO in case DMA setup fails, for example when the sdma
firmware has not been updated but the ERR009165 workaround has been added
in the kernel.
Signed-off-by: Robin Gong <yibin.gong@nxp.com>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/r/1590006865-20900-6-git-send-email-yibin.gong@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Commit
987053a30016 ("efi/x86: Move command-line initrd loading to efi_main")
moved the command-line initrd loading into efi_main(), with a check
to ensure that it was attempted only if the EFI stub was booted via
efi_pe_entry rather than the EFI handover entry.
However, in the case where it was booted via handover entry, and thus an
initrd may have already been loaded by the bootloader, it then wrote 0
for the initrd address and size, removing any existing initrd.
Fix this by checking if size is positive before setting the fields in
the bootparams structure.
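Conceptually, the guard looks like the sketch below; the field names
follow struct setup_header, but treat this as an illustration rather
than the literal diff:
  /* don't clobber a bootloader-provided initrd with zeroes */
  if (initrd_size > 0) {
          hdr->ramdisk_image = (u32)initrd_addr;
          hdr->ramdisk_size  = (u32)initrd_size;
  }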
Fixes: 987053a30016 ("efi/x86: Move command-line initrd loading to efi_main")
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lkml.kernel.org/r/20200527232602.21596-1-nivedita@alum.mit.edu
|
|
If boot_secondary() succeeded but the CPU failed to come online in
__cpu_up(), -EIO used to be returned; since commit d22b115cbfbb
("arm64/kernel: Simplify __cpu_up() by bailing out early"), 0 is returned
instead. Therefore, bringup_wait_for_ap() causes the primary core to wait
for a long time, which may cause a boot failure.
This commit returns -EIO again under the same conditions.
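A simplified sketch of the restored behaviour in __cpu_up() (not the
literal diff):
  /* the secondary started but never marked itself online */
  if (!cpu_online(cpu)) {
          pr_crit("CPU%u: failed to come online\n", cpu);
          return -EIO;
  }
  return 0;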
Fixes: d22b115cbfbb ("arm64/kernel: Simplify __cpu_up() by bailing out early")
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Tested-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
Acked-by: Will Deacon <will@kernel.org>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200527233457.2531118-1-nobuhiro1.iwamatsu@toshiba.co.jp
[catalin.marinas@arm.com: return -EIO at the end of the function]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
RCU currently relies hard on smp_call_function() callbacks running from
interrupt context. A pending optimization is going to break that: it
will allow idle CPUs to run the callbacks from the idle loop. This
avoids raising the IPI on the requesting CPU and avoids handling an
exception on the receiving CPU.
Change rcu_is_cpu_rrupt_from_idle() to also accept task context,
provided it is the idle task.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20200527171236.GC706495@hirez.programming.kicks-ass.net
|
|
The zcomp driver uses per-CPU compression. The per-CPU data pointer is
acquired with get_cpu_ptr() which implicitly disables preemption.
It allocates memory inside the preempt disabled region which conflicts
with the PREEMPT_RT semantics.
Replace the implicit preemption control with an explicit local lock.
This allows RT kernels to substitute it with a real per CPU lock, which
serializes the access but keeps the code section preemptible. On non RT
kernels this maps to preempt_disable() as before, i.e. no functional
change.
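The resulting acquisition pattern is roughly the following sketch,
assuming the local_lock_t is embedded in struct zcomp_strm:
  struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
  {
          /* explicit per-CPU lock instead of get_cpu_ptr()'s implicit
           * preempt_disable(); stays preemptible on PREEMPT_RT
           */
          local_lock(&comp->stream->lock);
          return this_cpu_ptr(comp->stream);
  }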
[bigeasy: Use local_lock(), description, drop reordering]
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200527201119.1692513-8-bigeasy@linutronix.de
|
|
zcomp::stream is a per-CPU pointer, pointing to struct zcomp_strm
which contains two pointers. Having struct zcomp_strm allocated
directly as per-CPU memory would avoid one additional memory
allocation and a pointer dereference. This also simplifies the
addition of a local_lock to struct zcomp_strm.
Allocate zcomp::stream directly as per-CPU memory.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200527201119.1692513-7-bigeasy@linutronix.de
|
|
send_msg() disables preemption to avoid out-of-order messages. As the
code inside the preempt disabled section acquires regular spinlocks,
which are converted to 'sleeping' spinlocks on a PREEMPT_RT kernel and
eventually calls into a memory allocator, this conflicts with the RT
semantics.
Convert it to a local_lock which allows RT kernels to substitute them with
a real per CPU lock. On non RT kernels this maps to preempt_disable() as
before. No functional change.
[bigeasy: Patch description]
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200527201119.1692513-6-bigeasy@linutronix.de
|
|
The squashfs multi CPU decompressor makes use of get_cpu_ptr() to
acquire a pointer to per-CPU data. get_cpu_ptr() implicitly disables
preemption which serializes the access to the per-CPU data.
But decompression can take quite some time depending on the size. The
observed preempt disabled times in real world scenarios went up to 8ms,
causing massive wakeup latencies. This happens on all CPUs as the
decompression is fully parallelized.
Replace the implicit preemption control with an explicit local lock.
This allows RT kernels to substitute it with a real per CPU lock, which
serializes the access but keeps the code section preemptible. On non RT
kernels this maps to preempt_disable() as before, i.e. no functional
change.
[ bigeasy: Use local_lock(), patch description]
Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200527201119.1692513-5-bigeasy@linutronix.de
|
|
The various struct pagevec per CPU variables are protected by disabling
either preemption or interrupts across the critical sections. Inside
these sections spinlocks have to be acquired.
These spinlocks are regular spinlock_t types which are converted to
"sleeping" spinlocks on PREEMPT_RT enabled kernels. Obviously sleeping
locks cannot be acquired in preemption or interrupt disabled sections.
local locks provide a trivial way to substitute preempt and interrupt
disable instances. On a non PREEMPT_RT enabled kernel local_lock() maps
to preempt_disable() and local_lock_irq() to local_irq_disable().
Create lru_rotate_pvecs containing the pagevec and the locallock.
Create lru_pvecs containing the remaining pagevecs and the locallock.
Add lru_add_drain_cpu_zone() which is used from compact_zone() to avoid
exporting the pvec structure.
Change the relevant call sites to acquire these locks instead of using
preempt_disable() / get_cpu() / get_cpu_var() and local_irq_disable() /
local_irq_save().
There is neither a functional change nor a change in the generated
binary code for non PREEMPT_RT enabled non-debug kernels.
When lockdep is enabled local locks have lockdep maps embedded. These
allow lockdep to validate the protections, i.e. inappropriate usage of a
preemption only protected sections would result in a lockdep warning
while the same problem would not be noticed with a plain
preempt_disable() based protection.
local locks also improve readability as they provide a named scope for
the protections while preempt/interrupt disable are opaque scopeless.
Finally local locks allow PREEMPT_RT to substitute them with real
locking primitives to ensure the correctness of operation in a fully
preemptible kernel.
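Schematically, the grouping looks like this (a sketch of the layout,
omitting most members):
  struct lru_pvecs {
          local_lock_t lock;
          struct pagevec lru_add;
          /* ... the remaining pagevecs ... */
  };
  static DEFINE_PER_CPU(struct lru_pvecs, lru_pvecs) = {
          .lock = INIT_LOCAL_LOCK(lock),
  };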
[ bigeasy: Adopted to use local_lock ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200527201119.1692513-4-bigeasy@linutronix.de
|
|
The radix-tree and idr preload mechanisms use preempt_disable() to protect
the complete operation between xxx_preload() and xxx_preload_end().
As the code inside the preempt disabled section acquires regular spinlocks,
which are converted to 'sleeping' spinlocks on a PREEMPT_RT kernel and
eventually calls into a memory allocator, this conflicts with the RT
semantics.
Convert it to a local_lock which allows RT kernels to substitute them with
a real per CPU lock. On non RT kernels this maps to preempt_disable() as
before, but provides also lockdep coverage of the critical region.
No functional change.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200527201119.1692513-3-bigeasy@linutronix.de
|
|
preempt_disable() and local_irq_disable/save() are in principle per CPU big
kernel locks. This has several downsides:
- The protection scope is unknown
- Violation of protection rules is hard to detect by instrumentation
- For PREEMPT_RT such sections, unless in low level critical code, can
violate the preemptability constraints.
To address this PREEMPT_RT introduced the concept of local_locks which are
strictly per CPU.
The lock operations map to preempt_disable(), local_irq_disable/save() and
the enabling counterparts on non RT enabled kernels.
If lockdep is enabled local locks gain a lock map which tracks the usage
context. This will catch cases where an area is protected by
preempt_disable() but the access also happens from interrupt context. local
locks have identified quite a few such issues over the years, the most
recent example is:
b7d5dc21072cd ("random: add a spinlock_t to struct batched_entropy")
Aside of the lockdep coverage this also improves code readability as it
precisely annotates the protection scope.
PREEMPT_RT substitutes these local locks with 'sleeping' spinlocks to
protect such sections while maintaining preemptability and CPU locality.
local locks can replace:
- preempt_enable()/disable() pairs
- local_irq_disable/enable() pairs
- local_irq_save/restore() pairs
They are also used to replace code which implicitly disables preemption
like:
- get_cpu()/put_cpu()
- get_cpu_var()/put_cpu_var()
with PREEMPT_RT friendly constructs.
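A minimal before/after sketch of the substitution (the per-CPU data
names are illustrative):
  #include <linux/local_lock.h>

  struct foo_pcpu {
          local_lock_t lock;
          int counter;
  };
  static DEFINE_PER_CPU(struct foo_pcpu, foo_pcpu) = {
          .lock = INIT_LOCAL_LOCK(lock),
  };

  static void foo_count(void)
  {
          /* was: preempt_disable(); this_cpu_inc(...); preempt_enable(); */
          local_lock(&foo_pcpu.lock);
          this_cpu_inc(foo_pcpu.counter);
          local_unlock(&foo_pcpu.lock);
  }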
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200527201119.1692513-2-bigeasy@linutronix.de
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
of devices
In order to be compatible with different device versions, V1 handling in
the accelerator driver is now isolated; the other versions follow the
previous V2 processing flow.
Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Shukun Tan <tanshukun1@huawei.com>
Reviewed-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Now, in crypto-engine, if the hardware queue is full (-ENOSPC), the
request is requeued regardless of the MAY_BACKLOG flag.
If the hardware throws any other error code (like -EIO, -EINVAL,
-ENOMEM, etc.) only MAY_BACKLOG requests are enqueued back into
crypto-engine's queue, since the others can be dropped.
The latter case can be a fatal error, which cannot be recovered from.
For example, in the CAAM driver, -EIO is returned in case the job
descriptor is broken, so there is no possibility to fix the job
descriptor. Therefore, these errors may be fatal, so we shouldn't
requeue the request; it would just be passed back and forth between
crypto-engine and hardware.
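The resulting policy can be sketched as follows (an illustrative helper,
not the engine's literal code):
  static bool should_requeue(int ret)
  {
          /* a full HW queue is transient: retry regardless of MAY_BACKLOG */
          if (ret == -ENOSPC)
                  return true;
          /* -EIO, -EINVAL, -ENOMEM, ...: likely fatal, complete the
           * request with the error instead of bouncing it between
           * crypto-engine and hardware
           */
          return false;
  }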
Fixes: 6a89f492f8e5 ("crypto: engine - support for parallel requests based on retry mechanism")
Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
Reported-by: Horia Geantă <horia.geanta@nxp.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
s/NITORX/NITROX/
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
This patch enables AMD Fam17h RAPL support for the Package level metric.
The support is as per AMD Fam17h Model31h (Zen2) and model 00-ffh (Zen1) PPR.
The same output is available via the energy-pkg pseudo event:
$ perf stat -a -I 1000 --per-socket -e power/energy-pkg/
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200527224659.206129-6-eranian@google.com
|
|
This patch modifies perf_probe_msr() to allow passing a struct perf_msr
array where some entries are not populated, i.e., they have either an
MSR address of 0 or no attribute_group pointer. This helps with certain
call paths, e.g., RAPL.
In case the grp is NULL, the default sysfs visibility rule applies,
which is to make the group visible. Without the patch, you would get a
kernel crash with a NULL group.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200527224659.206129-5-eranian@google.com
|
|
This patch modifies the default visibility of the attribute_group
for each RAPL event. By default, if the grp.is_visible field is NULL,
sysfs considers that it must display the attribute group.
If the field is not NULL (a callback function), then the return value
of the callback determines the visibility (0 = not visible). The RAPL
attribute groups had the field set to NULL, meaning that unless they
failed the probing from perf_msr_probe(), they would be visible. We want
to avoid having to specify attribute groups that are not supported by the
HW in the rapl_msrs[] array; they don't have an MSR address to begin with.
Therefore, we initialize the visible field of all RAPL attribute groups
to a callback that returns 0. If the RAPL MSR goes through probing and
succeeds, the is_visible field will be set back to NULL (visible). If the
probing fails, the field is set to a callback that returns 0 (not
visible).
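The default-hidden callback is trivial; a sketch:
  /* visibility callback: hide the group until probing succeeds */
  static umode_t rapl_not_visible(struct kobject *kobj,
                                  struct attribute *attr, int i)
  {
          return 0;
  }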
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200527224659.206129-4-eranian@google.com
|
|
This patch modifies the rapl_model struct to include architecture specific
knowledge in this previously Intel specific structure, and in particular
it adds the MSR for POWER_UNIT and the rapl_msrs array.
No functional changes.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200527224659.206129-3-eranian@google.com
|
|
To prepare for support of both Intel and AMD RAPL.
As per the AMD PPR, Fam17h supports Package RAPL counters to monitor power usage.
The RAPL counter operates as with Intel RAPL, and as such it is beneficial
to share the code.
No change in functionality.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200527224659.206129-2-eranian@google.com
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.7-2020-05-27:
amdgpu:
- Display atomic test fix
- Fix soft hang in display vupdate code
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200527222700.4378-1-alexander.deucher@amd.com
|
|
Putting the rseq_syscall check point in the syscall prologue breaks
a0 ... a7, which causes a system call bug when DEBUG_RSEQ is enabled.
So move it to the syscall epilogue, but before syscall_trace.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
There is no fixup or feature in this patch; it is cleanup only:
- Remove unnecessary registers (r11, r12); just use r9, r10 and the
syscallid regs as temporaries.
- Add _TIF_SYSCALL_WORK and _TIF_WORK_MASK to gather the macros.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
The current implementation could destroy a4 & a5 during strace, so we
need to get them from pt_regs via SAVE_ALL.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
log:
[ 0.13373200] Calibrating delay loop...
[ 0.14077600] ------------[ cut here ]------------
[ 0.14116700] WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:3790 preempt_count_add+0xc8/0x11c
[ 0.14348000] DEBUG_LOCKS_WARN_ON((preempt_count() < 0))Modules linked in:
[ 0.14395100] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0 #7
[ 0.14410800]
[ 0.14427400] Call Trace:
[ 0.14450700] [<807cd226>] dump_stack+0x8a/0xe4
[ 0.14473500] [<80072792>] __warn+0x10e/0x15c
[ 0.14495900] [<80072852>] warn_slowpath_fmt+0x72/0xc0
[ 0.14518600] [<800a5240>] preempt_count_add+0xc8/0x11c
[ 0.14544900] [<807ef918>] _raw_spin_lock+0x28/0x68
[ 0.14572600] [<800e0eb8>] vprintk_emit+0x84/0x2d8
[ 0.14599000] [<800e113a>] vprintk_default+0x2e/0x44
[ 0.14625100] [<800e2042>] vprintk_func+0x12a/0x1d0
[ 0.14651300] [<800e1804>] printk+0x30/0x48
[ 0.14677600] [<80008052>] lockdep_init+0x12/0xb0
[ 0.14703800] [<80002080>] start_kernel+0x558/0x7f8
[ 0.14730000] [<800052bc>] csky_start+0x58/0x94
[ 0.14756600] irq event stamp: 34
[ 0.14775100] hardirqs last enabled at (33): [<80067370>] ret_from_exception+0x2c/0x72
[ 0.14793700] hardirqs last disabled at (34): [<800e0eae>] vprintk_emit+0x7a/0x2d8
[ 0.14812300] softirqs last enabled at (32): [<800655b0>] __do_softirq+0x578/0x6d8
[ 0.14830800] softirqs last disabled at (25): [<8007b3b8>] irq_exit+0xec/0x128
The preempt_count held in a register could be destroyed after
csky_do_IRQ without being reloaded from memory.
Following other architectures (arm64, riscv), move the preempt entry
into ret_from_exception and disable irqs at the beginning of
ret_from_exception instead of in RESTORE_ALL.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reported-by: Lu Baoquan <lu.baoquan@intellif.com>
|
|
When connected mode is set, and we have connected and datagram traffic in
parallel, ipoib might crash with a double free of a datagram skb.
The current mechanism assumes that the order in the completion queue is
the same as the order of sent packets for all QPs. Order is kept only for
a specific QP; in the case of mixed UD and CM traffic we have a few QPs
(one UD and a few CMs) in parallel.
The problem:
----------------------------------------------------------
Transmit queue:
-----------------
The UD skb pointer is kept in the queue itself, while CM skbs are kept in
a separate queue and use the transmit queue as a placeholder to count the
number of total transmitted packets.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127
------------------------------------------------------------
NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
------------------------------------------------------------
^ ^
tail head
Completion queue (problematic scenario) - the order not the same as in
the transmit queue:
1 2 3 4 5 6 7 8 9
------------------------------------
ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5
------------------------------------
1. CM1 'wc' processing
- skb freed in the CM separate ring.
- tx_tail of the transmit queue increased although UD2 is not freed.
Now the driver assumes the UD2 index is already freed and could be used
for a new transmitted skb.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127
------------------------------------------------------------
NL NL UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
------------------------------------------------------------
^ ^ ^
(Bad)tail head
(Bad - Could be used for new SKB)
In this case (due to heavy load) the UD2 skb pointer could be replaced
by a newly transmitted packet UD_NEW, as the driver assumes it's free. At
this point we will have to process two 'wc' with the same index, but we
have only one pointer to free.
During the second attempt to free the same skb we will have a NULL
pointer exception.
2. UD2 'wc' processing
- skb freed according to the index we got from 'wc', but it was already
overwritten by mistake. So actually the skb that was released is the
skb of the newly transmitted packet and not the original one.
3. UD_NEW 'wc' processing
- attempt to free an already freed skb. NULL pointer exception.
The fix:
-----------------------------------------------------------------------
The fix is to stop using the UD ring as a placeholder for CM packets. The
cyclic ring variables tx_head and tx_tail will manage the UD tx_ring; new
cyclic variables global_tx_head and global_tx_tail are introduced for
managing and counting the overall outstanding sent packets, and the send
queue will be stopped and woken based on these variables only.
Note that no locking is needed, since global_tx_head is updated in the
xmit flow and global_tx_tail is updated in the NAPI flow only. A previous
attempt tried to use one variable to count the outstanding sent packets,
but it did not work, since the xmit and NAPI flows can run at the same
time and the counter would be updated wrongly. Thus, we use the same
simple cyclic head and tail scheme that we have today for the UD tx_ring.
Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable in all modes")
Link: https://lore.kernel.org/r/20200527134705.480068-1-leon@kernel.org
Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
[Why]
If VUPDATE_END is before VUPDATE_START the delay calculated can become
very large, causing a soft hang.
[How]
Take the absolute value of the difference between START and END.
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
get_cursor_position already handles the case where the cursor has
negative off-screen coordinates by not setting
dc_cursor_position.enabled.
Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 626bf90fe03f ("drm/amd/display: add basic atomic check for cursor plane")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Be there a platform with the following layout:
Regular NIC
|
+----> DSA master for switch port
|
+----> DSA master for another switch port
After changing DSA back to static lockdep class keys in commit
1a33e10e4a95 ("net: partially revert dynamic lockdep key changes"), this
kernel splat can be seen:
[ 13.361198] ============================================
[ 13.366524] WARNING: possible recursive locking detected
[ 13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
[ 13.377874] --------------------------------------------
[ 13.383201] swapper/0/0 is trying to acquire lock:
[ 13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.397879]
[ 13.397879] but task is already holding lock:
[ 13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.413593]
[ 13.413593] other info that might help us debug this:
[ 13.420140] Possible unsafe locking scenario:
[ 13.420140]
[ 13.426075] CPU0
[ 13.428523] ----
[ 13.430969] lock(&dsa_slave_netdev_xmit_lock_key);
[ 13.435946] lock(&dsa_slave_netdev_xmit_lock_key);
[ 13.440924]
[ 13.440924] *** DEADLOCK ***
[ 13.440924]
[ 13.446860] May be due to missing lock nesting notation
[ 13.446860]
[ 13.453668] 6 locks held by swapper/0/0:
[ 13.457598] #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
[ 13.466593] #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
[ 13.474803] #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
[ 13.483886] #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[ 13.492793] #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.503094] #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[ 13.512000]
[ 13.512000] stack backtrace:
[ 13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
[ 13.530421] Call trace:
[ 13.532871] dump_backtrace+0x0/0x1d8
[ 13.536539] show_stack+0x24/0x30
[ 13.539862] dump_stack+0xe8/0x150
[ 13.543271] __lock_acquire+0x1030/0x1678
[ 13.547290] lock_acquire+0xf8/0x458
[ 13.550873] _raw_spin_lock+0x44/0x58
[ 13.554543] __dev_queue_xmit+0x84c/0xbe0
[ 13.558562] dev_queue_xmit+0x24/0x30
[ 13.562232] dsa_slave_xmit+0xe0/0x128
[ 13.565988] dev_hard_start_xmit+0xf4/0x448
[ 13.570182] __dev_queue_xmit+0x808/0xbe0
[ 13.574200] dev_queue_xmit+0x24/0x30
[ 13.577869] neigh_resolve_output+0x15c/0x220
[ 13.582237] ip6_finish_output2+0x244/0xb10
[ 13.586430] __ip6_finish_output+0x1dc/0x298
[ 13.590709] ip6_output+0x84/0x358
[ 13.594116] mld_sendpack+0x2bc/0x560
[ 13.597786] mld_ifc_timer_expire+0x210/0x390
[ 13.602153] call_timer_fn+0xcc/0x400
[ 13.605822] run_timer_softirq+0x588/0x6e0
[ 13.609927] __do_softirq+0x118/0x590
[ 13.613597] irq_exit+0x13c/0x148
[ 13.616918] __handle_domain_irq+0x6c/0xc0
[ 13.621023] gic_handle_irq+0x6c/0x160
[ 13.624779] el1_irq+0xbc/0x180
[ 13.627927] cpuidle_enter_state+0xb4/0x4d0
[ 13.632120] cpuidle_enter+0x3c/0x50
[ 13.635703] call_cpuidle+0x44/0x78
[ 13.639199] do_idle+0x228/0x2c8
[ 13.642433] cpu_startup_entry+0x2c/0x48
[ 13.646363] rest_init+0x1ac/0x280
[ 13.649773] arch_call_rest_init+0x14/0x1c
[ 13.653878] start_kernel+0x490/0x4bc
Lockdep keys themselves were added in commit ab92d68fc22f ("net: core:
add generic lockdep keys"), and it's very likely that this splat existed
since then, but I have no real way to check, since this stacked platform
wasn't supported by mainline back then.
From Taehee's own words:
This patch assumed that all stackable devices have the LLTX flag,
but dsa doesn't have LLTX, so this splat happened.
After this patch, dsa shares the same lockdep class key.
On the nested dsa interface architecture, which you illustrated,
the same lockdep class key will be used in __dev_queue_xmit() because
dsa doesn't have LLTX.
So lockdep detects a deadlock, because the same lockdep class key is
used recursively, although actually different locks are used.
There are some ways to fix this problem.
1. using NETIF_F_LLTX flag.
If possible, using the LLTX flag is a very clear way for it.
But I'm so sorry I don't know whether the dsa could have LLTX or not.
2. using dynamic lockdep again.
It means that each interface uses a separate lockdep class key.
So, lockdep will not detect recursive locking.
But this way has a problem: it could consume too many lockdep class
keys.
Currently, lockdep can have 8192 lockdep class keys.
- you can see this number with the following command.
cat /proc/lockdep_stats
lock-classes: 1251 [max: 8192]
...
The [max: 8192] is the maximum number of lockdep class keys.
If too many lockdep class keys are registered, lockdep stops working.
So, using a dynamic (separate) lockdep class key should be considered
carefully.
In addition, a routine to update the lockdep class key might have to
exist.
(lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())
3. Using lockdep subclass.
A lockdep class key could have 8 subclasses.
The different subclass is considered different locks by lockdep
infrastructure.
But "lock-classes" is not counted by subclasses.
So, it could avoid stopping lockdep infrastructure by an overflow of
lockdep class keys.
This approach should also have an updating lockdep class key routine.
(lockdep_set_subclass())
4. Using nonvalidate lockdep class key.
The lockdep infrastructure supports nonvalidate lockdep class key type.
It means this lockdep is not validated by lockdep infrastructure.
So, the splat will not happen but lockdep couldn't detect real deadlock
case because lockdep really doesn't validate it.
I think this should be used for really special cases.
(lockdep_set_novalidate_class())
Further discussion here:
https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/
There appears to be no negative side-effect to declaring lockless TX for
the DSA virtual interfaces, which means they handle their own locking.
So that's what we do to make the splat go away.
Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.
Fixes: ab92d68fc22f ("net: core: add generic lockdep keys")
Suggested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
copy the corresponding pieces of init_fpstate into the gaps instead.
Cc: stable@kernel.org
Tested-by: Alexander Potapenko <glider@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
As explained in other commits before (b9cd75e66895 and 87b0f983f66f),
ocelot switches have a single egress-untagged VLAN per port, and the
driver would deny adding a second one while an egress-untagged VLAN
already exists.
But on the CPU port (where the VLAN configuration is implicit, because
there is no net device for the bridge to control), the DSA core attempts
to add a VLAN using the same flags as were used for the front-panel
port. This would make adding any untagged VLAN fail due to the CPU port
rejecting the configuration:
bridge vlan add dev swp0 vid 100 pvid untagged
[ 1865.854253] mscc_felix 0000:00:00.5: Port already has a native VLAN: 1
[ 1865.860824] mscc_felix 0000:00:00.5: Failed to add VLAN 100 to port 5: -16
(note that port 5 is the CPU port and not the front-panel swp0).
So this hardware will send all VLANs as tagged towards the CPU.
Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Clang-10 and clang-11 run into a corner case of the register
allocator on 32-bit ARM, leading to excessive stack usage from
register spilling:
net/bridge/br_multicast.c:2422:6: error: stack frame size of 1472 bytes in function 'br_multicast_get_stats' [-Werror,-Wframe-larger-than=]
Work around this by marking one of the internal functions as
noinline_for_stack.
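The shape of the workaround is simply the attribute below (the helper
name and body are hypothetical):
  /* kept out of line so its register spills don't land on the caller's
   * stack frame
   */
  static noinline_for_stack void
  mcast_stats_add(struct br_mcast_stats *dest,
                  const struct br_mcast_stats *src)
  {
          int i;

          for (i = 0; i < BR_MCAST_DIR_SIZE; i++)
                  dest->igmp_v1queries[i] += src->igmp_v1queries[i];
  }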
Link: https://bugs.llvm.org/show_bug.cgi?id=45802#c9
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There may be a race between nvme_reap_pending_cqes() and nvme_poll(), e.g.,
when doing live reset while polling the nvme device.
CPU X CPU Y
nvme_poll()
nvme_dev_disable()
-> nvme_stop_queues()
-> nvme_suspend_io_queues()
-> nvme_suspend_queue()
-> spin_lock(&nvmeq->cq_poll_lock);
-> nvme_reap_pending_cqes()
-> nvme_process_cq() -> nvme_process_cq()
In the above scenario, nvme_process_cq() for the same queue may be
running on both CPU X and CPU Y concurrently.
It is much easier to reproduce the issue when CONFIG_PREEMPT is enabled
in the kernel. When CONFIG_PREEMPT is disabled, it takes longer for
nvme_stop_queues()-->blk_mq_quiesce_queue() to wait for the grace
period.
This patch protects nvme_process_cq() with nvmeq->cq_poll_lock in
nvme_reap_pending_cqes().
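The fix takes the same lock nvme_poll() holds; a sketch of the reaping
loop:
  /* serialize against nvme_poll() on each queue's completion lock */
  for (i = dev->ctrl.queue_count - 1; i > 0; i--) {
          struct nvme_queue *nvmeq = &dev->queues[i];

          spin_lock(&nvmeq->cq_poll_lock);
          nvme_process_cq(nvmeq);
          spin_unlock(&nvmeq->cq_poll_lock);
  }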
Fixes: fa46c6fb5d61 ("nvme/pci: move cqe check after device shutdown")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
accept(2) is an "input" socket interface, so we should use SO_RCVTIMEO
instead of SO_SNDTIMEO to set the timeout.
So this patch replaces sock_sndtimeo() with sock_rcvtimeo() to use the
right timeout in vsock_accept().
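The change itself is essentially a one-liner (sketch):
  /* accept() is a receive-side operation: use the receive timeout */
  timeout = sock_rcvtimeo(listener, flags & O_NONBLOCK);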
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prior to this change the correct value for the used counter is calculated
but not stored and, therefore, not propagated to user space. In use cases
such as OVS, at least, this results in active flows being removed from
the hardware datapath, which leads to both unnecessary flow tear-down
and setup, and packet processing on the host.
This patch addresses the problem by saving the calculated used value,
which allows the value to propagate to user space.
Found by inspection.
Fixes: aa6ce2ea0c93 ("nfp: flower: support stats update for merge flows")
Signed-off-by: Heinrich Kuhn <heinrich.kuhn@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
this command hangs forever:
# tc qdisc add dev eth0 root fq_pie flows 65536
watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [tc:1028]
[...]
CPU: 1 PID: 1028 Comm: tc Not tainted 5.7.0-rc6+ #167
RIP: 0010:fq_pie_init+0x60e/0x8b7 [sch_fq_pie]
Code: 4c 89 65 50 48 89 f8 48 c1 e8 03 42 80 3c 30 00 0f 85 2a 02 00 00 48 8d 7d 10 4c 89 65 58 48 89 f8 48 c1 e8 03 42 80 3c 30 00 <0f> 85 a7 01 00 00 48 8d 7d 18 48 c7 45 10 46 c3 23 00 48 89 f8 48
RSP: 0018:ffff888138d67468 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 1ffff9200018d2b2 RBX: ffff888139c1c400 RCX: ffffffffffffffff
RDX: 000000000000c5e8 RSI: ffffc900000e5000 RDI: ffffc90000c69590
RBP: ffffc90000c69580 R08: fffffbfff79a9699 R09: fffffbfff79a9699
R10: 0000000000000700 R11: fffffbfff79a9698 R12: ffffc90000c695d0
R13: 0000000000000000 R14: dffffc0000000000 R15: 000000002347c5e8
FS: 00007f01e1850e40(0000) GS:ffff88814c880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000067c340 CR3: 000000013864c000 CR4: 0000000000340ee0
Call Trace:
qdisc_create+0x3fd/0xeb0
tc_modify_qdisc+0x3be/0x14a0
rtnetlink_rcv_msg+0x5f3/0x920
netlink_rcv_skb+0x121/0x350
netlink_unicast+0x439/0x630
netlink_sendmsg+0x714/0xbf0
sock_sendmsg+0xe2/0x110
____sys_sendmsg+0x5b4/0x890
___sys_sendmsg+0xe9/0x160
__sys_sendmsg+0xd3/0x170
do_syscall_64+0x9a/0x370
entry_SYSCALL_64_after_hwframe+0x44/0xa9
We can't accept 65536 as a valid number for 'nflows', because the loop on
'idx' in fq_pie_init() will never end. The extack message is correct, but
it doesn't say that 0 is not a valid number for 'flows': while at it, fix
this too. Add a tdc selftest to check correct validation of 'flows'.
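The validation then looks roughly like the sketch below (the exact bound
and message wording are assumptions based on the description above):
  /* 65536 would overflow the u16 loop index, and 0 makes no sense */
  if (!flows_cnt || flows_cnt > 65535) {
          NL_SET_ERR_MSG_MOD(extack,
                             "Number of flows must be in [1..65535]");
          return -EINVAL;
  }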
CC: Ivan Vecera <ivecera@redhat.com>
Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|