linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-05-28	sched: Add rq::ttwu_pending	Peter Zijlstra
	In preparation of removing rq->wake_list, replace the !list_empty(rq->wake_list) with rq->ttwu_pending. This is not fully equivalent as this new variable is racy. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200526161908.070399698@infradead.org
2020-05-28	irq_work, smp: Allow irq_work on call_single_queue	Peter Zijlstra
	Currently irq_work_queue_on() will issue an unconditional arch_send_call_function_single_ipi() and has the handler do irq_work_run(). This is unfortunate in that it makes the IPI handler look at a second cacheline and it misses the opportunity to avoid the IPI. Instead note that struct irq_work and struct __call_single_data are very similar in layout, so use a few bits in the flags word to encode a type and stick the irq_work on the call_single_queue list. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200526161908.011635912@infradead.org
2020-05-28	smp: Optimize send_call_function_single_ipi()	Peter Zijlstra
	Just like the ttwu_queue_remote() IPI, make use of _TIF_POLLING_NRFLAG to avoid sending IPIs to idle CPUs. [ mingo: Fix UP build bug. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200526161907.953304789@infradead.org
2020-05-28	smp: Move irq_work_run() out of flush_smp_call_function_queue()	Peter Zijlstra
	This ensures flush_smp_call_function_queue() is strictly about call_single_queue. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200526161907.895109676@infradead.org
2020-05-28	smp: Optimize flush_smp_call_function_queue()	Peter Zijlstra
	The call_single_queue can contain (two) different callbacks, synchronous and asynchronous. The current interrupt handler runs them in-order, which means that remote CPUs that are waiting for their synchronous call can be delayed by running asynchronous callbacks. Rework the interrupt handler to first run the synchonous callbacks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200526161907.836818381@infradead.org
2020-05-28	sched: Fix smp_call_function_single_async() usage for ILB	Peter Zijlstra
	The recent commit: 90b5363acd47 ("sched: Clean up scheduler_ipi()") got smp_call_function_single_async() subtly wrong. Even though it will return -EBUSY when trying to re-use a csd, that condition is not atomic and still requires external serialization. The change in kick_ilb() got this wrong. While on first reading kick_ilb() has an atomic test-and-set that appears to serialize the use, the matching 'release' is not in the right place to actually guarantee this serialization. Rework the nohz_idle_balance() trigger so that the release is in the IPI callback and thus guarantees the required serialization for the CSD. Fixes: 90b5363acd47 ("sched: Clean up scheduler_ipi()") Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: mgorman@techsingularity.net Link: https://lore.kernel.org/r/20200526161907.778543557@infradead.org
2020-05-28	Merge branch 'core/rcu' into sched/core, to pick up dependency	Ingo Molnar
	We are going to rely on the loosening of RCU callback semantics, introduced by this commit: 806f04e9fd2c: ("rcu: Allow for smp_call_function() running callbacks from idle") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28	Merge branch 'sched/urgent' into sched/core, to pick up fix	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28	rcu: Allow for smp_call_function() running callbacks from idle	Peter Zijlstra
	Current RCU hard relies on smp_call_function() callbacks running from interrupt context. A pending optimization is going to break that, it will allow idle CPUs to run the callbacks from the idle loop. This avoids raising the IPI on the requesting CPU and avoids handling an exception on the receiving CPU. Change rcu_is_cpu_rrupt_from_idle() to also accept task context, provided it is the idle task. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Link: https://lore.kernel.org/r/20200527171236.GC706495@hirez.programming.kicks-ass.net
2020-05-28	zram: Use local lock to protect per-CPU data	Mike Galbraith
	The zcomp driver uses per-CPU compression. The per-CPU data pointer is acquired with get_cpu_ptr() which implicitly disables preemption. It allocates memory inside the preempt disabled region which conflicts with the PREEMPT_RT semantics. Replace the implicit preemption control with an explicit local lock. This allows RT kernels to substitute it with a real per CPU lock, which serializes the access but keeps the code section preemptible. On non RT kernels this maps to preempt_disable() as before, i.e. no functional change. [bigeasy: Use local_lock(), description, drop reordering] Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20200527201119.1692513-8-bigeasy@linutronix.de
2020-05-28	zram: Allocate struct zcomp_strm as per-CPU memory	Sebastian Andrzej Siewior
	zcomp::stream is a per-CPU pointer, pointing to struct zcomp_strm which contains two pointers. Having struct zcomp_strm allocated directly as per-CPU memory would avoid one additional memory allocation and a pointer dereference. This also simplifies the addition of a local_lock to struct zcomp_strm. Allocate zcomp::stream directly as per-CPU memory. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20200527201119.1692513-7-bigeasy@linutronix.de
2020-05-28	connector/cn_proc: Protect send_msg() with a local lock	Mike Galbraith
	send_msg() disables preemption to avoid out-of-order messages. As the code inside the preempt disabled section acquires regular spinlocks, which are converted to 'sleeping' spinlocks on a PREEMPT_RT kernel and eventually calls into a memory allocator, this conflicts with the RT semantics. Convert it to a local_lock which allows RT kernels to substitute them with a real per CPU lock. On non RT kernels this maps to preempt_disable() as before. No functional change. [bigeasy: Patch description] Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20200527201119.1692513-6-bigeasy@linutronix.de
2020-05-28	squashfs: Make use of local lock in multi_cpu decompressor	Julia Cartwright
	The squashfs multi CPU decompressor makes use of get_cpu_ptr() to acquire a pointer to per-CPU data. get_cpu_ptr() implicitly disables preemption which serializes the access to the per-CPU data. But decompression can take quite some time depending on the size. The observed preempt disabled times in real world scenarios went up to 8ms, causing massive wakeup latencies. This happens on all CPUs as the decompression is fully parallelized. Replace the implicit preemption control with an explicit local lock. This allows RT kernels to substitute it with a real per CPU lock, which serializes the access but keeps the code section preemptible. On non RT kernels this maps to preempt_disable() as before, i.e. no functional change. [ bigeasy: Use local_lock(), patch description] Reported-by: Alexander Stein <alexander.stein@systec-electronic.com> Signed-off-by: Julia Cartwright <julia@ni.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Alexander Stein <alexander.stein@systec-electronic.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20200527201119.1692513-5-bigeasy@linutronix.de
2020-05-28	mm/swap: Use local_lock for protection	Ingo Molnar
	The various struct pagevec per CPU variables are protected by disabling either preemption or interrupts across the critical sections. Inside these sections spinlocks have to be acquired. These spinlocks are regular spinlock_t types which are converted to "sleeping" spinlocks on PREEMPT_RT enabled kernels. Obviously sleeping locks cannot be acquired in preemption or interrupt disabled sections. local locks provide a trivial way to substitute preempt and interrupt disable instances. On a non PREEMPT_RT enabled kernel local_lock() maps to preempt_disable() and local_lock_irq() to local_irq_disable(). Create lru_rotate_pvecs containing the pagevec and the locallock. Create lru_pvecs containing the remaining pagevecs and the locallock. Add lru_add_drain_cpu_zone() which is used from compact_zone() to avoid exporting the pvec structure. Change the relevant call sites to acquire these locks instead of using preempt_disable() / get_cpu() / get_cpu_var() and local_irq_disable() / local_irq_save(). There is neither a functional change nor a change in the generated binary code for non PREEMPT_RT enabled non-debug kernels. When lockdep is enabled local locks have lockdep maps embedded. These allow lockdep to validate the protections, i.e. inappropriate usage of a preemption only protected sections would result in a lockdep warning while the same problem would not be noticed with a plain preempt_disable() based protection. local locks also improve readability as they provide a named scope for the protections while preempt/interrupt disable are opaque scopeless. Finally local locks allow PREEMPT_RT to substitute them with real locking primitives to ensure the correctness of operation in a fully preemptible kernel. [ bigeasy: Adopted to use local_lock ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20200527201119.1692513-4-bigeasy@linutronix.de
2020-05-28	radix-tree: Use local_lock for protection	Sebastian Andrzej Siewior
	The radix-tree and idr preload mechanisms use preempt_disable() to protect the complete operation between xxx_preload() and xxx_preload_end(). As the code inside the preempt disabled section acquires regular spinlocks, which are converted to 'sleeping' spinlocks on a PREEMPT_RT kernel and eventually calls into a memory allocator, this conflicts with the RT semantics. Convert it to a local_lock which allows RT kernels to substitute them with a real per CPU lock. On non RT kernels this maps to preempt_disable() as before, but provides also lockdep coverage of the critical region. No functional change. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20200527201119.1692513-3-bigeasy@linutronix.de
2020-05-28	locking: Introduce local_lock()	Thomas Gleixner
	preempt_disable() and local_irq_disable/save() are in principle per CPU big kernel locks. This has several downsides: - The protection scope is unknown - Violation of protection rules is hard to detect by instrumentation - For PREEMPT_RT such sections, unless in low level critical code, can violate the preemptability constraints. To address this PREEMPT_RT introduced the concept of local_locks which are strictly per CPU. The lock operations map to preempt_disable(), local_irq_disable/save() and the enabling counterparts on non RT enabled kernels. If lockdep is enabled local locks gain a lock map which tracks the usage context. This will catch cases where an area is protected by preempt_disable() but the access also happens from interrupt context. local locks have identified quite a few such issues over the years, the most recent example is: b7d5dc21072cd ("random: add a spinlock_t to struct batched_entropy") Aside of the lockdep coverage this also improves code readability as it precisely annotates the protection scope. PREEMPT_RT substitutes these local locks with 'sleeping' spinlocks to protect such sections while maintaining preemtability and CPU locality. local locks can replace: - preempt_enable()/disable() pairs - local_irq_disable/enable() pairs - local_irq_save/restore() pairs They are also used to replace code which implicitly disables preemption like: - get_cpu()/put_cpu() - get_cpu_var()/put_cpu_var() with PREEMPT_RT friendly constructs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20200527201119.1692513-2-bigeasy@linutronix.de
2020-05-28	Merge tag 'v5.7-rc7' into WIP.locking/core, to refresh the tree	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28	Bluetooth: btbcm: Added 003.006.007, changed 001.003.015	Azamat H. Hackimov
	Added new Broadcom device BCM4350C5, changed BCM4354A2 to BCM4356A2. Based on Broadcom Windows drivers 001.003.015 should be BCM4356A2. I have user report that firmware name is misplaced (https://github.com/winterheart/broadcom-bt-firmware/issues/3). Signed-off-by: Azamat H. Hackimov <azamat.hackimov@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-05-28	MIPS: Loongson64: Define PCI_IOBASE	Jiaxun Yang
	PCI_IOBASE is used to create VM maps for PCI I/O ports, it is required by generic PCI drivers to make memory mapped I/O range work. To deal with legacy drivers that have fixed I/O ports range we reserved 0x10000 in PCI_IOBASE, should be enough for i8259 i8042 stuff. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-28	MIPS: CPU_LOONGSON2EF need software to maintain cache consistency	Lichao Liu
	CPU_LOONGSON2EF need software to maintain cache consistency, so modify the 'cpu_needs_post_dma_flush' function to return true when the cpu type is CPU_LOONGSON2EF. Cc: stable@vger.kernel.org Signed-off-by: Lichao Liu <liulichao@loongson.cn> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-28	MIPS: DTS: Fix build errors used with various configs	Tiezhu Yang
	If CONFIG_MIPS_MALTA is not set but CONFIG_LEGACY_BOARD_SEAD3 is set, the subdir arch/mips/boot/dts/mti will not be built, so the sead3.dts which depends on CONFIG_LEGACY_BOARD_SEAD3 in this subdir is also not built, and then there exists the following build error, fix it. LD .tmp_vmlinux.kallsyms1 arch/mips/generic/board-sead3.o:(.mips.machines.init+0x4): undefined reference to `__dtb_sead3_begin' Makefile:1106: recipe for target 'vmlinux' failed make: *** [vmlinux] Error 1 Additionally, add CONFIG_FIT_IMAGE_FDT_BOSTON check for subdir img to fix the following build error when CONFIG_MACH_PISTACHIO is not set but CONFIG_FIT_IMAGE_FDT_BOSTON is set. FATAL ERROR: Couldn't open "boot/dts/img/boston.dtb": No such file or directory Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 41528ba6afe6 ("MIPS: DTS: Only build subdir of current platform") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-28	crypto: hisilicon - fix driver compatibility issue with different versions ↵	Weili Qian
	of devices In order to be compatible with devices of different versions, V1 in the accelerator driver is now isolated, and other versions are the previous V2 processing flow. Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Shukun Tan <tanshukun1@huawei.com> Reviewed-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-28	crypto: engine - do not requeue in case of fatal error	Iuliana Prodan
	Now, in crypto-engine, if hardware queue is full (-ENOSPC), requeue request regardless of MAY_BACKLOG flag. If hardware throws any other error code (like -EIO, -EINVAL, -ENOMEM, etc.) only MAY_BACKLOG requests are enqueued back into crypto-engine's queue, since the others can be dropped. The latter case can be fatal error, so those cannot be recovered from. For example, in CAAM driver, -EIO is returned in case the job descriptor is broken, so there is no possibility to fix the job descriptor. Therefore, these errors might be fatal error, so we shouldn’t requeue the request. This will just be pass back and forth between crypto-engine and hardware. Fixes: 6a89f492f8e5 ("crypto: engine - support for parallel requests based on retry mechanism") Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com> Reported-by: Horia Geantă <horia.geanta@nxp.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-28	crypto: cavium/nitrox - Fix a typo in a comment	Christophe JAILLET
	s/NITORX/NITROX/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-28	mac80211: support control port TX status reporting	Markus Theil
	Add support for TX status reporting for the control port TX API; this will be used by hostapd when it moves to the control port TX API. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200527160334.19224-1-markus.theil@tu-ilmenau.de [fix commit message, it was referring to nl80211] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-05-28	Merge tag 'amd-drm-next-5.8-2020-05-27' of ↵	Dave Airlie
	git://people.freedesktop.org/~agd5f/linux into drm-next amd-drm-next-5.8-2020-05-27: amdgpu: - SRIOV fixes - RAS fixes - VCN 2.5 DPG (Dynamic PowerGating) fixes - FP16 updates for display - CTF cleanups - Display fixes - Fix pcie bw sysfs handling - Enable resizeable BAR support for gmc 10.x - GFXOFF fixes for Raven - PM sysfs handling fixes amdkfd: - Fix a race condition - Warning fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200527231219.3930-1-alexander.deucher@amd.com
2020-05-28	perf/x86/rapl: Add AMD Fam17h RAPL support	Stephane Eranian
	This patch enables AMD Fam17h RAPL support for the Package level metric. The support is as per AMD Fam17h Model31h (Zen2) and model 00-ffh (Zen1) PPR. The same output is available via the energy-pkg pseudo event: $ perf stat -a -I 1000 --per-socket -e power/energy-pkg/ Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200527224659.206129-6-eranian@google.com
2020-05-28	perf/x86/rapl: Make perf_probe_msr() more robust and flexible	Stephane Eranian
	This patch modifies perf_probe_msr() by allowing passing of struct perf_msr array where some entries are not populated, i.e., they have either an msr address of 0 or no attribute_group pointer. This helps with certain call paths, e.g., RAPL. In case the grp is NULL, the default sysfs visibility rule applies which is to make the group visible. Without the patch, you would get a kernel crash with a NULL group. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200527224659.206129-5-eranian@google.com
2020-05-28	perf/x86/rapl: Flip logic on default events visibility	Stephane Eranian
	This patch modifies the default visibility of the attribute_group for each RAPL event. By default if the grp.is_visible field is NULL, sysfs considers that it must display the attribute group. If the field is not NULL (callback function), then the return value of the callback determines the visibility (0 = not visible). The RAPL attribute groups had the field set to NULL, meaning that unless they failed the probing from perf_msr_probe(), they would be visible. We want to avoid having to specify attribute groups that are not supported by the HW in the rapl_msrs[] array, they don't have an MSR address to begin with. Therefore, we intialize the visible field of all RAPL attribute groups to a callback that returns 0. If the RAPL msr goes through probing and succeeds the is_visible field will be set back to NULL (visible). If the probing fails the field is set to a callback that return 0 (not visible). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200527224659.206129-4-eranian@google.com
2020-05-28	perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs	Stephane Eranian
	This patch modifies the rapl_model struct to include architecture specific knowledge in this previously Intel specific structure, and in particular it adds the MSR for POWER_UNIT and the rapl_msrs array. No functional changes. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200527224659.206129-3-eranian@google.com
2020-05-28	perf/x86/rapl: Move RAPL support to common x86 code	Stephane Eranian
	To prepare for support of both Intel and AMD RAPL. As per the AMD PPR, Fam17h support Package RAPL counters to monitor power usage. The RAPL counter operates as with Intel RAPL, and as such it is beneficial to share the code. No change in functionality. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200527224659.206129-2-eranian@google.com
2020-05-28	Merge tag 'v5.7-rc7' into perf/core, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28	Merge tag 'amd-drm-fixes-5.7-2020-05-27' of ↵	Dave Airlie
	git://people.freedesktop.org/~agd5f/linux into drm-fixes amd-drm-fixes-5.7-2020-05-27: amdgpu: - Display atomic test fix - Fix soft hang in display vupdate code Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200527222700.4378-1-alexander.deucher@amd.com
2020-05-28	Merge tag 'drm-misc-next-fixes-2020-05-27' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-misc into drm-next Short summary of fixes pull (less than what git shortlog provides): There's a fix for panel brighness on Lenovo X13 Yoga devices and a fix for -Wformat warnings on architectures where atomic-64 counters are not of type unsigned long long. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20200527080123.GA8186@linux-uq9g
2020-05-27	ice: Check UMEM FQ size when allocating bufs	Krzysztof Kazimierczak
	If a UMEM is present on a queue when an interface/queue pair is being enabled, the driver will try to prepare the Rx buffers in advance to improve performance. However, if fill queue is shorter than HW Rx ring, the driver will report failure after getting the last address from the fill queue. This still lets the driver process the packets correctly during the NAPI poll, but leads to a constant NAPI rescheduling. Not allocating the buffers in advance would result in a potential performance decrease. Commit d57d76428ae9 ("xsk: Add API to check for available entries in FQ") provides an API that lets drivers check the number of addresses that the fill queue holds. Notify the user if fill queue is not long enough to prepare all buffers before packet processing starts, and allocate the buffers during the NAPI poll. If the fill queue size is sufficient, prepare Rx buffers in advance. Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-27	net/mlx5: DR, Split RX and TX lock for parallel insertion	Alex Vesker
	Change the locking flow to support RX and TX locks, splitting the single lock to two will allow inserting rules in parallel for RX and TX parts of the FDB. Locking the dr_domain will be done by locking the RX domain and the TX domain locks, this is mostly used for control operations on the dr_domain. When inserting rules for RX or TX the single nic_doamin RX or TX lock will be used. Splitting the lock is safe since RX and TX domains are logically separated from each other, shared objects such the send-ring and memory pool are protected by locks. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5: DR, Add a spinlock to protect the send ring	Alex Vesker
	Adding this lock will allow writing steering entries without locking the dr_domain and allow parallel insertion. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5e: Optimize performance for IPv4/IPv6 ethertype	Eli Britstein
	The HW is optimized for IPv4/IPv6. For such cases, pending capability, avoid matching on ethertype, and use ip_version field instead. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5e: Helper function to set ethertype	Eli Britstein
	Set ethertype match in a helper function as a pre-step towards optimizing it. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5: Add missing mutex destroy	Parav Pandit
	Add mutex destroy calls to balance with mutex_init() done in the init path. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5e: Use change upper event to setup representors' bond_metadata	Vu Pham
	Use change upper event to detect slave representor from enslaving/unslaving to/from lag device. On enslaving event, call mlx5_enslave_rep() API to create, add this slave representor shadow entry to the slaves list of bond_metadata structure representing master lag device and use its metadata to setup ingress acl metadata header. On unslaving event, resetting the vport of unslaved representor to use its default ingress/egress acls and rx rules with its default_metadata. The last slave will free the shared bond_metadata and its unique metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5e: Slave representors sharing unique metadata for match	Vu Pham
	Bonded slave representors' vports must share a unique metadata for match. On enslaving event of slave representor to lag device, allocate new unique "bond_metadata" for match if this is the first slave. The subsequent enslaved representors will share the same unique "bond_metadata". On unslaving event of slave representor, reset the slave representor's vport to use its own default metadata. Replace ingress acl and rx rules of the slave representors' vports using new vport->bond_metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5: E-Switch, Alloc and free unique metadata for match	Vu Pham
	Introduce infrastructure to create unique metadata for match for vport without depending on vport_num. Vport uses its default metadata for match in standalone configuration but will share a different unique "bond_metadata" for match with other vports in bond configuration. Using ida to generate unique metadata for match for vports in default and bond configurations. Introduce APIs to generate, free metadata for match. Introduce APIs to set vport's bond_metadata and replace its ingress acl rules with bond_metatada. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5e: Add bond_metadata and its slave entries	Vu Pham
	Adding bond_metadata and its slave entries to represent a lag device and its slaves VF representors. Bond_metadata structure includes a unique metadata shared by slaves VF respresentors, and a list of slaves representors slave entries. On enslaving event, create a bond_metadata structure representing the upper lag device of this slave representor if it has not been created yet. Create and add entry for the slave representor to the slaves list. On unslaving event, free the slave entry of the slave representor. On the last unslave event, free the bond_metadata structure and its resources. Introduce APIs to create and remove bond_metadata and its resources, enslave and unslave VF representor slave entries. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5e: Offload flow rules to active lower representor	Or Gerlitz
	When a bond device is created over one or more non uplink representors, and when a flow rule is offloaded to such bond device, offload a rule to the active lower device. Assuming that this is active-backup lag, the rules should be offloaded to the active lower device which is the representor of the direct path (not the failover). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5e: Support tc block sharing for representors	Vu Pham
	Currently offloading a rule over a tc block shared by multiple representors fails because an e-switch global hashtable to keep the mapping from tc cookies to mlx5e flow instances is used, and tc block sharing offloads the same rule/cookie multiple times, each time for different representor sharing the tc block. Changing the implementation and behavior by acknowledging and returning success if the same rule/cookie is offloaded again to other slave representor sharing the tc block by setting, checking and comparing the netdev that added the rule first. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule	Or Gerlitz
	Register a notifier block to handle netdev events for bond device of non-uplink representors to support eswitch vports bonding. When a non-uplink representor is a lower dev (slave) of bond and becomes active, adding egress acl forward-to-vport rule of all slave netdevs (active + standby) to forward to this representor's vport. Use change lower netdev event to do this. Use change upper event to detect slave representor unslaved from lag device to delete its vport egress acl forward rule if any. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5: E-Switch, Introduce APIs to enable egress acl forward-to-vport rule	Vu Pham
	By default, e-switch vport's egress acl just forward packets to its counterpart NIC vport using existing egress acl table. During port failover in bonding scenario where two VFs representors are bonded, the egress acl forward-to-vport rule will be added to the existing egress acl table of e-switch vport of passive/inactive slave representor to forward packets to other NIC vport ie. the active slave representor's NIC vport to handle egress "failover" traffic. Enable egress acl and have APIs to create and destroy egress acl forward-to-vport rule and group. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-27	net/mlx5: E-Switch, Refactor eswitch ingress acl codes	Vu Pham
	Restructure the eswitch ingress acl codes into eswitch directory and different files: . Acl ingress helper functions to acl_helper.c/h . Acl ingress functions used in offloads mode to acl_ingress_ofld.c . Acl ingress functions used in legacy mode to acl_ingress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com>
2020-05-27	net/mlx5: E-Switch, Refactor eswitch egress acl codes	Vu Pham
	Refactor the egress acl codes so that offloads and legacy modes can configure specifically their own needs of egress acl table, groups and rules. While at it, restructure the eswitch egress acl codes into eswitch directory and different files: . Acl egress helper functions to acl_helper.c/h . Acl egress functions used in offloads mode to acl_egress_ofld.c . Acl egress functions used in legacy mode to acl_egress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>