linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-01-27	io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE	Hao Xu
	Abaci reported the follow warning: [ 27.073425] do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_exclusive+0x3a/0xc0 [ 27.075805] WARNING: CPU: 0 PID: 951 at kernel/sched/core.c:7853 __might_sleep+0x80/0xa0 [ 27.077604] Modules linked in: [ 27.078379] CPU: 0 PID: 951 Comm: a.out Not tainted 5.11.0-rc3+ #1 [ 27.079637] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 27.080852] RIP: 0010:__might_sleep+0x80/0xa0 [ 27.081835] Code: 65 48 8b 04 25 80 71 01 00 48 8b 90 c0 15 00 00 48 8b 70 18 48 c7 c7 08 39 95 82 c6 05 f9 5f de 08 01 48 89 d1 e8 00 c6 fa ff 0b eb bf 41 0f b6 f5 48 c7 c7 40 23 c9 82 e8 f3 48 ec 00 eb a7 [ 27.084521] RSP: 0018:ffffc90000fe3ce8 EFLAGS: 00010286 [ 27.085350] RAX: 0000000000000000 RBX: ffffffff82956083 RCX: 0000000000000000 [ 27.086348] RDX: ffff8881057a0000 RSI: ffffffff8118cc9e RDI: ffff88813bc28570 [ 27.087598] RBP: 00000000000003a7 R08: 0000000000000001 R09: 0000000000000001 [ 27.088819] R10: ffffc90000fe3e00 R11: 00000000fffef9f0 R12: 0000000000000000 [ 27.089819] R13: 0000000000000000 R14: ffff88810576eb80 R15: ffff88810576e800 [ 27.091058] FS: 00007f7b144cf740(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 [ 27.092775] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 27.093796] CR2: 00000000022da7b8 CR3: 000000010b928002 CR4: 00000000003706f0 [ 27.094778] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 27.095780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 27.097011] Call Trace: [ 27.097685] __mutex_lock+0x5d/0xa30 [ 27.098565] ? prepare_to_wait_exclusive+0x71/0xc0 [ 27.099412] ? io_cqring_overflow_flush.part.101+0x6d/0x70 [ 27.100441] ? lockdep_hardirqs_on_prepare+0xe9/0x1c0 [ 27.101537] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 27.102656] ? trace_hardirqs_on+0x46/0x110 [ 27.103459] ? io_cqring_overflow_flush.part.101+0x6d/0x70 [ 27.104317] io_cqring_overflow_flush.part.101+0x6d/0x70 [ 27.105113] io_cqring_wait+0x36e/0x4d0 [ 27.105770] ? find_held_lock+0x28/0xb0 [ 27.106370] ? io_uring_remove_task_files+0xa0/0xa0 [ 27.107076] __x64_sys_io_uring_enter+0x4fb/0x640 [ 27.107801] ? rcu_read_lock_sched_held+0x59/0xa0 [ 27.108562] ? lockdep_hardirqs_on_prepare+0xe9/0x1c0 [ 27.109684] ? syscall_enter_from_user_mode+0x26/0x70 [ 27.110731] do_syscall_64+0x2d/0x40 [ 27.111296] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 27.112056] RIP: 0033:0x7f7b13dc8239 [ 27.112663] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48 [ 27.115113] RSP: 002b:00007ffd6d7f5c88 EFLAGS: 00000286 ORIG_RAX: 00000000000001aa [ 27.116562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b13dc8239 [ 27.117961] RDX: 000000000000478e RSI: 0000000000000000 RDI: 0000000000000003 [ 27.118925] RBP: 00007ffd6d7f5cb0 R08: 0000000020000040 R09: 0000000000000008 [ 27.119773] R10: 0000000000000001 R11: 0000000000000286 R12: 0000000000400480 [ 27.120614] R13: 00007ffd6d7f5d90 R14: 0000000000000000 R15: 0000000000000000 [ 27.121490] irq event stamp: 5635 [ 27.121946] hardirqs last enabled at (5643): [] console_unlock+0x5c4/0x740 [ 27.123476] hardirqs last disabled at (5652): [] console_unlock+0x4e7/0x740 [ 27.125192] softirqs last enabled at (5272): [] __do_softirq+0x3c5/0x5aa [ 27.126430] softirqs last disabled at (5267): [] asm_call_irq_on_stack+0xf/0x20 [ 27.127634] ---[ end trace 289d7e28fa60f928 ]--- This is caused by calling io_cqring_overflow_flush() which may sleep after calling prepare_to_wait_exclusive() which set task state to TASK_INTERRUPTIBLE Reported-by: Abaci <abaci@linux.alibaba.com> Fixes: 6c503150ae33 ("io_uring: patch up IOPOLL overflow_flush sync") Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27	Revert "block: simplify set_init_blocksize" to regain lost performance	Maxim Mikityanskiy
	The cited commit introduced a serious regression with SATA write speed, as found by bisecting. This patch reverts this commit, which restores write speed back to the values observed before this commit. The performance tests were done on a Helios4 NAS (2nd batch) with 4 HDDs (WD8003FFBX) using dd (bs=1M count=2000). "Direct" is a test with a single HDD, the rest are different RAID levels built over the first partitions of 4 HDDs. Test results are in MB/s, R is read, W is write. \| Direct \| RAID0 \| RAID10 f2 \| RAID10 n2 \| RAID6 ----------------+--------+-------+-----------+-----------+-------- 9011495c9466 \| R:256 \| R:313 \| R:276 \| R:313 \| R:323 (before faulty) \| W:254 \| W:253 \| W:195 \| W:204 \| W:117 ----------------+--------+-------+-----------+-----------+-------- 5ff9f19231a0 \| R:257 \| R:398 \| R:312 \| R:344 \| R:391 (faulty commit) \| W:154 \| W:122 \| W:67.7 \| W:66.6 \| W:67.2 ----------------+--------+-------+-----------+-----------+-------- 5.10.10 \| R:256 \| R:401 \| R:312 \| R:356 \| R:375 unpatched \| W:149 \| W:123 \| W:64 \| W:64.1 \| W:61.5 ----------------+--------+-------+-----------+-----------+-------- 5.10.10 \| R:255 \| R:396 \| R:312 \| R:340 \| R:393 patched \| W:247 \| W:274 \| W:220 \| W:225 \| W:121 Applying this patch doesn't hurt read performance, while improves the write speed by 1.5x - 3.5x (more impact on RAID tests). The write speed is restored back to the state before the faulty commit, and even a bit higher in RAID tests (which aren't HDD-bound on this device) - that is likely related to other optimizations done between the faulty commit and 5.10.10 which also improved the read speed. Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> Fixes: 5ff9f19231a0 ("block: simplify set_init_blocksize") Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27	Revert "PCI/ASPM: Save/restore L1SS Capability for suspend/resume"	Bjorn Helgaas
	This reverts commit 4257f7e008ea394fcecc050f1569c3503b8bcc15. Kenneth reported that after 4257f7e008ea, he sees a torrent of disk I/O errors on his NVMe device after suspend/resume until a reboot. Link: https://lore.kernel.org/linux-pci/20201228040513.GA611645@bjorn-Precision-5520/ Reported-by: Kenneth R. Crudup <kenny@panix.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-01-27	gpiolib: cdev: clear debounce period if line set to output	Kent Gibson
	When set_config changes a line from input to output debounce is implicitly disabled, as debounce makes no sense for outputs, but the debounce period is not being cleared and is still reported in the line info. So clear the debounce period when the debouncer is stopped in edge_detector_stop(). Fixes: 65cff7046406 ("gpiolib: cdev: support setting debounce") Cc: stable@vger.kernel.org Signed-off-by: Kent Gibson <warthog618@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2021-01-27	x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled	Juergen Gross
	When booting a kernel which has been built with CONFIG_AMD_MEM_ENCRYPT enabled as a Xen pv guest a warning is issued for each processor: [ 5.964347] ------------[ cut here ]------------ [ 5.968314] WARNING: CPU: 0 PID: 1 at /home/gross/linux/head/arch/x86/xen/enlighten_pv.c:660 get_trap_addr+0x59/0x90 [ 5.972321] Modules linked in: [ 5.976313] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.11.0-rc5-default #75 [ 5.980313] Hardware name: Dell Inc. OptiPlex 9020/0PC5F7, BIOS A05 12/05/2013 [ 5.984313] RIP: e030:get_trap_addr+0x59/0x90 [ 5.988313] Code: 42 10 83 f0 01 85 f6 74 04 84 c0 75 1d b8 01 00 00 00 c3 48 3d 00 80 83 82 72 08 48 3d 20 81 83 82 72 0c b8 01 00 00 00 eb db <0f> 0b 31 c0 c3 48 2d 00 80 83 82 48 ba 72 1c c7 71 1c c7 71 1c 48 [ 5.992313] RSP: e02b:ffffc90040033d38 EFLAGS: 00010202 [ 5.996313] RAX: 0000000000000001 RBX: ffffffff82a141d0 RCX: ffffffff8222ec38 [ 6.000312] RDX: ffffffff8222ec38 RSI: 0000000000000005 RDI: ffffc90040033d40 [ 6.004313] RBP: ffff8881003984a0 R08: 0000000000000007 R09: ffff888100398000 [ 6.008312] R10: 0000000000000007 R11: ffffc90040246000 R12: ffff8884082182a8 [ 6.012313] R13: 0000000000000100 R14: 000000000000001d R15: ffff8881003982d0 [ 6.016316] FS: 0000000000000000(0000) GS:ffff888408200000(0000) knlGS:0000000000000000 [ 6.020313] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.024313] CR2: ffffc900020ef000 CR3: 000000000220a000 CR4: 0000000000050660 [ 6.028314] Call Trace: [ 6.032313] cvt_gate_to_trap.part.7+0x3f/0x90 [ 6.036313] ? asm_exc_double_fault+0x30/0x30 [ 6.040313] xen_convert_trap_info+0x87/0xd0 [ 6.044313] xen_pv_cpu_up+0x17a/0x450 [ 6.048313] bringup_cpu+0x2b/0xc0 [ 6.052313] ? cpus_read_trylock+0x50/0x50 [ 6.056313] cpuhp_invoke_callback+0x80/0x4c0 [ 6.060313] _cpu_up+0xa7/0x140 [ 6.064313] cpu_up+0x98/0xd0 [ 6.068313] bringup_nonboot_cpus+0x4f/0x60 [ 6.072313] smp_init+0x26/0x79 [ 6.076313] kernel_init_freeable+0x103/0x258 [ 6.080313] ? rest_init+0xd0/0xd0 [ 6.084313] kernel_init+0xa/0x110 [ 6.088313] ret_from_fork+0x1f/0x30 [ 6.092313] ---[ end trace be9ecf17dceeb4f3 ]--- Reason is that there is no Xen pv trap entry for X86_TRAP_VC. Fix that by adding a generic trap handler for unknown traps and wire all unknown bare metal handlers to this generic handler, which will just crash the system in case such a trap will ever happen. Fixes: 0786138c78e793 ("x86/sev-es: Add a Runtime #VC Exception Handler") Cc: <stable@vger.kernel.org> # v5.10 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2021-01-27	ACPI/IORT: Do not blindly trust DMA masks from firmware	Moritz Fischer
	Address issue observed on real world system with suboptimal IORT table where DMA masks of PCI devices would get set to 0 as result. iort_dma_setup() would query the root complex'/named component IORT entry for a DMA mask, and use that over the one the device has been configured with earlier. Ideally we want to use the minimum mask of what the IORT contains for the root complex and what the device was configured with. Fixes: 5ac65e8c8941 ("ACPI/IORT: Support address size limit for root complexes") Signed-off-by: Moritz Fischer <mdf@kernel.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Link: https://lore.kernel.org/r/20210122012419.95010-1-mdf@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-01-27	s390: uv: Fix sysfs max number of VCPUs reporting	Janosch Frank
	The number reported by the query is N-1 and I think people reading the sysfs file would expect N instead. For users creating VMs there's no actual difference because KVM's limit is currently below the UV's limit. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Fixes: a0f60f8431999 ("s390/protvirt: Add sysfs firmware interface for Ultravisor information") Cc: stable@vger.kernel.org Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-27	s390/vfio-ap: No need to disable IRQ after queue reset	Tony Krowiak
	The queues assigned to a matrix mediated device are currently reset when: * The VFIO_DEVICE_RESET ioctl is invoked * The mdev fd is closed by userspace (QEMU) * The mdev is removed from sysfs. Immediately after the reset of a queue, a call is made to disable interrupts for the queue. This is entirely unnecessary because the reset of a queue disables interrupts, so this will be removed. Furthermore, vfio_ap_irq_disable() does an unconditional PQAP/AQIC which can result in a specification exception (when the corresponding facility is not available), so this is actually a bugfix. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> [pasic@linux.ibm.com: minor rework before merging] Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Fixes: ec89b55e3bce ("s390: ap: implement PAPQ AQIC interception in kernel") Cc: <stable@vger.kernel.org> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-27	s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated	Tony Krowiak
	The vfio_ap device driver registers a group notifier with VFIO when the file descriptor for a VFIO mediated device for a KVM guest is opened to receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM event). When the KVM pointer is set, the vfio_ap driver takes the following actions: 1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state of the mediated device. 2. Calls the kvm_get_kvm() function to increment its reference counter. 3. Sets the function pointer to the function that handles interception of the instruction that enables/disables interrupt processing. 4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to the guest. In order to avoid memory leaks, when the notifier is called to receive notification that the KVM pointer has been set to NULL, the vfio_ap device driver should reverse the actions taken when the KVM pointer was set. Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback") Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201223012013.5418-1-akrowiak@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-27	bus: mhi: core: Add helper API to return number of free TREs	Hemant Kumar
	Introduce mhi_get_free_desc_count() API to return number of TREs available to queue buffer. MHI clients can use this API to know before hand if ring is full without calling queue API. Signed-off-by: Hemant Kumar <hemantk@codeaurora.org> Reviewed-by: Jeffrey Hugo <jhugo@codeaurora.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/1610388462-16322-1-git-send-email-loic.poulain@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2021-01-27	net: Simplify the calculation of variables	Jiapeng Zhong
	Fix the following coccicheck warnings: ./net/ipv4/esp4_offload.c:288:32-34: WARNING !A \|\| A && B is equivalent to !A \|\| B. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-01-27	can: dev: prevent potential information leak in can_fill_info()	Dan Carpenter
	The "bec" struct isn't necessarily always initialized. For example, the mcp251xfd_get_berr_counter() function doesn't initialize anything if the interface is down. Fixes: 52c793f24054 ("can: netlink support for bus-error reporting and counters") Link: https://lore.kernel.org/r/YAkaRdRJncsJO8Ve@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: mcp251xfd: add BQL support	Marc Kleine-Budde
	This patch adds BQL support to the driver. Support for netdev_xmit_more() will be added in a separate patch series. Link: https://lore.kernel.org/r/20210114153448.1506901-7-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: mcp251xfd: add len8_dlc support	Marc Kleine-Budde
	This patch adds support for the Classical CAN raw DLC functionality to send and receive DLC values from 9 ... 15 to the mcp251xfd driver. Link: https://lore.kernel.org/r/20210114153448.1506901-6-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: mcp251xfd: mcp251xfd_tx_obj_from_skb(): don't copy data for RTR CAN ↵	Marc Kleine-Budde
	frames in TX-path In Classical CAN there are RTR frames. RTR frames have the RTR bit set, may have a dlc != 0, but contain no data. This patch optimizes the TX-path to not copy any data for RTR frames. Link: https://lore.kernel.org/r/20210114153448.1506901-5-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: mcp251xfd: mcp251xfd_hw_rx_obj_to_skb(): don't copy data for RTR CAN ↵	Marc Kleine-Budde
	frames in RX-path In Classical CAN there are RTR frames. RTR frames have the RTR bit set, may have a dlc != 0, but contain no data. This patch changes the RX-path to no copy any data for RTR frames, so that the data field in the CAN frame stays 0x0. Link: https://lore.kernel.org/r/20210114153448.1506901-4-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: mcp251xfd: mcp251xfd_tx_obj_from_skb(): clean up padding of CAN-FD frames	Marc Kleine-Budde
	CAN-FD frames have only specific frame length (0, 1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 20, 24, 32, 48, 64). A CAN-FD frame provided by user space might not cover the whole CAN-FD frame. To avoid sending garbage over the CAN bus the driver pads the CAN frame with 0x0 (if MCP251XFD_SANITIZE_CAN is activated). This patch cleans up the pad len calculation. Rounding to full u32 brings no benefit, in case of CRC transfers, the hw_tx_obj->data is not aligned to u32 anyway. Link: https://lore.kernel.org/r/20210114153448.1506901-3-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: mcp251xfd: mcp251xfd_start_xmit(): use mcp251xfd_get_tx_free() to check ↵	Marc Kleine-Budde
	TX is is full This patch replaces an open coded check if the TX ring is full by a check if mcp251xfd_get_tx_free() returns 0. Link: https://lore.kernel.org/r/20210114153448.1506901-2-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: mcp251xfd: replace sizeof(u32) with val_bytes in regmap	Su Yanjun
	The sizeof(u32) is hardcoded. It's better to use the config value from the regmap. It increases the size of target object, but it's flexible when new mcp chip need other val_bytes. Link: https://lore.kernel.org/r/20210122081334.213957-1-suyanjun218@gmail.com Signed-off-by: Su Yanjun <suyanjun218@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: mcba_usb: remove h from printk format specifier	Tom Rix
	This change fixes the checkpatch warning described in this commit commit cbacb5ab0aa0 ("docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]") Standard integer promotion is already done and %hx and %hhx is useless so do not encourage the use of %hh[xudi] or %h[xudi]. Link: https://lore.kernel.org/r/20210124150916.1920434-1-trix@redhat.com Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: length: can_fd_len2dlc(): make legnth calculation readable again	Marc Kleine-Budde
	In commit 652562e5ff06 ("can: length: can_fd_len2dlc(): simplify length calculcation") the readability of the code degraded and became more error prone. To counteract this, partially convert that patch and replace open coded values (of the original code) with proper defines. Fixes: 652562e5ff06 ("can: length: can_fd_len2dlc(): simplify length calculcation") Cc: Vincent MAILHOL <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/r/20210118201346.79422-1-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: dev: export can_get_state_str() function	Vincent Mailhol
	The can_get_state_str() function is also relevant to the drivers. Export the symbol and make it visible in the can/dev.h header. Link: https://lore.kernel.org/r/20210119170355.12040-1-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: flexcan: fix typos	Marc Kleine-Budde
	This patch fixes two typos found by codespell. Fixes: 812f0116c66a ("can: flexcan: add CAN wakeup function for i.MX8QM") Link: https://lore.kernel.org/r/20210127085529.2768537-2-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	can: gw: fix typo	Marc Kleine-Budde
	This patch fixes a typo found by codespell. Fixes: 94c23097f991 ("can: gw: support modification of Classical CAN DLCs") Link: https://lore.kernel.org/r/20210127085529.2768537-3-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-27	rtc: mc146818: Detect and handle broken RTCs	Thomas Gleixner
	The recent fix for handling the UIP bit unearthed another issue in the RTC code. If the RTC is advertised but the readout is straight 0xFF because it's not available, the old code just proceeded with crappy values, but the new code hangs because it waits for the UIP bit to become low. Add a sanity check in the RTC CMOS probe function which reads the RTC_VALID register (Register D) which should have bit 0-6 cleared. If that's not the case then fail to register the CMOS. Add the same check to mc146818_get_time(), warn once when the condition is true and invalidate the rtc_time data. Reported-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mickaël Salaün <mic@linux.microsoft.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/87tur3fx7w.fsf@nanos.tec.linutronix.de
2021-01-27	xen: Fix XenStore initialisation for XS_LOCAL	David Woodhouse
	In commit 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI") I reworked the triggering of xenbus_probe(). I tried to simplify things by taking out the workqueue based startup triggered from wake_waiting(); the somewhat poorly named xenbus IRQ handler. I missed the fact that in the XS_LOCAL case (Dom0 starting its own xenstored or xenstore-stubdom, which happens after the kernel is booted completely), that IRQ-based trigger is still actually needed. So... put it back, except more cleanly. By just spawning a xenbus_probe thread which waits on xb_waitq and runs the probe the first time it gets woken, just as the workqueue-based hack did. This is actually a nicer approach for all the back ends with different interrupt methods, and we can switch them all over to that without the complex conditions for when to trigger it. But not in -rc6. This is the minimal fix for the regression, although it's a step in the right direction instead of doing a partial revert and actually putting the workqueue back. It's also simpler than the workqueue. Fixes: 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI") Reported-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/4c9af052a6e0f6485d1de43f2c38b1461996db99.camel@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-01-26	net: allow user to set metric on default route learned via Router Advertisement	Praveen Chaudhary
	For IPv4, default route is learned via DHCPv4 and user is allowed to change metric using config etc/network/interfaces. But for IPv6, default route can be learned via RA, for which, currently a fixed metric value 1024 is used. Ideally, user should be able to configure metric on default route for IPv6 similar to IPv4. This patch adds sysctl for the same. Logs: For IPv4: Config in etc/network/interfaces: auto eth0 iface eth0 inet dhcp metric 4261413864 IPv4 Kernel Route Table: $ ip route list default via 172.21.47.1 dev eth0 metric 4261413864 FRR Table, if a static route is configured: [In real scenario, it is useful to prefer BGP learned default route over DHCPv4 default route.] Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03 K 0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m i.e. User can prefer Default Router learned via Routing Protocol in IPv4. Similar behavior is not possible for IPv6, without this fix. After fix [for IPv6]: sudo sysctl -w net.ipv6.conf.eth0.net.ipv6.conf.eth0.ra_defrtr_metric=1996489705 IP monitor: [When IPv6 RA is received] default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 pref high Kernel IPv6 routing table $ ip -6 route list default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 21sec hoplimit 64 pref high FRR Table, if a static route is configured: [In real scenario, it is useful to prefer BGP learned default route over IPv6 RA default route.] Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* ::/0 [20/0] is directly connected, eth0, 00:00:06 K ::/0 [119/1001] via fe80::xx16:xxxx:feb3:ce8e, eth0, 6d07h43m If the metric is changed later, the effect will be seen only when next IPv6 RA is received, because the default route must be fully controlled by RA msg. Below metric is changed from 1996489705 to 1996489704. $ sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489704 net.ipv6.conf.eth0.ra_defrtr_metric = 1996489704 IP monitor: [On next IPv6 RA msg, Kernel deletes prev route and installs new route with updated metric] Deleted default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 3sec hoplimit 64 pref high default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489704 pref high Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com> Signed-off-by: Zhenggen Xu <zxu@linkedin.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210125214430.24079-1-pchaudhary@linkedin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	io_uring: fix wqe->lock/completion_lock deadlock	Pavel Begunkov
	Joseph reports following deadlock: CPU0: ... io_kill_linked_timeout // &ctx->completion_lock io_commit_cqring __io_queue_deferred __io_queue_async_work io_wq_enqueue io_wqe_enqueue // &wqe->lock CPU1: ... __io_uring_files_cancel io_wq_cancel_cb io_wqe_cancel_pending_work // &wqe->lock io_cancel_task_cb // &ctx->completion_lock Only __io_queue_deferred() calls queue_async_work() while holding ctx->completion_lock, enqueue drained requests via io_req_task_queue() instead. Cc: stable@vger.kernel.org # 5.9+ Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26	Merge branch 'net-fec-fix-temporary-rmii-clock-reset-on-link-up'	Jakub Kicinski
	Laurent Badel says: ==================== net: fec: Fix temporary RMII clock reset on link up v2: fixed a compilation warning The FEC drivers performs a "hardware reset" of the MAC module when the link is reported to be up. This causes a short glitch in the RMII clock due to the hardware reset clearing the receive control register which controls the MII mode. It seems that some link partners do not tolerate this glitch, and invalidate the link, which leads to a never-ending loop of negotiation-link up-link down events. This was observed with the iMX28 Soc and LAN8720/LAN8742 PHYs, with two Intel adapters I218-LM and X722-DA2 as link partners, though a number of other link partners do not seem to mind the clock glitch. Changing the hardware reset to a software reset (clearing bit 1 of the ECR register) cured the issue. Attempts to optimize fec_restart() in order to minimize the duration of the glitch were unsuccessful. Furthermore manually producing the glitch by setting MII mode and then back to RMII in two consecutive instructions, resulting in a clock glitch <10us in duration, was enough to cause the partner to invalidate the link. This strongly suggests that the root cause of the link being dropped is indeed the change in clock frequency. In an effort to minimize changes to driver, the patch proposes to use soft reset only for tested SoCs (iMX28) and only if the link is up. This preserves hardware reset in other situations, which might be required for proper setup of the MAC. ==================== Link: https://lore.kernel.org/r/20210125100745.5090-1-laurentbadel@eaton.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	net: fec: Fix temporary RMII clock reset on link up	Laurent Badel
	fec_restart() does a hard reset of the MAC module when the link status changes to up. This temporarily resets the R_CNTRL register which controls the MII mode of the ENET_OUT clock. In the case of RMII, the clock frequency momentarily drops from 50MHz to 25MHz until the register is reconfigured. Some link partners do not tolerate this glitch and invalidate the link causing failure to establish a stable link when using PHY polling mode. Since as per IEEE802.3 the criteria for link validity are PHY-specific, what the partner should tolerate cannot be assumed, so avoid resetting the MII clock by using software reset instead of hardware reset when the link is up. This is generally relevant only if the SoC provides the clock to an external PHY and the PHY is configured for RMII. Signed-off-by: Laurent Badel <laurentbadel@eaton.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	Merge branch 'net-usbnet-convert-to-new-tasklet-api'	Jakub Kicinski
	Emil Renner Berthing says: ==================== net: usbnet: convert to new tasklet API This converts the usbnet driver to use the new tasklet API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") It is split into two commits for ease of reviewing. ==================== Link: https://lore.kernel.org/r/20210123173221.5855-1-esmil@mailme.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	net: usbnet: use new tasklet API	Emil Renner Berthing
	This converts the driver to use the new tasklet API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	net: usbnet: initialize tasklet using tasklet_init	Emil Renner Berthing
	Initialize tasklet using tasklet_init() rather than open-coding it. Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	Merge branch 'net-dsa-mv88e6xxx-remove-some-6250-specific-methods'	Jakub Kicinski
	Rasmus Villemoes says: ==================== net: dsa: mv88e6xxx: remove some 6250-specific methods v2: - resend now that the bug-fix patch (87fe04367d84, "net: dsa: mv88e6xxx: also read STU state in mv88e6250_g1_vtu_getnext") is in net and also merged to net-next. - include various tags in patch 1. - add second similar patch for loadpurge. ==================== Link: https://lore.kernel.org/r/20210125150449.115032-1-rasmus.villemoes@prevas.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	net: dsa: mv88e6xxx: use mv88e6185_g1_vtu_loadpurge() for the 6250	Rasmus Villemoes
	Apart from the mask used to get the high bits of the fid, mv88e6185_g1_vtu_loadpurge() and mv88e6250_g1_vtu_loadpurge() are identical. Since the entry->fid passed in should never exceed the number of databases, we can simply use the former as-is as replacement for the latter. Suggested-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	net: dsa: mv88e6xxx: use mv88e6185_g1_vtu_getnext() for the 6250	Rasmus Villemoes
	mv88e6250_g1_vtu_getnext is almost identical to mv88e6185_g1_vtu_getnext, except for the 6250 only having 64 databases instead of 256. We can reduce code duplication by simply masking off the extra two garbage bits when assembling the fid from VTU op [3:0] and [11:8]. Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Tested-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	net: lapb: Add locking to the lapb module	Xie He
	In the lapb module, the timers may run concurrently with other code in this module, and there is currently no locking to prevent the code from racing on "struct lapb_cb". This patch adds locking to prevent racing. 1. Add "spinlock_t lock" to "struct lapb_cb"; Add "spin_lock_bh" and "spin_unlock_bh" to APIs, timer functions and notifier functions. 2. Add "bool t1timer_stop, t2timer_stop" to "struct lapb_cb" to make us able to ask running timers to abort; Modify "lapb_stop_t1timer" and "lapb_stop_t2timer" to make them able to abort running timers; Modify "lapb_t2timer_expiry" and "lapb_t1timer_expiry" to make them abort after they are stopped by "lapb_stop_t1timer", "lapb_stop_t2timer", and "lapb_start_t1timer", "lapb_start_t2timer". 3. Let lapb_unregister wait for other API functions and running timers to stop. 4. The lapb_device_event function calls lapb_disconnect_request. In order to avoid trying to hold the lock twice, add a new function named "__lapb_disconnect_request" which assumes the lock is held, and make it called by lapb_disconnect_request and lapb_device_event. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20210126040939.69995-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	selftests: add IPv4 unicast extensions tests	Seth David Schoen
	Add selftests for kernel behavior with regard to various classes of unallocated/reserved IPv4 addresses, checking whether or not these addresses can be assigned as unicast addresses on links and used in routing. Expect the current kernel behavior at the time of this patch. That is: * 0/8 and 240/4 may be used as unicast, with the exceptions of 0.0.0.0 and 255.255.255.255; * the lowest address in a subnet may only be used as a broadcast address; * 127/8 may not be used as unicast (the route_localnet option, which is disabled by default, still leaves it treated slightly specially); * 224/4 may not be used as unicast. Signed-off-by: Seth David Schoen <schoen@loyalty.org> Suggested-by: John Gilmore <gnu@toad.com> Acked-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210126040834.GR24989@frotz.zork.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	bonding: add TLS dependency	Arnd Bergmann
	When TLS is a module, the built-in bonding driver may cause a link error: x86_64-linux-ld: drivers/net/bonding/bond_main.o: in function `bond_start_xmit': bond_main.c:(.text+0xc451): undefined reference to `tls_validate_xmit_skb' Add a dependency to avoid the problem. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210125113209.2248522-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	bnxt_en: Convert to use netif_level() helpers.	Michael Chan
	Use the various netif_level() helpers to simplify the C code. This was suggested by Joe Perches. Cc: Joe Perches <joe@perches.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1611642024-3166-1-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	ARM: zImage: atags_to_fdt: Fix node names on added root nodes	Rob Herring
	Commit 7536c7e03e74 ("of/fdt: Remove redundant kbasename function call") exposed a bug creating DT nodes in the ATAGS to DT fixup code. Non-existent nodes would mistaken get created with a leading '/'. The problem was fdt_path_offset() takes a full path while creating a node with fdt_add_subnode() takes just the basename. Since this we only add root child nodes, we can just skip over the '/'. Fixes: 7536c7e03e74 ("of/fdt: Remove redundant kbasename function call") Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Cc: Qi Zheng <arch0.zheng@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Rob Herring <robh@kernel.org> Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Link: https://lore.kernel.org/r/20210126023905.1631161-1-robh@kernel.org
2021-01-26	team: protect features update by RCU to avoid deadlock	Ivan Vecera
	Function __team_compute_features() is protected by team->lock mutex when it is called from team_compute_features() used when features of an underlying device is changed. This causes a deadlock when NETDEV_FEAT_CHANGE notifier for underlying device is fired due to change propagated from team driver (e.g. MTU change). It's because callbacks like team_change_mtu() or team_vlan_rx_{add,del}_vid() protect their port list traversal by team->lock mutex. Example (r8169 case where this driver disables TSO for certain MTU values): ... [ 6391.348202] __mutex_lock.isra.6+0x2d0/0x4a0 [ 6391.358602] team_device_event+0x9d/0x160 [team] [ 6391.363756] notifier_call_chain+0x47/0x70 [ 6391.368329] netdev_update_features+0x56/0x60 [ 6391.373207] rtl8169_change_mtu+0x14/0x50 [r8169] [ 6391.378457] dev_set_mtu_ext+0xe1/0x1d0 [ 6391.387022] dev_set_mtu+0x52/0x90 [ 6391.390820] team_change_mtu+0x64/0xf0 [team] [ 6391.395683] dev_set_mtu_ext+0xe1/0x1d0 [ 6391.399963] do_setlink+0x231/0xf50 ... In fact team_compute_features() called from team_device_event() does not need to be protected by team->lock mutex and rcu_read_lock() is sufficient there for port list traversal. Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20210125074416.4056484-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	usbnet: fix the indentation of one code snippet	Dongliang Mu
	Every line of code should start with tab (8 characters) Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Link: https://lore.kernel.org/r/20210123051102.1091541-1-mudongliangabcd@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	MAINTAINERS: add David Ahern to IPv4/IPv6 maintainers	Jakub Kicinski
	David has been the de-facto maintainer for much of the IP code for the last couple of years, let's make it official. Link: https://lore.kernel.org/r/20210122173220.3579491-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	net: bridge: multicast: fix br_multicast_eht_set_entry_lookup indentation	Nikolay Aleksandrov
	Fix the messed up indentation in br_multicast_eht_set_entry_lookup(). Fixes: baa74d39ca39 ("net: bridge: multicast: add EHT source set handling functions") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210125082040.13022-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26	net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable	Paul Blakey
	If a non nat tuple entry is inserted just to the regular tuples rhashtable (ct_tuples_ht) and not to natted tuples rhashtable (ct_nat_tuples_ht). Commit bc562be9674b ("net/mlx5e: CT: Save ct entries tuples in hashtables") mixed up the return labels and names sot that on cleanup or failure we still try to remove for the natted tuples rhashtable. Fix that by correctly checking if a natted tuples insertion before removing it. While here make it more readable. Fixes: bc562be9674b ("net/mlx5e: CT: Save ct entries tuples in hashtables") Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-26	net/mlx5e: Revert parameters on errors when changing MTU and LRO state ↵	Maxim Mikityanskiy
	without reset Sometimes, channel params are changed without recreating the channels. It happens in two basic cases: when the channels are closed, and when the parameter being changed doesn't affect how channels are configured. Such changes invoke a hardware command that might fail. The whole operation should be reverted in such cases, but the code that restores the parameters' values in the driver was missing. This commit adds this handling. Fixes: 2e20a151205b ("net/mlx5e: Fail safe mtu and lro setting") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-26	net/mlx5e: Revert parameters on errors when changing trust state without reset	Maxim Mikityanskiy
	Trust state may be changed without recreating the channels. It happens when the channels are closed, and when channel parameters (min inline mode) stay the same after changing the trust state. Changing the trust state is a hardware command that may fail. The current code didn't restore the channel parameters to their old values if an error happened and the channels were closed. This commit adds handling for this case. Fixes: 6e0504c69811 ("net/mlx5e: Change inline mode correctly when changing trust state") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-26	net/mlx5e: Correctly handle changing the number of queues when the interface ↵	Maxim Mikityanskiy
	is down This commit addresses two issues related to changing the number of queues when the channels are closed: 1. Missing call to mlx5e_num_channels_changed to update real_num_tx_queues when the number of TCs is changed. 2. When mlx5e_num_channels_changed returns an error, the channel parameters must be reverted. Two Fixes: tags correspond to the first commits where these two issues were introduced. Fixes: 3909a12e7913 ("net/mlx5e: Fix configuration of XPS cpumasks and netdev queues in corner cases") Fixes: fa3748775b92 ("net/mlx5e: Handle errors from netif_set_real_num_{tx,rx}_queues") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-26	net/mlx5e: Fix CT rule + encap slow path offload and deletion	Paul Blakey
	Currently, if a neighbour isn't valid when offloading tunnel encap rules, we offload the original match and replace the original action with "goto slow path" action. For this we use a temporary flow attribute based on the original flow attribute and then change the action. Flow flags, which among those is the CT flag, are still shared for the slow path rule offload, so we end up parsing this flow as a CT + goto slow path rule. Besides being unnecessary, CT action offload saves extra information in the passed flow attribute, such as created ct_flow and mod_hdr, which is lost onces the temporary flow attribute is freed. When a neigh is updated and is valid, we offload the original CT rule with original CT action, which again creates a ct_flow and mod_hdr and saves it in the flow's original attribute. Then we delete the slow path rule with a temporary flow attribute based on original updated flow attribute, and we free the relevant ct_flow and mod_hdr. Then when tc deletes this flow, we try to free the ct_flow and mod_hdr on the flow's attribute again. To fix the issue, skip all furture proccesing (CT/Sample/Split rules) in offload/unoffload of slow path rules. Call trace: [ 758.850525] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000218 [ 758.952987] Internal error: Oops: 96000005 [#1] PREEMPT SMP [ 758.964170] Modules linked in: act_csum(E) act_pedit(E) act_tunnel_key(E) act_ct(E) nf_flow_table(E) xt_nat(E) ip6table_filter(E) ip6table_nat(E) xt_comment(E) ip6_tables(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) iptable_filter(E) iptable_nat(E) bpfilter(E) br_netfilter(E) bridge(E) stp(E) llc(E) xfrm_user(E) overlay(E) act_mirred(E) act_skbedit(E) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) esp6_offload(E) esp6(E) esp4_offload(E) esp4(E) xfrm_algo(E) mlx5_ib(OE) ib_uverbs(OE) geneve(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink_cttimeout(E) nfnetlink(E) mlx5_core(OE) act_gact(E) cls_flower(E) sch_ingress(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat(E) mlxfw(OE) psample(E) nf_conntrack(E) nf_defrag_ipv4(E) vfio_mdev(E) mdev(E) ib_core(OE) mlx_compat(OE) crct10dif_ce(E) uio_pdrv_genirq(E) uio(E) i2c_mlx(E) mlxbf_pmc(E) sbsa_gwdt(E) mlxbf_gige(E) gpio_mlxbf2(E) mlxbf_pka(E) mlx_trio(E) mlx_bootctl(E) bluefield_edac(E) knem(O) [ 758.964225] ip_tables(E) mlxbf_tmfifo(E) ipv6(E) crc_ccitt(E) nf_defrag_ipv6(E) [ 759.154186] CPU: 5 PID: 122 Comm: kworker/u16:1 Tainted: G OE 5.4.60-mlnx.52.gde81e85 #1 [ 759.172870] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.5.0-2-gc1b5d64 Jan 4 2021 [ 759.195466] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [ 759.207344] pstate: a0000005 (NzCv daif -PAN -UAO) [ 759.217003] pc : mlx5_del_flow_rules+0x5c/0x160 [mlx5_core] [ 759.228229] lr : mlx5_del_flow_rules+0x34/0x160 [mlx5_core] [ 759.405858] Call trace: [ 759.410804] mlx5_del_flow_rules+0x5c/0x160 [mlx5_core] [ 759.421337] __mlx5_eswitch_del_rule.isra.43+0x5c/0x1c8 [mlx5_core] [ 759.433963] mlx5_eswitch_del_offloaded_rule_ct+0x34/0x40 [mlx5_core] [ 759.446942] mlx5_tc_rule_delete_ct+0x68/0x74 [mlx5_core] [ 759.457821] mlx5_tc_ct_delete_flow+0x160/0x21c [mlx5_core] [ 759.469051] mlx5e_tc_unoffload_fdb_rules+0x158/0x168 [mlx5_core] [ 759.481325] mlx5e_tc_encap_flows_del+0x140/0x26c [mlx5_core] [ 759.492901] mlx5e_rep_update_flows+0x11c/0x1ec [mlx5_core] [ 759.504127] mlx5e_rep_neigh_update+0x160/0x200 [mlx5_core] [ 759.515314] process_one_work+0x178/0x400 [ 759.523350] worker_thread+0x58/0x3e8 [ 759.530685] kthread+0x100/0x12c [ 759.537152] ret_from_fork+0x10/0x18 [ 759.544320] Code: 97ffef55 51000673 3100067f 54ffff41 (b9421ab3) [ 759.556548] ---[ end trace fab818bb1085832d ]--- Fixes: 4c3844d9e97e ("net/mlx5e: CT: Introduce connection tracking") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>