summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-09-30can: mcp251xfd: rename all user facing strings to mcp251xfdMarc Kleine-Budde
In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver, which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming, but incompatible chips. In the previous patch the autodetect DT compatbile has been renamed to "microchip,mcp251xfd", this patch changes all user facing strings from "mcp25xxfd" to "mcp251xfd" and "MCP25XXFD" to "MCP251XFD", including: - kconfig symbols - name of kernel module - DT and SPI compatible [1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com Link: https://lore.kernel.org/r/20200930091424.792165-9-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30can: mcp251xfd: rename driver files and subdir to mcp251xfdMarc Kleine-Budde
In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver, which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming, but incompatible chips. In the previous patch the autodetect DT compatbile has been renamed to "microchip,mcp251xfd", this patch changes the name of the driver subdir and the individual files accordinly. [1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com Link: https://lore.kernel.org/r/20200930091424.792165-8-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30selftests/bpf: Test "incremental" btf_dump in C formatAndrii Nakryiko
Add test validating that btf_dump works fine with BTFs that are modified and incrementally generated. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200929232843.1249318-5-andriin@fb.com
2020-09-30libbpf: Make btf_dump work with modifiable BTFAndrii Nakryiko
Ensure that btf_dump can accommodate new BTF types being appended to BTF instance after struct btf_dump was created. This came up during attemp to use btf_dump for raw type dumping in selftests, but given changes are not excessive, it's good to not have any gotchas in API usage, so I decided to support such use case in general. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200929232843.1249318-2-andriin@fb.com
2020-09-30Merge branch 'Various BPF helper improvements'Alexei Starovoitov
Daniel Borkmann says: ==================== This series adds two BPF helpers, that is, one for retrieving the classid of an skb and another one to redirect via the neigh subsystem, and improves also the cookie helpers by removing the atomic counter. I've also added the bpf_tail_call_static() helper to the libbpf API that we've been using in Cilium for a while now, and last but not least the series adds a few selftests. For details, please check individual patches, thanks! v3 -> v4: - Removed out_rec error path (Martin) - Integrate BPF_F_NEIGH flag into rejecting invalid flags (Martin) - I think this way it's better to avoid bit overlaps given it's right in the place that would need to be extended on new flags v2 -> v3: - Removed double skb->dev = dev assignment (David) - Added headroom check for v6 path (David) - Set set flowi4_proto for ip_route_output_flow (David) - Rebased onto latest bpf-next v1 -> v2: - Rework cookie generator to support nested contexts (Eric) - Use ip_neigh_gw6() and container_of() (David) - Rename __throw_build_bug() and improve comments (Andrii) - Use bpf_tail_call_static() also in BPF samples (Maciej) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-30bpf, selftests: Add redirect_neigh selftestDaniel Borkmann
Add a small test that exercises the new redirect_neigh() helper for the IPv4 and IPv6 case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/0fc7d9c5f9a6cc1c65b0d3be83b44b1ec9889f43.1601477936.git.daniel@iogearbox.net
2020-09-30bpf, selftests: Use bpf_tail_call_static where appropriateDaniel Borkmann
For those locations where we use an immediate tail call map index use the newly added bpf_tail_call_static() helper. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/3cfb2b799a62d22c6e7ae5897c23940bdcc24cbc.1601477936.git.daniel@iogearbox.net
2020-09-30bpf, libbpf: Add bpf_tail_call_static helper for bpf programsDaniel Borkmann
Port of tail_call_static() helper function from Cilium's BPF code base [0] to libbpf, so others can easily consume it as well. We've been using this in production code for some time now. The main idea is that we guarantee that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the JITed BPF insns with direct jumps instead of having to fall back to using expensive retpolines. By using inline asm, we guarantee that the compiler won't merge the call from different paths with potentially different content of r2/r3. We're also using Cilium's __throw_build_bug() macro (here as: __bpf_unreachable()) in different places as a neat trick to trigger compilation errors when compiler does not remove code at compilation time. This works for the BPF back end as it does not implement the __builtin_trap(). [0] https://github.com/cilium/cilium/commit/f5537c26020d5297b70936c6b7d03a1e412a1035 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/1656a082e077552eb46642d513b4a6bde9a7dd01.1601477936.git.daniel@iogearbox.net
2020-09-30bpf: Add redirect_neigh helper as redirect drop-inDaniel Borkmann
Add a redirect_neigh() helper as redirect() drop-in replacement for the xmit side. Main idea for the helper is to be very similar in semantics to the latter just that the skb gets injected into the neighboring subsystem in order to let the stack do the work it knows best anyway to populate the L2 addresses of the packet and then hand over to dev_queue_xmit() as redirect() does. This solves two bigger items: i) skbs don't need to go up to the stack on the host facing veth ingress side for traffic egressing the container to achieve the same for populating L2 which also has the huge advantage that ii) the skb->sk won't get orphaned in ip_rcv_core() when entering the IP routing layer on the host stack. Given that skb->sk neither gets orphaned when crossing the netns as per 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.") the helper can then push the skbs directly to the phys device where FQ scheduler can do its work and TCP stack gets proper backpressure given we hold on to skb->sk as long as skb is still residing in queues. With the helper used in BPF data path to then push the skb to the phys device, I observed a stable/consistent TCP_STREAM improvement on veth devices for traffic going container -> host -> host -> container from ~10Gbps to ~15Gbps for a single stream in my test environment. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
2020-09-30bpf, net: Rework cookie generator as per-cpu oneDaniel Borkmann
With its use in BPF, the cookie generator can be called very frequently in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg) and attached to the root cgroup, for example, when used in v1/v2 mixed environments. In particular, when there's a high churn on sockets in the system there can be many parallel requests to the bpf_get_socket_cookie() and bpf_get_netns_cookie() helpers which then cause contention on the atomic counter. As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino allocator"), add a small helper library that both can use for the 64 bit counters. Given this can be called from different contexts, we also need to deal with potential nested calls even though in practice they are considered extremely rare. One idea as suggested by Eric Dumazet was to use a reverse counter for this situation since we don't expect 64 bit overflows anyways; that way, we can avoid bigger gaps in the 64 bit counter space compared to just batch-wise increase. Even on machines with small number of cores (e.g. 4) the cookie generation shrinks from min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run in parallel from multiple CPUs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
2020-09-30bpf: Add classid helper only based on skb->skDaniel Borkmann
Similarly to 5a52ae4e32a6 ("bpf: Allow to retrieve cgroup v1 classid from v2 hooks"), add a helper to retrieve cgroup v1 classid solely based on the skb->sk, so it can be used as key as part of BPF map lookups out of tc from host ns, in particular given the skb->sk is retained these days when crossing net ns thanks to 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb."). This is similar to bpf_skb_cgroup_id() which implements the same for v2. Kubernetes ecosystem is still operating on v1 however, hence net_cls needs to be used there until this can be dropped in with the v2 helper of bpf_skb_cgroup_id(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net
2020-09-30RISC-V: Check clint_time_val before useAnup Patel
The NoMMU kernel is broken for QEMU virt machine from Linux-5.9-rc6 because clint_time_val is used even before CLINT driver is probed at following places: 1. rand_initialize() calls get_cycles() which in-turn uses clint_time_val 2. boot_init_stack_canary() calls get_cycles() which in-turn uses clint_time_val The issue#1 (above) is fixed by providing custom random_get_entropy() for RISC-V NoMMU kernel. For issue#2 (above), we remove dependency of boot_init_stack_canary() on get_cycles() and this is aligned with the boot_init_stack_canary() implementations of ARM, ARM64 and MIPS kernel. Fixes: d5be89a8d118 ("RISC-V: Resurrect the MMIO timer implementation for M-mode systems") Signed-off-by: Anup Patel <anup.patel@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-09-30btrfs: fix filesystem corruption after a device replaceFilipe Manana
We use a device's allocation state tree to track ranges in a device used for allocated chunks, and we set ranges in this tree when allocating a new chunk. However after a device replace operation, we were not setting the allocated ranges in the new device's allocation state tree, so that tree is empty after a device replace. This means that a fitrim operation after a device replace will trim the device ranges that have allocated chunks and extents, as we trim every range for which there is not a range marked in the device's allocation state tree. It is also important during chunk allocation, since the device's allocation state is used to determine if a range is already allocated when allocating a new chunk. This is trivial to reproduce and the following script triggers the bug: $ cat reproducer.sh #!/bin/bash DEV1="/dev/sdg" DEV2="/dev/sdh" DEV3="/dev/sdi" wipefs -a $DEV1 $DEV2 $DEV3 &> /dev/null # Create a raid1 test fs on 2 devices. mkfs.btrfs -f -m raid1 -d raid1 $DEV1 $DEV2 > /dev/null mount $DEV1 /mnt/btrfs xfs_io -f -c "pwrite -S 0xab 0 10M" /mnt/btrfs/foo echo "Starting to replace $DEV1 with $DEV3" btrfs replace start -B $DEV1 $DEV3 /mnt/btrfs echo echo "Running fstrim" fstrim /mnt/btrfs echo echo "Unmounting filesystem" umount /mnt/btrfs echo "Mounting filesystem in degraded mode using $DEV3 only" wipefs -a $DEV1 $DEV2 &> /dev/null mount -o degraded $DEV3 /mnt/btrfs if [ $? -ne 0 ]; then dmesg | tail echo echo "Failed to mount in degraded mode" exit 1 fi echo echo "File foo data (expected all bytes = 0xab):" od -A d -t x1 /mnt/btrfs/foo umount /mnt/btrfs When running the reproducer: $ ./replace-test.sh wrote 10485760/10485760 bytes at offset 0 10 MiB, 2560 ops; 0.0901 sec (110.877 MiB/sec and 28384.5216 ops/sec) Starting to replace /dev/sdg with /dev/sdi Running fstrim Unmounting filesystem Mounting filesystem in degraded mode using /dev/sdi only mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/sdi, missing codepage or helper program, or other error. [19581.748641] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi started [19581.803842] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi finished [19582.208293] BTRFS info (device sdi): allowing degraded mounts [19582.208298] BTRFS info (device sdi): disk space caching is enabled [19582.208301] BTRFS info (device sdi): has skinny extents [19582.212853] BTRFS warning (device sdi): devid 2 uuid 1f731f47-e1bb-4f00-bfbb-9e5a0cb4ba9f is missing [19582.213904] btree_readpage_end_io_hook: 25839 callbacks suppressed [19582.213907] BTRFS error (device sdi): bad tree block start, want 30490624 have 0 [19582.214780] BTRFS warning (device sdi): failed to read root (objectid=7): -5 [19582.231576] BTRFS error (device sdi): open_ctree failed Failed to mount in degraded mode So fix by setting all allocated ranges in the replace target device when the replace operation is finishing, when we are holding the chunk mutex and we can not race with new chunk allocations. A test case for fstests follows soon. Fixes: 1c11b63eff2a67 ("btrfs: replace pending/pinned chunks lists with io tree") CC: stable@vger.kernel.org # 5.2+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-30btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locksJosef Bacik
When closing and freeing the source device we could end up doing our final blkdev_put() on the bdev, which will grab the bd_mutex. As such we want to be holding as few locks as possible, so move this call outside of the dev_replace->lock_finishing_cancel_unmount lock. Since we're modifying the fs_devices we need to make sure we're holding the uuid_mutex here, so take that as well. There's a report from syzbot probably hitting one of the cases where the bd_mutex and device_list_mutex are taken in the wrong order, however it's not with device replace, like this patch fixes. As there's no reproducer available so far, we can't verify the fix. https://lore.kernel.org/lkml/000000000000fc04d105afcf86d7@google.com/ dashboard link: https://syzkaller.appspot.com/bug?extid=84a0634dc5d21d488419 WARNING: possible circular locking dependency detected 5.9.0-rc5-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.0/6878 is trying to acquire lock: ffff88804c17d780 (&bdev->bd_mutex){+.+.}-{3:3}, at: blkdev_put+0x30/0x520 fs/block_dev.c:1804 but task is already holding lock: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 btrfs_finish_chunk_alloc+0x281/0xf90 fs/btrfs/volumes.c:5255 btrfs_create_pending_block_groups+0x2f3/0x700 fs/btrfs/block-group.c:2109 __btrfs_end_transaction+0xf5/0x690 fs/btrfs/transaction.c:916 find_free_extent_update_loop fs/btrfs/extent-tree.c:3807 [inline] find_free_extent+0x23b7/0x2e60 fs/btrfs/extent-tree.c:4127 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370 do_writepages+0xec/0x290 mm/page-writeback.c:2352 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894 wb_do_writeback fs/fs-writeback.c:2039 [inline] wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 -> #3 (sb_internal#2){.+.+}-{0:0}: percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x234/0x470 fs/super.c:1672 sb_start_intwrite include/linux/fs.h:1690 [inline] start_transaction+0xbe7/0x1170 fs/btrfs/transaction.c:624 find_free_extent_update_loop fs/btrfs/extent-tree.c:3789 [inline] find_free_extent+0x25e1/0x2e60 fs/btrfs/extent-tree.c:4127 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370 do_writepages+0xec/0x290 mm/page-writeback.c:2352 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894 wb_do_writeback fs/fs-writeback.c:2039 [inline] wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 -> #2 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}: __flush_work+0x60e/0xac0 kernel/workqueue.c:3041 wb_shutdown+0x180/0x220 mm/backing-dev.c:355 bdi_unregister+0x174/0x590 mm/backing-dev.c:872 del_gendisk+0x820/0xa10 block/genhd.c:933 loop_remove drivers/block/loop.c:2192 [inline] loop_control_ioctl drivers/block/loop.c:2291 [inline] loop_control_ioctl+0x3b1/0x480 drivers/block/loop.c:2257 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (loop_ctl_mutex){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 lo_open+0x19/0xd0 drivers/block/loop.c:1893 __blkdev_get+0x759/0x1aa0 fs/block_dev.c:1507 blkdev_get fs/block_dev.c:1639 [inline] blkdev_open+0x227/0x300 fs/block_dev.c:1753 do_dentry_open+0x4b9/0x11b0 fs/open.c:817 do_open fs/namei.c:3251 [inline] path_openat+0x1b9a/0x2730 fs/namei.c:3368 do_filp_open+0x17e/0x3c0 fs/namei.c:3395 do_sys_openat2+0x16d/0x420 fs/open.c:1168 do_sys_open fs/open.c:1184 [inline] __do_sys_open fs/open.c:1192 [inline] __se_sys_open fs/open.c:1188 [inline] __x64_sys_open+0x119/0x1c0 fs/open.c:1188 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&bdev->bd_mutex){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 blkdev_put+0x30/0x520 fs/block_dev.c:1804 btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline] btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline] btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline] close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161 close_fs_devices fs/btrfs/volumes.c:1193 [inline] btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179 close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149 generic_shutdown_super+0x144/0x370 fs/super.c:464 kill_anon_super+0x36/0x60 fs/super.c:1108 btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265 deactivate_locked_super+0x94/0x160 fs/super.c:335 deactivate_super+0xad/0xd0 fs/super.c:366 cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &bdev->bd_mutex --> sb_internal#2 --> &fs_devs->device_list_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_devs->device_list_mutex); lock(sb_internal#2); lock(&fs_devs->device_list_mutex); lock(&bdev->bd_mutex); *** DEADLOCK *** 3 locks held by syz-executor.0/6878: #0: ffff88809070c0e0 (&type->s_umount_key#70){++++}-{3:3}, at: deactivate_super+0xa5/0xd0 fs/super.c:365 #1: ffffffff8a5b37a8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_close_devices+0x23/0x1f0 fs/btrfs/volumes.c:1178 #2: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159 stack backtrace: CPU: 0 PID: 6878 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827 check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 blkdev_put+0x30/0x520 fs/block_dev.c:1804 btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline] btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline] btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline] close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161 close_fs_devices fs/btrfs/volumes.c:1193 [inline] btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179 close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149 generic_shutdown_super+0x144/0x370 fs/super.c:464 kill_anon_super+0x36/0x60 fs/super.c:1108 btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265 deactivate_locked_super+0x94/0x160 fs/super.c:335 deactivate_super+0xad/0xd0 fs/super.c:366 cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x460027 RSP: 002b:00007fff59216328 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000000000076035 RCX: 0000000000460027 RDX: 0000000000403188 RSI: 0000000000000002 RDI: 00007fff592163d0 RBP: 0000000000000333 R08: 0000000000000000 R09: 000000000000000b R10: 0000000000000005 R11: 0000000000000246 R12: 00007fff59217460 R13: 0000000002df2a60 R14: 0000000000000000 R15: 00007fff59217460 Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ add syzbot reference ] Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-30ARM: imx6q: Fixup RCU usage for cpuidleUlf Hansson
The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to expose us for tracepoints. For imx6q cpuidle, this leads to an RCU splat according to below. [6.870684] [<c0db7690>] (_raw_spin_lock) from [<c011f6a4>] (imx6q_enter_wait+0x18/0x9c) [6.878846] [<c011f6a4>] (imx6q_enter_wait) from [<c09abfb0>] (cpuidle_enter_state+0x168/0x5e4) To fix the problem, let's assign the corresponding idlestate->flags the CPUIDLE_FLAG_RCU_IDLE bit, which enables us to call rcu_idle_enter|exit() at the proper point. Reported-by: Dong Aisheng <aisheng.dong@nxp.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-30Documentation: PM: Fix a reStructuredText syntax errorYoann Congal
Fix a reStructuredText syntax error in the cpuidle PM admin-guide documentation: the ``...'' quotation marks are parsed as partial ''...'' reStructuredText markup and break the output formatting. This change them to "...". Signed-off-by: Yoann Congal <yoann.congal@smile.fr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-30cpufreq: intel_pstate: Fix missing return statementZhang Rui
Fix missing return statement when writing "off" to intel_pstate status sysfs I/F. Fixes: 55671ea3257a ("cpufreq: intel_pstate: Free memory only when turning off") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-30bpf: fix raw_tp test run in preempt kernelSong Liu
In preempt kernel, BPF_PROG_TEST_RUN on raw_tp triggers: [ 35.874974] BUG: using smp_processor_id() in preemptible [00000000] code: new_name/87 [ 35.893983] caller is bpf_prog_test_run_raw_tp+0xd4/0x1b0 [ 35.900124] CPU: 1 PID: 87 Comm: new_name Not tainted 5.9.0-rc6-g615bd02bf #1 [ 35.907358] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 35.916941] Call Trace: [ 35.919660] dump_stack+0x77/0x9b [ 35.923273] check_preemption_disabled+0xb4/0xc0 [ 35.928376] bpf_prog_test_run_raw_tp+0xd4/0x1b0 [ 35.933872] ? selinux_bpf+0xd/0x70 [ 35.937532] __do_sys_bpf+0x6bb/0x21e0 [ 35.941570] ? find_held_lock+0x2d/0x90 [ 35.945687] ? vfs_write+0x150/0x220 [ 35.949586] do_syscall_64+0x2d/0x40 [ 35.953443] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by calling migrate_disable() before smp_processor_id(). Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-30ice: preserve NVM capabilities in safe modeJacob Keller
If the driver initializes in safe mode, it will call ice_set_safe_mode_caps. This results in clearing the capabilities structures, in order to set them up for operating in safe mode, ensuring many features are disabled. This has a side effect of also clearing the capability bits that relate to NVM update. The result is that the device driver will not indicate support for unified update, even if the firmware is capable. Fix this by adding the relevant capability fields to the list of values we preserve. To simplify the code, use a common_cap structure instead of a handful of local variables. To reduce some duplication of the capability name, introduce a couple of macros used to restore the capabilities values from the cached copy. Fixes: de9b277ee032 ("ice: Add support for unified NVM update flow capability") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Brijesh Behera <brijeshx.behera@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-09-30ice: increase maximum wait time for flash write commandsJacob Keller
The ice driver needs to wait for a firmware response to each command to write a block of data to the scratch area used to update the device firmware. The driver currently waits for up to 1 second for this to be returned. It turns out that firmware might take longer than 1 second to return a completion in some cases. If this happens, the flash update will fail to complete. Fix this by increasing the maximum time that the driver will wait for both writing a block of data, and for activating the new NVM bank. The timeout for an erase command is already several minutes, as the firmware had to erase the entire bank which was already expected to take a minute or more in the worst case. In the case where firmware really won't respond, we will now take longer to fail. However, this ensures that if the firmware is simply slow to respond, the flash update can still complete. This new maximum timeout should not adversely increase the update time, as the implementation for wait_event_interruptible_timeout, and should wake very soon after we get a completion event. It is better for a flash update be slow but still succeed than to fail because we gave up too quickly. Fixes: d69ea414c9b4 ("ice: implement device flash update via devlink") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Brijesh Behera <brijeshx.behera@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-09-30vhost vdpa: fix vhost_vdpa_open error handlingMike Christie
We must free the vqs array in the open failure path, because vhost_vdpa_release will not be called. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/1600712588-9514-2-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-09-30drm: drm_dsc.h: fix a kernel-doc markupMauro Carvalho Chehab
As warned by Sphinx: ./Documentation/gpu/drm-kms-helpers:305: ./include/drm/drm_dsc.h:587: WARNING: Unparseable C cross-reference: 'struct' Invalid C declaration: Expected identifier in nested name, got keyword: struct [error at 6] struct ------^ The markup for one struct is wrong, as struct is used twice. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/3d467022325e15bba8dcb13da8fb730099303266.1601467849.git.mchehab+huawei@kernel.org
2020-09-30Partially revert "video: fbdev: amba-clcd: Retire elder CLCD driver"Peter Collingbourne
Also partially revert the follow-up change "drm: pl111: Absorb the external register header". This reverts the parts of commits 7e4e589db76a3cf4c1f534eb5a09cc6422766b93 and 0fb8125635e8eb5483fb095f98dcf0651206a7b8 that touch paths outside of drivers/gpu/drm/pl111. The fbdev driver is used by Android's FVP configuration. Using the DRM driver together with DRM's fbdev emulation results in a failure to boot Android. The root cause is that Android's generic fbdev userspace driver relies on the ability to set the pixel format via FBIOPUT_VSCREENINFO, which is not supported by fbdev emulation. There have been other less critical behavioral differences identified between the fbdev driver and the DRM driver with fbdev emulation. The DRM driver exposes different values for the panel's width, height and refresh rate, and the DRM driver fails a FBIOPUT_VSCREENINFO syscall with yres_virtual greater than the maximum supported value instead of letting the syscall succeed and setting yres_virtual based on yres. Signed-off-by: Peter Collingbourne <pcc@google.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200929195344.2219796-1-pcc@google.com
2020-09-30drm/amdgpu: disable gfxoff temporarily for navy_flounderJiansong Chen
gfxoff is temporarily disabled for navy_flounder, since at present the feature caused some tdr when performing display operations. Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-30drm/amd/pm: setup APU dpm clock table in SMU HW initializationEvan Quan
As the dpm clock table is needed during DC HW initialization. And that (DC HW initialization) comes before smu_late_init() where current APU dpm clock table setup is performed. So, NULL pointer dereference will be triggered. By moving APU dpm clock table setup to smu_hw_init(), this can be avoided. Fixes: 02cf91c113ea ("drm/amd/powerplay: postpone operations not required for hw setup to late_init") Acked-by: Nirmoy Das <nirmoy.das@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Evan Quan <evan.quan@amd.com> Reported-by: Dirk Gouders <dirk@gouders.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-30netfilter: nf_tables: add userdata attributes to nft_chainJose M. Guisado Gomez
Enables storing userdata for nft_chain. Field udata points to user data and udlen stores its length. Adds new attribute flag NFTA_CHAIN_USERDATA. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-30netfilter: nf_tables: use nla_memdup to copy udataJose M. Guisado Gomez
When userdata support was added to tables and objects, user data coming from user space was allocated and copied using kzalloc + nla_memcpy. Use nla_memdup to copy userdata of tables and objects. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-30netfilter: nf_tables: fix userdata memleakJose M. Guisado Gomez
When userdata was introduced for tables and objects its allocation was only freed inside the error path of the new{table, object} functions. Free user data inside corresponding destroy functions for tables and objects. Fixes: b131c96496b3 ("netfilter: nf_tables: add userdata support for nft_object") Fixes: 7a81575b806e ("netfilter: nf_tables: add userdata attributes to nft_table") Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-30Merge tag 'gpio-fixes-for-v5.9' of ↵Linus Walleij
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into fixes gpio fixes for v5.9 - correct logic of GPIO_LINE_DIRECTION in gpio-amd-fch
2020-09-30can: mcp25xxfd: narrow down wildcards in device tree bindings to ↵Thomas Kopp
"microchip,mcp251xfd" The wildcard should be narrowed down to prevent existing and future devices that are not compatible from matching. It is very unlikely that incompatible devices will be released that do not match the wildcard. Discussion Reference: https://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com> Link: https://lore.kernel.org/r/20200930091423.755-1-thomas.kopp@microchip.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30dt-binding: can: mcp251xfd: narrow down wildcards in device tree bindings to ↵Thomas Kopp
"microchip,mcp251xfd" The wildcard should be narrowed down to prevent existing and future devices that are not compatible from matching. It is very unlikely that incompatible devices will be released that do not match the wildcard. This is the documentation part of the commit. Discussion Reference: https://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com> Link: https://lore.kernel.org/r/20200930091423.755-2-thomas.kopp@microchip.com [mkl: rename file, too] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2020-09-29 The following pull-request contains BPF updates for your *net* tree. We've added 7 non-merge commits during the last 14 day(s) which contain a total of 7 files changed, 28 insertions(+), 8 deletions(-). The main changes are: 1) fix xdp loading regression in libbpf for old kernels, from Andrii. 2) Do not discard packet when NETDEV_TX_BUSY, from Magnus. 3) Fix corner cases in libbpf related to endianness and kconfig, from Tony. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30dt-binding: can: mcp25xxfd: documentation fixesOleksij Rempel
Apply following fixes: - Use 'interrupts'. (interrupts-extended will automagically be supported by the tools) - *-supply is always a single item. So, drop maxItems=1 - add "additionalProperties: false" flag to detect unneeded properties. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20200923125301.27200-1-o.rempel@pengutronix.de Reported-by: Rob Herring <robh@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Fixes: 1b5a78e69c1f ("dt-binding: can: mcp25xxfd: document device tree bindings") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30can: mcp25xxfd: mcp25xxfd_irq(): add missing initialization of variable ↵Marc Kleine-Budde
set_normal mode This patch fixes the following warning: drivers/net/can/spi/mcp25xxfd/mcp25xxfd-core.c:2155 mcp25xxfd_irq() error: uninitialized symbol 'set_normal_mode'. by adding the missing initialization. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN") Link: https://lore.kernel.org/r/20200923114726.2704426-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30can: mcp25xxfd: mcp25xxfd_ring_free(): fix memory leak during cleanupDan Carpenter
This loop doesn't free the first element of the array. The "i > 0" has to be changed to "i >= 0". Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20200923112752.GA1473821@mwanda Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30can: mcp25xxfd: mcp25xxfd_probe(): add SPI clk limit related errata informationThomas Kopp
This patch adds a reference to the recent released MCP2517FD and MCP2518FD errata sheets and paste the explanation. The driver already implements the proposed fix. Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com> Link: https://lore.kernel.org/r/20200925065606.358-1-thomas.kopp@microchip.com [mkl: split into two patches, adjust subject and commit message] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30can: mcp25xxfd: mcp25xxfd_handle_eccif(): add ECC related errata and update ↵Thomas Kopp
log messages This patch adds a reference to the recent released MCP2517FD and MCP2518FD errata sheets and paste the explanation. The single error correction does not always work, so always indicate that a single error occurred. If the location of the ECC error is outside of the TX-RAM always use netdev_notice() to log the problem. For ECC errors in the TX-RAM, there is a recovery procedure. Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com> Link: https://lore.kernel.org/r/20200925065606.358-1-thomas.kopp@microchip.com [mkl: split into two patches, adjust subject and commit message] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-29clocksource: clint: Export clint_time_val for modulesPalmer Dabbelt
clint_time_val will soon be used by the RISC-V implementation of random_get_entropy(), which is a static inline function that may be used by modules (at least CRYPTO_JITTERENTROPY=m). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-09-30Merge branch 'vmwgfx-fixes-5.9' of ↵Dave Airlie
git://people.freedesktop.org/~sroland/linux into drm-fixes One vmwgfx regression fix. Signed-off-by: Dave Airlie <airlied@redhat.com> From: "Roland Scheidegger (VMware)" <rscheidegger.oss@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200930041000.2423-1-rscheidegger.oss@gmail.com
2020-09-30drm/vmwgfx: Fix error handling in get_nodeZack Rusin
ttm_mem_type_manager_func.get_node was changed to return -ENOSPC instead of setting the node pointer to NULL. Unfortunately vmwgfx still had two places where it was explicitly converting -ENOSPC to 0 causing regressions. This fixes those spots by allowing -ENOSPC to be returned. That seems to fix recent regressions with vmwgfx. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Martin Krastev <krastevm@vmware.com> Sigend-off-by: Roland Scheidegger <sroland@vmware.com>
2020-09-29scsi: iscsi: iscsi_tcp: Avoid holding spinlock while calling getpeername()Mark Mielke
The kernel may fail to boot or devices may fail to come up when initializing iscsi_tcp devices starting with Linux 5.8. Commit a79af8a64d39 ("[SCSI] iscsi_tcp: use iscsi_conn_get_addr_param libiscsi function") introduced getpeername() within the session spinlock. Commit 1b66d253610c ("bpf: Add get{peer, sock}name attach types for sock_addr") introduced BPF_CGROUP_RUN_SA_PROG_LOCK() within getpeername(), which acquires a mutex and when used from iscsi_tcp devices can now lead to "BUG: scheduling while atomic:" and subsequent damage. Ensure that the spinlock is released before calling getpeername() or getsockname(). sock_hold() and sock_put() are used to ensure that the socket reference is preserved until after the getpeername() or getsockname() complete. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1877345 Link: https://lkml.org/lkml/2020/7/28/1085 Link: https://lkml.org/lkml/2020/8/31/459 Link: https://lore.kernel.org/r/20200928043329.606781-1-mark.mielke@gmail.com Fixes: a79af8a64d39 ("[SCSI] iscsi_tcp: use iscsi_conn_get_addr_param libiscsi function") Fixes: 1b66d253610c ("bpf: Add get{peer, sock}name attach types for sock_addr") Cc: stable@vger.kernel.org Reported-by: Marc Dionne <marc.c.dionne@gmail.com> Tested-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Mark Mielke <mark.mielke@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-09-29Merge branch 'HW-support-for-VCAP-IS1-and-ES0-in-mscc_ocelot'David S. Miller
Vladimir Oltean says: ==================== HW support for VCAP IS1 and ES0 in mscc_ocelot The patches from RFC series "Offload tc-flower to mscc_ocelot switch using VCAP chains" have been split into 2: https://patchwork.ozlabs.org/project/netdev/list/?series=204810&state=* This is the boring part, that deals with the prerequisites, and not with tc-flower integration. Apart from the initialization of some hardware blocks, which at this point still don't do anything, no new functionality is introduced. - Key and action field offsets are defined for the supported switches. - VCAP properties are added to the driver for the new TCAM blocks. But instead of adding them manually as was done for IS2, which is error prone, the driver is refactored to read these parameters from hardware, which is possible. - Some improvements regarding the processing of struct ocelot_vcap_filter. - Extending the code to be compatible with full and quarter keys. This series was tested, along with other patches not yet submitted, on the Felix and Seville switches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29net: mscc: ocelot: look up the filters in flower_stats() and flower_destroy()Vladimir Oltean
Currently a new filter is created, containing just enough correct information to be able to call ocelot_vcap_block_find_filter_by_index() on it. This will be limiting us in the future, when we'll have more metadata associated with a filter, which will matter in the stats() and destroy() callbacks, and which we can't make up on the spot. For example, we'll start "offloading" some dummy tc filter entries for the TCAM skeleton, but we won't actually be adding them to the hardware, or to block->rules. So, it makes sense to avoid deleting those rules too. That's the kind of thing which is difficult to determine unless we look up the real filter. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29net: mscc: ocelot: add a new ocelot_vcap_block_find_filter_by_id functionVladimir Oltean
And rename the existing find to ocelot_vcap_block_find_filter_by_index. The index is the position in the TCAM, and the id is the flow cookie given by tc. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29net: mscc: ocelot: rename variable 'cnt' in vcap_data_offset_get()Vladimir Oltean
The 'cnt' variable is actually used for 2 purposes, to hold the number of sub-words per VCAP entry, and the number of sub-words per VCAP action. In fact, I'm pretty sure these 2 numbers can never be different from one another. By hardware definition, the entry (key) TCAM rows are divided into the same number of sub-words as its associated action RAM rows. But nonetheless, let's at least rename the variables such that observations like this one are easier to make in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29net: mscc: ocelot: rename variable 'count' in vcap_data_offset_get()Vladimir Oltean
This gets rid of one of the 2 variables named, very generically, "count". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29net: mscc: ocelot: calculate vcap offsets correctly for full and quarter entriesXiaoliang Yang
When calculating the offsets for the current entry within the row and placing them inside struct vcap_data, the function assumes half key entry (2 keys per row). This patch modifies the vcap_data_offset_get() function to calculate a correct data offset when the setting VCAP Type-Group of a key to VCAP_TG_FULL or VCAP_TG_QUARTER. This is needed because, for example, VCAP ES0 only supports full keys. Also rename the 'count' variable to 'num_entries_per_row' to make the function just one tiny bit easier to follow. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29net: mscc: ocelot: parse flower action before keyVladimir Oltean
When we'll make the switch to multiple chain offloading, we'll want to know first what VCAP block the rule is offloaded to. This impacts what keys are available. Since the VCAP block is determined by what actions are used, parse the action first. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29net: mscc: ocelot: remove unneeded VCAP parameters for IS2Vladimir Oltean
Now that we are deriving these from the constants exposed by the hardware, we can delete the static info we're keeping in the driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29net: mscc: ocelot: automatically detect VCAP constantsVladimir Oltean
The numbers in struct vcap_props are not intuitive to derive, because they are not a straightforward copy-and-paste from the reference manual but instead rely on a fairly detailed level of understanding of the layout of an entry in the TCAM and in the action RAM. For this reason, bugs are very easy to introduce here. Ease the work of hardware porters and read from hardware the constants that were exported for this particular purpose. Note that this implies that struct vcap_props can no longer be const. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>