summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-02-07ipv4: Reject routes specifying ECN bits in rtm_tosGuillaume Nault
Use the new dscp_t type to replace the fc_tos field of fib_config, to ensure IPv4 routes aren't influenced by ECN bits when configured with non-zero rtm_tos. Before this patch, IPv4 routes specifying an rtm_tos with some of the ECN bits set were accepted. However they wouldn't work (never match) as IPv4 normally clears the ECN bits with IPTOS_RT_MASK before doing a FIB lookup (although a few buggy code paths don't). After this patch, IPv4 routes specifying an rtm_tos with any ECN bit set is rejected. Note: IPv6 routes ignore rtm_tos altogether, any rtm_tos is accepted, but treated as if it were 0. Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-07ipv4: Stop taking ECN bits into account in fib4-rulesGuillaume Nault
Use the new dscp_t type to replace the tos field of struct fib4_rule, so that fib4-rules consistently ignore ECN bits. Before this patch, fib4-rules did accept rules with the high order ECN bit set (but not the low order one). Also, it relied on its callers masking the ECN bits of ->flowi4_tos to prevent those from influencing the result. This was brittle and a few call paths still do the lookup without masking the ECN bits first. After this patch fib4-rules only compare the DSCP bits. ECN can't influence the result anymore, even if the caller didn't mask these bits. Also, fib4-rules now must have both ECN bits cleared or they will be rejected. Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-07ipv6: Define dscp_t and stop taking ECN bits into account in fib6-rulesGuillaume Nault
Define a dscp_t type and its appropriate helpers that ensure ECN bits are not taken into account when handling DSCP. Use this new type to replace the tclass field of struct fib6_rule, so that fib6-rules don't get influenced by ECN bits anymore. Before this patch, fib6-rules didn't make any distinction between the DSCP and ECN bits. Therefore, rules specifying a DSCP (tos or dsfield options in iproute2) stopped working as soon a packets had at least one of its ECN bits set (as a work around one could create four rules for each DSCP value to match, one for each possible ECN value). After this patch fib6-rules only compare the DSCP bits. ECN doesn't influence the result anymore. Also, fib6-rules now must have the ECN bits cleared or they will be rejected. Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-07net: stmmac: optimize locking around PTP clock readsYannick Vignon
Reading the PTP clock is a simple operation requiring only 3 register reads. Under a PREEMPT_RT kernel, protecting those reads by a spin_lock is counter-productive: if the 2nd task preempting the 1st has a higher prio but needs to read time as well, it will require 2 context switches, which will pretty much always be more costly than just disabling preemption for the duration of the reads. Moreover, with the code logic recently added to get_systime(), disabling preemption is not even required anymore: reads and writes just need to be protected from each other, to prevent a clock read while the clock is being updated. Improve the above situation by replacing the PTP spinlock by a rwlock, and using read_lock for PTP clock reads so simultaneous reads do not block each other. Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com> Link: https://lore.kernel.org/r/20220204135545.2770625-1-yannick.vignon@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-07net: phy: marvell: Fix RGMII Tx/Rx delays setting in 88e1121-compatible PHYsPavel Parkhomenko
It is mandatory for a software to issue a reset upon modifying RGMII Receive Timing Control and RGMII Transmit Timing Control bit fields of MAC Specific Control register 2 (page 2, register 21) otherwise the changes won't be perceived by the PHY (the same is applicable for a lot of other registers). Not setting the RGMII delays on the platforms that imply it' being done on the PHY side will consequently cause the traffic loss. We discovered that the denoted soft-reset is missing in the m88e1121_config_aneg() method for the case if the RGMII delays are modified but the MDIx polarity isn't changed or the auto-negotiation is left enabled, thus causing the traffic loss on our platform with Marvell Alaska 88E1510 installed. Let's fix that by issuing the soft-reset if the delays have been actually set in the m88e1121_config_aneg_rgmii_delays() method. Cc: stable@vger.kernel.org Fixes: d6ab93364734 ("net: phy: marvell: Avoid unnecessary soft reset") Signed-off-by: Pavel Parkhomenko <Pavel.Parkhomenko@baikalelectronics.ru> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/20220205203932.26899-1-Pavel.Parkhomenko@baikalelectronics.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-07net: typhoon: include <net/vxlan.h>Eric Dumazet
We need this to get vxlan_features_check() definition. Fixes: d2692eee05b8 ("net: typhoon: implement ndo_features_check method") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220208003502.1799728-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-07bpf: test_run: Fix overflow in bpf_test_finish frags parsingStanislav Fomichev
This place also uses signed min_t and passes this singed int to copy_to_user (which accepts unsigned argument). I don't think there is an issue, but let's be consistent. Fixes: 7855e0db150ad ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220204235849.14658-2-sdf@google.com
2022-02-07bpf: test_run: Fix overflow in xdp frags parsingStanislav Fomichev
When kattr->test.data_size_in > INT_MAX, signed min_t will assign negative value to data_len. This negative value then gets passed over to copy_from_user where it is converted to (big) unsigned. Use unsigned min_t to avoid this overflow. usercopy: Kernel memory overwrite attempt detected to wrapped address (offset 0, size 18446612140539162846)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102! invalid opcode: 0000 [#1] SMP KASAN Modules linked in: CPU: 0 PID: 3781 Comm: syz-executor226 Not tainted 4.15.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:usercopy_abort+0xbd/0xbf mm/usercopy.c:102 RSP: 0018:ffff8801e9703a38 EFLAGS: 00010286 RAX: 000000000000006c RBX: ffffffff84fc7040 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff816560a2 RDI: ffffed003d2e0739 RBP: ffff8801e9703a90 R08: 000000000000006c R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff84fc73a0 R13: ffffffff84fc7180 R14: ffffffff84fc7040 R15: ffffffff84fc7040 FS: 00007f54e0bec300(0000) GS:ffff8801f6600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000280 CR3: 00000001e90ea000 CR4: 00000000003426f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: check_bogus_address mm/usercopy.c:155 [inline] __check_object_size mm/usercopy.c:263 [inline] __check_object_size.cold+0x8c/0xad mm/usercopy.c:253 check_object_size include/linux/thread_info.h:112 [inline] check_copy_size include/linux/thread_info.h:143 [inline] copy_from_user include/linux/uaccess.h:142 [inline] bpf_prog_test_run_xdp+0xe57/0x1240 net/bpf/test_run.c:989 bpf_prog_test_run kernel/bpf/syscall.c:3377 [inline] __sys_bpf+0xdf2/0x4a50 kernel/bpf/syscall.c:4679 SYSC_bpf kernel/bpf/syscall.c:4765 [inline] SyS_bpf+0x26/0x50 kernel/bpf/syscall.c:4763 do_syscall_64+0x21a/0x3e0 arch/x86/entry/common.c:305 entry_SYSCALL_64_after_hwframe+0x46/0xbb Fixes: 1c1949982524 ("bpf: introduce frags support to bpf_prog_test_run_xdp()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220204235849.14658-1-sdf@google.com
2022-02-07Merge branch 'bpf_prog_pack allocator'Alexei Starovoitov
Song Liu says: ==================== Changes v8 => v9: 1. Fix an error with multi function program, in 4/9. Changes v7 => v8: 1. Rebase and fix conflicts. 2. Lock text_mutex for text_poke_copy. (Daniel) Changes v6 => v7: 1. Redesign the interface between generic and arch logic, based on feedback from Alexei and Ilya. 2. Split 6/7 of v6 to 7/9 and 8/9 in v7, for cleaner logic. 3. Add bpf_arch_text_copy in 6/9. Changes v5 => v6: 1. Make jit_hole_buffer 128 byte long. Only fill the first and last 128 bytes of header with INT3. (Alexei) 2. Use kvmalloc for temporary buffer. (Alexei) 3. Rename tmp_header/tmp_image => rw_header/rw_image. Remove tmp_image from x64_jit_data. (Alexei) 4. Change fall back round_up_to in bpf_jit_binary_alloc_pack() from BPF_PROG_MAX_PACK_PROG_SIZE to PAGE_SIZE. Changes v4 => v5: 1. Do not use atomic64 for bpf_jit_current. (Alexei) Changes v3 => v4: 1. Rename text_poke_jit() => text_poke_copy(). (Peter) 2. Change comment style. (Peter) Changes v2 => v3: 1. Fix tailcall. Changes v1 => v2: 1. Use text_poke instead of writing through linear mapping. (Peter) 2. Avoid making changes to non-x86_64 code. Most BPF programs are small, but they consume a page each. For systems with busy traffic and many BPF programs, this could also add significant pressure to instruction TLB. High iTLB pressure usually causes slow down for the whole system, which includes visible performance degradation for production workloads. This set tries to solve this problem with customized allocator that pack multiple programs into a huge page. Patches 1-6 prepare the work. Patch 7 contains key logic of bpf_prog_pack allocator. Patch 8 contains bpf_jit_binary_pack_alloc logic on top of bpf_prog_pack allocator. Patch 9 uses this allocator in x86_64 jit. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-02-07bpf, x86_64: Use bpf_jit_binary_pack_allocSong Liu
Use bpf_jit_binary_pack_alloc in x86_64 jit. The jit engine first writes the program to the rw buffer. When the jit is done, the program is copied to the final location with bpf_jit_binary_pack_finalize. Note that we need to do bpf_tail_call_direct_fixup after finalize. Therefore, the text_live = false logic in __bpf_arch_text_poke is no longer needed. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-10-song@kernel.org
2022-02-07bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]Song Liu
This is the jit binary allocator built on top of bpf_prog_pack. bpf_prog_pack allocates RO memory, which cannot be used directly by the JIT engine. Therefore, a temporary rw buffer is allocated for the JIT engine. Once JIT is done, bpf_jit_binary_pack_finalize is used to copy the program to the RO memory. bpf_jit_binary_pack_alloc reserves 16 bytes of extra space for illegal instructions, which is small than the 128 bytes space reserved by bpf_jit_binary_alloc. This change is necessary for bpf_jit_binary_hdr to find the correct header. Also, flag use_bpf_prog_pack is added to differentiate a program allocated by bpf_jit_binary_pack_alloc. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-9-song@kernel.org
2022-02-07bpf: Introduce bpf_prog_pack allocatorSong Liu
Most BPF programs are small, but they consume a page each. For systems with busy traffic and many BPF programs, this could add significant pressure to instruction TLB. High iTLB pressure usually causes slow down for the whole system, which includes visible performance degradation for production workloads. Introduce bpf_prog_pack allocator to pack multiple BPF programs in a huge page. The memory is then allocated in 64 byte chunks. Memory allocated by bpf_prog_pack allocator is RO protected after initial allocation. To write to it, the user (jit engine) need to use text poke API. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-8-song@kernel.org
2022-02-07bpf: Introduce bpf_arch_text_copySong Liu
This will be used to copy JITed text to RO protected module memory. On x86, bpf_arch_text_copy is implemented with text_poke_copy. bpf_arch_text_copy returns pointer to dst on success, and ERR_PTR(errno) on errors. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-7-song@kernel.org
2022-02-07x86/alternative: Introduce text_poke_copySong Liu
This will be used by BPF jit compiler to dump JITed binary to a RX huge page, and thus allow multiple BPF programs sharing the a huge (2MB) page. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-6-song@kernel.org
2022-02-07bpf: Use prog->jited_len in bpf_prog_ksym_set_addr()Song Liu
Using prog->jited_len is simpler and more accurate than current estimation (header + header->size). Also, fix missing prog->jited_len with multi function program. This hasn't been a real issue before this. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-5-song@kernel.org
2022-02-07bpf: Use size instead of pages in bpf_binary_headerSong Liu
This is necessary to charge sub page memory for the BPF program. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-4-song@kernel.org
2022-02-07bpf: Use bytes instead of pages for bpf_jit_[charge|uncharge]_modmemSong Liu
This enables sub-page memory charge and allocation. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-3-song@kernel.org
2022-02-07x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAPSong Liu
This enables module_alloc() to allocate huge page for 2MB+ requests. To check the difference of this change, we need enable config CONFIG_PTDUMP_DEBUGFS, and call module_alloc(2MB). Before the change, /sys/kernel/debug/page_tables/kernel shows pte for this map. With the change, /sys/kernel/debug/page_tables/ show pmd for thie map. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-2-song@kernel.org
2022-02-07Merge tag '5.17-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull ksmbd server fixes from Steve French: - NTLMSSP authentication improvement - RDMA (smbdirect) fix allowing broader set of NICs to be supported - improved buffer validation - additional small fixes, including a posix extensions fix for stable * tag '5.17-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: add support for key exchange ksmbd: reduce smb direct max read/write size ksmbd: don't align last entry offset in smb2 query directory ksmbd: fix same UniqueId for dot and dotdot entries ksmbd: smbd: validate buffer descriptor structures ksmbd: fix SMB 3.11 posix extension mount failure
2022-02-07igb: refactor XDP registrationCorinna Vinschen
On changing the RX ring parameters igb uses a hack to avoid a warning when calling xdp_rxq_info_reg via igb_setup_rx_resources. It just clears the struct xdp_rxq_info content. Instead, change this to unregister if we're already registered. Align code to the igc code. Fixes: 9cbc948b5a20c ("igb: add XDP support") Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-07igc: avoid kernel warning when changing RX ring parametersCorinna Vinschen
Calling ethtool changing the RX ring parameters like this: $ ethtool -G eth0 rx 1024 on igc triggers kernel warnings like this: [ 225.198467] ------------[ cut here ]------------ [ 225.198473] Missing unregister, handled but fix driver [ 225.198485] WARNING: CPU: 7 PID: 959 at net/core/xdp.c:168 xdp_rxq_info_reg+0x79/0xd0 [...] [ 225.198601] Call Trace: [ 225.198604] <TASK> [ 225.198609] igc_setup_rx_resources+0x3f/0xe0 [igc] [ 225.198617] igc_ethtool_set_ringparam+0x30e/0x450 [igc] [ 225.198626] ethnl_set_rings+0x18a/0x250 [ 225.198631] genl_family_rcv_msg_doit+0xca/0x110 [ 225.198637] genl_rcv_msg+0xce/0x1c0 [ 225.198640] ? rings_prepare_data+0x60/0x60 [ 225.198644] ? genl_get_cmd+0xd0/0xd0 [ 225.198647] netlink_rcv_skb+0x4e/0xf0 [ 225.198652] genl_rcv+0x24/0x40 [ 225.198655] netlink_unicast+0x20e/0x330 [ 225.198659] netlink_sendmsg+0x23f/0x480 [ 225.198663] sock_sendmsg+0x5b/0x60 [ 225.198667] __sys_sendto+0xf0/0x160 [ 225.198671] ? handle_mm_fault+0xb2/0x280 [ 225.198676] ? do_user_addr_fault+0x1eb/0x690 [ 225.198680] __x64_sys_sendto+0x20/0x30 [ 225.198683] do_syscall_64+0x38/0x90 [ 225.198687] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 225.198693] RIP: 0033:0x7f7ae38ac3aa igc_ethtool_set_ringparam() copies the igc_ring structure but neglects to reset the xdp_rxq_info member before calling igc_setup_rx_resources(). This in turn calls xdp_rxq_info_reg() with an already registered xdp_rxq_info. Make sure to unregister the xdp_rxq_info structure first in igc_setup_rx_resources. Fixes: 73f1071c1d29 ("igc: Add support for XDP_TX action") Reported-by: Lennert Buytenhek <buytenh@arista.com> Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-07Merge branch 'bpf: Fix strict mode calculation'Andrii Nakryiko
Mauricio Vásquez <mauricio@kinvolk.io> says: ==================== This series fixes a bad calculation of strict mode in two places. It also updates libbpf to make it easier for the users to disable a specific LIBBPF_STRICT_* flag. v1 -> v2: - remove check in libbpf_set_strict_mode() - split in different commits v1: https://lore.kernel.org/bpf/20220204220435.301896-1-mauricio@kinvolk.io/ ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-02-07selftests/bpf: Fix strict mode calculationMauricio Vásquez
"(__LIBBPF_STRICT_LAST - 1) & ~LIBBPF_STRICT_MAP_DEFINITIONS" is wrong as it is equal to 0 (LIBBPF_STRICT_NONE). Let's use "LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS" now that the previous commit makes it possible in libbpf. Fixes: 93b8952d223a ("libbpf: deprecate legacy BPF map definitions") Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220207145052.124421-4-mauricio@kinvolk.io
2022-02-07bpftool: Fix strict mode calculationMauricio Vásquez
"(__LIBBPF_STRICT_LAST - 1) & ~LIBBPF_STRICT_MAP_DEFINITIONS" is wrong as it is equal to 0 (LIBBPF_STRICT_NONE). Let's use "LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS" now that the previous commit makes it possible in libbpf. Fixes: 93b8952d223a ("libbpf: deprecate legacy BPF map definitions") Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220207145052.124421-3-mauricio@kinvolk.io
2022-02-07libbpf: Remove mode check in libbpf_set_strict_mode()Mauricio Vásquez
libbpf_set_strict_mode() checks that the passed mode doesn't contain extra bits for LIBBPF_STRICT_* flags that don't exist yet. It makes it difficult for applications to disable some strict flags as something like "LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS" is rejected by this check and they have to use a rather complicated formula to calculate it.[0] One possibility is to change LIBBPF_STRICT_ALL to only contain the bits of all existing LIBBPF_STRICT_* flags instead of 0xffffffff. However it's not possible because the idea is that applications compiled against older libbpf_legacy.h would still be opting into latest LIBBPF_STRICT_ALL features.[1] The other possibility is to remove that check so something like "LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS" is allowed. It's what this commit does. [0]: https://lore.kernel.org/bpf/20220204220435.301896-1-mauricio@kinvolk.io/ [1]: https://lore.kernel.org/bpf/CAEf4BzaTWa9fELJLh+bxnOb0P1EMQmaRbJVG0L+nXZdy0b8G3Q@mail.gmail.com/ Fixes: 93b8952d223a ("libbpf: deprecate legacy BPF map definitions") Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220207145052.124421-2-mauricio@kinvolk.io
2022-02-07Merge tag 'ata-5.17-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ata fix from Damien Le Moal: "A single patch from me, to fix a bug that is causing boot issues in the field (reports of problems with Fedora 35). The bug affects mostly old-ish drives that have issues with read log page command handling" * tag 'ata-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: libata-core: Fix ata_dev_config_cpr()
2022-02-07Merge tag 'mmc-v5.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Fix support for SD Power off notification MMC host: - moxart: Fix potential use-after-free on remove path - sdhci-of-esdhc: Fix error path when setting dma mask - sh_mmcif: Fix potential NULL pointer dereference" * tag 'mmc-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: moxart: fix potential use-after-free on remove path mmc: core: Wait for command setting 'Power Off Notification' bit to complete mmc: sh_mmcif: Check for null res pointer mmc: sdhci-of-esdhc: Check for error num after setting mask
2022-02-07Merge tag 'integrity-v5.17-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity fixes from Mimi Zohar: "Fixes for recently found bugs. One was found/noticed while reviewing IMA support for fsverity digests and signatures. Two of them were found/noticed while working on IMA namespacing. Plus two other bugs. All of them are for previous kernel releases" * tag 'integrity-v5.17-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: Do not print policy rule with inactive LSM labels ima: Allow template selection with ima_template[_fmt]= after ima_hash= ima: Remove ima_policy file before directory integrity: check the return value of audit_log_start() ima: fix reference leak in asymmetric_verify()
2022-02-07selftests/bpf: Fix tests to use arch-dependent syscall entry pointsNaveen N. Rao
Some of the tests are using x86_64 ABI-specific syscall entry points (such as __x64_sys_nanosleep and __x64_sys_getpgid). Update them to use architecture-dependent syscall entry names. Also update fexit_sleep test to not use BPF_PROG() so that it is clear that the syscall parameters aren't being accessed in the bpf prog. Note that none of the bpf progs in these tests are actually accessing any of the syscall parameters. The only exception is perfbuf_bench, which passes on the bpf prog context into bpf_perf_event_output() as a pointer to pt_regs, but that looks to be mostly ignored. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/e35f7051f03e269b623a68b139d8ed131325f7b7.1643973917.git.naveen.n.rao@linux.vnet.ibm.com
2022-02-07selftests/bpf: Use "__se_" prefix on architectures without syscall wrapperNaveen N. Rao
On architectures that don't use a syscall wrapper, sys_* function names are set as an alias of __se_sys_* functions. Due to this, there is no BTF associated with sys_* function names. This results in some of the test progs failing to load. Set the SYS_PREFIX to "__se_" to fix this issue. Fixes: 38261f369fb905 ("selftests/bpf: Fix probe_user test failure with clang build kernel") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/013d632aacd3e41290445c0025db6a7055ec6e18.1643973917.git.naveen.n.rao@linux.vnet.ibm.com
2022-02-07ata: libata-core: Fix ata_dev_config_cpr()Damien Le Moal
The concurrent positioning ranges log page 47h is a general purpose log page and not a subpage of the indentify device log. Using ata_identify_page_supported() to test for concurrent positioning ranges support is thus wrong. ata_log_supported() must be used. Furthermore, unlike other advanced ATA features (e.g. NCQ priority), accesses to the concurrent positioning ranges log page are not gated by a feature bit from the device IDENTIFY data. Since many older drives react badly to the READ LOG EXT and/or READ LOG DMA EXT commands isued to read device log pages, avoid problems with older drives by limiting the concurrent positioning ranges support detection to drives implementing at least the ACS-4 ATA standard (major version 11). This additional condition effectively turns ata_dev_config_cpr() into a nop for older drives, avoiding problems in the field. Fixes: fe22e1c2f705 ("libata: support concurrent positioning ranges log") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215519 Cc: stable@vger.kernel.org Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Abderraouf Adjal <adjal.arf@gmail.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-02-07net: dsa: mv88e6xxx: Unlock on error in mv88e6xxx_port_bridge_join()Dan Carpenter
Call mv88e6xxx_reg_unlock(chip) before returning on this error path. Fixes: 7af4a361a62f ("net: dsa: mv88e6xxx: Improve isolation of standalone ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: dsa: mv88e6xxx: Fix off by in one in mv88e6185_phylink_get_caps()Dan Carpenter
The <= ARRAY_SIZE() needs to be < ARRAY_SIZE() to prevent an out of bounds error. Fixes: d4ebf12bcec4 ("net: dsa: mv88e6xxx: populate supported_interfaces and mac_capabilities") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: hns3: add support for TX push modeYufeng Mo
For the device that supports the TX push capability, the BD can be directly copied to the device memory. However, due to hardware restrictions, the push mode can be used only when there are no more than two BDs, otherwise, the doorbell mode based on device memory is used. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: asix: add proper error handling of usb read errorsPavel Skripkin
Syzbot once again hit uninit value in asix driver. The problem still the same -- asix_read_cmd() reads less bytes, than was requested by caller. Since all read requests are performed via asix_read_cmd() let's catch usb related error there and add __must_check notation to be sure all callers actually check return value. So, this patch adds sanity check inside asix_read_cmd(), that simply checks if bytes read are not less, than was requested and adds missing error handling of asix_read_cmd() all across the driver code. Fixes: d9fe64e51114 ("net: asix: Add in_pm parameter") Reported-and-tested-by: syzbot+6ca9f7867b77c2d316ac@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07r8169: factor out redundant RTL8168d PHY config functionality to ↵Heiner Kallweit
rtl8168d_1_common() rtl8168d_2_hw_phy_config() shares quite some functionality with rtl8168d_1_hw_phy_config(), so let's factor out the common part to a new function rtl8168d_1_common(). In addition improve the code a little. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07ip6mr: fix use-after-free in ip6mr_sk_done()Eric Dumazet
Apparently addrconf_exit_net() is called before igmp6_net_exit() and ndisc_net_exit() at netns dismantle time: net_namespace: call ip6table_mangle_net_exit() net_namespace: call ip6_tables_net_exit() net_namespace: call ipv6_sysctl_net_exit() net_namespace: call ioam6_net_exit() net_namespace: call seg6_net_exit() net_namespace: call ping_v6_proc_exit_net() net_namespace: call tcpv6_net_exit() ip6mr_sk_done sk=ffffa354c78a74c0 net_namespace: call ipv6_frags_exit_net() net_namespace: call addrconf_exit_net() net_namespace: call ip6addrlbl_net_exit() net_namespace: call ip6_flowlabel_net_exit() net_namespace: call ip6_route_net_exit_late() net_namespace: call fib6_rules_net_exit() net_namespace: call xfrm6_net_exit() net_namespace: call fib6_net_exit() net_namespace: call ip6_route_net_exit() net_namespace: call ipv6_inetpeer_exit() net_namespace: call if6_proc_net_exit() net_namespace: call ipv6_proc_exit_net() net_namespace: call udplite6_proc_exit_net() net_namespace: call raw6_exit_net() net_namespace: call igmp6_net_exit() ip6mr_sk_done sk=ffffa35472b2a180 ip6mr_sk_done sk=ffffa354c78a7980 net_namespace: call ndisc_net_exit() ip6mr_sk_done sk=ffffa35472b2ab00 net_namespace: call ip6mr_net_exit() net_namespace: call inet6_net_exit() This was fine because ip6mr_sk_done() would not reach the point decreasing net->ipv6.devconf_all->mc_forwarding until my patch in ip6mr_sk_done(). To fix this without changing struct pernet_operations ordering, we can clear net->ipv6.devconf_dflt and net->ipv6.devconf_all when they are freed from addrconf_exit_net() BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: use-after-free in atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] BUG: KASAN: use-after-free in ip6mr_sk_done+0x11b/0x410 net/ipv6/ip6mr.c:1578 Read of size 4 at addr ffff88801ff08688 by task kworker/u4:4/963 CPU: 0 PID: 963 Comm: kworker/u4:4 Not tainted 5.17.0-rc2-syzkaller-00650-g5a8fb33e5305 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] ip6mr_sk_done+0x11b/0x410 net/ipv6/ip6mr.c:1578 rawv6_close+0x58/0x80 net/ipv6/raw.c:1201 inet_release+0x12e/0x280 net/ipv4/af_inet.c:428 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478 __sock_release net/socket.c:650 [inline] sock_release+0x87/0x1b0 net/socket.c:678 inet_ctl_sock_destroy include/net/inet_common.h:65 [inline] igmp6_net_exit+0x6b/0x170 net/ipv6/mcast.c:3173 ops_exit_list+0xb0/0x170 net/core/net_namespace.c:168 cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:600 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307 worker_thread+0x657/0x1110 kernel/workqueue.c:2454 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> Fixes: f2f2325ec799 ("ip6mr: ip6mr_sk_done() can exit early in common cases") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07caif: cleanup double word in commentTom Rix
Replace the second 'so' with 'free'. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net/smc: use GFP_ATOMIC allocation in smc_pnet_add_eth()Eric Dumazet
My last patch moved the netdev_tracker_alloc() call to a section protected by a write_lock(). I should have replaced GFP_KERNEL with GFP_ATOMIC to avoid the infamous: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:256 Fixes: 28f922213886 ("net/smc: fix ref_tracker issue in smc_pnet_add()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07Merge branch 'mlxsw-dip-sip-mangling'David S. Miller
Ido Schimmel says: ==================== mlxsw: Add SIP and DIP mangling support Danielle says: On Spectrum-2 onwards, it is possible to overwrite SIP and DIP address of an IPv4 or IPv6 packet in the ACL engine. That corresponds to pedit munges of, respectively, ip src and ip dst fields, and likewise for ip6. Offload these munges on the systems where they are supported. Patchset overview: Patch #1: introduces SIP_DIP_ACTION and its fields. Patch #2-#3: adds the new pedit fields, and dispatches on them on Spectrum-2 and above. Patch #4 adds a selftest. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07selftests: forwarding: Add a test for pedit munge SIP and DIPDanielle Ratson
Add a test that checks that pedit adjusts source and destination addresses of IPv4 and IPv6 packets. Output example: $ ./pedit_ip.sh TEST: ping [ OK ] TEST: ping6 [ OK ] TEST: dev swp2 ingress pedit ip src set 198.51.100.1 [ OK ] TEST: dev swp3 egress pedit ip src set 198.51.100.1 [ OK ] TEST: dev swp2 ingress pedit ip dst set 198.51.100.1 [ OK ] TEST: dev swp3 egress pedit ip dst set 198.51.100.1 [ OK ] TEST: dev swp2 ingress pedit ip6 src set 2001:db8:2::1 [ OK ] TEST: dev swp3 egress pedit ip6 src set 2001:db8:2::1 [ OK ] TEST: dev swp2 ingress pedit ip6 dst set 2001:db8:2::1 [ OK ] TEST: dev swp3 egress pedit ip6 dst set 2001:db8:2::1 [ OK ] Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07mlxsw: Support FLOW_ACTION_MANGLE for SIP and DIP IPv6 addressesDanielle Ratson
Spectrum-2 supports an ACL action SIP_DIP, which allows IPv4 and IPv6 source and destination addresses change. Offload suitable mangles to the IPv6 address change action. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07mlxsw: Support FLOW_ACTION_MANGLE for SIP and DIP IPv4 addressesDanielle Ratson
Spectrum-2 supports an ACL action SIP_DIP, which allows IPv4 and IPv6 source and destination addresses change. Offload suitable mangles to the IPv4 address change action. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07mlxsw: core_acl_flex_actions: Add SIP_DIP_ACTIONDanielle Ratson
Add fields related to SIP_DIP_ACTION, which is used for changing of SIP and DIP addresses. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07Merge branch 'ipv6-kfree_skb_reason'David S. Miller
Menglong Dong says: ==================== net: use kfree_skb_reason() for ip/udp packet receive In this series patches, kfree_skb() is replaced with kfree_skb_reason() during ipv4 and udp4 packet receiving path, and following drop reasons are introduced: SKB_DROP_REASON_SOCKET_FILTER SKB_DROP_REASON_NETFILTER_DROP SKB_DROP_REASON_OTHERHOST SKB_DROP_REASON_IP_CSUM SKB_DROP_REASON_IP_INHDR SKB_DROP_REASON_IP_RPFILTER SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST SKB_DROP_REASON_XFRM_POLICY SKB_DROP_REASON_IP_NOPROTO SKB_DROP_REASON_SOCKET_RCVBUFF SKB_DROP_REASON_PROTO_MEM TCP is more complex, so I left it in the next series. I just figure out how __print_symbolic() works. It doesn't base on the array index, but searching for symbols by loop. So I'm a little afraid it's performance. Changes since v3: - fix some small problems in the third patch (net: ipv4: use kfree_skb_reason() in ip_rcv_core()), as David Ahern said Changes since v2: - use SKB_DROP_REASON_PKT_TOO_SMALL for a path in ip_rcv_core() Changes since v1: - add document for all drop reasons, as David advised - remove unreleated cleanup - remove EARLY_DEMUX and IP_ROUTE_INPUT drop reason - replace {UDP, TCP}_FILTER with SOCKET_FILTER ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: udp: use kfree_skb_reason() in __udp_queue_rcv_skb()Menglong Dong
Replace kfree_skb() with kfree_skb_reason() in __udp_queue_rcv_skb(). Following new drop reasons are introduced: SKB_DROP_REASON_SOCKET_RCVBUFF SKB_DROP_REASON_PROTO_MEM Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: udp: use kfree_skb_reason() in udp_queue_rcv_one_skb()Menglong Dong
Replace kfree_skb() with kfree_skb_reason() in udp_queue_rcv_one_skb(). Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: ipv4: use kfree_skb_reason() in ip_protocol_deliver_rcu()Menglong Dong
Replace kfree_skb() with kfree_skb_reason() in ip_protocol_deliver_rcu(). Following new drop reasons are introduced: SKB_DROP_REASON_XFRM_POLICY SKB_DROP_REASON_IP_NOPROTO Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: ipv4: use kfree_skb_reason() in ip_rcv_finish_core()Menglong Dong
Replace kfree_skb() with kfree_skb_reason() in ip_rcv_finish_core(), following drop reasons are introduced: SKB_DROP_REASON_IP_RPFILTER SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: ipv4: use kfree_skb_reason() in ip_rcv_core()Menglong Dong
Replace kfree_skb() with kfree_skb_reason() in ip_rcv_core(). Three new drop reasons are introduced: SKB_DROP_REASON_OTHERHOST SKB_DROP_REASON_IP_CSUM SKB_DROP_REASON_IP_INHDR Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>