summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-01-05tracing: Fix check for trace_percpu_buffer validity in get_trace_buf()Naveen N. Rao
With the new osnoise tracer, we are seeing the below splat: Kernel attempted to read user page (c7d880000) - exploit attempt? (uid: 0) BUG: Unable to handle kernel data access on read at 0xc7d880000 Faulting instruction address: 0xc0000000002ffa10 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries ... NIP [c0000000002ffa10] __trace_array_vprintk.part.0+0x70/0x2f0 LR [c0000000002ff9fc] __trace_array_vprintk.part.0+0x5c/0x2f0 Call Trace: [c0000008bdd73b80] [c0000000001c49cc] put_prev_task_fair+0x3c/0x60 (unreliable) [c0000008bdd73be0] [c000000000301430] trace_array_printk_buf+0x70/0x90 [c0000008bdd73c00] [c0000000003178b0] trace_sched_switch_callback+0x250/0x290 [c0000008bdd73c90] [c000000000e70d60] __schedule+0x410/0x710 [c0000008bdd73d40] [c000000000e710c0] schedule+0x60/0x130 [c0000008bdd73d70] [c000000000030614] interrupt_exit_user_prepare_main+0x264/0x270 [c0000008bdd73de0] [c000000000030a70] syscall_exit_prepare+0x150/0x180 [c0000008bdd73e10] [c00000000000c174] system_call_vectored_common+0xf4/0x278 osnoise tracer on ppc64le is triggering osnoise_taint() for negative duration in get_int_safe_duration() called from trace_sched_switch_callback()->thread_exit(). The problem though is that the check for a valid trace_percpu_buffer is incorrect in get_trace_buf(). The check is being done after calculating the pointer for the current cpu, rather than on the main percpu pointer. Fix the check to be against trace_percpu_buffer. Link: https://lkml.kernel.org/r/a920e4272e0b0635cf20c444707cbce1b2c8973d.1640255304.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: e2ace001176dc9 ("tracing: Choose static tp_printk buffer by explicit nesting count") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-05libbpf: Support repeated legacy kprobes on same functionQiang Wang
If repeated legacy kprobes on same function in one process, libbpf will register using the same probe name and got -EBUSY error. So append index to the probe name format to fix this problem. Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Qiang Wang <wangqiang.wq.frank@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211227130713.66933-2-wangqiang.wq.frank@bytedance.com
2022-01-05libbpf: Use probe_name for legacy kprobeQiang Wang
Fix a bug in commit 46ed5fc33db9, which wrongly used the func_name instead of probe_name to register legacy kprobe. Fixes: 46ed5fc33db9 ("libbpf: Refactor and simplify legacy kprobe code") Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Qiang Wang <wangqiang.wq.frank@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Hengqi Chen <hengqi.chen@gmail.com> Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com> Link: https://lore.kernel.org/bpf/20211227130713.66933-1-wangqiang.wq.frank@bytedance.com
2022-01-05ftrace/samples: Add missing prototypes direct functionsJiri Olsa
There's another compilation fail (first here [1]) reported by kernel test robot for W=1 clang build: >> samples/ftrace/ftrace-direct-multi-modify.c:7:6: warning: no previous prototype for function 'my_direct_func1' [-Wmissing-prototypes] void my_direct_func1(unsigned long ip) Direct functions in ftrace direct sample modules need to have prototypes defined. They are already global in order to be visible for the inline assembly, so there's no problem. The kernel test robot reported just error for ftrace-direct-multi-modify, but I got same errors also for the rest of the modules touched by this patch. [1] 67d4f6e3bf5d ftrace/samples: Add missing prototype for my_direct_func Link: https://lkml.kernel.org/r/20211219135317.212430-1-jolsa@kernel.org Reported-by: kernel test robot <lkp@intel.com> Fixes: e1067a07cfbc ("ftrace/samples: Add module to test multi direct modify interface") Fixes: ae0cc3b7e7f5 ("ftrace/samples: Add a sample module that implements modify_ftrace_direct()") Fixes: 156473a0ff4f ("ftrace: Add another example of register_ftrace_direct() use case") Fixes: b06457c83af6 ("ftrace: Add sample module that uses register_ftrace_direct()") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-05libbpf: Deprecate bpf_perf_event_read_simple() APIChristy Lee
With perf_buffer__poll() and perf_buffer__consume() APIs available, there is no reason to expose bpf_perf_event_read_simple() API to users. If users need custom perf buffer, they could re-implement the function. Mark bpf_perf_event_read_simple() and move the logic to a new static function so it can still be called by other functions in the same file. [0] Closes: https://github.com/libbpf/libbpf/issues/310 Signed-off-by: Christy Lee <christylee@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211229204156.13569-1-christylee@fb.com
2022-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05bpf: Add SO_RCVBUF/SO_SNDBUF in _bpf_getsockopt().Kuniyuki Iwashima
This patch exposes SO_RCVBUF/SO_SNDBUF through bpf_getsockopt(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220104013153.97906-3-kuniyu@amazon.co.jp
2022-01-05bpf: Fix SO_RCVBUF/SO_SNDBUF handling in _bpf_setsockopt().Kuniyuki Iwashima
The commit 4057765f2dee ("sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values") added a change to prevent underflow in setsockopt() around SO_SNDBUF/SO_RCVBUF. This patch adds the same change to _bpf_setsockopt(). Fixes: 4057765f2dee ("sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220104013153.97906-2-kuniyu@amazon.co.jp
2022-01-05Merge tag 'net-5.16-final' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski" "Networking fixes, including fixes from bpf, and WiFi. One last pull request, turns out some of the recent fixes did more harm than good. Current release - regressions: - Revert "xsk: Do not sleep in poll() when need_wakeup set", made the problem worse - Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register", broke EPROBE_DEFER handling - Revert "net: usb: r8152: Add MAC pass-through support for more Lenovo Docks", broke setups without a Lenovo dock Current release - new code bugs: - selftests: set amt.sh executable Previous releases - regressions: - batman-adv: mcast: don't send link-local multicast to mcast routers Previous releases - always broken: - ipv4/ipv6: check attribute length for RTA_FLOW / RTA_GATEWAY - sctp: hold endpoint before calling cb in sctp_transport_lookup_process - mac80211: mesh: embed mesh_paths and mpp_paths into ieee80211_if_mesh to avoid complicated handling of sub-object allocation failures - seg6: fix traceroute in the presence of SRv6 - tipc: fix a kernel-infoleak in __tipc_sendmsg()" * tag 'net-5.16-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits) selftests: set amt.sh executable Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks" sfc: The RX page_ring is optional iavf: Fix limit of total number of queues to active queues of VF i40e: Fix incorrect netdev's real number of RX/TX queues i40e: Fix for displaying message regarding NVM version i40e: fix use-after-free in i40e_sync_filters_subtask() i40e: Fix to not show opcode msg on unsuccessful VF MAC change ieee802154: atusb: fix uninit value in atusb_set_extended_addr mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh mac80211: initialize variable have_higher_than_11mbit sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc netrom: fix copying in user data in nr_setsockopt udp6: Use Segment Routing Header for dest address if present icmp: ICMPV6: Examine invoking packet for Segment Route Headers. seg6: export get_srh() for ICMP handling Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register" ipv6: Do cleanup if attribute validation fails in multipath route ipv6: Continue processing multipath route even if gateway attribute is invalid net/fsl: Remove leftover definition in xgmac_mdio ...
2022-01-05bpf: Fix verifier support for validation of async callbacksKris Van Hees
Commit bfc6bb74e4f1 ("bpf: Implement verifier support for validation of async callbacks.") added support for BPF_FUNC_timer_set_callback to the __check_func_call() function. The test in __check_func_call() is flaweed because it can mis-interpret a regular BPF-to-BPF pseudo-call as a BPF_FUNC_timer_set_callback callback call. Consider the conditional in the code: if (insn->code == (BPF_JMP | BPF_CALL) && insn->imm == BPF_FUNC_timer_set_callback) { The BPF_FUNC_timer_set_callback has value 170. This means that if you have a BPF program that contains a pseudo-call with an instruction delta of 170, this conditional will be found to be true by the verifier, and it will interpret the pseudo-call as a callback. This leads to a mess with the verification of the program because it makes the wrong assumptions about the nature of this call. Solution: include an explicit check to ensure that insn->src_reg == 0. This ensures that calls cannot be mis-interpreted as an async callback call. Fixes: bfc6bb74e4f1 ("bpf: Implement verifier support for validation of async callbacks.") Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220105210150.GH1559@oracle.com
2022-01-05bpf, docs: Fully document the JMP mode modifiersChristoph Hellwig
Add a description for all the modifiers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-7-hch@lst.de
2022-01-05bpf, docs: Fully document the JMP opcodesChristoph Hellwig
Add pseudo-code to document all the different BPF_JMP / BPF_JMP64 opcodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-6-hch@lst.de
2022-01-05bpf, docs: Fully document the ALU opcodesChristoph Hellwig
Add pseudo-code to document all the different BPF_ALU / BPF_ALU64 opcodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-5-hch@lst.de
2022-01-05bpf, docs: Document the opcode classesChristoph Hellwig
Add a description for each opcode class. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-4-hch@lst.de
2022-01-05bpf, docs: Add subsections for ALU and JMP instructionsChristoph Hellwig
Add a little more stucture to the ALU/JMP documentation with sections and improve the example text. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-3-hch@lst.de
2022-01-05bpf, docs: Add a setion to explain the basic instruction encodingChristoph Hellwig
The eBPF instruction set document does not currently document the basic instruction encoding. Add a section to do that. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-2-hch@lst.de
2022-01-05can: isotp: convert struct tpcon::{idx,len} to unsigned intMarc Kleine-Budde
In isotp_rcv_ff() 32 bit of data received over the network is assigned to struct tpcon::len. Later in that function the length is checked for the maximal supported length against MAX_MSG_LENGTH. As struct tpcon::len is an "int" this check does not work, if the provided length overflows the "int". Later on struct tpcon::idx is compared against struct tpcon::len. To fix this problem this patch converts both struct tpcon::{idx,len} to unsigned int. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/all/20220105132429.1170627-1-mkl@pengutronix.de Cc: stable@vger.kernel.org Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Reported-by: syzbot+4c63f36709a642f801c5@syzkaller.appspotmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-01-05can: gs_usb: fix use of uninitialized variable, detach device on reception ↵Marc Kleine-Budde
of invalid USB data The received data contains the channel the received data is associated with. If the channel number is bigger than the actual number of channels assume broken or malicious USB device and shut it down. This fixes the error found by clang: | drivers/net/can/usb/gs_usb.c:386:6: error: variable 'dev' is used | uninitialized whenever 'if' condition is true | if (hf->channel >= GS_MAX_INTF) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | drivers/net/can/usb/gs_usb.c:474:10: note: uninitialized use occurs here | hf, dev->gs_hf_size, gs_usb_receive_bulk_callback, | ^~~ Link: https://lore.kernel.org/all/20211210091158.408326-1-mkl@pengutronix.de Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices") Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-01-05RDMA/core: Don't infoleak GRH fieldsLeon Romanovsky
If dst->is_global field is not set, the GRH fields are not cleared and the following infoleak is reported. ===================================================== BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 instrument_copy_to_user include/linux/instrumented.h:121 [inline] _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 copy_to_user include/linux/uaccess.h:209 [inline] ucma_init_qp_attr+0x8c7/0xb10 drivers/infiniband/core/ucma.c:1242 ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732 vfs_write+0x8ce/0x2030 fs/read_write.c:588 ksys_write+0x28b/0x510 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __ia32_sys_write+0xdb/0x120 fs/read_write.c:652 do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline] __do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180 do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205 do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c Local variable resp created at: ucma_init_qp_attr+0xa4/0xb10 drivers/infiniband/core/ucma.c:1214 ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732 Bytes 40-59 of 144 are uninitialized Memory access of size 144 starts at ffff888167523b00 Data copied to user address 0000000020000100 CPU: 1 PID: 25910 Comm: syz-executor.1 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ===================================================== Fixes: 4ba66093bdc6 ("IB/core: Check for global flag when using ah_attr") Link: https://lore.kernel.org/r/0e9dd51f93410b7b2f4f5562f52befc878b71afa.1641298868.git.leonro@nvidia.com Reported-by: syzbot+6d532fa8f9463da290bc@syzkaller.appspotmail.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05bpf, selftests: Add verifier test for mem_or_null register with offset.Daniel Borkmann
Add a new test case with mem_or_null typed register with off > 0 to ensure it gets rejected by the verifier: # ./test_verifier 1011 #1009/u check with invalid reg offset 0 OK #1009/p check with invalid reg offset 0 OK Summary: 2 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-05bpf: Don't promote bogus looking registers after null check.Daniel Borkmann
If we ever get to a point again where we convert a bogus looking <ptr>_or_null typed register containing a non-zero fixed or variable offset, then lets not reset these bounds to zero since they are not and also don't promote the register to a <ptr> type, but instead leave it as <ptr>_or_null. Converting to a unknown register could be an avenue as well, but then if we run into this case it would allow to leak a kernel pointer this way. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-05bpf, sockmap: Fix double bpf_prog_put on error case in map_linkJohn Fastabend
sock_map_link() is called to update a sockmap entry with a sk. But, if the sock_map_init_proto() call fails then we return an error to the map_update op against the sockmap. In the error path though we need to cleanup psock and dec the refcnt on any programs associated with the map, because we refcnt them early in the update process to ensure they are pinned for the psock. (This avoids a race where user deletes programs while also updating the map with new socks.) In current code we do the prog refcnt dec explicitely by calling bpf_prog_put() when the program was found in the map. But, after commit '38207a5e81230' in this error path we've already done the prog to psock assignment so the programs have a reference from the psock as well. This then causes the psock tear down logic, invoked by sk_psock_put() in the error path, to similarly call bpf_prog_put on the programs there. To be explicit this logic does the prog->psock assignment: if (msg_*) psock_set_prog(...) Then the error path under the out_progs label does a similar check and dec with: if (msg_*) bpf_prog_put(...) And the teardown logic sk_psock_put() does ... psock_set_prog(msg_*, NULL) ... triggering another bpf_prog_put(...). Then KASAN gives us this splat, found by syzbot because we've created an inbalance between bpf_prog_inc and bpf_prog_put calling put twice on the program. BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline] BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline] kernel/bpf/syscall.c:1829 BUG: KASAN: vmalloc-out-of-bounds in bpf_prog_put+0x8c/0x4f0 kernel/bpf/syscall.c:1829 kernel/bpf/syscall.c:1829 Read of size 8 at addr ffffc90000e76038 by task syz-executor020/3641 To fix clean up error path so it doesn't try to do the bpf_prog_put in the error path once progs are assigned then it relies on the normal psock tear down logic to do complete cleanup. For completness we also cover the case whereh sk_psock_init_strp() fails, but this is not expected because it indicates an incorrect socket type and should be caught earlier. Fixes: 38207a5e8123 ("bpf, sockmap: Attach map progs to psock early for feature probes") Reported-by: syzbot+bb73e71cf4b8fd376a4f@syzkaller.appspotmail.com Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220104214645.290900-1-john.fastabend@gmail.com
2022-01-05bpf, sockmap: Fix return codes from tcp_bpf_recvmsg_parser()John Fastabend
Applications can be confused slightly because we do not always return the same error code as expected, e.g. what the TCP stack normally returns. For example on a sock err sk->sk_err instead of returning the sock_error we return EAGAIN. This usually means the application will 'try again' instead of aborting immediately. Another example, when a shutdown event is received we should immediately abort instead of waiting for data when the user provides a timeout. These tend to not be fatal, applications usually recover, but introduces bogus errors to the user or introduces unexpected latency. Before 'c5d2177a72a16' we fell back to the TCP stack when no data was available so we managed to catch many of the cases here, although with the extra latency cost of calling tcp_msg_wait_data() first. To fix lets duplicate the error handling in TCP stack into tcp_bpf so that we get the same error codes. These were found in our CI tests that run applications against sockmap and do longer lived testing, at least compared to test_sockmap that does short-lived ping/pong tests, and in some of our test clusters we deploy. Its non-trivial to do these in a shorter form CI tests that would be appropriate for BPF selftests, but we are looking into it so we can ensure this keeps working going forward. As a preview one idea is to pull in the packetdrill testing which catches some of this. Fixes: c5d2177a72a16 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220104205918.286416-1-john.fastabend@gmail.com
2022-01-05bpf, arm64: Use emit_addr_mov_i64() for BPF_PSEUDO_FUNCHou Tao
The following error is reported when running "./test_progs -t for_each" under arm64: bpf_jit: multi-func JIT bug 58 != 56 [...] JIT doesn't support bpf-to-bpf calls The root cause is the size of BPF_PSEUDO_FUNC instruction increases from 2 to 3 after the address of called bpf-function is settled and there are two bpf-to-bpf calls in test_pkt_access. The generated instructions are shown below: 0x48: 21 00 C0 D2 movz x1, #0x1, lsl #32 0x4c: 21 00 80 F2 movk x1, #0x1 0x48: E1 3F C0 92 movn x1, #0x1ff, lsl #32 0x4c: 41 FE A2 F2 movk x1, #0x17f2, lsl #16 0x50: 81 70 9F F2 movk x1, #0xfb84 Fixing it by using emit_addr_mov_i64() for BPF_PSEUDO_FUNC, so the size of jited image will not change. Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211231151018.3781550-1-houtao1@huawei.com
2022-01-05net: gemini: allow any RGMII interface modeRussell King (Oracle)
The four RGMII interface modes take care of the required RGMII delay configuration at the PHY and should not be limited by the network MAC driver. Sadly, gemini was only permitting RGMII mode with no delays, which would require the required delay to be inserted via PCB tracking or by the MAC. However, there are designs that require the PHY to add the delay, which is impossible without Gemini permitting the other three PHY interface modes. Fix the driver to allow these. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com> Link: https://lore.kernel.org/r/E1n4mpT-002PLd-Ha@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05Merge branch 'fix-rgmii-delays-for-88e1118'Jakub Kicinski
Russell King says: ==================== Fix RGMII delays for 88E1118 This series fixes the RGMII delays for 88E1118 Marvell PHYs, after a report by Corentin Labbe that the Marvell driver fails to work. Patch 1 cleans up the paged register accesses in m88e1118_config_init() and patch 2 adds the RGMII delay configuration. This comes with an element of risk as existing DT may need to be fixed for this in a similar way as we have done in the recent past for other PHY drivers that have misinterpreted the RGMII interface modes. ==================== Link: https://lore.kernel.org/r/YdR3wYFkm4eJApwb@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05net: phy: marvell: configure RGMII delays for 88E1118Russell King (Oracle)
Corentin Labbe reports that the SSI 1328 does not work when allowing the PHY to operate at gigabit speeds, but does work with the generic PHY driver. This appears to be because m88e1118_config_init() writes a fixed value to the MSCR register, claiming that this is to enable 1G speeds. However, this always sets bits 4 and 5, enabling RGMII transmit and receive delays. The suspicion is that the original board this was added for required the delays to make 1G speeds work. Add the necessary configuration for RGMII delays for the 88E1118 to bring this into line with the requirements for RGMII support, and thus make the SSI 1328 work. Corentin Labbe has tested this on gemini-ssi1328 and gemini-ns2502. Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com> Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05net: phy: marvell: use phy_write_paged() to set MSCRRussell King (Oracle)
Use phy_write_paged() in m88e1118_config_init() to set the MSCR value. We leave the other paged write for the LEDs in case the DT register parsing is relying on this page. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05selftests: set amt.sh executableTaehee Yoo
amt.sh test script will not work because it doesn't have execution permission. So, it adds execution permission. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: c08e8baea78e ("selftests: add amt interface selftest script") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20220105144436.13415-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05RDMA/uverbs: Check for null return of kmalloc_arrayJiasheng Jiang
Because of the possible failure of the allocation, data might be NULL pointer and will cause the dereference of the NULL pointer later. Therefore, it might be better to check it and return -ENOMEM. Fixes: 6884c6c4bd09 ("RDMA/verbs: Store the write/write_ex uapi entry points in the uverbs_api") Link: https://lore.kernel.org/r/20211231093315.1917667-1-jiasheng@iscas.ac.cn Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05Merge branches 'for-next/misc', 'for-next/cache-ops-dzp', ↵Catalin Marinas
'for-next/stacktrace', 'for-next/xor-neon', 'for-next/kasan', 'for-next/armv8_7-fp', 'for-next/atomics', 'for-next/bti', 'for-next/sve', 'for-next/kselftest' and 'for-next/kcsan', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: (32 commits) arm64: perf: Don't register user access sysctl handler multiple times drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check perf/smmuv3: Fix unused variable warning when CONFIG_OF=n arm64: perf: Support new DT compatibles arm64: perf: Simplify registration boilerplate arm64: perf: Support Denver and Carmel PMUs drivers/perf: hisi: Add driver for HiSilicon PCIe PMU docs: perf: Add description for HiSilicon PCIe PMU driver dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings drivers: perf: Add LLC-TAD perf counter support perf/smmuv3: Synthesize IIDR from CoreSight ID registers perf/smmuv3: Add devicetree support dt-bindings: Add Arm SMMUv3 PMCG binding perf/arm-cmn: Add debugfs topology info perf/arm-cmn: Add CI-700 Support dt-bindings: perf: arm-cmn: Add CI-700 perf/arm-cmn: Support new IP features perf/arm-cmn: Demarcate CMN-600 specifics perf/arm-cmn: Move group validation data off-stack perf/arm-cmn: Optimise DTC counter accesses ... * for-next/misc: : Miscellaneous patches arm64: Use correct method to calculate nomap region boundaries arm64: Drop outdated links in comments arm64: errata: Fix exec handling in erratum 1418040 workaround arm64: Unhash early pointer print plus improve comment asm-generic: introduce io_stop_wc() and add implementation for ARM64 arm64: remove __dma_*_area() aliases docs/arm64: delete a space from tagged-address-abi arm64/fp: Add comments documenting the usage of state restore functions arm64: mm: Use asid feature macro for cheanup arm64: mm: Rename asid2idx() to ctxid2asid() arm64: kexec: reduce calls to page_address() arm64: extable: remove unused ex_handler_t definition arm64: entry: Use SDEI event constants arm64: Simplify checking for populated DT arm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c * for-next/cache-ops-dzp: : Avoid DC instructions when DCZID_EL0.DZP == 1 arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1 arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1 * for-next/stacktrace: : Unify the arm64 unwind code arm64: Make some stacktrace functions private arm64: Make dump_backtrace() use arch_stack_walk() arm64: Make profile_pc() use arch_stack_walk() arm64: Make return_address() use arch_stack_walk() arm64: Make __get_wchan() use arch_stack_walk() arm64: Make perf_callchain_kernel() use arch_stack_walk() arm64: Mark __switch_to() as __sched arm64: Add comment for stack_info::kr_cur arch: Make ARCH_STACKWALK independent of STACKTRACE * for-next/xor-neon: : Use SHA3 instructions to speed up XOR arm64/xor: use EOR3 instructions when available * for-next/kasan: : Log potential KASAN shadow aliases arm64: mm: log potential KASAN shadow alias arm64: mm: use die_kernel_fault() in do_mem_abort() * for-next/armv8_7-fp: : Add HWCAPS for ARMv8.7 FEAT_AFP amd FEAT_RPRES arm64: cpufeature: add HWCAP for FEAT_RPRES arm64: add ID_AA64ISAR2_EL1 sys register arm64: cpufeature: add HWCAP for FEAT_AFP * for-next/atomics: : arm64 atomics clean-ups and codegen improvements arm64: atomics: lse: define RETURN ops in terms of FETCH ops arm64: atomics: lse: improve constraints for simple ops arm64: atomics: lse: define ANDs in terms of ANDNOTs arm64: atomics lse: define SUBs in terms of ADDs arm64: atomics: format whitespace consistently * for-next/bti: : BTI clean-ups arm64: Ensure that the 'bti' macro is defined where linkage.h is included arm64: Use BTI C directly and unconditionally arm64: Unconditionally override SYM_FUNC macros arm64: Add macro version of the BTI instruction arm64: ftrace: add missing BTIs arm64: kexec: use __pa_symbol(empty_zero_page) arm64: update PAC description for kernel * for-next/sve: : SVE code clean-ups and refactoring in prepararation of Scalable Matrix Extensions arm64/sve: Minor clarification of ABI documentation arm64/sve: Generalise vector length configuration prctl() for SME arm64/sve: Make sysctl interface for SVE reusable by SME * for-next/kselftest: : arm64 kselftest additions kselftest/arm64: Add pidbench for floating point syscall cases kselftest/arm64: Add a test program to exercise the syscall ABI kselftest/arm64: Allow signal tests to trigger from a function kselftest/arm64: Parameterise ptrace vector length information * for-next/kcsan: : Enable KCSAN for arm64 arm64: Enable KCSAN
2022-01-05Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks"Aaron Ma
This reverts commit f77b83b5bbab53d2be339184838b19ed2c62c0a5. This change breaks multiple usb to ethernet dongles attached on Lenovo USB hub. Fixes: f77b83b5bbab ("net: usb: r8152: Add MAC passthrough support for more Lenovo Docks") Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Link: https://lore.kernel.org/r/20220105155102.8557-1-aaron.ma@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05mm: Make SLAB_MERGE_DEFAULT depend on SL[AU]BHyeonggon Yoo
SLOB always manage objects of different caches in same page regardless of SLAB_MERGE_DEFAULT. Because it has no effect on SLOB, make it depend on SLAB || SLUB. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20211225060921.13584-1-42.hyeyoo@gmail.com
2022-01-05netlink: do not allocate a device refcount tracker in ethnl_default_notify()Eric Dumazet
As reported by Johannes, the tracker allocated in ethnl_default_notify() is not really needed, as this function is not expected to change a device reference count. Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Johannes Berg <johannes@sipsolutions.net> Tested-by: Johannes Berg <johannes@sipsolutions.net> Link: https://lore.kernel.org/r/20220105170849.2610470-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05net/sched: add missing tracker information in qdisc_create()Eric Dumazet
qdisc_create() error path needs to use dev_put_track() because qdisc_alloc() allocated the tracker. Fixes: 606509f27f67 ("net/sched: add net device refcount tracker to struct Qdisc") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220104170439.3790052-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05cpuidle: use default_groups in kobj_typeGreg Kroah-Hartman
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the cpuidle sysfs code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-05Merge tag 'gpio-fixes-for-v5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: "Here are two last fixes for this release cycle from the GPIO subsystem: - fix irq offset calculation in gpio-aspeed-sgpio - update the MAINTAINERS entry for gpio-brcmstb" * tag 'gpio-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: MAINTAINERS: update gpio-brcmstb maintainers gpio: gpio-aspeed-sgpio: Fix wrong hwirq base in irq handler
2022-01-05Merge tag 'ieee802154-for-net-2022-01-05' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== pull-request: ieee802154 for net 2022-01-05 Below I have a last minute fix for the atusb driver. Pavel fixes a KASAN uninit report for the driver. This version is the minimal impact fix to ease backporting. A bigger rework of the driver to avoid potential similar problems is ongoing and will come through net-next when ready. * tag 'ieee802154-for-net-2022-01-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan: ieee802154: atusb: fix uninit value in atusb_set_extended_addr ==================== Link: https://lore.kernel.org/r/20220105153914.512305-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05netfilter: ipt_CLUSTERIP: fix refcount leak in clusterip_tg_check()Xin Xiong
The issue takes place in one error path of clusterip_tg_check(). When memcmp() returns nonzero, the function simply returns the error code, forgetting to decrease the reference count of a clusterip_config object, which is bumped earlier by clusterip_config_find_get(). This may incur reference count leak. Fix this issue by decrementing the refcount of the object in specific error path. Fixes: 06aa151ad1fc74 ("netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set") Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-05Merge branch 'dsa-notifier-cleanup'David S. Miller
Vladimir Oltean says: ==================== DSA cross-chip notifier cleanup This series deletes the no-op cross-chip notifier support for MRP and HSR, features which were introduced relatively recently and did not get full review at the time. The new code is functionally equivalent, but simpler. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Cc: George McCollister <george.mccollister@gmail.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05net: dsa: remove cross-chip support for HSRVladimir Oltean
The cross-chip notifiers for HSR are bypass operations, meaning that even though all switches in a tree are notified, only the switch specified in the info structure is targeted. We can eliminate the unnecessary complexity by deleting the cross-chip notifier logic and calling the ds->ops straight from port.c. Cc: George McCollister <george.mccollister@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: George McCollister <george.mccollister@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05net: dsa: remove cross-chip support for MRPVladimir Oltean
The cross-chip notifiers for MRP are bypass operations, meaning that even though all switches in a tree are notified, only the switch specified in the info structure is targeted. We can eliminate the unnecessary complexity by deleting the cross-chip notifier logic and calling the ds->ops straight from port.c. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05net: dsa: fix incorrect function pointer check for MRP ring rolesVladimir Oltean
The cross-chip notifier boilerplate code meant to check the presence of ds->ops->port_mrp_add_ring_role before calling it, but checked ds->ops->port_mrp_add instead, before calling ds->ops->port_mrp_add_ring_role. Therefore, a driver which implements one operation but not the other would trigger a NULL pointer dereference. There isn't any such driver in DSA yet, so there is no reason to backport the change. Issue found through code inspection. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Fixes: c595c4330da0 ("net: dsa: add MRP support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05mlxsw: pci: Avoid flow control for EMAD packetsDanielle Ratson
Locally generated packets ingress the device through its CPU port. When the CPU port is congested and there are not enough credits in its headroom buffer, packets can be dropped. While this might be acceptable for data packets that traverse the network, configuration packets exchanged between the host and the device (EMADs) should not be subjected to this flow control. The "sdq_lp" bit in the SDQ (Send Descriptor Queue) context allows the host to instruct the device to treat packets sent on this queue as "local processing" and always process them, regardless of the state of the CPU port's headroom. Add the definition of this bit and set it for the dedicated SDQ reserved for the transmission of EMAD packets. This makes the "local processing" bit in the WQE (Work Queue Element) redundant, so clear it. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05Merge tag 'linux-can-next-for-5.17-20220105' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2022-01-05 this is a pull request of 15 patches for net-next/master. The first patch is by me and removed an unused variable from the usb_8dev driver. Andy Shevchenko contributes a patch for the mcp251x driver, which removes an unneeded assignment. Jimmy Assarsson's patch for the kvaser_usb makes use of units.h in the assignment of frequencies. Lad Prabhakar provides 2 patches, converting the ti_hecc and the sja1000 driver to make use of platform_get_irq(). The 10 remaining patches are by Vincent Mailhol. First the etas_es58x driver populates the net_device::dev_port. The next 5 patches cleanup the handling of CAN error and CAN RTR messages of all drivers. The remaining 4 patches enhance the CAN controller mode flag handling and export it via netlink to user space. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05Merge branch 'dsa-cleanups'David S. Miller
Vladimir Oltean says: ==================== Cleanup to main DSA structures This series contains changes that do the following: - struct dsa_port reduced from 576 to 544 bytes, and first cache line a bit better organized - struct dsa_switch from 160 to 136 bytes, and first cache line a bit better organized - struct dsa_switch_tree from 112 to 104 bytes, and first cache line a bit better organized ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05net: dsa: combine two holes in struct dsa_switch_treeVladimir Oltean
There is a 7 byte hole after dst->setup and a 4 byte hole after dst->default_proto. Combining them, we have a single hole of just 3 bytes on 64 bit machines. Before: pahole -C dsa_switch_tree net/dsa/slave.o struct dsa_switch_tree { struct list_head list; /* 0 16 */ struct list_head ports; /* 16 16 */ struct raw_notifier_head nh; /* 32 8 */ unsigned int index; /* 40 4 */ struct kref refcount; /* 44 4 */ struct net_device * * lags; /* 48 8 */ bool setup; /* 56 1 */ /* XXX 7 bytes hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ const struct dsa_device_ops * tag_ops; /* 64 8 */ enum dsa_tag_protocol default_proto; /* 72 4 */ /* XXX 4 bytes hole, try to pack */ struct dsa_platform_data * pd; /* 80 8 */ struct list_head rtable; /* 88 16 */ unsigned int lags_len; /* 104 4 */ unsigned int last_switch; /* 108 4 */ /* size: 112, cachelines: 2, members: 13 */ /* sum members: 101, holes: 2, sum holes: 11 */ /* last cacheline: 48 bytes */ }; After: pahole -C dsa_switch_tree net/dsa/slave.o struct dsa_switch_tree { struct list_head list; /* 0 16 */ struct list_head ports; /* 16 16 */ struct raw_notifier_head nh; /* 32 8 */ unsigned int index; /* 40 4 */ struct kref refcount; /* 44 4 */ struct net_device * * lags; /* 48 8 */ const struct dsa_device_ops * tag_ops; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ enum dsa_tag_protocol default_proto; /* 64 4 */ bool setup; /* 68 1 */ /* XXX 3 bytes hole, try to pack */ struct dsa_platform_data * pd; /* 72 8 */ struct list_head rtable; /* 80 16 */ unsigned int lags_len; /* 96 4 */ unsigned int last_switch; /* 100 4 */ /* size: 104, cachelines: 2, members: 13 */ /* sum members: 101, holes: 1, sum holes: 3 */ /* last cacheline: 40 bytes */ }; Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05net: dsa: move dsa_switch_tree :: ports and lags to first cache lineVladimir Oltean
dst->ports is accessed most notably by dsa_master_find_slave(), which is invoked in the RX path. dst->lags is accessed by dsa_lag_dev(), which is invoked in the RX path of tag_dsa.c. dst->tag_ops, dst->default_proto and dst->pd don't need to be in the first cache line, so they are moved out by this change. Before: pahole -C dsa_switch_tree net/dsa/slave.o struct dsa_switch_tree { struct list_head list; /* 0 16 */ struct raw_notifier_head nh; /* 16 8 */ unsigned int index; /* 24 4 */ struct kref refcount; /* 28 4 */ bool setup; /* 32 1 */ /* XXX 7 bytes hole, try to pack */ const struct dsa_device_ops * tag_ops; /* 40 8 */ enum dsa_tag_protocol default_proto; /* 48 4 */ /* XXX 4 bytes hole, try to pack */ struct dsa_platform_data * pd; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct list_head ports; /* 64 16 */ struct list_head rtable; /* 80 16 */ struct net_device * * lags; /* 96 8 */ unsigned int lags_len; /* 104 4 */ unsigned int last_switch; /* 108 4 */ /* size: 112, cachelines: 2, members: 13 */ /* sum members: 101, holes: 2, sum holes: 11 */ /* last cacheline: 48 bytes */ }; After: pahole -C dsa_switch_tree net/dsa/slave.o struct dsa_switch_tree { struct list_head list; /* 0 16 */ struct list_head ports; /* 16 16 */ struct raw_notifier_head nh; /* 32 8 */ unsigned int index; /* 40 4 */ struct kref refcount; /* 44 4 */ struct net_device * * lags; /* 48 8 */ bool setup; /* 56 1 */ /* XXX 7 bytes hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ const struct dsa_device_ops * tag_ops; /* 64 8 */ enum dsa_tag_protocol default_proto; /* 72 4 */ /* XXX 4 bytes hole, try to pack */ struct dsa_platform_data * pd; /* 80 8 */ struct list_head rtable; /* 88 16 */ unsigned int lags_len; /* 104 4 */ unsigned int last_switch; /* 108 4 */ /* size: 112, cachelines: 2, members: 13 */ /* sum members: 101, holes: 2, sum holes: 11 */ /* last cacheline: 48 bytes */ }; Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05net: dsa: make dsa_switch :: num_ports an unsigned intVladimir Oltean
Currently, num_ports is declared as size_t, which is defined as __kernel_ulong_t, therefore it occupies 8 bytes of memory. Even switches with port numbers in the range of tens are exotic, so there is no need for this amount of storage. Additionally, because the max_num_bridges member right above it is also 4 bytes, it means the compiler needs to add padding between the last 2 fields. By reducing the size, we don't need that padding and can reduce the struct size. Before: pahole -C dsa_switch net/dsa/slave.o struct dsa_switch { struct device * dev; /* 0 8 */ struct dsa_switch_tree * dst; /* 8 8 */ unsigned int index; /* 16 4 */ u32 setup:1; /* 20: 0 4 */ u32 vlan_filtering_is_global:1; /* 20: 1 4 */ u32 needs_standalone_vlan_filtering:1; /* 20: 2 4 */ u32 configure_vlan_while_not_filtering:1; /* 20: 3 4 */ u32 untag_bridge_pvid:1; /* 20: 4 4 */ u32 assisted_learning_on_cpu_port:1; /* 20: 5 4 */ u32 vlan_filtering:1; /* 20: 6 4 */ u32 pcs_poll:1; /* 20: 7 4 */ u32 mtu_enforcement_ingress:1; /* 20: 8 4 */ /* XXX 23 bits hole, try to pack */ struct notifier_block nb; /* 24 24 */ /* XXX last struct has 4 bytes of padding */ void * priv; /* 48 8 */ void * tagger_data; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct dsa_chip_data * cd; /* 64 8 */ const struct dsa_switch_ops * ops; /* 72 8 */ u32 phys_mii_mask; /* 80 4 */ /* XXX 4 bytes hole, try to pack */ struct mii_bus * slave_mii_bus; /* 88 8 */ unsigned int ageing_time_min; /* 96 4 */ unsigned int ageing_time_max; /* 100 4 */ struct dsa_8021q_context * tag_8021q_ctx; /* 104 8 */ struct devlink * devlink; /* 112 8 */ unsigned int num_tx_queues; /* 120 4 */ unsigned int num_lag_ids; /* 124 4 */ /* --- cacheline 2 boundary (128 bytes) --- */ unsigned int max_num_bridges; /* 128 4 */ /* XXX 4 bytes hole, try to pack */ size_t num_ports; /* 136 8 */ /* size: 144, cachelines: 3, members: 27 */ /* sum members: 132, holes: 2, sum holes: 8 */ /* sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits */ /* paddings: 1, sum paddings: 4 */ /* last cacheline: 16 bytes */ }; After: pahole -C dsa_switch net/dsa/slave.o struct dsa_switch { struct device * dev; /* 0 8 */ struct dsa_switch_tree * dst; /* 8 8 */ unsigned int index; /* 16 4 */ u32 setup:1; /* 20: 0 4 */ u32 vlan_filtering_is_global:1; /* 20: 1 4 */ u32 needs_standalone_vlan_filtering:1; /* 20: 2 4 */ u32 configure_vlan_while_not_filtering:1; /* 20: 3 4 */ u32 untag_bridge_pvid:1; /* 20: 4 4 */ u32 assisted_learning_on_cpu_port:1; /* 20: 5 4 */ u32 vlan_filtering:1; /* 20: 6 4 */ u32 pcs_poll:1; /* 20: 7 4 */ u32 mtu_enforcement_ingress:1; /* 20: 8 4 */ /* XXX 23 bits hole, try to pack */ struct notifier_block nb; /* 24 24 */ /* XXX last struct has 4 bytes of padding */ void * priv; /* 48 8 */ void * tagger_data; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct dsa_chip_data * cd; /* 64 8 */ const struct dsa_switch_ops * ops; /* 72 8 */ u32 phys_mii_mask; /* 80 4 */ /* XXX 4 bytes hole, try to pack */ struct mii_bus * slave_mii_bus; /* 88 8 */ unsigned int ageing_time_min; /* 96 4 */ unsigned int ageing_time_max; /* 100 4 */ struct dsa_8021q_context * tag_8021q_ctx; /* 104 8 */ struct devlink * devlink; /* 112 8 */ unsigned int num_tx_queues; /* 120 4 */ unsigned int num_lag_ids; /* 124 4 */ /* --- cacheline 2 boundary (128 bytes) --- */ unsigned int max_num_bridges; /* 128 4 */ unsigned int num_ports; /* 132 4 */ /* size: 136, cachelines: 3, members: 27 */ /* sum members: 128, holes: 1, sum holes: 4 */ /* sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits */ /* paddings: 1, sum paddings: 4 */ /* last cacheline: 8 bytes */ }; Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05net: dsa: merge all bools of struct dsa_switch into a single u32Vladimir Oltean
struct dsa_switch has 9 boolean properties, many of which are in fact set by drivers for custom behavior (vlan_filtering_is_global, needs_standalone_vlan_filtering, etc etc). The binary layout of the structure could be improved. For example, the "bool setup" at the beginning introduces a gratuitous 7 byte hole in the first cache line. The change merges all boolean properties into bitfields of an u32, and places that u32 in the first cache line of the structure, since many bools are accessed from the data path (untag_bridge_pvid, vlan_filtering, vlan_filtering_is_global). We place this u32 after the existing ds->index, which is also 4 bytes in size. As a positive side effect, ds->tagger_data now fits into the first cache line too, because 4 bytes are saved. Before: pahole -C dsa_switch net/dsa/slave.o struct dsa_switch { bool setup; /* 0 1 */ /* XXX 7 bytes hole, try to pack */ struct device * dev; /* 8 8 */ struct dsa_switch_tree * dst; /* 16 8 */ unsigned int index; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ struct notifier_block nb; /* 32 24 */ /* XXX last struct has 4 bytes of padding */ void * priv; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ void * tagger_data; /* 64 8 */ struct dsa_chip_data * cd; /* 72 8 */ const struct dsa_switch_ops * ops; /* 80 8 */ u32 phys_mii_mask; /* 88 4 */ /* XXX 4 bytes hole, try to pack */ struct mii_bus * slave_mii_bus; /* 96 8 */ unsigned int ageing_time_min; /* 104 4 */ unsigned int ageing_time_max; /* 108 4 */ struct dsa_8021q_context * tag_8021q_ctx; /* 112 8 */ struct devlink * devlink; /* 120 8 */ /* --- cacheline 2 boundary (128 bytes) --- */ unsigned int num_tx_queues; /* 128 4 */ bool vlan_filtering_is_global; /* 132 1 */ bool needs_standalone_vlan_filtering; /* 133 1 */ bool configure_vlan_while_not_filtering; /* 134 1 */ bool untag_bridge_pvid; /* 135 1 */ bool assisted_learning_on_cpu_port; /* 136 1 */ bool vlan_filtering; /* 137 1 */ bool pcs_poll; /* 138 1 */ bool mtu_enforcement_ingress; /* 139 1 */ unsigned int num_lag_ids; /* 140 4 */ unsigned int max_num_bridges; /* 144 4 */ /* XXX 4 bytes hole, try to pack */ size_t num_ports; /* 152 8 */ /* size: 160, cachelines: 3, members: 27 */ /* sum members: 141, holes: 4, sum holes: 19 */ /* paddings: 1, sum paddings: 4 */ /* last cacheline: 32 bytes */ }; After: pahole -C dsa_switch net/dsa/slave.o struct dsa_switch { struct device * dev; /* 0 8 */ struct dsa_switch_tree * dst; /* 8 8 */ unsigned int index; /* 16 4 */ u32 setup:1; /* 20: 0 4 */ u32 vlan_filtering_is_global:1; /* 20: 1 4 */ u32 needs_standalone_vlan_filtering:1; /* 20: 2 4 */ u32 configure_vlan_while_not_filtering:1; /* 20: 3 4 */ u32 untag_bridge_pvid:1; /* 20: 4 4 */ u32 assisted_learning_on_cpu_port:1; /* 20: 5 4 */ u32 vlan_filtering:1; /* 20: 6 4 */ u32 pcs_poll:1; /* 20: 7 4 */ u32 mtu_enforcement_ingress:1; /* 20: 8 4 */ /* XXX 23 bits hole, try to pack */ struct notifier_block nb; /* 24 24 */ /* XXX last struct has 4 bytes of padding */ void * priv; /* 48 8 */ void * tagger_data; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct dsa_chip_data * cd; /* 64 8 */ const struct dsa_switch_ops * ops; /* 72 8 */ u32 phys_mii_mask; /* 80 4 */ /* XXX 4 bytes hole, try to pack */ struct mii_bus * slave_mii_bus; /* 88 8 */ unsigned int ageing_time_min; /* 96 4 */ unsigned int ageing_time_max; /* 100 4 */ struct dsa_8021q_context * tag_8021q_ctx; /* 104 8 */ struct devlink * devlink; /* 112 8 */ unsigned int num_tx_queues; /* 120 4 */ unsigned int num_lag_ids; /* 124 4 */ /* --- cacheline 2 boundary (128 bytes) --- */ unsigned int max_num_bridges; /* 128 4 */ /* XXX 4 bytes hole, try to pack */ size_t num_ports; /* 136 8 */ /* size: 144, cachelines: 3, members: 27 */ /* sum members: 132, holes: 2, sum holes: 8 */ /* sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits */ /* paddings: 1, sum paddings: 4 */ /* last cacheline: 16 bytes */ }; Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>