linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-01-06	mm/slub: Make object_err() static	Vlastimil Babka
	There are no callers outside of mm/slub.c anymore. Move freelist_corrupted() that calls object_err() to avoid a need for forward declaration. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06	mm/slab: Dissolve slab_map_pages() in its caller	Vlastimil Babka
	The function no longer does what its name and comment suggests, and just sets two struct page fields, which can be done directly in its sole caller. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06	netfilter: nft_set_pipapo: allocate pcpu scratch maps on clone	Florian Westphal
	This is needed in case a new transaction is made that doesn't insert any new elements into an already existing set. Else, after second 'nft -f ruleset.txt', lookups in such a set will fail because ->lookup() encounters raw_cpu_ptr(m->scratch) == NULL. For the initial rule load, insertion of elements takes care of the allocation, but for rule reloads this isn't guaranteed: we might not have additions to the set. Fixes: 3c4287f62044a90e ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: etkaar <lists.netfilter.org@prvy.eu> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-06	netfilter: nft_payload: do not update layer 4 checksum when mangling fragments	Pablo Neira Ayuso
	IP fragments do not come with the transport header, hence skip bogus layer 4 checksum updates. Fixes: 1814096980bb ("netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields") Reported-and-tested-by: Steffen Weinreich <steve@weinreich.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-06	selftests: netfilter: switch to socat for tests using -q option	Hangbin Liu
	The nc cmd(nmap-ncat) that distributed with Fedora/Red Hat does not have option -q. This make some tests failed with: nc: invalid option -- 'q' Let's switch to socat which is far more dependable. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-05	xdp: Add xdp_do_redirect_frame() for pre-computed xdp_frames	Toke Høiland-Jørgensen
	Add an xdp_do_redirect_frame() variant which supports pre-computed xdp_frame structures. This will be used in bpf_prog_run() to avoid having to write to the xdp_frame structure when the XDP program doesn't modify the frame boundaries. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103150812.87914-6-toke@redhat.com
2022-01-05	xdp: Move conversion to xdp_frame out of map functions	Toke Høiland-Jørgensen
	All map redirect functions except XSK maps convert xdp_buff to xdp_frame before enqueueing it. So move this conversion of out the map functions and into xdp_do_redirect(). This removes a bit of duplicated code, but more importantly it makes it possible to support caller-allocated xdp_frame structures, which will be added in a subsequent commit. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103150812.87914-5-toke@redhat.com
2022-01-05	page_pool: Store the XDP mem id	Toke Høiland-Jørgensen
	Store the XDP mem ID inside the page_pool struct so it can be retrieved later for use in bpf_prog_run(). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20220103150812.87914-4-toke@redhat.com
2022-01-05	page_pool: Add callback to init pages when they are allocated	Toke Høiland-Jørgensen
	Add a new callback function to page_pool that, if set, will be called every time a new page is allocated. This will be used from bpf_test_run() to initialise the page data with the data provided by userspace when running XDP programs with redirect turned on. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20220103150812.87914-3-toke@redhat.com
2022-01-05	xdp: Allow registering memory model without rxq reference	Toke Høiland-Jørgensen
	The functions that register an XDP memory model take a struct xdp_rxq as parameter, but the RXQ is not actually used for anything other than pulling out the struct xdp_mem_info that it embeds. So refactor the register functions and export variants that just take a pointer to the xdp_mem_info. This is in preparation for enabling XDP_REDIRECT in bpf_prog_run(), using a page_pool instance that is not connected to any network device. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103150812.87914-2-toke@redhat.com
2022-01-05	Merge branch 'samples/bpf: xdpsock app enhancements'	Alexei Starovoitov
	Ong Boon says: ==================== First of all, sorry for taking more time to get back to this series and thanks to all valuble feedback in series-1 at [1] from Jesper and Song Liu. Since then I have looked into what Jesper suggested in [2] and worked on revising the patch series into several patches for ease of review: v1->v2: 1/7: [No change]. Add VLAN tag (ID & Priority) to the generated Tx-Only frames. 2/7: [No change]. Add DMAC and SMAC setting to the generated Tx-Only frames. If parameters are not set, previous DMAC and SMAC are used. 3/7: [New]. Add support for selecting different CLOCK for clock_gettime() used in get_nsecs. 4/7: [New]. This is a total rework from series-1 3/4-patch [3]. It uses clock_nanosleep() suggested by Jesper. In addition, added statistic for Tx schedule variance under application stat (-a\|--app-stats). Make the cyclic Tx operation and --poll mode to be mutually- exclusive. Still, the ability to specify TX cycle time and used together with batch size and packet count remain the same. 5/7: [New]. Add the support for TX process schedule policy and priority setting. By default, SCHED_OTHER policy is used. This too is matching the schedule policy setting in [2]. 6/7: [Change]. This is update from series-1 4/4-patch [4]. Added TX clean process time-out in 1s granularity with configurable retries count (-O\|--retries). 7/7: [New]. Added timestamp for TX packet following pktgen_hdr format matching the implementation in [2]. However, the sequence ID remains the same as it is instead of process schedule diff in [2]. To summarize on what program options have been added with v2 series using an example below:- DMAC (-G) = fa:8d:f1:e2:0b:e8 SMAC (-H) = ce:17:07:17:3e:3a VLAN tagged (-V) VLAN ID (-J) = 12 VLAN Pri (-K) = 3 Tx Queue (-q) = 3 Cycle Time in us (-T) = 1000 Batch (-b) = 2 Packet Count = 6 Tx schedule policy (-W) = FIFO Tx schedule priority (-U) = 50 Clock selection (-w) = REALTIME Tx timeout retries(-O) = 5 Tx timestamp (-y) Cyclic Tx schedule stat (-a) Note: xdpsock sets UDP dest-port and src-port to 0x1000 as default. Sending Board ============= $ xdpsock -i eth0 -t -N -z -H ce:17:07:17:3e:3a -G fa:8d:f1:e2:0b:e8 \ -V -J 12 -K 3 -q 3 \ -T 1000 -b 2 -C 6 -W FIFO -U 50 -w REALTIME \ -O 5 -y -a sock0@eth0:3 txonly xdp-drv pps pkts 0.00 rx 0 0 tx 0 6 calls/s count rx empty polls 0 0 fill fail polls 0 0 copy tx sendtos 0 0 tx wakeup sendtos 0 5 opt polls 0 0 period min ave max cycle Cyclic TX 1000000 31033 32009 33397 3 Receiving Board =============== $ tcpdump -nei eth0 udp port 0x1000 -vv -Q in -X \ --time-stamp-precision nano tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes 03:46:40.520111580 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e997 be9b e955 ...............U 0x0020: 0000 0000 61cd 2ba1 0006 987c ....a.+....\| 03:46:40.520112163 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e996 be9b e955 ...............U 0x0020: 0000 0001 61cd 2ba1 0006 987c ....a.+....\| 03:46:40.521066860 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e5af be9b e955 ...............U 0x0020: 0000 0002 61cd 2ba1 0006 9c62 ....a.+....b 03:46:40.521067012 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e5ae be9b e955 ...............U 0x0020: 0000 0003 61cd 2ba1 0006 9c62 ....a.+....b 03:46:40.522061935 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e1c5 be9b e955 ...............U 0x0020: 0000 0004 61cd 2ba1 0006 a04a ....a.+....J 03:46:40.522062173 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e1c4 be9b e955 ...............U 0x0020: 0000 0005 61cd 2ba1 0006 a04a ....a.+....J I have tested the above with both tagged and untagged packet format and based on the timestamp in tcpdump found that the timing of the batch cyclic transmission is correct. Appreciate if community can give the patch series v2 a try and point out any gap. Thanks Boon Leong [1] https://patchwork.kernel.org/project/netdevbpf/cover/20211124091821.3916046-1-boon.leong.ong@intel.com/ [2] https://github.com/netoptimizer/network-testing/blob/master/src/udp_pacer.c [3] https://patchwork.kernel.org/project/netdevbpf/patch/20211124091821.3916046-4-boon.leong.ong@intel.com/ [4] https://patchwork.kernel.org/project/netdevbpf/patch/20211124091821.3916046-5-boon.leong.ong@intel.com/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-05	samples/bpf: xdpsock: Add timestamp for Tx-only operation	Ong Boon Leong
	It may be useful to add timestamp for Tx packets for continuous or cyclic transmit operation. The timestamp and sequence ID of a Tx packet are stored according to pktgen header format. To enable per-packet timestamp, use -y\|--tstamp option. If timestamp is off, pktgen header is not included in the UDP payload. This means receiving side can use the magic number for pktgen for differentiation. The implementation supports both VLAN tagged and untagged option. By default, the minimum packet size is set at 64B. However, if VLAN tagged is on (-V), the minimum packet size is increased to 66B just so to fit the pktgen_hdr size. Added hex_dump() into the code path just for future cross-checking. As before, simply change to "#define DEBUG_HEXDUMP 1" to inspect the accuracy of TX packet. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230035447.523177-8-boon.leong.ong@intel.com
2022-01-05	samples/bpf: xdpsock: Add time-out for cleaning Tx	Ong Boon Leong
	When user sets tx-pkt-count and in case where there are invalid Tx frame, the complete_tx_only_all() process polls indefinitely. So, this patch adds a time-out mechanism into the process so that the application can terminate automatically after it retries 3*polling interval duration. v1->v2: Thanks to Jesper's and Song Liu's suggestion. - clean-up git message to remove polling log - make the Tx time-out retries configurable with 1s granularity Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230035447.523177-7-boon.leong.ong@intel.com
2022-01-05	samples/bpf: xdpsock: Add sched policy and priority support	Ong Boon Leong
	By default, TX schedule policy is SCHED_OTHER (round-robin time-sharing). To improve TX cyclic scheduling, we add SCHED_FIFO policy and its priority by using -W FIFO or --policy=FIFO and -U <PRIO> or --schpri=<PRIO>. A) From xdpsock --app-stats, for SCHED_OTHER policy: $ xdpsock -i eth0 -t -N -z -T 1000 -b 16 -C 100000 -a period min ave max cycle Cyclic TX 1000000 53507 75334 712642 6250 B) For SCHED_FIFO policy and schpri=50: $ xdpsock -i eth0 -t -N -z -T 1000 -b 16 -C 100000 -a -W FIFO -U 50 period min ave max cycle Cyclic TX 1000000 3699 24859 54397 6250 Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230035447.523177-6-boon.leong.ong@intel.com
2022-01-05	samples/bpf: xdpsock: Add cyclic TX operation capability	Ong Boon Leong
	Tx cycle time is in micro-seconds unit. By combining the batch size (-b M) and Tx cycle time (-T\|--tx-cycle N), xdpsock now can transmit batch-size of packets every N-us periodically. Cyclic TX operation is not applicable if --poll mode is used. To transmit 16 packets every 1ms cycle time for total of 100000 packets silently: $ xdpsock -i eth0 -T -N -z -T 1000 -b 16 -C 100000 To print cyclic TX schedule variance stats, use --app-stats\|-a: $ xdpsock -i eth0 -T -N -z -T 1000 -b 16 -C 100000 -a sock0@eth0:0 txonly xdp-drv pps pkts 0.00 rx 0 0 tx 0 100000 calls/s count rx empty polls 0 0 fill fail polls 0 0 copy tx sendtos 0 0 tx wakeup sendtos 0 6254 opt polls 0 0 period min ave max cycle Cyclic TX 1000000 53507 75334 712642 6250 Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230035447.523177-5-boon.leong.ong@intel.com
2022-01-05	samples/bpf: xdpsock: Add clockid selection support	Ong Boon Leong
	User specifies the clock selection by using -w CLOCK or --clock=CLOCK where CLOCK=[REALTIME, TAI, BOOTTIME, MONOTONIC]. The default CLOCK selection is MONOTONIC. The implementation of clock selection parsing is borrowed from iproute2/tc/q_taprio.c Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230035447.523177-4-boon.leong.ong@intel.com
2022-01-05	samples/bpf: xdpsock: Add Dest and Src MAC setting for Tx-only operation	Ong Boon Leong
	To set Dest MAC address (-G\|--tx-dmac) only: $ xdpsock -i eth0 -t -N -z -G aa:bb:cc:dd:ee:ff To set Source MAC address (-H\|--tx-smac) only: $ xdpsock -i eth0 -t -N -z -H 11:22:33:44:55:66 To set both Dest and Source MAC address: $ xdpsock -i eth0 -t -N -z -G aa:bb:cc:dd:ee:ff \ -H 11:22:33:44:55:66 The default Dest and Source MAC address remain the same as before. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20211230035447.523177-3-boon.leong.ong@intel.com
2022-01-05	samples/bpf: xdpsock: Add VLAN support for Tx-only operation	Ong Boon Leong
	In multi-queue environment testing, the support for VLAN-tag based steering is useful. So, this patch adds the capability to add VLAN tag (VLAN ID and Priority) to the generated Tx frame. To set the VLAN ID=10 and Priority=2 for Tx only through TxQ=3: $ xdpsock -i eth0 -t -N -z -q 3 -V -J 10 -K 2 If VLAN ID (-J) and Priority (-K) is set, it default to VLAN ID = 1 VLAN Priority = 0. For example, VLAN-tagged Tx only, xdp copy mode through TxQ=1: $ xdpsock -i eth0 -t -N -c -q 1 -V Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211230035447.523177-2-boon.leong.ong@intel.com
2022-01-05	Merge branch 'net-lantiq_xrx200-improve-ethernet-performance'	Jakub Kicinski
	Aleksander Jan Bajkowski says: ==================== net: lantiq_xrx200: improve ethernet performance This patchset improves Ethernet performance by 15%. NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500): Down Up Before 539 Mbps 599 Mbps After 624 Mbps 695 Mbps ==================== Link: https://lore.kernel.org/r/20220104151144.181736-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05	net: lantiq_xrx200: convert to build_skb	Aleksander Jan Bajkowski
	We can increase the efficiency of rx path by using buffers to receive packets then build SKBs around them just before passing into the network stack. In contrast, preallocating SKBs too early reduces CPU cache efficiency. NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500): Down Up Before 577 Mbps 648 Mbps After 624 Mbps 695 Mbps Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05	net: lantiq_xrx200: increase napi poll weigth	Aleksander Jan Bajkowski
	NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500): Down Up Before 545 Mbps 625 Mbps After 577 Mbps 648 Mbps Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05	MIPS: lantiq: dma: increase descritor count	Aleksander Jan Bajkowski
	NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500): Down Up Before 539 Mbps 599 Mbps After 545 Mbps 625 Mbps Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05	testptp: set pin function before other requests	Miroslav Lichvar
	When the -L option of the testptp utility is specified with other options (e.g. -p to enable PPS output), the user probably wants to apply it to the pin configured by the -L option. Reorder the code to set the pin function before other function requests to avoid confusing users. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/20220105152506.3256026-1-mlichvar@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05	Merge tag 'linux-can-fixes-for-5.16-20220105' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2022-01-05 It consists of 2 patches, both by me. The first one fixes the use of an uninitialized variable in the gs_usb driver the other one a skb_over_panic in the ISOTP stack in case of reception of too large ISOTP messages. * tag 'linux-can-fixes-for-5.16-20220105' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: isotp: convert struct tpcon::{idx,len} to unsigned int can: gs_usb: fix use of uninitialized variable, detach device on reception of invalid USB data ==================== Link: https://lore.kernel.org/r/20220105205443.1274709-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05	Merge tag 'socfpga_fix_for_v5.16_part_3' of ↵	Olof Johansson
	git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into arm/fixes SoCFPGA dts updates for v5.16, part 3 - Change the SoCFPGA compatible to "intel,socfpga-qspi" - Update dt-bindings document to include "intel,socfpga-qspi" * tag 'socfpga_fix_for_v5.16_part_3' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: (361 commits) ARM: dts: socfpga: change qspi to "intel,socfpga-qspi" dt-bindings: spi: cadence-quadspi: document "intel,socfpga-qspi" Linux 5.16-rc7 mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page() mm/damon/dbgfs: protect targets destructions with kdamond_lock mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid mm: delete unsafe BUG from page_cache_add_speculative() mm, hwpoison: fix condition in free hugetlb page path MAINTAINERS: mark more list instances as moderated kernel/crash_core: suppress unknown crashkernel parameter warning mm: mempolicy: fix THP allocations escaping mempolicy restrictions kfence: fix memory leak when cat kfence objects platform/x86: intel_pmc_core: fix memleak on registration failure net: stmmac: dwmac-visconti: Fix value of ETHER_CLK_SEL_FREQ_SEL_2P5M r8152: sync ocp base r8152: fix the force speed doesn't work for RTL8156 net: bridge: fix ioctl old_deviceless bridge argument net: stmmac: ptp: fix potentially overflowing expression net: dsa: tag_ocelot: use traffic class to map priority on injected header veth: ensure skb entering GRO are not cloned. ... Link: https://lore.kernel.org/r/20211227103644.566694-1-dinguyen@kernel.org Signed-off-by: Olof Johansson <olof@lixom.net>
2022-01-05	Merge tag 'reset-fixes-for-v5.16-2' of git://git.pengutronix.de/pza/linux ↵	Olof Johansson
	into arm/fixes Reset controller fixes for v5.16, part 2 Fix pm_runtime_resume_and_get() error handling in the reset-rzg2l-usbphy-ctrl driver. * tag 'reset-fixes-for-v5.16-2' of git://git.pengutronix.de/pza/linux: reset: renesas: Fix Runtime PM usage reset: tegra-bpmp: Revert Handle errors in BPMP response Link: https://lore.kernel.org/r/20220105172515.273947-1-p.zabel@pengutronix.de Signed-off-by: Olof Johansson <olof@lixom.net>
2022-01-05	libbpf 1.0: Deprecate bpf_object__find_map_by_offset() API	Christy Lee
	API created with simplistic assumptions about BPF map definitions. It hasn’t worked for a while, deprecate it in preparation for libbpf 1.0. [0] Closes: https://github.com/libbpf/libbpf/issues/302 Signed-off-by: Christy Lee <christylee@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220105003120.2222673-1-christylee@fb.com
2022-01-05	libbpf 1.0: Deprecate bpf_map__is_offload_neutral()	Christy Lee
	Deprecate bpf_map__is_offload_neutral(). It’s most probably broken already. PERF_EVENT_ARRAY isn’t the only map that’s not suitable for hardware offloading. Applications can directly check map type instead. [0] Closes: https://github.com/libbpf/libbpf/issues/306 Signed-off-by: Christy Lee <christylee@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220105000601.2090044-1-christylee@fb.com
2022-01-05	tracing: Tag trace_percpu_buffer as a percpu pointer	Naveen N. Rao
	Tag trace_percpu_buffer as a percpu pointer to resolve warnings reported by sparse: /linux/kernel/trace/trace.c:3218:46: warning: incorrect type in initializer (different address spaces) /linux/kernel/trace/trace.c:3218:46: expected void const [noderef] __percpu __vpp_verify /linux/kernel/trace/trace.c:3218:46: got struct trace_buffer_struct /linux/kernel/trace/trace.c:3234:9: warning: incorrect type in initializer (different address spaces) /linux/kernel/trace/trace.c:3234:9: expected void const [noderef] __percpu __vpp_verify /linux/kernel/trace/trace.c:3234:9: got int Link: https://lkml.kernel.org/r/ebabd3f23101d89cb75671b68b6f819f5edc830b.1640255304.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Fixes: 07d777fe8c398 ("tracing: Add percpu buffers for trace_printk()") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-05	tracing: Fix check for trace_percpu_buffer validity in get_trace_buf()	Naveen N. Rao
	With the new osnoise tracer, we are seeing the below splat: Kernel attempted to read user page (c7d880000) - exploit attempt? (uid: 0) BUG: Unable to handle kernel data access on read at 0xc7d880000 Faulting instruction address: 0xc0000000002ffa10 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries ... NIP [c0000000002ffa10] __trace_array_vprintk.part.0+0x70/0x2f0 LR [c0000000002ff9fc] __trace_array_vprintk.part.0+0x5c/0x2f0 Call Trace: [c0000008bdd73b80] [c0000000001c49cc] put_prev_task_fair+0x3c/0x60 (unreliable) [c0000008bdd73be0] [c000000000301430] trace_array_printk_buf+0x70/0x90 [c0000008bdd73c00] [c0000000003178b0] trace_sched_switch_callback+0x250/0x290 [c0000008bdd73c90] [c000000000e70d60] __schedule+0x410/0x710 [c0000008bdd73d40] [c000000000e710c0] schedule+0x60/0x130 [c0000008bdd73d70] [c000000000030614] interrupt_exit_user_prepare_main+0x264/0x270 [c0000008bdd73de0] [c000000000030a70] syscall_exit_prepare+0x150/0x180 [c0000008bdd73e10] [c00000000000c174] system_call_vectored_common+0xf4/0x278 osnoise tracer on ppc64le is triggering osnoise_taint() for negative duration in get_int_safe_duration() called from trace_sched_switch_callback()->thread_exit(). The problem though is that the check for a valid trace_percpu_buffer is incorrect in get_trace_buf(). The check is being done after calculating the pointer for the current cpu, rather than on the main percpu pointer. Fix the check to be against trace_percpu_buffer. Link: https://lkml.kernel.org/r/a920e4272e0b0635cf20c444707cbce1b2c8973d.1640255304.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: e2ace001176dc9 ("tracing: Choose static tp_printk buffer by explicit nesting count") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-05	libbpf: Support repeated legacy kprobes on same function	Qiang Wang
	If repeated legacy kprobes on same function in one process, libbpf will register using the same probe name and got -EBUSY error. So append index to the probe name format to fix this problem. Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Qiang Wang <wangqiang.wq.frank@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211227130713.66933-2-wangqiang.wq.frank@bytedance.com
2022-01-05	libbpf: Use probe_name for legacy kprobe	Qiang Wang
	Fix a bug in commit 46ed5fc33db9, which wrongly used the func_name instead of probe_name to register legacy kprobe. Fixes: 46ed5fc33db9 ("libbpf: Refactor and simplify legacy kprobe code") Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Qiang Wang <wangqiang.wq.frank@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Hengqi Chen <hengqi.chen@gmail.com> Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com> Link: https://lore.kernel.org/bpf/20211227130713.66933-1-wangqiang.wq.frank@bytedance.com
2022-01-05	ftrace/samples: Add missing prototypes direct functions	Jiri Olsa
	There's another compilation fail (first here [1]) reported by kernel test robot for W=1 clang build: >> samples/ftrace/ftrace-direct-multi-modify.c:7:6: warning: no previous prototype for function 'my_direct_func1' [-Wmissing-prototypes] void my_direct_func1(unsigned long ip) Direct functions in ftrace direct sample modules need to have prototypes defined. They are already global in order to be visible for the inline assembly, so there's no problem. The kernel test robot reported just error for ftrace-direct-multi-modify, but I got same errors also for the rest of the modules touched by this patch. [1] 67d4f6e3bf5d ftrace/samples: Add missing prototype for my_direct_func Link: https://lkml.kernel.org/r/20211219135317.212430-1-jolsa@kernel.org Reported-by: kernel test robot <lkp@intel.com> Fixes: e1067a07cfbc ("ftrace/samples: Add module to test multi direct modify interface") Fixes: ae0cc3b7e7f5 ("ftrace/samples: Add a sample module that implements modify_ftrace_direct()") Fixes: 156473a0ff4f ("ftrace: Add another example of register_ftrace_direct() use case") Fixes: b06457c83af6 ("ftrace: Add sample module that uses register_ftrace_direct()") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-05	libbpf: Deprecate bpf_perf_event_read_simple() API	Christy Lee
	With perf_buffer__poll() and perf_buffer__consume() APIs available, there is no reason to expose bpf_perf_event_read_simple() API to users. If users need custom perf buffer, they could re-implement the function. Mark bpf_perf_event_read_simple() and move the logic to a new static function so it can still be called by other functions in the same file. [0] Closes: https://github.com/libbpf/libbpf/issues/310 Signed-off-by: Christy Lee <christylee@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211229204156.13569-1-christylee@fb.com
2022-01-05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05	bpf: Add SO_RCVBUF/SO_SNDBUF in _bpf_getsockopt().	Kuniyuki Iwashima
	This patch exposes SO_RCVBUF/SO_SNDBUF through bpf_getsockopt(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220104013153.97906-3-kuniyu@amazon.co.jp
2022-01-05	bpf: Fix SO_RCVBUF/SO_SNDBUF handling in _bpf_setsockopt().	Kuniyuki Iwashima
	The commit 4057765f2dee ("sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values") added a change to prevent underflow in setsockopt() around SO_SNDBUF/SO_RCVBUF. This patch adds the same change to _bpf_setsockopt(). Fixes: 4057765f2dee ("sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220104013153.97906-2-kuniyu@amazon.co.jp
2022-01-05	Merge tag 'net-5.16-final' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski" "Networking fixes, including fixes from bpf, and WiFi. One last pull request, turns out some of the recent fixes did more harm than good. Current release - regressions: - Revert "xsk: Do not sleep in poll() when need_wakeup set", made the problem worse - Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register", broke EPROBE_DEFER handling - Revert "net: usb: r8152: Add MAC pass-through support for more Lenovo Docks", broke setups without a Lenovo dock Current release - new code bugs: - selftests: set amt.sh executable Previous releases - regressions: - batman-adv: mcast: don't send link-local multicast to mcast routers Previous releases - always broken: - ipv4/ipv6: check attribute length for RTA_FLOW / RTA_GATEWAY - sctp: hold endpoint before calling cb in sctp_transport_lookup_process - mac80211: mesh: embed mesh_paths and mpp_paths into ieee80211_if_mesh to avoid complicated handling of sub-object allocation failures - seg6: fix traceroute in the presence of SRv6 - tipc: fix a kernel-infoleak in __tipc_sendmsg()" * tag 'net-5.16-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits) selftests: set amt.sh executable Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks" sfc: The RX page_ring is optional iavf: Fix limit of total number of queues to active queues of VF i40e: Fix incorrect netdev's real number of RX/TX queues i40e: Fix for displaying message regarding NVM version i40e: fix use-after-free in i40e_sync_filters_subtask() i40e: Fix to not show opcode msg on unsuccessful VF MAC change ieee802154: atusb: fix uninit value in atusb_set_extended_addr mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh mac80211: initialize variable have_higher_than_11mbit sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc netrom: fix copying in user data in nr_setsockopt udp6: Use Segment Routing Header for dest address if present icmp: ICMPV6: Examine invoking packet for Segment Route Headers. seg6: export get_srh() for ICMP handling Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register" ipv6: Do cleanup if attribute validation fails in multipath route ipv6: Continue processing multipath route even if gateway attribute is invalid net/fsl: Remove leftover definition in xgmac_mdio ...
2022-01-05	bpf: Fix verifier support for validation of async callbacks	Kris Van Hees
	Commit bfc6bb74e4f1 ("bpf: Implement verifier support for validation of async callbacks.") added support for BPF_FUNC_timer_set_callback to the __check_func_call() function. The test in __check_func_call() is flaweed because it can mis-interpret a regular BPF-to-BPF pseudo-call as a BPF_FUNC_timer_set_callback callback call. Consider the conditional in the code: if (insn->code == (BPF_JMP \| BPF_CALL) && insn->imm == BPF_FUNC_timer_set_callback) { The BPF_FUNC_timer_set_callback has value 170. This means that if you have a BPF program that contains a pseudo-call with an instruction delta of 170, this conditional will be found to be true by the verifier, and it will interpret the pseudo-call as a callback. This leads to a mess with the verification of the program because it makes the wrong assumptions about the nature of this call. Solution: include an explicit check to ensure that insn->src_reg == 0. This ensures that calls cannot be mis-interpreted as an async callback call. Fixes: bfc6bb74e4f1 ("bpf: Implement verifier support for validation of async callbacks.") Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220105210150.GH1559@oracle.com
2022-01-05	bpf, docs: Fully document the JMP mode modifiers	Christoph Hellwig
	Add a description for all the modifiers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-7-hch@lst.de
2022-01-05	bpf, docs: Fully document the JMP opcodes	Christoph Hellwig
	Add pseudo-code to document all the different BPF_JMP / BPF_JMP64 opcodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-6-hch@lst.de
2022-01-05	bpf, docs: Fully document the ALU opcodes	Christoph Hellwig
	Add pseudo-code to document all the different BPF_ALU / BPF_ALU64 opcodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-5-hch@lst.de
2022-01-05	bpf, docs: Document the opcode classes	Christoph Hellwig
	Add a description for each opcode class. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-4-hch@lst.de
2022-01-05	bpf, docs: Add subsections for ALU and JMP instructions	Christoph Hellwig
	Add a little more stucture to the ALU/JMP documentation with sections and improve the example text. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-3-hch@lst.de
2022-01-05	bpf, docs: Add a setion to explain the basic instruction encoding	Christoph Hellwig
	The eBPF instruction set document does not currently document the basic instruction encoding. Add a section to do that. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103183556.41040-2-hch@lst.de
2022-01-05	can: isotp: convert struct tpcon::{idx,len} to unsigned int	Marc Kleine-Budde
	In isotp_rcv_ff() 32 bit of data received over the network is assigned to struct tpcon::len. Later in that function the length is checked for the maximal supported length against MAX_MSG_LENGTH. As struct tpcon::len is an "int" this check does not work, if the provided length overflows the "int". Later on struct tpcon::idx is compared against struct tpcon::len. To fix this problem this patch converts both struct tpcon::{idx,len} to unsigned int. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/all/20220105132429.1170627-1-mkl@pengutronix.de Cc: stable@vger.kernel.org Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Reported-by: syzbot+4c63f36709a642f801c5@syzkaller.appspotmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-01-05	can: gs_usb: fix use of uninitialized variable, detach device on reception ↵	Marc Kleine-Budde
	of invalid USB data The received data contains the channel the received data is associated with. If the channel number is bigger than the actual number of channels assume broken or malicious USB device and shut it down. This fixes the error found by clang: \| drivers/net/can/usb/gs_usb.c:386:6: error: variable 'dev' is used \| uninitialized whenever 'if' condition is true \| if (hf->channel >= GS_MAX_INTF) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ \| drivers/net/can/usb/gs_usb.c:474:10: note: uninitialized use occurs here \| hf, dev->gs_hf_size, gs_usb_receive_bulk_callback, \| ^~~ Link: https://lore.kernel.org/all/20211210091158.408326-1-mkl@pengutronix.de Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices") Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-01-05	RDMA/core: Don't infoleak GRH fields	Leon Romanovsky
	If dst->is_global field is not set, the GRH fields are not cleared and the following infoleak is reported. ===================================================== BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 instrument_copy_to_user include/linux/instrumented.h:121 [inline] _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 copy_to_user include/linux/uaccess.h:209 [inline] ucma_init_qp_attr+0x8c7/0xb10 drivers/infiniband/core/ucma.c:1242 ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732 vfs_write+0x8ce/0x2030 fs/read_write.c:588 ksys_write+0x28b/0x510 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __ia32_sys_write+0xdb/0x120 fs/read_write.c:652 do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline] __do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180 do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205 do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c Local variable resp created at: ucma_init_qp_attr+0xa4/0xb10 drivers/infiniband/core/ucma.c:1214 ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732 Bytes 40-59 of 144 are uninitialized Memory access of size 144 starts at ffff888167523b00 Data copied to user address 0000000020000100 CPU: 1 PID: 25910 Comm: syz-executor.1 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ===================================================== Fixes: 4ba66093bdc6 ("IB/core: Check for global flag when using ah_attr") Link: https://lore.kernel.org/r/0e9dd51f93410b7b2f4f5562f52befc878b71afa.1641298868.git.leonro@nvidia.com Reported-by: syzbot+6d532fa8f9463da290bc@syzkaller.appspotmail.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05	bpf, selftests: Add verifier test for mem_or_null register with offset.	Daniel Borkmann
	Add a new test case with mem_or_null typed register with off > 0 to ensure it gets rejected by the verifier: # ./test_verifier 1011 #1009/u check with invalid reg offset 0 OK #1009/p check with invalid reg offset 0 OK Summary: 2 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-05	bpf: Don't promote bogus looking registers after null check.	Daniel Borkmann
	If we ever get to a point again where we convert a bogus looking <ptr>_or_null typed register containing a non-zero fixed or variable offset, then lets not reset these bounds to zero since they are not and also don't promote the register to a <ptr> type, but instead leave it as <ptr>_or_null. Converting to a unknown register could be an avenue as well, but then if we run into this case it would allow to leak a kernel pointer this way. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>