summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-10-11bpf, sockmap: Remove dropped data on errors in redirect caseJohn Fastabend
In the sk_skb redirect case we didn't handle the case where we overrun the sk_rmem_alloc entry on ingress redirect or sk_wmem_alloc on egress. Because we didn't have anything implemented we simply dropped the skb. This meant data could be dropped if socket memory accounting was in place. This fixes the above dropped data case by moving the memory checks later in the code where we actually do the send or recv. This pushes those checks into the workqueue and allows us to return an EAGAIN error which in turn allows us to try again later from the workqueue. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226863689.5692.13861422742592309285.stgit@john-Precision-5820-Tower
2020-10-11bpf, sockmap: Remove skb_set_owner_w wmem will be taken later from sendpageJohn Fastabend
The skb_set_owner_w is unnecessary here. The sendpage call will create a fresh skb and set the owner correctly from workqueue. Its also not entirely harmless because it consumes cycles, but also impacts resource accounting by increasing sk_wmem_alloc. This is charging the socket we are going to send to for the skb, but we will put it on the workqueue for some time before this happens so we are artifically inflating sk_wmem_alloc for this period. Further, we don't know how many skbs will be used to send the packet or how it will be broken up when sent over the new socket so charging it with one big sum is also not correct when the workqueue may break it up if facing memory pressure. Seeing we don't know how/when this is going to be sent drop the early accounting. A later patch will do proper accounting charged on receive socket for the case where skbs get enqueued on the workqueue. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226861708.5692.17964237936462425136.stgit@john-Precision-5820-Tower
2020-10-11bpf, sockmap: On receive programs try to fast track SK_PASS ingressJohn Fastabend
When we receive an skb and the ingress skb verdict program returns SK_PASS we currently set the ingress flag and put it on the workqueue so it can be turned into a sk_msg and put on the sk_msg ingress queue. Then finally telling userspace with data_ready hook. Here we observe that if the workqueue is empty then we can try to convert into a sk_msg type and call data_ready directly without bouncing through a workqueue. Its a common pattern to have a recv verdict program for visibility that always returns SK_PASS. In this case unless there is an ENOMEM error or we overrun the socket we can avoid the workqueue completely only using it when we fall back to error cases caused by memory pressure. By doing this we eliminate another case where data may be dropped if errors occur on memory limits in workqueue. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226859704.5692.12929678876744977669.stgit@john-Precision-5820-Tower
2020-10-11bpf, sockmap: Skb verdict SK_PASS to self already checked rmem limitsJohn Fastabend
For sk_skb case where skb_verdict program returns SK_PASS to continue to pass packet up the stack, the memory limits were already checked before enqueuing in skb_queue_tail from TCP side. So, lets remove the extra checks here. The theory is if the TCP stack believes we have memory to receive the packet then lets trust the stack and not double check the limits. In fact the accounting here can cause a drop if sk_rmem_alloc has increased after the stack accepted this packet, but before the duplicate check here. And worse if this happens because TCP stack already believes the data has been received there is no retransmit. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226857664.5692.668205469388498375.stgit@john-Precision-5820-Tower
2020-10-12Merge tag 'amd-drm-fixes-5.10-2020-10-09' of ↵Dave Airlie
git://people.freedesktop.org/~agd5f/linux into drm-next amd-drm-fixes-5.10-2020-10-09: amdgpu: - Clean up indirect register access - Navy Flounder fixes - SMU11 AC/DC interrupt fixes - GPUVM alignment fix - Display fixes - Misc other fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201009222810.4030-1-alexander.deucher@amd.com
2020-10-12kbuild: enforce -Werror=return-typeOlaf Hering
Catch errors which at least gcc tolerates by default: warning: 'return' with no value, in function returning non-void [-Wreturn-type] Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-10-12ipvs: clear skb->tstamp in forwarding pathJulian Anastasov
fq qdisc requires tstamp to be cleared in forwarding path Reported-by: Evgeny B <abt-admin@mail.ru> Link: https://bugzilla.kernel.org/show_bug.cgi?id=209427 Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths") Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12selftests: netfilter: extend nfqueue test caseFlorian Westphal
add a test with re-queueing: usespace doesn't pass accept verdict, but tells to re-queue to another nf_queue instance. Also, make the second nf-queue program use non-gso mode, kernel will have to perform software segmentation. Lastly, do not queue every packet, just one per second, and add delay when re-injecting the packet to the kernel. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12netfilter: flowtable: reduce calls to pskb_may_pull()Pablo Neira Ayuso
Make two unfront calls to pskb_may_pull() to linearize the network and transport header. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12netfilter: nf_tables: add inet ingress supportPablo Neira Ayuso
This patch adds a new ingress hook for the inet family. The inet ingress hook emulates the IP receive path code, therefore, unclean packets are drop before walking over the ruleset in this basechain. This patch also introduces the nft_base_chain_netdev() helper function to check if this hook is bound to one or more devices (through the hook list infrastructure). This check allows to perform the same handling for the inet ingress as it would be a netdev ingress chain from the control plane. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12netfilter: add inet ingress supportPablo Neira Ayuso
This patch adds the NF_INET_INGRESS pseudohook for the NFPROTO_INET family. This is a mapping this new hook to the existing NFPROTO_NETDEV and NF_NETDEV_INGRESS hook. The hook does not guarantee that packets are inet only, users must filter out non-ip traffic explicitly. This infrastructure makes it easier to support this new hook in nf_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12netfilter: add nf_ingress_hook() helper functionPablo Neira Ayuso
Add helper function to check if this is an ingress hook. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12netfilter: add nf_static_key_{inc,dec}Pablo Neira Ayuso
Add helper functions increment and decrement the hook static keys. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12ipvs: inspect reply packets from DR/TUN real serverslongguang.yue
Just like for MASQ, inspect the reply packets coming from DR/TUN real servers and alter the connection's state and timeout according to the protocol. It's ipvs's duty to do traffic statistic if packets get hit, no matter what mode it is. Signed-off-by: longguang.yue <bigclouds@163.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12Merge tag 'drm-intel-next-fixes-2020-10-02' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Propagated from drm-intel-next-queued: - Fix CRTC state checker (Ville) Propated from drm-intel-gt-next: - Avoid implicit vmpa for highmem on 32b (Chris) - Prevent PAT attriutes for writecombine if CPU doesn't support PAT (Chris) - Clear the buffer pool age before use. (Chris) - Fix error code (Dan) - Break up error capture compression loops (Chris) - Fix uninitialized variable in context_create_request (Maarten) - Check for errors on i915_vm_alloc_pt_stash to avoid NULL dereference (Matt) - Serialize debugfs i915_gem_objects with ctx->mutex (Chris) - Fix a rebase mistake caused during drm-intel-gt-next creation (Chris) - Hold request reference for canceling an active context (Chris) - Heartbeats fixes (Chris) - Use usigned during batch copies (Chris) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201002182610.GA2204465@intel.com
2020-10-11um: vector: Add dynamic tap interfaces and scriptingAnton Ivanov
Provide functionality roughly compatible with the existing qemu ifup scripting: * invocation of an ifup script. The interface name is passed as the first and only argument * allocating tap interfaces on the fly if they are not explicitly specified Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11um: Clean up stacktrace dumpJohannes Berg
We currently get a few stray newlines, due to the interaction between printk() and the code here. Remove a few explicit newline prints to neaten the output. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11um: Fix incorrect assumptions about max pid lengthMaciej Żenczykowski
pids are no longer limited to 16-bits, bump to 32-bits, ie. 9 decimal characters. Additionally sizeof("/") already returns 2 - ie. it already accounts for trailing zero. Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Linux UM Mailing List <linux-um@lists.infradead.org> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11um: Remove dead usage of TIF_IA32Gabriel Krisman Bertazi
This seems like a dead artifact since TIF_IA32 is not even defined as a TI flag for UM. Looking back in git history, it made sense in the old days, but it is apparently not used since UM was split out of the x86 arch/. It is also going away from the x86 tree soon. Also, I think the variable clean up it performs is not needed as 64-bit UML doesn't run 32-bit binaries as far as I can tell, and 32-bit UML has 32-bit ulong. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11um: Remove redundant NULL checkLi Heng
Fix below warnings reported by coccicheck: ./arch/um/drivers/vector_user.c:403:2-7: WARNING: NULL check before some freeing functions is not needed. Fixes: bc8f8e4e6e7a ("um: Add a generic "fd" vector transport") Signed-off-by: Li Heng <liheng40@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11um: change sigio_spinlock to a mutexJohannes Berg
Lockdep complains at boot: ============================= [ BUG: Invalid wait context ] 5.7.0-05093-g46d91ecd597b #98 Not tainted ----------------------------- swapper/1 is trying to lock: 0000000060931b98 (&desc[i].request_mutex){+.+.}-{3:3}, at: __setup_irq+0x11d/0x623 other info that might help us debug this: context-{4:4} 1 lock held by swapper/1: #0: 000000006074fed8 (sigio_spinlock){+.+.}-{2:2}, at: sigio_lock+0x1a/0x1c stack backtrace: CPU: 0 PID: 1 Comm: swapper Not tainted 5.7.0-05093-g46d91ecd597b #98 Stack: 7fa4fab0 6028dfd1 0000002a 6008bea5 7fa50700 7fa50040 7fa4fac0 6028e016 7fa4fb50 6007f6da 60959c18 00000000 Call Trace: [<60023a0e>] show_stack+0x13b/0x155 [<6028e016>] dump_stack+0x2a/0x2c [<6007f6da>] __lock_acquire+0x515/0x15f2 [<6007eb50>] lock_acquire+0x245/0x273 [<6050d9f1>] __mutex_lock+0xbd/0x325 [<6050dc76>] mutex_lock_nested+0x1d/0x1f [<6008e27e>] __setup_irq+0x11d/0x623 [<6008e8ed>] request_threaded_irq+0x169/0x1a6 [<60021eb0>] um_request_irq+0x1ee/0x24b [<600234ee>] write_sigio_irq+0x3b/0x76 [<600383ca>] sigio_broken+0x146/0x2e4 [<60020bd8>] do_one_initcall+0xde/0x281 Because we hold sigio_spinlock and then get into requesting an interrupt with a mutex. Change the spinlock to a mutex to avoid that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11Linux 5.9v5.9Linus Torvalds
2020-10-11um: time-travel: Return the sequence number in ACK messagesJohannes Berg
For external time travel, the protocol says to return the incoming sequence number in the ACK message to aid debugging, so do that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11um: time-travel: Fix IRQ handling in time_travel_handle_message()Johannes Berg
As the comment here indicates, we need to do the polling in the idle loop without blocking interrupts, since interrupts can be vhost-user messages that we must process even while in our idle loop. I don't know why I explained one thing and implemented another, but we have indeed observed random hangs due to this, depending on the timing of the messages. Fixes: 88ce64249233 ("um: Implement time-travel=ext") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11um: Allow static linking for non-glibc implementationsIgnat Korchagin
It is possible to produce a statically linked UML binary with UML_NET_VECTOR, UML_NET_VDE and UML_NET_PCAP options enabled using alternative libc implementations, which do not rely on NSS, such as musl. Allow static linking in this case. Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11um: Some fixes to build UML with muslIgnat Korchagin
musl toolchain and headers are a bit more strict. These fixes enable building UML with musl as well as seem not to break on glibc. Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11um: vector: Use GFP_ATOMIC under spin lockTiezhu Yang
Use GFP_ATOMIC instead of GFP_KERNEL under spin lock to fix possible sleep-in-atomic-context bugs. Fixes: 9807019a62dc ("um: Loadable BPF "Firmware" for vector drivers") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11um: Fix null pointer dereference in vector_user_bpfGaurav Singh
The bpf_prog is being checked for !NULL after uml_kmalloc but later its used directly for example: bpf_prog->filter = bpf and is also later returned upon success. Fix this, do a NULL check and return right away. Signed-off-by: Gaurav Singh <gaurav1086@gmail.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11Merge tag 'cfi/for-5.10' of ↵Richard Weinberger
gitolite.kernel.org:pub/scm/linux/kernel/git/mtd/linux into mtd/next HyperBus changes * DMA support for TI's AM654 HyperBus controller driver. * HyperBus frontend driver for Renesas RPC-IF driver.
2020-10-11Merge tag 'spi-nor/for-5.10' of ↵Richard Weinberger
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux into mtd/next SPI NOR core changes: - Support for Winbond w25q64jwm flash - Enable 4K sector support for mx25l12805d SPI NOR controller drivers changes: - intel-spi: - Add Alder Lake-S PCI ID
2020-10-11Merge tag 'nand/for-5.10' of ↵Richard Weinberger
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux into mtd/next NAND core changes: * Use the new generic ECC object * Create helpers to set/extract the ECC requirements * Create a helper to extract the ECC configuration * Add a NAND page I/O request type * Introduce the ECC engine framework Raw NAND core changes: * Don't overwrite the error code from nand_set_ecc_soft_ops() * Introduce nand_set_ecc_on_host_ops() * Use the NAND framework user_conf object for ECC flags * Use the ECC framework user input parsing bits * Use the ECC framework nand_ecc_is_strong_enough() helper * Use the ECC framework OOB layouts * Make use of the ECC framework * Use nanddev_get/set_ecc_requirements() when relevant * Use the new ECC engine type enumeration * Separate the ECC engine type and the ECC byte placement * Move the nand_ecc_algo enum to the generic NAND layer * Rename the ECC algorithm enumeration items * Add a kernel doc to the ECC algorithm enumeration * DT bindings: - Document boolean NAND ECC properties - Document nand-ecc-engine - Document nand-ecc-placement Raw NAND drivers changes: * Ams-Delta: Fix non-OF build warning * Atmel: - Check return values for nand_read_data_op - Simplify with dev_err_probe() - Get rid of the legacy interface implementation - Convert the driver to exec_op() - Use nand_prog_page_end_op() - Use nand_{write,read}_data_op() - Drop redundant nand_read_page_op() - Enable the NFC controller at probe time - Disable clk on error handling path in probe * Cadence: remove a redundant dev_err call * Gpmi: - Simplify with dev_err_probe() * Marvell: - Fix and update kerneldoc - Simplify with dev_err_probe() - Fix and update kerneldoc - Simplify with dev_err_probe() - Support panic_write for mtdoops * Onenand: - Simplify the return expression of onenand_transfer_auto_oob - Simplify with dev_err_probe() * Oxnas: cleanup/simplify code * Pasemi: Make pasemi_device_ready() static * Qcom: Simplify with dev_err_probe() * Stm32_fmc2: fix a buffer overflow * Vf610: Remove unused function vf610_nfc_transfer_size() SPI-NAND changes: * Use nanddev_get_ecc_conf() when relevant * Gigadevice: - Add support for GD5F4GQ4xC - Add QE Bit - Use only one dummy byte in QUADIO * Macronix: - Add support for MX31UF1GE4BC - Add support for MX31LF1GE4BC
2020-10-11ubifs: mount_ubifs: Release authentication resource in error handling pathZhihao Cheng
Release the authentication related resource in some error handling branches in mount_ubifs(). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> # 4.20+ Fixes: d8a22773a12c6d7 ("ubifs: Enable authentication support") Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11ubifs: Don't parse authentication mount options in remount processZhihao Cheng
There is no need to dump authentication options while remounting, because authentication initialization can only be doing once in the first mount process. Dumping authentication mount options in remount process may cause memory leak if UBIFS has already been mounted with old authentication mount options. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> # 4.20+ Fixes: d8a22773a12c6d7 ("ubifs: Enable authentication support") Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11ubifs: Fix a memleak after dumping authentication mount optionsZhihao Cheng
Fix a memory leak after dumping authentication mount options in error handling branch. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> # 4.20+ Fixes: d8a22773a12c6d7 ("ubifs: Enable authentication support") Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11bpf: Migrate from patchwork.ozlabs.org to patchwork.kernel.org.Alexei Starovoitov
Move the bpf/bpf-next patch processing queue to patchwork.kernel.org. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201011200149.66537-1-alexei.starovoitov@gmail.com
2020-10-11bpf: Always return target ifindex in bpf_fib_lookupToke Høiland-Jørgensen
The bpf_fib_lookup() helper performs a neighbour lookup for the destination IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation that the BPF program will pass the packet up the stack in this case. However, with the addition of bpf_redirect_neigh() that can be used instead to perform the neighbour lookup, at the cost of a bit of duplicated work. For that we still need the target ifindex, and since bpf_fib_lookup() already has that at the time it performs the neighbour lookup, there is really no reason why it can't just return it in any case. So let's just always return the ifindex if the FIB lookup itself succeeds. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Link: https://lore.kernel.org/bpf/20201009184234.134214-1-toke@redhat.com
2020-10-11Merge branch 'samples: bpf: Refactor XDP programs with libbpf'Alexei Starovoitov
"Daniel T. Lee" says: ==================== To avoid confusion caused by the increasing fragmentation of the BPF Loader program, this commit would like to convert the previous bpf_load loader with the libbpf loader. Thanks to libbpf's bpf_link interface, managing the tracepoint BPF program is much easier. bpf_program__attach_tracepoint manages the enable of tracepoint event and attach of BPF programs to it with a single interface bpf_link, so there is no need to manage event_fd and prog_fd separately. And due to addition of generic bpf_program__attach() to libbpf, it is now possible to attach BPF programs with __attach() instead of explicitly calling __attach_<type>(). This patchset refactors xdp_monitor with using this libbpf API, and the bpf_load is removed and migrated to libbpf. Also, attach_tracepoint() is replaced with the generic __attach() method in xdp_redirect_cpu. Moreover, maps in kern program have been converted to BTF-defined map. --- Changes in v2: - added cleanup logic for bpf_link and bpf_object in xdp_monitor - program section match with bpf_program__is_<type> instead of strncmp - revert BTF key/val type to default of BPF_MAP_TYPE_PERF_EVENT_ARRAY - split increment into seperate satement - refactor pointer array initialization - error code cleanup ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-10-11samples: bpf: Refactor XDP kern program maps with BTF-defined mapDaniel T. Lee
Most of the samples were converted to use the new BTF-defined MAP as they moved to libbpf, but some of the samples were missing. Instead of using the previous BPF MAP definition, this commit refactors xdp_monitor and xdp_sample_pkts_kern MAP definition with the new BTF-defined MAP format. Also, this commit removes the max_entries attribute at PERF_EVENT_ARRAY map type. The libbpf's bpf_object__create_map() will automatically set max_entries to the maximum configured number of CPUs on the host. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010181734.1109-4-danieltimlee@gmail.com
2020-10-11samples: bpf: Replace attach_tracepoint() to attach() in xdp_redirect_cpuDaniel T. Lee
>From commit d7a18ea7e8b6 ("libbpf: Add generic bpf_program__attach()"), for some BPF programs, it is now possible to attach BPF programs with __attach() instead of explicitly calling __attach_<type>(). This commit refactors the __attach_tracepoint() with libbpf's generic __attach() method. In addition, this refactors the logic of setting the map FD to simplify the code. Also, the missing removal of bpf_load.o in Makefile has been fixed. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010181734.1109-3-danieltimlee@gmail.com
2020-10-11samples: bpf: Refactor xdp_monitor with libbpfDaniel T. Lee
To avoid confusion caused by the increasing fragmentation of the BPF Loader program, this commit would like to change to the libbpf loader instead of using the bpf_load. Thanks to libbpf's bpf_link interface, managing the tracepoint BPF program is much easier. bpf_program__attach_tracepoint manages the enable of tracepoint event and attach of BPF programs to it with a single interface bpf_link, so there is no need to manage event_fd and prog_fd separately. This commit refactors xdp_monitor with using this libbpf API, and the bpf_load is removed and migrated to libbpf. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010181734.1109-2-danieltimlee@gmail.com
2020-10-11Merge branch 'Offload-tc-vlan-mangle-to-mscc_ocelot-switch'Jakub Kicinski
Vladimir Oltean says: ==================== Offload tc-vlan mangle to mscc_ocelot switch This series offloads one more action to the VCAP IS1 ingress TCAM, which is to change the classified VLAN for packets, according to the VCAP IS1 keys (VLAN, source MAC, source IP, EtherType, etc). ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11selftests: net: mscc: ocelot: add test for VLAN modify actionVladimir Oltean
Create a test that changes a VLAN ID from 200 to 300. We also need to modify the preferences of the filters installed for the other rules so that they are unique, because we now install the "tc-vlan modify" filter in VCAP IS1 only temporarily, and we need to perform the deletion by filter preference number. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11net: dsa: tag_ocelot: use VLAN information from tagging header when availableVladimir Oltean
When the Extraction Frame Header contains a valid classified VLAN, use that instead of the VLAN header present in the packet. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11net: mscc: ocelot: offload VLAN mangle action to VCAP IS1Vladimir Oltean
The VCAP_IS1_ACT_VID_REPLACE_ENA action, from the VCAP IS1 ingress TCAM, changes the classified VLAN. We are only exposing this ability for switch ports that are under VLAN aware bridges. This is because in standalone ports mode and under a bridge with vlan_filtering=0, the ocelot driver configures the switch to operate as VLAN-unaware, so the classified VLAN is not derived from the 802.1Q header from the packet, but instead is always equal to the port-based VLAN ID of the ingress port. We _can_ still change the classified VLAN for packets when operating in this mode, but the end result will most likely be a drop, since both the ingress and the egress port need to be members of the modified VLAN. And even if we install the new classified VLAN into the VLAN table of the switch, the result would still not be as expected: we wouldn't see, on the output port, the modified VLAN tag, but the original one, even though the classified VLAN was indeed modified. This is because of how the hardware works: on egress, what is pushed to the frame is a "port tag", which gives us the following options: - Tag all frames with port tag (derived from the classified VLAN) - Tag all frames with port tag, except if the classified VLAN is 0 or equal to the native VLAN of the egress port - No port tag Needless to say, in VLAN-unaware mode we are disabling the port tag. Otherwise, the existing VLAN tag would be ignored, and a second VLAN tag (the port tag), holding the classified VLAN, would be pushed (instead of replacing the existing 802.1Q tag). This is definitely not what the user wanted when installing a "vlan modify" action. So it is simply not worth bothering with VLAN modify rules under other configurations except when the ports are fully VLAN-aware. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "Five fixes. Subsystems affected by this patch series: MAINTAINERS, mm/pagemap, mm/swap, and mm/hugetlb" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged mm: validate inode in mapping_set_error() mm: mmap: Fix general protection fault in unlink_file_vma() MAINTAINERS: Antoine Tenart's email address MAINTAINERS: change hardening mailing list
2020-10-11Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fix from Al Viro: "Fixes an obvious bug (memory leak introduced in 5.8)" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: pipe: Fix memory leaks in create_pipe_files()
2020-10-11Merge branch 'enetc-Migrate-to-PHYLINK-and-PCS_LYNX'Jakub Kicinski
Claudiu Manoil says: ==================== enetc: Migrate to PHYLINK and PCS_LYNX Transitioning the enetc driver from phylib to phylink. Offloading the serdes configuration to the PCS_LYNX module is a mandatory part of this transition. Aiming for a cleaner, more maintainable design, and better code reuse. The first 2 patches are clean up prerequisites. Tested on a p1028rdb board. v2: validate() explicitly rejects now all interface modes not supported by the driver instead of relying on the device tree to provide only supported interfaces, and dropped redundant activation of pcs_poll (addressing Ioana's findings) ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11enetc: Migrate to PHYLINK and PCS_LYNXClaudiu Manoil
This is a methodical transition of the driver from phylib to phylink, following the guidelines from sfp-phylink.rst. The MAC register configurations based on interface mode were moved from the probing path to the mac_config() hook. MAC enable and disable commands (enabling Rx and Tx paths at MAC level) were also extracted and assigned to their corresponding phylink hooks. As part of the migration to phylink, the serdes configuration from the driver was offloaded to the PCS_LYNX module, introduced in commit 0da4c3d393e4 ("net: phy: add Lynx PCS module"), the PCS_LYNX module being a mandatory component required to make the enetc driver work with phylink. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Ioana Ciornei <ioana.cionei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11arm64: dts: fsl-ls1028a-rdb: Specify in-band mode for ENETC port 0Claudiu Manoil
As part of the transition of the enetc ethernet driver from phylib to phylink, the in-band operation mode of the SGMII interface from enetc port 0 needs to be specified explicitly for phylink. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11enetc: Clean up serdes configurationClaudiu Manoil
Decouple internal mdio bus creation from serdes configuration, as a prerequisite to offloading serdes configuration to a different module. Group together mdio bus creation routines, cleanup. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>