path: root/include
Age | Commit message | Author
2020-12-04 | [SECURITY] fix namespaced fscaps when !CONFIG_SECURITY | Serge Hallyn
Namespaced file capabilities were introduced in 8db6c34f1dbc. When userspace reads an xattr for a namespaced capability, a virtualized representation of it is returned if the caller is in a user namespace owned by the capability's owning rootid. The function which performs this virtualization was not hooked up if CONFIG_SECURITY=n. Therefore in that case the original xattr was shown instead of the virtualized one.

To test this using libcap-bin (*1),

    $ v=$(mktemp)
    $ unshare -Ur setcap cap_sys_admin-eip $v
    $ unshare -Ur setcap -v cap_sys_admin-eip $v
    /tmp/tmp.lSiIFRvt8Y: OK

"setcap -v" verifies the values instead of setting them, and will check whether the rootid value is set. Therefore, with this bug un-fixed, and with CONFIG_SECURITY=n, setcap -v will fail:

    $ v=$(mktemp)
    $ unshare -Ur setcap cap_sys_admin=eip $v
    $ unshare -Ur setcap -v cap_sys_admin=eip $v
    nsowner[got=1000, want=0],/tmp/tmp.HHDiOOl9fY differs in []

Fix this bug by calling cap_inode_getsecurity() in security_inode_getsecurity() instead of returning -EOPNOTSUPP, when CONFIG_SECURITY=n.

*1 - note, if libcap is too old for getcap to have the '-n' option, then use verify-caps instead.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209689
Cc: Hervé Guillemet <herve@guillemet.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Serge Hallyn <shallyn@cisco.com>
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
2020-12-04 | PCI/PM: Rename pci_wakeup_bus() to pci_resume_bus() | Mika Westerberg
A "wakeup" is a signal from a device telling the system that the device or the whole system should be awakened and made active. PCI devices are made active by "resuming" them. pci_wakeup_bus() is not involved with the wakeup signal; it *resumes* devices on a bus (possibly in response to a wakeup signal, but that's at a higher level). Rename pci_wakeup_bus() to pci_resume_bus() to better reflect what it does. No functional change intended. [bhelgaas: commit log, reorder before removal of pci_wakeup_event()] Link: https://lore.kernel.org/r/20201125090733.77782-2-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-04 | block: fix incorrect branching in blk_max_size_offset() | Mike Snitzer
If non-zero 'chunk_sectors' is passed in to blk_max_size_offset() that override will be incorrectly ignored. Old blk_max_size_offset() branching, prior to commit 3ee16db390b4, must be used only if passed 'chunk_sectors' override is zero. Fixes: 3ee16db390b4 ("dm: fix IO splitting") Cc: stable@vger.kernel.org # 5.9 Reported-by: John Dorminy <jdorminy@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
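The branching described above can be sketched in plain userspace C. This is a minimal sketch, not the kernel function: `struct queue_limits_sketch`, `is_pow2()` and `max_size_offset_sketch()` are hypothetical names introduced for illustration, and the only behaviour taken from the commit message is that the queue-wide fallback applies only when the passed `chunk_sectors` override is zero.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two queue limits the helper consults. */
struct queue_limits_sketch {
	unsigned int max_sectors;
	unsigned int chunk_sectors;	/* 0 means "no chunk boundary" */
};

static int is_pow2(unsigned int n)
{
	return n && !(n & (n - 1));
}

/*
 * Return how many sectors an IO starting at 'offset' may cover.
 * A non-zero 'chunk_sectors' argument is an override and always wins;
 * the queue-wide limits are consulted only when the override is zero.
 */
static unsigned int max_size_offset_sketch(const struct queue_limits_sketch *lim,
					   uint64_t offset,
					   unsigned int chunk_sectors)
{
	unsigned int remaining;

	if (!chunk_sectors) {
		if (!lim->chunk_sectors)
			return lim->max_sectors;
		chunk_sectors = lim->chunk_sectors;
	}

	/* Sectors left before the next chunk boundary at this offset. */
	if (is_pow2(chunk_sectors))
		remaining = chunk_sectors - (unsigned int)(offset & (chunk_sectors - 1));
	else
		remaining = chunk_sectors - (unsigned int)(offset % chunk_sectors);

	return remaining < lim->max_sectors ? remaining : lim->max_sectors;
}
```

With a queue that has no chunk limit of its own, an override of 8 sectors still bounds an IO at offset 3 to 5 sectors, which is the case the buggy branching ignored.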
2020-12-04 | net-zerocopy: Defer vm zap unless actually needed. | Arjun Roy
Zapping pages is required only if we are calling vm_insert_page into a region where pages had previously been mapped. Receive zerocopy allows reusing such regions, and hitherto called zap_page_range() before calling vm_insert_page() in that range.

zap_page_range() can also be triggered from userspace with madvise(MADV_DONTNEED). If userspace is configured to call this before reusing a segment, or if there was nothing mapped at this virtual address to begin with, we can avoid calling zap_page_range() under the socket lock. That said, if userspace does not do that, then we are still responsible for calling zap_page_range().

This patch adds a flag that the user can use to hint to the kernel that a zap is not required. If the flag is not set, or if an older user application does not have a flags field at all, then the kernel calls zap_page_range as before. Also, if the flag is set but a zap is still required, the kernel performs that zap as necessary. Thus incorrectly indicating that a zap can be avoided does not change the correctness of operation.

It also increases the batchsize for vm_insert_pages and prefetches the page struct for the batch since we're about to bump the refcount.

An alternative mechanism could be to not have a flag, assume by default a zap is not needed, and fall back to zapping if needed. However, this would harm performance for older applications for which a zap is necessary, and thus we implement it with an explicit flag so newer applications can opt in.

When using RPC-style traffic with medium sized (tens of KB) RPCs, this change yields an efficiency improvement of about 30% for QPS/CPU usage.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04 | net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy. | Arjun Roy
When TCP receive zerocopy does not successfully map the entire requested space, it outputs a 'hint' that the caller should recvmsg(). Augment zerocopy to accept a user buffer that it tries to copy this hint into - if it is possible to copy the entire hint, it will do so. This elides a recvmsg() call for received traffic that isn't exactly page-aligned in size. This was tested with RPC-style traffic of arbitrary sizes. Normally, each received message required at least one getsockopt() call, and one recvmsg() call for the remaining unaligned data. With this change, almost all of the recvmsg() calls are eliminated, leading to a savings of about 25%-50% in number of system calls for RPC-style workloads. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04 | bpf: Add a bpf_sock_from_file helper | Florent Revest
While eBPF programs can check whether a file is a socket by file->f_op == &socket_file_ops, they cannot convert the void private_data pointer to a struct socket BTF pointer. In order to do this a new helper wrapping sock_from_file is added. This is useful to tracing programs but also other program types inheriting this set of helpers such as iterators or LSM programs. Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: KP Singh <kpsingh@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-2-revest@google.com
2020-12-04 | net: Remove the err argument from sock_from_file | Florent Revest
Currently, the sock_from_file prototype takes an "err" pointer that is either not set or set to -ENOTSOCK IFF the returned socket is NULL. This makes the error redundant and it is ignored by a few callers. This patch simplifies the API by letting callers deduce the error based on whether the returned socket is NULL or not. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-1-revest@google.com
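The simplified calling convention can be illustrated with a small userspace mock. This is a sketch under stated assumptions: `struct file_sketch`, `struct socket_sketch`, the `is_socket` flag and both functions are hypothetical stand-ins, not the kernel's types or its actual check; only the idea that a NULL return alone signals -ENOTSOCK comes from the commit message.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical mock types standing in for struct socket and struct file. */
struct socket_sketch {
	int type;
};

struct file_sketch {
	int is_socket;		/* assumption: replaces the real file-ops check */
	void *private_data;
};

/* New-style lookup: NULL means "not a socket"; there is no err out-parameter. */
static struct socket_sketch *sock_from_file_sketch(struct file_sketch *file)
{
	if (file->is_socket)
		return file->private_data;
	return NULL;
}

/* A caller deduces the error from the NULL return instead of reading *err. */
static int socket_type_of(struct file_sketch *file)
{
	struct socket_sketch *sock = sock_from_file_sketch(file);

	if (!sock)
		return -ENOTSOCK;
	return sock->type;
}
```

Callers that previously ignored the err value lose nothing, and callers that used it can substitute -ENOTSOCK themselves, as `socket_type_of()` does here.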
2020-12-04 | seg6: add support for the SRv6 End.DT4 behavior | Andrea Mayer
SRv6 End.DT4 is defined in the SRv6 Network Programming [1].

The SRv6 End.DT4 is used to implement IPv4 L3VPN use-cases in multi-tenant environments. It decapsulates the received packets and performs an IPv4 routing lookup in the routing table of the tenant.

The SRv6 End.DT4 Linux implementation leverages a VRF device in order to force the routing lookup into the associated routing table.

To make the End.DT4 work properly, it must be guaranteed that the routing table used for routing lookup operations is bound to one and only one VRF during the tunnel creation. Such a constraint has to be enforced by enabling the VRF strict_mode sysctl parameter, i.e.:

    $ sysctl -wq net.vrf.strict_mode=1

At JANOG44, LINE corporation presented their multi-tenant DC architecture using SRv6 [2]. In the slides, they reported that the Linux kernel is missing the support of SRv6 End.DT4 behavior.

The SRv6 End.DT4 behavior can be instantiated using a command similar to the following:

    $ ip route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0

We introduce the "vrftable" extension in iproute2 in a following patch.

[1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming
[2] https://speakerdeck.com/line_developers/line-data-center-networking-with-srv6

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04 | Merge tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm | Linus Torvalds
Pull device mapper fixes from Mike Snitzer:

 - Fix DM's bio splitting changes that were made during v5.9. This restores splitting in terms of varied per-target ti->max_io_len rather than use block core's single stacked 'chunk_sectors' limit.
 - Like DM crypt, update DM integrity to not use crypto drivers that have CRYPTO_ALG_ALLOCATES_MEMORY set.
 - Fix DM writecache target's argument parsing and status display.
 - Remove needless BUG() from dm writecache's persistent_memory_claim()
 - Remove old gcc workaround in DM cache target's block_div() for ARM link errors now that gcc >= 4.9 is required.
 - Fix RCU locking in dm_blk_report_zones and dm_dax_zero_page_range.
 - Remove old, and now frowned upon, BUG_ON(in_interrupt()) in dm_table_event().
 - Remove invalid sparse annotations from dm_prepare_ioctl() and dm_unprepare_ioctl().

* tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: remove invalid sparse __acquires and __releases annotations
  dm: fix double RCU unlock in dm_dax_zero_page_range() error path
  dm: fix IO splitting
  dm writecache: remove BUG() and fail gracefully instead
  dm table: Remove BUG_ON(in_interrupt())
  dm: fix bug with RCU locking in dm_blk_report_zones
  Revert "dm cache: fix arm link errors with inline"
  dm writecache: fix the maximum number of arguments
  dm writecache: advance the number of arguments when reporting max_age
  dm integrity: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
2020-12-04 | PCI: Return u16 from pci_find_ext_capability() and similar | Bjorn Helgaas
PCI Express Extended Capabilities are in config space between offsets 256 and 4K. These offsets all fit in 16 bits. Change the return type of pci_find_ext_capability() and supporting functions from int to u16 to match the specification. Many callers use "int", which is fine, but there's no need to store more than a u16. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
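To see why 16 bits suffice, here is a minimal userspace sketch of an extended capability walk over a mock 4K config space. The function and macro names are hypothetical and the loop is a simplification, not the kernel's implementation; the header layout assumed (capability ID in bits 15:0, next offset in bits 31:20) follows the PCIe extended capability header format.

```c
#include <stdint.h>

#define CFG_SPACE_SIZE_SKETCH		256	/* extended capabilities start here */
#define CFG_SPACE_EXP_SIZE_SKETCH	4096

/* Read a little-endian 32-bit header from the mock config space. */
static uint32_t read_header(const uint8_t *cfg, uint16_t pos)
{
	return (uint32_t)cfg[pos] | ((uint32_t)cfg[pos + 1] << 8) |
	       ((uint32_t)cfg[pos + 2] << 16) | ((uint32_t)cfg[pos + 3] << 24);
}

/* Test helper: store a header with the given ID and next offset. */
static void write_header(uint8_t *cfg, uint16_t pos, uint16_t id, uint16_t next)
{
	uint32_t h = (uint32_t)id | (1u << 16) | ((uint32_t)next << 20);

	cfg[pos] = h & 0xff;
	cfg[pos + 1] = (h >> 8) & 0xff;
	cfg[pos + 2] = (h >> 16) & 0xff;
	cfg[pos + 3] = (h >> 24) & 0xff;
}

/* Return the offset of the capability, or 0 if it is not present. */
static uint16_t find_ext_capability_sketch(const uint8_t *cfg, uint16_t cap_id)
{
	uint16_t pos = CFG_SPACE_SIZE_SKETCH;
	/* Bound the walk so a looping list cannot spin forever. */
	int ttl = (CFG_SPACE_EXP_SIZE_SKETCH - CFG_SPACE_SIZE_SKETCH) / 8;

	while (ttl-- > 0) {
		uint32_t header = read_header(cfg, pos);

		if (header == 0)
			return 0;
		if ((header & 0xffff) == cap_id)
			return pos;

		/* Next offset is dword-aligned and at most 0xffc. */
		pos = (header >> 20) & 0xffc;
		if (pos < CFG_SPACE_SIZE_SKETCH)
			return 0;
	}
	return 0;
}
```

Every offset the walk can produce is masked to at most 0xffc, so a `uint16_t` return value never truncates, and 0 remains usable as the "not found" value because extended capabilities cannot live below offset 256.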
2020-12-04 | PCI: Return u8 from pci_find_capability() and similar | Puranjay Mohan
PCI Capabilities are linked in a list that must appear in the first 256 bytes of config space. Each capabilities list pointer is 8 bits. Change the return type of pci_find_capability() and supporting functions from int to u8 to match the specification. [bhelgaas: change other related interfaces, fix HyperTransport typos] Link: https://lore.kernel.org/r/20201129164626.12887-1-puranjay12@gmail.com Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
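The same reasoning can be sketched for the conventional capability list. This is a simplified userspace illustration with hypothetical names, operating on a mock 256-byte config space; it skips the Status register check a real implementation would perform first, and assumes the standard layout (list head at offset 0x34, each entry holding an ID byte followed by a next-pointer byte).

```c
#include <stdint.h>

#define CAP_LIST_HEAD_SKETCH	0x34	/* Capabilities Pointer register */
#define CAP_FIRST_VALID_SKETCH	0x40	/* capabilities live above the header */

/* Return the offset of the capability, or 0 if it is not present. */
static uint8_t find_capability_sketch(const uint8_t cfg[256], uint8_t cap_id)
{
	uint8_t pos = cfg[CAP_LIST_HEAD_SKETCH];
	int ttl = 48;	/* bound the walk in case the list loops */

	while (ttl-- > 0) {
		uint8_t id;

		if (pos < CAP_FIRST_VALID_SKETCH)
			return 0;
		pos &= 0xfc;		/* pointers are dword-aligned */

		id = cfg[pos];
		if (id == 0xff)
			return 0;
		if (id == cap_id)
			return pos;

		pos = cfg[pos + 1];	/* 8-bit next pointer */
	}
	return 0;
}
```

Because each next pointer is a single byte read from config space, the position can never exceed 255, which is why returning a `u8` matches the data exactly.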
2020-12-04 | Merge tag 'auxbus-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into asoc-5.11 | Mark Brown
Auxiliary Bus support tag for 5.11-rc1

This is a signed tag for other subsystems to be able to pull in the auxiliary bus support into their trees for the 5.11-rc1 merge.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-04 | dm: fix IO splitting | Mike Snitzer
Commit 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") caused a couple regressions:
1) Using lcm_not_zero() when stacking chunk_sectors was a bug because chunk_sectors must reflect the most limited of all devices in the IO stack.
2) DM targets that set max_io_len but that do _not_ provide an .iterate_devices method no longer had their IO split properly.

And commit 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()") also caused a regression where DM no longer supported varied (per target) IO splitting. The implication being the potential for severely reduced performance for IO stacks that use a DM target like dm-cache to hide performance limitations of a slower device (e.g. one that requires 4K IO splitting).

Coming full circle: Fix all these issues by discontinuing stacking chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(), add optional chunk_sectors override argument to blk_max_size_offset() and update DM's max_io_len() to pass ti->max_io_len to its blk_max_size_offset() call.

Passing in an optional chunk_sectors override to blk_max_size_offset() allows for code reuse of block's centralized calculation for max IO size based on provided offset and split boundary.

Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
Fixes: 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 | bpf: Remove trailing semicolon in macro definition | Tom Rix
The macro use will already have a semicolon. Clean up escaped newlines. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201202212810.3774614-1-trix@redhat.com
2020-12-04 | Merge tag 'wireless-drivers-next-2020-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next | Jakub Kicinski
Kalle Valo says:

====================
wireless-drivers-next patches for v5.11

First set of patches for v5.11. rtw88 getting improvements to work better with Bluetooth and other drivers also getting some new features. mhi-ath11k-immutable branch was pulled from mhi tree to avoid conflicts with mhi tree.

Major changes:

rtw88
 * major bluetooth coexistence improvements

wilc1000
 * Wi-Fi Multimedia (WMM) support

ath11k
 * Fast Initial Link Setup (FILS) discovery and unsolicited broadcast probe response support
 * qcom,ath11k-calibration-variant Device Tree setting
 * cold boot calibration support
 * new DFS region: JP

wcn36xx
 * enable connection monitoring and keepalive in firmware

ath10k
 * firmware IRAM recovery feature

mhi
 * merge mhi-ath11k-immutable branch to make MHI API change go smoothly

* tag 'wireless-drivers-next-2020-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next: (180 commits)
  wl1251: remove trailing semicolon in macro definition
  airo: remove trailing semicolon in macro definition
  wilc1000: added queue support for WMM
  wilc1000: call complete() for failure in wilc_wlan_txq_add_cfg_pkt()
  wilc1000: free resource in wilc_wlan_txq_add_mgmt_pkt() for failure path
  wilc1000: free resource in wilc_wlan_txq_add_net_pkt() for failure path
  wilc1000: added 'ndo_set_mac_address' callback support
  brcmfmac: expose firmware config files through modinfo
  wlcore: Switch to using the new API kobj_to_dev()
  rtw88: coex: add feature to enhance HID coexistence performance
  rtw88: coex: upgrade coexistence A2DP mechanism
  rtw88: coex: add action for coexistence in hardware initial
  rtw88: coex: add function to avoid cck lock
  rtw88: coex: change the coexistence mechanism for WLAN connected
  rtw88: coex: change the coexistence mechanism for HID
  rtw88: coex: update AFH information while in free-run mode
  rtw88: coex: update the mechanism for A2DP + PAN
  rtw88: coex: add debug message
  rtw88: coex: run coexistence when WLAN entering/leaving LPS
  Revert "rtl8xxxu: Add Buffalo WI-U3-866D to list of supported devices"
  ...
====================

Link: https://lore.kernel.org/r/20201203185732.9CFA5C433ED@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04 | PCI/ERR: Cache RCEC EA Capability offset in pci_init_capabilities() | Sean V Kelley
Extend support for Root Complex Event Collectors by decoding and caching the RCEC Endpoint Association Extended Capabilities when enumerating. Use that cached information for later error source reporting. See PCIe r5.0, sec 7.9.10. Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20201121001036.8560-4-sean.v.kelley@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2020-12-04 | PCI/ERR: Bind RCEC devices to the Root Port driver | Qiuxu Zhuo
If a Root Complex Integrated Endpoint (RCiEP) is implemented, it may signal errors through a Root Complex Event Collector (RCEC). Each RCiEP must be associated with no more than one RCEC. For an RCEC (which is technically not a Bridge), error messages "received" from associated RCiEPs must be enabled for "transmission" in order to cause a System Error via the Root Control register or (when the Advanced Error Reporting Capability is present) reporting via the Root Error Command register and logging in the Root Error Status register and Error Source Identification register. Given the commonality with Root Ports and the need to also support AER and PME services for RCECs, extend the Root Port driver to support RCEC devices by adding the RCEC Class ID to the driver structure. Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com> Link: https://lore.kernel.org/r/20201121001036.8560-3-sean.v.kelley@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2020-12-04 | block: remove the request_queue argument to request based tracepoints | Christoph Hellwig
The request_queue can trivially be derived from the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 | block: remove the request_queue argument to the block_bio_remap tracepoint | Christoph Hellwig
The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 | block: remove the request_queue argument to the block_split tracepoint | Christoph Hellwig
The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 | block: simplify and extend the block_bio_merge tracepoint class | Christoph Hellwig
The block_bio_merge tracepoint class can be reused for most bio-based tracepoints. For that it just needs to lose the superfluous q and rq parameters. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 | block: remove the unused block_sleeprq tracepoint | Christoph Hellwig
The block_sleeprq tracepoint was only used by the legacy request code. Remove it now that the legacy request code is gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 | tty: Fix ->session locking | Jann Horn
Currently, locking of ->session is very inconsistent; most places protect it using the legacy tty mutex, but disassociate_ctty(), __do_SAK(), tiocspgrp() and tiocgsid() don't. Two of the writers hold the ctrl_lock (because they already need it for ->pgrp), but __proc_set_tty() doesn't do that yet.

On a PREEMPT=y system, an unprivileged user can theoretically abuse this broken locking to read 4 bytes of freed memory via TIOCGSID if tiocgsid() is preempted long enough at the right point. (Other things might also go wrong, especially if root-only ioctls are involved; I'm not sure about that.)

Change the locking on ->session such that:

 - tty_lock() is held by all writers: By making disassociate_ctty() hold it. This should be fine because the same lock can already be taken through the call to tty_vhangup_session(). The tricky part is that we need to shorten the area covered by siglock to be able to take tty_lock() without ugly retry logic; as far as I can tell, this should be fine, since nothing in the signal_struct is touched in the `if (tty)` branch.
 - ctrl_lock is held by all writers: By changing __proc_set_tty() to hold the lock a little longer.
 - All readers that aren't holding tty_lock() hold ctrl_lock: By adding locking to tiocgsid() and __do_SAK(), and expanding the area covered by ctrl_lock in tiocspgrp().

Cc: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-04 | tty: Remove dead termiox code | Jann Horn
set_termiox() and the TCGETX handler bail out with -EINVAL immediately if ->termiox is NULL, but there are no code paths that can set ->termiox to a non-NULL pointer; and no such code paths seem to have existed since the termiox mechanism was introduced back in commit 1d65b4a088de ("tty: Add termiox") in v2.6.28. Similarly, no driver actually implements .set_termiox; and it looks like no driver ever has. Delete this dead code; but leave the definition of struct termiox in the UAPI headers intact. Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20201203020331.2394754-1-jannh@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-04 | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | Jakub Kicinski
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-12-03

The main changes are:

1) Support BTF in kernel modules, from Andrii.
2) Introduce preferred busy-polling, from Björn.
3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh.
4) Memcg-based memory accounting for bpf objects, from Roman.
5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits)
  selftests/bpf: Fix invalid use of strncat in test_sockmap
  libbpf: Use memcpy instead of strncpy to please GCC
  selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module
  selftests/bpf: Add tp_btf CO-RE reloc test for modules
  libbpf: Support attachment of BPF tracing programs to kernel modules
  libbpf: Factor out low-level BPF program loading helper
  bpf: Allow to specify kernel module BTFs when attaching BPF programs
  bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier
  selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF
  selftests/bpf: Add support for marking sub-tests as skipped
  selftests/bpf: Add bpf_testmod kernel module for testing
  libbpf: Add kernel module BTF support for CO-RE relocations
  libbpf: Refactor CO-RE relocs to not assume a single BTF object
  libbpf: Add internal helper to load BTF data by FD
  bpf: Keep module's btf_data_size intact after load
  bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()
  selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP
  bpf: Adds support for setting window clamp
  samples/bpf: Fix spelling mistake "recieving" -> "receiving"
  bpf: Fix cold build of test_progs-no_alu32
  ...
====================

Link: https://lore.kernel.org/r/20201204021936.85653-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04 | Merge tag 'mhi-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next | Greg Kroah-Hartman
Manivannan writes:

MHI patches for v5.11

Here is the MHI patch set for v5.11. Most of the patches are cleanups and fixes but there are some noticeable changes too:

1. Loic finally removed the auto-start option from the channel parameters of the MHI controller. It is the duty of the client drivers like qrtr to start/stop the channels when required, so we decided to remove this option. As a side effect, we changed the qrtr driver to start the channels during its probe and removed the auto-start option from ath11k controller.

   **NOTE** Since these changes span both MHI and networking trees, the patches are maintained in an immutable branch [1] and pulled into both mhi-next and ath11k-next branches. The networking patches got acks from ath11k and networking maintainers as well.

2. Loic added a generic MHI pci controller driver. This driver will be used by the PCI based Qualcomm modems like SDX55 and exposes channels such as QMI, IP_HW0, IPCR etc...

3. Loic fixed the MHI device hierarchy by maintaining the correct parent child relationships. Earlier all MHI devices lived in the same level under the parent device like PCIe. But now, the MHI devices belonging to channels will become the children of controller MHI device.

4. Finally Loic also improved the MHI device naming by using indexed names such as mhi0, mhi1, etc... This will break the userspace applications depending on the old naming convention but since the only one user so far is Jeff Hugo's AI accelerator apps, we decided to make this change now itself with his agreement.

5. Bhaumik fixed the qrtr driver by stopping the channels during remove. This patch also got ack from networking maintainer and we decided to take it through MHI tree (via immutable branch) since we already had a qrtr change.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi.git/log/?h=mhi-ath11k-immutable

* tag 'mhi-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi: (30 commits)
  mhi: pci_generic: Fix implicit conversion warning
  bus: mhi: core: Fix error handling in mhi_register_controller()
  bus: mhi: core: Fix device hierarchy
  bus: mhi: core: Indexed MHI controller name
  net: qrtr: Unprepare MHI channels during remove
  bus: mhi: core: Remove MHI event ring IRQ handlers when powering down
  bus: mhi: core: Mark and maintain device states early on after power down
  bus: mhi: core: Separate system error and power down handling
  bus: mhi: core: Check for IRQ availability during registration
  bus: mhi: core: Move to an error state on mission mode failure
  bus: mhi: core: Use appropriate label in firmware load handler API
  bus: mhi: core: Move to an error state on any firmware load failure
  bus: mhi: core: Prevent sending multiple RDDM entry callbacks
  bus: mhi: core: Move to SYS_ERROR regardless of RDDM capability
  bus: mhi: core: Skip device wake in error or shutdown states
  bus: mhi: core: Move to using high priority workqueue
  bus: mhi: core: Use appropriate names for firmware load functions
  bus: mhi: core: Skip RDDM download for unknown execution environment
  bus: mhi: core: Rename RDDM download function to use proper words
  bus: mhi: core: Remove unused mhi_fw_load_worker() declaration
  ...
2020-12-04 | of: fix linker-section match-table corruption | Johan Hovold
Specify type alignment when declaring linker-section match-table entries to prevent gcc from increasing alignment and corrupting the various tables with padding (e.g. timers, irqchips, clocks, reserved memory).

This is specifically needed on x86 where gcc (typically) aligns larger objects like struct of_device_id with static extent on 32-byte boundaries which at best prevents matching on anything but the first entry. Specifying alignment when declaring variables suppresses this optimisation.

Here's a 64-bit example where all entries are corrupt as 16 bytes of padding has been inserted before the first entry:

    ffffffff8266b4b0 D __clk_of_table
    ffffffff8266b4c0 d __of_table_fixed_factor_clk
    ffffffff8266b5a0 d __of_table_fixed_clk
    ffffffff8266b680 d __clk_of_table_sentinel

And here's a 32-bit example where the 8-byte-aligned table happens to be placed on a 32-byte boundary so that all but the first entry are corrupt due to the 28 bytes of padding inserted between entries:

    812b3ec0 D __irqchip_of_table
    812b3ec0 d __of_table_irqchip1
    812b3fa0 d __of_table_irqchip2
    812b4080 d __of_table_irqchip3
    812b4160 d irqchip_of_match_end

Verified on x86 using gcc-9.3 and gcc-4.9 (which uses 64-byte alignment), and on arm using gcc-7.2. Note that there are no in-tree users of these tables on x86 currently (even if they are included in the image).

Fixes: 54196ccbe0ba ("of: consolidate linker section OF match table declarations")
Fixes: f6e916b82022 ("irqchip: add basic infrastructure")
Cc: stable <stable@vger.kernel.org> # 3.9
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20201123102319.8090-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
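The technique can be tried outside the kernel with a small sketch. Everything named here is hypothetical (`struct match_entry_sketch`, the `__sketch_of_table` section, the `TABLE_ENTRY` macro), and the example assumes GCC or Clang on an ELF target whose linker provides `__start_`/`__stop_` symbols for sections with C-identifier names. Each entry is declared with its natural type alignment, so the entries pack into an array with a stride of exactly `sizeof(struct match_entry_sketch)`.

```c
#include <stddef.h>

/* Hypothetical entry type, sized to resemble a large match-table struct. */
struct match_entry_sketch {
	char name[32];
	char type[32];
	char compatible[128];
	const void *data;
};

/*
 * Place one entry in the table section. The explicit aligned() attribute
 * pins the variable to the type's own alignment, so the compiler does not
 * raise it for a large static object and pad the section.
 */
#define TABLE_ENTRY(ident, compat)					\
	static const struct match_entry_sketch ident			\
	__attribute__((used, section("__sketch_of_table"),		\
		       aligned(__alignof__(struct match_entry_sketch))))	\
	= { .compatible = compat }

TABLE_ENTRY(entry_a, "vendor,a");
TABLE_ENTRY(entry_b, "vendor,b");
TABLE_ENTRY(entry_c, "vendor,c");

/* Section bounds supplied by the linker (assumption: GNU ld-style symbols). */
extern const struct match_entry_sketch __start___sketch_of_table[];
extern const struct match_entry_sketch __stop___sketch_of_table[];

/* Number of entries, valid only if the section contains no padding. */
static size_t table_count(void)
{
	return (size_t)(__stop___sketch_of_table - __start___sketch_of_table);
}
```

Iterating from `__start___sketch_of_table` for `table_count()` entries then visits every declared entry. In the 64-bit listing above, consecutive entries sit 0xe0 bytes apart, which is the padded stride this declaration style is meant to avoid.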
2020-12-04 | earlycon: simplify earlycon-table implementation | Johan Hovold
Instead of using the array-of-pointers trick to avoid having gcc mess up the earlycon array stride, specify type alignment when declaring entries to prevent gcc from increasing alignment. This is essentially an alternative (one-line) fix to the problem addressed by commit dd709e72cb93 ("earlycon: Use a pointer table to fix __earlycon_table stride"). gcc can increase the alignment of larger objects with static extent as an optimisation, but this can be suppressed by using the aligned attribute when declaring variables. Note that we have been relying on this behaviour for kernel parameters for 16 years and it indeed hasn't changed since the introduction of the aligned attribute in gcc-3.1. Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20201123102319.8090-3-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-04net/mlx5: Register mlx5 devices to auxiliary virtual busLeon Romanovsky
Create auxiliary devices under the new virtual bus. This will replace the custom-made mlx5 ->add()/->remove() interfaces, and the next patches will fill in the missing callback and remove the old interface logic.

Auxiliary drivers can attach to devices in a 1-to-1 manner only, which requires us to create a device for every protocol, so that the device (module) will be able to connect to it.

System with 2 IB and 1 RoCE cards:
[leonro@vm ~]$ lspci |grep nox
00:09.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
00:0a.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
00:0b.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]
[leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.rdma.0
mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.rdma.1
mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.rdma.2
mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.vdpa.1
mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.vdpa.2
[leonro@vm ~]$ rdma dev
0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456 sys_image_guid 5254:00c0:fe12:3456
2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457

System with RoCE SR-IOV card with 4 VFs:
[leonro@vm ~]$ lspci |grep nox
01:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
01:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.2 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.3 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.4 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
[leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.eth.0
mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.eth.1
mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.eth.2
mlx5_core.eth.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.eth.3
mlx5_core.eth.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.eth.4
mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.rdma.0
mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.rdma.1
mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.rdma.2
mlx5_core.rdma.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.rdma.3
mlx5_core.rdma.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.rdma.4
mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vdpa.1
mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vdpa.2
mlx5_core.vdpa.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vdpa.3
mlx5_core.vdpa.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vdpa.4
[leonro@vm ~]$ rdma dev
0: rocep1s0f0: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
1: rocep1s0f0v0: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3456
2: rocep1s0f0v1: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3457
3: rocep1s0f0v2: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3458
4: rocep1s0f0v3: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3459

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-12-04vdpa/mlx5: Make hardware definitions visible to all mlx5 devicesLeon Romanovsky
Move mlx5_vdpa IFC header file to the general include folder, so mlx5_core will be able to reuse it to check if VDPA is supported prior to creating an auxiliary device. As part of this move, update the header file name to mlx5 general naming scheme. Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-12-04net/mlx5_core: Clean driver version and nameLeon Romanovsky
Remove exposed driver version as it was done in other drivers, so module version will work correctly by displaying the kernel version for which it is compiled. And move mlx5_core module name to general include, so auxiliary drivers will be able to use it as a basis for a name in their device ID tables. Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-12-04Merge tag 'auxbus-5.11-rc1' of ↵Leon Romanovsky
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into mlx5-next Auxiliary Bus support tag for 5.11-rc1 This is a signed tag for other subsystems to be able to pull in the auxiliary bus support into their trees for the 5.11-rc1 merge. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> * tag 'auxbus-5.11-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: auxiliary bus: minor coding style tweaks driver core: auxiliary bus: make remove function return void driver core: auxiliary bus: move slab.h from include file Add auxiliary bus support
2020-12-04Merge tag 'auxbus-5.11-rc1' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into driver-core-next Auxiliary Bus support tag for 5.11-rc1 This is a signed tag for other subsystems to be able to pull in the auxiliary bus support into their trees for the 5.11-rc1 merge. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-04driver core: auxiliary bus: minor coding style tweaksGreg Kroah-Hartman
For some reason, the original aux bus patch had some really long lines in a few places, probably due to it being a very long-lived patch in development by many different people. Fix that up so that the two files all have the same length lines and function formatting styles. Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Ertman <david.m.ertman@intel.com> Cc: Fred Oh <fred.oh@linux.intel.com> Cc: Kiran Patil <kiran.patil@intel.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Martin Habets <mhabets@solarflare.com> Cc: Parav Pandit <parav@mellanox.com> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Cc: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Cc: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/X8oiSFTpYHw1xE/o@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-04driver core: auxiliary bus: make remove function return voidGreg Kroah-Hartman
There's an effort to move the remove() callback in the driver core to not return an int, as nothing can be done if this function fails. To make that effort easier, make the aux bus remove function void to start with so that no users have to be changed sometime in the future. Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Ertman <david.m.ertman@intel.com> Cc: Fred Oh <fred.oh@linux.intel.com> Cc: Kiran Patil <kiran.patil@intel.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Martin Habets <mhabets@solarflare.com> Cc: Parav Pandit <parav@mellanox.com> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Cc: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Cc: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/X8ohB1ks1NK7kPop@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-04driver core: auxiliary bus: move slab.h from include fileGreg Kroah-Hartman
No need to include slab.h in include/linux/auxiliary_bus.h, as it is not needed there. Move it to drivers/base/auxiliary.c instead. Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Ertman <david.m.ertman@intel.com> Cc: Fred Oh <fred.oh@linux.intel.com> Cc: Kiran Patil <kiran.patil@intel.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Martin Habets <mhabets@solarflare.com> Cc: Parav Pandit <parav@mellanox.com> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Cc: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Cc: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/X8og8xi3WkoYXet9@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-04mmc: tmio: set max_busy_timeoutWolfram Sang
Set max_busy_timeouts for variants known to support the TOPxx bits in the SD_OPTION register. The timeout mechanism was running in the background but not yet properly handled in the driver. So, let the MMC core know when to not use R1B to avoid unhandled timeouts. My datasheets for older variants (tmio_mmc.c) suggest that they support it, too. However, actual bit descriptions are lacking, so I chose an opt-in approach. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://lore.kernel.org/r/20201125213001.15003-2-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-12-04Add auxiliary bus supportDave Ertman
Add support for the Auxiliary Bus, auxiliary_device and auxiliary_driver. It enables drivers to create an auxiliary_device and bind an auxiliary_driver to it. The bus supports probe/remove, shutdown, and suspend/resume callbacks. Each auxiliary_device has a unique string-based id; a driver binds to an auxiliary_device based on this id through the bus. Co-developed-by: Kiran Patil <kiran.patil@intel.com> Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Co-developed-by: Fred Oh <fred.oh@linux.intel.com> Co-developed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Fred Oh <fred.oh@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Martin Habets <mhabets@solarflare.com> Link: https://lore.kernel.org/r/20201113161859.1775473-2-david.m.ertman@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/160695681289.505290.8978295443574440604.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
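The id-based binding described above can be illustrated with a small userspace sketch. It assumes, as my reading of the bus code rather than anything stated in this commit message, that a device named "<module>.<device>.<instance>" is matched against driver table entries of the form "<module>.<device>", i.e. the part before the last '.'. The names `struct aux_id` and `aux_match()` are hypothetical stand-ins, not the kernel's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for one entry of a driver's ID table. */
struct aux_id {
	const char *name;	/* "<module>.<device>", e.g. "mlx5_core.rdma" */
};

/*
 * Return the first table entry whose name equals dev_name with the
 * trailing ".<instance>" removed, or NULL if nothing matches.
 * The table is terminated by an entry whose name is NULL.
 */
static const struct aux_id *aux_match(const struct aux_id *table,
				      const char *dev_name)
{
	const char *dot;
	size_t prefix_len;

	assert(table != NULL && dev_name != NULL);

	/* Everything before the last '.' is the match string. */
	dot = strrchr(dev_name, '.');
	if (!dot)
		return NULL;
	prefix_len = (size_t)(dot - dev_name);

	for (; table->name; table++) {
		if (strlen(table->name) == prefix_len &&
		    strncmp(table->name, dev_name, prefix_len) == 0)
			return table;
	}
	return NULL;
}
```

With a table holding "mlx5_core.rdma" and "mlx5_core.vdpa", a device named "mlx5_core.rdma.0" would select the first entry, while "mlx5_core.eth.2" would match nothing and stay unbound.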
2020-12-04fs, close_range: add flag CLOSE_RANGE_CLOEXECGiuseppe Scrivano
When the flag CLOSE_RANGE_CLOEXEC is set, close_range doesn't immediately close the files but it sets the close-on-exec bit. It is useful for e.g. container runtimes that usually install a seccomp profile "as late as possible" before execv'ing the container process itself. The container runtime could either do:

  1                                   2
- install_seccomp_profile();        - close_range(MIN_FD, MAX_INT, 0);
- close_range(MIN_FD, MAX_INT, 0);  - install_seccomp_profile();
- execve(...);                      - execve(...);

Both alternatives have some disadvantages. In the first variant the seccomp_profile cannot block the close_range syscall, as well as opendir/read/close/... for the fallback on older kernels. In the second variant, close_range() can be used only on the fds that are not going to be needed by the runtime anymore, and it must be potentially called multiple times to account for the different ranges that must be closed.

Using close_range(..., ..., CLOSE_RANGE_CLOEXEC) solves these issues. The runtime is able to use the existing open fds, the seccomp profile can block close_range() and the syscalls used for its fallback.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Link: https://lore.kernel.org/r/20201118104746.873084-2-gscrivan@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
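A minimal userspace sketch of how a runtime might use the flag, assuming Linux and a mounted /proc. The helper name `mark_cloexec_from()` is hypothetical; the fallback syscall number 436 is an assumption that holds on most, but not all, architectures; and the /proc/self/fd walk is only one possible stand-in for the opendir-based fallback the message mentions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_close_range
#define SYS_close_range 436	/* assumption: number on most architectures */
#endif
#ifndef CLOSE_RANGE_CLOEXEC
#define CLOSE_RANGE_CLOEXEC (1U << 2)
#endif

/*
 * Hypothetical helper: set close-on-exec on every open fd >= first.
 * Tries close_range(first, ~0U, CLOSE_RANGE_CLOEXEC) and, if the kernel
 * rejects it, falls back to walking /proc/self/fd.
 * Returns 0 on success, -1 if neither approach is available.
 */
static int mark_cloexec_from(unsigned int first)
{
	struct dirent *ent;
	DIR *dir;

	assert(first <= (unsigned int)INT_MAX);	/* fcntl() takes an int fd */

	if (syscall(SYS_close_range, first, ~0U, CLOSE_RANGE_CLOEXEC) == 0)
		return 0;

	/* Syscall or flag unavailable (or filtered): per-fd fallback. */
	dir = opendir("/proc/self/fd");
	if (!dir)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		char *end;
		long fd = strtol(ent->d_name, &end, 10);
		int flags;

		if (end == ent->d_name || *end != '\0')
			continue;	/* skips "." and ".." */
		if (fd < (long)first)
			continue;
		flags = fcntl((int)fd, F_GETFD);
		if (flags >= 0)
			fcntl((int)fd, F_SETFD, flags | FD_CLOEXEC);
	}
	closedir(dir);
	return 0;
}
```

A runtime following the commit's reasoning would call `mark_cloexec_from(3)` first, keep using its open descriptors, install the seccomp profile, and then call execve(); the marked descriptors are closed by the kernel at exec time.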
2020-12-04batman-adv: Allow selection of routing algorithm over rtnetlinkSven Eckelmann
A batadv net_device is associated with a B.A.T.M.A.N. routing algorithm. This algorithm has to be selected before the interface is initialized and cannot be changed after that. The only way to select this algorithm was a module parameter which specifies the default algorithm used during the creation of the net_device. This module parameter is writeable over /sys/module/batman_adv/parameters/routing_algo and thus allows switching of the routing algorithm: 1. change routing_algo parameter 2. create new batadv net_device But this is not race free because another process can be scheduled between steps 1 and 2 and in that time frame change the routing_algo parameter again. It is much cleaner to directly provide this information inside the rtnetlink's RTM_NEWLINK message. The two processes would then be isolated from each other with regard to the creation parameters of their batadv interfaces. This also eases the integration of batadv devices inside tools like network-manager or systemd-networkd which are not expecting to operate on /sys before a new net_device is created. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-12-04batman-adv: Prepare infrastructure for newlink settingsSven Eckelmann
The batadv generic netlink family can be used to retrieve the current state and set various configuration settings. But there are also settings which must be set before the actual interface is created. The rtnetlink already uses IFLA_INFO_DATA to allow net_device families to transfer such configurations. The minimal required functionality for this is now available for the batadv rtnl_link_ops. Also a new IFLA class of attributes will be attached to it because rtnetlink only allows 51 different attributes but batadv_nl_attrs already contains 62 attributes. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-12-04crypto: lib/blake2s - Move selftest prototype into header fileHerbert Xu
This patch fixes a missing prototype warning on blake2s_selftest. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-03bpf: Allow to specify kernel module BTFs when attaching BPF programsAndrii Nakryiko
Add ability for user-space programs to specify non-vmlinux BTF when attaching BTF-powered BPF programs: raw_tp, fentry/fexit/fmod_ret, LSM, etc. For this, attach_prog_fd (now with the alias name attach_btf_obj_fd) should specify FD of a module or vmlinux BTF object. For backwards compatibility reasons, 0 denotes vmlinux BTF. Only kernel BTF (vmlinux or module) can be specified. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-11-andrii@kernel.org
2020-12-03bpf: Remove hard-coded btf_vmlinux assumption from BPF verifierAndrii Nakryiko
Remove a permeating assumption throughout the BPF verifier of vmlinux BTF. Instead, wherever BTF type IDs are involved, also track the instance of struct btf that goes along with the type ID. This allows gradually adding support for kernel module BTFs and using/tracking module types across BPF helper calls and registers. This patch also renames the btf_id() function to btf_obj_id() to minimize naming clashes with using btf_id to denote BTF *type* ID, rather than BTF *object*'s ID. Also, although btf_vmlinux can't get destructed and thus doesn't need refcounting, module BTFs need that, so apply BTF refcounting universally when a BPF program is using BTF-powered attachment (tp_btf, fentry/fexit, etc). This makes for simpler clean up code. Now that BTF type ID is not enough to uniquely identify a BTF type, extend BPF trampoline key to include BTF object ID. To differentiate that from target program BPF ID, set 31st bit of type ID. BTF type IDs (at least currently) are not allowed to take full 32 bits, so there is no danger of confusing that bit with a valid BTF type ID. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-10-andrii@kernel.org
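The key layout described above can be sketched in userspace C. This assumes a 64-bit key with the object or program ID in the upper 32 bits and the type ID in the lower 32 bits; the function and macro names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit 31 of the low word marks a key as "BTF object ID + type ID"
 * rather than "target program ID + type ID". Valid BTF type IDs are
 * assumed never to use this bit, as the commit message states.
 */
#define KEY_BTF_OBJ_FLAG UINT32_C(0x80000000)

/* Key for attaching to another BPF program: prog ID high, type ID low. */
static uint64_t key_for_prog(uint32_t prog_id, uint32_t type_id)
{
	assert((type_id & KEY_BTF_OBJ_FLAG) == 0);
	return ((uint64_t)prog_id << 32) | type_id;
}

/* Key for attaching to a kernel (vmlinux or module) BTF type. */
static uint64_t key_for_btf(uint32_t btf_obj_id, uint32_t type_id)
{
	assert((type_id & KEY_BTF_OBJ_FLAG) == 0);
	return ((uint64_t)btf_obj_id << 32) | KEY_BTF_OBJ_FLAG | type_id;
}
```

Because the flag bit is set only in the BTF-object form, a program ID and a BTF object ID that happen to share the same numeric value still produce different keys for the same type ID.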
2020-12-03bpf: Adds support for setting window clampPrankur gupta
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_WINDOW_CLAMP, which sets the maximum receiver window size. It will be useful for limiting receiver window based on RTT. Signed-off-by: Prankur gupta <prankgup@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201202213152.435886-2-prankgup@fb.com
2020-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03vfio-ccw: Wire in the request callbackEric Farman
The device is being unplugged, so pass the request to userspace to ask for a graceful cleanup. This should free up the thread that would otherwise loop waiting for the device to be fully released. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-12-03vfio-mdev: Wire in a request handler for mdev parentEric Farman
While performing some destructive tests with vfio-ccw, where the paths to a device are forcibly removed and thus the device itself is unreachable, it is rather easy to end up in an endless loop in vfio_del_group_dev() due to the lack of a request callback for the associated device. In this example, one MDEV (77c) is used by a guest, while another (77b) is not. The symptom is that the iommu is detached from the mdev for 77b, but not 77c, until that guest is shut down: [ 238.794867] vfio_ccw 0.0.077b: MDEV: Unregistering [ 238.794996] vfio_mdev 11f2d2bc-4083-431d-a023-eff72715c4f0: Removing from iommu group 2 [ 238.795001] vfio_mdev 11f2d2bc-4083-431d-a023-eff72715c4f0: MDEV: detaching iommu [ 238.795036] vfio_ccw 0.0.077c: MDEV: Unregistering ...silence... Let's wire in the request callback to the mdev device, so that a device being physically removed from the host can be (gracefully?) handled by the parent device at the time the device is removed. Add a message when registering the device if a driver doesn't provide this callback, so a clue is given that this same loop may be encountered in a similar situation, and a message when this occurs instead of the awkward silence noted above. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-12-03Merge tag 'net-5.10-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.10-rc7, including fixes from bpf, netfilter, wireless drivers, wireless mesh and can.

  Current release - regressions:
   - mt76: usb: fix crash on device removal

  Current release - always broken:
   - xsk: Fix umem cleanup from wrong context in socket destruct

  Previous release - regressions:
   - net: ip6_gre: set dev->hard_header_len when using header_ops
   - ipv4: Fix TOS mask in inet_rtm_getroute()
   - net, xsk: Avoid taking multiple skbuff references

  Previous release - always broken:
   - net/x25: prevent a couple of overflows
   - netfilter: ipset: prevent uninit-value in hash_ip6_add
   - geneve: pull IP header before ECN decapsulation
   - mpls: ensure LSE is pullable in TC and openvswitch paths
   - vxlan: respect needed_headroom of lower device
   - batman-adv: Consider fragmentation for needed packet headroom
   - can: drivers: don't count arbitration loss as an error
   - netfilter: bridge: reset skb->pkt_type after POST_ROUTING traversal
   - inet_ecn: Fix endianness of checksum update when setting ECT(1)
   - ibmvnic: fix various corner cases around reset handling
   - net/mlx5: fix rejecting unsupported Connect-X6DX SW steering
   - net/mlx5: Enforce HW TX csum offload with kTLS"

* tag 'net-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering
  net/mlx5e: kTLS, Enforce HW TX csum offload with kTLS
  net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled
  net/mlx5: Fix wrong address reclaim when command interface is down
  net/sched: act_mpls: ensure LSE is pullable before reading it
  net: openvswitch: ensure LSE is pullable before reading it
  net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl
  net: mvpp2: Fix error return code in mvpp2_open()
  chelsio/chtls: fix a double free in chtls_setkey()
  rtw88: debug: Fix uninitialized memory in debugfs code
  vxlan: fix error return code in __vxlan_dev_create()
  net: pasemi: fix error return code in pasemi_mac_open()
  cxgb3: fix error return code in t3_sge_alloc_qset()
  net/x25: prevent a couple of overflows
  dpaa_eth: copy timestamp fields to new skb in A-050385 workaround
  net: ip6_gre: set dev->hard_header_len when using header_ops
  mt76: usb: fix crash on device removal
  iwlwifi: pcie: add some missing entries for AX210
  iwlwifi: pcie: invert values of NO_160 device config entries
  iwlwifi: pcie: add one missing entry for AX210
  ...
2020-12-03tcp: merge 'init_req' and 'route_req' functionsFlorian Westphal
The Multipath-TCP standard (RFC 8684) says that an MPTCP host should send a TCP reset if the token in a MP_JOIN request is unknown. At this time we don't do this: the 3whs completes and the 'new subflow' is reset afterwards. There are two ways to allow MPTCP to send the reset. 1. override 'send_synack' callback and emit the rst from there. The drawback is that the request socket gets inserted into the listener's queue just to get removed again right away. 2. Send the reset from the 'route_req' function instead. This avoids the 'add&remove request socket', but route_req lacks the skb that is required to send the TCP reset. Instead of just adding the skb to that function for MPTCP's sake alone, Paolo suggested merging the init_req and route_req functions. This saves one indirection from the syn processing path and provides the skb to the merged function at the same time. 'send reset on unknown mptcp join token' is added in the next patch. Suggested-by: Paolo Abeni <pabeni@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>