summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2021-11-22ethtool: extend ringparam setting/getting API with rx_buf_lenHao Chen
Add two new parameters kernel_ringparam and extack for .get_ringparam and .set_ringparam to extend more ring params through netlink. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22ethtool: add support to set/get rx buf len via ethtoolHao Chen
Add support to set rx buf len via ethtool -G parameter and get rx buf len via ethtool -g parameter. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20dev_addr: add a modification checkJakub Kicinski
netdev->dev_addr should only be modified via helpers, but someone may be casting off the const. Add a runtime check to catch abuses. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-20net: constify netdev->dev_addrJakub Kicinski
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. We converted all users to make modifications via appropriate helpers, make netdev->dev_addr const. The update helpers need to upcast from the buffer to struct netdev_hw_addr. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18Merge tag 'regmap-no-bus-update-bits' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Mark Brown says: =================== regmap: Allow regmap_update_bits() to be offloaded with no bus Some hardware can do this so let's use that capability. =================== Link: https://lore.kernel.org/all/YZWDOidBOssP10yS@sirena.org.uk/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-18Merge tag 'net-5.16-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, mac80211. Current release - regressions: - devlink: don't throw an error if flash notification sent before devlink visible - page_pool: Revert "page_pool: disable dma mapping support...", turns out there are active arches who need it Current release - new code bugs: - amt: cancel delayed_work synchronously in amt_fini() Previous releases - regressions: - xsk: fix crash on double free in buffer pool - bpf: fix inner map state pruning regression causing program rejections - mac80211: drop check for DONT_REORDER in __ieee80211_select_queue, preventing mis-selecting the best effort queue - mac80211: do not access the IV when it was stripped - mac80211: fix radiotap header generation, off-by-one - nl80211: fix getting radio statistics in survey dump - e100: fix device suspend/resume Previous releases - always broken: - tcp: fix uninitialized access in skb frags array for Rx 0cp - bpf: fix toctou on read-only map's constant scalar tracking - bpf: forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs - tipc: only accept encrypted MSG_CRYPTO msgs - smc: transfer remaining wait queue entries during fallback, fix missing wake ups - udp: validate checksum in udp_read_sock() (when sockmap is used) - sched: act_mirred: drop dst for the direction from egress to ingress - virtio_net_hdr_to_skb: count transport header in UFO, prevent allowing bad skbs into the stack - nfc: reorder the logic in nfc_{un,}register_device, fix unregister - ipsec: check return value of ipv6_skip_exthdr - usb: r8152: add MAC passthrough support for more Lenovo Docks" * tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits) ptp: ocp: Fix a couple NULL vs IS_ERR() checks net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock() net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound ipv6: check return value of ipv6_skip_exthdr e100: fix device suspend/resume devlink: Don't throw an error if flash notification sent before devlink visible page_pool: Revert "page_pool: disable dma mapping support..." ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port() octeontx2-af: debugfs: don't corrupt user memory NFC: add NCI_UNREG flag to eliminate the race NFC: reorder the logic in nfc_{un,}register_device NFC: reorganize the functions in nci_request tipc: check for null after calling kmemdup i40e: Fix display error code in dmesg i40e: Fix creation of first queue by omitting it if is not power of two i40e: Fix warning message and call stack during rmmod i40e driver i40e: Fix ping is lost after configuring ADq on VF i40e: Fix changing previously set num_queue_pairs for PFs i40e: Fix NULL ptr dereference on VSI filter sync i40e: Fix correct max_pkt_size on VF RX queue ...
2021-11-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Selftest changes: - Cleanups for the perf test infrastructure and mapping hugepages - Avoid contention on mmap_sem when the guests start to run - Add event channel upcall support to xen_shinfo_test x86 changes: - Fixes for Xen emulation - Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache - Fixes for migration of 32-bit nested guests on 64-bit hypervisor - Compilation fixes - More SEV cleanups Generic: - Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS and num_online_cpus(). Most architectures were only using one of the two" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus() KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus() KVM: x86: Assume a 64-bit hypercall for guests with protected state selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore riscv: kvm: fix non-kernel-doc comment block KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror() KVM: SEV: Drop a redundant setting of sev->asid during initialization KVM: SEV: WARN if SEV-ES is marked active but SEV is not KVM: SEV: Set sev_info.active after initial checks in sev_guest_init() KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache KVM: nVMX: Use a gfn_to_hva_cache for vmptrld KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check KVM: x86/xen: Use sizeof_field() instead of open-coding it KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12 KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO ...
2021-11-18Merge tag 'printk-for-5.16-fixup' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fixes from Petr Mladek: - Try to flush backtraces from other CPUs also on the local one. This was a regression caused by printk_safe buffers removal. - Remove header dependency warning. * tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: Remove printk.h inclusion in percpu.h printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
2021-11-18tcp: add missing htmldocs for skb->ll_node and sk->defer_listEric Dumazet
Add missing entries to fix these "make htmldocs" warnings. ./include/linux/skbuff.h:953: warning: Function parameter or member 'll_node' not described in 'sk_buff' ./include/net/sock.h:540: warning: Function parameter or member 'defer_list' not described in 'sock' Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18page_pool: Revert "page_pool: disable dma mapping support..."Yunsheng Lin
This reverts commit d00e60ee54b12de945b8493cf18c1ada9e422514. As reported by Guillaume in [1]: Enabling LPAE always enables CONFIG_ARCH_DMA_ADDR_T_64BIT in 32-bit systems, which breaks the bootup proceess when a ethernet driver is using page pool with PP_FLAG_DMA_MAP flag. As we were hoping we had no active consumers for such system when we removed the dma mapping support, and LPAE seems like a common feature for 32 bits system, so revert it. 1. https://www.spinics.net/lists/netdev/msg779890.html Fixes: d00e60ee54b1 ("page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reported-by: "kernelci.org bot" <bot@kernelci.org> Tested-by: "kernelci.org bot" <bot@kernelci.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18Merge branch 'rework/printk_safe-removal' into for-linusPetr Mladek
2021-11-18Merge branch 'kvm-5.16-fixes' into kvm-masterPaolo Bonzini
* Fixes for Xen emulation * Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache * Fixes for migration of 32-bit nested guests on 64-bit hypervisor * Compilation fixes * More SEV cleanups
2021-11-18KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cacheDavid Woodhouse
In commit 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status") I removed the only user of these functions because it was basically impossible to use them safely. There are two stages to the GFN->PFN mapping; first through the KVM memslots to a userspace HVA and then through the page tables to translate that HVA to an underlying PFN. Invalidations of the former were being handled correctly, but no attempt was made to use the MMU notifiers to invalidate the cache when the HVA->GFN mapping changed. As a prelude to reinventing the gfn_to_pfn_cache with more usable semantics, rip it out entirely and untangle the implementation of the unsafe kvm_vcpu_map()/kvm_vcpu_unmap() functions from it. All current users of kvm_vcpu_map() also look broken right now, and will be dealt with separately. They broadly fall into two classes: * Those which map, access the data and immediately unmap. This is mostly gratuitous and could just as well use the existing user HVA, and could probably benefit from a gfn_to_hva_cache as they do so. * Those which keep the mapping around for a longer time, perhaps even using the PFN directly from the guest. These will need to be converted to the new gfn_to_pfn_cache and then kvm_vcpu_map() can be removed too. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211115165030.7422-8-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-17net: do not inline netif_tx_lock()/netif_tx_unlock()Eric Dumazet
These are not fast path, there is no point in inlining them. Also provide netif_freeze_queues()/netif_unfreeze_queues() so that we can use them from dev_watchdog() in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: annotate accesses to queue->trans_startEric Dumazet
In following patches, dev_watchdog() will no longer stop all queues. It will read queue->trans_start locklessly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: use an atomic_long_t for queue->trans_timeoutEric Dumazet
tx_timeout_show() assumed dev_watchdog() would stop all the queues, to fetch queue->trans_timeout under protection of the queue->_xmit_lock. As we want to no longer disrupt transmits, we use an atomic_long_t instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: david decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17Merge tag 'for-net-next-2021-11-16' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for AOSP Bluetooth Quality Report - Enables AOSP extension for Mediatek Chip (MT7921 & MT7922) - Rework of HCI command execution serialization ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: virtio_net_hdr_to_skb: count transport header in UFOJonathan Davies
virtio_net_hdr_to_skb does not set the skb's gso_size and gso_type correctly for UFO packets received via virtio-net that are a little over the GSO size. This can lead to problems elsewhere in the networking stack, e.g. ovs_vport_send dropping over-sized packets if gso_size is not set. This is due to the comparison if (skb->len - p_off > gso_size) not properly accounting for the transport layer header. p_off includes the size of the transport layer header (thlen), so skb->len - p_off is the size of the TCP/UDP payload. gso_size is read from the virtio-net header. For UFO, fragmentation happens at the IP level so does not need to include the UDP header. Hence the calculation could be comparing a TCP/UDP payload length with an IP payload length, causing legitimate virtio-net packets to have lack gso_type/gso_size information. Example: a UDP packet with payload size 1473 has IP payload size 1481. If the guest used UFO, it is not fragmented and the virtio-net header's flags indicate that it is a GSO frame (VIRTIO_NET_HDR_GSO_UDP), with gso_size = 1480 for an MTU of 1500. skb->len will be 1515 and p_off will be 42, so skb->len - p_off = 1473. Hence the comparison fails, and shinfo->gso_size and gso_type are not set as they should be. Instead, add the UDP header length before comparing to gso_size when using UFO. In this way, it is the size of the IP payload that is compared to gso_size. Fixes: 6dd912f82680 ("net: check untrusted gso_size at kernel entry") Signed-off-by: Jonathan Davies <jonathan.davies@nutanix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17Merge tag 'mlx5-fixes-2021-11-16' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2021-11-16 Please pull this mlx5 fixes series, or let me know in case of any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: document SMII and correct phylink's new validation mechanismRussell King (Oracle)
SMII has not been documented in the kernel, but information on this PHY interface mode has been recently found. Document it, and correct the recently introduced phylink handling for this interface mode. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/E1mmfVl-0075nP-14@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16net: align static siphash keysEric Dumazet
siphash keys use 16 bytes. Define siphash_aligned_key_t macro so that we can make sure they are not crossing a cache line boundary. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16net: use .data.once section in netdev_level_once()Eric Dumazet
Same rationale than prior patch : using the dedicated section avoid holes and pack all these bool values. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16once: use __section(".data.once")Eric Dumazet
.data.once contains nicely packed bool variables. It is used already by DO_ONCE_LITE(). Using it also in DO_ONCE() removes holes in .data section. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2021-11-16 We've added 12 non-merge commits during the last 5 day(s) which contain a total of 23 files changed, 573 insertions(+), 73 deletions(-). The main changes are: 1) Fix pruning regression where verifier went overly conservative rejecting previsouly accepted programs, from Alexei Starovoitov and Lorenz Bauer. 2) Fix verifier TOCTOU bug when using read-only map's values as constant scalars during verification, from Daniel Borkmann. 3) Fix a crash due to a double free in XSK's buffer pool, from Magnus Karlsson. 4) Fix libbpf regression when cross-building runqslower, from Jean-Philippe Brucker. 5) Forbid use of bpf_ktime_get_coarse_ns() and bpf_timer_*() helpers in tracing programs due to deadlock possibilities, from Dmitrii Banshchikov. 6) Fix checksum validation in sockmap's udp_read_sock() callback, from Cong Wang. 7) Various BPF sample fixes such as XDP stats in xdp_sample_user, from Alexander Lobakin. 8) Fix libbpf gen_loader error handling wrt fd cleanup, from Kumar Kartikeya Dwivedi. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: udp: Validate checksum in udp_read_sock() bpf: Fix toctou on read-only map's constant scalar tracking samples/bpf: Fix build error due to -isystem removal selftests/bpf: Add tests for restricted helpers bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs libbpf: Perform map fd cleanup for gen_loader in case of error samples/bpf: Fix incorrect use of strlen in xdp_redirect_cpu tools/runqslower: Fix cross-build samples/bpf: Fix summary per-sec stats in xdp_sample_user selftests/bpf: Check map in map pruning bpf: Fix inner map state pruning regression. xsk: Fix crash on double free in buffer pool ==================== Link: https://lore.kernel.org/r/20211116141134.6490-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16net/mlx5: E-Switch, Fix resetting of encap mode when entering switchdevPaul Blakey
E-Switch encap mode is relevant only when in switchdev mode. The RDMA driver can query the encap configuration via mlx5_eswitch_get_encap_mode(). Make sure it returns the currently used mode and not the set one. This reverts the cited commit which reset the encap mode on entering switchdev and fixes the original issue properly. Fixes: 9a64144d683a ("net/mlx5: E-Switch, Fix default encap mode") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16net: gro: populate net/core/gro.cEric Dumazet
Move gro code and data from net/core/dev.c to net/core/gro.c to ease maintenance. gro_normal_list() and gro_normal_one() are inlined because they are called from both files. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: gro: move skb_gro_receive into net/core/gro.cEric Dumazet
net/core/gro.c will contain all core gro functions, to shrink net/core/skbuff.c and net/core/dev.c Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: gro: move skb_gro_receive_list to udp_offload.cEric Dumazet
This helper is used once, no need to keep it in fat net/core/skbuff.c Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: move gro definitions to include/net/gro.hEric Dumazet
include/linux/netdevice.h became too big, move gro stuff into include/net/gro.h Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16tcp: defer skb freeing after socket lock is releasedEric Dumazet
tcp recvmsg() (or rx zerocopy) spends a fair amount of time freeing skbs after their payload has been consumed. A typical ~64KB GRO packet has to release ~45 page references, eventually going to page allocator for each of them. Currently, this freeing is performed while socket lock is held, meaning that there is a high chance that BH handler has to queue incoming packets to tcp socket backlog. This can cause additional latencies, because the user thread has to process the backlog at release_sock() time, and while doing so, additional frames can be added by BH handler. This patch adds logic to defer these frees after socket lock is released, or directly from BH handler if possible. Being able to free these skbs from BH handler helps a lot, because this avoids the usual alloc/free assymetry, when BH handler and user thread do not run on same cpu or NUMA node. One cpu can now be fully utilized for the kernel->user copy, and another cpu is handling BH processing and skb/page allocs/frees (assuming RFS is not forcing use of a single CPU) Tested: 100Gbit NIC Max throughput for one TCP_STREAM flow, over 10 runs MTU : 1500 Before: 55 Gbit After: 66 Gbit MTU : 4096+(headers) Before: 82 Gbit After: 95 Gbit Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-16net: use sk_is_tcp() in more placesEric Dumazet
Move sk_is_tcp() to include/net/sock.h and use it where we can. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-15bpf: Fix toctou on read-only map's constant scalar trackingDaniel Borkmann
Commit a23740ec43ba ("bpf: Track contents of read-only maps as scalars") is checking whether maps are read-only both from BPF program side and user space side, and then, given their content is constant, reading out their data via map->ops->map_direct_value_addr() which is then subsequently used as known scalar value for the register, that is, it is marked as __mark_reg_known() with the read value at verification time. Before a23740ec43ba, the register content was marked as an unknown scalar so the verifier could not make any assumptions about the map content. The current implementation however is prone to a TOCTOU race, meaning, the value read as known scalar for the register is not guaranteed to be exactly the same at a later point when the program is executed, and as such, the prior made assumptions of the verifier with regards to the program will be invalid which can cause issues such as OOB access, etc. While the BPF_F_RDONLY_PROG map flag is always fixed and required to be specified at map creation time, the map->frozen property is initially set to false for the map given the map value needs to be populated, e.g. for global data sections. Once complete, the loader "freezes" the map from user space such that no subsequent updates/deletes are possible anymore. For the rest of the lifetime of the map, this freeze one-time trigger cannot be undone anymore after a successful BPF_MAP_FREEZE cmd return. Meaning, any new BPF_* cmd calls which would update/delete map entries will be rejected with -EPERM since map_get_sys_perms() removes the FMODE_CAN_WRITE permission. This also means that pending update/delete map entries must still complete before this guarantee is given. This corner case is not an issue for loaders since they create and prepare such program private map in successive steps. However, a malicious user is able to trigger this TOCTOU race in two different ways: i) via userfaultfd, and ii) via batched updates. For i) userfaultfd is used to expand the competition interval, so that map_update_elem() can modify the contents of the map after map_freeze() and bpf_prog_load() were executed. This works, because userfaultfd halts the parallel thread which triggered a map_update_elem() at the time where we copy key/value from the user buffer and this already passed the FMODE_CAN_WRITE capability test given at that time the map was not "frozen". Then, the main thread performs the map_freeze() and bpf_prog_load(), and once that had completed successfully, the other thread is woken up to complete the pending map_update_elem() which then changes the map content. For ii) the idea of the batched update is similar, meaning, when there are a large number of updates to be processed, it can increase the competition interval between the two. It is therefore possible in practice to modify the contents of the map after executing map_freeze() and bpf_prog_load(). One way to fix both i) and ii) at the same time is to expand the use of the map's map->writecnt. The latter was introduced in fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY") and further refined in 1f6cb19be2e2 ("bpf: Prevent re-mmap()'ing BPF map as writable for initially r/o mapping") with the rationale to make a writable mmap()'ing of a map mutually exclusive with read-only freezing. The counter indicates writable mmap() mappings and then prevents/fails the freeze operation. Its semantics can be expanded beyond just mmap() by generally indicating ongoing write phases. This would essentially span any parallel regular and batched flavor of update/delete operation and then also have map_freeze() fail with -EBUSY. For the check_mem_access() in the verifier we expand upon the bpf_map_is_rdonly() check ensuring that all last pending writes have completed via bpf_map_write_active() test. Once the map->frozen is set and bpf_map_write_active() indicates a map->writecnt of 0 only then we are really guaranteed to use the map's data as known constants. For map->frozen being set and pending writes in process of still being completed we fall back to marking that register as unknown scalar so we don't end up making assumptions about it. With this, both TOCTOU reproducers from i) and ii) are fixed. Note that the map->writecnt has been converted into a atomic64 in the fix in order to avoid a double freeze_mutex mutex_{un,}lock() pair when updating map->writecnt in the various map update/delete BPF_* cmd flavors. Spanning the freeze_mutex over entire map update/delete operations in syscall side would not be possible due to then causing everything to be serialized. Similarly, something like synchronize_rcu() after setting map->frozen to wait for update/deletes to complete is not possible either since it would also have to span the user copy which can sleep. On the libbpf side, this won't break d66562fba1ce ("libbpf: Add BPF object skeleton support") as the anonymous mmap()-ed "map initialization image" is remapped as a BPF map-backed mmap()-ed memory where for .rodata it's non-writable. Fixes: a23740ec43ba ("bpf: Track contents of read-only maps as scalars") Reported-by: w1tcher.bupt@gmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-11-15Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf-next 2021-11-15 We've added 72 non-merge commits during the last 13 day(s) which contain a total of 171 files changed, 2728 insertions(+), 1143 deletions(-). The main changes are: 1) Add btf_type_tag attributes to bring kernel annotations like __user/__rcu to BTF such that BPF verifier will be able to detect misuse, from Yonghong Song. 2) Big batch of libbpf improvements including various fixes, future proofing APIs, and adding a unified, OPTS-based bpf_prog_load() low-level API, from Andrii Nakryiko. 3) Add ingress_ifindex to BPF_SK_LOOKUP program type for selectively applying the programmable socket lookup logic to packets from a given netdev, from Mark Pashmfouroush. 4) Remove the 128M upper JIT limit for BPF programs on arm64 and add selftest to ensure exception handling still works, from Russell King and Alan Maguire. 5) Add a new bpf_find_vma() helper for tracing to map an address to the backing file such as shared library, from Song Liu. 6) Batch of various misc fixes to bpftool, fixing a memory leak in BPF program dump, updating documentation and bash-completion among others, from Quentin Monnet. 7) Deprecate libbpf bpf_program__get_prog_info_linear() API and migrate its users as the API is heavily tailored around perf and is non-generic, from Dave Marchevsky. 8) Enable libbpf's strict mode by default in bpftool and add a --legacy option as an opt-out for more relaxed BPF program requirements, from Stanislav Fomichev. 9) Fix bpftool to use libbpf_get_error() to check for errors, from Hengqi Chen. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits) bpftool: Use libbpf_get_error() to check error bpftool: Fix mixed indentation in documentation bpftool: Update the lists of names for maps and prog-attach types bpftool: Fix indent in option lists in the documentation bpftool: Remove inclusion of utilities.mak from Makefiles bpftool: Fix memory leak in prog_dump() selftests/bpf: Fix a tautological-constant-out-of-range-compare compiler warning selftests/bpf: Fix an unused-but-set-variable compiler warning bpf: Introduce btf_tracing_ids bpf: Extend BTF_ID_LIST_GLOBAL with parameter for number of IDs bpftool: Enable libbpf's strict mode by default docs/bpf: Update documentation for BTF_KIND_TYPE_TAG support selftests/bpf: Clarify llvm dependency with btf_tag selftest selftests/bpf: Add a C test for btf_type_tag selftests/bpf: Rename progs/tag.c to progs/btf_decl_tag.c selftests/bpf: Test BTF_KIND_DECL_TAG for deduplication selftests/bpf: Add BTF_KIND_TYPE_TAG unit tests selftests/bpf: Test libbpf API function btf__add_type_tag() bpftool: Support BTF_KIND_TYPE_TAG libbpf: Support BTF_KIND_TYPE_TAG ... ==================== Link: https://lore.kernel.org/r/20211115162008.25916-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-15Revert "Merge branch 'mctp-i2c-driver'"Jakub Kicinski
This reverts commit 71812af7234f30362b43ccff33f93890ae4c0655, reversing changes made to cc0be1ad686fb29a4d127948486f40b17fb34b50. Wolfram Sang says: Please revert. Besides the driver in net, it modifies the I2C core code. This has not been acked by the I2C maintainer (in this case me). So, please don't pull this in via the net tree. The question raised here (extending SMBus calls to 255 byte) is complicated because we need ABI backwards compatibility. Link: https://lore.kernel.org/all/YZJ9H4eM%2FM7OXVN0@shikoro/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-15net: phylink: add generic validate implementationRussell King (Oracle)
Add a generic validate() implementation using the supported_interfaces and a bitmask of MAC pause/speed/duplex capabilities. This allows us to entirely eliminate many driver private validate() implementations. We expose the underlying phylink_get_linkmodes() function so that drivers which have special needs can still benefit from conversion. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-15i2c: core: Allow 255 byte transfers for SMBus 3.xMatt Johnston
SMBus 3.0 increased the maximum block transfer size from 32 bytes to 255 bytes. We increase the size of struct i2c_smbus_data's block[] member. i2c_smbus_xfer() and i2c_smbus_xfer_emulated() now support 255 byte block operations, other block functions remain limited to 32 bytes for compatibility with existing callers. We allow adapters to indicate support for the larger size with I2C_FUNC_SMBUS_V3_BLOCK. Most emulated drivers should be able to use 255 byte blocks by replacing I2C_SMBUS_BLOCK_MAX with I2C_SMBUS_V3_BLOCK_MAX though some will have hardware limitations that need testing. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-15printk: Remove printk.h inclusion in percpu.hAndy Shevchenko
After the commit 42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI") the printk.h is not needed anymore in percpu.h. Moreover `make headerdep` complains (an excerpt) In file included from linux/printk.h, from linux/dynamic_debug.h:188 from linux/printk.h:559 <-- here from linux/percpu.h:9 from linux/idr.h:17 include/net/9p/client.h:13: warning: recursive header inclusion Yeah, it's not a root cause of this, but removing will help to reduce the noise. Fixes: 42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20211112140749.80042-1-andriy.shevchenko@linux.intel.com
2021-11-15regmap: allow to define reg_update_bits for no bus configurationAnsuel Smith
Some device requires a special handling for reg_update_bits and can't use the normal regmap read write logic. An example is when locking is handled by the device and rmw operations requires to do atomic operations. Allow to declare a dedicated function in regmap_config for reg_update_bits in no bus configuration. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Link: https://lore.kernel.org/r/20211104150040.1260-1-ansuelsmth@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-11-15net: Clean up some inconsistent indentingJiapeng Chong
Eliminate the follow smatch warning: ./include/linux/skbuff.h:4229 skb_remcsum_process() warn: inconsistent indenting. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-14Merge tag 'trace-v5.16-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Update to tracing histogram variable string copy A fix to only copy the size of the field to the histogram string did not take into account that the size can be larger than the storage" * tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Add length protection to histogram string copies
2021-11-14tracing: Add length protection to histogram string copiesSteven Rostedt (VMware)
The string copies to the histogram storage has a max size of 256 bytes (defined by MAX_FILTER_STR_VAL). Only the string size of the event field needs to be copied to the event storage, but no more than what is in the event storage. Although nothing should be bigger than 256 bytes, there's no protection against overwriting of the storage if one day there is. Copy no more than the destination size, and enforce it. Also had to turn MAX_FILTER_STR_VAL into an unsigned int, to keep the min() comparison of the string sizes of comparable types. Link: https://lore.kernel.org/all/CAHk-=wjREUihCGrtRBwfX47y_KrLCGjiq3t6QtoNJpmVrAEb1w@mail.gmail.com/ Link: https://lkml.kernel.org/r/20211114132834.183429a4@rorschach.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 63f84ae6b82b ("tracing/histogram: Do not copy the fixed-size char array field over the field size") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-14Merge tag 'timers-urgent-2021-11-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix for POSIX CPU timers to address a problem where POSIX CPU timer delivery stops working for a new child task because copy_process() copies state information which is only valid for the parent task" * tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
2021-11-14Merge tag 'irq-urgent-2021-11-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for the interrupt subsystem Core code: - A regression fix for the Open Firmware interrupt mapping code where a interrupt controller property in a node caused a map property in the same node to be ignored. Interrupt chip drivers: - Workaround a limitation in SiFive PLIC interrupt chip which silently ignores an EOI when the interrupt line is masked. - Provide the missing mask/unmask implementation for the CSKY MP interrupt controller. PCI/MSI: - Prevent a use after free when PCI/MSI interrupts are released by destroying the sysfs entries before freeing the memory which is accessed in the sysfs show() function. - Implement a mask quirk for the Nvidia ION AHCI chip which does not advertise masking capability despite implementing it. Even worse the chip comes out of reset with all MSI entries masked, which due to the missing masking capability never get unmasked. - Move the check which prevents accessing the MSI[X] masking for XEN back into the low level accessors. The recent consolidation missed that these accessors can be invoked from places which do not have that check which broke XEN. Move them back to he original place instead of sprinkling tons of these checks all over the code" * tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: of/irq: Don't ignore interrupt-controller when interrupt-map failed irqchip/sifive-plic: Fixup EOI failed when masked irqchip/csky-mpintc: Fixup mask/unmask implementation PCI/MSI: Destroy sysfs before freeing entries PCI: Add MSI masking quirk for Nvidia ION AHCI PCI/MSI: Deal with devices lying about their MSI mask capability PCI/MSI: Move non-mask check back into low level accessors
2021-11-14Merge tag 'sched_urgent_for_v5.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Avoid touching ~100 config files in order to be able to select the preemption model - clear cluster CPU masks too, on the CPU unplug path - prevent use-after-free in cfs - Prevent a race condition when updating CPU cache domains - Factor out common shared part of smp_prepare_cpus() into a common helper which can be called by both baremetal and Xen, in order to fix a booting of Xen PV guests * tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: preempt: Restore preemption model selection configs arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology() sched/fair: Prevent dead task groups from regaining cfs_rq's sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain() x86/smp: Factor out parts of native_smp_prepare_cpus()
2021-11-14net,lsm,selinux: revert the security_sctp_assoc_established() hookPaul Moore
This patch reverts two prior patches, e7310c94024c ("security: implement sctp_assoc_established hook in selinux") and 7c2ef0240e6a ("security: add sctp_assoc_established hook"), which create the security_sctp_assoc_established() LSM hook and provide a SELinux implementation. Unfortunately these two patches were merged without proper review (the Reviewed-by and Tested-by tags from Richard Haines were for previous revisions of these patches that were significantly different) and there are outstanding objections from the SELinux maintainers regarding these patches. Work is currently ongoing to correct the problems identified in the reverted patches, as well as others that have come up during review, but it is unclear at this point in time when that work will be ready for inclusion in the mainline kernel. In the interest of not keeping objectionable code in the kernel for multiple weeks, and potentially a kernel release, we are reverting the two problematic patches. Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-13Merge tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linuxLinus Torvalds
Pull zstd update from Nick Terrell: "Update to zstd-1.4.10. Add myself as the maintainer of zstd and update the zstd version in the kernel, which is now 4 years out of date, to a much more recent zstd release. This includes bug fixes, much more extensive fuzzing, and performance improvements. And generates the kernel zstd automatically from upstream zstd, so it is easier to keep the zstd verison up to date, and we don't fall so far out of date again. This includes 5 commits that update the zstd library version: - Adds a new kernel-style wrapper around zstd. This wrapper API is functionally equivalent to the subset of the current zstd API that is currently used. The wrapper API changes to be kernel style so that the symbols don't collide with zstd's symbols. The update to zstd-1.4.10 maintains the same API and preserves the semantics, so that none of the callers need to be updated. All callers are updated in the commit, because there are zero functional changes. - Adds an indirection for `lib/decompress_unzstd.c` so it doesn't depend on the layout of `lib/zstd/` to include every source file. This allows the next patch to be automatically generated. - Imports the zstd-1.4.10 source code. This commit is automatically generated from upstream zstd (https://github.com/facebook/zstd). - Adds me (terrelln@fb.com) as the maintainer of `lib/zstd`. - Fixes a newly added build warning for clang. The discussion around this patchset has been pretty long, so I've included a FAQ-style summary of the history of the patchset, and why we are taking this approach. Why do we need to update? ------------------------- The zstd version in the kernel is based off of zstd-1.3.1, which is was released August 20, 2017. Since then zstd has seen many bug fixes and performance improvements. And, importantly, upstream zstd is continuously fuzzed by OSS-Fuzz, and bug fixes aren't backported to older versions. So the only way to sanely get these fixes is to keep up to date with upstream zstd. There are no known security issues that affect the kernel, but we need to be able to update in case there are. And while there are no known security issues, there are relevant bug fixes. For example the problem with large kernel decompression has been fixed upstream for over 2 years [1] Additionally the performance improvements for kernel use cases are significant. Measured for x86_64 on my Intel i9-9900k @ 3.6 GHz: - BtrFS zstd compression at levels 1 and 3 is 5% faster - BtrFS zstd decompression+read is 15% faster - SquashFS zstd decompression+read is 15% faster - F2FS zstd compression+write at level 3 is 8% faster - F2FS zstd decompression+read is 20% faster - ZRAM decompression+read is 30% faster - Kernel zstd decompression is 35% faster - Initramfs zstd decompression+build is 5% faster On top of this, there are significant performance improvements coming down the line in the next zstd release, and the new automated update patch generation will allow us to pull them easily. How is the update patch generated? ---------------------------------- The first two patches are preparation for updating the zstd version. Then the 3rd patch in the series imports upstream zstd into the kernel. This patch is automatically generated from upstream. A script makes the necessary changes and imports it into the kernel. The changes are: - Replace all libc dependencies with kernel replacements and rewrite includes. - Remove unncessary portability macros like: #if defined(_MSC_VER). - Use the kernel xxhash instead of bundling it. This automation gets tested every commit by upstream's continuous integration. When we cut a new zstd release, we will submit a patch to the kernel to update the zstd version in the kernel. The automated process makes it easy to keep the kernel version of zstd up to date. The current zstd in the kernel shares the guts of the code, but has a lot of API and minor changes to work in the kernel. This is because at the time upstream zstd was not ready to be used in the kernel envrionment as-is. But, since then upstream zstd has evolved to support being used in the kernel as-is. Why are we updating in one big patch? ------------------------------------- The 3rd patch in the series is very large. This is because it is restructuring the code, so it both deletes the existing zstd, and re-adds the new structure. Future updates will be directly proportional to the changes in upstream zstd since the last import. They will admittidly be large, as zstd is an actively developed project, and has hundreds of commits between every release. However, there is no other great alternative. One option ruled out is to replay every upstream zstd commit. This is not feasible for several reasons: - There are over 3500 upstream commits since the zstd version in the kernel. - The automation to automatically generate the kernel update was only added recently, so older commits cannot easily be imported. - Not every upstream zstd commit builds. - Only zstd releases are "supported", and individual commits may have bugs that were fixed before a release. Another option to reduce the patch size would be to first reorganize to the new file structure, and then apply the patch. However, the current kernel zstd is formatted with clang-format to be more "kernel-like". But, the new method imports zstd as-is, without additional formatting, to allow for closer correlation with upstream, and easier debugging. So the patch wouldn't be any smaller. It also doesn't make sense to import upstream zstd commit by commit going forward. Upstream zstd doesn't support production use cases running of the development branch. We have a lot of post-commit fuzzing that catches many bugs, so indiviudal commits may be buggy, but fixed before a release. So going forward, I intend to import every (important) zstd release into the Kernel. So, while it isn't ideal, updating in one big patch is the only patch I see forward. Who is responsible for this code? --------------------------------- I am. This patchset adds me as the maintainer for zstd. Previously, there was no tree for zstd patches. Because of that, there were several patches that either got ignored, or took a long time to merge, since it wasn't clear which tree should pick them up. I'm officially stepping up as maintainer, and setting up my tree as the path through which zstd patches get merged. I'll make sure that patches to the kernel zstd get ported upstream, so they aren't erased when the next version update happens. How is this code tested? ------------------------ I tested every caller of zstd on x86_64 (BtrFS, ZRAM, SquashFS, F2FS, Kernel, InitRAMFS). I also tested Kernel & InitRAMFS on i386 and aarch64. I checked both performance and correctness. Also, thanks to many people in the community who have tested these patches locally. Lastly, this code will bake in linux-next before being merged into v5.16. Why update to zstd-1.4.10 when zstd-1.5.0 has been released? ------------------------------------------------------------ This patchset has been outstanding since 2020, and zstd-1.4.10 was the latest release when it was created. Since the update patch is automatically generated from upstream, I could generate it from zstd-1.5.0. However, there were some large stack usage regressions in zstd-1.5.0, and are only fixed in the latest development branch. And the latest development branch contains some new code that needs to bake in the fuzzer before I would feel comfortable releasing to the kernel. Once this patchset has been merged, and we've released zstd-1.5.1, we can update the kernel to zstd-1.5.1, and exercise the update process. You may notice that zstd-1.4.10 doesn't exist upstream. This release is an artifical release based off of zstd-1.4.9, with some fixes for the kernel backported from the development branch. I will tag the zstd-1.4.10 release after this patchset is merged, so the Linux Kernel is running a known version of zstd that can be debugged upstream. Why was a wrapper API added? ---------------------------- The first versions of this patchset migrated the kernel to the upstream zstd API. It first added a shim API that supported the new upstream API with the old code, then updated callers to use the new shim API, then transitioned to the new code and deleted the shim API. However, Cristoph Hellwig suggested that we transition to a kernel style API, and hide zstd's upstream API behind that. This is because zstd's upstream API is supports many other use cases, and does not follow the kernel style guide, while the kernel API is focused on the kernel's use cases, and follows the kernel style guide. Where is the previous discussion? --------------------------------- Links for the discussions of the previous versions of the patch set below. The largest changes in the design of the patchset are driven by the discussions in v11, v5, and v1. Sorry for the mix of links, I couldn't find most of the the threads on lkml.org" Link: https://lkml.org/lkml/2020/9/29/27 [1] Link: https://www.spinics.net/lists/linux-crypto/msg58189.html [v12] Link: https://lore.kernel.org/linux-btrfs/20210430013157.747152-1-nickrterrell@gmail.com/ [v11] Link: https://lore.kernel.org/lkml/20210426234621.870684-2-nickrterrell@gmail.com/ [v10] Link: https://lore.kernel.org/linux-btrfs/20210330225112.496213-1-nickrterrell@gmail.com/ [v9] Link: https://lore.kernel.org/linux-f2fs-devel/20210326191859.1542272-1-nickrterrell@gmail.com/ [v8] Link: https://lkml.org/lkml/2020/12/3/1195 [v7] Link: https://lkml.org/lkml/2020/12/2/1245 [v6] Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v5] Link: https://www.spinics.net/lists/linux-btrfs/msg105783.html [v4] Link: https://lkml.org/lkml/2020/9/23/1074 [v3] Link: https://www.spinics.net/lists/linux-btrfs/msg105505.html [v2] Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v1] Signed-off-by: Nick Terrell <terrelln@fb.com> Tested By: Paul Jones <paul@pauljones.id.au> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64 Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf> * tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux: lib: zstd: Add cast to silence clang's -Wbitwise-instead-of-logical MAINTAINERS: Add maintainer entry for zstd lib: zstd: Upgrade to latest upstream zstd version 1.4.10 lib: zstd: Add decompress_sources.h for decompress_unzstd lib: zstd: Add kernel-specific API
2021-11-13Merge tag 'block-5.16-2021-11-13' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Set of fixes that should go into this merge window: - ioctl vs read data race fixes (Shin'ichiro) - blkcg use-after-free fix (Laibin) - Last piece of the puzzle for add_disk() error handling, enable __must_check for (Luis) - Request allocation fixes (Ming) - Misc fixes (me)" * tag 'block-5.16-2021-11-13' of git://git.kernel.dk/linux-block: blk-mq: fix filesystem I/O request allocation blkcg: Remove extra blkcg_bio_issue_init block: Hold invalidate_lock in BLKRESETZONE ioctl blk-mq: rename blk_attempt_bio_merge blk-mq: don't grab ->q_usage_counter in blk_mq_sched_bio_merge block: fix kerneldoc for disk_register_independent_access__ranges() block: add __must_check for *add_disk*() callers block: use enum type for blk_mq_alloc_data->rq_flags block: Hold invalidate_lock in BLKZEROOUT ioctl block: Hold invalidate_lock in BLKDISCARD ioctl
2021-11-13Merge tag 'ceph-for-5.16-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "One notable change here is that async creates and unlinks introduced in 5.7 are now enabled by default. This should greatly speed up things like rm, tar and rsync. To opt out, wsync mount option can be used. Other than that we have a pile of bug fixes all across the filesystem from Jeff, Xiubo and Kotresh and a metrics infrastructure rework from Luis" * tag 'ceph-for-5.16-rc1' of git://github.com/ceph/ceph-client: ceph: add a new metric to keep track of remote object copies libceph, ceph: move ceph_osdc_copy_from() into cephfs code ceph: clean-up metrics data structures to reduce code duplication ceph: split 'metric' debugfs file into several files ceph: return the real size read when it hits EOF ceph: properly handle statfs on multifs setups ceph: shut down mount on bad mdsmap or fsmap decode ceph: fix mdsmap decode when there are MDS's beyond max_mds ceph: ignore the truncate when size won't change with Fx caps issued ceph: don't rely on error_string to validate blocklisted session. ceph: just use ci->i_version for fscache aux info ceph: shut down access to inode when async create fails ceph: refactor remove_session_caps_cb ceph: fix auth cap handling logic in remove_session_caps_cb ceph: drop private list from remove_session_caps_cb ceph: don't use -ESTALE as special return code in try_get_cap_refs ceph: print inode numbers instead of pointer values ceph: enable async dirops by default libceph: drop ->monmap and err initialization ceph: convert to noop_direct_IO
2021-11-13Merge tag 'netfs-folio-20211111' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull netfs, 9p, afs and ceph (partial) foliation from David Howells: "This converts netfslib, 9p and afs to use folios. It also partially converts ceph so that it uses folios on the boundaries with netfslib. To help with this, a couple of folio helper functions are added in the first two patches. These patches don't touch fscache and cachefiles as I intend to remove all the code that deals with pages directly from there. Only nfs and cifs are using the old fscache I/O API now. The new API uses iov_iter instead. Thanks to Jeff Layton, Dominique Martinet and AuriStor for testing and retesting the patches" * tag 'netfs-folio-20211111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Use folios in directory handling netfs, 9p, afs, ceph: Use folios folio: Add a function to get the host inode for a folio folio: Add a function to change the private data attached to a folio