summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2020-07-23tcp: allow at most one TLP probe per flightYuchung Cheng
Previously TLP may send multiple probes of new data in one flight. This happens when the sender is cwnd limited. After the initial TLP containing new data is sent, the sender receives another ACK that acks partial inflight. It may re-arm another TLP timer to send more, if no further ACK returns before the next TLP timeout (PTO) expires. The sender may send in theory a large amount of TLP until send queue is depleted. This only happens if the sender sees such irregular uncommon ACK pattern. But it is generally undesirable behavior during congestion especially. The original TLP design restrict only one TLP probe per inflight as published in "Reducing Web Latency: the Virtue of Gentle Aggression", SIGCOMM 2013. This patch changes TLP to send at most one probe per inflight. Note that if the sender is app-limited, TLP retransmits old data and did not have this issue. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17rhashtable: drop duplicated word in <linux/rhashtable.h>Randy Dunlap
Drop the doubled word "be" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Graf <tgraf@suug.ch> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Restore previous behavior of CAP_SYS_ADMIN wrt loading networking BPF programs, from Maciej Żenczykowski. 2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu Mariappan. 3) Slay memory leak in nl80211 bss color attribute parsing code, from Luca Coelho. 4) Get route from skb properly in ip_route_use_hint(), from Miaohe Lin. 5) Don't allow anything other than ARPHRD_ETHER in llc code, from Eric Dumazet. 6) xsk code dips too deeply into DMA mapping implementation internals. Add dma_need_sync and use it. From Christoph Hellwig 7) Enforce power-of-2 for BPF ringbuf sizes. From Andrii Nakryiko. 8) Check for disallowed attributes when loading flow dissector BPF programs. From Lorenz Bauer. 9) Correct packet injection to L3 tunnel devices via AF_PACKET, from Jason A. Donenfeld. 10) Don't advertise checksum offload on ipa devices that don't support it. From Alex Elder. 11) Resolve several issues in TCP MD5 signature support. Missing memory barriers, bogus options emitted when using syncookies, and failure to allow md5 key changes in established states. All from Eric Dumazet. 12) Fix interface leak in hsr code, from Taehee Yoo. 13) VF reset fixes in hns3 driver, from Huazhong Tan. 14) Make loopback work again with ipv6 anycast, from David Ahern. 15) Fix TX starvation under high load in fec driver, from Tobias Waldekranz. 16) MLD2 payload lengths not checked properly in bridge multicast code, from Linus Lüssing. 17) Packet scheduler code that wants to find the inner protocol currently only works for one level of VLAN encapsulation. Allow Q-in-Q situations to work properly here, from Toke Høiland-Jørgensen. 18) Fix route leak in l2tp, from Xin Long. 19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport support and various protocols. From Martin KaFai Lau. 20) Fix socket cgroup v2 reference counting in some situations, from Cong Wang. 21) Cure memory leak in mlx5 connection tracking offload support, from Eli Britstein. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits) mlxsw: pci: Fix use-after-free in case of failed devlink reload mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON() net: macb: fix call to pm_runtime in the suspend/resume functions net: macb: fix macb_suspend() by removing call to netif_carrier_off() net: macb: fix macb_get/set_wol() when moving to phylink net: macb: mark device wake capable when "magic-packet" property present net: macb: fix wakeup test in runtime suspend/resume routines bnxt_en: fix NULL dereference in case SR-IOV configuration fails libbpf: Fix libbpf hashmap on (I)LP32 architectures net/mlx5e: CT: Fix memory leak in cleanup net/mlx5e: Fix port buffers cell size value net/mlx5e: Fix 50G per lane indication net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash net/mlx5e: Fix VXLAN configuration restore after function reload net/mlx5e: Fix usage of rcu-protected pointer net/mxl5e: Verify that rpriv is not NULL net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode net/mlx5: Fix eeprom support for SFP module cgroup: Fix sock_cgroup_data on big-endian. selftests: bpf: Fix detach from sockmap tests ...
2020-07-10Merge tag 'mlx5-fixes-2020-07-02' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2020-07-02 This series introduces some fixes to mlx5 driver. V1->v2: - Drop "ip -s" patch and mirred device hold reference patch. - Will revise them in a later submission. Please pull and let me know if there is any problem. For -stable v5.2 ('net/mlx5: Fix eeprom support for SFP module') For -stable v5.4 ('net/mlx5e: Fix 50G per lane indication') For -stable v5.5 ('net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash') ('net/mlx5e: Fix VXLAN configuration restore after function reload') For -stable v5.7 ('net/mlx5e: CT: Fix memory leak in cleanup') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10Merge tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/miscLinus Torvalds
Pull in-kernel read and write op cleanups from Christoph Hellwig: "Cleanup in-kernel read and write operations Reshuffle the (__)kernel_read and (__)kernel_write helpers, and ensure all users of in-kernel file I/O use them if they don't use iov_iter based methods already. The new WARN_ONs in combination with syzcaller already found a missing input validation in 9p. The fix should be on your way through the maintainer ASAP". [ This is prep-work for the real changes coming 5.9 ] * tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc: fs: remove __vfs_read fs: implement kernel_read using __kernel_read integrity/ima: switch to using __kernel_read fs: add a __kernel_read helper fs: remove __vfs_write fs: implement kernel_write using __kernel_write fs: check FMODE_WRITE in __kernel_write fs: unexport __kernel_write bpfilter: switch to kernel_write autofs: switch to kernel_write cachefiles: switch to kernel_write
2020-07-10Merge tag 'dma-mapping-5.8-5' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fixes from Christoph Hellwig: - add a warning when the atomic pool is depleted (David Rientjes) - protect the parameters of the new scatterlist helper macros (Marek Szyprowski ) * tag 'dma-mapping-5.8-5' of git://git.infradead.org/users/hch/dma-mapping: scatterlist: protect parameters of the sg_table related macros dma-mapping: warn when coherent pool is depleted
2020-07-10Merge tag 'gfs2-v5.8-rc4.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Fix gfs2 readahead deadlocks by adding a IOCB_NOIO flag that allows gfs2 to use the generic fiel read iterator functions without having to worry about being called back while holding locks". * tag 'gfs2-v5.8-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Rework read and page fault locking fs: Add IOCB_NOIO flag for generic_file_read_iter
2020-07-09net/mlx5e: Fix port buffers cell size valueEran Ben Elisha
Device unit for port buffers size, xoff_threshold and xon_threshold is cells. Fix a bug in driver where cell unit size was hard-coded to 128 bytes. This hard-coded value is buggy, as it is wrong for some hardware versions. Driver to read cell size from SBCAM register and translate bytes to cell units accordingly. In order to fix the bug, this patch exposes SBCAM (Shared buffer capabilities mask) layout and defines. If SBCAM.cap_cell_size is valid, use it for all bytes to cells calculations. If not valid, fallback to 128. Cell size do not change on the fly per device. Instead of issuing SBCAM access reg command every time such translation is needed, cache it in mlx5e_dcbx as part of mlx5e_dcbnl_initialize(). Pass dcbx.port_buff_cell_sz as a param to every function that needs bytes to cells translation. While fixing the bug, move MLX5E_BUFFER_CELL_SHIFT macro to en_dcbnl.c, as it is only used by that file. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-09cgroup: Fix sock_cgroup_data on big-endian.Cong Wang
In order for no_refcnt and is_data to be the lowest order two bits in the 'val' we have to pad out the bitfield of the u8. Fixes: ad0f75e5f57c ("cgroup: fix cgroup_sk_alloc() for sk_clone_lock()") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09Merge tag 'kallsyms_show_value-v5.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kallsyms fix from Kees Cook: "Refactor kallsyms_show_value() users for correct cred. I'm not delighted by the timing of getting these changes to you, but it does fix a handful of kernel address exposures, and no one has screamed yet at the patches. Several users of kallsyms_show_value() were performing checks not during "open". Refactor everything needed to gain proper checks against file->f_cred for modules, kprobes, and bpf" * tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests: kmod: Add module address visibility test bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok() kprobes: Do not expose probe addresses to non-CAP_SYSLOG module: Do not expose section addresses to non-CAP_SYSLOG module: Refactor section attr into bin attribute kallsyms: Refactor kallsyms_show_value() to take cred
2020-07-08bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()Kees Cook
When evaluating access control over kallsyms visibility, credentials at open() time need to be used, not the "current" creds (though in BPF's case, this has likely always been the same). Plumb access to associated file->f_cred down through bpf_dump_raw_ok() and its callers now that kallsysm_show_value() has been refactored to take struct cred. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: bpf@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump") Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-08kallsyms: Refactor kallsyms_show_value() to take credKees Cook
In order to perform future tests against the cred saved during open(), switch kallsyms_show_value() to operate on a cred, and have all current callers pass current_cred(). This makes it very obvious where callers are checking the wrong credential in their "read" contexts. These will be fixed in the coming patches. Additionally switch return value to bool, since it is always used as a direct permission check, not a 0-on-success, negative-on-error style function return. Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-08Raise gcc version requirement to 4.9Linus Torvalds
I realize that we fairly recently raised it to 4.8, but the fact is, 4.9 is a much better minimum version to target. We have a number of workarounds for actual bugs in pre-4.9 gcc versions (including things like internal compiler errors on ARM), but we also have some syntactic workarounds for lacking features. In particular, raising the minimum to 4.9 means that we can now just assume _Generic() exists, which is likely the much better replacement for a lot of very convoluted built-time magic with conditionals on sizeof and/or __builtin_choose_expr() with same_type() etc. Using _Generic also means that you will need to have a very recent version of 'sparse', but thats easy to build yourself, and much less of a hassle than some old gcc version can be. The latest (in a long string) of reasons for minimum compiler version upgrades was commit 5435f73d5c4a ("efi/x86: Fix build with gcc 4"). Ard points out that RHEL 7 uses gcc-4.8, but the people who stay back on old RHEL versions persumably also don't build their own kernels anyway. And maybe they should cross-built or just have a little side affair with a newer compiler? Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-08fs: remove __vfs_readChristoph Hellwig
Fold it into the two callers. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08fs: add a __kernel_read helperChristoph Hellwig
This is the counterpart to __kernel_write, and skip the rw_verify_area call compared to kernel_read. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-07vlan: consolidate VLAN parsing code and limit max parsing depthToke Høiland-Jørgensen
Toshiaki pointed out that we now have two very similar functions to extract the L3 protocol number in the presence of VLAN tags. And Daniel pointed out that the unbounded parsing loop makes it possible for maliciously crafted packets to loop through potentially hundreds of tags. Fix both of these issues by consolidating the two parsing functions and limiting the VLAN tag parsing to a max depth of 8 tags. As part of this, switch over __vlan_get_protocol() to use skb_header_pointer() instead of pskb_may_pull(), to avoid the possible side effects of the latter and keep the skb pointer 'const' through all the parsing functions. v2: - Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT) Reported-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Reported-by: Daniel Borkmann <daniel@iogearbox.net> Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in the presence of VLANs") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-07fs: Add IOCB_NOIO flag for generic_file_read_iterAndreas Gruenbacher
Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it shouldn't trigger any filesystem I/O for the actual request or for readahead. This allows to do tentative reads out of the page cache as some filesystems allow, and to take the appropriate locks and retry the reads only if the requested pages are not cached. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-07-07cgroup: fix cgroup_sk_alloc() for sk_clone_lock()Cong Wang
When we clone a socket in sk_clone_lock(), its sk_cgrp_data is copied, so the cgroup refcnt must be taken too. And, unlike the sk_alloc() path, sock_update_netprioidx() is not called here. Therefore, it is safe and necessary to grab the cgroup refcnt even when cgroup_sk_alloc is disabled. sk_clone_lock() is in BH context anyway, the in_interrupt() would terminate this function if called there. And for sk_alloc() skcd->val is always zero. So it's safe to factor out the code to make it more readable. The global variable 'cgroup_sk_alloc_disabled' is used to determine whether to take these reference counts. It is impossible to make the reference counting correct unless we save this bit of information in skcd->val. So, add a new bit there to record whether the socket has already taken the reference counts. This obviously relies on kmalloc() to align cgroup pointers to at least 4 bytes, ARCH_KMALLOC_MINALIGN is certainly larger than that. This bug seems to be introduced since the beginning, commit d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets") tried to fix it but not compeletely. It seems not easy to trigger until the recent commit 090e28b229af ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged. Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") Reported-by: Cameron Berkenpas <cam@neo-zeon.de> Reported-by: Peter Geis <pgwipeout@gmail.com> Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reported-by: Daniël Sonck <dsonck92@gmail.com> Reported-by: Zhang Qiang <qiang.zhang@windriver.com> Tested-by: Cameron Berkenpas <cam@neo-zeon.de> Tested-by: Peter Geis <pgwipeout@gmail.com> Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Zefan Li <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-06scatterlist: protect parameters of the sg_table related macrosMarek Szyprowski
Add brackets to protect parameters of the recently added sg_table related macros from side-effects. Fixes: 709d6d73c756 ("scatterlist: add generic wrappers for iterating over sgtable objects") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-03sched: consistently handle layer3 header accesses in the presence of VLANsToke Høiland-Jørgensen
There are a couple of places in net/sched/ that check skb->protocol and act on the value there. However, in the presence of VLAN tags, the value stored in skb->protocol can be inconsistent based on whether VLAN acceleration is enabled. The commit quoted in the Fixes tag below fixed the users of skb->protocol to use a helper that will always see the VLAN ethertype. However, most of the callers don't actually handle the VLAN ethertype, but expect to find the IP header type in the protocol field. This means that things like changing the ECN field, or parsing diffserv values, stops working if there's a VLAN tag, or if there are multiple nested VLAN tags (QinQ). To fix this, change the helper to take an argument that indicates whether the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we make sure to skip all of them, so behaviour is consistent even in QinQ mode. To make the helper usable from the ECN code, move it to if_vlan.h instead of pkt_sched.h. v3: - Remove empty lines - Move vlan variable definitions inside loop in skb_protocol() - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and bpf_skb_ecn_set_ce() v2: - Use eth_type_vlan() helper in skb_protocol() - Also fix code that reads skb->protocol directly - Change a couple of 'if/else if' statements to switch constructs to avoid calling the helper twice Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com> Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-03Merge tag 'pci-v5.8-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Fix a pcie_find_root_port() simplification that broke power management because it didn't handle the edge case of finding the Root Port of a Root Port itself (Mika Westerberg)"" * tag 'pci-v5.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Make pcie_find_root_port() work for Root Ports
2020-07-02Merge tag 'block-5.8-2020-07-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Use kvfree_sensitive() for the block keyslot free (Eric) - Sync blk-mq debugfs flags (Hou) - Memory leak fix in virtio-blk error path (Hou) * tag 'block-5.8-2020-07-01' of git://git.kernel.dk/linux-block: virtio-blk: free vblk-vqs in error path of virtblk_probe() block/keyslot-manager: use kvfree_sensitive() blk-mq-debugfs: update blk_queue_flag_name[] accordingly for new flags
2020-07-02Merge tag 'io_uring-5.8-2020-07-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "One fix in here, for a regression in 5.7 where a task is waiting in the kernel for a condition, but that condition won't become true until task_work is run. And the task_work can't be run exactly because the task is waiting in the kernel, so we'll never make any progress. One example of that is registering an eventfd and queueing io_uring work, and then the task goes and waits in eventfd read with the expectation that it'll get woken (and read an event) when the io_uring request completes. The io_uring request is finished through task_work, which won't get run while the task is looping in eventfd read" * tag 'io_uring-5.8-2020-07-01' of git://git.kernel.dk/linux-block: io_uring: use signal based task_work running task_work: teach task_work_add() to do signal_wake_up()
2020-06-30PCI: Make pcie_find_root_port() work for Root PortsMika Westerberg
Commit 6ae72bfa656e ("PCI: Unify pcie_find_root_port() and pci_find_pcie_root_port()") broke acpi_pci_bridge_d3() because calling pcie_find_root_port() on a Root Port returned NULL when it should return the Root Port, which in turn broke power management of PCIe hierarchies. Rework pcie_find_root_port() so it returns its argument when it is already a Root Port. [bhelgaas: test device only once, test for PCIe] Fixes: 6ae72bfa656e ("PCI: Unify pcie_find_root_port() and pci_find_pcie_root_port()") Link: https://lore.kernel.org/r/20200622161248.51099-1-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2020-06-30 The following pull-request contains BPF updates for your *net* tree. We've added 28 non-merge commits during the last 9 day(s) which contain a total of 35 files changed, 486 insertions(+), 232 deletions(-). The main changes are: 1) Fix an incorrect verifier branch elimination for PTR_TO_BTF_ID pointer types, from Yonghong Song. 2) Fix UAPI for sockmap and flow_dissector progs that were ignoring various arguments passed to BPF_PROG_{ATTACH,DETACH}, from Lorenz Bauer & Jakub Sitnicki. 3) Fix broken AF_XDP DMA hacks that are poking into dma-direct and swiotlb internals and integrate it properly into DMA core, from Christoph Hellwig. 4) Fix RCU splat from recent changes to avoid skipping ingress policy when kTLS is enabled, from John Fastabend. 5) Fix BPF ringbuf map to enforce size to be the power of 2 in order for its position masking to work, from Andrii Nakryiko. 6) Fix regression from CAP_BPF work to re-allow CAP_SYS_ADMIN for loading of network programs, from Maciej Żenczykowski. 7) Fix libbpf section name prefix for devmap progs, from Jesper Dangaard Brouer. 8) Fix formatting in UAPI documentation for BPF helpers, from Quentin Monnet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-30Merge tag 'fixes-v5.8-rc3-a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem fixes from James Morris: "Two simple fixes for v5.8: - Fix hook iteration and default value for inode_copy_up_xattr (KP Singh) - Fix the key_permission LSM hook function type (Sami Tolvanen)" * tag 'fixes-v5.8-rc3-a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security: Fix hook iteration and default value for inode_copy_up_xattr security: fix the key_permission LSM hook function type
2020-06-30task_work: teach task_work_add() to do signal_wake_up()Oleg Nesterov
So that the target task will exit the wait_event_interruptible-like loop and call task_work_run() asap. The patch turns "bool notify" into 0,TWA_RESUME,TWA_SIGNAL enum, the new TWA_SIGNAL flag implies signal_wake_up(). However, it needs to avoid the race with recalc_sigpending(), so the patch also adds the new JOBCTL_TASK_WORK bit included in JOBCTL_PENDING_MASK. TODO: once this patch is merged we need to change all current users of task_work_add(notify = true) to use TWA_RESUME. Cc: stable@vger.kernel.org # v5.7 Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-30bpf: sockmap: Require attach_bpf_fd when detaching a programLorenz Bauer
The sockmap code currently ignores the value of attach_bpf_fd when detaching a program. This is contrary to the usual behaviour of checking that attach_bpf_fd represents the currently attached program. Ensure that attach_bpf_fd is indeed the currently attached program. It turns out that all sockmap selftests already do this, which indicates that this is unlikely to cause breakage. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
2020-06-30bpf: flow_dissector: Check value of unused flags to BPF_PROG_DETACHLorenz Bauer
Using BPF_PROG_DETACH on a flow dissector program supports neither attach_flags nor attach_bpf_fd. Yet no value is enforced for them. Enforce that attach_flags are zero, and require the current program to be passed via attach_bpf_fd. This allows us to remove the check for CAP_SYS_ADMIN, since userspace can now no longer remove arbitrary flow dissector programs. Fixes: b27f7bb590ba ("flow_dissector: Move out netns_bpf prog callbacks") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200629095630.7933-3-lmb@cloudflare.com
2020-06-30dma-mapping: Add a new dma_need_sync APIChristoph Hellwig
Add a new API to check if calls to dma_sync_single_for_{device,cpu} are required for a given DMA streaming mapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200629130359.2690853-2-hch@lst.de
2020-06-29Merge tag 'mac80211-for-net-2020-06-29' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Couple of fixes/small things: * TX control port status check fixed to not assume frame format * mesh control port fixes * error handling/leak fixes when starting AP, with HE attributes * fix broadcast packet handling with encapsulation offload * add new AKM suites * and a small code cleanup ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-29blk-mq-debugfs: update blk_queue_flag_name[] accordingly for new flagsHou Tao
Else there may be magic numbers in /sys/kernel/debug/block/*/state. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28Merge tag 'timers-urgent-2020-06-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "A single DocBook fix" * tag 'timers-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Fix kerneldoc system_device_crosststamp & al
2020-06-28Merge tag 'efi-urgent-2020-06-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: - Fix build regression on v4.8 and older - Robustness fix for TPM log parsing code - kobject refcount fix for the ESRT parsing code - Two efivarfs fixes to make it behave more like an ordinary file system - Style fixup for zero length arrays - Fix a regression in path separator handling in the initrd loader - Fix a missing prototype warning - Add some kerneldoc headers for newly introduced stub routines - Allow support for SSDT overrides via EFI variables to be disabled - Report CPU mode and MMU state upon entry for 32-bit ARM - Use the correct stack pointer alignment when entering from mixed mode * tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: arm: Print CPU boot mode and MMU state at boot efi/libstub: arm: Omit arch specific config table matching array on arm64 efi/x86: Setup stack correctly for efi_pe_entry efi: Make it possible to disable efivar_ssdt entirely efi/libstub: Descriptions for stub helper functions efi/libstub: Fix path separator regression efi/libstub: Fix missing-prototype warning for skip_spaces() efi: Replace zero-length array and use struct_size() helper efivarfs: Don't return -EINTR when rate-limiting reads efivarfs: Update inode modification time for successful writes efi/esrt: Fix reference count leak in esre_create_sysfs_entry. efi/tpm: Verify event log header before parsing efi/x86: Fix build with gcc 4
2020-06-28Merge tag 'sched_urgent_for_5.8_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: "The most anticipated fix in this pull request is probably the horrible build fix for the RANDSTRUCT fail that didn't make -rc2. Also included is the cleanup that removes those BUILD_BUG_ON()s and replaces it with ugly unions. Also included is the try_to_wake_up() race fix that was first triggered by Paul's RCU-torture runs, but was independently hit by Dave Chinner's fstest runs as well" * tag 'sched_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/cfs: change initial value of runnable_avg smp, irq_work: Continue smp_call_function*() and irq_work*() integration sched/core: s/WF_ON_RQ/WQ_ON_CPU/ sched/core: Fix ttwu() race sched/core: Fix PI boosting between RT and DEADLINE tasks sched/deadline: Initialize ->dl_boosted sched/core: Check cpus_mask, not cpus_ptr in __set_cpus_allowed_ptr(), to fix mask corruption sched/core: Fix CONFIG_GCC_PLUGIN_RANDSTRUCT build fail
2020-06-28Merge tag 'x86_urgent_for_5.8_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - AMD Memory bandwidth counter width fix, by Babu Moger. - Use the proper length type in the 32-bit truncate() syscall variant, by Jiri Slaby. - Reinit IA32_FEAT_CTL during wakeup to fix the case where after resume, VMXON would #GP due to VMX not being properly enabled, by Sean Christopherson. - Fix a static checker warning in the resctrl code, by Dan Carpenter. - Add a CR4 pinning mask for bits which cannot change after boot, by Kees Cook. - Align the start of the loop of __clear_user() to 16 bytes, to improve performance on AMD zen1 and zen2 microarchitectures, by Matt Fleming. * tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/64: Align start of __clear_user() loop to 16-bytes x86/cpu: Use pinning mask for CR4 bits needing to be 0 x86/resctrl: Fix a NULL vs IS_ERR() static checker warning in rdt_cdp_peer_get() x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup syscalls: Fix offset type of ksys_ftruncate() x86/resctrl: Fix memory bandwidth counter width for AMD
2020-06-28Merge tag 'rcu_urgent_for_5.8_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU-vs-KCSAN fixes from Borislav Petkov: "A single commit that uses "arch_" atomic operations to avoid the instrumentation that comes with the non-"arch_" versions. In preparation for that commit, it also has another commit that makes these "arch_" atomic operations available to generic code. Without these commits, KCSAN uses can see pointless errors" * tag 'rcu_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Fixup noinstr warnings locking/atomics: Provide the arch_atomic_ interface to generic code
2020-06-28Merge tag 'x86_entry_for_5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry fixes from Borislav Petkov: "This is the x86/entry urgent pile which has accumulated since the merge window. It is not the smallest but considering the almost complete entry core rewrite, the amount of fixes to follow is somewhat higher than usual, which is to be expected. Peter Zijlstra says: 'These patches address a number of instrumentation issues that were found after the x86/entry overhaul. When combined with rcu/urgent and objtool/urgent, these patches make UBSAN/KASAN/KCSAN happy again. Part of making this all work is bumping the minimum GCC version for KASAN builds to gcc-8.3, the reason for this is that the __no_sanitize_address function attribute is broken in GCC releases before that. No known GCC version has a working __no_sanitize_undefined, however because the only noinstr violation that results from this happens when an UB is found, we treat it like WARN. That is, we allow it to violate the noinstr rules in order to get the warning out'" * tag 'x86_entry_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry: Fix #UD vs WARN more x86/entry: Increase entry_stack size to a full page x86/entry: Fixup bad_iret vs noinstr objtool: Don't consider vmlinux a C-file kasan: Fix required compiler version compiler_attributes.h: Support no_sanitize_undefined check with GCC 4 x86/entry, bug: Comment the instrumentation_begin() usage for WARN() x86/entry, ubsan, objtool: Whitelist __ubsan_handle_*() x86/entry, cpumask: Provide non-instrumented variant of cpu_is_offline() compiler_types.h: Add __no_sanitize_{address,undefined} to noinstr kasan: Bump required compiler version x86, kcsan: Add __no_kcsan to noinstr kcsan: Remove __no_kcsan_or_inline x86, kcsan: Remove __no_kcsan_or_inline usage
2020-06-28smp, irq_work: Continue smp_call_function*() and irq_work*() integrationPeter Zijlstra
Instead of relying on BUG_ON() to ensure the various data structures line up, use a bunch of horrible unions to make it all automatic. Much of the union magic is to ensure irq_work and smp_call_function do not (yet) see the members of their respective data structures change name. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20200622100825.844455025@infradead.org
2020-06-28sched/core: Fix CONFIG_GCC_PLUGIN_RANDSTRUCT build failPeter Zijlstra
As a temporary build fix, the proper cleanup needs more work. Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Eric Biggers <ebiggers@kernel.org> Suggested-by: Eric Biggers <ebiggers@kernel.org> Suggested-by: Kees Cook <keescook@chromium.org> Fixes: a148866489fb ("sched: Replace rq::wake_list") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-06-27Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Six small fixes, five in drivers and one to correct another minor regression from cc97923a5bcc ("block: move dma drain handling to scsi") where we still need the drain stub to be built in to the kernel for the modular libata, non-modular SAS driver case" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mptscsih: Fix read sense data size scsi: zfcp: Fix panic on ERP timeout for previously dismissed ERP action scsi: lpfc: Avoid another null dereference in lpfc_sli4_hba_unset() scsi: libata: Fix the ata_scsi_dma_need_drain stub scsi: qla2xxx: Keep initiator ports after RSCN scsi: qla2xxx: Set NVMe status code for failed NVMe FCP request
2020-06-27Merge tag 'dma-mapping-5.8-4' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fixes from Christoph Hellwig: - fix dma coherent mmap in nommu (me) - more AMD SEV fallout (David Rientjes, me) - fix alignment in dma_common_*_remap (Eric Auger) * tag 'dma-mapping-5.8-4' of git://git.infradead.org/users/hch/dma-mapping: dma-remap: align the size in dma_common_*_remap() dma-mapping: DMA_COHERENT_POOL should select GENERIC_ALLOCATOR dma-direct: add missing set_memory_decrypted() for coherent mapping dma-direct: check return value when encrypting or decrypting memory dma-direct: re-encrypt memory if dma_direct_alloc_pages() fails dma-direct: always align allocation size in dma_direct_alloc_pages() dma-direct: mark __dma_direct_alloc_pages static dma-direct: re-enable mmap for !CONFIG_MMU
2020-06-27Merge tag 'kgdb-5.8-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb fixes from Daniel Thompson: "The main change here is a fix for a number of unsafe interactions between kdb and the console system. The fixes are specific to kdb (pure kgdb debugging does not use the console system at all). On systems with an NMI then kdb, if it is enabled, must get messages to the user despite potentially running from some "difficult" calling contexts. These fixes avoid using the console system where we have been provided an alternative (safer) way to interact with the user and, if using the console system in unavoidable, use oops_in_progress for deadlock avoidance. These fixes also ensure kdb honours the console enable flag. Also included is a fix that wraps kgdb trap handling in an RCU read lock to avoids triggering diagnostic warnings. This is a wide lock scope but this is OK because kgdb is a stop-the-world debugger. When we stop the world we put all the CPUs into holding pens and this inhibits RCU update anyway" * tag 'kgdb-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kgdb: Avoid suspicious RCU usage warning kdb: Switch to use safer dbg_io_ops over console APIs kdb: Make kdb_printf() console handling more robust kdb: Check status of console prior to invoking handlers kdb: Re-factor kdb_printf() message write code
2020-06-26Merge tag 'iommu-fixes-v5.8-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: "A couple of Intel VT-d fixes: - Make Intel SVM code 64bit only. The code uses pgd_t* and the IOMMU only supports long-mode page-table formats, so its broken on 32bit anyway. - Make sure GFX quirks in for Intel VT-d are not applied to untrusted devices. Those devices might gain full memory access otherwise. - Identity mapping setup fix. - Fix ACS enabling when Intel IOMMU is off and untrusted devices are detected. - Two smaller fixes for coherency and IO page-table setup" * tag 'iommu-fixes-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Fix misuse of iommu_domain_identity_map() iommu/vt-d: Update scalable mode paging structure coherency iommu/vt-d: Enable PCI ACS for platform opt in hint iommu/vt-d: Don't apply gfx quirks to untrusted devices iommu/vt-d: Set U/S bit in first level page table by default iommu/vt-d: Make Intel SVM code 64-bit only
2020-06-26Merge tag 'drm-fixes-2020-06-26' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Usual rc3 pickup, lots of little fixes all over. The core VT registration regression fix is probably the largest, otherwise ttm, amdgpu and tegra are the bulk, with some minor driver fixes. No i915 pull this week which may or may not mean I get 2x of it next week, we'll see how it goes. core: - fix VT registration regression ttm: - fix two fence leaks amdgpu: - Fix missed mutex unlock in DC error path - Fix firmware leak for sdma5 - DC bpc property fixes amdkfd: - Fix memleak in an error path radeon: - Fix copy paste typo in NI DPM spll validation rcar-du: - build fix tegra: - add missing zpos property - child driver registeration fix - debugfs cleanup fix - doc fix mcde: - reorder fbdev setup panel: - fix connector type - fix orienation for some panels sun4i: - fix dma/iommu configuration uvesafb: - respect blank flag" * tag 'drm-fixes-2020-06-26' of git://anongit.freedesktop.org/drm/drm: (25 commits) drm/amd: fix potential memleak in err branch drm/amd/display: Fix ineffective setting of max bpc property drm/amd/display: Enable output_bpc property on all outputs drm/amdgpu: add fw release for sdma v5_0 drm/fb-helper: Fix vt restore drm/radeon: fix fb_div check in ni_init_smc_spll_table() drm/amdgpu/display: Unlock mutex on error drm/sun4i: mixer: Call of_dma_configure if there's an IOMMU drm: panel-orientation-quirks: Use generic orientation-data for Acer S1003 drm: panel-orientation-quirks: Add quirk for Asus T101HA panel video: fbdev: uvesafb: fix "noblank" option handling drm/panel-simple: fix connector type for newhaven_nhd_43_480272ef_atxl drm/panel-simple: fix connector type for LogicPD Type28 Display drm: rcar-du: Fix build error drm: mcde: Fix forgotten user of drm->dev_private drm: mcde: Fix display initialization problem drm/tegra: Add zpos property for cursor planes gpu: host1x: Detach driver on unregister gpu: host1x: Correct trivial kernel-doc inconsistencies drm/tegra: hub: Register child devices ...
2020-06-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misx fixes from Andrew Morton: "31 patches. Subsystems affected by this patch series: hotfixes, mm/pagealloc, kexec, ocfs2, lib, mm/slab, mm/slab, mm/slub, mm/swap, mm/pagemap, mm/vmalloc, mm/memcg, mm/gup, mm/thp, mm/vmscan, x86, mm/memory-hotplug, MAINTAINERS" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits) MAINTAINERS: update info for sparse mm/memory_hotplug.c: fix false softlockup during pfn range removal mm: remove vmalloc_exec arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page x86/hyperv: allocate the hypercall page with only read and execute bits mm/memory: fix IO cost for anonymous page mm/swap: fix for "mm: workingset: age nonresident information alongside anonymous pages" mm: workingset: age nonresident information alongside anonymous pages doc: THP CoW fault no longer allocate THP docs: mm/gup: minor documentation update mm/memcontrol.c: prevent missed memory.low load tears mm/memcontrol.c: add missed css_put() mm: memcontrol: handle div0 crash race condition in memory.low mm/vmalloc.c: fix a warning while make xmldocs media: omap3isp: remove cacheflush.h make asm-generic/cacheflush.h more standalone mm/debug_vm_pgtable: fix build failure with powerpc 8xx mm/memory.c: properly pte_offset_map_lock/unlock in vm_insert_pages() mm: fix swap cache node allocation mask slub: cure list_slab_objects() from double fix ...
2020-06-26kdb: Switch to use safer dbg_io_ops over console APIsSumit Garg
In kgdb context, calling console handlers aren't safe due to locks used in those handlers which could in turn lead to a deadlock. Although, using oops_in_progress increases the chance to bypass locks in most console handlers but it might not be sufficient enough in case a console uses more locks (VT/TTY is good example). Currently when a driver provides both polling I/O and a console then kdb will output using the console. We can increase robustness by using the currently active polling I/O driver (which should be lockless) instead of the corresponding console. For several common cases (e.g. an embedded system with a single serial port that is used both for console output and debugger I/O) this will result in no console handler being used. In order to achieve this we need to reverse the order of preference to use dbg_io_ops (uses polling I/O mode) over console APIs. So we just store "struct console" that represents debugger I/O in dbg_io_ops and while emitting kdb messages, skip console that matches dbg_io_ops console in order to avoid duplicate messages. After this change, "is_console" param becomes redundant and hence removed. Suggested-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/1591264879-25920-5-git-send-email-sumit.garg@linaro.org Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-26Merge branch 'linus' into x86/entry, to resolve conflictsIngo Molnar
Conflicts: arch/x86/kernel/traps.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-06-26mm: remove vmalloc_execChristoph Hellwig
Merge vmalloc_exec into its only caller. Note that for !CONFIG_MMU __vmalloc_node_range maps to __vmalloc, which directly clears the __GFP_HIGHMEM added by the vmalloc_exec stub anyway. Link: http://lkml.kernel.org/r/20200618064307.32739-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26mm: workingset: age nonresident information alongside anonymous pagesJohannes Weiner
Patch series "fix for "mm: balance LRU lists based on relative thrashing" patchset" This patchset fixes some problems of the patchset, "mm: balance LRU lists based on relative thrashing", which is now merged on the mainline. Patch "mm: workingset: let cache workingset challenge anon fix" is the result of discussion with Johannes. See following link. http://lkml.kernel.org/r/20200520232525.798933-6-hannes@cmpxchg.org And, the other two are minor things which are found when I try to rebase my patchset. This patch (of 3): After ("mm: workingset: let cache workingset challenge anon fix"), we compare refault distances to active_file + anon. But age of the non-resident information is only driven by the file LRU. As a result, we may overestimate the recency of any incoming refaults and activate them too eagerly, causing unnecessary LRU churn in certain situations. Make anon aging drive nonresident age as well to address that. Link: http://lkml.kernel.org/r/1592288204-27734-1-git-send-email-iamjoonsoo.kim@lge.com Link: http://lkml.kernel.org/r/1592288204-27734-2-git-send-email-iamjoonsoo.kim@lge.com Fixes: 34e58cac6d8f2a ("mm: workingset: let cache workingset challenge anon") Reported-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Rik van Riel <riel@surriel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>