summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2023-03-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Documentation/bpf/bpf_devel_QA.rst b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info") d56b0c461d19 ("bpf, docs: Fix link to netdev-FAQ target") https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09Merge tag 'net-6.3-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. Current release - regressions: - core: avoid skb end_offset change in __skb_unclone_keeptruesize() - sched: - act_connmark: handle errno on tcf_idr_check_alloc - flower: fix fl_change() error recovery path - ieee802154: prevent user from crashing the host Current release - new code bugs: - eth: bnxt_en: fix the double free during device removal - tools: ynl: - fix enum-as-flags in the generic CLI - fully inherit attrs in subsets - re-license uniformly under GPL-2.0 or BSD-3-clause Previous releases - regressions: - core: use indirect calls helpers for sk_exit_memory_pressure() - tls: - fix return value for async crypto - avoid hanging tasks on the tx_lock - eth: ice: copy last block omitted in ice_get_module_eeprom() Previous releases - always broken: - core: avoid double iput when sock_alloc_file fails - af_unix: fix struct pid leaks in OOB support - tls: - fix possible race condition - fix device-offloaded sendpage straddling records - bpf: - sockmap: fix an infinite loop error - test_run: fix &xdp_frame misplacement for LIVE_FRAMES - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR - netfilter: tproxy: fix deadlock due to missing BH disable - phylib: get rid of unnecessary locking - eth: bgmac: fix *initial* chip reset to support BCM5358 - eth: nfp: fix csum for ipsec offload - eth: mtk_eth_soc: fix RX data corruption issue Misc: - usb: qmi_wwan: add telit 0x1080 composition" * tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits) tools: ynl: fix enum-as-flags in the generic CLI tools: ynl: move the enum classes to shared code net: avoid double iput when sock_alloc_file fails af_unix: fix struct pid leaks in OOB support eth: fealnx: bring back this old driver net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC net: microchip: sparx5: fix deletion of existing DSCP mappings octeontx2-af: Unlock contexts in the queue context cache in case of fault detection net/smc: fix fallback failed while sendmsg with fastopen ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause mailmap: update entries for Stephen Hemminger mailmap: add entry for Maxim Mikityanskiy nfc: change order inside nfc_se_io error path ethernet: ice: avoid gcc-9 integer overflow warning ice: don't ignore return codes in VSI related code ice: Fix DSCP PFC TLV creation net: usb: qmi_wwan: add Telit 0x1080 composition net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990 netfilter: conntrack: adopt safer max chain length net: tls: fix device-offloaded sendpage straddling records ...
2023-03-09Merge tag 'for-linus-2023030901' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Benjamin Tissoires: - fix potential out of bound write of zeroes in HID core with a specially crafted uhid device (Lee Jones) - fix potential use-after-free in work function in intel-ish-hid (Reka Norman) - selftests config fixes (Benjamin Tissoires) - few device small fixes and support * tag 'for-linus-2023030901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: intel-ish-hid: ipc: Fix potential use-after-free in work function HID: logitech-hidpp: Add support for Logitech MX Master 3S mouse HID: cp2112: Fix driver not registering GPIO IRQ chip as threaded selftest: hid: fix hid_bpf not set in config HID: uhid: Over-ride the default maximum data buffer value with our own HID: core: Provide new max_buffer_size attribute to over-ride the default
2023-03-09sctp: add weighted fair queueing stream schedulerXin Long
As it says in rfc8260#section-3.6 about the weighted fair queueing scheduler: A Weighted Fair Queueing scheduler between the streams is used. The weight is configurable per outgoing SCTP stream. This scheduler considers the lengths of the messages of each stream and schedules them in a specific way to use the capacity according to the given weights. If the weight of stream S1 is n times the weight of stream S2, the scheduler should assign to stream S1 n times the capacity it assigns to stream S2. The details are implementation dependent. Interleaving user messages allows for a better realization of the capacity usage according to the given weights. This patch adds Weighted Fair Queueing Scheduler actually based on the code of Fair Capacity Scheduler by adding fc_weight into struct sctp_stream_out_ext and taking it into account when sorting stream-> fc_list in sctp_sched_fc_sched() and sctp_sched_fc_dequeue_done(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-09sctp: add fair capacity stream schedulerXin Long
As it says in rfc8260#section-3.5 about the fair capacity scheduler: A fair capacity distribution between the streams is used. This scheduler considers the lengths of the messages of each stream and schedules them in a specific way to maintain an equal capacity for all streams. The details are implementation dependent. interleaving user messages allows for a better realization of the fair capacity usage. This patch adds Fair Capacity Scheduler based on the foundations added by commit 5bbbbe32a431 ("sctp: introduce stream scheduler foundations"): A fc_list and a fc_length are added into struct sctp_stream_out_ext and a fc_list is added into struct sctp_stream. In .enqueue, when there are chunks enqueued into a stream, this stream will be linked into stream-> fc_list by its fc_list ordered by its fc_length. In .dequeue, it always picks up the 1st skb from stream->fc_list. In .dequeue_done, fc_length is increased by chunk's len and update its location in stream->fc_list according to the its new fc_length. Note that when the new fc_length overflows in .dequeue_done, instead of resetting all fc_lengths to 0, we only reduced them by U32_MAX / 4 to avoid a moment of imbalance in the scheduling, as Marcelo suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-08Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Andrii Nakryiko says: ==================== pull-request: bpf-next 2023-03-08 We've added 23 non-merge commits during the last 2 day(s) which contain a total of 28 files changed, 414 insertions(+), 104 deletions(-). The main changes are: 1) Add more precise memory usage reporting for all BPF map types, from Yafang Shao. 2) Add ARM32 USDT support to libbpf, from Puranjay Mohan. 3) Fix BTF_ID_LIST size causing problems in !CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor. 4) IMA selftests fix, from Roberto Sassu. 5) libbpf fix in APK support code, from Daniel Müller. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits) selftests/bpf: Fix IMA test libbpf: USDT arm arg parsing support libbpf: Refactor parse_usdt_arg() to re-use code libbpf: Fix theoretical u32 underflow in find_cd() function bpf: enforce all maps having memory usage callback bpf: offload map memory usage bpf, net: xskmap memory usage bpf, net: sock_map memory usage bpf, net: bpf_local_storage memory usage bpf: local_storage memory usage bpf: bpf_struct_ops memory usage bpf: queue_stack_maps memory usage bpf: devmap memory usage bpf: cpumap memory usage bpf: bloom_filter memory usage bpf: ringbuf memory usage bpf: reuseport_array memory usage bpf: stackmap memory usage bpf: arraymap memory usage bpf: hashtab memory usage ... ==================== Link: https://lore.kernel.org/r/20230308193533.1671597-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-08net: reclaim skb->scm_io_uring bitEric Dumazet
Commit 0091bfc81741 ("io_uring/af_unix: defer registered files gc to io_uring release") added one bit to struct sk_buff. This structure is critical for networking, and we try very hard to not add bloat on it, unless absolutely required. For instance, we can use a specific destructor as a wrapper around unix_destruct_scm(), to identify skbs that unix_gc() has to special case. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-07net: remove enum skb_free_reasonEric Dumazet
enum skb_drop_reason is more generic, we can adopt it instead. Provide dev_kfree_skb_irq_reason() and dev_kfree_skb_any_reason(). This means drivers can use more precise drop reasons if they want to. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Link: https://lore.kernel.org/r/20230306204313.10492-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-07net: phy: improve phy_read_poll_timeoutHeiner Kallweit
cond sometimes is (val & MASK) what may result in a false positive if val is a negative errno. We shouldn't evaluate cond if val < 0. This has no functional impact here, but it's not nice. Therefore switch order of the checks. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/6d8274ac-4344-23b4-d9a3-cad4c39517d4@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-07ynl: re-license uniformly under GPL-2.0 OR BSD-3-ClauseJakub Kicinski
I was intending to make all the Netlink Spec code BSD-3-Clause to ease the adoption but it appears that: - I fumbled the uAPI and used "GPL WITH uAPI note" there - it gives people pause as they expect GPL in the kernel As suggested by Chuck re-license under dual. This gives us benefit of full BSD freedom while fulfilling the broad "kernel is under GPL" expectations. Link: https://lore.kernel.org/all/20230304120108.05dd44c5@kernel.org/ Link: https://lore.kernel.org/r/20230306200457.3903854-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-07cpumask: be more careful with 'cpumask_setall()'Linus Torvalds
Commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask optimizations") changed cpumask_setall() to use "bitmap_set()" instead of "bitmap_fill()", because bitmap_fill() would explicitly set all the bits of a constant sized small bitmap, and that's exactly what we don't want: we want to only set bits up to 'nr_cpu_ids', which is what "bitmap_set()" does. However, Yury correctly points out that while "bitmap_set()" does indeed only set bits up to the required bitmap size, it doesn't _clear_ bits above that size, so the upper bits would still not have well-defined values. Now, none of this should really matter, since any bits set past 'nr_cpu_ids' should always be ignored in the first place. Yes, the bit scanning functions might return them as a result, but since users should always consider the ">= nr_cpu_ids" condition to mean "no more bits", that shouldn't have any actual effect (see previous commit 8ca09d5fa354 "cpumask: fix incorrect cpumask scanning result checks"). But let's just do it right, the way the code was _intended_ to work. We have had enough lazy code that works but bites us in the *rse later (again, see previous commit) that there's no reason to not just do this properly. It turns out that "bitmap_fill()" gets this all right for the complex case, and really only fails for the inlined optimized case that just fills the whole word. And while we could just fix bitmap_fill() to use the proper last word mask, there's two issues with that: - the cpumask case wants to do the _optimization_ based on "NR_CPUS is a small constant", but then wants to do the actual bit _fill_ based on "nr_cpu_ids" that isn't necessarily that same constant - we have lots of non-cpumask users of bitmap_fill(), and while they hopefully don't care, and probably would want the proper semantics anyway ("only set bits up to the limit"), I do not want the cpumask changes to impact other parts So this ends up just doing the single-word optimization by hand in the cpumask code. If our cpumask is fundamentally limited to a single word, just do the proper "fill in that word" exactly. And if it's the more complex multi-word case, then the generic bitmap_fill() will DTRT. This is all an example of how our bitmap function optimizations really are somewhat broken. They conflate the "this is size of the bitmap" optimizations with the actual bit(s) we want to set. In many cases we really want to have the two be separate things: sometimes we base our optimizations on the size of the whole bitmap ("I know this whole bitmap fits in a single word, so I'll just use single-word accesses"), and sometimes we base them on the bit we are looking at ("this is just acting on bits that are in the first word, so I'll use single-word accesses"). Notice how the end result of the two optimizations are the same, but the way we get to them are quite different. And all our cpumask optimization games are really about that fundamental distinction, and we'd often really want to pass in both the "this is the bit I'm working on" (which _can_ be a small constant but might be variable), and "I know it's in this range even if it's variable" (based on CONFIG_NR_CPUS). So this cpumask_setall() implementation just makes that explicit. It checks the "I statically know the size is small" using the known static size of the cpumask (which is what that 'small_cpumask_bits' is all about), but then sets the actual bits using the exact number of cpus we have (ie 'nr_cpumask_bits') Of course, in a perfect world, the compiler would have done all the range analysis (possibly with help from us just telling it that "this value is always in this range"), and would do all of this for us. But that is not the world we live in. While we dream of that perfect world, this does that manual logic to make it all work out. And this was a very long explanation for a small code change that shouldn't even matter. Reported-by: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/lkml/ZAV9nGG9e1%2FrV+L%2F@yury-laptop/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-07bpf: offload map memory usageYafang Shao
A new helper is introduced to calculate offload map memory usage. But currently the memory dynamically allocated in netdev dev_ops, like nsim_map_update_elem, is not counted. Let's just put it aside now. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-18-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07bpf, net: xskmap memory usageYafang Shao
A new helper is introduced to calculate xskmap memory usage. The xfsmap memory usage can be dynamically changed when we add or remove a xsk_map_node. Hence we need to track the count of xsk_map_node to get its memory usage. The result as follows, - before 10: xskmap name count_map flags 0x0 key 4B value 4B max_entries 65536 memlock 524288B - after 10: xskmap name count_map flags 0x0 <<< no elements case key 4B value 4B max_entries 65536 memlock 524608B Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-17-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07bpf, net: bpf_local_storage memory usageYafang Shao
A new helper is introduced into bpf_local_storage map to calculate the memory usage. This helper is also used by other maps like bpf_cgrp_storage, bpf_inode_storage, bpf_task_storage and etc. Note that currently the dynamically allocated storage elements are not counted in the usage, since it will take extra runtime overhead in the elements update or delete path. So let's put it aside now, and implement it in the future when someone really need it. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-15-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07bpf: add new map ops ->map_mem_usageYafang Shao
Add a new map ops ->map_mem_usage to print the memory usage of a bpf map. This is a preparation for the followup change. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07bpf: Increase size of BTF_ID_LIST without CONFIG_DEBUG_INFO_BTF againNathan Chancellor
After commit 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr"), clang builds without CONFIG_DEBUG_INFO_BTF warn: kernel/bpf/verifier.c:10298:24: warning: array index 16 is past the end of the array (that has type 'u32[16]' (aka 'unsigned int[16]')) [-Warray-bounds] meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) { ^ ~~~~~~~~~~~~~~~~~~~~~~~~ kernel/bpf/verifier.c:9150:1: note: array 'special_kfunc_list' declared here BTF_ID_LIST(special_kfunc_list) ^ include/linux/btf_ids.h:207:27: note: expanded from macro 'BTF_ID_LIST' #define BTF_ID_LIST(name) static u32 __maybe_unused name[16]; ^ 1 warning generated. A warning of this nature was previously addressed by commit beb3d47d1d3d ("bpf: Fix a BTF_ID_LIST bug with CONFIG_DEBUG_INFO_BTF not set") but there have been new kfuncs added since then. Quadruple the size of the CONFIG_DEBUG_INFO_BTF=n definition so that this problem is unlikely to show up for some time. Link: https://github.com/ClangBuiltLinux/linux/issues/1810 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230307-bpf-kfuncs-warray-bounds-v1-1-00ad3191f3a6@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-06Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-03-06 We've added 85 non-merge commits during the last 13 day(s) which contain a total of 131 files changed, 7102 insertions(+), 1792 deletions(-). The main changes are: 1) Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses, from Joanne Koong. 2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities, from Andrii Nakryiko. 3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc, from Alexei Starovoitov. 4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps, from Kumar Kartikeya Dwivedi. 5) Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them, from Eduard Zingerman. 6) Make uprobe attachment Android APK aware by supporting attachment to functions inside ELF objects contained in APKs via function names, from Daniel Müller. 7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper to start the timer with absolute expiration value instead of relative one, from Tero Kristo. 8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id, from Tejun Heo. 9) Extend libbpf to support users manually attaching kprobes/uprobes in the legacy/perf/link mode, from Menglong Dong. 10) Implement workarounds in the mips BPF JIT for DADDI/R4000, from Jiaxun Yang. 11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT, from Hengqi Chen. 12) Extend BPF instruction set doc with describing the encoding of BPF instructions in terms of how bytes are stored under big/little endian, from Jose E. Marchesi. 13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui. 14) Fix bpf_xdp_query() backwards compatibility on old kernels, from Yonghong Song. 15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS, from Florent Revest. 16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache, from Hou Tao. 17) Fix BPF verifier's check_subprogs to not unnecessarily mark a subprogram with has_tail_call, from Ilya Leoshkevich. 18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits) selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode selftests/bpf: Split test_attach_probe into multi subtests libbpf: Add support to set kprobe/uprobe attach mode tools/resolve_btfids: Add /libsubcmd to .gitignore bpf: add support for fixed-size memory pointer returns for kfuncs bpf: generalize dynptr_get_spi to be usable for iters bpf: mark PTR_TO_MEM as non-null register type bpf: move kfunc_call_arg_meta higher in the file bpf: ensure that r0 is marked scratched after any function call bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper bpf: clean up visit_insn()'s instruction processing selftests/bpf: adjust log_fixup's buffer size for proper truncation bpf: honor env->test_state_freq flag in is_state_visited() selftests/bpf: enhance align selftest's expected log matching bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER} bpf: improve stack slot state printing selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access() selftests/bpf: test if pointer type is tracked for BPF_ST_MEM bpf: allow ctx writes using BPF_ST_MEM instruction bpf: Use separate RCU callbacks for freeing selem ... ==================== Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-06cpumask: Fix typo nr_cpumask_size --> nr_cpumask_bitsAndy Shevchenko
The never used nr_cpumask_size is just a typo, hence use existing redefinition that's called nr_cpumask_bits. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-06netfilter: tproxy: fix deadlock due to missing BH disableFlorian Westphal
The xtables packet traverser performs an unconditional local_bh_disable(), but the nf_tables evaluation loop does not. Functions that are called from either xtables or nftables must assume that they can be called in process context. inet_twsk_deschedule_put() assumes that no softirq interrupt can occur. If tproxy is used from nf_tables its possible that we'll deadlock trying to aquire a lock already held in process context. Add a small helper that takes care of this and use it. Link: https://lore.kernel.org/netfilter-devel/401bd6ed-314a-a196-1cdc-e13c720cc8f2@balasys.hu/ Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Reported-and-tested-by: Major Dávid <major.david@balasys.hu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-03-05cpumask: re-introduce constant-sized cpumask optimizationsLinus Torvalds
Commit aa47a7c215e7 ("lib/cpumask: deprecate nr_cpumask_bits") resulted in the cpumask operations potentially becoming hugely less efficient, because suddenly the cpumask was always considered to be variable-sized. The optimization was then later added back in a limited form by commit 6f9c07be9d02 ("lib/cpumask: add FORCE_NR_CPUS config option"), but that FORCE_NR_CPUS option is not useful in a generic kernel and more of a special case for embedded situations with fixed hardware. Instead, just re-introduce the optimization, with some changes. Instead of depending on CPUMASK_OFFSTACK being false, and then always using the full constant cpumask width, this introduces three different cpumask "sizes": - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids. This is used for situations where we should use the exact size. - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it fits in a single word and the bitmap operations thus end up able to trigger the "small_const_nbits()" optimizations. This is used for the operations that have optimized single-word cases that get inlined, notably the bit find and scanning functions. - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it is an sufficiently small constant that makes simple "copy" and "clear" operations more efficient. This is arbitrarily set at four words or less. As a an example of this situation, without this fixed size optimization, cpumask_clear() will generate code like movl nr_cpu_ids(%rip), %edx addq $63, %rdx shrq $3, %rdx andl $-8, %edx callq memset@PLT on x86-64, because it would calculate the "exact" number of longwords that need to be cleared. In contrast, with this patch, using a MAX_CPU of 64 (which is quite a reasonable value to use), the above becomes a single movq $0,cpumask instruction instead, because instead of caring to figure out exactly how many CPU's the system has, it just knows that the cpumask will be a single word and can just clear it all. Note that this does end up tightening the rules a bit from the original version in another way: operations that set bits in the cpumask are now limited to the actual nr_cpu_ids limit, whereas we used to do the nr_cpumask_bits thing almost everywhere in the cpumask code. But if you just clear bits, or scan for bits, we can use the simpler compile-time constants. In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()' which were not useful, and which fundamentally have to be limited to 'nr_cpu_ids'. Better remove them now than have somebody introduce use of them later. Of course, on x86-64 with MAXSMP there is no sane small compile-time constant for the cpumask sizes, and we end up using the actual CPU bits, and will generate the above kind of horrors regardless. Please don't use MAXSMP unless you really expect to have machines with thousands of cores. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-05Merge tag 'irq-urgent-2023-03-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "A set of updates for the interrupt susbsystem: - Prevent possible NULL pointer derefences in irq_data_get_affinity_mask() and irq_domain_create_hierarchy() - Take the per device MSI lock before invoking code which relies on it being hold - Make sure that MSI descriptors are unreferenced before freeing them. This was overlooked when the platform MSI code was converted to use core infrastructure and results in a fals positive warning - Remove dead code in the MSI subsystem - Clarify the documentation for pci_msix_free_irq() - More kobj_type constification" * tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/msi, platform-msi: Ensure that MSI descriptors are unreferenced genirq/msi: Drop dead domain name assignment irqdomain: Add missing NULL pointer check in irq_domain_create_hierarchy() genirq/irqdesc: Make kobj_type structures constant PCI/MSI: Clarify usage of pci_msix_free_irq() genirq/msi: Take the per-device MSI lock before validating the control structure genirq/ipi: Fix NULL pointer deref in irq_data_get_affinity_mask()
2023-03-05Remove Intel compiler supportMasahiro Yamada
include/linux/compiler-intel.h had no update in the past 3 years. We often forget about the third C compiler to build the kernel. For example, commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO") only mentioned GCC and Clang. init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC, and nobody has reported any issue. I guess the Intel Compiler support is broken, and nobody is caring about it. Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is deprecated: $ icc -v icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message. icc version 2021.7.0 (gcc version 12.1.0 compatibility) Arnd Bergmann provided a link to the article, "Intel C/C++ compilers complete adoption of LLVM". lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept untouched for better sync with https://github.com/facebook/zstd Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-04Merge tag 'mm-hotfixes-stable-2023-03-04-13-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes. Eight are for MM and seven are for other parts of the kernel. Seven are cc:stable and eight address post-6.3 issues or were judged unsuitable for -stable backporting" * tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: map Dikshita Agarwal's old address to his current one mailmap: map Vikash Garodia's old address to his current one fs/cramfs/inode.c: initialize file_ra_state fs: hfsplus: fix UAF issue in hfsplus_put_super panic: fix the panic_print NMI backtrace setting lib: parser: update documentation for match_NUMBER functions kasan, x86: don't rename memintrinsics in uninstrumented files kasan: test: fix test for new meminstrinsic instrumentation kasan: treat meminstrinsic as builtins in uninstrumented files kasan: emit different calls for instrumentable memintrinsics ocfs2: fix non-auto defrag path not working issue ocfs2: fix defrag path triggering jbd2 ASSERT mailmap: map Georgi Djakov's old Linaro address to his current one mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH mm/damon/paddr: fix missing folio_put() mm/mremap: fix dup_anon_vma() in vma_merge() case 4
2023-03-04Merge tag 'sound-fix-6.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of various small fixes that have been gathered since the last PR. The majority of changes are for ASoC, and there is a small change in ASoC PCM core, but the rest are all for driver- specific fixes / quirks / updates" * tag 'sound-fix-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits) ALSA: ice1712: Delete unreachable code in aureon_add_controls() ALSA: ice1712: Do not left ice->gpio_mutex locked in aureon_add_controls() ALSA: hda/realtek: Add quirk for HP EliteDesk 800 G6 Tower PC ALSA: hda/realtek: Improve support for Dell Precision 3260 ASoC: mediatek: mt8195: add missing initialization ASoC: mediatek: mt8188: add missing initialization ASoC: amd: yc: Add DMI entries to support HP OMEN 16-n0xxx (8A43) ASoC: zl38060 add gpiolib dependency ASoC: sam9g20ek: Disable capture unless building with microphone input ASoC: mt8192: Fix range for sidetone positive gain ASoC: mt8192: Report an error if when an invalid sidetone gain is written ASoC: mt8192: Fix event generation for controls ASoC: mt8192: Remove spammy log messages ASoC: mchp-pdmc: fix poc noise at capture startup ASoC: dt-bindings: sama7g5-pdmc: add microchip,startup-delay-us binding ASoC: soc-pcm: add option to start DMA after DAI ASoC: mt8183: Fix event generation for I2S DAI operations ASoC: mt8183: Remove spammy logging from I2S DAI driver ASoC: mt6358: Remove undefined HPx Mux enumeration values ASoC: mt6358: Validate Wake on Voice 2 writes ...
2023-03-03Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull more SCSI updates from James Bottomley: "Updates that missed the first pull, mostly because of needing more soak time. Driver updates (zfcp, ufs, mpi3mr, plus two ipr bug fixes), an enclosure services (ses) update (mostly bug fixes) and other minor bug fixes and changes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits) scsi: zfcp: Trace when request remove fails after qdio send fails scsi: zfcp: Change the type of all fsf request id fields and variables to u64 scsi: zfcp: Make the type for accessing request hashtable buckets size_t scsi: ufs: core: Simplify ufshcd_execute_start_stop() scsi: ufs: core: Rely on the block layer for setting RQF_PM scsi: core: Extend struct scsi_exec_args scsi: lpfc: Fix double word in comments scsi: core: Remove the /proc/scsi/${proc_name} directory earlier scsi: core: Fix a source code comment scsi: cxgbi: Remove unneeded version.h include scsi: qedi: Remove unneeded version.h include scsi: mpi3mr: Remove unneeded version.h include scsi: mpi3mr: Fix missing mrioc->evtack_cmds initialization scsi: mpi3mr: Use number of bits to manage bitmap sizes scsi: mpi3mr: Remove unnecessary memcpy() to alltgt_info->dmi scsi: mpi3mr: Fix issues in mpi3mr_get_all_tgt_info() scsi: mpi3mr: Fix an issue found by KASAN scsi: mpi3mr: Replace 1-element array with flex-array scsi: ipr: Work around fortify-string warning scsi: ipr: Make ipr_probe_ioa_part2() return void ...
2023-03-03Merge tag 'io_uring-6.3-2023-03-03' of git://git.kernel.dk/linuxLinus Torvalds
Pull more io_uring updates from Jens Axboe: "Here's a set of fixes/changes that didn't make the first cut, either because they got queued before I sent the early merge request, or fixes that came in afterwards. In detail: - Don't set MSG_NOSIGNAL on recv/recvmsg opcodes, as AF_PACKET will error out (David) - Fix for spurious poll wakeups (me) - Fix for a file leak for buffered reads in certain conditions (Joseph) - Don't allow registered buffers of mixed types (Pavel) - Improve handling of huge pages for registered buffers (Pavel) - Provided buffer ring size calculation fix (Wojciech) - Minor cleanups (me)" * tag 'io_uring-6.3-2023-03-03' of git://git.kernel.dk/linux: io_uring/poll: don't pass in wake func to io_init_poll_iocb() io_uring: fix fget leak when fs don't support nowait buffered read io_uring/poll: allow some retries for poll triggering spuriously io_uring: remove MSG_NOSIGNAL from recvmsg io_uring/rsrc: always initialize 'folio' to NULL io_uring/rsrc: optimise registered huge pages io_uring/rsrc: optimise single entry advance io_uring/rsrc: disallow multi-source reg buffers io_uring: remove unused wq_list_merge io_uring: fix size calculation when registering buf ring io_uring/rsrc: fix a comment in io_import_fixed() io_uring: rename 'in_idle' to 'in_cancel' io_uring: consolidate the put_ref-and-return section of adding work
2023-03-03Merge tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - Don't access released socket during error recovery (Akinobu Mita) - Bring back auto-removal of deleted namespaces during sequential scan (Christoph Hellwig) - Fix an error code in nvme_auth_process_dhchap_challenge (Dan Carpenter) - Show well known discovery name (Daniel Wagner) - Add a missing endianess conversion in effects masking (Keith Busch) - Fix for a regression introduced in blk-rq-qos during init in this merge window (Breno) - Reorder a few fields in struct blk_mq_tag_set, eliminating a few holes and shrinking it (Christophe) - Remove redundant bdev_get_queue() NULL checks (Juhyung) - Add sed-opal single user mode support flag (Luca) - Remove SQE128 check in ublk as it isn't needed, saving some memory (Ming) - Op specific segment checking for cloned requests (Uday) - Exclusive open partition scan fixes (Yu) - Loop offset/size checking before assigning them in the device (Zhong) - Bio polling fixes (me) * tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux: blk-mq: enforce op-specific segment limits in blk_insert_cloned_request nvme-fabrics: show well known discovery name nvme-tcp: don't access released socket during error recovery nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge() nvme: bring back auto-removal of deleted namespaces during sequential scan blk-iocost: Pass gendisk to ioc_refresh_params nvme: fix sparse warning on effects masking block: be a bit more careful in checking for NULL bdev while polling block: clear bio->bi_bdev when putting a bio back in the cache loop: loop_set_status_from_info() check before assignment ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd block: remove more NULL checks after bdev_get_queue() blk-mq: Reorder fields in 'struct blk_mq_tag_set' block: fix scan partition for exclusively open device again block: Revert "block: Do not reread partition table on exclusively open device" sed-opal: add support flag for SUM in status ioctl
2023-03-03bpf: Use separate RCU callbacks for freeing selemKumar Kartikeya Dwivedi
Martin suggested that instead of using a byte in the hole (which he has a use for in his future patch) in bpf_local_storage_elem, we can dispatch a different call_rcu callback based on whether we need to free special fields in bpf_local_storage_elem data. The free path, described in commit 9db44fdd8105 ("bpf: Support kptrs in local storage maps"), only waits for call_rcu callbacks when there are special (kptrs, etc.) fields in the map value, hence it is necessary that we only access smap in this case. Therefore, dispatch different RCU callbacks based on the BPF map has a valid btf_record, which dereference and use smap's btf_record only when it is valid. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230303141542.300068-1-memxor@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-03Merge tag 'rtc-6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "A few drivers got some nice cleanups and a new driver are making the bulk of the changes. Subsystem: - allow rtc_read_alarm without read_alarm callback New driver: - NXP BBNSM module RTC Drivers: - use IRQ flags from fwnode when available - abx80x: nvmem support - brcmstb-waketimer: add non-wake alarm support - ingenic: provide CLK32K clock - isl12022: cleanups - moxart: switch to using gpiod API - pcf85363: allow setting quartz load - pm8xxx: cleanups and support for setting time - rv3028, rv3032: add ACPI support" * tag 'rtc-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (64 commits) rtc: pm8xxx: add support for nvmem offset dt-bindings: rtc: qcom-pm8xxx: add nvmem-cell offset rtc: abx80x: Add nvmem support rtc: rx6110: Remove unused of_gpio,h rtc: efi: Avoid spamming the log on RTC read failure rtc: isl12022: sort header inclusion alphabetically rtc: isl12022: Join string literals back rtc: isl12022: Drop unneeded OF guards and of_match_ptr() rtc: isl12022: Explicitly use __le16 type for ISL12022_REG_TEMP_L rtc: isl12022: Get rid of unneeded private struct isl12022 rtc: pcf85363: add support for the quartz-load-femtofarads property dt-bindings: rtc: nxp,pcf8563: move pcf85263/pcf85363 to a dedicated binding rtc: allow rtc_read_alarm without read_alarm callback rtc: rv3032: add ACPI support rtc: rv3028: add ACPI support rtc: bbnsm: Add the bbnsm rtc support rtc: jz4740: Register clock provider for the CLK32K pin rtc: jz4740: Use dev_err_probe() rtc: jz4740: Use readl_poll_timeout dt-bindings: rtc: Add #clock-cells property ...
2023-03-03bpf: Refactor RCU enforcement in the verifier.Alexei Starovoitov
bpf_rcu_read_lock/unlock() are only available in clang compiled kernels. Lack of such key mechanism makes it impossible for sleepable bpf programs to use RCU pointers. Allow bpf_rcu_read_lock/unlock() in GCC compiled kernels (though GCC doesn't support btf_type_tag yet) and allowlist certain field dereferences in important data structures like tast_struct, cgroup, socket that are used by sleepable programs either as RCU pointer or full trusted pointer (which is valid outside of RCU CS). Use BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_TRUSTED macros for such tagging. They will be removed once GCC supports btf_type_tag. With that refactor check_ptr_to_btf_access(). Make it strict in enforcing PTR_TRUSTED and PTR_UNTRUSTED while deprecating old PTR_TO_BTF_ID without modifier flags. There is a chance that this strict enforcement might break existing programs (especially on GCC compiled kernels), but this cleanup has to start sooner than later. Note PTR_TO_CTX access still yields old deprecated PTR_TO_BTF_ID. Once it's converted to strict PTR_TRUSTED or PTR_UNTRUSTED the kfuncs and helpers will be able to default to KF_TRUSTED_ARGS. KF_RCU will remain as a weaker version of KF_TRUSTED_ARGS where obj refcnt could be 0. Adjust rcu_read_lock selftest to run on gcc and clang compiled kernels. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-7-alexei.starovoitov@gmail.com
2023-03-03bpf: Introduce kptr_rcu.Alexei Starovoitov
The life time of certain kernel structures like 'struct cgroup' is protected by RCU. Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps. The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU. Derefrence of other kptr-s returns PTR_UNTRUSTED. For example: struct map_value { struct cgroup __kptr *cgrp; }; SEC("tp_btf/cgroup_mkdir") int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path) { struct cgroup *cg, *cg2; cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0 bpf_kptr_xchg(&v->cgrp, cg); cg2 = v->cgrp; // This is new feature introduced by this patch. // cg2 is PTR_MAYBE_NULL | MEM_RCU. // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero if (cg2) bpf_cgroup_ancestor(cg2, level); // safe to do. } Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
2023-03-03Merge tag 'asoc-fix-v6.3' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.3 Almost all of this is driver specific fixes and new IDs that have come in during the merge window. A good chunk of them are simple ones from me which came about due to a bunch of Mediatek Chromebooks being enabled in KernelCI, there's more where that came from. We do have one small feature added to the PCM core by Claudiu Beznea in order to allow the sequencing required to resolve a noise issue with the Microchip PDMC driver.
2023-03-02bpf: Add support for absolute value BPF timersTero Kristo
Add a new flag BPF_F_TIMER_ABS that can be passed to bpf_timer_start() to start an absolute value timer instead of the default relative value. This makes the timer expire at an exact point in time, instead of a time with latencies induced by both the BPF and timer subsystems. Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Link: https://lore.kernel.org/r/20230302114614.2985072-2-tero.kristo@linux.intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-02Merge tag 'drm-next-2023-03-03-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "fbdev: - fix uninit var in error path shmem: - revert unGPLing an export i915: - Don't use stolen memory or BAR mappings for ring buffers with LLC - Add inverted backlight quirk for HP 14-r206nv - Fix GSI offset for MCR lookups - GVT fixes (memleak, debugfs attributes, kconfig, typos) amdgpu: - SMU 13 fixes - Enable TMZ for GC 10.3.6 - Misc display fixes - Buddy allocator fixes - GC 11 fixes - S0ix fix - INFO IOCTL queries for GC 11 - VCN harvest fixes for SR-IOV - UMC 8.10 RAS fixes - Don't restrict bpc to 8 - NBIO 7.5 fix - Allow freesync on PCon for more devices amdkfd: - SDMA fix - Illegal memory access fix" * tag 'drm-next-2023-03-03-1' of git://anongit.freedesktop.org/drm/drm: (45 commits) drm/amdgpu/vcn: fix compilation issue with legacy gcc drm/amd/display: Extend Freesync over PCon support for more devices Revert "drm/amd/display: Do not set DRR on pipe commit" drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes drm/amd/display: Ext displays with dock can't recognized after resume drm/amdgpu: fix ttm_bo calltrace warning in psp_hw_fini drm/amdgpu: remove unused variable ring drm/amd/display: fix dm irq error message in gpu recover drm/amd: Fix initialization for nbio 7.5.1 drm/amd/display: Don't restrict bpc to 8 bpc drm/amdgpu: Make umc_v8_10_convert_error_address static and remove unused variable drm/radeon: Fix eDP for single-display iMac11,2 drm/shmem-helper: Revert accidental non-GPL export drm: omapdrm: Do not use helper unininitialized in omap_fbdev_init() drm/amd/pm: downgrade log level upon SMU IF version mismatch drm/amdgpu: Add ecc info query interface for umc v8_10 drm/amdgpu: Add convert_error_address function for umc v8_10 drm/amdgpu: add bad_page_threshold check in ras_eeprom_check_err drm/amdgpu: change default behavior of bad_page_threshold parameter drm/amdgpu: exclude duplicate pages from UMC RAS UE count ...
2023-03-02Merge tag 'objtool-core-2023-03-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Shrink 'struct instruction', to improve objtool performance & memory footprint - Other maximum memory usage reductions - this makes the build both faster, and fixes kernel build OOM failures on allyesconfig and similar configs when they try to build the final (large) vmlinux.o - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying single-byte instruction (PUSH/POP or LEAVE). This requires the extension of the ORC metadata structure with a 'signal' field - Misc fixes & cleanups * tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) objtool: Fix ORC 'signal' propagation objtool: Remove instruction::list x86: Fix FILL_RETURN_BUFFER objtool: Fix overlapping alternatives objtool: Union instruction::{call_dest,jump_table} objtool: Remove instruction::reloc objtool: Shrink instruction::{type,visited} objtool: Make instruction::alts a single-linked list objtool: Make instruction::stack_ops a single-linked list objtool: Change arch_decode_instruction() signature x86/entry: Fix unwinding from kprobe on PUSH/POP instruction x86/unwind/orc: Add 'signal' field to ORC metadata objtool: Optimize layout of struct special_alt objtool: Optimize layout of struct symbol objtool: Allocate multiple structures with calloc() objtool: Make struct check_options static objtool: Make struct entries[] static and const objtool: Fix HOSTCC flag usage objtool: Properly support make V=1 objtool: Install libsubcmd in build ...
2023-03-02genirq/msi, platform-msi: Ensure that MSI descriptors are unreferencedThomas Gleixner
Miquel reported a warning in the MSI core which is triggered when interrupts are freed via platform_msi_device_domain_free(). This code got reworked to use core functions for freeing the MSI descriptors, but nothing took care to clear the msi_desc->irq entry, which then triggers the warning in msi_free_msi_desc() which uses desc->irq to validate that the descriptor has been torn down. The same issue exists in msi_domain_populate_irqs(). Up to the point that msi_free_msi_descs() grew a warning for this case, this went un-noticed. Provide the counterpart of msi_domain_populate_irqs() and invoke it in platform_msi_device_domain_free() before freeing the interrupts and MSI descriptors and also in the error path of msi_domain_populate_irqs(). Fixes: 2f2940d16823 ("genirq/msi: Remove filter from msi_free_descs_free_range()") Reported-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Miquel Raynal <miquel.raynal@bootlin.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87mt4wkwnv.ffs@tglx
2023-03-01bpf: Support kptrs in local storage mapsKumar Kartikeya Dwivedi
Enable support for kptrs in local storage maps by wiring up the freeing of these kptrs from map value. Freeing of bpf_local_storage_map is only delayed in case there are special fields, therefore bpf_selem_free_* path can also only dereference smap safely in that case. This is recorded using a bool utilizing a hole in bpF_local_storage_elem. It could have been tagged in the pointer value smap using the lowest bit (since alignment > 1), but since there was already a hole I went with the simpler option. Only the map structure freeing is delayed using RCU barriers, as the buckets aren't used when selem is being freed, so they can be freed once all readers of the bucket lists can no longer access it. Cc: Martin KaFai Lau <martin.lau@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230225154010.391965-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01capability: just use a 'u64' instead of a 'u32[2]' arrayLinus Torvalds
Back in 2008 we extended the capability bits from 32 to 64, and we did it by extending the single 32-bit capability word from one word to an array of two words. It was then obfuscated by hiding the "2" behind two macro expansions, with the reasoning being that maybe it gets extended further some day. That reasoning may have been valid at the time, but the last thing we want to do is to extend the capability set any more. And the array of values not only causes source code oddities (with loops to deal with it), but also results in worse code generation. It's a lose-lose situation. So just change the 'u32[2]' into a 'u64' and be done with it. We still have to deal with the fact that the user space interface is designed around an array of these 32-bit values, but that was the case before too, since the array layouts were different (ie user space doesn't use an array of 32-bit values for individual capability masks, but an array of 32-bit slices of multiple masks). So that marshalling of data is actually simplified too, even if it does remain somewhat obscure and odd. This was all triggered by my reaction to the new "cap_isidentical()" introduced recently. By just using a saner data structure, it went from unsigned __capi; CAP_FOR_EACH_U32(__capi) { if (a.cap[__capi] != b.cap[__capi]) return false; } return true; to just being return a.val == b.val; instead. Which is rather more obvious both to humans and to compilers. Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-01bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwrJoanne Koong
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr. The user must pass in a buffer to store the contents of the data slice if a direct pointer to the data cannot be obtained. For skb and xdp type dynptrs, these two APIs are the only way to obtain a data slice. However, for other types of dynptrs, there is no difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data. For skb type dynptrs, the data is copied into the user provided buffer if any of the data is not in the linear portion of the skb. For xdp type dynptrs, the data is copied into the user provided buffer if the data is between xdp frags. If the skb is cloned and a call to bpf_dynptr_data_rdwr is made, then the skb will be uncloned (see bpf_unclone_prologue()). Please note that any bpf_dynptr_write() automatically invalidates any prior data slices of the skb dynptr. This is because the skb may be cloned or may need to pull its paged buffer into the head. As such, any bpf_dynptr_write() will automatically have its prior data slices invalidated, even if the write is to data in the skb head of an uncloned skb. Please note as well that any other helper calls that change the underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data slices of the skb dynptr as well, for the same reasons. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Add xdp dynptrsJoanne Koong
Add xdp dynptrs, which are dynptrs whose underlying pointer points to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of xdp->data and xdp->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For reads and writes on the dynptr, this includes reading/writing from/to and across fragments. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() should be used. For examples of how xdp dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Add skb dynptrsJoanne Koong
Add skb dynptrs, which are dynptrs whose underlying pointer points to a skb. The dynptr acts on skb data. skb dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of skb->data and skb->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For bpf prog types that don't support writes on skb data, the dynptr is read-only (bpf_dynptr_write() will return an error) For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write() interfaces, reading and writing from/to data in the head as well as from/to non-linear paged buffers is supported. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used. For examples of how skb dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Define no-ops for externally called bpf dynptr functionsJoanne Koong
Some bpf dynptr functions will be called from places where if CONFIG_BPF_SYSCALL is not set, then the dynptr function is undefined. For example, when skb type dynptrs are added in the next commit, dynptr functions are called from net/core/filter.c This patch defines no-op implementations of these dynptr functions so that they do not break compilation by being an undefined reference. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-5-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Refactor process_dynptr_funcJoanne Koong
This change cleans up process_dynptr_func's flow to be more intuitive and updates some comments with more context. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-3-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01Merge tag 'sh-for-v6.3-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux Pull sh updates from John Paul Adrian Glaubitz: - regression fix in connection with the rtl8169 driver on SuperH boards that was introduced when the driver was switched to use devm_clk_get_optional_enabled() to simplify the code (Geert Uytterhoeven) - build warning fix to allow the kernel to be built with CONFIG_WERROR enabled (Michael Karcher) * tag 'sh-for-v6.3-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux: sh: clk: Fix clk_enable() to return 0 on NULL clk sh: intc: Avoid spurious sizeof-pointer-div warning
2023-03-01Merge tag 'loongarch-6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch updates from Huacai Chen: - Make -mstrict-align configurable - Add kernel relocation and KASLR support - Add single kernel image implementation for kdump - Add hardware breakpoints/watchpoints support - Add kprobes/kretprobes/kprobes_on_ftrace support - Add LoongArch support for some selftests. * tag 'loongarch-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (23 commits) selftests/ftrace: Add LoongArch kprobe args string tests support selftests/seccomp: Add LoongArch selftesting support tools: Add LoongArch build infrastructure samples/kprobes: Add LoongArch support LoongArch: Mark some assembler symbols as non-kprobe-able LoongArch: Add kprobes on ftrace support LoongArch: Add kretprobes support LoongArch: Add kprobes support LoongArch: Simulate branch and PC* instructions LoongArch: ptrace: Add hardware single step support LoongArch: ptrace: Add function argument access API LoongArch: ptrace: Expose hardware breakpoints to debuggers LoongArch: Add hardware breakpoints/watchpoints support LoongArch: kdump: Add crashkernel=YM handling LoongArch: kdump: Add single kernel image implementation LoongArch: Add support for kernel address space layout randomization (KASLR) LoongArch: Add support for kernel relocation LoongArch: Add la_abs macro implementation LoongArch: Add JUMP_VIRT_ADDR macro implementation to avoid using la.abs LoongArch: Use la.pcrel instead of la.abs when it's trivially possible ...
2023-03-01Merge tag 'ubifs-for-linus-6.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull jffs2, ubi and ubifs updates from Richard Weinberger: "JFFS2: - Fix memory corruption in error path - Spelling and coding style fixes UBI: - Switch to BLK_MQ_F_BLOCKING in ubiblock - Wire up partent device (for sysfs) - Multiple UAF bugfixes - Fix for an infinite loop in WL error path UBIFS: - Fix for multiple memory leaks in error paths - Fixes for wrong space accounting - Minor cleanups - Spelling and coding style fixes" * tag 'ubifs-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: (36 commits) ubi: block: Fix a possible use-after-free bug in ubiblock_create() ubifs: make kobj_type structures constant mtd: ubi: block: wire-up device parent mtd: ubi: wire-up parent MTD device ubi: use correct names in function kernel-doc comments ubi: block: set BLK_MQ_F_BLOCKING jffs2: Fix list_del corruption if compressors initialized failed jffs2: Use function instead of macro when initialize compressors jffs2: fix spelling mistake "neccecary"->"necessary" ubifs: Fix kernel-doc ubifs: Fix some kernel-doc comments UBI: Fastmap: Fix kernel-doc ubi: ubi_wl_put_peb: Fix infinite loop when wear-leveling work failed ubi: Fix UAF wear-leveling entry in eraseblk_count_seq_show() ubi: fastmap: Fix missed fm_anchor PEB in wear-leveling after disabling fastmap ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process ubifs: ubifs_writepage: Mark page dirty after writing inode failed ubifs: dirty_cow_znode: Fix memleak in error handling path ubifs: Re-statistic cleaned znode count if commit failed ubi: Fix permission display of the debugfs files ...
2023-02-28Merge tag 'pwm/for-6.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "This rather small set of changes includes some minor fixes and improvements. The AB8500 driver gained support for reading the initial hardware state and the Synopsys DesignWare driver received some work to prepare for device tree and platform support" * tag 'pwm/for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: pwm: dwc: Use devm_pwmchip_add() pwm: dwc: Move memory allocation to own function pwm: dwc: Change &pci->dev to dev in probe dt-bindings: pwm: Document Synopsys DesignWare snps,pwm-dw-apb-timers-pwm2 pwm: iqs620a: Replace one remaining instance of regmap_update_bits() pwm: ab8500: Implement .get_state() pwm: ab8500: Fix calculation of duty and period pwm: lp3943: Drop unused i2c include dt-bindings: pwm: mediatek: Convert pwm-mediatek to DT schema pwm: stm32-lp: fix the check on arr and cmp registers update pwm: Move pwm_capture() dummy to restore order pwm: sifive: Always let the first pwm_apply_state succeed
2023-02-28ASoC: soc-pcm: add option to start DMA after DAIClaudiu Beznea
Add option to start DMA component after DAI trigger. This is done by filling the new struct snd_soc_component_driver::start_dma_last. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Link: https://lore.kernel.org/r/20230228110145.3770525-2-claudiu.beznea@microchip.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-02-27mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISONNaoya Horiguchi
After a memory error happens on a clean folio, a process unexpectedly receives SIGBUS when it accesses the error page. This SIGBUS killing is pointless and simply degrades the level of RAS of the system, because the clean folio can be dropped without any data lost on memory error handling as we do for a clean pagecache. When memory_failure() is called on a clean folio, try_to_unmap() is called twice (one from split_huge_page() and one from hwpoison_user_mappings()). The root cause of the issue is that pte conversion to hwpoisoned entry is now done in the first call of try_to_unmap() because PageHWPoison is already set at this point, while it's actually expected to be done in the second call. This behavior disturbs the error handling operation like removing pagecache, which results in the malfunction described above. So convert TTU_IGNORE_HWPOISON into TTU_HWPOISON and set TTU_HWPOISON only when we really intend to convert pte to hwpoison entry. This can prevent other callers of try_to_unmap() from accidentally converting to hwpoison entries. Link: https://lkml.kernel.org/r/20230221085905.1465385-1-naoya.horiguchi@linux.dev Fixes: a42634a6c07d ("readahead: Use a folio in read_pages()") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-27capability: add cap_isidenticalMateusz Guzik
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>