summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2019-09-16Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - improve rwsem scalability - add uninitialized rwsem debugging check - reduce lockdep's stacktrace memory usage and add diagnostics - misc cleanups, code consolidation and constification * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mutex: Fix up mutex_waiter usage locking/mutex: Use mutex flags macro instead of hard code locking/mutex: Make __mutex_owner static to mutex.c locking/qspinlock,x86: Clarify virt_spin_lock_key locking/rwsem: Check for operations on an uninitialized rwsem locking/rwsem: Make handoff writer optimistically spin on owner locking/lockdep: Report more stack trace statistics locking/lockdep: Reduce space occupied by stack traces stacktrace: Constify 'entries' arguments locking/lockdep: Make it clear that what lock_class::key points at is not modified
2019-09-16Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "This cycle's RCU changes were: - A few more RCU flavor consolidation cleanups. - Updates to RCU's list-traversal macros improving lockdep usability. - Forward-progress improvements for no-CBs CPUs: Avoid ignoring incoming callbacks during grace-period waits. - Forward-progress improvements for no-CBs CPUs: Use ->cblist structure to take advantage of others' grace periods. - Also added a small commit that avoids needlessly inflicting scheduler-clock ticks on callback-offloaded CPUs. - Forward-progress improvements for no-CBs CPUs: Reduce contention on ->nocb_lock guarding ->cblist. - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass list to further reduce contention on ->nocb_lock guarding ->cblist. - Miscellaneous fixes. - Torture-test updates. - minor LKMM updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits) MAINTAINERS: Update from paulmck@linux.ibm.com to paulmck@kernel.org rcu: Don't include <linux/ktime.h> in rcutiny.h rcu: Allow rcu_do_batch() to dynamically adjust batch sizes rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload rcu/nocb: Reduce __call_rcu_nocb_wake() leaf rcu_node ->lock contention rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks() rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake() rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended rcu/nocb: Add bypass callback queueing rcu/nocb: Atomic ->len field in rcu_segcblist structure rcu/nocb: Unconditionally advance and wake for excessive CBs rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock rcu/nocb: Reduce contention at no-CBs invocation-done time rcu/nocb: Reduce contention at no-CBs registry-time CB advancement rcu/nocb: Round down for number of no-CBs grace-period kthreads rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks ...
2019-09-16Merge branch 'parisc-5.4-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: - Make the powerpc implementation to read elf files available as a public kexec interface so it can be re-used on other architectures (Sven) - Implement kexec on parisc (Sven) - Add kprobes on ftrace on parisc (Sven) - Fix kernel crash with HSC-PCI cards based on card-mode Dino - Add assembly implementations for memset, strlen, strcpy, strncpy and strcat - Some cleanups, documentation updates, warning fixes, ... * 'parisc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: (25 commits) parisc: Have git ignore generated real2.S and firmware.c parisc: Disable HP HSC-PCI Cards to prevent kernel crash parisc: add support for kexec_file_load() syscall parisc: wire up kexec_file_load syscall parisc: add kexec syscall support parisc: add __pdc_cpu_rendezvous() kprobes/parisc: remove arch_kprobe_on_func_entry() kexec_elf: support 32 bit ELF files kexec_elf: remove unused variable in kexec_elf_load() kexec_elf: remove Elf_Rel macro kexec_elf: remove PURGATORY_STACK_SIZE kexec_elf: remove parsing of section headers kexec_elf: change order of elf_*_to_cpu() functions kexec: add KEXEC_ELF parisc: Save some bytes in dino driver parisc: Drop comments which are already in pci.h parisc: Convert eisa_enumerator to use pr_cont() parisc: Avoid warning when loading hppb driver parisc: speed up flush_tlb_all_local with qemu parisc: Add ALTERNATIVE_CODE() and ALT_COND_RUN_ON_QEMU ...
2019-09-16Merge tag 'please-pull-ia64_for_5.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull ia64 updates from Tony Luck: "The big change here is removal of support for SGI Altix" * tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: (33 commits) genirq: remove the is_affinity_mask_valid hook ia64: remove CONFIG_SWIOTLB ifdefs ia64: remove support for machvecs ia64: move the screen_info setup to common code ia64: move the ROOT_DEV setup to common code ia64: rework iommu probing ia64: remove the unused sn_coherency_id symbol ia64: remove the SGI UV simulator support ia64: remove the zx1 swiotlb machvec ia64: remove CONFIG_ACPI ifdefs ia64: remove CONFIG_PCI ifdefs ia64: remove the hpsim platform ia64: remove now unused machvec indirections ia64: remove support for the SGI SN2 platform drivers: remove the SGI SN2 IOC4 base support drivers: remove the SGI SN2 IOC3 base support qla2xxx: remove SGI SN2 support qla1280: remove SGI SN2 support misc/sgi-xp: remove SGI SN2 support char/mspec: remove SGI SN2 support ...
2019-09-16Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "Although there isn't tonnes of code in terms of line count, there are a fair few headline features which I've noted both in the tag and also in the merge commits when I pulled everything together. The part I'm most pleased with is that we had 35 contributors this time around, which feels like a big jump from the usual small group of core arm64 arch developers. Hopefully they all enjoyed it so much that they'll continue to contribute, but we'll see. It's probably worth highlighting that we've pulled in a branch from the risc-v folks which moves our CPU topology code out to where it can be shared with others. Summary: - 52-bit virtual addressing in the kernel - New ABI to allow tagged user pointers to be dereferenced by syscalls - Early RNG seeding by the bootloader - Improve robustness of SMP boot - Fix TLB invalidation in light of recent architectural clarifications - Support for i.MX8 DDR PMU - Remove direct LSE instruction patching in favour of static keys - Function error injection using kprobes - Support for the PPTT "thread" flag introduced by ACPI 6.3 - Move PSCI idle code into proper cpuidle driver - Relaxation of implicit I/O memory barriers - Build with RELR relocations when toolchain supports them - Numerous cleanups and non-critical fixes" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (114 commits) arm64: remove __iounmap arm64: atomics: Use K constraint when toolchain appears to support it arm64: atomics: Undefine internal macros after use arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL arm64: asm: Kill 'asm/atomic_arch.h' arm64: lse: Remove unused 'alt_lse' assembly macro arm64: atomics: Remove atomic_ll_sc compilation unit arm64: avoid using hard-coded registers for LSE atomics arm64: atomics: avoid out-of-line ll/sc atomics arm64: Use correct ll/sc atomic constraints jump_label: Don't warn on __exit jump entries docs/perf: Add documentation for the i.MX8 DDR PMU perf/imx_ddr: Add support for AXI ID filtering arm64: kpti: ensure patched kernel text is fetched from PoU arm64: fix fixmap copy for 16K pages and 48-bit VA perf/smmuv3: Validate groups for global filtering perf/smmuv3: Validate group size arm64: Relax Documentation/arm64/tagged-pointers.rst arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F arm64: mm: Ignore spurious translation faults taken from the kernel ...
2019-09-16Merge tag 'iommu-updates-v5.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - batched unmap support for the IOMMU-API - support for unlocked command queueing in the ARM-SMMU driver - rework the ATS support in the ARM-SMMU driver - more refactoring in the ARM-SMMU driver to support hardware implemention specific quirks and errata - bounce buffering DMA-API implementatation in the Intel VT-d driver for untrusted devices (like Thunderbolt devices) - fixes for runtime PM support in the OMAP iommu driver - MT8183 IOMMU support in the Mediatek IOMMU driver - rework of the way the IOMMU core sets the default domain type for groups. Changing the default domain type on x86 does not require two kernel parameters anymore. - more smaller fixes and cleanups * tag 'iommu-updates-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (113 commits) iommu/vt-d: Declare Broadwell igfx dmar support snafu iommu/vt-d: Add Scalable Mode fault information iommu/vt-d: Use bounce buffer for untrusted devices iommu/vt-d: Add trace events for device dma map/unmap iommu/vt-d: Don't switch off swiotlb if bounce page is used iommu/vt-d: Check whether device requires bounce buffer swiotlb: Split size parameter to map/unmap APIs iommu/omap: Mark pm functions __maybe_unused iommu/ipmmu-vmsa: Disable cache snoop transactions on R-Car Gen3 iommu/ipmmu-vmsa: Move IMTTBCR_SL0_TWOBIT_* to restore sort order iommu: Don't use sme_active() in generic code iommu/arm-smmu-v3: Fix build error without CONFIG_PCI_ATS iommu/qcom: Use struct_size() helper iommu: Remove wrong default domain comments iommu/dma: Fix for dereferencing before null checking iommu/mediatek: Clean up struct mtk_smi_iommu memory: mtk-smi: Get rid of need_larbid iommu/mediatek: Fix VLD_PA_RNG register backup when suspend memory: mtk-smi: Add bus_sel for mt8183 memory: mtk-smi: Invoke pm runtime_callback to enable clocks ...
2019-09-16Merge tag 'core-process-v5.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd/waitid updates from Christian Brauner: "This contains two features and various tests. First, it adds support for waiting on process through pidfds by adding the P_PIDFD type to the waitid() syscall. This completes the basic functionality of the pidfd api (cf. [1]). In the meantime we also have a new adition to the userspace projects that make use of the pidfd api. The qt project was nice enough to send a mail pointing out that they have a pr up to switch to the pidfd api (cf. [2]). Second, this tag contains an extension to the waitid() syscall to make it possible to wait on the current process group in a race free manner (even though the actual problem is very unlikely) by specifing 0 together with the P_PGID type. This extension traces back to a discussion on the glibc development mailing list. There are also a range of tests for the features above. Additionally, the test-suite which detected the pidfd-polling race we fixed in [3] is included in this tag" [1] https://lwn.net/Articles/794707/ [2] https://codereview.qt-project.org/c/qt/qtbase/+/108456 [3] commit b191d6491be6 ("pidfd: fix a poll race when setting exit_state") * tag 'core-process-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: waitid: Add support for waiting for the current process group tests: add pidfd poll tests tests: move common definitions and functions into pidfd.h pidfd: add pidfd_wait tests pidfd: add P_PIDFD to waitid()
2019-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-09-16 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Now that initial BPF backend for gcc has been merged upstream, enable BPF kselftest suite for bpf-gcc. Also fix a BE issue with access to bpf_sysctl.file_pos, from Ilya. 2) Follow-up fix for link-vmlinux.sh to remove bash-specific extensions related to recent work on exposing BTF info through sysfs, from Andrii. 3) AF_XDP zero copy fixes for i40e and ixgbe driver which caused umem headroom to be added twice, from Ciara. 4) Refactoring work to convert sock opt tests into test_progs framework in BPF kselftests, from Stanislav. 5) Fix a general protection fault in dev_map_hash_update_elem(), from Toke. 6) Cleanup to use BPF_PROG_RUN() macro in KCM, from Sami. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16Merge branch 'sched/rt' into sched/core, to pick up -rt changesIngo Molnar
Pick up the first couple of patches working towards PREEMPT_RT. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-16Merge branch 'for-5.4' into for-linusPetr Mladek
2019-09-16bpf: fix accessing bpf_sysctl.file_pos on s390Ilya Leoshkevich
"ctx:file_pos sysctl:read write ok" fails on s390 with "Read value != nux". This is because verifier rewrites a complete 32-bit bpf_sysctl.file_pos update to a partial update of the first 32 bits of 64-bit *bpf_sysctl_kern.ppos, which is not correct on big-endian systems. Fix by using an offset on big-endian systems. Ditto for bpf_sysctl.file_pos reads. Currently the test does not detect a problem there, since it expects to see 0, which it gets with high probability in error cases, so change it to seek to offset 3 and expect 3 in bpf_sysctl.file_pos. Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20190816105300.49035-1-iii@linux.ibm.com/
2019-09-16xdp: Fix race in dev_map_hash_update_elem() when replacing elementToke Høiland-Jørgensen
syzbot found a crash in dev_map_hash_update_elem(), when replacing an element with a new one. Jesper correctly identified the cause of the crash as a race condition between the initial lookup in the map (which is done before taking the lock), and the removal of the old element. Rather than just add a second lookup into the hashmap after taking the lock, fix this by reworking the function logic to take the lock before the initial lookup. Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index") Reported-and-tested-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-15um: Enable CONFIG_CONSTRUCTORSJohannes Berg
We do need to call the constructors for *modules*, and at least for KASAN in the future, we must call even the kernel constructors only later when the kernel has been initialized. Instead of relying on libc to call them, emit an empty section for libc and let the kernel's CONSTRUCTORS code do the rest of the job. Tested that it indeed doesn't work in modules, and does work after the fixes in both, with a few functions with __attribute__((constructor)) in both dynamic and static builds. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Minor overlapping changes in the btusb and ixgbe drivers. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Don't corrupt xfrm_interface parms before validation, from Nicolas Dichtel. 2) Revert use of usb-wakeup in btusb, from Mario Limonciello. 3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from Leonardo Bras. 4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso. 5) Missing ULP check in sock_map, from John Fastabend. 6) Fix receive statistic handling in forcedeth, from Zhu Yanjun. 7) Fix length of SKB allocated in 6pack driver, from Christophe JAILLET. 8) ip6_route_info_create() returns an error pointer, not NULL. From Maciej Żenczykowski. 9) Only add RDS sock to the hashes after rs_transport is set, from Ka-Cheong Poon. 10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets. 11) Presence of transmit IPSEC offload in an SKB is not tested for correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff Kirsher. 12) Need rcu_barrier() when register_netdevice() takes one of the notifier based failure paths, from Subash Abhinov Kasiviswanathan. 13) Fix leak in sctp_do_bind(), from Mao Wenan. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) cdc_ether: fix rndis support for Mediatek based smartphones sctp: destroy bucket if failed to bind addr sctp: remove redundant assignment when call sctp_get_port_local sctp: change return type of sctp_get_port_local ixgbevf: Fix secpath usage for IPsec Tx offload sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()' ixgbe: Fix secpath usage for IPsec TX offload. net: qrtr: fix memort leak in qrtr_tun_write_iter net: Fix null de-reference of device refcount ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()' tun: fix use-after-free when register netdev failed tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR ixgbe: fix double clean of Tx descriptors with xdp ixgbe: Prevent u8 wrapping of ITR value to something less than 10us mlx4: fix spelling mistake "veify" -> "verify" net: hns3: fix spelling mistake "undeflow" -> "underflow" net: lmc: fix spelling mistake "runnin" -> "running" NFC: st95hf: fix spelling mistake "receieve" -> "receive" net/rds: An rds_sock is added too early to the hash table mac80211: Do not send Layer 2 Update frame before authorization ...
2019-09-13padata: remove cpu_index from the parallel_queueDaniel Jordan
With the removal of the ENODATA case from padata_get_next, the cpu_index field is no longer useful, so it can go away. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-09-13padata: unbind parallel jobs from specific CPUsDaniel Jordan
Padata binds the parallel part of a job to a single CPU and round-robins over all CPUs in the system for each successive job. Though the serial parts rely on per-CPU queues for correct ordering, they're not necessary for parallel work, and it improves performance to run the job locally on NUMA machines and let the scheduler pick the CPU within a node on a busy system. So, make the parallel workqueue unbound. Update the parallel workqueue's cpumask when the instance's parallel cpumask changes. Now that parallel jobs no longer run on max_active=1 workqueues, two or more parallel works that hash to the same CPU may run simultaneously, finish out of order, and so be serialized out of order. Prevent this by keeping the works sorted on the reorder list by sequence number and checking that in the reordering logic. padata_get_next becomes padata_find_next so it can be reused for the end of padata_reorder, where it's used to avoid uselessly queueing work when the next job by sequence number isn't finished yet but a later job that hashed to the same CPU has. The ENODATA case in padata_find_next no longer makes sense because parallel jobs aren't bound to specific CPUs. The EINPROGRESS case takes care of the scenario where a parallel job is potentially running on the same CPU as padata_find_next, and with only one error code left, just use NULL instead. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Tejun Heo <tj@kernel.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-09-13padata: use separate workqueues for parallel and serial workDaniel Jordan
padata currently uses one per-CPU workqueue per instance for all work. Prepare for running parallel jobs on an unbound workqueue by introducing dedicated workqueues for parallel and serial work. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-09-13padata, pcrypt: take CPU hotplug lock internally in padata_alloc_possibleDaniel Jordan
With pcrypt's cpumask no longer used, take the CPU hotplug lock inside padata_alloc_possible. Useful later in the series for avoiding nested acquisition of the CPU hotplug lock in padata when padata_alloc_possible is allocating an unbound workqueue. Without this patch, this nested acquisition would happen later in the series: pcrypt_init_padata get_online_cpus alloc_padata_possible alloc_padata alloc_workqueue(WQ_UNBOUND) // later in the series alloc_and_link_pwqs apply_wqattrs_lock get_online_cpus // recursive rwsem acquisition Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-09-13padata: make padata_do_parallel find alternate callback CPUDaniel Jordan
padata_do_parallel currently returns -EINVAL if the callback CPU isn't in the callback cpumask. pcrypt tries to prevent this situation by keeping its own callback cpumask in sync with padata's and checks that the callback CPU it passes to padata is valid. Make padata handle this instead. padata_do_parallel now takes a pointer to the callback CPU and updates it for the caller if an alternate CPU is used. Overall behavior in terms of which callback CPUs are chosen stays the same. Prepares for removal of the padata cpumask notifier in pcrypt, which will fix a lockdep complaint about nested acquisition of the CPU hotplug lock later in the series. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-09-13workqueue: require CPU hotplug read exclusion for apply_workqueue_attrsDaniel Jordan
Change the calling convention for apply_workqueue_attrs to require CPU hotplug read exclusion. Avoids lockdep complaints about nested calls to get_online_cpus in a future patch where padata calls apply_workqueue_attrs when changing other CPU-hotplug-sensitive data structures with the CPU read lock already held. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-09-13workqueue: unconfine alloc/apply/free_workqueue_attrs()Daniel Jordan
padata will use these these interfaces in a later patch, so unconfine them. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-09-13padata: allocate workqueue internallyDaniel Jordan
Move workqueue allocation inside of padata to prepare for further changes to how padata uses workqueues. Guarantees the workqueue is created with max_active=1, which padata relies on to work correctly. No functional change. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: linux-crypto@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-09-13sched/psi: Correct overly pessimistic size calculationMiles Chen
When passing a equal or more then 32 bytes long string to psi_write(), psi_write() copies 31 bytes to its buf and overwrites buf[30] with '\0'. Which makes the input string 1 byte shorter than it should be. Fix it by copying sizeof(buf) bytes when nbytes >= sizeof(buf). This does not cause problems in normal use case like: "some 500000 10000000" or "full 500000 10000000" because they are less than 32 bytes in length. /* assuming nbytes == 35 */ char buf[32]; buf_size = min(nbytes, (sizeof(buf) - 1)); /* buf_size = 31 */ if (copy_from_user(buf, user_buf, buf_size)) return -EFAULT; buf[buf_size - 1] = '\0'; /* buf[30] = '\0' */ Before: %cd /proc/pressure/ %echo "123456789|123456789|123456789|1234" > memory [ 22.473497] nbytes=35,buf_size=31 [ 22.473775] 123456789|123456789|123456789| (print 30 chars) %sh: write error: Invalid argument %echo "123456789|123456789|123456789|1" > memory [ 64.916162] nbytes=32,buf_size=31 [ 64.916331] 123456789|123456789|123456789| (print 30 chars) %sh: write error: Invalid argument After: %cd /proc/pressure/ %echo "123456789|123456789|123456789|1234" > memory [ 254.837863] nbytes=35,buf_size=32 [ 254.838541] 123456789|123456789|123456789|1 (print 31 chars) %sh: write error: Invalid argument %echo "123456789|123456789|123456789|1" > memory [ 9965.714935] nbytes=32,buf_size=32 [ 9965.715096] 123456789|123456789|123456789|1 (print 31 chars) %sh: write error: Invalid argument Also remove the superfluous parentheses. Signed-off-by: Miles Chen <miles.chen@mediatek.com> Cc: <linux-mediatek@lists.infradead.org> Cc: <wsd_upstream@mediatek.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190912103452.13281-1-miles.chen@mediatek.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-13sched/fair: Speed-up energy-aware wake-upsQuentin Perret
EAS computes the energy impact of migrating a waking task when deciding on which CPU it should run. However, the current approach is known to have a high algorithmic complexity, which can result in prohibitively high wake-up latencies on systems with complex energy models, such as systems with per-CPU DVFS. On such systems, the algorithm complexity is in O(n^2) (ignoring the cost of searching for performance states in the EM) with 'n' the number of CPUs. To address this, re-factor the EAS wake-up path to compute the energy 'delta' (with and without the task) on a per-performance domain basis, rather than system-wide, which brings the complexity down to O(n). No functional changes intended. Test results ~~~~~~~~~~~~ * Setup: Tested on a Google Pixel 3, with a Snapdragon 845 (4+4 CPUs, A55/A75). Base kernel is 5.3-rc5 + Pixel3 specific patches. Android userspace, no graphics. * Test case: Run a periodic rt-app task, with 16ms period, ramping down from 70% to 10%, in 5% steps of 500 ms each (json avail. at [1]). Frequencies of all CPUs are pinned to max (using scaling_min_freq CPUFreq sysfs entries) to reduce variability. The time to run select_task_rq_fair() is measured using the function profiler (/sys/kernel/debug/tracing/trace_stat/function*). See the test script for more details [2]. Test 1: I hacked the DT to 'fake' per-CPU DVFS. That is, we end up with one CPUFreq policy per CPU (8 policies in total). Since all frequencies are pinned to max for the test, this should have no impact on the actual frequency selection, but it does in the EAS calculation. +---------------------------+----------------------------------+ | Without patch | With patch | +-----+-----+----------+----------+-----+-----------------+----------+ | CPU | Hit | Avg (us) | s^2 (us) | Hit | Avg (us) | s^2 (us) | |-----+-----+----------+----------+-----+-----------------+----------+ | 0 | 274 | 38.303 | 1750.239 | 401 | 14.126 (-63.1%) | 146.625 | | 1 | 197 | 49.529 | 1695.852 | 314 | 16.135 (-67.4%) | 167.525 | | 2 | 142 | 34.296 | 1758.665 | 302 | 14.133 (-58.8%) | 130.071 | | 3 | 172 | 31.734 | 1490.975 | 641 | 14.637 (-53.9%) | 139.189 | | 4 | 316 | 7.834 | 178.217 | 425 | 5.413 (-30.9%) | 20.803 | | 5 | 447 | 8.424 | 144.638 | 556 | 5.929 (-29.6%) | 27.301 | | 6 | 581 | 14.886 | 346.793 | 456 | 5.711 (-61.6%) | 23.124 | | 7 | 456 | 10.005 | 211.187 | 997 | 4.708 (-52.9%) | 21.144 | +-----+-----+----------+----------+-----+-----------------+----------+ * Hit, Avg and s^2 are as reported by the function profiler Test 2: I also ran the same test with a normal DT, with 2 CPUFreq policies, to see if this causes regressions in the most common case. +---------------------------+----------------------------------+ | Without patch | With patch | +-----+-----+----------+----------+-----+-----------------+----------+ | CPU | Hit | Avg (us) | s^2 (us) | Hit | Avg (us) | s^2 (us) | |-----+-----+----------+----------+-----+-----------------+----------+ | 0 | 345 | 22.184 | 215.321 | 580 | 18.635 (-16.0%) | 146.892 | | 1 | 358 | 18.597 | 200.596 | 438 | 12.934 (-30.5%) | 104.604 | | 2 | 359 | 25.566 | 200.217 | 397 | 10.826 (-57.7%) | 74.021 | | 3 | 362 | 16.881 | 200.291 | 718 | 11.455 (-32.1%) | 102.280 | | 4 | 457 | 3.822 | 9.895 | 757 | 4.616 (+20.8%) | 13.369 | | 5 | 344 | 4.301 | 7.121 | 594 | 5.320 (+23.7%) | 18.798 | | 6 | 472 | 4.326 | 7.849 | 464 | 5.648 (+30.6%) | 22.022 | | 7 | 331 | 4.630 | 13.937 | 408 | 5.299 (+14.4%) | 18.273 | +-----+-----+----------+----------+-----+-----------------+----------+ * Hit, Avg and s^2 are as reported by the function profiler In addition to these two tests, I also ran 50 iterations of the Lisa EAS functional test suite [3] with this patch applied on Arm Juno r0, Arm Juno r2, Arm TC2 and Hikey960, and could not see any regressions (all EAS functional tests are passing). [1] https://paste.debian.net/1100055/ [2] https://paste.debian.net/1100057/ [3] https://github.com/ARM-software/lisa/blob/master/lisa/tests/scheduler/eas_behaviour.py Signed-off-by: Quentin Perret <quentin.perret@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: qais.yousef@arm.com Cc: qperret@qperret.net Cc: rjw@rjwysocki.net Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20190912094404.13802-1-qperret@qperret.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-12cgroup: freezer: fix frozen state inheritanceRoman Gushchin
If a new child cgroup is created in the frozen cgroup hierarchy (one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup flag should be set. Otherwise if a process will be attached to the child cgroup, it won't become frozen. The problem can be reproduced with the test_cgfreezer_mkdir test. This is the output before this patch: ~/test_freezer ok 1 test_cgfreezer_simple ok 2 test_cgfreezer_tree ok 3 test_cgfreezer_forkbomb Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen not ok 4 test_cgfreezer_mkdir ok 5 test_cgfreezer_rmdir ok 6 test_cgfreezer_migrate ok 7 test_cgfreezer_ptrace ok 8 test_cgfreezer_stopped ok 9 test_cgfreezer_ptraced ok 10 test_cgfreezer_vfork And with this patch: ~/test_freezer ok 1 test_cgfreezer_simple ok 2 test_cgfreezer_tree ok 3 test_cgfreezer_forkbomb ok 4 test_cgfreezer_mkdir ok 5 test_cgfreezer_rmdir ok 6 test_cgfreezer_migrate ok 7 test_cgfreezer_ptrace ok 8 test_cgfreezer_stopped ok 9 test_cgfreezer_ptraced ok 10 test_cgfreezer_vfork Reported-by: Mark Crossen <mcrossen@fb.com> Signed-off-by: Roman Gushchin <guro@fb.com> Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer") Cc: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Tejun Heo <tj@kernel.org>
2019-09-12Merge tag 'for-linus-20190912' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux Pull clone3 fix from Christian Brauner: "This is a last-minute bugfix for clone3() that should go in before we release 5.3 with clone3(). clone3() did not verify that the exit_signal argument was set to a valid signal. This can be used to cause a crash by specifying a signal greater than NSIG. e.g. -1. The commit from Eugene adds a check to copy_clone_args_from_user() to verify that the exit signal is limited by CSIGNAL as with legacy clone() and that the signal is valid. With this we don't get the legacy clone behavior were an invalid signal could be handed down and would only be detected and then ignored in do_notify_parent(). Users of clone3() will now get a proper error right when they pass an invalid exit signal. Note, that this is not a change in user-visible behavior since no kernel with clone3() has been released yet" * tag 'for-linus-20190912' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux: fork: block invalid exit signals with clone3()
2019-09-12fork: block invalid exit signals with clone3()Eugene Syromiatnikov
Previously, higher 32 bits of exit_signal fields were lost when copied to the kernel args structure (that uses int as a type for the respective field). Moreover, as Oleg has noted, exit_signal is used unchecked, so it has to be checked for sanity before use; for the legacy syscalls, applying CSIGNAL mask guarantees that it is at least non-negative; however, there's no such thing is done in clone3() code path, and that can break at least thread_group_leader. This commit adds a check to copy_clone_args_from_user() to verify that the exit signal is limited by CSIGNAL as with legacy clone() and that the signal is valid. With this we don't get the legacy clone behavior were an invalid signal could be handed down and would only be detected and ignored in do_notify_parent(). Users of clone3() will now get a proper error when they pass an invalid exit signal. Note, that this is not user-visible behavior since no kernel with clone3() has been released yet. The following program will cause a splat on a non-fixed clone3() version and will fail correctly on a fixed version: #define _GNU_SOURCE #include <linux/sched.h> #include <linux/types.h> #include <sched.h> #include <stdio.h> #include <stdlib.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> int main(int argc, char *argv[]) { pid_t pid = -1; struct clone_args args = {0}; args.exit_signal = -1; pid = syscall(__NR_clone3, &args, sizeof(struct clone_args)); if (pid < 0) exit(EXIT_FAILURE); if (pid == 0) exit(EXIT_SUCCESS); wait(NULL); exit(EXIT_SUCCESS); } Fixes: 7f192e3cd316 ("fork: add clone3") Reported-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Link: https://lore.kernel.org/r/4b38fa4ce420b119a4c6345f42fe3cec2de9b0b5.1568223594.git.esyr@redhat.com [christian.brauner@ubuntu.com: simplify check and rework commit message] Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2019-09-12Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: "Fix an initialization bug in the hw-breakpoints, which triggered on the ARM platform" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization
2019-09-12Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "Fix a race in the IRQ resend mechanism, which can result in a NULL dereference crash" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Prevent NULL pointer dereference in resend_irqs()
2019-09-11module: remove unneeded casts in cmp_name()Masahiro Yamada
You can pass opaque pointers directly. I also renamed 'va' and 'vb' into more meaningful arguments. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-09-11module: Fix link failure due to invalid relocation on namespace offsetWill Deacon
Commit 8651ec01daed ("module: add support for symbol namespaces.") broke linking for arm64 defconfig: | lib/crypto/arc4.o: In function `__ksymtab_arc4_setkey': | arc4.c:(___ksymtab+arc4_setkey+0x8): undefined reference to `no symbol' | lib/crypto/arc4.o: In function `__ksymtab_arc4_crypt': | arc4.c:(___ksymtab+arc4_crypt+0x8): undefined reference to `no symbol' This is because the dummy initialisation of the 'namespace_offset' field in 'struct kernel_symbol' when using EXPORT_SYMBOL on architectures with support for PREL32 locations uses an offset from an absolute address (0) in an effort to trick 'offset_to_pointer' into behaving as a NOP, allowing non-namespaced symbols to be treated in the same way as those belonging to a namespace. Unfortunately, place-relative relocations require a symbol reference rather than an absolute value and, although x86 appears to get away with this due to placing the kernel text at the top of the address space, it almost certainly results in a runtime failure if the kernel is relocated dynamically as a result of KASLR. Rework 'namespace_offset' so that a value of 0, which cannot occur for a valid namespaced symbol, indicates that the corresponding symbol does not belong to a namespace. Cc: Matthias Maennich <maennich@google.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Fixes: 8651ec01daed ("module: add support for symbol namespaces.") Reported-by: kbuild test robot <lkp@intel.com> Tested-by: Matthias Maennich <maennich@google.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Matthias Maennich <maennich@google.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-09-11Merge branches 'arm/omap', 'arm/exynos', 'arm/smmu', 'arm/mediatek', ↵Joerg Roedel
'arm/qcom', 'arm/renesas', 'x86/amd', 'x86/vt-d' and 'core' into next
2019-09-11swiotlb: Split size parameter to map/unmap APIsLu Baolu
This splits the size parameter to swiotlb_tbl_map_single() and swiotlb_tbl_unmap_single() into an alloc_size and a mapping_size parameter, where the latter one is rounded up to the iommu page size. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-09-10waitid: Add support for waiting for the current process groupEric W. Biederman
It was recently discovered that the linux version of waitid is not a superset of the other wait functions because it does not include support for waiting for the current process group. This has two downsides: 1. An extra system call is needed to get the current process group. 2. After the current process group is received and before it is passed to waitid a signal could arrive causing the current process group to change. Inherent race-conditions as these make it impossible for userspace to emulate this functionaly and thus violate async-signal safety requirements for waitpid. Arguments can be made for using a different choice of idtype and id for this case but the BSDs already use this P_PGID and 0 to indicate waiting for the current process's process group. So be nice to user space programmers and don't introduce an unnecessary incompatibility. Some people have noted that the posix description is that waitpid will wait for the current process group, and that in the presence of pthreads that process group can change. To get clarity on this issue I looked at XNU, FreeBSD, and Luminos. All of those flavors of unix waited for the current process group at the time of call and as written could not adapt to the process group changing after the call. At one point Linux did adapt to the current process group changing but that stopped in 161550d74c07 ("pid: sys_wait... fixes"). It has been over 11 years since Linux has that behavior, no programs that fail with the change in behavior have been reported, and I could not find any other unix that does this. So I think it is safe to clarify the definition of current process group, to current process group at the time of the wait function. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Rich Felker <dalias@libc.org> Cc: Alistair Francis <alistair23@gmail.com> Cc: Zong Li <zongbox@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org> Cc: GNU C Library <libc-alpha@sourceware.org> Link: https://lore.kernel.org/r/20190814154400.6371-2-christian.brauner@ubuntu.com
2019-09-10posix-cpu-timers: Fix permission check regressionThomas Gleixner
The recent consolidation of the three permission checks introduced a subtle regression. For timer_create() with a process wide timer it returns the current task if the lookup through the PID which is encoded into the clockid results in returning current. That's broken because it does not validate whether the current task is the group leader. That was caused by the two different variants of permission checks: - posix_cpu_timer_get() allowed access to the process wide clock when the looked up task is current. That's not an issue because the process wide clock is in the shared sighand. - posix_cpu_timer_create() made sure that the looked up task is the group leader. Restore the previous state. Note, that these permission checks are more than questionable, but that's subject to follow up changes. Fixes: 6ae40e3fdcd3 ("posix-cpu-timers: Provide task validation functions") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1909052314110.1902@nanos.tec.linutronix.de
2019-09-10module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTSMatthias Maennich
If MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is enabled (default=n), the requirement for modules to import all namespaces that are used by the module is relaxed. Enabling this option effectively allows (invalid) modules to be loaded while only a warning is emitted. Disabling this option keeps the enforcement at module loading time and loading is denied if the module's imports are not satisfactory. Reviewed-by: Martijn Coenen <maco@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-09-10module: add support for symbol namespaces.Matthias Maennich
The EXPORT_SYMBOL_NS() and EXPORT_SYMBOL_NS_GPL() macros can be used to export a symbol to a specific namespace. There are no _GPL_FUTURE and _UNUSED variants because these are currently unused, and I'm not sure they are necessary. I didn't add EXPORT_SYMBOL_NS() for ASM exports; this patch sets the namespace of ASM exports to NULL by default. In case of relative references, it will be relocatable to NULL. If there's a need, this should be pretty easy to add. A module that wants to use a symbol exported to a namespace must add a MODULE_IMPORT_NS() statement to their module code; otherwise, modpost will complain when building the module, and the kernel module loader will emit an error and fail when loading the module. MODULE_IMPORT_NS() adds a modinfo tag 'import_ns' to the module. That tag can be observed by the modinfo command, modpost and kernel/module.c at the time of loading the module. The ELF symbols are renamed to include the namespace with an asm label; for example, symbol 'usb_stor_suspend' in namespace USB_STORAGE becomes 'usb_stor_suspend.USB_STORAGE'. This allows modpost to do namespace checking, without having to go through all the effort of parsing ELF and relocation records just to get to the struct kernel_symbols. On x86_64 I saw no difference in binary size (compression), but at runtime this will require a word of memory per export to hold the namespace. An alternative could be to store namespaced symbols in their own section and use a separate 'struct namespaced_kernel_symbol' for that section, at the cost of making the module loader more complex. Co-developed-by: Martijn Coenen <maco@android.com> Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-09-10module: support reading multiple values per modinfo tagMatthias Maennich
Similar to modpost's get_next_modinfo(), introduce get_next_modinfo() in kernel/module.c to acquire any further values associated with the same modinfo tag name. That is useful for any tags that have multiple occurrences (such as 'alias'), but is in particular introduced here as part of the symbol namespaces patch series to read the (potentially) multiple namespaces a module is importing. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Martijn Coenen <maco@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-09-07kernel.h: Add non_block_start/end()Daniel Vetter
In some special cases we must not block, but there's not a spinlock, preempt-off, irqs-off or similar critical section already that arms the might_sleep() debug checks. Add a non_block_start/end() pair to annotate these. This will be used in the oom paths of mmu-notifiers, where blocking is not allowed to make sure there's forward progress. Quoting Michal: "The notifier is called from quite a restricted context - oom_reaper - which shouldn't depend on any locks or sleepable conditionals. The code should be swift as well but we mostly do care about it to make a forward progress. Checking for sleepable context is the best thing we could come up with that would describe these demands at least partially." Peter also asked whether we want to catch spinlocks on top, but Michal said those are less of a problem because spinlocks can't have an indirect dependency upon the page allocator and hence close the loop with the oom reaper. Suggested by Michal Hocko. Link: https://lore.kernel.org/r/20190826201425.17547-4-daniel.vetter@ffwll.ch Acked-by: Christian König <christian.koenig@amd.com> (v1) Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-06kexec_elf: support 32 bit ELF filesSven Schnelle
The powerpc version only supported 64 bit. Add some code to switch decoding of fields during runtime so we can kexec a 32 bit kernel from a 64 bit kernel and vice versa. Signed-off-by: Sven Schnelle <svens@stackframe.org> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Helge Deller <deller@gmx.de>
2019-09-06kexec_elf: remove unused variable in kexec_elf_load()Sven Schnelle
base was never assigned, so we can remove it. Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Helge Deller <deller@gmx.de>
2019-09-06kexec_elf: remove Elf_Rel macroSven Schnelle
It wasn't used anywhere, so lets drop it. Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Helge Deller <deller@gmx.de>
2019-09-06kexec_elf: remove PURGATORY_STACK_SIZESven Schnelle
It's not used anywhere so just drop it. Signed-off-by: Sven Schnelle <svens@stackframe.org> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Helge Deller <deller@gmx.de>
2019-09-06kexec_elf: remove parsing of section headersSven Schnelle
We're not using them, so we can drop the parsing. Signed-off-by: Sven Schnelle <svens@stackframe.org> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Helge Deller <deller@gmx.de>
2019-09-06kexec_elf: change order of elf_*_to_cpu() functionsSven Schnelle
Change the order to have a 64/32/16 order, no functional change. Signed-off-by: Sven Schnelle <svens@stackframe.org> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Helge Deller <deller@gmx.de>
2019-09-06kexec: add KEXEC_ELFSven Schnelle
Right now powerpc provides an implementation to read elf files with the kexec_file_load() syscall. Make that available as a public kexec interface so it can be re-used on other architectures. Signed-off-by: Sven Schnelle <svens@stackframe.org> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Helge Deller <deller@gmx.de>
2019-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add the ability to use unaligned chunks in the AF_XDP umem. By relaxing where the chunks can be placed, it allows to use an arbitrary buffer size and place whenever there is a free address in the umem. Helps more seamless DPDK AF_XDP driver integration. Support for i40e, ixgbe and mlx5e, from Kevin and Maxim. 2) Addition of a wakeup flag for AF_XDP tx and fill rings so the application can wake up the kernel for rx/tx processing which avoids busy-spinning of the latter, useful when app and driver is located on the same core. Support for i40e, ixgbe and mlx5e, from Magnus and Maxim. 3) bpftool fixes for printf()-like functions so compiler can actually enforce checks, bpftool build system improvements for custom output directories, and addition of 'bpftool map freeze' command, from Quentin. 4) Support attaching/detaching XDP programs from 'bpftool net' command, from Daniel. 5) Automatic xskmap cleanup when AF_XDP socket is released, and several barrier/{read,write}_once fixes in AF_XDP code, from Björn. 6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf inclusion as well as libbpf versioning improvements, from Andrii. 7) Several new BPF kselftests for verifier precision tracking, from Alexei. 8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya. 9) And more BPF kselftest improvements all over the place, from Stanislav. 10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub. 11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan. 12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni. 13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin. 14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar. 15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari, Peter, Wei, Yue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06Merge tag 'irqchip-5.4' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates for Linux 5.4 from Marc Zyngier: - Large GICv3 updates to support new PPI and SPI ranges - Conver all alloc_fwnode() users to use PAs instead of VAs - Add support for Marvell's MMP3 irqchip - Add support for Amlogic Meson SM1 - Various cleanups and fixes
2019-09-06perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initializationMark-PK Tsai
If we disable the compiler's auto-initialization feature, if -fplugin-arg-structleak_plugin-byref or -ftrivial-auto-var-init=pattern are disabled, arch_hw_breakpoint may be used before initialization after: 9a4903dde2c86 ("perf/hw_breakpoint: Split attribute parse and commit") On our ARM platform, the struct step_ctrl in arch_hw_breakpoint, which used to be zero-initialized by kzalloc(), may be used in arch_install_hw_breakpoint() without initialization. Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alix Wu <alix.wu@mediatek.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: YJ Chiang <yj.chiang@mediatek.com> Link: https://lkml.kernel.org/r/20190906060115.9460-1-mark-pk.tsai@mediatek.com [ Minor edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>