path: root/include
Age | Commit message | Author
2019-03-05 | Merge branch 'irq-core-for-linus' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "The interrupt department delivers this time:

   - New infrastructure to manage NMIs on platforms which have a sane NMI delivery, i.e. identifiable NMI vectors instead of a single lump.
   - Simplification of the interrupt affinity management so drivers don't have to implement ugly loops around the PCI/MSI enablement.
   - Speedup for interrupt statistics in /proc/stat
   - Provide a function to retrieve the default irq domain
   - A new interrupt controller for the Loongson LS1X platform
   - Affinity support for the SiFive PLIC
   - Better support for the iMX irqsteer driver
   - NUMA aware memory allocations for GICv3
   - The usual small fixes, improvements and cleanups all over the place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  irqchip/imx-irqsteer: Add multi output interrupts support
  irqchip/imx-irqsteer: Change to use reg_num instead of irq_group
  dt-bindings: irq: imx-irqsteer: Add multi output interrupts support
  dt-binding: irq: imx-irqsteer: Use irq number instead of group number
  irqchip/brcmstb-l2: Use _irqsave locking variants in non-interrupt code
  irqchip/gicv3-its: Use NUMA aware memory allocation for ITS tables
  irqdomain: Allow the default irq domain to be retrieved
  irqchip/sifive-plic: Implement irq_set_affinity() for SMP host
  irqchip/sifive-plic: Differentiate between PLIC handler and context
  irqchip/sifive-plic: Add warning in plic_init() if handler already present
  irqchip/sifive-plic: Pre-compute context hart base and enable base
  PCI/MSI: Remove obsolete sanity checks for multiple interrupt sets
  genirq/affinity: Remove the leftovers of the original set support
  nvme-pci: Simplify interrupt allocation
  genirq/affinity: Add new callback for (re)calculating interrupt sets
  genirq/affinity: Store interrupt sets size in struct irq_affinity
  genirq/affinity: Code consolidation
  irqchip/irq-sifive-plic: Check and continue in case of an invalid cpuid.
  irqchip/i8259: Fix shutdown order by moving syscore_ops registration
  dt-bindings: interrupt-controller: loongson ls1x intc
  ...
2019-03-05 | Merge branch 'timers-core-for-linus' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer and clockevent updates from Thomas Gleixner:
 "The time(r) core and clockevent updates are mostly boring this time:

   - A new driver for the Tegra210 timer
   - Small fixes and improvements all over the place
   - Documentation updates and cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  soc/tegra: default select TEGRA_TIMER for Tegra210
  clocksource/drivers/tegra: Add Tegra210 timer support
  dt-bindings: timer: add Tegra210 timer
  clocksource/drivers/timer-cs5535: Rename the file for consistency
  clocksource/drivers/timer-pxa: Rename the file for consistency
  clocksource/drivers/tango-xtal: Rename the file for consistency
  dt-bindings: timer: gpt: update binding doc
  clocksource/drivers/exynos_mct: Remove unused header includes
  dt-bindings: timer: mediatek: update bindings for MT7629 SoC
  clocksource/drivers/exynos_mct: Fix error path in timer resources initialization
  clocksource/drivers/exynos_mct: Remove dead code
  clocksource/drivers/riscv: Add required checks during clock source init
  dt-bindings: timer: renesas: tmu: Document r8a774c0 bindings
  dt-bindings: timer: renesas, cmt: Document r8a774c0 CMT support
  clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown
  clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR
  clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability
  clocksource/drivers/sun5i: Fail gracefully when clock rate is unavailable
  timers: Mark expected switch fall-throughs
  timekeeping/debug: No need to check return value of debugfs_create functions
  ...
2019-03-05 | dm: add support to directly boot to a mapped device | Helen Koike
Add a "create" module parameter, which allows device-mapper targets to be configured at boot time. This enables early use of DM targets in the boot process (as the root device or otherwise) without the need of an initramfs.

The syntax used in the boot param is based on the concise format from the dmsetup tool to follow the rule of least surprise:

    dmsetup table --concise /dev/mapper/lroot

Which is:

    dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]

Where,

    <name>        ::= The device name.
    <uuid>        ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
    <minor>       ::= The device minor number | ""
    <flags>       ::= "ro" | "rw"
    <table>       ::= <start_sector> <num_sectors> <target_type> <target_args>
    <target_type> ::= "verity" | "linear" | ...

For example, the following could be added in the boot parameters:

    dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0

Only targets that were tested, and that do not change any block device when the device is created read-only, are allowed. For example, mirror and cache targets are not allowed. The rationale behind this is that if the user makes a mistake, choosing the wrong device to be the mirror or the cache can corrupt data.

The only targets initially allowed are:

 * crypt
 * delay
 * linear
 * snapshot-origin
 * striped
 * verity

Co-developed-by: Will Drewry <wad@chromium.org>
Co-developed-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
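The concise grammar above can be illustrated with a small userspace sketch that assembles one device description for the boot parameter. The helper name dm_create_arg() and its signature are hypothetical and exist neither in the kernel nor in dmsetup; the ", " separator between tables simply mirrors the spacing of the example in the commit message.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel or dmsetup): write one device
 * description in the concise format
 *   <name>,<uuid>,<minor>,<flags>,<table>[,<table>+]
 * into buf. Empty strings stand for the optional <uuid> and <minor> fields.
 * Returns 0 on success, -1 if buf is too small.
 */
int dm_create_arg(char *buf, size_t len,
                  const char *name, const char *uuid,
                  const char *minor, const char *flags,
                  const char *const *tables, size_t ntables)
{
    int n = snprintf(buf, len, "%s,%s,%s,%s", name, uuid, minor, flags);
    if (n < 0 || (size_t)n >= len)
        return -1;
    size_t used = (size_t)n;

    for (size_t i = 0; i < ntables; i++) {
        /* ", " mirrors the spacing used in the commit message example */
        n = snprintf(buf + used, len - used, ", %s", tables[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Called with the name "lroot", empty uuid and minor, the flags "rw" and the two linear tables from the example, this should yield the value shown after dm-mod.create= above. Several devices would be joined with ";" as the grammar describes.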
2019-03-05 | dm: fix to_sector() for 32bit | NeilBrown
A dm-raid array with devices larger than 4GB won't assemble on a 32 bit host since _check_data_dev_sectors() was added in 4.16. This is because to_sector() treats its argument as an "unsigned long" which is 32bits (4GB) on a 32bit host. Using "unsigned long long" is more correct.

Kernels as early as 4.2 can have other problems due to to_sector() being used on the size of a device.

Fixes: 0cf4503174c1 ("dm raid: add support for the MD RAID0 personality")
cc: stable@vger.kernel.org (v4.2+)
Reported-and-tested-by: Guillaume Perréal <gperreal@free.fr>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
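The truncation described above can be reproduced in userspace. The following is a minimal sketch, not the kernel code: uint32_t stands in for "unsigned long" on a 32-bit host, the sector shift of 9 assumes 512-byte sectors, and all three function names are made up for the illustration.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9 /* assumes 512-byte sectors */

/* Models the old behavior on a 32-bit host, where unsigned long has 32 bits. */
uint64_t to_sector_ulong32(uint32_t bytes)
{
    return bytes >> SECTOR_SHIFT;
}

/* Models the fixed behavior: the argument is wide enough on every host. */
uint64_t to_sector_ull(unsigned long long bytes)
{
    return bytes >> SECTOR_SHIFT;
}

/* Hypothetical demo helper: sector count as seen through the 32-bit argument. */
uint64_t sectors_seen_by_32bit_host(unsigned long long bytes)
{
    /* The cast drops the high bits, as the implicit conversion would. */
    return to_sector_ulong32((uint32_t)bytes);
}
```

For a 5 GiB device, the 32-bit model sees only the low 1 GiB (2097152 sectors) while the widened version reports 10485760 sectors, which is the kind of mismatch that prevents such an array from assembling.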
2019-03-05 | Merge tag 'mips_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux | Linus Torvalds
Pull MIPS updates from Paul Burton: - Support for the MIPSr6 MemoryMapID register & Global INValidate TLB (GINVT) instructions, allowing for more efficient TLB maintenance when running on a CPU such as the I6500 that supports these. - Enable huge page support for MIPS64r6. - Optimize post-DMA cache sync by removing that code entirely for kernel configurations in which we know it won't be needed. - The number of pages allocated for interrupt stacks is now calculated correctly, where before we would wastefully allocate too much memory in some configurations. - The ath79 platform migrates to devicetree. - The bcm47xx platform sees fixes for the Buffalo WHR-G54S board. - The ingenic/jz4740 platform gains support for appended devicetrees. - The cavium_octeon, lantiq, loongson32 & sgi-ip27 platforms all see cleanups as do various pieces of core architecture code. * tag 'mips_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (66 commits) MIPS: lantiq: Remove separate GPHY Firmware loader MIPS: ingenic: Add support for appended devicetree MIPS: SGI-IP27: rework HUB interrupts MIPS: SGI-IP27: do boot CPU init later MIPS: SGI-IP27: do xtalk scanning later MIPS: SGI-IP27: use pr_info/pr_emerg and pr_cont to fix output MIPS: SGI-IP27: clean up bridge access and header files MIPS: SGI-IP27: get rid of volatile and hubreg_t MIPS: irq: Allocate accurate order pages for irq stack MIPS: dma-noncoherent: Remove bogus condition in dma_sync_phys() MIPS: eBPF: Remove REG_32BIT_ZERO_EX MIPS: eBPF: Always return sign extended 32b values MIPS: CM: Fix indentation MIPS: BCM47XX: Fix/improve Buffalo WHR-G54S support MIPS: OCTEON: program rx/tx-delay always from DT MIPS: OCTEON: delete board-specific link status MIPS: OCTEON: don't lie about interface type of CN3005 board MIPS: OCTEON: warn if deprecated link status is being used MIPS: OCTEON: add fixed-link nodes to in-kernel device tree MIPS: Delete unused flush_cache_sigtramp() ...
2019-03-05 | Merge tag 's390-5.1-1' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Martin Schwidefsky:

 - A copy of Arnd's compat wrapper generation series
 - Pass information about the KVM guest to the host in the form of the control program code and the control program version code
 - Map IOV resources to support PCI physical functions on s390
 - Add vector load and store alignment hints to improve performance
 - Use the "jdd" constraint with gcc 9 to make jump labels working again
 - Remove amode workaround for old z/VM releases from the DCSS code
 - Add support for in-kernel performance measurements using the CPU measurement counter facility
 - Introduce a new PMU device cpum_cf_diag to capture counters and store them as event raw data.
 - Bug fixes and cleanups

* tag 's390-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
  Revert "s390/cpum_cf: Add kernel message exaplanations"
  s390/dasd: fix read device characteristic with CONFIG_VMAP_STACK=y
  s390/suspend: fix prefix register reset in swsusp_arch_resume
  s390: warn about clearing als implied facilities
  s390: allow overriding facilities via command line
  s390: clean up redundant facilities list setup
  s390/als: remove duplicated in-place implementation of stfle
  s390/cio: Use cpa range elsewhere within vfio-ccw
  s390/cio: Fix vfio-ccw handling of recursive TICs
  s390: vfio_ap: link the vfio_ap devices to the vfio_ap bus subsystem
  s390/cpum_cf: Handle EBUSY return code from CPU counter facility reservation
  s390/cpum_cf: Add kernel message exaplanations
  s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace
  s390/cpum_cf: add ctr_stcctm() function
  s390/cpum_cf: move common functions into a separate file
  s390/cpum_cf: introduce kernel_cpumcf_avail() function
  s390/cpu_mf: replace stcctm5() with the stcctm() function
  s390/cpu_mf: add store cpu counter multiple instruction support
  s390/cpum_cf: Add minimal in-kernel interface for counter measurements
  s390/cpum_cf: introduce kernel_cpumcf_alert() to obtain measurement alerts
  ...
2019-03-05 | ceph: add mount option to limit caps count | Yan, Zheng
If the number of caps exceeds the limit, ceph_trim_dentries() also trims dentries with valid leases. Trimming a dentry releases its reference to the associated inode, which may evict the inode and release caps. By default, there is no limit on the caps count.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05 | Merge branch 'linus' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Add helper for simple skcipher modes. - Add helper to register multiple templates. - Set CRYPTO_TFM_NEED_KEY when setkey fails. - Require neither or both of export/import in shash. - AEAD decryption test vectors are now generated from encryption ones. - New option CONFIG_CRYPTO_MANAGER_EXTRA_TESTS that includes random fuzzing. Algorithms: - Conversions to skcipher and helper for many templates. - Add more test vectors for nhpoly1305 and adiantum. Drivers: - Add crypto4xx prng support. - Add xcbc/cmac/ecb support in caam. - Add AES support for Exynos5433 in s5p. - Remove sha384/sha512 from artpec7 as hardware cannot do partial hash" [ There is a merge of the Freescale SoC tree in order to pull in changes required by patches to the caam/qi2 driver. ] * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (174 commits) crypto: s5p - add AES support for Exynos5433 dt-bindings: crypto: document Exynos5433 SlimSSS crypto: crypto4xx - add missing of_node_put after of_device_is_available crypto: cavium/zip - fix collision with generic cra_driver_name crypto: af_alg - use struct_size() in sock_kfree_s() crypto: caam - remove redundant likely/unlikely annotation crypto: s5p - update iv after AES-CBC op end crypto: x86/poly1305 - Clear key material from stack in SSE2 variant crypto: caam - generate hash keys in-place crypto: caam - fix DMA mapping xcbc key twice crypto: caam - fix hash context DMA unmap size hwrng: bcm2835 - fix probe as platform device crypto: s5p-sss - Use AES_BLOCK_SIZE define instead of number crypto: stm32 - drop pointless static qualifier in stm32_hash_remove() crypto: chelsio - Fixed Traffic Stall crypto: marvell - Remove set but not used variable 'ivsize' crypto: ccp - Update driver messages to remove some confusion crypto: adiantum - add 1536 and 4096-byte test vectors crypto: nhpoly1305 - add a test vector with len % 
16 != 0 crypto: arm/aes-ce - update IV after partial final CTR block ...
2019-03-05 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next | Linus Torvalds
Pull networking updates from David Miller: "Here we go, another merge window full of networking and #ebpf changes: 1) Snoop DHCPACKS in batman-adv to learn MAC/IP pairs in the DHCP range without dealing with floods of ARP traffic, from Linus Lüssing. 2) Throttle buffered multicast packet transmission in mt76, from Felix Fietkau. 3) Support adaptive interrupt moderation in ice, from Brett Creeley. 4) A lot of struct_size conversions, from Gustavo A. R. Silva. 5) Add peek/push/pop commands to bpftool, as well as bash completion, from Stanislav Fomichev. 6) Optimize sk_msg_clone(), from Vakul Garg. 7) Add SO_BINDTOIFINDEX, from David Herrmann. 8) Be more conservative with local resends due to local congestion, from Yuchung Cheng. 9) Allow vetoing of unsupported VXLAN FDBs, from Petr Machata. 10) Add health buffer support to devlink, from Eran Ben Elisha. 11) Add TXQ scheduling API to mac80211, from Toke Høiland-Jørgensen. 12) Add statistics to basic packet scheduler filter, from Cong Wang. 13) Add GRE tunnel support for mlxsw Spectrum-2, from Nir Dotan. 14) Lots of new IP tunneling forwarding tests, also from Nir Dotan. 15) Add 3ad stats to bonding, from Nikolay Aleksandrov. 16) Lots of probing improvements for bpftool, from Quentin Monnet. 17) Various nfp drive #ebpf JIT improvements from Jakub Kicinski. 18) Allow #ebpf programs to access gso_segs from skb shared info, from Eric Dumazet. 19) Add sock_diag support for AF_XDP sockets, from Björn Töpel. 20) Support 22260 iwlwifi devices, from Luca Coelho. 21) Use rbtree for ipv6 defragmentation, from Peter Oskolkov. 22) Add JMP32 instruction class support to #ebpf, from Jiong Wang. 23) Add spinlock support to #ebpf, from Alexei Starovoitov. 24) Support 256-bit keys and TLS 1.3 in ktls, from Dave Watson. 25) Add device infomation API to devlink, from Jakub Kicinski. 26) Add new timestamping socket options which are y2038 safe, from Deepa Dinamani. 
27) Add RX checksum offloading for various sh_eth chips, from Sergei Shtylyov. 28) Flow offload infrastructure, from Pablo Neira Ayuso. 29) Numerous cleanups, improvements, and bug fixes to the PHY layer and many drivers from Heiner Kallweit. 30) Lots of changes to try and make packet scheduler classifiers run lockless as much as possible, from Vlad Buslov. 31) Support BCM957504 chip in bnxt_en driver, from Erik Burrows. 32) Add concurrency tests to tc-tests infrastructure, from Vlad Buslov. 33) Add hwmon support to aquantia, from Heiner Kallweit. 34) Allow 64-bit values for SO_MAX_PACING_RATE, from Eric Dumazet. And I would be remiss if I didn't thank the various major networking subsystem maintainers for integrating much of this work before I even saw it. Alexei Starovoitov, Daniel Borkmann, Pablo Neira Ayuso, Johannes Berg, Kalle Valo, and many others. Thank you!" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2207 commits) net/sched: avoid unused-label warning net: ignore sysctl_devconf_inherit_init_net without SYSCTL phy: mdio-mux: fix Kconfig dependencies net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_aneg net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework selftest/net: Remove duplicate header sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79 net/mlx5e: Update tx reporter status in case channels were successfully opened devlink: Add support for direct reporter health state update devlink: Update reporter state to error even if recover aborted sctp: call iov_iter_revert() after sending ABORT team: Free BPF filter when unregistering netdev ip6mr: Do not call __IP6_INC_STATS() from preemptible context isdn: mISDN: Fix potential NULL pointer dereference of kzalloc net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs cxgb4/chtls: Prefix adapter flags with CXGB4 net-sysfs: Switch to bitmap_zalloc() mellanox: Switch to bitmap_zalloc() bpf: add test cases for 
non-pointer sanitiation logic mlxsw: i2c: Extend initialization by querying resources data ...
2019-03-05 | signal: add pidfd_send_signal() syscall | Christian Brauner
The kill() syscall operates on process identifiers (pid). After a process has exited its pid can be reused by another process. If a caller sends a signal to a reused pid it will end up signaling the wrong process. This issue has often surfaced and there has been a push to address this problem [1]. This patch uses file descriptors (fd) from proc/<pid> as stable handles on struct pid. Even if a pid is recycled the handle will not change. The fd can be used to send signals to the process it refers to. Thus, the new syscall pidfd_send_signal() is introduced to solve this problem. Instead of pids it operates on process fds (pidfd). /* prototype and argument /* long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags); /* syscall number 424 */ The syscall number was chosen to be 424 to align with Arnd's rework in his y2038 to minimize merge conflicts (cf. [25]). In addition to the pidfd and signal argument it takes an additional siginfo_t and flags argument. If the siginfo_t argument is NULL then pidfd_send_signal() is equivalent to kill(<positive-pid>, <signal>). If it is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo(). The flags argument is added to allow for future extensions of this syscall. It currently needs to be passed as 0. Failing to do so will cause EINVAL. /* pidfd_send_signal() replaces multiple pid-based syscalls */ The pidfd_send_signal() syscall currently takes on the job of rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a positive pid is passed to kill(2). It will however be possible to also replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended. /* sending signals to threads (tid) and process groups (pgid) */ Specifically, the pidfd_send_signal() syscall does currently not operate on process groups or threads. This is left for future extensions. In order to extend the syscall to allow sending signal to threads and process groups appropriately named flags (e.g. 
PIDFD_TYPE_PGID, and PIDFD_TYPE_TID) should be added. This implies that the flags argument will determine what is signaled and not the file descriptor itself. Put in other words, grouping in this api is a property of the flags argument not a property of the file descriptor (cf. [13]). Clarification for this has been requested by Eric (cf. [19]). When appropriate extensions through the flags argument are added then pidfd_send_signal() can additionally replace the part of kill(2) which operates on process groups as well as the tgkill(2) and rt_tgsigqueueinfo(2) syscalls. How such an extension could be implemented has been very roughly sketched in [14], [15], and [16]. However, this should not be taken as a commitment to a particular implementation. There might be better ways to do it. Right now this is intentionally left out to keep this patchset as simple as possible (cf. [4]). /* naming */ The syscall had various names throughout iterations of this patchset: - procfd_signal() - procfd_send_signal() - taskfd_send_signal() In the last round of reviews it was pointed out that given that if the flags argument decides the scope of the signal instead of different types of fds it might make sense to either settle for "procfd_" or "pidfd_" as prefix. The community was willing to accept either (cf. [17] and [18]). Given that one developer expressed strong preference for the "pidfd_" prefix (cf. [13]) and with other developers less opinionated about the name we should settle for "pidfd_" to avoid further bikeshedding. The "_send_signal" suffix was chosen to reflect the fact that the syscall takes on the job of multiple syscalls. It is therefore intentional that the name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). 
Not the former because it might imply that pidfd_send_signal() is a replacement for kill(2), and not the latter because it is a hassle to remember the correct spelling - especially for non-native speakers - and because it is not descriptive enough of what the syscall actually does. The name "pidfd_send_signal" makes it very clear that its job is to send signals.

/* zombies */
Zombies can be signaled just as any other process. No special error will be reported since a zombie state is an unreliable state (cf. [3]). However, this can be added as an extension through the @flags argument if the need ever arises.

/* cross-namespace signals */
The patch currently enforces that the signaler and signalee either are in the same pid namespace or that the signaler's pid namespace is an ancestor of the signalee's pid namespace. This is done for the sake of simplicity and because it is unclear which values certain members of struct siginfo_t would need to be set to (cf. [5], [6]).

/* compat syscalls */
It became clear that we would like to avoid adding compat syscalls (cf. [7]). The compat syscall handling is now done in kernel/signal.c itself by adding __copy_siginfo_from_user_any() which lets us avoid compat syscalls (cf. [8]). It should be noted that the addition of __copy_siginfo_from_user_any() is caused by a bug in the original implementation of rt_sigqueueinfo(2) (cf. [12]). With upcoming rework for syscall handling things might improve significantly (cf. [11]) and __copy_siginfo_from_user_any() will not gain any additional callers.

/* testing */
This patch was tested on x64 and x86.

/* userspace usage */
An asciinema recording for the basic functionality can be found under [9].
With this patch a process can be killed via:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    static inline int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                           unsigned int flags)
    {
    #ifdef __NR_pidfd_send_signal
            return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
    #else
            return -ENOSYS;
    #endif
    }

    int main(int argc, char *argv[])
    {
            int fd, ret, saved_errno, sig;

            if (argc < 3)
                    exit(EXIT_FAILURE);

            fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
            if (fd < 0) {
                    printf("%s - Failed to open \"%s\"\n", strerror(errno), argv[1]);
                    exit(EXIT_FAILURE);
            }

            sig = atoi(argv[2]);

            printf("Sending signal %d to process %s\n", sig, argv[1]);
            ret = do_pidfd_send_signal(fd, sig, NULL, 0);

            saved_errno = errno;
            close(fd);
            errno = saved_errno;

            if (ret < 0) {
                    printf("%s - Failed to send signal %d to process %s\n",
                           strerror(errno), sig, argv[1]);
                    exit(EXIT_FAILURE);
            }

            exit(EXIT_SUCCESS);
    }

/* Q&A
 * Given that it seems the same questions get asked again by people who are
 * late to the party it makes sense to add a Q&A section to the commit
 * message so it's hopefully easier to avoid duplicate threads.
 *
 * For the sake of progress please consider these arguments settled unless
 * there is a new point that desperately needs to be addressed. Please make
 * sure to check the links to the threads in this commit message whether
 * this has not already been covered.
 */

Q-01: (Florian Weimer [20], Andrew Morton [21]) What happens when the target process has exited?
A-01: Sending the signal will fail with ESRCH (cf. [22]).

Q-02: (Andrew Morton [21]) Is the task_struct pinned by the fd?
A-02: No. A reference to struct pid is kept. struct pid - as far as I understand - was created exactly for the reason to not require to pin struct task_struct (cf. [22]).

Q-03: (Andrew Morton [21]) Does the entire procfs directory remain visible? Just one entry within it?
A-03: The same thing that happens right now when you hold a file descriptor to /proc/<pid> open (cf. [22]).

Q-04: (Andrew Morton [21]) Does the pid remain reserved?
A-04: No. This patchset guarantees a stable handle not that pids are not recycled (cf. [22]).

Q-05: (Andrew Morton [21]) Do attempts to signal that fd return errors?
A-05: See {Q,A}-01.

Q-06: (Andrew Morton [22]) Is there a cleaner way of obtaining the fd? Another syscall perhaps.
A-06: Userspace can already trivially retrieve file descriptors from procfs so this is something that we will need to support anyway. Hence, there's no immediate need to add another syscall just to make pidfd_send_signal() not dependent on the presence of procfs. However, adding a syscall to get such file descriptors is planned for a future patchset (cf. [22]).

Q-07: (Andrew Morton [21] and others) This fd-for-a-process sounds like a handy thing and people may well think up other uses for it in the future, probably unrelated to signals. Are the code and the interface designed to permit such future applications?
A-07: Yes (cf. [22]).

Q-08: (Andrew Morton [21] and others) Now I think about it, why a new syscall? This thing is looking rather like an ioctl?
A-08: This has been extensively discussed. It was agreed that a syscall is preferred for a variety of reasons. Here are just a few taken from prior threads. Syscalls are safer than ioctl()s especially when signaling to fds. Processes are a core kernel concept so a syscall seems more appropriate. The layout of the syscall with its four arguments would require the addition of a custom struct for the ioctl() thereby causing at least the same amount or even more complexity for userspace than a simple syscall. The new syscall will replace multiple other pid-based syscalls (see description above). The file-descriptors-for-processes concept introduced with this syscall will be extended with other syscalls in the future. See also [22], [23] and various other threads already linked in here.

Q-09: (Florian Weimer [24]) What happens if you use the new interface with an O_PATH descriptor?
A-09: pidfds opened as O_PATH fds cannot be used to send signals to a process (cf. [2]). Signaling processes through pidfds is the equivalent of writing to a file. Thus, this is not an operation that operates "purely at the file descriptor level" as required by the open(2) manpage. See also [4].

/* References */
[1]: https://lore.kernel.org/lkml/20181029221037.87724-1-dancol@google.com/
[2]: https://lore.kernel.org/lkml/874lbtjvtd.fsf@oldenburg2.str.redhat.com/
[3]: https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6fhva@brauner.io/
[4]: https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/
[5]: https://lore.kernel.org/lkml/20181121213946.GA10795@mail.hallyn.com/
[6]: https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6nv4@brauner.io/
[7]: https://lore.kernel.org/lkml/36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net/
[8]: https://lore.kernel.org/lkml/87tvjxp8pc.fsf@xmission.com/
[9]: https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy
[11]: https://lore.kernel.org/lkml/F53D6D38-3521-4C20-9034-5AF447DF62FF@amacapital.net/
[12]: https://lore.kernel.org/lkml/87zhtjn8ck.fsf@xmission.com/
[13]: https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/
[14]: https://lore.kernel.org/lkml/20181206231742.xxi4ghn24z4h2qki@brauner.io/
[15]: https://lore.kernel.org/lkml/20181207003124.GA11160@mail.hallyn.com/
[16]: https://lore.kernel.org/lkml/20181207015423.4miorx43l3qhppfz@brauner.io/
[17]: https://lore.kernel.org/lkml/CAGXu5jL8PciZAXvOvCeCU3wKUEB_dU-O3q0tDw4uB_ojMvDEew@mail.gmail.com/
[18]: https://lore.kernel.org/lkml/20181206222746.GB9224@mail.hallyn.com/
[19]: https://lore.kernel.org/lkml/20181208054059.19813-1-christian@brauner.io/
[20]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
[21]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/
[22]: https://lore.kernel.org/lkml/20181228233725.722tdfgijxcssg76@brauner.io/
[23]: https://lwn.net/Articles/773459/
[24]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
[25]: https://lore.kernel.org/lkml/CAK8P3a0ej9NcJM8wXNPbcGUyOUZYX+VLoDFdbenW3s3114oQZw@mail.gmail.com/

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Aleksa Sarai <cyphar@cyphar.com>
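
As a complement to the program in the commit message, the open-then-signal sequence can be folded into one function that reports failures as negative errno values. This is only a sketch under stated assumptions: signal_via_procfd() is a hypothetical name, and the fallback definition uses the syscall number 424 given in the commit message, which is assumed to be valid for the architecture being built for.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_pidfd_send_signal
/* Number stated in the commit message; assumed valid for this architecture. */
#define __NR_pidfd_send_signal 424
#endif

/*
 * Hypothetical helper: open a /proc/<pid> directory and send sig through
 * the resulting file descriptor.
 * Returns 0 on success or a negative errno value on failure.
 */
int signal_via_procfd(const char *procdir, int sig)
{
    int fd = open(procdir, O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -errno;

    /* info == NULL behaves like kill(2); flags currently has to be 0. */
    long ret = syscall(__NR_pidfd_send_signal, fd, sig, NULL, 0);
    int err = (ret < 0) ? errno : 0;

    close(fd);
    return -err;
}
```

Because the descriptor refers to struct pid rather than to the numeric pid, a target that exits between open() and the syscall yields -ESRCH (see A-01) instead of the signal reaching an unrelated process that reused the pid. On kernels without the syscall the function is expected to return -ENOSYS.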
2019-03-04 | Merge tag 'leds-for-5.1-rc1' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds

Pull LED updates from Jacek Anaszewski:

 - finalize previously announced support for initialization of pattern triggers from Device Tree
 - fix for null deref on firmware load failure in leds-lp55xx-common.c

* tag 'leds-for-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  leds: lp55xx: fix null deref on firmware load failure
  leds: trigger: timer: Add initialization from Device Tree
  leds: trigger: oneshot: Add initialization from Device Tree
  leds: trigger: pattern: Add pattern initialization from Device Tree
  leds: Add helper for getting default pattern from Device Tree
  dt-bindings: leds: Add pattern initialization from Device Tree
2019-03-04 | Merge tag 'spi-v5.1' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "A fairly quiet release for SPI, the biggest thing is the conversion to use GPIO descriptors which is now 90% done but still needs some stragglers converting. Summary: - Support for inter-word delays - Conversion of the core and most drivers to use GPIO descriptors for GPIO controlled chip selects - New drivers for NXP FlexSPI and QuadSPI, SiFive and Spreadtrum" * tag 'spi-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (104 commits) spi: sh-msiof: Restrict bits per word to 8/16/24/32 on R-Car Gen2/3 spi: sifive: Remove redundant dev_err call in sifive_spi_probe() spi: sifive: Remove spi_master_put in sifive_spi_remove() spi: spi-gpio: fix SPI_CS_HIGH capability spi: pxa2xx: Setup maximum supported DMA transfer length spi: sifive: Add driver for the SiFive SPI controller spi: sifive: Add DT documentation for SiFive SPI controller spi: sprd: Add a prefix for SPI DMA channel macros spi: sprd: spi: sprd: Add DMA mode support dt-bindings: spi: Add the DMA properties for the SPI dma mode spi: sprd: Add the SPI irq function for the SPI DMA mode dt-bindings: spi: imx: Add an entry for the i.MX8QM compatible spi: use gpio[d]_set_value_cansleep for setting chipselect GPIO spi: gpio: Advertise support for SPI_CS_HIGH spi: sh-msiof: Replace spi_master by spi_controller spi: sh-hspi: Replace spi_master by spi_controller spi: rspi: Replace spi_master by spi_controller spi: atmel-quadspi: add support for sam9x60 qspi controller dt-bindings: spi: atmel-quadspi: QuadSPI driver for Microchip SAM9X60 spi: atmel-quadspi: add support for named peripheral clock ...
2019-03-04Merge tag 'regulator-v5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "The bulk of the standout changes in this release are cleanups, with the core work being a combination of factoring out common code into helpers and the completion of the conversion of the core to use GPIO descriptors. Summary: - Addition of helper functions for current limits and conversion of drivers to use them by Axel Lin. - Lots and lots of cleanups from Axel Lin. - Conversion of the core to use GPIO descriptors rather than numbers by Linus Walleij. - New drivers for Maxim MAX77650 and ROHM BD70528" * tag 'regulator-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (131 commits) regulator: mc13xxx: Constify regulator_ops variables regulator: palmas: Constify palmas_smps_ramp_delay array regulator: wm831x-dcdc: Convert to use regulator_set/get_current_limit_regmap regulator: pv88090: Convert to use regulator_set/get_current_limit_regmap regulator: pv88080: Convert to use regulator_set/get_current_limit_regmap regulator: pv88060: Convert to use regulator_set/get_current_limit_regmap regulator: max77650: Convert to use regulator_set/get_current_limit_regmap regulator: lp873x: Convert to use regulator_set/get_current_limit_regmap regulator: lp872x: Convert to use regulator_set/get_current_limit_regmap regulator: da9210: Convert to use regulator_set/get_current_limit_regmap regulator: da9055: Convert to use regulator_set/get_current_limit_regmap regulator: core: Add set/get_current_limit helpers for regmap users regulator: Fix comment for csel_reg and csel_mask regulator: stm32-vrefbuf: add power management support regulator: 88pm8607: Remove unused fields from struct pm8607_regulator_info regulator: 88pm8607: Simplify pm8607_list_voltage implementation regulator: cpcap: Constify omap4_regulators and xoom_regulators regulator: cpcap: Remove unused vsel_shift from struct cpcap_regulator dt-bindings: regulator: tps65218: rectify units of LS3 
dt-bindings: regulator: add LS2 load switch documentation ...
2019-03-04Merge tag 'regmap-v5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "There are only two changes here: - fix for conflicting attributes on the rbtree node structure - implementation of main status register support in the interrupt code which supports chips that have a register to cut down on the number of per-interrupt status registers that need to be checked when handling interrupts" * tag 'regmap-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: Remove attribute packed from struct 'regcache_rbtree_node' regmap: regmap-irq: Add main status register support
2019-03-04Merge tag 'mmc-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds
Pull MMC updates from Ulf Hansson: "MMC core: - Fixup max_discard/trim calculations - Announce SD specs greater than 4.0 - Add discard support for SD cards - Don't do retries for CMD6 (SWITCH command) - Various cleanups and re-structuring MMC host: - cqhci: * Add maintainers for eMMC CQHCI driver - sdhci: * Consolidate WP GPIO code * Add ADMA3 DMA support for V4 enabled host * Fixup card detect support in pci-o2micro driver * Add support for CMDQ and SDMMC pads auto-calibration in tegra driver * Add DCMD support and CMDQ support, support for i.MX6ULL variant, fixup HS400 timing issue and add HS400_ES support for i.MX8QXP to esdhc-imx driver * Avoid CRC errors by adjusting settings to speed mode and fixup card initialization for high speed mode in renesas_sdhi * Fixup timeout settings for omap * Enable 8 bits bus-width support in atmel-mci * Convert some legacy code in jz4740 driver to use modern APIs * Send a CMD12 to clear DPSM at errors for STM32 sdmmc mmci driver" * tag 'mmc-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (69 commits) mmc:fix a bug when max_discard is 0 mmc: core: Add a debug print when the card may have been replaced mmc: core: Add sd discard timeout mmc: core: Add discard support to sd mmc: sdhci-esdhc-imx: clear the HALT bit when enable CQE mmc: core: do not retry CMD6 in __mmc_switch() mmc: core: Convert mmc_align_data_size() into an SDIO specific function mmc: core: Move mmc_of_parse_voltage() to host.c mmc: core: Convert mmc_regulator_get_ocrmask() to static mmc: core: Move regulator helpers to separate file mmc: of_mmc_spi: Convert to mmc_of_parse_voltage() mmc: core: Drop retries as in-parameter to mmc_wait_for_app_cmd() mmc: core: Convert mmc_wait_for_app_cmd() to static mmc: renesas_sdhi: Change HW adjustment register according to speed mode mmc: mmci: Send a CMD12 to clear the DPSM at errors mmc: sdhci-xenon: Fixup already marked switch fall-through mmc: sdhci-tegra: drop ->get_ro() implementation mmc: sdhci-omap: 
drop ->get_ro() implementation mmc: sdhci: use WP GPIO in sdhci_check_ro() mmc: wmt-sdmmc: Drop unused include ...
2019-03-04Merge tag 'mtd/for-5.1' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD updates from Boris Brezillon: "Core MTD changes: - Use struct_size() where appropriate - mtd_{read,write}() as wrappers around mtd_{read,write}_oob() - Fix misuse of PTR_ERR() in docg3 - Coding style improvements in mtdcore.c SPI NOR changes: Core changes: - Add support of octal mode I/O transfer - Add a bunch of SPI NOR entries to the flash_info table SPI NOR controller driver changes: - cadence-quadspi: * Add support for Octal SPI controller * write upto 8-bytes data in STIG mode - mtk-quadspi: * rename config to a common one * add SNOR_HWCAPS_READ to spi_nor_hwcaps mask - Add Tudor as SPI-NOR co-maintainer NAND changes: NAND core changes: - Fourth batch of fixes/cleanup to the raw NAND core impacting various controller drivers (Sunxi, Marvell, MTK, TMIO, OMAP2). - Check the return code of nand_reset() and nand_readid_op(). - Remove ->legacy.erase and single_erase(). - Simplify the locking. - Several implicit fall through annotations. Raw NAND controllers drivers changes: - Fix various possible object reference leaks (MTK, JZ4780, Atmel) - ST: * Add support for STM32 FMC2 NAND flash controller - Meson: * Add support for Amlogic NAND flash controller - Denali: * Several cleanup patches - Sunxi: * Several cleanup patches - FSMC: * Disable NAND on remove() * Reset NAND timings on resume() SPI-NAND drivers changes: - Toshiba: * Add support for all Toshiba products. - Macronix: * Fix ECC status read. 
- Gigadevice: * Add support for GD5F1GQ4UExxG" * tag 'mtd/for-5.1' of git://git.infradead.org/linux-mtd: (64 commits) mtd: spi-nor: Fix wrong abbreviation HWCPAS mtd: spi-nor: cadence-quadspi: fix spelling mistake: "Couldnt't" -> "Couldn't" mtd: spi-nor: Add support for en25qh64 mtd: spi-nor: Add support for MX25V8035F mtd: spi-nor: Add support for EN25Q80A mtd: spi-nor: cadence-quadspi: Add support for Octal SPI controller dt-bindings: cadence-quadspi: Add new compatible for AM654 SoC mtd: spi-nor: split s25fl128s into s25fl128s0 and s25fl128s1 mtd: spi-nor: cadence-quadspi: write upto 8-bytes data in STIG mode mtd: spi-nor: Add support for mx25u3235f mtd: rawnand: denali_dt: remove single anonymous clock support mtd: rawnand: mtk: fix possible object reference leak mtd: rawnand: jz4780: fix possible object reference leak mtd: rawnand: atmel: fix possible object reference leak mtd: rawnand: fsmc: Disable NAND on remove() mtd: rawnand: fsmc: Reset NAND timings on resume() mtd: spinand: Add support for GigaDevice GD5F1GQ4UExxG mtd: rawnand: denali: remove unused dma_addr field from denali_nand_info mtd: rawnand: denali: remove unused function argument 'raw' mtd: rawnand: denali: remove unneeded denali_reset_irq() call ...
2019-03-04Merge tag 'vfio-v5.1-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Switch mdev to generic UUID API (Andy Shevchenko) - Fixup platform reset include paths (Masahiro Yamada) - Fix usage of MINORMASK (Chengguang Xu) - Remove noise from duplicate spapr table unsets (Alexey Kardashevskiy) - Restore device state after PM reset (Alex Williamson) - Ensure memory translation enabled for PCI ROM access (Eric Auger) * tag 'vfio-v5.1-rc1' of git://github.com/awilliam/linux-vfio: vfio_pci: Enable memory accesses before calling pci_map_rom vfio/pci: Restore device state on PM transition vfio/spapr_tce: Skip unsetting already unset table samples/vfio-mdev/mtty: expand minor range when registering chrdev region samples/vfio-mdev/mdpy: expand minor range when registering chrdev region samples/vfio-mdev/mbochs: expand minor range when registering chrdev region vfio: expand minor range when registering chrdev region vfio: platform: reset: fix up include directives to remove ccflags-y vfio-mdev: Switch to use new generic UUID API
2019-03-05Merge tag 'drm-misc-fixes-2019-02-22' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-fixes for v5.0: - Block fb changes for async atomic updates to prevent a use after free. - Fix ID mismatch error on load in bochs. - Fix memory leak when drm_setup fails. - Fixes around handling of DRM_AUTH. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/42113611-e2cd-6bdd-7de5-4f8ab5a0cbe6@linux.intel.com
2019-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2019-03-04devlink: Add support for direct reporter health state updateEran Ben Elisha
It is possible that a reporter state will be updated due to a recover flow which is not triggered by a devlink health related operation, but as a side effect of some other operation in the system. Expose devlink health API for a direct update of a reporter status. Move devlink_health_reporter_state enum definition to devlink.h so it could be used from drivers as a parameter of devlink_health_reporter_state_update. In addition, add trace_devlink_health_reporter_state_update to provide user notification for reporter state change. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04get rid of legacy 'get_ds()' functionLinus Torvalds
Every in-kernel use of this function defined it to KERNEL_DS (either as an actual define, or as an inline function). It's an entirely historical artifact, and long long long ago used to actually read the segment selector value of '%ds' on x86. Which in the kernel is always KERNEL_DS. Inspired by a patch from Jann Horn that just did this for a very small subset of users (the ones in fs/), along with Al who suggested a script. I then just took it to the logical extreme and removed all the remaining gunk. Roughly scripted with
    git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/'
    git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d'
plus manual fixups to remove a few unusual usage patterns, the couple of inline function cases and to fix up a comment that had become stale. The 'get_ds()' function remains in an x86 kvm selftest, since in user space it actually does something relevant. Inspired-by: Jann Horn <jannh@google.com> Inspired-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-04aio: simplify - and fix - fget/fput for io_submit()Linus Torvalds
Al Viro root-caused a race where the IOCB_CMD_POLL handling of fget/fput() could cause us to access the file pointer after it had already been freed: "In more details - normally IOCB_CMD_POLL handling looks so: 1) io_submit(2) allocates aio_kiocb instance and passes it to aio_poll() 2) aio_poll() resolves the descriptor to struct file by req->file = fget(iocb->aio_fildes) 3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that aio_kiocb to 2 (bumps by 1, that is). 4) aio_poll() calls vfs_poll(). After sanity checks (basically, "poll_wait() had been called and only once") it locks the queue. That's what the extra reference to iocb had been for - we know we can safely access it. 5) With queue locked, we check if ->woken has already been set to true (by aio_poll_wake()) and, if it had been, we unlock the queue, drop a reference to aio_kiocb and bugger off - at that point it's a responsibility to aio_poll_wake() and the stuff called/scheduled by it. That code will drop the reference to file in req->file, along with the other reference to our aio_kiocb. 6) otherwise, we see whether we need to wait. If we do, we unlock the queue, drop one reference to aio_kiocb and go away - eventual wakeup (or cancel) will deal with the reference to file and with the other reference to aio_kiocb 7) otherwise we remove ourselves from waitqueue (still under the queue lock), so that wakeup won't get us. No async activity will be happening, so we can safely drop req->file and iocb ourselves. If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb won't get freed under us, so we can do all the checks and locking safely. And we don't touch ->file if we detect that case. However, vfs_poll() most certainly *does* touch the file it had been given. So wakeup coming while we are still in ->poll() might end up doing fput() on that file. 
That case is not too rare, and usually we are saved by the still present reference from descriptor table - that fput() is not the final one. But if another thread closes that descriptor right after our fget() and wakeup does happen before ->poll() returns, we are in trouble - final fput() done while we are in the middle of a method: Al also wrote a patch to take an extra reference to the file descriptor to fix this, but I instead suggested we just streamline the whole file pointer handling by submit_io() so that the generic aio submission code simply keeps the file pointer around until the aio has completed. Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL") Acked-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
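The lifetime rule described in this commit message (the submission code keeps the file pointer until the aio has completed) can be modelled in userspace. This is a minimal sketch only, not the kernel code: `file_model`, `req_model`, `file_get`, `file_put`, `req_init` and `req_put` are hypothetical stand-ins for `struct file`, `struct aio_kiocb`, `fget()`/`fput()` and the iocb reference handling, and the counters are plain integers rather than atomics.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct file. */
struct file_model {
    int refcount;
    bool freed;   /* set when the last reference is dropped */
};

/* Hypothetical stand-in for struct aio_kiocb. */
struct req_model {
    struct file_model *file;
    int refcount;
};

static struct file_model *file_get(struct file_model *f)
{
    f->refcount++;
    return f;
}

static void file_put(struct file_model *f)
{
    if (--f->refcount == 0)
        f->freed = true;
}

/*
 * Submission takes one file reference that belongs to the request.
 * The request starts with two references: one for the submitter and
 * one for the asynchronous wakeup path.
 */
static void req_init(struct req_model *r, struct file_model *f)
{
    r->file = file_get(f);
    r->refcount = 2;
}

/*
 * The file reference is released only when the last request
 * reference goes away, so neither path can free the file while the
 * other may still be using it.
 */
static void req_put(struct req_model *r)
{
    if (--r->refcount == 0) {
        file_put(r->file);
        r->file = NULL;
    }
}
```

In this model an early wakeup followed by a concurrent close of the descriptor cannot free the file while the submitter still holds its request reference, which is the property the commit message is after.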
2019-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-03-04 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add AF_XDP support to libbpf. Rationale is to facilitate writing AF_XDP applications by offering higher-level APIs that hide many of the details of the AF_XDP uapi. Sample programs are converted over to this new interface as well, from Magnus. 2) Introduce a new cant_sleep() macro for annotation of functions that cannot sleep and use it in BPF_PROG_RUN() to assert that BPF programs run under preemption disabled context, from Peter. 3) Introduce per BPF prog stats in order to monitor the usage of BPF; this is controlled by kernel.bpf_stats_enabled sysctl knob where monitoring tools can make use of this to efficiently determine the average cost of programs, from Alexei. 4) Split up BPF selftest's test_progs similarly as we already did with test_verifier. This allows to further reduce merge conflicts in future and to get more structure into our quickly growing BPF selftest suite, from Stanislav. 5) Fix a bug in BTF's dedup algorithm which can cause an infinite loop in some circumstances; also various BPF doc fixes and improvements, from Andrii. 6) Various BPF sample cleanups and migration to libbpf in order to further isolate the old sample loader code (so we can get rid of it at some point), from Jakub. 7) Add a new BPF helper for BPF cgroup skb progs that allows to set ECN CE code point and a Host Bandwidth Manager (HBM) sample program for limiting the bandwidth used by v2 cgroups, from Lawrence. 8) Enable write access to skb->queue_mapping from tc BPF egress programs in order to let BPF pick TX queue, from Jesper. 9) Fix a bug in BPF spinlock handling for map-in-map which did not propagate spin_lock_off to the meta map, from Yonghong. 10) Fix a bug in the new per-CPU BPF prog counters to properly initialize stats for each CPU, from Eric. 
11) Add various BPF helper prototypes to selftest's bpf_helpers.h, from Willem. 12) Fix various BPF samples bugs in XDP and tracing progs, from Toke, Daniel and Yonghong. 13) Silence preemption splat in test_bpf after BPF_PROG_RUN() enforces it now everywhere, from Anders. 14) Fix a signedness bug in libbpf's btf_dedup_ref_type() to get error handling working, from Dan. 15) Fix bpftool documentation and auto-completion with regards to stream_{verdict,parser} attach types, from Alban. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04Merge branch 'spi-5.1' into spi-nextMark Brown
2019-03-04Merge branch 'regulator-5.1' into regulator-nextMark Brown
2019-03-04printk: Remove no longer used LOG_PREFIX.Tetsuo Handa
When commit 5becfb1df5ac8e49 ("kmsg: merge continuation records while printing") introduced LOG_PREFIX, we used KERN_DEFAULT etc. as a flag for setting LOG_PREFIX in order to tell whether to call cont_add() (i.e. whether to append the message to "struct cont"). But since commit 4bcc595ccd80decb ("printk: reinstate KERN_CONT for printing continuation lines") inverted the behavior (i.e. don't append the message to "struct cont" unless KERN_CONT is specified) and commit 5aa068ea4082b39e ("printk: remove games with previous record flags") removed the last LOG_PREFIX check, setting LOG_PREFIX via KERN_DEFAULT etc. is no longer meaningful. Therefore, we can remove LOG_PREFIX and make KERN_DEFAULT an empty string. Link: http://lkml.kernel.org/r/1550829580-9189-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp To: Steven Rostedt <rostedt@goodmis.org> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-03-04Merge branch 'pm-cpufreq'Rafael J. Wysocki
* pm-cpufreq: (48 commits) cpufreq: kryo: Release OPP tables on module removal cpufreq: ap806: add missing of_node_put after of_device_is_available cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies cpufreq: Pass updated policy to driver ->setpolicy() callback cpufreq: Fix two debug messages in cpufreq_set_policy() cpufreq: Reorder and simplify cpufreq_update_policy() cpufreq: Add kerneldoc comments for two core functions cpufreq: intel_pstate: Rework iowait boosting to be less aggressive cpufreq: intel_pstate: Eliminate intel_pstate_get_base_pstate() cpufreq: intel_pstate: Avoid redundant initialization of local vars cpufreq / cppc: Work around for Hisilicon CPPC cpufreq ACPI / CPPC: Add a helper to get desired performance cpufreq: davinci: move configuration to include/linux/platform_data cpufreq: speedstep: convert BUG() to BUG_ON() cpufreq: powernv: fix missing check of return value in init_powernv_pstates() cpufreq: longhaul: remove unneeded semicolon cpufreq: pcc-cpufreq: remove unneeded semicolon cpufreq: Replace double NOT (!!) with single NOT (!) cpufreq: intel_pstate: Add reasons for failure and debug messages cpufreq: dt: Implement online/offline() callbacks ...
2019-03-04Merge branches 'pm-cpuidle' and 'powercap'Rafael J. Wysocki
* pm-cpuidle: ACPI / processor: Set P_LVL{2,3} idle state descriptions intel_idle: add support for Jacobsville cpuidle: dt: bail out if the idle-state DT node is not compatible cpuidle: use BIT() for idle state flags and remove CPUIDLE_DRIVER_FLAGS_MASK Documentation: driver-api: PM: Add cpuidle document cpuidle: New timer events oriented governor for tickless systems * powercap: powercap/intel_rapl: add Ice Lake mobile powercap: intel_rapl: add support for Jacobsville
2019-03-04Merge branches 'pm-core', 'pm-sleep', 'pm-qos', 'pm-domains' and 'pm-em'Rafael J. Wysocki
* pm-core: PM / core: Add support to skip power management in device/driver model PM / suspend: Print debug messages for device using direct-complete PM-runtime: update time accounting only when enabled PM-runtime: Switch accounting over to ktime_get_mono_fast_ns() PM-runtime: Optimize pm_runtime_autosuspend_expiration() PM-runtime: Replace jiffies-based accounting with ktime-based accounting PM-runtime: update accounting_timestamp on enable PM: clock_ops: fix missing clk_prepare() return value check drm/i915: Move on the new pm runtime interface PM-runtime: Add new interface to get accounted time * pm-sleep: PM / wakeup: fix kerneldoc comment for pm_wakeup_dev_event() * pm-qos: PM: QoS: no need to check return value of debugfs_create functions * pm-domains: PM / Domains: Mark "name" const in dev_pm_domain_attach_by_name() PM / Domains: Mark "name" const in genpd_dev_pm_attach_by_name() PM: domains: no need to check return value of debugfs_create functions * pm-em: PM / EM: Expose the Energy Model in debugfs
2019-03-04Merge branch 'acpi-apei'Rafael J. Wysocki
* acpi-apei: (29 commits) efi: cper: Fix possible out-of-bounds access ACPI: APEI: Fix possible out-of-bounds access to BERT region MAINTAINERS: Add James Morse to the list of APEI reviewers ACPI / APEI: Add support for the SDEI GHES Notification type firmware: arm_sdei: Add ACPI GHES registration helper ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications ACPI / APEI: Only use queued estatus entry during in_nmi_queue_one_entry() ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length ACPI / APEI: Make GHES estatus header validation more user friendly ACPI / APEI: Pass ghes and estatus separately to avoid a later copy ACPI / APEI: Let the notification helper specify the fixmap slot ACPI / APEI: Move locking to the notification helper arm64: KVM/mm: Move SEA handling behind a single 'claim' interface KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue ACPI / APEI: Move NOTIFY_SEA between the estatus-queue and NOTIFY_NMI ACPI / APEI: Don't allow ghes_ack_error() to mask earlier errors ACPI / APEI: Generalise the estatus queue's notify code ACPI / APEI: Don't update struct ghes' flags in read/clear estatus ACPI / APEI: Remove spurious GHES_TO_CLEAR check ...
2019-03-04Merge branches 'acpi-tables', 'acpi-debug', 'acpi-ec' and 'acpi-dptf'Rafael J. Wysocki
* acpi-tables: ACPI/PPTT: Add acpi_pptt_warn_missing() to consolidate logs ACPI / tables: table override from built-in initrd * acpi-debug: ACPI: debug: Clean up acpi_aml_init() ACPI: no need to check return value of debugfs_create functions * acpi-ec: Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk" ACPI: EC: Simplify boot EC checks in acpi_ec_add() ACPI: EC: Eliminate acpi_config_boot_ec() ACPI: EC: Make acpi_ec_dsdt_probe() more straightforward ACPI: EC: Make acpi_ec_ecdt_probe() more straightforward ACPI: EC: Declare boot_ec as static ACPI: EC: Clean up probing for early EC * acpi-dptf: ACPI / DPTF: remove header search path to the parent directory
2019-03-04Merge branch 'acpica'Rafael J. Wysocki
* acpica: ACPICA: Update version to 20190215 ACPI/ACPICA: Trivial: fix spelling mistakes and fix whitespace formatting ACPICA: ACPI 6.3: add GTDT Revision 3 support ACPICA: ACPI 6.3: HMAT updates ACPICA: ACPI 6.3: PPTT add additional fields in Processor Structure Flags ACPICA: ACPI 6.3: add Error Disconnect Recover Notification value ACPICA: ACPI 6.3: MADT: add support for statistical profiling in GICC ACPICA: ACPI 6.3: add PCC operation region support for AML interpreter ACPICA: ACPI 6.3: SRAT: add Generic Affinity Structure subtable ACPICA: ACPI 6.3: Add Trigger order to PCC Identifier structure in PDTT ACPICA: ACPI 6.3: Adding predefined methods _NBS, _NCH, _NIC, _NIH, and _NIG ACPICA: Update/clarify messages for control method failures ACPICA: Debugger: Fix possible fault with the "test objects" command ACPICA: Interpreter: Emit warning for creation of a zero-length op region ACPICA: Remove legacy module-level code support ACPICA: Get rid of acpi_sleep_dispatch() ACPICA: Update version to 20190108 ACPICA: All acpica: Update copyrights to 2019 ACPICA: acpiexec: Add option to dump extra info for memory leaks ACPICA: Convert more ACPI errors to firmware errors
2019-03-03Merge branch 'next' into for-linusDmitry Torokhov
Prepare input updates for 5.1 merge window.
2019-03-03tls: Fix write space handlingBoris Pismenny
TLS device cannot use the sw context. This patch returns the original tls device write space handler and moves the sw/device specific portions to the relevant files. Also, we remove the write_space call for the tls_sw flow, because it handles partial records in its delayed tx work handler. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03tls: Fix tls_device handling of partial recordsBoris Pismenny
Cleanup the handling of partial records while fixing a bug where the tls_push_pending_closed_record function is using the software tls context instead of the hardware context. The bug resulted in the following crash: [ 88.791229] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 88.793271] #PF error: [normal kernel read fault] [ 88.794449] PGD 800000022a426067 P4D 800000022a426067 PUD 22a156067 PMD 0 [ 88.795958] Oops: 0000 [#1] SMP PTI [ 88.796884] CPU: 2 PID: 4973 Comm: openssl Not tainted 5.0.0-rc4+ #3 [ 88.798314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 88.800067] RIP: 0010:tls_tx_records+0xef/0x1d0 [tls] [ 88.801256] Code: 00 02 48 89 43 08 e8 a0 0b 96 d9 48 89 df e8 48 dd 4d d9 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39 c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00 [ 88.805179] RSP: 0018:ffffbd888186fca8 EFLAGS: 00010213 [ 88.806458] RAX: ffff9af1ed657c98 RBX: ffff9af1e88a1980 RCX: 0000000000000000 [ 88.808050] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9af1e88a1980 [ 88.809724] RBP: ffff9af1e88a1980 R08: 0000000000000017 R09: ffff9af1ebeeb700 [ 88.811294] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 88.812917] R13: ffff9af1e88a1980 R14: ffff9af1ec13f800 R15: 0000000000000000 [ 88.814506] FS: 00007fcad2240740(0000) GS:ffff9af1f7880000(0000) knlGS:0000000000000000 [ 88.816337] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 88.817717] CR2: 0000000000000000 CR3: 0000000228b3e000 CR4: 00000000001406e0 [ 88.819328] Call Trace: [ 88.820123] tls_push_data+0x628/0x6a0 [tls] [ 88.821283] ? remove_wait_queue+0x20/0x60 [ 88.822383] ? 
n_tty_read+0x683/0x910 [ 88.823363] tls_device_sendmsg+0x53/0xa0 [tls] [ 88.824505] sock_sendmsg+0x36/0x50 [ 88.825492] sock_write_iter+0x87/0x100 [ 88.826521] __vfs_write+0x127/0x1b0 [ 88.827499] vfs_write+0xad/0x1b0 [ 88.828454] ksys_write+0x52/0xc0 [ 88.829378] do_syscall_64+0x5b/0x180 [ 88.830369] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 88.831603] RIP: 0033:0x7fcad1451680 [ 1248.470626] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 1248.472564] #PF error: [normal kernel read fault] [ 1248.473790] PGD 0 P4D 0 [ 1248.474642] Oops: 0000 [#1] SMP PTI [ 1248.475651] CPU: 3 PID: 7197 Comm: openssl Tainted: G OE 5.0.0-rc4+ #3 [ 1248.477426] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 1248.479310] RIP: 0010:tls_tx_records+0x110/0x1f0 [tls] [ 1248.480644] Code: 00 02 48 89 43 08 e8 4f cb 63 d7 48 89 df e8 f7 9c 1b d7 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39 c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00 [ 1248.484825] RSP: 0018:ffffaa0a41543c08 EFLAGS: 00010213 [ 1248.486154] RAX: ffff955a2755dc98 RBX: ffff955a36031980 RCX: 0000000000000006 [ 1248.487855] RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000286 [ 1248.489524] RBP: ffff955a36031980 R08: 0000000000000000 R09: 00000000000002b1 [ 1248.491394] R10: 0000000000000003 R11: 00000000ad55ad55 R12: 0000000000000000 [ 1248.493162] R13: 0000000000000000 R14: ffff955a2abe6c00 R15: 0000000000000000 [ 1248.494923] FS: 0000000000000000(0000) GS:ffff955a378c0000(0000) knlGS:0000000000000000 [ 1248.496847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1248.498357] CR2: 0000000000000000 CR3: 000000020c40e000 CR4: 00000000001406e0 [ 1248.500136] Call Trace: [ 1248.500998] ? 
tcp_check_oom+0xd0/0xd0 [ 1248.502106] tls_sk_proto_close+0x127/0x1e0 [tls] [ 1248.503411] inet_release+0x3c/0x60 [ 1248.504530] __sock_release+0x3d/0xb0 [ 1248.505611] sock_close+0x11/0x20 [ 1248.506612] __fput+0xb4/0x220 [ 1248.507559] task_work_run+0x88/0xa0 [ 1248.508617] do_exit+0x2cb/0xbc0 [ 1248.509597] ? core_sys_select+0x17a/0x280 [ 1248.510740] do_group_exit+0x39/0xb0 [ 1248.511789] get_signal+0x1d0/0x630 [ 1248.512823] do_signal+0x36/0x620 [ 1248.513822] exit_to_usermode_loop+0x5c/0xc6 [ 1248.515003] do_syscall_64+0x157/0x180 [ 1248.516094] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1248.517456] RIP: 0033:0x7fb398bd3f53 [ 1248.518537] Code: Bad RIP value. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03net: phy: remove gen10g_no_soft_resetHeiner Kallweit
genphy_no_soft_reset and gen10g_no_soft_reset are both the same no-ops, one is enough. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03net: phy: don't export gen10g_read_statusHeiner Kallweit
gen10g_read_status is deprecated, therefore stop exporting it. We don't want to encourage anybody to use it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03net: phy: remove gen10g_config_initHeiner Kallweit
ETHTOOL_LINK_MODE_10000baseT_Full_BIT is set anyway in the supported and advertising bitmap because it's part of PHY_10GBIT_FEATURES. And all users of gen10g_config_init use PHY_10GBIT_FEATURES. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03net: phy: remove gen10g_suspend and gen10g_resumeHeiner Kallweit
phy_suspend() and phy_resume() are no-ops anyway if no callback is defined. Therefore we don't need these stubs. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03net: ipv6: add socket option IPV6_ROUTER_ALERT_ISOLATEFrancesco Ruggeri
By default, an IPv6 socket with the IPV6_ROUTER_ALERT socket option set will receive all IPv6 RA packets from all namespaces. The IPV6_ROUTER_ALERT_ISOLATE socket option restricts the packets received by the socket to those from the socket's own namespace. Signed-off-by: Maxim Martynov <maxim@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03sch_cake: Permit use of connmarks as tin classifiersKevin Darbyshire-Bryant
Add flag 'FWMARK' to enable use of firewall connmarks as tin selector. The connmark (skbuff->mark) needs to be in the range 1->tin_cnt ie. for diffserv3 the mark needs to be 1->3. Background Typically CAKE uses DSCP as the basis for tin selection. DSCP values are relatively easily changed as part of the egress path, usually with iptables & the mangle table, ingress is more challenging. CAKE is often used on the WAN interface of a residential gateway where passthrough of DSCP from the ISP is either missing or set to unhelpful values thus use of ingress DSCP values for tin selection isn't helpful in that environment. An approach to solving the ingress tin selection problem is to use CAKE's understanding of tc filters. Naive tc filters could match on source/destination port numbers and force tin selection that way, but multiple filters don't scale particularly well as each filter must be traversed whether it matches or not. e.g. a simple example to map 3 firewall marks to tins: MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' ) tc filter add dev $DEV parent $MAJOR protocol all handle 0x01 fw action skbedit priority ${MAJOR}1 tc filter add dev $DEV parent $MAJOR protocol all handle 0x02 fw action skbedit priority ${MAJOR}2 tc filter add dev $DEV parent $MAJOR protocol all handle 0x03 fw action skbedit priority ${MAJOR}3 Another option is to use eBPF cls_act with tc filters e.g. MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' ) tc filter add dev $DEV parent $MAJOR bpf da obj my-bpf-fwmark-to-class.o This has the disadvantages of a) needing someone to write & maintain the bpf program, b) a bpf toolchain to compile it and c) needing to hardcode the major number in the bpf program so it matches the cake instance (or forcing the cake instance to a particular major number) since the major number cannot be passed to the bpf program via tc command line. 
As already hinted at by the previous examples, it would be helpful to associate tins with something that survives the Internet path and ideally allows tin selection on both egress and ingress. Netfilter's conntrack permits setting an identifying mark on a connection which can also be restored to an ingress packet with tc action connmark e.g.

    tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \
        match u32 0 0 flowid 1:1 action connmark action mirred egress redirect dev ifb1

Since tc's connmark action has restored any connmark into skb->mark, any of the previous solutions are based upon it and in one form or another copy that mark to the skb->priority field where again CAKE picks this up.

This change cuts out at least one of the (less intuitive & non-scalable) middlemen and permits direct access to skb->mark.

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
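The range rule stated in the commit message (a mark of 1..tin_cnt selects a tin) can be sketched in userspace C as follows. This is a simplified model and not the qdisc's actual code: the function name `tin_from_fwmark` and the -1 "fall back to another classifier" convention are assumptions, and the real qdisc may apply further remapping of the tin index.

```c
#include <stdint.h>

/* Simplified model of mark-based tin selection.
 * Returns a 0-based tin index when the mark lies in 1..tin_cnt,
 * or -1 to signal that the caller should fall back to another
 * classifier (for example DSCP). */
static int tin_from_fwmark(uint32_t mark, uint32_t tin_cnt)
{
    if (mark >= 1 && mark <= tin_cnt)
        return (int)(mark - 1);
    return -1;
}
```

With diffserv3 (three tins), marks 1, 2 and 3 would map to indices 0, 1 and 2 in this model, while a mark of 0 or 4 would leave tin selection to the fallback path.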
2019-03-04Merge v5.0 into drm-nextDave Airlie
There is a really hairy resolution involving amdgpu fixes, that I'd rather confirm here. Also some misc fixes are landed by me, but the pr has them as well. Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-03-03regulator: core: Add set/get_current_limit helpers for regmap usersAxel Lin
By setting curr_table, n_current_limits, csel_reg and csel_mask, the regmap users can use regulator_set_current_limit_regmap and regulator_get_current_limit_regmap for set/get_current_limit callbacks. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@kernel.org>
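To illustrate what a table-plus-register-field description makes possible, here is a userspace sketch of the two steps a generic helper would plausibly perform: choosing a table entry inside the requested range and packing its index into the masked register field. The function names and the "smallest matching entry wins" policy are assumptions for illustration, not the regulator core's confirmed behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the smallest table entry within [min_uA, max_uA],
 * or -1 if no entry fits. The selection policy is an assumption. */
static int pick_current_sel(const unsigned int *curr_table, size_t n,
                            unsigned int min_uA, unsigned int max_uA)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        unsigned int v = curr_table[i];

        if (v < min_uA || v > max_uA)
            continue;
        if (best < 0 || v < curr_table[best])
            best = (int)i;
    }
    return best;
}

/* Number of trailing zero bits in a non-zero mask. */
static unsigned int mask_shift(uint32_t mask)
{
    unsigned int shift = 0;

    while (!((mask >> shift) & 1u))
        shift++;
    return shift;
}

/* Write selector 'sel' into the field described by csel_mask,
 * leaving all other bits of 'reg' untouched. */
static uint32_t encode_csel(uint32_t reg, uint32_t csel_mask, unsigned int sel)
{
    if (csel_mask == 0)
        return reg;
    return (reg & ~csel_mask) |
           (((uint32_t)sel << mask_shift(csel_mask)) & csel_mask);
}

/* Read the selector back out of the register value. */
static unsigned int decode_csel(uint32_t reg, uint32_t csel_mask)
{
    if (csel_mask == 0)
        return 0;
    return (reg & csel_mask) >> mask_shift(csel_mask);
}
```

A driver using the real helpers would only fill in the descriptor fields named in the commit message; the register read-modify-write itself goes through regmap rather than a plain integer as modelled here.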
2019-03-03regulator: Fix comment for csel_reg and csel_maskAxel Lin
The csel_reg and csel_mask fields in struct regulator_desc need to be generic for drivers, not just for TPS65218. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-03-03net: dsa: add KSZ9893 switch tagging supportTristram Ha
KSZ9893 switch is similar to KSZ9477 switch except the ingress tail tag has 1 byte instead of 2 bytes. The size of the portmap is smaller and so the override and lookup bits are also moved. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
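The size difference between the two tail tags can be sketched as below. Only the 2-byte versus 1-byte distinction comes from the commit message; the one-hot port bit layout, the big-endian byte order of the 2-byte tag, and the function names are assumptions for illustration, and the override and lookup bits are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Append a 2-byte tail tag (KSZ9477-style in this model) selecting one port.
 * The caller must guarantee room for two more bytes. Returns the new length. */
static size_t append_tail_tag_2byte(uint8_t *buf, size_t len, unsigned int port)
{
    uint16_t tag = (uint16_t)(1u << port);

    buf[len] = (uint8_t)(tag >> 8);       /* high byte first (assumed order) */
    buf[len + 1] = (uint8_t)(tag & 0xff);
    return len + 2;
}

/* Append a 1-byte tail tag (KSZ9893-style in this model) selecting one port.
 * The caller must guarantee room for one more byte. Returns the new length. */
static size_t append_tail_tag_1byte(uint8_t *buf, size_t len, unsigned int port)
{
    buf[len] = (uint8_t)(1u << port);
    return len + 1;
}
```

Because the portmap shares a single byte with the control bits in the shorter tag, fewer bits are available for ports, which is consistent with the commit's note that the override and lookup bits move.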
2019-03-03appletalk: Fix use-after-free in atalk_proc_exitYueHaibing
KASAN report this: BUG: KASAN: use-after-free in pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71 Read of size 8 at addr ffff8881f41fe5b0 by task syz-executor.0/2806 CPU: 0 PID: 2806 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xfa/0x1ce lib/dump_stack.c:113 print_address_description+0x65/0x270 mm/kasan/report.c:187 kasan_report+0x149/0x18d mm/kasan/report.c:317 pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71 remove_proc_entry+0xe8/0x420 fs/proc/generic.c:667 atalk_proc_exit+0x18/0x820 [appletalk] atalk_exit+0xf/0x5a [appletalk] __do_sys_delete_module kernel/module.c:1018 [inline] __se_sys_delete_module kernel/module.c:961 [inline] __x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fb2de6b9c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200001c0 RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb2de6ba6bc R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff Allocated by task 2806: set_track mm/kasan/common.c:85 [inline] __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496 slab_post_alloc_hook mm/slab.h:444 [inline] slab_alloc_node mm/slub.c:2739 [inline] slab_alloc mm/slub.c:2747 [inline] kmem_cache_alloc+0xcf/0x250 mm/slub.c:2752 kmem_cache_zalloc include/linux/slab.h:730 [inline] __proc_create+0x30f/0xa20 fs/proc/generic.c:408 proc_mkdir_data+0x47/0x190 
fs/proc/generic.c:469 0xffffffffc10c01bb 0xffffffffc10c0166 do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 2806: set_track mm/kasan/common.c:85 [inline] __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458 slab_free_hook mm/slub.c:1409 [inline] slab_free_freelist_hook mm/slub.c:1436 [inline] slab_free mm/slub.c:2986 [inline] kmem_cache_free+0xa6/0x2a0 mm/slub.c:3002 pde_put+0x6e/0x80 fs/proc/generic.c:647 remove_proc_entry+0x1d3/0x420 fs/proc/generic.c:684 0xffffffffc10c031c 0xffffffffc10c0166 do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8881f41fe500 which belongs to the cache proc_dir_entry of size 256 The buggy address is located 176 bytes inside of 256-byte region [ffff8881f41fe500, ffff8881f41fe600) The buggy address belongs to the page: page:ffffea0007d07f80 count:1 mapcount:0 mapping:ffff8881f6e69a00 index:0x0 flags: 0x2fffc0000000200(slab) raw: 02fffc0000000200 dead000000000100 dead000000000200 ffff8881f6e69a00 raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881f41fe480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8881f41fe500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8881f41fe580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881f41fe600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff8881f41fe680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb It should check the return 
value of atalk_proc_init; otherwise atalk_exit will trigger a use-after-free in pde_subdir_find while unloading the module. This patch fixes the error cleanup path of atalk_init. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
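The general shape of such a fix, checking each init step and unwinding the earlier ones on failure, can be sketched in userspace C. All names here (`demo_init`, `demo_proc_init`, `fail_proc_init` and so on) are hypothetical stand-ins, and the step order does not claim to match the appletalk module.

```c
#include <stdbool.h>

/* Hypothetical state standing in for registered kernel resources. */
static bool sysctl_registered;
static bool proc_registered;
static bool fail_proc_init; /* knob to simulate a failing init step */

static int demo_sysctl_register(void)
{
    sysctl_registered = true;
    return 0;
}

static void demo_sysctl_unregister(void)
{
    sysctl_registered = false;
}

static int demo_proc_init(void)
{
    if (fail_proc_init)
        return -1;
    proc_registered = true;
    return 0;
}

/* Init with goto-based unwinding: a failure at any step releases what the
 * earlier steps acquired, so a later exit path never touches freed state. */
static int demo_init(void)
{
    int rc;

    rc = demo_sysctl_register();
    if (rc)
        goto out;

    rc = demo_proc_init();
    if (rc)
        goto out_sysctl;

    return 0;

out_sysctl:
    demo_sysctl_unregister();
out:
    return rc;
}
```

When init returns an error in this model, nothing remains registered, so there is no stale entry left for a later teardown to look up.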
2019-03-02net: sched: put back q.qlen into a single locationEric Dumazet
In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'") John made the assumption that the data path had no need to read the qdisc qlen (number of packets in the qdisc).

It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO children.

But pfifo_fast can be used as leaf in classful qdiscs, and existing logic needs to access the child qlen in an efficient way.

HTB breaks badly, since it uses cl->leaf.q->q.qlen in:
  htb_activate() -> WARN_ON()
  htb_dequeue_tree() to decide if a class can be htb_deactivated when it has no more packets.

HFSC, DRR, CBQ, QFQ have similar issues, and some calls to qdisc_tree_reduce_backlog() also read q.qlen directly.

Using qdisc_qlen_sum() (which iterates over all possible cpus) in the data path is a non-starter.

It seems we have to put back qlen in a central location, at least for stable kernels.

For all qdiscs but pfifo_fast, qlen is guarded by the qdisc lock, so the existing q.qlen{++|--} are correct.

For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}() because the spinlock might not be held (for example from pfifo_fast_enqueue() and pfifo_fast_dequeue()).

This patch adds atomic_qlen (in the same location as qlen) and renames the following helpers, since we want to express they can be used without qdisc lock, and that qlen is no longer percpu.

- qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
- qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()

Later (net-next) we might revert this patch by tracking all these qlen uses and replacing them with a more efficient method (not having to access a precise qlen, but an empty/non_empty status that might be less expensive to maintain/track).

Another possibility is to have a legacy pfifo_fast version that would be used when used as a child qdisc, since the parent qdisc needs a spinlock anyway. But then, future lockless qdiscs would also have the same problem.
Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
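The idea of a single, lock-free queue-length counter can be sketched with C11 atomics in userspace. The struct and helper names (`demo_qdisc`, `demo_qlen_inc`, and so on) are hypothetical and only mirror the intent of the renamed kernel helpers; the kernel uses its own atomic_t API rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>

/* Hypothetical queue descriptor with one central length counter. */
struct demo_qdisc {
    atomic_int atomic_qlen; /* packets queued; updated without a lock */
};

/* Called on enqueue; safe even when no spinlock is held. */
static void demo_qlen_inc(struct demo_qdisc *q)
{
    atomic_fetch_add_explicit(&q->atomic_qlen, 1, memory_order_relaxed);
}

/* Called on dequeue; safe even when no spinlock is held. */
static void demo_qlen_dec(struct demo_qdisc *q)
{
    atomic_fetch_sub_explicit(&q->atomic_qlen, 1, memory_order_relaxed);
}

/* A parent can read the child's length in O(1), with no per-CPU summation. */
static int demo_qlen(struct demo_qdisc *q)
{
    return atomic_load_explicit(&q->atomic_qlen, memory_order_relaxed);
}
```

The trade-off matches the commit's reasoning: every enqueue and dequeue pays for an atomic operation on a shared location, but readers such as a parent qdisc no longer need to iterate over all possible CPUs.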
2019-03-02Merge tag 'mlx5-updates-2019-03-01' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-03-01

This series adds multipath offload support and contains some small updates to mlx5 driver.

Multipath offload support from Roi Dayan:
We are going to track SW multipath route and related nexthops and reflect that as port affinity to the HW.

1) Some patches are preparation.
2) add the multipath mode and fib events handling.
3) add support to handle offload failure for net error, i.e. port down.
4) Small updates to match the behavior of multipath

Two small updates from Eran Ben Elisha,
5) Make a function static
6) Update PCIe supported devices list.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Add .release_ops to properly unroll .select_ops, use it from nft_compat. After this change, we can remove list of extensions too to simplify this codebase.
2) Update amanda conntrack helper to support v3.4, from Florian Tham.
3) Get rid of the obsolete BUGPRINT macro in ebtables, from Florian Westphal.
4) Merge IPv4 and IPv6 masquerading infrastructure into one single module. From Florian Westphal.
5) Patchset to remove nf_nat_l3proto structure to get rid of indirections, from Florian Westphal.
6) Skip unnecessary conntrack timeout updates in case the value is still the same, also from Florian Westphal.
7) Remove unnecessary 'fall through' comments in empty switch cases, from Li RongQing.
8) Fix lookup to fixed size hashtable sets on big endian with 32-bit keys.
9) Incorrect logic to deactivate path of fixed size hashtable sets, element was being tested to self.
10) Remove nft_hash_key(), the bitmap set is always selected for 16-bit keys.
11) Use boolean whenever possible in IPVS codebase, from Andrea Claudi.
12) Enter close state in conntrack if RST matches exact sequence number, from Florian Westphal.
13) Initialize dst_cache in tunnel extension, from wenxu.
14) Pass protocol as u16 to xt_check_match and xt_check_target, from Li RongQing.
15) SCTP header is granted to be in a linear area from IPVS NAT handler, from Xin Long.
16) Don't steal packets coming from slave VRF device from the ip_sabotage_in() path, from David Ahern.
17) Fix unsafe update of basechain stats, from Li RongQing.
18) Make sure CONNTRACK_LOCKS is power of 2 to let compiler optimize modulo operation as bitwise AND, from Li RongQing.
19) Use device_attribute instead of internal definition in the IDLETIMER target, from Sami Tolvanen.
20) Merge redir, masq and IPv4/IPv6 NAT chain types, from Florian Westphal.
==================== Signed-off-by: David S. Miller <davem@davemloft.net>
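Item 18 of the pull request above relies on a standard identity: for a power-of-two N, `hash % N` equals `hash & (N - 1)`. A small sketch follows, where `DEMO_LOCKS` and `lock_index` are hypothetical names and the value 1024 is chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical lock-array size; must be a power of two for the mask trick. */
#define DEMO_LOCKS 1024u

/* Compile-time guard: a power of two has exactly one bit set. */
_Static_assert(DEMO_LOCKS != 0 && (DEMO_LOCKS & (DEMO_LOCKS - 1)) == 0,
               "DEMO_LOCKS must be a power of two");

/* Map a hash to a lock slot; equivalent to hash % DEMO_LOCKS. */
static uint32_t lock_index(uint32_t hash)
{
    return hash & (DEMO_LOCKS - 1);
}
```

Enforcing the property at build time means the cheaper bitwise form stays valid even if someone later changes the constant.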
2019-03-02Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2019-03-02

Here's one more bluetooth-next pull request for the 5.1 kernel:

 - Added support for MediaTek MT7663U and MT7668U UART devices
 - Cleanups & fixes to the hci_qca driver
 - Fixed wakeup pin behavior for QCA6174A controller

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>