summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-02-02Coccinelle: coccicheck: fix typoJulia Lawall
Correct spelling of "coccinelle". Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-02-01rtnetlink: remove check for IFLA_IF_NETNSIDChristian Brauner
RTM_NEWLINK supports the IFLA_IF_NETNSID property since 5bb8ed075428b71492734af66230aa0c07fcc515 so we should not error out when it is passed. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01rocker: fix possible null pointer dereference in rocker_router_fib_event_workJiri Pirko
Currently, rocker user may experience following null pointer derefence bug: [ 3.062141] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0 [ 3.065163] IP: rocker_router_fib_event_work+0x36/0x110 [rocker] The problem is uninitialized rocker->wops pointer that is initialized only with the first initialized port. So move the port initialization before registering the fib events. Fixes: 936bd486564a ("rocker: use FIB notifications instead of switchdev calls") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01inet: Avoid unitialized variable warning in inet_unhash()Geert Uytterhoeven
With gcc-4.1.2: net/ipv4/inet_hashtables.c: In function ‘inet_unhash’: net/ipv4/inet_hashtables.c:628: warning: ‘ilb’ may be used uninitialized in this function While this is a false positive, it can easily be avoided by using the pointer itself as the canary variable. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01net: bridge: Fix uninitialized error in br_fdb_sync_static()Geert Uytterhoeven
With gcc-4.1.2.: net/bridge/br_fdb.c: In function ‘br_fdb_sync_static’: net/bridge/br_fdb.c:996: warning: ‘err’ may be used uninitialized in this function Indeed, if the list is empty, err will be uninitialized, and will be propagated up as the function return value. Fix this by preinitializing err to zero. Fixes: eb7935830d00b9e0 ("net: bridge: use rhashtable for fdbs") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01openvswitch: Remove padding from packet before L3+ conntrack processingEd Swierk
IPv4 and IPv6 packets may arrive with lower-layer padding that is not included in the L3 length. For example, a short IPv4 packet may have up to 6 bytes of padding following the IP payload when received on an Ethernet device with a minimum packet length of 64 bytes. Higher-layer processing functions in netfilter (e.g. nf_ip_checksum(), and help() in nf_conntrack_ftp) assume skb->len reflects the length of the L3 header and payload, rather than referring back to ip_hdr->tot_len or ipv6_hdr->payload_len, and get confused by lower-layer padding. In the normal IPv4 receive path, ip_rcv() trims the packet to ip_hdr->tot_len before invoking netfilter hooks. In the IPv6 receive path, ip6_rcv() does the same using ipv6_hdr->payload_len. Similarly in the br_netfilter receive path, br_validate_ipv4() and br_validate_ipv6() trim the packet to the L3 length before invoking netfilter hooks. Currently in the OVS conntrack receive path, ovs_ct_execute() pulls the skb to the L3 header but does not trim it to the L3 length before calling nf_conntrack_in(NF_INET_PRE_ROUTING). When nf_conntrack_proto_tcp encounters a packet with lower-layer padding, nf_ip_checksum() fails causing a "nf_ct_tcp: bad TCP checksum" log message. While extra zero bytes don't affect the checksum, the length in the IP pseudoheader does. That length is based on skb->len, and without trimming, it doesn't match the length the sender used when computing the checksum. In ovs_ct_execute(), trim the skb to the L3 length before higher-layer processing. Signed-off-by: Ed Swierk <eswierk@skyportsystems.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01tcp_bbr: fix pacing_gain to always be unity when using lt_bwNeal Cardwell
This commit fixes the pacing_gain to remain at BBR_UNIT (1.0) when using lt_bw and returning from the PROBE_RTT state to PROBE_BW. Previously, when using lt_bw, upon exiting PROBE_RTT and entering PROBE_BW the bbr_reset_probe_bw_mode() code could sometimes randomly end up with a cycle_idx of 0 and hence have bbr_advance_cycle_phase() set a pacing gain above 1.0. In such cases this would result in a pacing rate that is 1.25x higher than intended, potentially resulting in a high loss rate for a little while until we stop using the lt_bw a bit later. This commit is a stable candidate for kernels back as far as 4.9. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reported-by: Beyers Cronje <bcronje@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01be2net: remove redundant initialization of 'head' and pointer txqColin Ian King
Variable head is initialized to a value that is never read and is being updated to a new value a few lines later, hence this initialization is redundant and can be safely removed as well as the now unused pointer txq. Cleans up clang warning: drivers/net/ethernet/emulex/benet/be_main.c:996:6: warning: Value stored to 'head' during its initialization is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01Merge branch 'bnx2x-disable-GSO-on-too-large-packets'David S. Miller
Daniel Axtens says: ==================== bnx2x: disable GSO on too-large packets We observed a case where a packet received on an ibmveth device had a GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x device, where it caused a firmware assert. This is described in detail at [0]. Ultimately we want a fix in the core, but that is very tricky to backport. So for now, just stop the bnx2x driver from crashing. When net-next re-opens I will send the fix to the core and a revert for this. v4 changes: - fix compilation error with EXPORTs (patch 1) - only do slow test if gso_size is greater than 9000 bytes (patch 2) Thanks, Daniel [0]: https://patchwork.ozlabs.org/patch/859410/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01bnx2x: disable GSO where gso_size is too big for hardwareDaniel Axtens
If a bnx2x card is passed a GSO packet with a gso_size larger than ~9700 bytes, it will cause a firmware error that will bring the card down: bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert! bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2 bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x25e43e47 0x00463e01 0x00010052 bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 7_13_1 ... (dump of values continues) ... Detect when the mac length of a GSO packet is greater than the maximum packet size (9700 bytes) and disable GSO. Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01net: create skb_gso_validate_mac_len()Daniel Axtens
If you take a GSO skb, and split it into packets, will the MAC length (L2 + L3 + L4 headers + payload) of those packets be small enough to fit within a given length? Move skb_gso_mac_seglen() to skbuff.h with other related functions like skb_gso_network_seglen() so we can use it, and then create skb_gso_validate_mac_len to do the full calculation. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01vhost: don't hold onto file pointer for VHOST_SET_LOG_FDEric Biggers
We already hold a reference to the eventfd_ctx, which is sufficient; there's no need to hold a reference to the struct file as well. So get rid of vhost_dev->log_file. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2018-02-01vhost: don't hold onto file pointer for VHOST_SET_VRING_ERREric Biggers
We already hold a reference to the eventfd_ctx, which is sufficient; there's no need to hold a reference to the struct file as well. So get rid of vhost_virtqueue->error. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2018-02-01vhost: don't hold onto file pointer for VHOST_SET_VRING_CALLEric Biggers
We already hold a reference to the eventfd_ctx, which is sufficient; there's no need to hold a reference to the struct file as well. So get rid of vhost_virtqueue->call. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2018-02-01ringtest: ring.c malloc & memset to callocPeter Malone
Code cleanup change - moving from malloc & memset to calloc. Signed-off-by: Peter Malone <peter.malone@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-01virtio_vop: don't kfree device on register failureweiping zhang
As mentioned at drivers/base/core.c: /* * NOTE: _Never_ directly free @dev after calling this function, even * if it returned an error! Always use put_device() to give up the * reference initialized in this function instead. */ so we don't free vdev until vdev->vdev.dev.release be called. Signed-off-by: weiping zhang <zhangweiping@didichuxing.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-01virtio_pci: don't kfree device on register failureweiping zhang
As mentioned at drivers/base/core.c: /* * NOTE: _Never_ directly free @dev after calling this function, even * if it returned an error! Always use put_device() to give up the * reference initialized in this function instead. */ so we don't free vp_dev until vp_dev->vdev.dev.release be called. Signed-off-by: weiping zhang <zhangweiping@didichuxing.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-01virtio: split device_register into device_initialize and device_addweiping zhang
In order to make caller do a simple cleanup, we split device_register into device_initialize and device_add. device_initialize always succeeds, so the caller can always use put_device when register_virtio_device faild. Signed-off-by: weiping zhang <zhangweiping@didichuxing.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2018-02-01vhost: remove unused lock check flag in vhost_dev_cleanup()夷则(Caspar)
In commit ea5d404655ba ("vhost: fix release path lockdep checks"), Michael added a flag to check whether we should hold a lock in vhost_dev_cleanup(), however, in commit 47283bef7ed3 ("vhost: move memory pointer to VQs"), RCU operations have been replaced by mutex, we can remove the no-longer-used `locked' parameter now. Signed-off-by: Caspar Zhang <jinli.zjl@alibaba-inc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-01vhost: Remove the unused variable.Tonghao Zhang
The patch (7235acdb1) changed the way of the work flushing in which the queued seq, done seq, and the flushing are not used anymore. Then remove them now. Fixes: 7235acdb1 ("vhost: simplify work flushing") Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-01virtio_blk: print capacity at probe timeStefan Hajnoczi
Print the capacity of the block device when the driver is probed. Many users expect this since SCSI disks (sd) do it. Moreover, kernel dmesg output is the primary source of troubleshooting information so it's helpful to include the disk size there. The capacity is already printed by virtio_blk when a resize event occurs. Extract the code and reuse it from virtblk_probe(). This patch also adds the block device name to the message so it can be correlated with a specific device: virtio_blk virtio0: [vda] 20971520 512-byte logical blocks (10.7 GB/10.0 GiB) Cc: Rodrigo A B Freire <rfreire@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-01virtio: make VIRTIO a menuconfig to ease disabling it allVincent Legoll
No need to get into the submenu to disable all VIRTIO-related config entries. This makes it easier to disable all VIRTIO config options without entering the submenu. It will also enable one to see that en/dis-abled state from the outside menu. This is only intended to change menuconfig UI, not change the config dependencies. Signed-off-by: Vincent Legoll <vincent.legoll@gmail.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-01platform/x86: intel-vbtn: Replace License by SDPX identifierAndy Shevchenko
Replace License short header by SPDX identifier. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-01platform/x86: intel-vbtn: Remove redundant inclusionsAndy Shevchenko
Some headers are not needed since the driver can be built as module. Remove them. While here, sort headers alphabetically. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-01platform/x86: intel-vbtn: Support tablet mode switchMarco Martin
On some laptop like the Dell Inspiron 7000 series tablet mode switch implemented in Intel ACPI, the events to enter and exit the tablet mode are 0xCC and 0xCD This initializes the tablet/laptop mode at the correct value if the system booted in tablet mode (or the intel-vbtn module loaded with the device in tablet mode) Cc: platform-driver-x86@vger.kernel.org Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: "Pali Rohár" <pali.rohar@gmail.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Mario Limonciello <mario_limonciello@dell.com> Cc: Andy Shevchenko <andy@infradead.org> Cc: Stefan Brüns<stefan.bruens@rwth-aachen.de> Signed-off-by: Marco Martin <notmart@gmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@dell.com> Acked-by: Pali Rohár <pali.rohar@gmail.com> [andy: fixed style of comments, indentation, and massaged commit message] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-01platform/x86: dell-laptop: Allocate buffer on heap rather than globallyMario Limonciello
There is no longer a need for the buffer to be defined in first 4GB physical address space. Furthermore there may be race conditions with multiple different functions working on a module wide buffer causing incorrect results. Fixes: 549b4930f057658dc50d8010e66219233119a4d8 Suggested-by: Pali Rohar <pali.rohar@gmail.com> Signed-off-by: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-01platform/x86: intel_pmc_core: Remove unused header fileRajneesh Bhardwaj
Recently sent patch 'platform/x86: intel_pmc_core: Remove unused EXPORTED API' missed to remove the header file 'arch/x86/include/asm/pmc_core.h' which was solely used to declare the EXPORTED API 'intel_pmc_slp_s0_counter_read'. This patch provides the errata fix for the same. Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-01iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw}Goffredo Baroncelli
The function inode_cmp_iversion{+raw} is counter-intuitive, because it returns true when the counters are different and false when these are equal. Rename it to inode_eq_iversion{+raw}, which will returns true when the counters are equal and false otherwise. Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-02-01tools/bpf: permit selftests/bpf to be built in a different directoryYonghong Song
Fix a couple of issues at tools/testing/selftests/bpf/Makefile so the following command make -C tools/testing/selftests/bpf OUTPUT=/home/yhs/tmp can put the built results into a different directory. Also add the built binary test_tcpbpf_user in the .gitignore file. Fixes: 6882804c916b ("selftests/bpf: add a test for overlapping packet range checks") Fixes: 9d1f15941967 ("bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-01netdevsim: fix overflow on the error pathJakub Kicinski
Undo loop condition on the error path would cause the i counter to go below zero, if allocation failure happened with the first (i.e. 0th) element of the array. Fixes: 395cacb5f1a0 ("netdevsim: bpf: support fake map offload") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-01genirq: Make legacy autoprobing work againThomas Gleixner
Meelis reported the following warning on a quad P3 HP NetServer museum piece: WARNING: CPU: 3 PID: 258 at kernel/irq/chip.c:244 __irq_startup+0x80/0x100 EIP: __irq_startup+0x80/0x100 irq_startup+0x7e/0x170 probe_irq_on+0x128/0x2b0 parport_irq_probe.constprop.18+0x8d/0x1af [parport_pc] parport_pc_probe_port+0xf11/0x1260 [parport_pc] parport_pc_init+0x78a/0xf10 [parport_pc] parport_parse_param.constprop.16+0xf0/0xf0 [parport_pc] do_one_initcall+0x45/0x1e0 This is caused by the rewrite of the irq activation/startup sequence which missed to convert a callsite in the irq legacy auto probing code. To fix this irq_activate_and_startup() needs to gain a return value so the pending logic can work proper. Fixes: c942cee46bba ("genirq: Separate activation and startup") Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Meelis Roos <mroos@linux.ee> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801301935410.1797@nanos
2018-02-01x86/kvm: Update spectre-v1 mitigationDan Williams
Commit 75f139aaf896 "KVM: x86: Add memory barrier on vmcs field lookup" added a raw 'asm("lfence");' to prevent a bounds check bypass of 'vmcs_field_to_offset_table'. The lfence can be avoided in this path by using the array_index_nospec() helper designed for these types of fixes. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andrew Honig <ahonig@google.com> Cc: kvm@vger.kernel.org Cc: Jim Mattson <jmattson@google.com> Link: https://lkml.kernel.org/r/151744959670.6342.3001723920950249067.stgit@dwillia2-desk3.amr.corp.intel.com
2018-02-01Merge branch 'next' into for-linusDmitry Torokhov
Prepare input updates for 4.16 merge window.
2018-01-31xfs: fix u32 type usage in sb validation functionDarrick J. Wong
Don't use u32, use uint32_t, because this won't work in xfsprogs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-01-31Merge tag 'docs-4.16' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "Documentation updates for 4.16. New stuff includes refcount_t documentation, errseq documentation, kernel-doc support for nested structure definitions, the removal of lots of crufty kernel-doc support for unused formats, SPDX tag documentation, the beginnings of a manual for subsystem maintainers, and lots of fixes and updates. As usual, some of the changesets reach outside of Documentation/ to effect kerneldoc comment fixes. It also adds the new LICENSES directory, of which Thomas promises I do not need to be the maintainer" * tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits) linux-next: docs-rst: Fix typos in kfigure.py linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt Documentation: Fix misconversion of #if docs: add index entry for networking/msg_zerocopy Documentation: security/credentials.rst: explain need to sort group_list LICENSES: Add MPL-1.1 license LICENSES: Add the GPL 1.0 license LICENSES: Add Linux syscall note exception LICENSES: Add the MIT license LICENSES: Add the BSD-3-clause "Clear" license LICENSES: Add the BSD 3-clause "New" or "Revised" License LICENSES: Add the BSD 2-clause "Simplified" license LICENSES: Add the LGPL-2.1 license LICENSES: Add the LGPL 2.0 license LICENSES: Add the GPL 2.0 license Documentation: Add license-rules.rst to describe how to properly identify file licenses scripts: kernel_doc: better handle show warnings logic fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at doc: md: Fix a file name to md-fault.c in fault-injection.txt errseq: Add to documentation tree ...
2018-01-31Merge branch 'work.vmci' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vmci iov_iter updates from Al Viro: "Get rid of "is it an iovec or an entire array?" flags in vmxi - just use iov_iter. Simplifies the living hell out of that code..." * 'work.vmci' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vmci: the same on the send side... vmci: simplify qp_dequeue_locked() vmci: get rid of qp_memcpy_from_queue() vmci: fix buf_size in case of iovec-based accesses
2018-01-31Merge branch 'work.whack-a-mole' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull asm/uaccess.h whack-a-mole from Al Viro: "It's linux/uaccess.h, damnit... Oh, well - eventually they'll stop cropping up..." * 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: asm-prototypes.h: use linux/uaccess.h, not asm/uaccess.h riscv: use linux/uaccess.h, not asm/uaccess.h... ppc: for put_user() pull linux/uaccess.h, not asm/uaccess.h
2018-01-31Merge branch 'work.dcache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: "Neil Brown's d_move()/d_path() race fix" * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: close race between getcwd() and d_move()
2018-01-31Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - misc fixes - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) mm: remove PG_highmem description tools, vm: new option to specify kpageflags file mm/swap.c: make functions and their kernel-doc agree mm, memory_hotplug: fix memmap initialization mm: correct comments regarding do_fault_around() mm: numa: do not trap faults on shared data section pages. hugetlb, mbind: fall back to default policy if vma is NULL hugetlb, mempolicy: fix the mbind hugetlb migration mm, hugetlb: further simplify hugetlb allocation API mm, hugetlb: get rid of surplus page accounting tricks mm, hugetlb: do not rely on overcommit limit during migration mm, hugetlb: integrate giga hugetlb more naturally to the allocation path mm, hugetlb: unify core page allocation accounting and initialization mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes mm/memcontrol.c: make local symbol static mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd() include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer mm/compaction.c: fix comment for try_to_compact_pages() mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it zsmalloc: use U suffix for negative literals being shifted ...
2018-02-01drm/ast: Load lut in crtc_commitDaniel Vetter
In the past the ast driver relied upon the fbdev emulation helpers to call ->load_lut at boot-up. But since commit b8e2b0199cc377617dc238f5106352c06dcd3fa2 Author: Peter Rosin <peda@axentia.se> Date: Tue Jul 4 12:36:57 2017 +0200 drm/fb-helper: factor out pseudo-palette that's cleaned up and drivers are expected to boot into a consistent lut state. This patch fixes that. Fixes: b8e2b0199cc3 ("drm/fb-helper: factor out pseudo-palette") Cc: Peter Rosin <peda@axenita.se> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: <stable@vger.kernel.org> # v4.14+ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=198123 Cc: Bill Fraser <bill.fraser@gmail.com> Reported-and-Tested-by: Bill Fraser <bill.fraser@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-01Merge tag 'drm-misc-next-fixes-2018-01-31' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next This contains a fix to restrict what lessee can do with masters and another one when waiting for timeouts on reservation objects. * tag 'drm-misc-next-fixes-2018-01-31' of git://anongit.freedesktop.org/drm/drm-misc: drm: Check for lessee in DROP_MASTER ioctl dma-buf: fix reservation_object_wait_timeout_rcu once more v2
2018-01-31mm: remove PG_highmem descriptionMiles Chen
Commit cbe37d093707 ("[PATCH] mm: remove PG_highmem") removed PG_highmem to save a page flag. So the description of PG_highmem is no longer needed. Link: http://lkml.kernel.org/r/1517391212-2950-1-git-send-email-miles.chen@mediatek.com Signed-off-by: Miles Chen <miles.chen@mediatek.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31tools, vm: new option to specify kpageflags fileDavid Rientjes
page-types currently hardcodes /proc/kpageflags as the file to parse. This works when using the tool to examine the state of pageflags on the same system, but does not allow storing a snapshot of pageflags at a given time to debug issues nor on a different system. This allows the user to specify a saved version of kpageflags with a new page-types -F option. [akpm@linux-foundation.org: add "filename" to fix usage() string] [rientjes@google.com: fix layout] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801301840050.140969@chino.kir.corp.google.com Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801301458180.153857@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm/swap.c: make functions and their kernel-doc agreeRandy Dunlap
Fix some basic kernel-doc notation in mm/swap.c: - for function lru_cache_add_anon(), make its kernel-doc function name match its function name and change colon to hyphen following the function name - for function pagevec_lookup_entries(), change the function parameter name from nr_pages to nr_entries since that is more descriptive of what the parameter actually is and then it matches the kernel-doc comments also Fix function kernel-doc to match the change in commit 67fd707f4681: - drop the kernel-doc notation for @nr_pages from pagevec_lookup_range() and correct the function description for that change Link: http://lkml.kernel.org/r/3b42ee3e-04a9-a6ca-6be4-f00752a114fe@infradead.org Fixes: 67fd707f4681 ("mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm, memory_hotplug: fix memmap initializationMichal Hocko
Bharata has noticed that onlining a newly added memory doesn't increase the total memory, pointing to commit f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") as a culprit. This commit has changed the way how the memory for memmaps is initialized and moves it from the allocation time to the initialization time. This works properly for the early memmap init path. It doesn't work for the memory hotplug though because we need to mark page as reserved when the sparsemem section is created and later initialize it completely during onlining. memmap_init_zone is called in the early stage of onlining. With the current code it calls __init_single_page and as such it clears up the whole stage and therefore online_pages_range skips those pages. Fix this by skipping mm_zero_struct_page in __init_single_page for memory hotplug path. This is quite uggly but unifying both early init and memory hotplug init paths is a large project. Make sure we plug the regression at least. Link: http://lkml.kernel.org/r/20180130101141.GW21609@dhcp22.suse.cz Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Bob Picco <bob.picco@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: correct comments regarding do_fault_around()William Kucharski
There are multiple comments surrounding do_fault_around that memtion fault_around_pages() and fault_around_mask(), two routines that do not exist. These comments should be reworded to reference fault_around_bytes, the value which is used to determine how much do_fault_around() will attempt to read when processing a fault. These comments should have been updated when fault_around_pages() and fault_around_mask() were removed in commit aecd6f44266c ("mm: close race between do_fault_around() and fault_around_bytes_set()"). Fixes: aecd6f44266c1 ("mm: close race between do_fault_around() and fault_around_bytes_set()") Link: http://lkml.kernel.org/r/302D0B14-C7E9-44C6-8BED-033F9ACBD030@oracle.com Signed-off-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Larry Bassel <larry.bassel@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: numa: do not trap faults on shared data section pages.Henry Willard
Workloads consisting of a large number of processes running the same program with a very large shared data segment may experience performance problems when numa balancing attempts to migrate the shared cow pages. This manifests itself with many processes or tasks in TASK_UNINTERRUPTIBLE state waiting for the shared pages to be migrated. The program listed below simulates the conditions with these results when run with 288 processes on a 144 core/8 socket machine. Average throughput Average throughput Average throughput with numa_balancing=0 with numa_balancing=1 with numa_balancing=1 without the patch with the patch --------------------- --------------------- --------------------- 2118782 2021534 2107979 Complex production environments show less variability and fewer poorly performing outliers accompanied with a smaller number of processes waiting on NUMA page migration with this patch applied. In some cases, %iowait drops from 16%-26% to 0. // SPDX-License-Identifier: GPL-2.0 /* * Copyright (c) 2017 Oracle and/or its affiliates. All rights reserved. */ #include <sys/time.h> #include <stdio.h> #include <wait.h> #include <sys/mman.h> int a[1000000] = {13}; int main(int argc, const char **argv) { int n = 0; int i; pid_t pid; int stat; int *count_array; int cpu_count = 288; long total = 0; struct timeval t1, t2 = {(argc > 1 ? atoi(argv[1]) : 10), 0}; if (argc > 2) cpu_count = atoi(argv[2]); count_array = mmap(NULL, cpu_count * sizeof(int), (PROT_READ|PROT_WRITE), (MAP_SHARED|MAP_ANONYMOUS), 0, 0); if (count_array == MAP_FAILED) { perror("mmap:"); return 0; } for (i = 0; i < cpu_count; ++i) { pid = fork(); if (pid <= 0) break; if ((i & 0xf) == 0) usleep(2); } if (pid != 0) { if (i == 0) { perror("fork:"); return 0; } for (;;) { pid = wait(&stat); if (pid < 0) break; } for (i = 0; i < cpu_count; ++i) total += count_array[i]; printf("Total %ld\n", total); munmap(count_array, cpu_count * sizeof(int)); return 0; } gettimeofday(&t1, 0); timeradd(&t1, &t2, &t1); while (timercmp(&t2, &t1, <)) { int b = 0; int j; for (j = 0; j < 1000000; j++) b += a[j]; gettimeofday(&t2, 0); n++; } count_array[i] = n; return 0; } This patch changes change_pte_range() to skip shared copy-on-write pages when called from change_prot_numa(). NOTE: change_prot_numa() is nominally called from task_numa_work() and queue_pages_test_walk(). task_numa_work() is the auto NUMA balancing path, and queue_pages_test_walk() is part of explicit NUMA policy management. However, queue_pages_test_walk() only calls change_prot_numa() when MPOL_MF_LAZY is specified and currently that is not allowed, so change_prot_numa() is only called from auto NUMA balancing. In the case of explicit NUMA policy management, shared pages are not migrated unless MPOL_MF_MOVE_ALL is specified, and MPOL_MF_MOVE_ALL depends on CAP_SYS_NICE. Currently, there is no way to pass information about MPOL_MF_MOVE_ALL to change_pte_range. This will have to be fixed if MPOL_MF_LAZY is enabled and MPOL_MF_MOVE_ALL is to be honored in lazy migration mode. task_numa_work() skips the read-only VMAs of programs and shared libraries. Link: http://lkml.kernel.org/r/1516751617-7369-1-git-send-email-henry.willard@oracle.com Signed-off-by: Henry Willard <henry.willard@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31hugetlb, mbind: fall back to default policy if vma is NULLMichal Hocko
Dan Carpenter has noticed that mbind migration callback (new_page) can get a NULL vma pointer and choke on it inside alloc_huge_page_vma which relies on the VMA to get the hstate. We used to BUG_ON this case but the BUG_+ON has been removed recently by "hugetlb, mempolicy: fix the mbind hugetlb migration". The proper way to handle this is to get the hstate from the migrated page and rely on huge_node (resp. get_vma_policy) do the right thing with null VMA. We are currently falling back to the default mempolicy in that case which is in line what THP path is doing here. Link: http://lkml.kernel.org/r/20180110104712.GR1732@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31hugetlb, mempolicy: fix the mbind hugetlb migrationMichal Hocko
do_mbind migration code relies on alloc_huge_page_noerr for hugetlb pages. alloc_huge_page_noerr uses alloc_huge_page which is a highlevel allocation function which has to take care of reserves, overcommit or hugetlb cgroup accounting. None of that is really required for the page migration because the new page is only temporal and either will replace the original page or it will be dropped. This is essentially as for other migration call paths and there shouldn't be any reason to handle mbind in a special way. The current implementation is even suboptimal because the migration might fail just because the hugetlb cgroup limit is reached, or the overcommit is saturated. Fix this by making mbind like other hugetlb migration paths. Add a new migration helper alloc_huge_page_vma as a wrapper around alloc_huge_page_nodemask with additional mempolicy handling. alloc_huge_page_noerr has no more users and it can go. Link: http://lkml.kernel.org/r/20180103093213.26329-7-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm, hugetlb: further simplify hugetlb allocation APIMichal Hocko
Hugetlb allocator has several layer of allocation functions depending and the purpose of the allocation. There are two allocators depending on whether the page can be allocated from the page allocator or we need a contiguous allocator. This is currently opencoded in alloc_fresh_huge_page which is the only path that might allocate giga pages which require the later allocator. Create alloc_fresh_huge_page which hides this implementation detail and use it in all callers which hardcoded the buddy allocator path (__hugetlb_alloc_buddy_huge_page). This shouldn't introduce any funtional change because both migration and surplus allocators exlude giga pages explicitly. While we are at it let's do some renaming. The current scheme is not consistent and overly painfull to read and understand. Get rid of prefix underscores from most functions. There is no real reason to make names longer. * alloc_fresh_huge_page is the new layer to abstract underlying allocator * __hugetlb_alloc_buddy_huge_page becomes shorter and neater alloc_buddy_huge_page. * Former alloc_fresh_huge_page becomes alloc_pool_huge_page because we put the new page directly to the pool * alloc_surplus_huge_page can drop the opencoded prep_new_huge_page code as it uses alloc_fresh_huge_page now * others lose their excessive prefix underscores to make names shorter [dan.carpenter@oracle.com: fix double unlock bug in alloc_surplus_huge_page()] Link: http://lkml.kernel.org/r/20180109200559.g3iz5kvbdrz7yydp@mwanda Link: http://lkml.kernel.org/r/20180103093213.26329-6-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>