summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2017-07-07crush: implement weight and id overrides for straw2Ilya Dryomov
bucket_straw2_choose needs to use weights that may be different from weight_items. For instance to compensate for an uneven distribution caused by a low number of values. Or to fix the probability biais introduced by conditional probabilities (see http://tracker.ceph.com/issues/15653 for more information). We introduce a weight_set for each straw2 bucket to set the desired weight for a given item at a given position. The weight of a given item when picking the first replica (first position) may be different from the weight the second replica (second position). For instance the weight matrix for a given bucket containing items 3, 7 and 13 could be as follows: position 0 position 1 item 3 0x10000 0x100000 item 7 0x40000 0x10000 item 13 0x40000 0x10000 When crush_do_rule picks the first of two replicas (position 0), item 7, 3 are four times more likely to be choosen by bucket_straw2_choose than item 13. When choosing the second replica (position 1), item 3 is ten times more likely to be choosen than item 7, 13. By default the weight_set of each bucket exactly matches the content of item_weights for each position to ensure backward compatibility. bucket_straw2_choose compares items by using their id. The same ids are also used to index buckets and they must be unique. For each item in a bucket an array of ids can be provided for placement purposes and they are used instead of the ids. If no replacement ids are provided, the legacy behavior is preserved. Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: pg_upmap[_items] infrastructureIlya Dryomov
pg_temp and pg_upmap encodings are the same (PG -> array of osds), except for the incremental remove: it's an empty mapping in new_pg_temp for pg_temp and a separate old_pg_upmap set for pg_upmap. (This isn't to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't looked at as an example for pg_upmap encoding.) Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap. __decode_pg_temp() stores into pg_temp union member, but since pg_upmap union member is identical, reading through pg_upmap later is OK. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: ceph_decode_skip_* helpersIlya Dryomov
Some of these won't be as efficient as they could be (e.g. ceph_decode_skip_set(... 32 ...) could advance by len * sizeof(u32) once instead of advancing by sizeof(u32) len times), but that's fine and not worth a bunch of extra macro code. Replace skip_name_map() with ceph_decode_skip_map as an example. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: respect RADOS_BACKOFF backoffsIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: make DEFINE_RB_* helpers more generalIlya Dryomov
Initially for ceph_pg_mapping, ceph_spg_mapping and ceph_hobject_id, compared with ceph_pg_compare(), ceph_spg_compare() and hoid_compare() respectively. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: avoid unnecessary pi lookups in calc_target()Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: make sure need_resend targets reflect latest mapIlya Dryomov
Otherwise we may miss events like PG splits, pool deletions, etc when we get multiple incremental maps at once. Because check_pool_dne() can now be fed an unlinked request, finish_request() needed to be taught to handle unlinked requests. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: resend on PG splits if OSD has RESEND_ON_SPLITIlya Dryomov
Note that ceph_osd_request_target fields are updated regardless of RESEND_ON_SPLIT. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: MOSDOp v8 encoding (actual spgid + full hash)Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: ceph_connection_operations::reencode_message() methodIlya Dryomov
Give upper layers a chance to reencode the message after the connection is negotiated and ->peer_features is set. OSD client will use this to support both luminous and pre-luminous OSDs (in a single cluster): the former need MOSDOp v8; the latter will continue to be sent MOSDOp v4. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: introduce ceph_spg, ceph_pg_to_primary_shard()Ilya Dryomov
Store both raw pgid and actual spgid in ceph_osd_request_target. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: fold [l]req->last_force_resend into ceph_osd_request_targetIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: support SERVER_JEWEL feature bitsIlya Dryomov
Only MON_STATEFUL_SUB, really. MON_ROUTE_OSDMAP and OSDSUBOP_NO_SNAPCONTEXT are irrelevant. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: advertise support for OSD_POOLRESENDIlya Dryomov
The code has been in place since commit 63244fa123a7 ("libceph: introduce ceph_osd_request_target, calc_target()"), and, with the ceph_{oloc,oid}_copy() issue fixed in the previous commit, is now in working order. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: new features macrosIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: remove ceph_sanitize_features() workaroundIlya Dryomov
Reflects ceph.git commit ff1959282826ae6acd7134e1b1ede74ffd1cc04a. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07KVM: mark memory slots as rcuChristian Borntraeger
we access the memslots array via srcu. Mark it as such and use the right access functions also for the freeing of memory slots. Found by sparse: ./include/linux/kvm_host.h:565:16: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07KVM: mark kvm->busses as rcu protectedChristian Borntraeger
mark kvm->busses as rcu protected and use the correct access function everywhere. found by sparse virt/kvm/kvm_main.c:3490:15: error: incompatible types in comparison expression (different address spaces) virt/kvm/kvm_main.c:3509:15: error: incompatible types in comparison expression (different address spaces) virt/kvm/kvm_main.c:3561:15: error: incompatible types in comparison expression (different address spaces) virt/kvm/kvm_main.c:3644:15: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-07-07rtc: add generic nvmem supportAlexandre Belloni
Many RTCs have an on board non volatile storage. It can be battery backed RAM or an EEPROM. Use the nvmem subsystem to export it to both userspace and in-kernel consumers. This stays compatible with the previous (non documented) ABI that was using /sys/class/rtc/rtcx/device/nvram to export that memory. But will warn about the deprecation. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2017-07-07rtc: introduce new registration methodAlexandre Belloni
Introduce rtc_register_device() to register an already allocated and initialized struct rtc_device. It automatically sets up the owner and the two steps allocation/registration will allow to remove race conditions in the IRQ handling of some driver. It also allows to properly extend the core without adding more arguments to rtc_device_register(). Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2017-07-07KVM: mark vcpu->pid pointer as rcu protectedChristian Borntraeger
We do use rcu to protect the pid pointer. Mark it as such and adopt all code to use the proper access methods. This was detected by sparse. "virt/kvm/kvm_main.c:2248:15: error: incompatible types in comparison expression (different address spaces)" Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - a few hotfixes - various misc updates - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (108 commits) mm, memory_hotplug: move movable_node to the hotplug proper mm, memory_hotplug: drop CONFIG_MOVABLE_NODE mm, memory_hotplug: drop artificial restriction on online/offline mm: memcontrol: account slab stats per lruvec mm: memcontrol: per-lruvec stats infrastructure mm: memcontrol: use generic mod_memcg_page_state for kmem pages mm: memcontrol: use the node-native slab memory counters mm: vmstat: move slab statistics from zone to node counters mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare() mm/zswap.c: improve a size determination in zswap_frontswap_init() mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create() mm/swapfile.c: sort swap entries before free mm/oom_kill: count global and memory cgroup oom kills mm: per-cgroup memory reclaim stats mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects mm: kmemleak: factor object reference updating out of scan_block() mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures mm, mempolicy: don't check cpuset seqlock where it doesn't matter mm, cpuset: always use seqlock when changing task's nodemask mm, mempolicy: simplify rebinding mempolicies when updating cpusets ...
2017-07-06Merge branch 'misc.compat' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc compat stuff updates from Al Viro: "This part is basically untangling various compat stuff. Compat syscalls moved to their native counterparts, getting rid of quite a bit of double-copying and/or set_fs() uses. A lot of field-by-field copyin/copyout killed off. - kernel/compat.c is much closer to containing just the copyin/copyout of compat structs. Not all compat syscalls are gone from it yet, but it's getting there. - ipc/compat_mq.c killed off completely. - block/compat_ioctl.c cleaned up; floppy compat ioctls moved to drivers/block/floppy.c where they belong. Yes, there are several drivers that implement some of the same ioctls. Some are m68k and one is 32bit-only pmac. drivers/block/floppy.c is the only one in that bunch that can be built on biarch" * 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: mqueue: move compat syscalls to native ones usbdevfs: get rid of field-by-field copyin compat_hdio_ioctl: get rid of set_fs() take floppy compat ioctls to sodding floppy.c ipmi: get rid of field-by-field __get_user() ipmi: get COMPAT_IPMICTL_RECEIVE_MSG in sync with the native one rt_sigtimedwait(): move compat to native select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap() put_compat_rusage(): switch to copy_to_user() sigpending(): move compat to native getrlimit()/setrlimit(): move compat to native times(2): move compat to native compat_{get,put}_bitmap(): use unsafe_{get,put}_user() fb_get_fscreeninfo(): don't bother with do_fb_ioctl() do_sigaltstack(): lift copying to/from userland into callers take compat_sys_old_getrlimit() to native syscall trim __ARCH_WANT_SYS_OLD_GETRLIMIT
2017-07-06Merge tag 'trace-v4.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The new features of this release: - Added TRACE_DEFINE_SIZEOF() which allows trace events that use sizeof() it the TP_printk() to be converted to the actual size such that trace-cmd and perf can parse them correctly. - Some rework of the TRACE_DEFINE_ENUM() such that the above TRACE_DEFINE_SIZEOF() could reuse the same code. - Recording of tgid (Thread Group ID). This is similar to how task COMMs are recorded (cached at sched_switch), where it is in a table and used on output of the trace and trace_pipe files. - Have ":mod:<module>" be cached when written into set_ftrace_filter. Then the functions of the module will be traced at module load. - Some random clean ups and small fixes" * tag 'trace-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (26 commits) ftrace: Test for NULL iter->tr in regex for stack_trace_filter changes ftrace: Decrement count for dyn_ftrace_total_info for init functions ftrace: Unlock hash mutex on failed allocation in process_mod_list() tracing: Add support for display of tgid in trace output tracing: Add support for recording tgid of tasks ftrace: Decrement count for dyn_ftrace_total_info file ftrace: Remove unused function ftrace_arch_read_dyn_info() sh/ftrace: Remove only user of ftrace_arch_read_dyn_info() ftrace: Have cached module filters be an active filter ftrace: Implement cached modules tracing on module load ftrace: Have the cached module list show in set_ftrace_filter ftrace: Add :mod: caching infrastructure to trace_array tracing: Show address when function names are not found ftrace: Add missing comment for FTRACE_OPS_FL_RCU tracing: Rename update the enum_map file tracing: Add TRACE_DEFINE_SIZEOF() macros tracing: define TRACE_DEFINE_SIZEOF() macro to map sizeof's to their values tracing: Rename enum_replace to eval_replace trace: rename enum_map functions trace: rename trace.c enum functions ...
2017-07-06Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping infrastructure from Christoph Hellwig: "This is the first pull request for the new dma-mapping subsystem In this new subsystem we'll try to properly maintain all the generic code related to dma-mapping, and will further consolidate arch code into common helpers. This pull request contains: - removal of the DMA_ERROR_CODE macro, replacing it with calls to ->mapping_error so that the dma_map_ops instances are more self contained and can be shared across architectures (me) - removal of the ->set_dma_mask method, which duplicates the ->dma_capable one in terms of functionality, but requires more duplicate code. - various updates for the coherent dma pool and related arm code (Vladimir) - various smaller cleanups (me)" * tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits) ARM: dma-mapping: Remove traces of NOMMU code ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus ARM: NOMMU: Introduce dma operations for noMMU drivers: dma-mapping: allow dma_common_mmap() for NOMMU drivers: dma-coherent: Introduce default DMA pool drivers: dma-coherent: Account dma_pfn_offset when used with device tree dma: Take into account dma_pfn_offset dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs dma-mapping: remove dmam_free_noncoherent crypto: qat - avoid an uninitialized variable warning au1100fb: remove a bogus dma_free_nonconsistent call MAINTAINERS: add entry for dma mapping helpers powerpc: merge __dma_set_mask into dma_set_mask dma-mapping: remove the set_dma_mask method powerpc/cell: use the dma_supported method for ops switching powerpc/cell: clean up fixed mapping dma_ops initialization tile: remove dma_supported and mapping_error methods xen-swiotlb: remove xen_swiotlb_set_dma_mask arm: implement ->dma_supported instead of ->set_dma_mask mips/loongson64: implement ->dma_supported instead of ->set_dma_mask ...
2017-07-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Martin Schwidefsky: - The fixup for the blk-mq clash with the scm driver - An improvement for the dasd driver in regard to raw I/O - Bug fixes and cleanup * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: Update my email address s390/syscalls: Fix out of bounds arguments access s390/vfio_ccw: remove unused variable s390/dasd: remove unneeded code s390/crash: Remove unused KEXEC_NOTE_BYTES s390/zcrypt: Fix missing newlines at some debug feature messages. s390/dasd: Make raw I/O usable without prefix support s390/dasd: Rename dasd_raw_build_cp() s390/dasd: Refactor prefix_LRE() and related functions s390: fix up for "blk-mq: switch ->queue_rq return value to blk_status_t"
2017-07-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "PPC: - Better machine check handling for HV KVM - Ability to support guests with threads=2, 4 or 8 on POWER9 - Fix for a race that could cause delayed recognition of signals - Fix for a bug where POWER9 guests could sleep with interrupts pending. ARM: - VCPU request overhaul - allow timer and PMU to have their interrupt number selected from userspace - workaround for Cavium erratum 30115 - handling of memory poisonning - the usual crop of fixes and cleanups s390: - initial machine check forwarding - migration support for the CMMA page hinting information - cleanups and fixes x86: - nested VMX bugfixes and improvements - more reliable NMI window detection on AMD - APIC timer optimizations Generic: - VCPU request overhaul + documentation of common code patterns - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits) Update my email address kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12 kvm: x86: mmu: allow A/D bits to be disabled in an mmu x86: kvm: mmu: make spte mmio mask more explicit x86: kvm: mmu: dead code thanks to access tracking KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code KVM: PPC: Book3S HV: Close race with testing for signals on guest entry KVM: PPC: Book3S HV: Simplify dynamic micro-threading code KVM: x86: remove ignored type attribute KVM: LAPIC: Fix lapic timer injection delay KVM: lapic: reorganize restart_apic_timer KVM: lapic: reorganize start_hv_timer kvm: nVMX: Check memory operand to INVVPID KVM: s390: Inject machine check into the nested guest KVM: s390: Inject machine check into the guest tools/kvm_stat: add new interactive command 'b' tools/kvm_stat: add new command line switch '-i' tools/kvm_stat: fix error on interactive command 'g' KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit ...
2017-07-07Add "shutdown" to "struct class".Josh Zimmerman
The TPM class has some common shutdown code that must be executed for all drivers. This adds some needed functionality for that. Signed-off-by: Josh Zimmerman <joshz@google.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Fixes: 74d6b3ceaa17 ("tpm: fix suspend/resume paths for TPM 2.0") Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-06mm, memory_hotplug: move movable_node to the hotplug properMichal Hocko
movable_node_is_enabled is defined in memblock proper while it is initialized from the memory hotplug proper. This is quite messy and it makes a dependency between the two so move movable_node along with the helper functions to memory_hotplug. To make it more entertaining the kernel parameter is ignored unless CONFIG_HAVE_MEMBLOCK_NODE_MAP=y because we do not have the node information for each memblock otherwise. So let's warn when the option is disabled. Link: http://lkml.kernel.org/r/20170529114141.536-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kani Toshimitsu <toshi.kani@hpe.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm, memory_hotplug: drop CONFIG_MOVABLE_NODEMichal Hocko
Commit 20b2f52b73fe ("numa: add CONFIG_MOVABLE_NODE for movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a good explanation on why it is actually useful. It makes a lot of sense to make movable node semantic opt in but we already have that because the feature has to be explicitly enabled on the kernel command line. A config option on top only makes the configuration space larger without a good reason. It also adds an additional ifdefery that pollutes the code. Just drop the config option and make it de-facto always enabled. This shouldn't introduce any change to the semantic. Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kani Toshimitsu <toshi.kani@hpe.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm: memcontrol: per-lruvec stats infrastructureJohannes Weiner
lruvecs are at the intersection of the NUMA node and memcg, which is the scope for most paging activity. Introduce a convenient accounting infrastructure that maintains statistics per node, per memcg, and the lruvec itself. Then convert over accounting sites for statistics that are already tracked in both nodes and memcgs and can be easily switched. [hannes@cmpxchg.org: fix crash in the new cgroup stat keeping code] Link: http://lkml.kernel.org/r/20170531171450.GA10481@cmpxchg.org [hannes@cmpxchg.org: don't track uncharged pages at all Link: http://lkml.kernel.org/r/20170605175254.GA8547@cmpxchg.org [hannes@cmpxchg.org: add missing free_percpu()] Link: http://lkml.kernel.org/r/20170605175354.GB8547@cmpxchg.org [linux@roeck-us.net: hexagon: fix build error caused by include file order] Link: http://lkml.kernel.org/r/20170617153721.GA4382@roeck-us.net Link: http://lkml.kernel.org/r/20170530181724.27197-6-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm: memcontrol: use generic mod_memcg_page_state for kmem pagesJohannes Weiner
The kmem-specific functions do the same thing. Switch and drop. Link: http://lkml.kernel.org/r/20170530181724.27197-5-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm: memcontrol: use the node-native slab memory countersJohannes Weiner
Now that the slab counters are moved from the zone to the node level we can drop the private memcg node stats and use the official ones. Link: http://lkml.kernel.org/r/20170530181724.27197-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm: vmstat: move slab statistics from zone to node countersJohannes Weiner
Patch series "mm: per-lruvec slab stats" Josef is working on a new approach to balancing slab caches and the page cache. For this to work, he needs slab cache statistics on the lruvec level. These patches implement that by adding infrastructure that allows updating and reading generic VM stat items per lruvec, then switches some existing VM accounting sites, including the slab accounting ones, to this new cgroup-aware API. I'll follow up with more patches on this, because there is actually substantial simplification that can be done to the memory controller when we replace private memcg accounting with making the existing VM accounting sites cgroup-aware. But this is enough for Josef to base his slab reclaim work on, so here goes. This patch (of 5): To re-implement slab cache vs. page cache balancing, we'll need the slab counters at the lruvec level, which, ever since lru reclaim was moved from the zone to the node, is the intersection of the node, not the zone, and the memcg. We could retain the per-zone counters for when the page allocator dumps its memory information on failures, and have counters on both levels - which on all but NUMA node 0 is usually redundant. But let's keep it simple for now and just move them. If anybody complains we can restore the per-zone counters. [hannes@cmpxchg.org: fix oops] Link: http://lkml.kernel.org/r/20170605183511.GA8915@cmpxchg.org Link: http://lkml.kernel.org/r/20170530181724.27197-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm/oom_kill: count global and memory cgroup oom killsKonstantin Khlebnikov
Show count of oom killer invocations in /proc/vmstat and count of processes killed in memory cgroup in knob "memory.events" (in memory.oom_control for v1 cgroup). Also describe difference between "oom" and "oom_kill" in memory cgroup documentation. Currently oom in memory cgroup kills tasks iff shortage has happened inside page fault. These counters helps in monitoring oom kills - for now the only way is grepping for magic words in kernel log. [akpm@linux-foundation.org: fix for mem_cgroup_count_vm_event() rename] [akpm@linux-foundation.org: fix comment, per Konstantin] Link: http://lkml.kernel.org/r/149570810989.203600.9492483715840752937.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Roman Guschin <guroan@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm: per-cgroup memory reclaim statsRoman Gushchin
Track the following reclaim counters for every memory cgroup: PGREFILL, PGSCAN, PGSTEAL, PGACTIVATE, PGDEACTIVATE, PGLAZYFREE and PGLAZYFREED. These values are exposed using the memory.stats interface of cgroup v2. The meaning of each value is the same as for global counters, available using /proc/vmstat. Also, for consistency, rename mem_cgroup_count_vm_event() to count_memcg_event_mm(). Link: http://lkml.kernel.org/r/1494530183-30808-1-git-send-email-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objectsCatalin Marinas
Kmemleak requires that vmalloc'ed objects have a minimum reference count of 2: one in the corresponding vm_struct object and the other owned by the vmalloc() caller. There are cases, however, where the original vmalloc() returned pointer is lost and, instead, a pointer to vm_struct is stored (see free_thread_stack()). Kmemleak currently reports such objects as leaks. This patch adds support for treating any surplus references to an object as additional references to a specified object. It introduces the kmemleak_vmalloc() API function which takes a vm_struct pointer and sets its surplus reference passing to the actual vmalloc() returned pointer. The __vmalloc_node_range() calling site has been modified accordingly. Link: http://lkml.kernel.org/r/1495726937-23557-4-git-send-email-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm, mempolicy: simplify rebinding mempolicies when updating cpusetsVlastimil Babka
Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing cpuset's mems") has introduced a two-step protocol when rebinding task's mempolicy due to cpuset update, in order to avoid a parallel allocation seeing an empty effective nodemask and failing. Later, commit cc9a6c877661 ("cpuset: mm: reduce large amounts of memory barrier related damage v3") introduced a seqlock protection and removed the synchronization point between the two update steps. At that point (or perhaps later), the two-step rebinding became unnecessary. Currently it only makes sure that the update first adds new nodes in step 1 and then removes nodes in step 2. Without memory barriers the effects are questionable, and even then this cannot prevent a parallel zonelist iteration checking the nodemask at each step to observe all nodes as unusable for allocation. We now fully rely on the seqlock to prevent premature OOMs and allocation failures. We can thus remove the two-step update parts and simplify the code. Link: http://lkml.kernel.org/r/20170517081140.30654-5-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Hugh Dickins <hughd@google.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm, page_alloc: pass preferred nid instead of zonelist to allocatorVlastimil Babka
The main allocator function __alloc_pages_nodemask() takes a zonelist pointer as one of its parameters. All of its callers directly or indirectly obtain the zonelist via node_zonelist() using a preferred node id and gfp_mask. We can make the code a bit simpler by doing the zonelist lookup in __alloc_pages_nodemask(), passing it a preferred node id instead (gfp_mask is already another parameter). There are some code size benefits thanks to removal of inlined node_zonelist(): bloat-o-meter add/remove: 2/2 grow/shrink: 4/36 up/down: 399/-1351 (-952) This will also make things simpler if we proceed with converting cpusets to zonelists. Link: http://lkml.kernel.org/r/20170517081140.30654-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Christoph Lameter <cl@linux.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm, mempolicy: stop adjusting current->il_next in mpol_rebind_nodemask()Vlastimil Babka
The task->il_next variable stores the next allocation node id for task's MPOL_INTERLEAVE policy. mpol_rebind_nodemask() updates interleave and bind mempolicies due to changing cpuset mems. Currently it also tries to make sure that current->il_next is valid within the updated nodemask. This is bogus, because 1) we are updating potentially any task's mempolicy, not just current, and 2) we might be updating a per-vma mempolicy, not task one. The interleave_nodes() function that uses il_next can cope fine with the value not being within the currently allowed nodes, so this hasn't manifested as an actual issue. We can remove the need for updating il_next completely by changing it to il_prev and store the node id of the previous interleave allocation instead of the next id. Then interleave_nodes() can calculate the next id using the current nodemask and also store it as il_prev, except when querying the next node via do_get_mempolicy(). Link: http://lkml.kernel.org/r/20170517081140.30654-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Hugh Dickins <hughd@google.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm/hugetlb: introduce set_huge_swap_pte_at() helperPunit Agrawal
set_huge_pte_at(), an architecture callback to populate hugepage ptes, does not provide the range of virtual memory that is targeted. This leads to ambiguity when dealing with swap entries on architectures that support hugepages consisting of contiguous ptes. Fix the problem by introducing an overridable helper that is called when populating the page tables with swap entries. The size of the targeted region is provided to the helper to help determine the number of entries to be updated. Provide a default implementation that maintains the current behaviour. [punit.agrawal@arm.com: v4] Link: http://lkml.kernel.org/r/20170524115409.31309-8-punit.agrawal@arm.com [punit.agrawal@arm.com: add an empty definition for set_huge_swap_pte_at()] Link: http://lkml.kernel.org/r/20170525171331.31469-1-punit.agrawal@arm.com Link: http://lkml.kernel.org/r/20170522133604.11392-6-punit.agrawal@arm.com Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm/hugetlb: add size parameter to huge_pte_offset()Punit Agrawal
A poisoned or migrated hugepage is stored as a swap entry in the page tables. On architectures that support hugepages consisting of contiguous page table entries (such as on arm64) this leads to ambiguity in determining the page table entry to return in huge_pte_offset() when a poisoned entry is encountered. Let's remove the ambiguity by adding a size parameter to convey additional information about the requested address. Also fixup the definition/usage of huge_pte_offset() throughout the tree. Link: http://lkml.kernel.org/r/20170522133604.11392-4-punit.agrawal@arm.com Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: James Hogan <james.hogan@imgtec.com> (odd fixer:METAG ARCHITECTURE) Cc: Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS) Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm: zero hash tables in allocatorPavel Tatashin
Add a new flag HASH_ZERO which when provided grantees that the hash table that is returned by alloc_large_system_hash() is zeroed. In most cases that is what is needed by the caller. Use page level allocator's __GFP_ZERO flags to zero the memory. It is using memset() which is efficient method to zero memory and is optimized for most platforms. Link: http://lkml.kernel.org/r/1488432825-92126-3-git-send-email-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Babu Moger <babu.moger@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm/follow_page_mask: add support for hugepage directory entryAneesh Kumar K.V
Architectures like ppc64 supports hugepage size that is not mapped to any of of the page table levels. Instead they add an alternate page table entry format called hugepage directory (hugepd). hugepd indicates that the page table entry maps to a set of hugetlb pages. Add support for this in generic follow_page_mask code. We already support this format in the generic gup code. The default implementation prints warning and returns NULL. We will add ppc64 support in later patches Link: http://lkml.kernel.org/r/1494926612-23928-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mike Kravetz <kravetz@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm/hugetlb: move default definition of hugepd_t earlier in the headerAneesh Kumar K.V
This enable to use the hugepd_t type early. No functional change in this patch. Link: http://lkml.kernel.org/r/1494926612-23928-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mike Kravetz <kravetz@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm/follow_page_mask: add support for hugetlb pgd entriesAnshuman Khandual
ppc64 supports pgd hugetlb entries. Add code to handle hugetlb pgd entries to follow_page_mask so that ppc64 can switch to it to handle hugetlbe entries. Link: http://lkml.kernel.org/r/1494926612-23928-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mike Kravetz <kravetz@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm/hugetlb: export hugetlb_entry_migration helperAneesh Kumar K.V
We will be using this later from the ppc64 code. Change the return type to bool. Link: http://lkml.kernel.org/r/1494926612-23928-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mike Kravetz <kravetz@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm/madvise: enable (soft|hard) offline of HugeTLB pages at PGD levelAnshuman Khandual
Though migrating gigantic HugeTLB pages does not sound much like real world use case, they can be affected by memory errors. Hence migration at the PGD level HugeTLB pages should be supported just to enable soft and hard offline use cases. While allocating the new gigantic HugeTLB page, it should not matter whether new page comes from the same node or not. There would be very few gigantic pages on the system afterall, we should not be bothered about node locality when trying to save a big page from crashing. This change renames dequeu_huge_page_node() function as dequeue_huge _page_node_exact() preserving it's original functionality. Now the new dequeue_huge_page_node() function scans through all available online nodes to allocate a huge page for the NUMA_NO_NODE case and just falls back calling dequeu_huge_page_node_exact() for all other cases. [arnd@arndb.de: make hstate_is_gigantic() inline] Link: http://lkml.kernel.org/r/20170522124748.3911296-1-arnd@arndb.de Link: http://lkml.kernel.org/r/20170516100509.20122-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm, memory_hotplug: remove unused cruft after memory hotplug reworkMichal Hocko
zone_for_memory doesn't have any user anymore as well as the whole zone shifting infrastructure so drop them all. This shouldn't introduce any functional changes. Link: http://lkml.kernel.org/r/20170515085827.16474-15-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06mm, memory_hotplug: replace for_device by want_memblock in arch_add_memoryMichal Hocko
arch_add_memory gets for_device argument which then controls whether we want to create memblocks for created memory sections. Simplify the logic by telling whether we want memblocks directly rather than going through pointless negation. This also makes the api easier to understand because it is clear what we want rather than nothing telling for_device which can mean anything. This shouldn't introduce any functional change. Link: http://lkml.kernel.org/r/20170515085827.16474-13-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>