path: root/drivers
2017-07-06  target/tcm_loop: Make TMF processing slightly faster  (Bart Van Assche)
Target drivers must guarantee that struct se_cmd and struct se_tmr_req exist as long as target_tmr_work() is in progress. This is why the tcm_loop driver today passes 1 as second argument to transport_generic_free_cmd() from inside the TMF code. Instead of making the TMF code wait, make the TMF code obtain two references (SCF_ACK_KREF) and drop one reference from inside the .check_stop_free() callback. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: David Disseldorp <ddiss@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  target/tcm_loop: Use target_submit_tmr() instead of open-coding this function  (Bart Van Assche)
Use target_submit_tmr() instead of open-coding this function. The only functional change is that TMFs are now added to sess_cmd_list, something the current code does not do. This behavior change is a bug fix because it makes LUN RESETs wait for other TMFs that are in progress for the same LUN. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: David Disseldorp <ddiss@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  target/tcm_loop: Replace a waitqueue and a counter by a completion  (Bart Van Assche)
This patch simplifies the implementation of the tcm_loop driver but does not change its behavior. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: David Disseldorp <ddiss@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  target/tcm_loop: Merge struct tcm_loop_cmd and struct tcm_loop_tmr  (Bart Van Assche)
This patch simplifies the tcm_loop implementation but does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: David Disseldorp <ddiss@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  target: Introduce a function that shows the command state  (Bart Van Assche)
Introduce target_show_cmd() and use it where appropriate. If transport_wait_for_tasks() takes too long, make it show the state of the command it is waiting for. (Add missing brackets around multi-line conditions - nab) Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andy Grover <agrover@redhat.com> Cc: David Disseldorp <ddiss@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  iscsi-target: Kill left-over iscsi_target_do_cleanup  (Nicholas Bellinger)
With commit 25cdda95fda7 in place to address the initial login PDU asynchronous socket close OOPs, go ahead and kill off the left-over iscsi_target_do_cleanup() and ->login_cleanup_work. Reported-by: Mike Christie <mchristi@redhat.com> Cc: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  xen/scsiback: Make TMF processing slightly faster  (Bart Van Assche)
Target drivers must guarantee that struct se_cmd and struct se_tmr_req exist as long as target_tmr_work() is in progress. Since the last access by the LIO core is a call to .check_stop_free() and since the Xen scsiback .check_stop_free() drops a reference to the TMF, it is already guaranteed that the struct se_cmd that corresponds to the TMF exists as long as target_tmr_work() is in progress. Hence change the second argument of transport_generic_free_cmd() from 1 into 0. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: David Disseldorp <ddiss@suse.de> Cc: xen-devel@lists.xenproject.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
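In code terms the change boils down to the wait flag passed when freeing the TMF's se_cmd; a minimal sketch, with the surrounding scsiback context abbreviated and the variable name pending_req assumed:

    /* scsiback's .check_stop_free() drops its own reference to the TMF's
     * se_cmd, so the command stays alive for the whole of target_tmr_work()
     * and there is no need to wait for it here. */
    transport_generic_free_cmd(&pending_req->se_cmd, 0);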
2017-07-06  xen/scsiback: Replace a waitqueue and a counter by a completion  (Bart Van Assche)
This patch simplifies the implementation of the scsiback driver but does not change its behavior. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: David Disseldorp <ddiss@suse.de> Cc: xen-devel@lists.xenproject.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  xen/scsiback: Fix a TMR related use-after-free  (Bart Van Assche)
scsiback_release_cmd() must not dereference se_cmd->se_tmr_req because that memory is freed by target_free_cmd_mem() before scsiback_release_cmd() is called. Fix this use-after-free by inlining struct scsiback_tmr into struct vscsibk_pend. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: David Disseldorp <ddiss@suse.de> Cc: xen-devel@lists.xenproject.org Cc: <stable@vger.kernel.org> # 3.18+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  IB/srpt: Make a debug statement in srpt_abort_cmd() more informative  (Bart Van Assche)
Do not only report the state of the I/O context before srpt_abort_cmd() was called but also the new state assigned by srpt_abort_cmd() Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andy Grover <agrover@redhat.com> Cc: David Disseldorp <ddiss@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  target: Fix a deadlock between the XCOPY code and iSCSI session shutdown  (Bart Van Assche)
Move the code for parsing an XCOPY command from the context of the iSCSI receiver thread to the context of the XCOPY workqueue. Keep the simple XCOPY checks in the context of the iSCSI receiver thread. Move the code for allocating and freeing struct xcopy_op from the code that parses an XCOPY command to its caller. This patch fixes the following deadlock: ====================================================== [ INFO: possible circular locking dependency detected ] 4.10.0-rc7-dbg+ #1 Not tainted ------------------------------------------------------- rmdir/13321 is trying to acquire lock: (&sess->cmdsn_mutex){+.+.+.}, at: [<ffffffffa02cb47d>] iscsit_free_all_ooo_cmdsns+0x2d/0xb0 [iscsi_target_mod] but task is already holding lock: (&sb->s_type->i_mutex_key#14){++++++}, at: [<ffffffff811c6e20>] vfs_rmdir+0x50/0x140 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sb->s_type->i_mutex_key#14){++++++}: lock_acquire+0x71/0x90 down_write+0x3f/0x70 configfs_depend_item+0x3a/0xb0 [configfs] target_depend_item+0x13/0x20 [target_core_mod] target_xcopy_locate_se_dev_e4+0xdd/0x1a0 [target_core_mod] target_do_xcopy+0x34b/0x970 [target_core_mod] __target_execute_cmd+0x22/0xa0 [target_core_mod] target_execute_cmd+0x233/0x2c0 [target_core_mod] iscsit_execute_cmd+0x208/0x270 [iscsi_target_mod] iscsit_sequence_cmd+0x10b/0x190 [iscsi_target_mod] iscsit_get_rx_pdu+0x37d/0xcd0 [iscsi_target_mod] iscsi_target_rx_thread+0x6e/0xa0 [iscsi_target_mod] kthread+0x102/0x140 ret_from_fork+0x31/0x40 -> #0 (&sess->cmdsn_mutex){+.+.+.}: __lock_acquire+0x10e6/0x1260 lock_acquire+0x71/0x90 mutex_lock_nested+0x5f/0x670 iscsit_free_all_ooo_cmdsns+0x2d/0xb0 [iscsi_target_mod] iscsit_close_session+0xac/0x200 [iscsi_target_mod] lio_tpg_close_session+0x9f/0xb0 [iscsi_target_mod] target_shutdown_sessions+0xc3/0xd0 [target_core_mod] core_tpg_del_initiator_node_acl+0x91/0x140 [target_core_mod] target_fabric_nacl_base_release+0x20/0x30 [target_core_mod] config_item_release+0x5a/0xc0 [configfs] config_item_put+0x1d/0x1f [configfs] configfs_rmdir+0x1a6/0x300 [configfs] vfs_rmdir+0xb7/0x140 do_rmdir+0x1f4/0x200 SyS_rmdir+0x11/0x20 entry_SYSCALL_64_fastpath+0x23/0xc6 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#14); lock(&sess->cmdsn_mutex); lock(&sb->s_type->i_mutex_key#14); lock(&sess->cmdsn_mutex); *** DEADLOCK *** 3 locks held by rmdir/13321: #0: (sb_writers#10){.+.+.+}, at: [<ffffffff811e1aff>] mnt_want_write+0x1f/0x50 #1: (&default_group_class[depth - 1]#2/1){+.+.+.}, at: [<ffffffff811cc8ce>] do_rmdir+0x15e/0x200 #2: (&sb->s_type->i_mutex_key#14){++++++}, at: [<ffffffff811c6e20>] vfs_rmdir+0x50/0x140 stack backtrace: CPU: 2 PID: 13321 Comm: rmdir Not tainted 4.10.0-rc7-dbg+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0x86/0xc3 print_circular_bug+0x1c7/0x220 __lock_acquire+0x10e6/0x1260 lock_acquire+0x71/0x90 mutex_lock_nested+0x5f/0x670 iscsit_free_all_ooo_cmdsns+0x2d/0xb0 [iscsi_target_mod] iscsit_close_session+0xac/0x200 [iscsi_target_mod] lio_tpg_close_session+0x9f/0xb0 [iscsi_target_mod] target_shutdown_sessions+0xc3/0xd0 [target_core_mod] core_tpg_del_initiator_node_acl+0x91/0x140 [target_core_mod] target_fabric_nacl_base_release+0x20/0x30 [target_core_mod] config_item_release+0x5a/0xc0 [configfs] config_item_put+0x1d/0x1f [configfs] configfs_rmdir+0x1a6/0x300 [configfs] vfs_rmdir+0xb7/0x140 
do_rmdir+0x1f4/0x200 SyS_rmdir+0x11/0x20 entry_SYSCALL_64_fastpath+0x23/0xc6 Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andy Grover <agrover@redhat.com> Cc: David Disseldorp <ddiss@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  target: Use {get,put}_unaligned_be*() instead of open coding these functions  (Bart Van Assche)
Introduce the function get_unaligned_be24(). Use {get,put}_unaligned_be*() where appropriate. This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Andy Grover <agrover@redhat.com> Cc: David Disseldorp <ddiss@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
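For reference, a 24-bit big-endian accessor in the spirit of the get_unaligned_be24() introduced here can be sketched as follows; a byte-wise sketch, not necessarily the exact helper from the patch:

    /* Load a 24-bit big-endian value from a possibly unaligned pointer. */
    static inline u32 get_unaligned_be24(const void *p)
    {
        const u8 *b = p;

        return (u32)b[0] << 16 | (u32)b[1] << 8 | b[2];
    }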
2017-07-06  target: Fix transport_init_se_cmd()  (Bart Van Assche)
Avoid that aborting a command before it has been submitted onto a workqueue triggers the following warning: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 3 PID: 46 Comm: kworker/u8:1 Not tainted 4.12.0-rc2-dbg+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Workqueue: tmr-iblock target_tmr_work [target_core_mod] Call Trace: dump_stack+0x86/0xcf register_lock_class+0xe8/0x570 __lock_acquire+0xa1/0x11d0 lock_acquire+0x59/0x80 flush_work+0x42/0x2b0 __cancel_work_timer+0x10c/0x180 cancel_work_sync+0xb/0x10 core_tmr_lun_reset+0x352/0x740 [target_core_mod] target_tmr_work+0xd6/0x130 [target_core_mod] process_one_work+0x1ca/0x3f0 worker_thread+0x49/0x3b0 kthread+0x109/0x140 ret_from_fork+0x31/0x40 Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: David Disseldorp <ddiss@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  target: Remove se_device.dev_list  (Bart Van Assche)
The last user of se_device.dev_list was removed through commit 0fd97ccf45be ("target: kill struct se_subsystem_dev"). Hence also remove se_device.dev_list. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Andy Grover <agrover@redhat.com> Cc: David Disseldorp <ddiss@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  target: Use symbolic value for WRITE_VERIFY_16  (Bart Van Assche)
Now that a symbolic value has been introduced for WRITE_VERIFY_16, use it. This patch does not change any functionality. References: commit c2d26f18dcbc ("target: Add WRITE_VERIFY_16") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Cc: Andy Grover <agrover@redhat.com> Cc: David Disseldorp <ddiss@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  qla2xxx: Convert QLA_TGT_ABTS to TARGET_SCF_LOOKUP_LUN_FROM_TAG  (Nicholas Bellinger)
Following Himanshu's earlier patch to drop the redundant tag lookup within __qlt_24xx_handle_abts(), go ahead and drop this now QLA_TGT_ABTS can use TARGET_SCF_LOOKUP_LUN_FROM_TAG and have target_submit_tmr() do this from common code. Reviewed-by: Himanshu Madhani <himanshu.madhani@cavium.com> Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com> Reviewed-by: Quinn Tran <quinn.tran@cavium.com> Cc: Mike Christie <mchristi@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support for ABORT_TASK  (Nicholas Bellinger)
This patch introduces support in target_submit_tmr() for locating an unpacked_lun from an existing se_cmd->tag during ABORT_TASK. When TARGET_SCF_LOOKUP_LUN_FROM_TAG is set, target_submit_tmr() will do the extra lookup via target_lookup_lun_from_tag() and subsequently invoke transport_lookup_tmr_lun() so a proper percpu se_lun->lun_ref is taken before workqueue dispatch into se_device->tmr_wq happens. Aside from the extra target_lookup_lun_from_tag(), the existing code-path remains unchanged. Reviewed-by: Himanshu Madhani <himanshu.madhani@cavium.com> Reviewed-by: Quinn Tran <quinn.tran@cavium.com> Cc: Mike Christie <mchristi@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  target: Add support for TMR percpu reference counting  (Nicholas Bellinger)
This patch introduces TMR percpu reference counting using se_lun->lun_ref in transport_lookup_tmr_lun(), following how existing non TMR per se_lun reference counting works within transport_lookup_cmd_lun(). It also adds explicit transport_lun_remove_cmd() calls to drop the reference in the three tmr related locations that invoke transport_cmd_check_stop_to_fabric(); - target_tmr_work() during normal ->queue_tm_rsp() - target_complete_tmr_failure() during error ->queue_tm_rsp() - transport_generic_handle_tmr() during early failure Also, note the exception paths in transport_generic_free_cmd() and transport_cmd_finish_abort() already check SCF_SE_LUN_CMD, and will invoke transport_lun_remove_cmd() when necessary. Reviewed-by: Himanshu Madhani <himanshu.madhani@cavium.com> Reviewed-by: Quinn Tran <quinn.tran@cavium.com> Cc: Mike Christie <mchristi@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06  target: reject COMPARE_AND_WRITE if emulate_caw is not set  (Jiang Yi)
In struct se_dev_attrib there is a field, emulate_caw, exposed as /sys/kernel/config/target/core/$HBA/$DEV/attrib/emulate_caw. If this field is set to zero, it means the corresponding struct se_device does not support the SCSI command COMPARE_AND_WRITE. In sbc_parse_cdb(), go ahead and reject a SCSI COMPARE_AND_WRITE if emulate_caw is not set, because it has been explicitly disabled from user-space. (Make pr_err ratelimited - nab) Signed-off-by: Jiang Yi <jiangyilism@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
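As a rough illustration of the described check (not the literal hunk; the message text and the exact switch context in sbc_parse_cdb() are assumptions):

    case COMPARE_AND_WRITE:
        if (!dev->dev_attrib.emulate_caw) {
            pr_err_ratelimited("COMPARE_AND_WRITE rejected: emulate_caw is disabled for this device\n");
            return TCM_UNSUPPORTED_SCSI_OPCODE;
        }
        /* ... existing COMPARE_AND_WRITE setup continues here ... */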
2017-07-06  Merge branch 'akpm' (patches from Andrew)  (Linus Torvalds)
Merge misc updates from Andrew Morton: - a few hotfixes - various misc updates - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (108 commits) mm, memory_hotplug: move movable_node to the hotplug proper mm, memory_hotplug: drop CONFIG_MOVABLE_NODE mm, memory_hotplug: drop artificial restriction on online/offline mm: memcontrol: account slab stats per lruvec mm: memcontrol: per-lruvec stats infrastructure mm: memcontrol: use generic mod_memcg_page_state for kmem pages mm: memcontrol: use the node-native slab memory counters mm: vmstat: move slab statistics from zone to node counters mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare() mm/zswap.c: improve a size determination in zswap_frontswap_init() mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create() mm/swapfile.c: sort swap entries before free mm/oom_kill: count global and memory cgroup oom kills mm: per-cgroup memory reclaim stats mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects mm: kmemleak: factor object reference updating out of scan_block() mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures mm, mempolicy: don't check cpuset seqlock where it doesn't matter mm, cpuset: always use seqlock when changing task's nodemask mm, mempolicy: simplify rebinding mempolicies when updating cpusets ...
2017-07-06  Merge branch 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  (Linus Torvalds)
Pull user access str* updates from Al Viro: "uaccess str...() dead code removal" * 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: s390 keyboard.c: don't open-code strndup_user() mips: get rid of unused __strnlen_user() get rid of unused __strncpy_from_user() instances kill strlen_user()
2017-07-06  Merge branch 'work.probe_kernel_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  (Linus Torvalds)
Pull probe_kernel_read() uses from Al Viro: "Several open-coded probe_kernel_read()..." * 'work.probe_kernel_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: dio: use probe_kernel_read() hp_sdc: use probe_kernel_read() hpfb: use probe_kernel_read()
2017-07-06  Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  (Linus Torvalds)
Pull misc compat stuff updates from Al Viro: "This part is basically untangling various compat stuff. Compat syscalls moved to their native counterparts, getting rid of quite a bit of double-copying and/or set_fs() uses. A lot of field-by-field copyin/copyout killed off. - kernel/compat.c is much closer to containing just the copyin/copyout of compat structs. Not all compat syscalls are gone from it yet, but it's getting there. - ipc/compat_mq.c killed off completely. - block/compat_ioctl.c cleaned up; floppy compat ioctls moved to drivers/block/floppy.c where they belong. Yes, there are several drivers that implement some of the same ioctls. Some are m68k and one is 32bit-only pmac. drivers/block/floppy.c is the only one in that bunch that can be built on biarch" * 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: mqueue: move compat syscalls to native ones usbdevfs: get rid of field-by-field copyin compat_hdio_ioctl: get rid of set_fs() take floppy compat ioctls to sodding floppy.c ipmi: get rid of field-by-field __get_user() ipmi: get COMPAT_IPMICTL_RECEIVE_MSG in sync with the native one rt_sigtimedwait(): move compat to native select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap() put_compat_rusage(): switch to copy_to_user() sigpending(): move compat to native getrlimit()/setrlimit(): move compat to native times(2): move compat to native compat_{get,put}_bitmap(): use unsafe_{get,put}_user() fb_get_fscreeninfo(): don't bother with do_fb_ioctl() do_sigaltstack(): lift copying to/from userland into callers take compat_sys_old_getrlimit() to native syscall trim __ARCH_WANT_SYS_OLD_GETRLIMIT
2017-07-06  Merge branch 'work.drm' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  (Linus Torvalds)
Pull DRM compat ioctl handling updates from Al Viro: "This kills the double-copies in there and tons of field-by-field copyin/copyout. Several dead ioctls put to rest, while we are at it - the native counterparts had been gone for a decade, so we can bloody well fail early on the compat side. No point rearranging the 32bit structure into 64bit one (and back) only to be told "piss off, I don't know that ioctl" by the native code..." * 'work.drm' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (29 commits) Fix trivial misannotations mga: switch compat ioctls to drm_ioctl_kernel() radeon: take out dead compat ioctls drm compat: ia64 is not biarch drm_compat_ioctl(): tidy up a bit switch compat_drm_mapbufs() to drm_ioctl_kernel() switch compat_drm_rmmap() to drm_ioctl_kernel() switch compat_drm_mode_addfb2() to drm_ioctl_kernel() switch compat_drm_wait_vblank() to drm_ioctl_kernel() switch compat_drm_update_draw() compat_drm: switch sg ioctls compat_drm: switch AGP compat ioctls to drm_ioctl_kernel() switch compat_drm_dma() to drm_ioctl_kernel() switch compat_drm_resctx() to drm_ioctl_kernel() switch compat_drm_getsareactx() to drm_ioctl_kernel() switch compat_drm_setsareactx() to drm_ioctl_kernel() switch compat_drm_freebufs() to drm_ioctl_kernel() switch compat_drm_markbufs() to drm_ioctl_kernel() switch compat_drm_addmap() to drm_ioctl_kernel() switch compat_drm_getstats() to drm_ioctl_kernel() ...
2017-07-06  Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping  (Linus Torvalds)
Pull dma-mapping infrastructure from Christoph Hellwig: "This is the first pull request for the new dma-mapping subsystem In this new subsystem we'll try to properly maintain all the generic code related to dma-mapping, and will further consolidate arch code into common helpers. This pull request contains: - removal of the DMA_ERROR_CODE macro, replacing it with calls to ->mapping_error so that the dma_map_ops instances are more self contained and can be shared across architectures (me) - removal of the ->set_dma_mask method, which duplicates the ->dma_capable one in terms of functionality, but requires more duplicate code. - various updates for the coherent dma pool and related arm code (Vladimir) - various smaller cleanups (me)" * tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits) ARM: dma-mapping: Remove traces of NOMMU code ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus ARM: NOMMU: Introduce dma operations for noMMU drivers: dma-mapping: allow dma_common_mmap() for NOMMU drivers: dma-coherent: Introduce default DMA pool drivers: dma-coherent: Account dma_pfn_offset when used with device tree dma: Take into account dma_pfn_offset dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs dma-mapping: remove dmam_free_noncoherent crypto: qat - avoid an uninitialized variable warning au1100fb: remove a bogus dma_free_nonconsistent call MAINTAINERS: add entry for dma mapping helpers powerpc: merge __dma_set_mask into dma_set_mask dma-mapping: remove the set_dma_mask method powerpc/cell: use the dma_supported method for ops switching powerpc/cell: clean up fixed mapping dma_ops initialization tile: remove dma_supported and mapping_error methods xen-swiotlb: remove xen_swiotlb_set_dma_mask arm: implement ->dma_supported instead of ->set_dma_mask mips/loongson64: implement ->dma_supported instead of ->set_dma_mask ...
2017-07-06  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux  (Linus Torvalds)
Pull more s390 updates from Martin Schwidefsky: - The fixup for the blk-mq clash with the scm driver - An improvement for the dasd driver in regard to raw I/O - Bug fixes and cleanup * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: Update my email address s390/syscalls: Fix out of bounds arguments access s390/vfio_ccw: remove unused variable s390/dasd: remove unneeded code s390/crash: Remove unused KEXEC_NOTE_BYTES s390/zcrypt: Fix missing newlines at some debug feature messages. s390/dasd: Make raw I/O usable without prefix support s390/dasd: Rename dasd_raw_build_cp() s390/dasd: Refactor prefix_LRE() and related functions s390: fix up for "blk-mq: switch ->queue_rq return value to blk_status_t"
2017-07-06  Merge tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip  (Linus Torvalds)
Pull xen updates from Juergen Gross: "Other than fixes and cleanups it contains: - support > 32 VCPUs at domain restore - support for new sysfs nodes related to Xen - some performance tuning for Linux running as Xen guest" * tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: allow userspace access during hypercalls x86: xen: remove unnecessary variable in xen_foreach_remap_area() xen: allocate page for shared info page from low memory xen: avoid deadlock in xenbus driver xen: add sysfs node for hypervisor build id xen: sync include/xen/interface/version.h xen: add sysfs node for guest type doc,xen: document hypervisor sysfs nodes for xen xen/vcpu: Handle xen_vcpu_setup() failure at boot xen/vcpu: Handle xen_vcpu_setup() failure in hotplug xen/pv: Fix OOPS on restore for a PV, !SMP domain xen/pvh*: Support > 32 VCPUs at domain restore xen/vcpu: Simplify xen_vcpu related code xen-evtchn: Bind dyn evtchn:qemu-dm interrupt to next online VCPU xen: avoid type warning in xchg_xen_ulong xen: fix HYPERVISOR_dm_op() prototype xen: don't print error message in case of missing Xenstore entry arm/xen: Adjust one function call together with a variable assignment arm/xen: Delete an error message for a failed memory allocation in __set_phys_to_machine_multi() arm/xen: Improve a size determination in __set_phys_to_machine_multi()
2017-07-07  IB/core: Fix static analysis warning in ib_policy_change_task  (Daniel Jurgens)
ib_get_cached_subnet_prefix can technically fail, but the only way it could is not possible based on the loop conditions. Check the return value before using the variable sp to resolve a static analysis warning. -v1: - Fix check to !ret. Paul Moore Fixes: 8f408ab64be6 ("selinux lsm IB/core: Implement LSM notification system") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-07  IB/core: Fix uninitialized variable use in check_qp_port_pkey_settings  (Daniel Jurgens)
Check the return value from get_pkey_and_subnet_prefix to prevent using uninitialized variables. Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-07  tpm: do not suspend/resume if power stays on  (Enric Balletbo i Serra)
The suspend/resume behavior of the TPM can be controlled by setting "powered-while-suspended" in the DTS. This is useful for the cases when hardware does not power-off the TPM. Signed-off-by: Sonny Rao <sonnyrao@chromium.org> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
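A minimal sketch of how a driver might consume such a property; the flag name TPM_CHIP_FLAG_ALWAYS_POWERED and the exact location of the devicetree node are assumptions here:

    /* At chip registration: note that TPM power is kept across suspend. */
    if (of_property_read_bool(dev->of_node, "powered-while-suspended"))
        chip->flags |= TPM_CHIP_FLAG_ALWAYS_POWERED;

    /* In the PM suspend handler: skip the usual save-state sequence. */
    if (chip->flags & TPM_CHIP_FLAG_ALWAYS_POWERED)
        return 0;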
2017-07-07  tpm: use tpm2_pcr_read() in tpm2_do_selftest()  (Roberto Sassu)
tpm2_do_selftest() performs a PCR read during the TPM initialization phase. This patch replaces the PCR read code with a call to tpm2_pcr_read(). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkine@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-07  tpm: use tpm_buf functions in tpm2_pcr_read()  (Roberto Sassu)
tpm2_pcr_read() now builds the PCR read command buffer with tpm_buf functions. This solution is preferred to using a tpm2_cmd structure, as tpm_buf functions provide protection against buffer overflow. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
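For context, the tpm_buf helpers build a command incrementally and keep track of the remaining space, which is what provides the overflow protection mentioned above. A hedged sketch of a TPM2 PCR read assembled this way (the PCR-select encoding is abbreviated, and pcr_select is assumed to be a small bitmap prepared by the caller):

    struct tpm_buf buf;
    int rc;

    rc = tpm_buf_init(&buf, TPM2_ST_NO_SESSIONS, TPM2_CC_PCR_READ);
    if (rc)
        return rc;

    tpm_buf_append_u32(&buf, 1);                  /* one TPMS_PCR_SELECTION */
    tpm_buf_append_u16(&buf, TPM2_ALG_SHA1);      /* bank to read */
    tpm_buf_append_u8(&buf, TPM2_PCR_SELECT_MIN); /* size of the select bitmap */
    tpm_buf_append(&buf, pcr_select, TPM2_PCR_SELECT_MIN);

    /* ... transmit buf.data via the chip's command path, parse the digest ... */
    tpm_buf_destroy(&buf);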
2017-07-07  tpm_tis: make ilb_base_addr static  (Colin Ian King)
The pointer ilb_base_addr does not need to be in global scope, so make it static. Cleans up sparse warning: "symbol 'ilb_base_addr' was not declared. Should it be static?" Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-07  tpm: consolidate the TPM startup code  (Jarkko Sakkinen)
Consolidated all the "manual" TPM startup code to a single function in order to make code flows a bit cleaner and migrate to tpm_buf. Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-07  tpm: Enable CLKRUN protocol for Braswell systems  (Azhar Shaikh)
To overcome a hardware limitation on Intel Braswell systems, disable CLKRUN protocol during TPM transactions and re-enable once the transaction is completed. Signed-off-by: Azhar Shaikh <azhar.shaikh@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-07  tpm/tpm_crb: fix priv->cmd_size initialisation  (Manuel Lauss)
priv->cmd_size is never initialised if the cmd and rsp buffers reside at different addresses. Initialise it in the exit path of the function when rsp buffer has also been successfully allocated. Fixes: aa77ea0e43dc ("tpm/tpm_crb: cache cmd_size register value."). Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-07  tpm: fix a kernel memory leak in tpm-sysfs.c  (Jarkko Sakkinen)
While cleaning up sysfs callback that prints EK we discovered a kernel memory leak. This commit fixes the issue by zeroing the buffer used for TPM command/response. The leak happen when we use either tpm_vtpm_proxy, tpm_ibmvtpm or xen-tpmfront. Cc: stable@vger.kernel.org Fixes: 0883743825e3 ("TPM: sysfs functions consolidation") Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-07  tpm: Issue a TPM2_Shutdown for TPM2 devices.  (Josh Zimmerman)
If a TPM2 loses power without a TPM2_Shutdown command being issued (a "disorderly reboot"), it may lose some state that has yet to be persisted to NVRam, and will increment the DA counter. After the DA counter gets sufficiently large, the TPM will lock the user out. NOTE: This only changes behavior on TPM2 devices. Since TPM1 uses sysfs, and sysfs relies on implicit locking on chip->ops, it is not safe to allow this code to run in TPM1, or to add sysfs support to TPM2, until that locking is made explicit. Signed-off-by: Josh Zimmerman <joshz@google.com> Cc: stable@vger.kernel.org Fixes: 74d6b3ceaa17 ("tpm: fix suspend/resume paths for TPM 2.0") Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
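The TPM2-only shutdown this refers to can be sketched roughly as below; tpm2_shutdown() and TPM2_SU_CLEAR are existing names, while the helper itself and how it gets wired into the new class-level shutdown callback are assumptions:

    static void tpm_shutdown_tpm2(struct tpm_chip *chip)
    {
        down_write(&chip->ops_sem);
        tpm2_shutdown(chip, TPM2_SU_CLEAR);
        chip->ops = NULL;   /* refuse further commands once shut down */
        up_write(&chip->ops_sem);
    }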
2017-07-07Add "shutdown" to "struct class".Josh Zimmerman
The TPM class has some common shutdown code that must be executed for all drivers. This adds some needed functionality for that. Signed-off-by: Josh Zimmerman <joshz@google.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Fixes: 74d6b3ceaa17 ("tpm: fix suspend/resume paths for TPM 2.0") Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-07-06  mm, memory_hotplug: drop CONFIG_MOVABLE_NODE  (Michal Hocko)
Commit 20b2f52b73fe ("numa: add CONFIG_MOVABLE_NODE for movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a good explanation on why it is actually useful. It makes a lot of sense to make movable node semantic opt in but we already have that because the feature has to be explicitly enabled on the kernel command line. A config option on top only makes the configuration space larger without a good reason. It also adds an additional ifdefery that pollutes the code. Just drop the config option and make it de-facto always enabled. This shouldn't introduce any change to the semantic. Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kani Toshimitsu <toshi.kani@hpe.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06  mm: vmstat: move slab statistics from zone to node counters  (Johannes Weiner)
Patch series "mm: per-lruvec slab stats" Josef is working on a new approach to balancing slab caches and the page cache. For this to work, he needs slab cache statistics on the lruvec level. These patches implement that by adding infrastructure that allows updating and reading generic VM stat items per lruvec, then switches some existing VM accounting sites, including the slab accounting ones, to this new cgroup-aware API. I'll follow up with more patches on this, because there is actually substantial simplification that can be done to the memory controller when we replace private memcg accounting with making the existing VM accounting sites cgroup-aware. But this is enough for Josef to base his slab reclaim work on, so here goes. This patch (of 5): To re-implement slab cache vs. page cache balancing, we'll need the slab counters at the lruvec level, which, ever since lru reclaim was moved from the zone to the node, is the intersection of the node, not the zone, and the memcg. We could retain the per-zone counters for when the page allocator dumps its memory information on failures, and have counters on both levels - which on all but NUMA node 0 is usually redundant. But let's keep it simple for now and just move them. If anybody complains we can restore the per-zone counters. [hannes@cmpxchg.org: fix oops] Link: http://lkml.kernel.org/r/20170605183511.GA8915@cmpxchg.org Link: http://lkml.kernel.org/r/20170530181724.27197-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06  mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone  (Michal Hocko)
Heiko Carstens has noticed that he can generate overlapping zones for ZONE_DMA and ZONE_NORMAL: DMA [mem 0x0000000000000000-0x000000007fffffff] Normal [mem 0x0000000080000000-0x000000017fffffff] $ cat /sys/devices/system/memory/block_size_bytes 10000000 $ cat /sys/devices/system/memory/memory5/valid_zones DMA $ echo 0 > /sys/devices/system/memory/memory5/online $ cat /sys/devices/system/memory/memory5/valid_zones Normal $ echo 1 > /sys/devices/system/memory/memory5/online Normal $ cat /proc/zoneinfo Node 0, zone DMA spanned 524288 <----- present 458752 managed 455078 start_pfn: 0 <----- Node 0, zone Normal spanned 720896 present 589824 managed 571648 start_pfn: 327680 <----- The reason is that we assume that the default zone for kernel onlining is ZONE_NORMAL. This was a simplification introduced by the memory hotplug rework and it is easily fixable by checking the range overlap in the zone order and considering the first matching zone as the default one. If there is no such zone then assume ZONE_NORMAL as we have been doing so far. Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online" Link: http://lkml.kernel.org/r/20170601083746.4924-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
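Conceptually the default onlining zone becomes a range-overlap lookup instead of a hard-coded ZONE_NORMAL; a hedged sketch of the described approach, where the helper and loop are illustrative rather than the literal patch, and start_pfn/nr_pages/nid come from the online request:

    static bool zone_intersects(struct zone *zone,
                                unsigned long start_pfn, unsigned long nr_pages)
    {
        if (zone_is_empty(zone))
            return false;
        if (start_pfn >= zone_end_pfn(zone) ||
            start_pfn + nr_pages <= zone->zone_start_pfn)
            return false;
        return true;
    }

    /* Pick the lowest kernel zone overlapping the range, else ZONE_NORMAL. */
    for (zid = 0; zid <= ZONE_NORMAL; zid++) {
        struct zone *zone = &NODE_DATA(nid)->node_zones[zid];

        if (zone_intersects(zone, start_pfn, nr_pages))
            return zone;
    }
    return &NODE_DATA(nid)->node_zones[ZONE_NORMAL];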
2017-07-06  mm, memory_hotplug: do not associate hotadded memory to zones until online  (Michal Hocko)
The current memory hotplug implementation relies on having all the struct pages associate with a zone/node during the physical hotplug phase (arch_add_memory->__add_pages->__add_section->__add_zone). In the vast majority of cases this means that they are added to ZONE_NORMAL. This has been so since 9d99aaa31f59 ("[PATCH] x86_64: Support memory hotadd without sparsemem") and it wasn't a big deal back then because movable onlining didn't exist yet. Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable onlining 511c2aba8f07 ("mm, memory-hotplug: dynamic configure movable memory and portion memory") and then things got more complicated. Rather than reconsidering the zone association which was no longer needed (because the memory hotplug already depended on SPARSEMEM) a convoluted semantic of zone shifting has been developed. Only the currently last memblock or the one adjacent to the zone_movable can be onlined movable. This essentially means that the online type changes as the new memblocks are added. Let's simulate memory hot online manually $ echo 0x100000000 > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory32/valid_zones Normal Movable $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal Movable $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal /sys/devices/system/memory/memory34/valid_zones:Normal Movable $ echo online_movable > /sys/devices/system/memory/memory34/state $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory34/valid_zones:Movable Normal This is an awkward semantic because an udev event is sent as soon as the block is onlined and an udev handler might want to online it based on some policy (e.g. association with a node) but it will inherently race with new blocks showing up. This patch changes the physical online phase to not associate pages with any zone at all. All the pages are just marked reserved and wait for the onlining phase to be associated with the zone as per the online request. There are only two requirements - existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap - ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses the latter one is not an inherent requirement and can be changed in the future. It preserves the current behavior and made the code slightly simpler. This is subject to change in future. 
This means that the same physical online steps as above will lead to the following state: Normal Movable /sys/devices/system/memory/memory32/valid_zones:Normal Movable /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory32/valid_zones:Normal Movable /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory34/valid_zones:Normal Movable /sys/devices/system/memory/memory32/valid_zones:Normal Movable /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory34/valid_zones:Movable Implementation: The current move_pfn_range is reimplemented to check the above requirements (allow_online_pfn_range) and then updates the respective zone (move_pfn_range_to_zone), the pgdat and links all the pages in the pfn range with the zone/node. __add_pages is updated to not require the zone and only initializes sections in the range. This allowed to simplify the arch_add_memory code (s390 could get rid of quite some of code). devm_memremap_pages is the only user of arch_add_memory which relies on the zone association because it only hooks into the memory hotplug only half way. It uses it to associate the new memory with ZONE_DEVICE but doesn't allow it to be {on,off}lined via sysfs. This means that this particular code path has to call move_pfn_range_to_zone explicitly. The original zone shifting code is kept in place and will be removed in the follow up patch for an easier review. Please note that this patch also changes the original behavior when offlining a memory block adjacent to another zone (Normal vs. Movable) used to allow to change its movable type. This will be handled later. [richard.weiyang@gmail.com: simplify zone_intersects()] Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com [richard.weiyang@gmail.com: remove duplicate call for set_page_links] Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com [akpm@linux-foundation.org: remove unused local `i'] Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06  mm, memory_hotplug: consider offline memblocks removable  (Michal Hocko)
is_pageblock_removable_nolock() relies on having zone association to examine all the page blocks to check whether they are movable or free. This is just wasting of cycles when the memblock is offline. Later patch in the series will also change the time when the page is associated with a zone so we let's bail out early if the memblock is offline. Link: http://lkml.kernel.org/r/20170515085827.16474-7-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06  mm, memory_hotplug: split up register_one_node()  (Michal Hocko)
Memory hotplug (add_memory_resource) has to reinitialize node infrastructure if the node is offline (one which went through the complete add_memory(); remove_memory() cycle). That involves node registration to the kobj infrastructure (register_node), the proper association with cpus (register_cpu_under_node) and finally creation of node<->memblock symlinks (link_mem_sections). The last part requires to know node_start_pfn and node_spanned_pages which we currently have but a leter patch will postpone this initialization to the onlining phase which happens later. In fact we do not need to rely on the early pgdat initialization even now because the currently hot added pfn range is currently known. Split register_one_node into core which does all the common work for the boot time NUMA initialization and the hotplug (__register_one_node). register_one_node keeps the full initialization while hotplug calls __register_one_node and manually calls link_mem_sections for the proper range. This shouldn't introduce any functional change. Link: http://lkml.kernel.org/r/20170515085827.16474-6-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06  mm, memory_hotplug: get rid of is_zone_device_section  (Michal Hocko)
Device memory hotplug hooks into regular memory hotplug only half way. It needs memory sections to track struct pages but there is no need/desire to associate those sections with memory blocks and export them to the userspace via sysfs because they cannot be onlined anyway. This is currently expressed by for_device argument to arch_add_memory which then makes sure to associate the given memory range with ZONE_DEVICE. register_new_memory then relies on is_zone_device_section to distinguish special memory hotplug from the regular one. While this works now, later patches in this series want to move __add_zone outside of arch_add_memory path so we have to come up with something else. Add want_memblock down the __add_pages path and use it to control whether the section->memblock association should be done. arch_add_memory then just trivially want memblock for everything but for_device hotplug. remove_memory_section doesn't need is_zone_device_section either. We can simply skip all the memblock specific cleanup if there is no memblock for the given section. This shouldn't introduce any functional change. Link: http://lkml.kernel.org/r/20170515085827.16474-5-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06  mm: drop page_initialized check from get_nid_for_pfn  (Michal Hocko)
Commit c04fc586c1a4 ("mm: show node to memory section relationship with symlinks in sysfs") has added means to export memblock<->node association into the sysfs. It has also introduced get_nid_for_pfn which is a rather confusing counterpart of pfn_to_nid which checks also whether the pfn page is already initialized (page_initialized). This is done by checking page::lru != NULL which doesn't make any sense at all. Nothing in this path really relies on the lru list being used or initialized. Just remove it because this will become a problem with later patches. Thanks to Reza Arbab for testing which revealed this to be a problem (http://lkml.kernel.org/r/20170403202337.GA12482@dhcp22.suse.cz) Link: http://lkml.kernel.org/r/20170515085827.16474-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06  zram: count same page write as page_stored  (Minchan Kim)
Regardless of whether it is same page or not, it's surely write and stored to zram so we should increase pages_stored stat. Otherwise, user can see zero value via mm_stats although he writes a lot of pages to zram. Link: http://lkml.kernel.org/r/1494834068-27004-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
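Illustratively (a sketch, not the exact hunk; the helper call and its signature are assumptions), the same-element write path simply bumps the same counter the regular store path uses:

    if (page_same_filled(mem, &element)) {   /* "same page" fast path */
        atomic64_inc(&zram->stats.same_pages);
        atomic64_inc(&zram->stats.pages_stored); /* count it as stored too */
        return 0;
    }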
2017-07-06  mm, sparsemem: break out of loops early  (Dave Hansen)
There are a number of times that we loop over NR_MEM_SECTIONS, looking for section_present() on each section. But, when we have very large physical address spaces (large MAX_PHYSMEM_BITS), NR_MEM_SECTIONS becomes very large, making the loops quite long. With MAX_PHYSMEM_BITS=46 and a section size of 128MB, the current loops are 512k iterations, which we barely notice on modern hardware. But, raising MAX_PHYSMEM_BITS higher (like we will see on systems that support 5-level paging) makes this 64x longer and we start to notice, especially on slower systems like simulators. A 10-second delay for 512k iterations is annoying. But, a 640-second delay is crippling. This does not help if we have extremely sparse physical address spaces, but those are quite rare. We expect that most of the "slow" systems where this matters will also be quite small and non-sparse. To fix this, we track the highest section we've ever encountered. This lets us know when we will *never* see another section_present(), and lets us break out of the loops earlier. Doing the whole for_each_present_section_nr() macro is probably overkill, but it will ensure that any future loop iterations that we grow are more likely to be correct. Kirill said "It shaved almost 40 seconds from boot time in qemu with 5-level paging enabled for me". Link: http://lkml.kernel.org/r/20170504174434.C45A4735@viggo.jf.intel.com Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
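The tracking idea can be sketched like this; the names follow the description above and should be treated as assumptions rather than the literal patch:

    static unsigned long __highest_present_section_nr; /* highest section ever marked present */

    static void section_mark_present(struct mem_section *ms)
    {
        unsigned long section_nr = __section_nr(ms);

        if (section_nr > __highest_present_section_nr)
            __highest_present_section_nr = section_nr;

        ms->section_mem_map |= SECTION_MARKED_PRESENT;
    }

    /* Loops can then stop at the highest present section instead of
     * walking all NR_MEM_SECTIONS entries. */
    for (nr = 0; nr <= __highest_present_section_nr; nr++) {
        if (!present_section_nr(nr))
            continue;
        /* ... per-section work ... */
    }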
2017-07-06  drivers/sh/intc/virq.c: delete an error message for a failed memory allocation in add_virq_to_pirq()  (SF Markus Elfring)
This issue was detected by using the Coccinelle software. Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf Link: http://lkml.kernel.org/r/54e30d61-5183-9911-cf35-1410fb78da5a@users.sourceforge.net Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>