Age | Commit message | Author
2020-12-07 | scsi: ufs: Adjust ufshcd_hold() during sending attribute requests | jintae jang
The argument validity checks should be performed before calling ufshcd_hold(). This prevents ufshcd_hold()/ufshcd_release() from being invoked unnecessarily.

[mkp: removed unused out: labels]

Link: https://lore.kernel.org/r/1606973132-5937-1-git-send-email-user@jang-Samsung-DeskTop-System
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: jintae jang <jt77.jang@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | bridge: Fix a deadlock when enabling multicast snooping | Joseph Huang
When enabling multicast snooping, the bridge module deadlocks on multicast_lock if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 network.

The deadlock was caused by the following sequence: while holding the lock, br_multicast_open calls br_multicast_join_snoopers, which eventually causes the IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). Since the destination Ethernet address is a multicast address, br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, which in turn calls br_multicast_add_group, which then deadlocks on multicast_lock.

The fix is to move the call to br_multicast_join_snoopers outside of the critical section. This works since br_multicast_join_snoopers only deals with IP and does not modify any multicast data structures of the bridge, so there's no need to hold the lock.

Steps to reproduce:
1. sysctl net.ipv6.conf.all.force_mld_version=1
2. have another querier
3. ip link set dev bridge type bridge mcast_snooping 0 && \
   ip link set dev bridge type bridge mcast_snooping 1 < deadlock >

A typical call trace looks like the following:

[ 936.251495] _raw_spin_lock+0x5c/0x68
[ 936.255221] br_multicast_add_group+0x40/0x170 [bridge]
[ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge]
[ 936.265322] br_dev_xmit+0x140/0x368 [bridge]
[ 936.269689] dev_hard_start_xmit+0x94/0x158
[ 936.273876] __dev_queue_xmit+0x5ac/0x7f8
[ 936.277890] dev_queue_xmit+0x10/0x18
[ 936.281563] neigh_resolve_output+0xec/0x198
[ 936.285845] ip6_finish_output2+0x240/0x710
[ 936.290039] __ip6_finish_output+0x130/0x170
[ 936.294318] ip6_output+0x6c/0x1c8
[ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8
[ 936.301834] igmp6_send+0x358/0x558
[ 936.305326] igmp6_join_group.part.0+0x30/0xf0
[ 936.309774] igmp6_group_added+0xfc/0x110
[ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290
[ 936.317885] ipv6_dev_mc_inc+0x10/0x18
[ 936.321677] br_multicast_open+0xbc/0x110 [bridge]
[ 936.326506] br_multicast_toggle+0xec/0x140 [bridge]

Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-07 | ibmvnic: add some debugs | Sukadev Bhattiprolu
We sometimes run into situations where a soft/hard reset of the adapter takes a long time or fails to complete. Having additional messages that include important adapter state info will hopefully help us understand what is happening, reduce the guesswork and minimize requests to reproduce problems with debug patches.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Link: https://lore.kernel.org/r/20201205022235.2414110-1-sukadev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-07 | enetc: Fix reporting of h/w packet counters | Claudiu Manoil
Noticed some inconsistencies in packet statistics reporting. This patch adds the missing Tx packet counter registers to ethtool reporting and fixes the information strings for a few of them.

Fixes: 16eb4c85c964 ("enetc: Add ethtool statistics")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201204171505.21389-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-07 | null_blk: Move driver into its own directory | Damien Le Moal
Move null_blk driver code into the new sub-directory drivers/block/null_blk.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | null_blk: Allow controlling max_hw_sectors limit | Damien Le Moal
Add the module option and configfs attribute max_sectors to allow configuring the maximum size of a command issued to a null_blk device. This allows exercising the block layer BIO splitting with limits other than the default BLK_SAFE_MAX_SECTORS. This is also useful for testing the zone append write path of file systems, as the max_hw_sectors limit value is also used for the max_zone_append_sectors limit.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | null_blk: discard zones on reset | Damien Le Moal
When memory backing is enabled, use null_handle_discard() to free the backing memory used by a zone when the zone is being reset.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | null_blk: cleanup discard handling | Damien Le Moal
null_handle_discard() is called from both null_handle_rq() and null_handle_bio(). As these functions are only passed a nullb_cmd structure, this forces pointer dereferences to identify the discard operation code and to access the sector range to be discarded. Simplify all this by changing the interface of the functions null_handle_discard() and null_handle_memory_backed() to pass along the operation code, operation start sector and number of sectors. With this change null_handle_discard() can be called directly from null_handle_memory_backed().

Also add a warning message stating that the discard configuration attribute has no effect when memory backing is disabled.

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | null_blk: Improve implicit zone close | Damien Le Moal
When open zone resource management is enabled, that is, when a null_blk zoned device is created with zone_max_open different from 0, implicitly or explicitly opening a zone may require implicitly closing a zone that is already implicitly open. This operation is done using the function null_close_first_imp_zone(), which searches for an implicitly open zone to close starting from the first sequential zone. This implementation is simple but may result in the same zone being constantly implicitly closed and then implicitly reopened on write, namely, the lowest numbered zone that is being written.

Avoid this by starting the search for an implicitly open zone from the zone following the last zone that was implicitly closed. The function null_close_first_imp_zone() is renamed to null_close_imp_open_zone().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | null_blk: improve zone locking | Damien Le Moal
With memory backing disabled, using a single spinlock to protect zone information and zone resource management prevents IO requests to different zones from executing in parallel on multiple queues. Furthermore, regardless of the use of memory backing, if a null_blk device is created without limits on the number of open and active zones, accounting for zone resource management is not necessary. From these observations, zone locking is changed as follows to improve performance:
1) The zone_lock spinlock is renamed zone_res_lock and used only if zone resource management is necessary, that is, if either zone_max_open or zone_max_active is not 0. This is indicated using the new boolean need_zone_res_mgmt in the nullb_device structure. null_zone_write() is modified to reduce the amount of code executed with the zone_res_lock spinlock held.
2) With memory backing disabled, per zone locking is changed to a spinlock per zone.
3) Introduce the structure nullb_zone to replace the use of struct blk_zone for zone information. This new structure includes a union of a spinlock and a mutex for zone locking. The spinlock is used when memory backing is disabled and the mutex is used with memory backing.

With these changes, fio performance with zonemode=zbd for 4K random read and random write on a dual socket (24 cores per socket) machine using the none scheduler is as follows:

before patch:
  write (psync x 96 jobs) = 465 KIOPS
  read (libaio@qd=8 x 96 jobs) = 1361 KIOPS
after patch:
  write (psync x 96 jobs) = 456 KIOPS
  read (libaio@qd=8 x 96 jobs) = 4096 KIOPS

Write performance remains mostly unchanged but read performance is three times higher. Performance when using the mq-deadline scheduler is not changed by this patch as mq-deadline becomes the bottleneck for a multi-queue device.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | block: Align max_hw_sectors to logical blocksize | Damien Le Moal
Block device drivers do not have to call blk_queue_max_hw_sectors() to set a limit on request size if the default limit BLK_SAFE_MAX_SECTORS is acceptable. However, this limit (255 sectors) may not be aligned to the device logical block size, in which case it cannot be used as-is as the maximum request size. This is the case for the null_blk device driver.

Modify blk_queue_max_hw_sectors() to make sure that the request size limits specified by the max_hw_sectors and max_sectors queue limits are always aligned to the device logical block size. Additionally, to avoid introducing a dependency on the order in which this function and blk_queue_logical_block_size() are called, also modify blk_queue_logical_block_size() to perform the same alignment when the logical block size is set after max_hw_sectors.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | null_blk: Fail zone append to conventional zones | Damien Le Moal
Conventional zones do not have a write pointer and so cannot accept zone append writes. Make sure to fail any zone append write command issued to a conventional zone.

Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Fixes: e0489ed5daeb ("null_blk: Support REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | null_blk: Fix zone size initialization | Damien Le Moal
A null_blk device with zoned mode enabled is currently initialized with a number of zones equal to the device capacity divided by the zone size, without considering whether the device capacity is a multiple of the zone size. If the zone size is not a divisor of the capacity, the zones end up not covering the entire capacity, potentially resulting in out-of-bounds accesses to the zone array.

Fix this by adding one last smaller zone with a size equal to the remainder of the disk capacity divided by the zone size if the capacity is not a multiple of the zone size. For such a smaller last zone, the zone capacity is also checked so that it does not exceed the smaller zone size.

Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Fixes: ca4b2a011948 ("null_blk: add zone support")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | block: Improve blk_revalidate_disk_zones() checks | Damien Le Moal
Improve the checks on the zones of a zoned block device done in blk_revalidate_disk_zones() by making sure that the device report_zones method reported at least one zone and that the reported zones exactly cover the entire disk capacity, that is, that there are no missing zones at the end of the disk sector range.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | sbitmap: simplify wrap check | Pavel Begunkov
__sbitmap_get_word() doesn't wrap if it's starting from the beginning (i.e. the initial hint is 0). Instead of stashing the original hint, just set @wrap accordingly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | sbitmap: replace CAS with atomic and | Pavel Begunkov
sbitmap_deferred_clear() uses a CAS loop to propagate cleared bits; replace it with an equivalent atomic bitwise AND. That's slightly faster and makes it wait-free instead of lock-free as before.

The atomic can be relaxed (i.e. barrier-less) because the subsequent sbitmap_get*() calls deal with synchronisation, see comments in sbitmap_queue_clear().

It's OK to cast to atomic_long_t; that's what bitops/lock.h does.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | sbitmap: remove swap_lock | Pavel Begunkov
map->swap_lock protects map->cleared from concurrent modification; however, sbitmap_deferred_clear() already drains it atomically, so it's guaranteed not to lose bits on concurrent sbitmap_deferred_clear() calls.

A single-threaded tag-heavy test on top of null_blk showed a ~1.5% throughput increase, and a reduction of sbitmap_get() cycles from 3% to 1% according to perf.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | sbitmap: optimise sbitmap_deferred_clear() | Pavel Begunkov
Because of spinlocks and atomics, sbitmap_deferred_clear() has to reload &sb->map[index] on each access even though the map address won't change. Pass in the sbitmap_word instead of {sb, index}, so it's cached in a variable. This also improves code generation of sbitmap_find_bit_in_index().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | blk-mq: skip hybrid polling if iopoll doesn't spin | Pavel Begunkov
If blk_poll() is not going to spin (i.e. @spin=false), it must also not sleep in hybrid polling; otherwise it might be quite surprising for users trying to do a quick check and expecting no-wait behaviour.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 | scsi: ufs: Print host regs in IRQ handler when AH8 error happens | Can Guo
Dump registers and states prior to leaving the IRQ handler when an AH8 error occurs.

Link: https://lore.kernel.org/r/1606910644-21185-4-git-send-email-cang@codeaurora.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bao D. Nguyen <nguyenb@codeaurora.org>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: ufs: Fix a race condition between ufshcd_abort() and eh_work() | Can Guo
In the current task abort routine, if a task abort happens to the device W-LUN, the code directly jumps to ufshcd_eh_host_reset_handler() to perform a full reset and restore, then returns FAIL or SUCCESS. Commands sent to the device W-LUN are most likely the SSU cmds sent during UFS PM operations. If such an SSU cmd enters the task abort routine when ufshcd_eh_host_reset_handler() flushes eh_work, it will get stuck there since err_handler is serialized with PM operations.

In order to unblock the above call path, we merely clean up the lrb taken by this cmd, queue the eh_work and return SUCCESS. Once the cmd is aborted, the PM operation which sent out the cmd just errors out, and then err_handler is able to proceed with the full reset and restore.

In this scenario, the cmd is aborted even before it is actually cleared by HW, so set the lrb->in_use flag to prevent subsequent cmds, including SCSI cmds and dev cmds, from taking the lrb released from abort. The flag shall eventually be cleared in __ufshcd_transfer_req_compl() invoked by the full reset and restore from err_handler.

[mkp: conflict with event logging series]

Link: https://lore.kernel.org/r/1606910644-21185-3-git-send-email-cang@codeaurora.org
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: ufs: Serialize eh_work with system PM events and async scan | Can Guo
Serialize eh_work with system PM events and async scan to make sure eh_work does not run in parallel with them.

Link: https://lore.kernel.org/r/1606910644-21185-2-git-send-email-cang@codeaurora.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-08 | powerpc/mm: Fix KUAP warning by providing copy_from_kernel_nofault_allowed() | Christophe Leroy
Since commit c33165253492 ("powerpc: use non-set_fs based maccess routines"), userspace access is not granted anymore when using copy_from_kernel_nofault().

However, kthread_probe_data() uses copy_from_kernel_nofault() to check validity of pointers. When the pointer is NULL, it points to userspace, leading to a KUAP fault and triggering the following big hammer warning many times when you request a sysrq "show task":

[ 1117.202054] ------------[ cut here ]------------
[ 1117.202102] Bug: fault blocked by AP register !
[ 1117.202261] WARNING: CPU: 0 PID: 377 at arch/powerpc/include/asm/nohash/32/kup-8xx.h:66 do_page_fault+0x4a8/0x5ec
[ 1117.202310] Modules linked in:
[ 1117.202428] CPU: 0 PID: 377 Comm: sh Tainted: G W 5.10.0-rc5-01340-g83f53be2de31-dirty #4175
[ 1117.202499] NIP: c0012048 LR: c0012048 CTR: 00000000
[ 1117.202573] REGS: cacdbb88 TRAP: 0700 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.202625] MSR: 00021032 <ME,IR,DR,RI> CR: 24082222 XER: 20000000
[ 1117.202899]
[ 1117.202899] GPR00: c0012048 cacdbc40 c2929290 00000023 c092e554 00000001 c09865e8 c092e640
[ 1117.202899] GPR08: 00001032 00000000 00000000 00014efc 28082224 100d166a 100a0920 00000000
[ 1117.202899] GPR16: 100cac0c 100b0000 1080c3fc 1080d685 100d0000 100d0000 00000000 100a0900
[ 1117.202899] GPR24: 100d0000 c07892ec 00000000 c0921510 c21f4440 0000005c c0000000 cacdbc80
[ 1117.204362] NIP [c0012048] do_page_fault+0x4a8/0x5ec
[ 1117.204461] LR [c0012048] do_page_fault+0x4a8/0x5ec
[ 1117.204509] Call Trace:
[ 1117.204609] [cacdbc40] [c0012048] do_page_fault+0x4a8/0x5ec (unreliable)
[ 1117.204771] [cacdbc70] [c00112f0] handle_page_fault+0x8/0x34
[ 1117.204911] --- interrupt: 301 at copy_from_kernel_nofault+0x70/0x1c0
[ 1117.204979] NIP: c010dbec LR: c010dbac CTR: 00000001
[ 1117.205053] REGS: cacdbc80 TRAP: 0301 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.205104] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28082224 XER: 00000000
[ 1117.205416] DAR: 0000005c DSISR: c0000000
[ 1117.205416] GPR00: c0045948 cacdbd38 c2929290 00000001 00000017 00000017 00000027 0000000f
[ 1117.205416] GPR08: c09926ec 00000000 00000000 3ffff000 24082224
[ 1117.206106] NIP [c010dbec] copy_from_kernel_nofault+0x70/0x1c0
[ 1117.206202] LR [c010dbac] copy_from_kernel_nofault+0x30/0x1c0
[ 1117.206258] --- interrupt: 301
[ 1117.206372] [cacdbd38] [c004bbb0] kthread_probe_data+0x44/0x70 (unreliable)
[ 1117.206561] [cacdbd58] [c0045948] print_worker_info+0xe0/0x194
[ 1117.206717] [cacdbdb8] [c00548ac] sched_show_task+0x134/0x168
[ 1117.206851] [cacdbdd8] [c005a268] show_state_filter+0x70/0x100
[ 1117.206989] [cacdbe08] [c039baa0] sysrq_handle_showstate+0x14/0x24
[ 1117.207122] [cacdbe18] [c039bf18] __handle_sysrq+0xac/0x1d0
[ 1117.207257] [cacdbe48] [c039c0c0] write_sysrq_trigger+0x4c/0x74
[ 1117.207407] [cacdbe68] [c01fba48] proc_reg_write+0xb4/0x114
[ 1117.207550] [cacdbe88] [c0179968] vfs_write+0x12c/0x478
[ 1117.207686] [cacdbf08] [c0179e60] ksys_write+0x78/0x128
[ 1117.207826] [cacdbf38] [c00110d0] ret_from_syscall+0x0/0x34
[ 1117.207938] --- interrupt: c01 at 0xfd4e784
[ 1117.208008] NIP: 0fd4e784 LR: 0fe0f244 CTR: 10048d38
[ 1117.208083] REGS: cacdbf48 TRAP: 0c01 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.208134] MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 44002222 XER: 00000000
[ 1117.208470]
[ 1117.208470] GPR00: 00000004 7fc34090 77bfb4e0 00000001 1080fa40 00000002 7400000f fefefeff
[ 1117.208470] GPR08: 7f7f7f7f 10048d38 1080c414 7fc343c0 00000000
[ 1117.209104] NIP [0fd4e784] 0xfd4e784
[ 1117.209180] LR [0fe0f244] 0xfe0f244
[ 1117.209236] --- interrupt: c01
[ 1117.209274] Instruction dump:
[ 1117.209353] 714a4000 418200f0 73ca0001 40820084 73ca0032 408200f8 73c90040 4082ff60
[ 1117.209727] 0fe00000 3c60c082 386399f4 48013b65 <0fe00000> 80010034 3860000b 7c0803a6
[ 1117.210102] ---[ end trace 1927c0323393af3e ]---

To avoid that, copy_from_kernel_nofault_allowed() is used to check whether the address is a valid kernel address. But the default version of it returns true for any address.

Provide a powerpc version of copy_from_kernel_nofault_allowed() that returns false when the address is below TASK_USER_MAX, so that copy_from_kernel_nofault() will return -ERANGE.

Fixes: c33165253492 ("powerpc: use non-set_fs based maccess routines")
Reported-by: Qian Cai <qcai@redhat.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/18bcb456d32a3e74f5ae241fd6f1580c092d07f5.1607360230.git.christophe.leroy@csgroup.eu
2020-12-07 | scsi: ufs: Remove pre-defined initial voltage values of device power | Stanley Chu
The UFS specification allows different VCC configurations for UFS devices, for example:
(1) 2.70V - 3.60V (activated by default in the UFS core driver)
(2) 1.70V - 1.95V (activated if "vcc-supply-1p8" is declared in the device tree)
(3) 2.40V - 2.70V (supported since UFS 3.x)

With the introduction of UFS 3.x products, an issue arises: the UFS driver uses wrong "min_uV-max_uV" values to configure the voltage of the VCC regulator on UFS 3.x products that use configuration (3).

To solve this issue, simply remove the pre-defined initial VCC voltage values from the UFS core driver, for the following reasons:
1. UFS specifications do not define how to detect the VCC configuration supported by the attached device.
2. Device tree already supports standard regulator properties.

Therefore the VCC voltage shall be defined correctly in the device tree and shall not be changed by the UFS driver. All the UFS driver needs to do is enable or disable the VCC regulator.

A similar change is applied to VCCQ and VCCQ2 as well.

Note that struct ufs_vreg is kept unchanged. This allows vendors to configure proper min_uV and max_uV values for any regulator so that regulator_set_voltage() works during the regulator toggling flow in the future. Without vendor-specific configuration, min_uV and max_uV will be 0 by default and the UFS core driver will only enable or disable the regulator without adjusting its voltage.

Link: https://lore.kernel.org/r/20201202091819.22363-1-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: ufs-dwc: Use phy_initialization helper | Stanley Chu
Use phy_initialization helper instead of direct invocation.

Link: https://lore.kernel.org/r/20201205120041.26869-5-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: ufs-cdns: Use phy_initialization helper | Stanley Chu
Use phy_initialization helper instead of direct function invocation.

Link: https://lore.kernel.org/r/20201205120041.26869-4-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: ufs: Introduce phy_initialization helper | Stanley Chu
Introduce a phy_initialization helper since this is the only variant function without a helper.

Link: https://lore.kernel.org/r/20201205120041.26869-3-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: ufs: Remove unused setup_regulators variant function | Stanley Chu
Since the setup_regulators variant function is not used by any vendor, simply remove it.

Link: https://lore.kernel.org/r/20201205120041.26869-2-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: ufs-mediatek: Introduce event_notify implementation | Stanley Chu
Introduce event_notify implementation on MediaTek UFS platform. A vendor-specific tracepoint is added that can be used for debugging purposes.

Link: https://lore.kernel.org/r/20201205115901.26815-5-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: ufs: Introduce event_notify variant function | Stanley Chu
Introduce the event_notify variant function to allow vendors to get notified of important events and connect to any proprietary debugging facilities.

Link: https://lore.kernel.org/r/20201205115901.26815-4-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: ufs: Refine error history functions | Stanley Chu
The UFS error history contains not only a "history of errors" but also a log of some other events which are not defined as errors. This patch fixes the confusing naming of related functions and changes the approach for updating and printing history in preparation for the next patch.

This patch does not change any functionality.

Link: https://lore.kernel.org/r/20201205115901.26815-3-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: ufs: Add error history for abort event in UFS Device W-LUN | Stanley Chu
Add error history for abort event in UFS Device W-LUN. Use a specific value as the parameter of ufshcd_update_reg_hist() to identify the aborted tag or LUNs.

Link: https://lore.kernel.org/r/20201205115901.26815-2-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: iscsi: Fix inappropriate use of put_device() | Qinglang Miao
kfree(conn) is called inside put_device(&conn->dev), which could lead to a use-after-free. In addition, device_unregister() should be used here rather than put_device().

Link: https://lore.kernel.org/r/20201120074852.31658-1-miaoqinglang@huawei.com
Fixes: f3c893e3dbb5 ("scsi: iscsi: Fail session and connection on transport registration failure")
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: pm80xx: Fix error return in pm8001_pci_probe() | Zhang Qilong
The driver did not return an error in the case where pm8001_configure_phy_settings() failed.

Use rc to store the return value of pm8001_configure_phy_settings().

Link: https://lore.kernel.org/r/20201205115551.2079471-1-zhangqilong3@huawei.com
Fixes: 279094079a44 ("[SCSI] pm80xx: Phy settings support for motherboard controller.")
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | scsi: qedi: Fix missing destroy_workqueue() on error in __qedi_probe | Qinglang Miao
Add the missing destroy_workqueue() before returning from __qedi_probe in the error-handling path taken when creating the workqueue qedi->offload_thread fails.

Link: https://lore.kernel.org/r/20201109091518.55941-1-miaoqinglang@huawei.com
Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 | clk: renesas: r9a06g032: Drop __packed for portability | Geert Uytterhoeven
The R9A06G032 clock driver uses an array of packed structures to reduce kernel size. However, this array contains pointers, which are no longer naturally aligned and cannot be relocated on PPC64. Hence when compile-testing this driver on PPC64 with CONFIG_RELOCATABLE=y (e.g. PowerPC allyesconfig), the following warnings are produced:

WARNING: 136 bad relocations
c000000000616be3 R_PPC64_UADDR64 .rodata+0x00000000000cf338
c000000000616bfe R_PPC64_UADDR64 .rodata+0x00000000000cf370
...

Fix this by dropping the __packed attribute from the r9a06g032_clkdesc definition, trading a small size increase for portability.

This increases the 156-entry clock table by 1 byte per entry, but due to the compiler generating more efficient code for unpacked accesses, the net size increase is only 76 bytes (gcc 9.3.0 on arm32).

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 4c3d88526eba2143 ("clk: renesas: Renesas R9A06G032 clock driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20201130085743.1656317-1-geert+renesas@glider.be
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # PowerPC allyesconfig build
Acked-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2020-12-07 | clk: imx: scu: fix MXC_CLK_SCU module build break | Dong Aisheng
This issue can be reproduced by having a kernel config with CONFIG_IMX_MBOX=m and CONFIG_MXC_CLK_SCU=m. It's caused by the Makefile wanting to build clk-scu.o and clk-imx8qxp.o as different targets, but that doesn't work (e.g. MXC_CLK_SCU = y while CLK_IMX8QXP = n):

obj-$(CONFIG_MXC_CLK_SCU) += clk-imx-scu.o clk-imx-lpcg-scu.o
clk-imx-scu-$(CONFIG_CLK_IMX8QXP) += clk-scu.o clk-imx8qxp.o

Having MXC_CLK_SCU=y/m while CLK_IMX8QXP=n will cause a linker problem like below:

LD [M] drivers/clk/imx/clk-imx-scu.o
arm-poky-linux-gnueabi-ld: no input files

Make MXC_CLK_SCU un-selectable by users so it can only be selected by the CLK_IMX8QXP option, ensuring the two symbols are built together. Drop COMPILE_TEST too because this option isn't selectable anymore. We can remove it from MXC_CLK_SCU because CLK_IMX8QXP selects MXC_CLK_SCU which already has COMPILE_TEST.

Fixes: e0d0d4d86c766 ("clk: imx8qxp: Support building i.MX8QXP clock driver as module")
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
Link: https://lore.kernel.org/r/20201130084624.21113-1-aisheng.dong@nxp.com
[sboyd@kernel.org: Rework commit text]
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2020-12-07selftests/clone3: Fix build errorXingxing Su
When the selftests are compiled without the -std=gnu99 option, the build can fail with the following error: test_core.c: In function ‘test_cgcore_destroy’: test_core.c:87:2: error: ‘for’ loop initial declarations are only allowed in C99 mode for (int i = 0; i < 10; i++) { ^ test_core.c:87:2: note: use option -std=c99 or -std=gnu99 to compile Add -std=gnu99 to the clone3 selftest Makefile to fix this. Signed-off-by: Xingxing Su <suxingxing@loongson.cn> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
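Here is a minimal example of the construct GCC rejects outside C99 mode: a declaration inside the `for` initializer, as at test_core.c:87. `sum_below()` is a hypothetical helper. GCC releases before 5 default to `-std=gnu89`, which would explain the error; with `-std=gnu99` or a later standard the code compiles.

```c
#include <assert.h>

/* Hypothetical helper: sums 0 .. n-1 using a C99 loop-scoped declaration. */
static int sum_below(int n)
{
	int total = 0;

	/* 'int i' declared in the initializer: rejected in C89/gnu89 mode. */
	for (int i = 0; i < n; i++)
		total += i;
	return total;
}
```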
2020-12-07rseq/selftests: Fix MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ build error under ↵Xingxing Su
other arch. rseq_offset_deref_addv() is only defined on x86, so on other architectures the membarrier test code that calls it fails to build. The RSEQ_ARCH_HAS_OFFSET_DEREF_ADD guard should cover all of the code for MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ. Any other architecture that implements this feature should define RSEQ_ARCH_HAS_OFFSET_DEREF_ADD in its header file so that the test becomes available. The build errors were: param_test.c: In function ‘test_membarrier_worker_thread’: param_test.c:1164:10: warning: implicit declaration of function ‘rseq_offset_deref_addv’ ret = rseq_offset_deref_addv(&args->percpu_list_ptr, ^~~~~~~~~~~~~~~~~~~~~~ /tmp/ccMj9yHJ.o: In function `test_membarrier_worker_thread': param_test.c:1164: undefined reference to `rseq_offset_deref_addv' param_test.c:1164: undefined reference to `rseq_offset_deref_addv' collect2: error: ld returned 1 exit status make: *** [/selftests/rseq/param_test_benchmark] Error 1 Signed-off-by: Xingxing Su <suxingxing@loongson.cn> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
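The guard pattern described here can be sketched as follows. `DEMO_ARCH_HAS_OFFSET_DEREF_ADD`, `demo_offset_deref_addv()` and `run_membarrier_test()` are hypothetical stand-ins for `RSEQ_ARCH_HAS_OFFSET_DEREF_ADD`, `rseq_offset_deref_addv()` and the membarrier test code. The real helper has a different signature and runs as an rseq critical section rather than a plain add.

```c
#include <assert.h>

/* Hypothetical per-arch header: only arches providing the helper define the macro. */
#if defined(__x86_64__) || defined(__i386__)
#define DEMO_ARCH_HAS_OFFSET_DEREF_ADD 1
/* Stand-in for the arch helper; here just a plain add. */
static int demo_offset_deref_addv(long *v, long inc)
{
	*v += inc;
	return 0;
}
#endif

/* All code that needs the helper lives inside the guard. */
static int run_membarrier_test(long *counter)
{
#ifdef DEMO_ARCH_HAS_OFFSET_DEREF_ADD
	return demo_offset_deref_addv(counter, 1);
#else
	(void)counter;
	return -1;	/* test skipped: feature not available on this arch */
#endif
}

/* Reports whether this build has the (hypothetical) feature. */
static int arch_has_offset_deref_add(void)
{
#ifdef DEMO_ARCH_HAS_OFFSET_DEREF_ADD
	return 1;
#else
	return 0;
#endif
}
```

Because the call sits inside the same `#ifdef` as the definition, architectures without the helper still build and simply skip the test.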
2020-12-07bpf: Avoid overflows involving hash elem_sizeEric Dumazet
Use of bpf_map_charge_init() was making sure hash tables would not use more than 4GB of memory. Since the implicit check disappeared, we have to be more careful about overflows, to support big hash tables. syzbot triggers a panic using: bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_LRU_HASH, key_size=16384, value_size=8, max_entries=262200, map_flags=0, inner_map_fd=-1, map_name="", map_ifindex=0, btf_fd=-1, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 64) = ... BUG: KASAN: vmalloc-out-of-bounds in bpf_percpu_lru_populate kernel/bpf/bpf_lru_list.c:594 [inline] BUG: KASAN: vmalloc-out-of-bounds in bpf_lru_populate+0x4ef/0x5e0 kernel/bpf/bpf_lru_list.c:611 Write of size 2 at addr ffffc90017e4a020 by task syz-executor.5/19786 CPU: 0 PID: 19786 Comm: syz-executor.5 Not tainted 5.10.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0x5/0x4c8 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562 bpf_percpu_lru_populate kernel/bpf/bpf_lru_list.c:594 [inline] bpf_lru_populate+0x4ef/0x5e0 kernel/bpf/bpf_lru_list.c:611 prealloc_init kernel/bpf/hashtab.c:319 [inline] htab_map_alloc+0xf6e/0x1230 kernel/bpf/hashtab.c:507 find_and_alloc_map kernel/bpf/syscall.c:123 [inline] map_create kernel/bpf/syscall.c:829 [inline] __do_sys_bpf+0xa81/0x5170 kernel/bpf/syscall.c:4336 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45deb9 Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fd93fbc0c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 0000000000001a40 RCX: 000000000045deb9 RDX: 0000000000000040 RSI: 0000000020000280 RDI: 0000000000000000 RBP: 000000000119bf60 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000119bf2c R13: 00007ffc08a7be8f R14: 00007fd93fbc19c0 R15: 000000000119bf2c Fixes: 755e5d55367a ("bpf: Eliminate rlimit-based memory accounting for hashtab maps") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Link: https://lore.kernel.org/bpf/20201207182821.3940306-1-eric.dumazet@gmail.com
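The following is a minimal sketch of the overflow class the title refers to. It assumes the problematic quantity is the product of a 32-bit `elem_size` and the number of entries. With the syzbot parameters and the per-element header ignored, `elem_size` is at least 16384 + 8 = 16392 bytes. Multiplied by 262200 entries that is 4,297,982,400 bytes, just over 4 GiB, so a 32-bit product wraps. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Product computed in 32-bit unsigned arithmetic: wraps modulo 2^32. */
static uint32_t total_size_u32(uint32_t elem_size, uint32_t num_entries)
{
	return elem_size * num_entries;
}

/* Widen one operand first so the multiplication is done in 64 bits. */
static uint64_t total_size_u64(uint32_t elem_size, uint32_t num_entries)
{
	return (uint64_t)elem_size * num_entries;
}
```

The wrapped 32-bit result is only about 3 MB, far smaller than the memory actually needed. Widening before multiplying keeps the full value; the actual kernel fix may differ in detail.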
2020-12-07RDMA/core: Fix empty gid table for non IB/RoCE devicesGal Pressman
The query_gid_table ioctl skips non IB/RoCE ports, which as a result returns an empty gid table for devices such as EFA which have a GID table, but are not IB/RoCE. Fixes: c4b4d548fabc ("RDMA/core: Introduce new GID table query API") Link: https://lore.kernel.org/r/20201206153238.34878-1-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07bcache: fix race between setting bdev state to none and new write request ↵Dongsheng Yang
direct to backing There is a race condition during detach between A (the detaching path) and B (a write request): (1) A: writeback runs. (2) A: writeback completes and the bdev state is set to clean. (3) A: cached_dev_put() and schedule_work(&dc->detach). (4) B: data [0 - 4K] is written directly to the backing device and acknowledged to the user. (5) Power failure. When this bcache device is restarted, the bdev is clean but not detached, so a read of [0 - 4K] returns stale data from the cache device. To fix this problem, set the bdev state to none once writeback completes during detach. Then, if a power failure happens as above, the cached data is not used the next time the bcache device starts, because the device is already detached, and the correct data is read directly from the backing device. Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
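The ordering argument can be replayed with a deliberately simplified toy model. `struct toy_dev`, its states and `toy_replay()` are hypothetical and do not reflect bcache's real data structures. The model only assumes, as the commit message does, that after a restart a device that is still attached can serve the read from the cache, while a device in the none state reads from the backing device.

```c
#include <assert.h>

/* Hypothetical, simplified model; not bcache's real structures. */
enum toy_bdev_state { TOY_STATE_NONE, TOY_STATE_CLEAN, TOY_STATE_DIRTY };

struct toy_dev {
	enum toy_bdev_state state;	/* assumed to survive the power failure */
	int cache_copy;			/* data for [0 - 4K] held in the cache */
	int backing_copy;		/* data for [0 - 4K] on the backing device */
};

/* After restart: an attached device serves the read from the cache. */
static int toy_read_after_restart(const struct toy_dev *d)
{
	return d->state == TOY_STATE_NONE ? d->backing_copy : d->cache_copy;
}

/* Replays steps (1)-(5); 'fixed' selects the state written at step (2). */
static int toy_replay(int fixed)
{
	struct toy_dev d = { TOY_STATE_DIRTY, 1, 0 };

	d.backing_copy = d.cache_copy;				/* (1) writeback */
	d.state = fixed ? TOY_STATE_NONE : TOY_STATE_CLEAN;	/* (2) writeback done */
	/* (3) detach work is scheduled but has not run yet */
	d.backing_copy = 2;					/* (4) new write goes to backing */
	/* (5) power failure: detach never completes */
	return toy_read_after_restart(&d);
}
```

`toy_replay(0)` returns the stale cached value 1, while `toy_replay(1)` returns the new value 2 written in step (4).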
2020-12-07blk-iocost: Factor out the base vrate change into a separate functionBaolin Wang
Factor out the base vrate change code into a separate function to simplify the ioc_timer_fn(). No functional change. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blk-iocost: Factor out the active iocgs' state check into a separate functionBaolin Wang
Factor out the iocgs' state check into a separate function to simplify the ioc_timer_fn(). No functional change. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blk-iocost: Move the usage ratio calculation to the correct placeBaolin Wang
We only use the hweight-based usage ratio to calculate the new hweight_inuse of an iocg when deciding whether that iocg can donate some surplus vtime. Thus, move the usage ratio calculation to where it is used, avoiding unnecessary calculation for iocgs that are short of vtime. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blk-iocost: Remove unnecessary advance declarationBaolin Wang
Remove unnecessary advance declaration of struct ioc_gq. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blk-iocost: Fix some typos in commentsBaolin Wang
Fix some typos in comments. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blktrace: fix up a kerneldoc commentChristoph Hellwig
Fixes: a54895fa057c ("block: remove the request_queue to argument request based tracepoints") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07x86/platform/uv: Update sysfs documentationMike Travis
Update sysfs documentation file to include moved /proc leaves. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lkml.kernel.org/r/20201128034227.120869-6-mike.travis@hpe.com
2020-12-07RDMA/iser: Remove in_interrupt() usageSebastian Andrzej Siewior
iser_initialize_task_headers() uses in_interrupt() to find out if it is safe to acquire a mutex. in_interrupt() is deprecated as it is ill-defined and does not provide what it suggests. Aside from that, it covers only some of the contexts in which a mutex may not be acquired. The following callchains exist: iscsi_queuecommand() *locks* iscsi_session::frwd_lock -> iscsi_prep_scsi_cmd_pdu() -> session->tt->init_task() (iscsi_iser_task_init()) -> iser_initialize_task_headers() -> iscsi_iser_task_xmit() (iscsi_transport::xmit_task) -> iscsi_iser_task_xmit_unsol_data() -> iser_send_data_out() -> iser_initialize_task_headers() iscsi_data_xmit() *locks* iscsi_session::frwd_lock -> iscsi_prep_mgmt_task() -> session->tt->init_task() (iscsi_iser_task_init()) -> iser_initialize_task_headers() -> iscsi_prep_scsi_cmd_pdu() -> session->tt->init_task() (iscsi_iser_task_init()) -> iser_initialize_task_headers() __iscsi_conn_send_pdu() caller has iscsi_session::frwd_lock -> iscsi_prep_mgmt_task() -> session->tt->init_task() (iscsi_iser_task_init()) -> iser_initialize_task_headers() -> session->tt->xmit_task() The only callchain that comes close to being invoked in preemptible context is: iscsi_xmitworker() worker -> iscsi_data_xmit() -> iscsi_xmit_task() -> conn->session->tt->xmit_task() (iscsi_iser_task_xmit()) In iscsi_iser_task_xmit() there is this check: if (!task->sc) return iscsi_iser_mtask_xmit(conn, task); so on this path iscsi_iser_task_xmit() only continues toward iser_initialize_task_headers() when task->sc is set, while the conditional locking in iser_initialize_task_headers() relies on iscsi_task::sc == NULL. Remove the conditional locking of iser_conn::state_mutex because no call chain needs it. Remove the goto label and return early, now that no cleanup is needed.
Link: https://lore.kernel.org/r/20201204174256.62xfcvudndt7oufl@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Max Gurtovoy <maxg@nvidia.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>