summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-04-19btrfs: exclude mmap from happening during all fallocate operationsJosef Bacik
There's a small window where a deadlock can happen between fallocate and mmap. This is described in detail by Filipe: """ When doing a fallocate operation we lock the inode, flush delalloc within the target range, wait for any ordered extents to complete and then lock the file range. Before we lock the range and after we flush delalloc, there is a time window where another task can come in and do a memory mapped write for a page within the fallocate range. This means that after fallocate locks the range, there can be a dirty page in the range. More often than not, this does not cause any problem. The exception is when we are low on available metadata space, because an fallocate operation needs to start a transaction while holding the file range locked, either through btrfs_prealloc_file_range() or through the call to btrfs_fallocate_update_isize(). If that's the case, we can end up in a deadlock. The following list of steps explains how that happens: 1) A fallocate operation starts, locks the inode, flushes delalloc in the range and waits for ordered extents in the range to complete; 2) Before the fallocate task locks the file range, another task does a memory mapped write for a page in the fallocate target range. This is possible since memory mapped writes do not (and can not) lock the inode; 3) The fallocate task locks the file range. At this point there is one dirty page in the range (due to the memory mapped write); 4) When the fallocate task attempts to start a transaction, it blocks when attempting to reserve metadata space, since we are low on available metadata space. Before blocking (wait on its reservation ticket), it starts the async reclaim task (if not running already); 5) The async reclaim task is not able to release space through any other means, so it decides to flush delalloc for inodes with dirty pages. It finds that the inode used in the fallocate operation has a dirty page and therefore queues a job (fs_info->flush_workers workqueue) to flush delalloc for that inode and waits on that job to complete; 6) The flush job blocks when attempting to lock the file range because it is currently locked by the fallocate task; 7) The fallocate task keeps waiting for its metadata reservation, waiting for a wakeup on its reservation ticket. The async reclaim task is waiting on the flush job, which in turn is waiting for locking the file range that is currently locked by the fallocate task. So unless some other task is able to release enough metadata space, for example an ordered extent for some other inode completes, we end up in a deadlock between all these tasks. When this happens stack traces like the following show up in dmesg/syslog: INFO: task kworker/u16:11:1810830 blocked for more than 120 seconds. Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u16:11 state:D stack: 0 pid:1810830 ppid: 2 flags:0x00004000 Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs] Call Trace: __schedule+0x5d1/0xcf0 schedule+0x45/0xe0 lock_extent_bits+0x1e6/0x2d0 [btrfs] ? finish_wait+0x90/0x90 btrfs_invalidatepage+0x32c/0x390 [btrfs] ? __mod_memcg_state+0x8e/0x160 __extent_writepage+0x2d4/0x400 [btrfs] extent_write_cache_pages+0x2b2/0x500 [btrfs] ? lock_release+0x20e/0x4c0 ? trace_hardirqs_on+0x1b/0xf0 extent_writepages+0x43/0x90 [btrfs] ? lock_acquire+0x1a3/0x490 do_writepages+0x43/0xe0 ? __filemap_fdatawrite_range+0xa4/0x100 __filemap_fdatawrite_range+0xc5/0x100 btrfs_run_delalloc_work+0x17/0x40 [btrfs] btrfs_work_helper+0xf1/0x600 [btrfs] process_one_work+0x24e/0x5e0 worker_thread+0x50/0x3b0 ? process_one_work+0x5e0/0x5e0 kthread+0x153/0x170 ? kthread_mod_delayed_work+0xc0/0xc0 ret_from_fork+0x22/0x30 INFO: task kworker/u16:1:2426217 blocked for more than 120 seconds. Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u16:1 state:D stack: 0 pid:2426217 ppid: 2 flags:0x00004000 Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] Call Trace: __schedule+0x5d1/0xcf0 ? kvm_clock_read+0x14/0x30 ? wait_for_completion+0x81/0x110 schedule+0x45/0xe0 schedule_timeout+0x30c/0x580 ? _raw_spin_unlock_irqrestore+0x3c/0x60 ? lock_acquire+0x1a3/0x490 ? try_to_wake_up+0x7a/0xa20 ? lock_release+0x20e/0x4c0 ? lock_acquired+0x199/0x490 ? wait_for_completion+0x81/0x110 wait_for_completion+0xab/0x110 start_delalloc_inodes+0x2af/0x390 [btrfs] btrfs_start_delalloc_roots+0x12d/0x250 [btrfs] flush_space+0x24f/0x660 [btrfs] btrfs_async_reclaim_metadata_space+0x1bb/0x480 [btrfs] process_one_work+0x24e/0x5e0 worker_thread+0x20f/0x3b0 ? process_one_work+0x5e0/0x5e0 kthread+0x153/0x170 ? kthread_mod_delayed_work+0xc0/0xc0 ret_from_fork+0x22/0x30 (...) several tasks waiting for the inode lock held by the fallocate task below (...) RIP: 0033:0x7f61efe73fff Code: Unable to access opcode bytes at RIP 0x7f61efe73fd5. RSP: 002b:00007ffc3371bbe8 EFLAGS: 00000202 ORIG_RAX: 000000000000013c RAX: ffffffffffffffda RBX: 00007ffc3371bea0 RCX: 00007f61efe73fff RDX: 00000000ffffff9c RSI: 0000560fbd5d90a0 RDI: 00000000ffffff9c RBP: 00007ffc3371beb0 R08: 0000000000000001 R09: 0000000000000003 R10: 0000560fbd5d7ad0 R11: 0000000000000202 R12: 0000000000000001 R13: 000000000000005e R14: 00007ffc3371bea0 R15: 00007ffc3371beb0 task:fdm-stress state:D stack: 0 pid:2508243 ppid:2508153 flags:0x00000000 Call Trace: __schedule+0x5d1/0xcf0 ? _raw_spin_unlock_irqrestore+0x3c/0x60 schedule+0x45/0xe0 __reserve_bytes+0x4a4/0xb10 [btrfs] ? finish_wait+0x90/0x90 btrfs_reserve_metadata_bytes+0x29/0x190 [btrfs] btrfs_block_rsv_add+0x1f/0x50 [btrfs] start_transaction+0x2d1/0x760 [btrfs] btrfs_replace_file_extents+0x120/0x930 [btrfs] ? btrfs_fallocate+0xdcf/0x1260 [btrfs] btrfs_fallocate+0xdfb/0x1260 [btrfs] ? filename_lookup+0xf1/0x180 vfs_fallocate+0x14f/0x440 ioctl_preallocate+0x92/0xc0 do_vfs_ioctl+0x66b/0x750 ? __do_sys_newfstat+0x53/0x60 __x64_sys_ioctl+0x62/0xb0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 """ Fix this by disallowing mmaps from happening while we're doing any of the fallocate operations on this inode. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: exclude mmaps while doing remapJosef Bacik
Darrick reported a potential issue to me where we could allow mmap writes after validating a page range matched in the case of dedupe. Generally we rely on lock page -> lock extent with the ordered flush to protect us, but this is done after we check the pages because we use the generic helpers, so we could modify the page in between doing the check and locking the range. There also exists a deadlock, as described by Filipe """ When cloning a file range, we lock the inodes, flush any delalloc within the respective file ranges, wait for any ordered extents and then lock the file ranges in both inodes. This means that right after we flush delalloc and before we lock the file ranges, memory mapped writes can come in and dirty pages in the file ranges of the clone operation. Most of the time this is harmless and causes no problems. However, if we are low on available metadata space, we can later end up in a deadlock when starting a transaction to replace file extent items. This happens if when allocating metadata space for the transaction, we need to wait for the async reclaim thread to release space and the reclaim thread needs to flush delalloc for the inode that got the memory mapped write and has its range locked by the clone task. Basically what happens is the following: 1) A clone operation locks inodes A and B, flushes delalloc for both inodes in the respective file ranges and waits for any ordered extents in those ranges to complete; 2) Before the clone task locks the file ranges, another task does a memory mapped write (which does not lock the inode) for one of the inodes of the clone operation. So now we have a dirty page in one of the ranges used by the clone operation; 3) The clone operation locks the file ranges for inodes A and B; 4) Later, when iterating over the file extents of inode A, the clone task attempts to start a transaction. There's not enough available free metadata space, so the async reclaim task is started (if not running already) and we wait for someone to wake us up on our reservation ticket; 5) The async reclaim task is not able to release space by any other means and decides to flush delalloc for the inode of the clone operation; 6) The workqueue job used to flush the inode blocks when starting delalloc for the inode, since the file range is currently locked by the clone task; 7) But the clone task is waiting on its reservation ticket and the async reclaim task is waiting on the flush job to complete, which can't progress since the clone task has the file range locked. So unless some other task is able to release space, for example an ordered extent for some other inode completes, we have a deadlock between all these tasks; When this happens stack traces like the following show up in dmesg/syslog: INFO: task kworker/u16:11:1810830 blocked for more than 120 seconds. Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u16:11 state:D stack: 0 pid:1810830 ppid: 2 flags:0x00004000 Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs] Call Trace: __schedule+0x5d1/0xcf0 schedule+0x45/0xe0 lock_extent_bits+0x1e6/0x2d0 [btrfs] ? finish_wait+0x90/0x90 btrfs_invalidatepage+0x32c/0x390 [btrfs] ? __mod_memcg_state+0x8e/0x160 __extent_writepage+0x2d4/0x400 [btrfs] extent_write_cache_pages+0x2b2/0x500 [btrfs] ? lock_release+0x20e/0x4c0 ? trace_hardirqs_on+0x1b/0xf0 extent_writepages+0x43/0x90 [btrfs] ? lock_acquire+0x1a3/0x490 do_writepages+0x43/0xe0 ? __filemap_fdatawrite_range+0xa4/0x100 __filemap_fdatawrite_range+0xc5/0x100 btrfs_run_delalloc_work+0x17/0x40 [btrfs] btrfs_work_helper+0xf1/0x600 [btrfs] process_one_work+0x24e/0x5e0 worker_thread+0x50/0x3b0 ? process_one_work+0x5e0/0x5e0 kthread+0x153/0x170 ? kthread_mod_delayed_work+0xc0/0xc0 ret_from_fork+0x22/0x30 INFO: task kworker/u16:1:2426217 blocked for more than 120 seconds. Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u16:1 state:D stack: 0 pid:2426217 ppid: 2 flags:0x00004000 Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] Call Trace: __schedule+0x5d1/0xcf0 ? kvm_clock_read+0x14/0x30 ? wait_for_completion+0x81/0x110 schedule+0x45/0xe0 schedule_timeout+0x30c/0x580 ? _raw_spin_unlock_irqrestore+0x3c/0x60 ? lock_acquire+0x1a3/0x490 ? try_to_wake_up+0x7a/0xa20 ? lock_release+0x20e/0x4c0 ? lock_acquired+0x199/0x490 ? wait_for_completion+0x81/0x110 wait_for_completion+0xab/0x110 start_delalloc_inodes+0x2af/0x390 [btrfs] btrfs_start_delalloc_roots+0x12d/0x250 [btrfs] flush_space+0x24f/0x660 [btrfs] btrfs_async_reclaim_metadata_space+0x1bb/0x480 [btrfs] process_one_work+0x24e/0x5e0 worker_thread+0x20f/0x3b0 ? process_one_work+0x5e0/0x5e0 kthread+0x153/0x170 ? kthread_mod_delayed_work+0xc0/0xc0 ret_from_fork+0x22/0x30 (...) several other tasks blocked on inode locks held by the clone task below (...) RIP: 0033:0x7f61efe73fff Code: Unable to access opcode bytes at RIP 0x7f61efe73fd5. RSP: 002b:00007ffc3371bbe8 EFLAGS: 00000202 ORIG_RAX: 000000000000013c RAX: ffffffffffffffda RBX: 00007ffc3371bea0 RCX: 00007f61efe73fff RDX: 00000000ffffff9c RSI: 0000560fbd604690 RDI: 00000000ffffff9c RBP: 00007ffc3371beb0 R08: 0000000000000002 R09: 0000560fbd5d75f0 R10: 0000560fbd5d81f0 R11: 0000000000000202 R12: 0000000000000002 R13: 000000000000000b R14: 00007ffc3371bea0 R15: 00007ffc3371beb0 task: fdm-stress state:D stack: 0 pid:2508234 ppid:2508153 flags:0x00004000 Call Trace: __schedule+0x5d1/0xcf0 ? _raw_spin_unlock_irqrestore+0x3c/0x60 schedule+0x45/0xe0 __reserve_bytes+0x4a4/0xb10 [btrfs] ? finish_wait+0x90/0x90 btrfs_reserve_metadata_bytes+0x29/0x190 [btrfs] btrfs_block_rsv_add+0x1f/0x50 [btrfs] start_transaction+0x2d1/0x760 [btrfs] btrfs_replace_file_extents+0x120/0x930 [btrfs] ? lock_release+0x20e/0x4c0 btrfs_clone+0x3e4/0x7e0 [btrfs] ? btrfs_lookup_first_ordered_extent+0x8e/0x100 [btrfs] btrfs_clone_files+0xf6/0x150 [btrfs] btrfs_remap_file_range+0x324/0x3d0 [btrfs] do_clone_file_range+0xd4/0x1f0 vfs_clone_file_range+0x4d/0x230 ? lock_release+0x20e/0x4c0 ioctl_file_clone+0x8f/0xc0 do_vfs_ioctl+0x342/0x750 __x64_sys_ioctl+0x62/0xb0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 """ Fix both of these issues by excluding mmaps from happening we are doing any sort of remap, which prevents this race completely. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: use btrfs_inode_lock/btrfs_inode_unlock inode lock helpersJosef Bacik
A few places we intermix btrfs_inode_lock with a inode_unlock, and some places we just use inode_lock/inode_unlock instead of btrfs_inode_lock. None of these places are using this incorrectly, but as we adjust some of these callers it would be nice to keep everything consistent, so convert everybody to use btrfs_inode_lock/btrfs_inode_unlock. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: add a i_mmap_lock to our inodeJosef Bacik
We need to be able to exclude page_mkwrite from happening concurrently with certain operations. To facilitate this, add a i_mmap_lock to our inode, down_read() it in our mkwrite, and add a new ILOCK flag to indicate that we want to take the i_mmap_lock as well. I used pahole to check the size of the btrfs_inode, the sizes are as follows no lockdep: before: 1120 (3 per 4k page) after: 1160 (3 per 4k page) lockdep: before: 2072 (1 per 4k page) after: 2224 (1 per 4k page) We're slightly larger but it doesn't change how many objects we can fit per page. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: remove mirror argument from btrfs_csum_verify_data()Goldwyn Rodrigues
The parameter mirror is not used and does not make sense for checksum verification of the given bio. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: remove force argument from run_delalloc_nocow()Goldwyn Rodrigues
force_cow can be calculated from inode and does not need to be passed as an argument. This simplifies run_delalloc_nocow() call from btrfs_run_delalloc_range() A new function, should_nocow() checks if the range should be NOCOWed or not. The function returns true iff either BTRFS_INODE_NODATA or BTRFS_INODE_PREALLOC, but is not a defrag extent. Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: don't opencode extent_changeset_freeNikolay Borisov
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: assign proper values to a bool variable in dev_extent_hole_check_zonedJiapeng Chong
Fix the following coccicheck warnings: ./fs/btrfs/volumes.c:1462:10-11: WARNING: return of 0/1 in function 'dev_extent_hole_check_zoned' with return type bool. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: add btree read ahead for incremental send operationsFilipe Manana
Currently we do not do btree read ahead when doing an incremental send, however we know that we will read and process any node or leaf in the send root that has a generation greater than the generation of the parent root. So triggering read ahead for such nodes and leafs is beneficial for an incremental send. This change does that, triggers read ahead of any node or leaf in the send root that has a generation greater then the generation of the parent root. As for the parent root, no readahead is triggered because knowing in advance which nodes/leaves are going to be read is not so linear and there's often a large time window between visiting nodes or leaves of the parent root. So I opted to leave out the parent root, and triggering read ahead for its nodes/leaves seemed to have not made significant difference. The following test script was used to measure the improvement on a box using an average, consumer grade, spinning disk and with 16GiB of ram: $ cat test.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj MKFS_OPTIONS="--nodesize 16384" # default, just to be explicit MOUNT_OPTIONS="-o max_inline=2048" # default, just to be explicit mkfs.btrfs -f $MKFS_OPTIONS $DEV > /dev/null mount $MOUNT_OPTIONS $DEV $MNT # Create files with inline data to make it easier and faster to create # large btrees. add_files() { local total=$1 local start_offset=$2 local number_jobs=$3 local total_per_job=$(($total / $number_jobs)) echo "Creating $total new files using $number_jobs jobs" for ((n = 0; n < $number_jobs; n++)); do ( local start_num=$(($start_offset + $n * $total_per_job)) for ((i = 1; i <= $total_per_job; i++)); do local file_num=$((start_num + $i)) local file_path="$MNT/file_${file_num}" xfs_io -f -c "pwrite -S 0xab 0 2000" $file_path > /dev/null if [ $? -ne 0 ]; then echo "Failed creating file $file_path" break fi done ) & worker_pids[$n]=$! done wait ${worker_pids[@]} sync echo echo "btree node/leaf count: $(btrfs inspect-internal dump-tree -t 5 $DEV | egrep '^(node|leaf) ' | wc -l)" } initial_file_count=500000 add_files $initial_file_count 0 4 echo echo "Creating first snapshot..." btrfs subvolume snapshot -r $MNT $MNT/snap1 echo echo "Adding more files..." add_files $((initial_file_count / 4)) $initial_file_count 4 echo echo "Updating 1/50th of the initial files..." for ((i = 1; i < $initial_file_count; i += 50)); do xfs_io -c "pwrite -S 0xcd 0 20" $MNT/file_$i > /dev/null done echo echo "Creating second snapshot..." btrfs subvolume snapshot -r $MNT $MNT/snap2 umount $MNT echo 3 > /proc/sys/vm/drop_caches blockdev --flushbufs $DEV &> /dev/null hdparm -F $DEV &> /dev/null mount $MOUNT_OPTIONS $DEV $MNT echo echo "Testing full send..." start=$(date +%s) btrfs send $MNT/snap1 > /dev/null end=$(date +%s) echo echo "Full send took $((end - start)) seconds" umount $MNT echo 3 > /proc/sys/vm/drop_caches blockdev --flushbufs $DEV &> /dev/null hdparm -F $DEV &> /dev/null mount $MOUNT_OPTIONS $DEV $MNT echo echo "Testing incremental send..." start=$(date +%s) btrfs send -p $MNT/snap1 $MNT/snap2 > /dev/null end=$(date +%s) echo echo "Incremental send took $((end - start)) seconds" umount $MNT Before this change, incremental send duration: with $initial_file_count == 200000: 51 seconds with $initial_file_count == 500000: 168 seconds After this change, incremental send duration: with $initial_file_count == 200000: 39 seconds (-26.7%) with $initial_file_count == 500000: 125 seconds (-29.4%) For $initial_file_count == 200000 there are 62600 nodes and leaves in the btree of the first snapshot, and 77759 nodes and leaves in the btree of the second snapshot. The root nodes were at level 2. While for $initial_file_count == 500000 there are 152476 nodes and leaves in the btree of the first snapshot, and 190511 nodes and leaves in the btree of the second snapshot. The root nodes were at level 2 as well. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: add btree read ahead for full send operationsFilipe Manana
When doing a full send we know that we are going to be reading every node and leaf of the send root, so we benefit from enabling read ahead for the btree. This change enables read ahead for full send operations only, incremental sends will have read ahead enabled in a different way by a separate patch. The following test script was used to measure the improvement on a box using an average, consumer grade, spinning disk and with 16GiB of RAM: $ cat test.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj MKFS_OPTIONS="--nodesize 16384" # default, just to be explicit MOUNT_OPTIONS="-o max_inline=2048" # default, just to be explicit mkfs.btrfs -f $MKFS_OPTIONS $DEV > /dev/null mount $MOUNT_OPTIONS $DEV $MNT # Create files with inline data to make it easier and faster to create # large btrees. add_files() { local total=$1 local start_offset=$2 local number_jobs=$3 local total_per_job=$(($total / $number_jobs)) echo "Creating $total new files using $number_jobs jobs" for ((n = 0; n < $number_jobs; n++)); do ( local start_num=$(($start_offset + $n * $total_per_job)) for ((i = 1; i <= $total_per_job; i++)); do local file_num=$((start_num + $i)) local file_path="$MNT/file_${file_num}" xfs_io -f -c "pwrite -S 0xab 0 2000" $file_path > /dev/null if [ $? -ne 0 ]; then echo "Failed creating file $file_path" break fi done ) & worker_pids[$n]=$! done wait ${worker_pids[@]} sync echo echo "btree node/leaf count: $(btrfs inspect-internal dump-tree -t 5 $DEV | egrep '^(node|leaf) ' | wc -l)" } initial_file_count=500000 add_files $initial_file_count 0 4 echo echo "Creating first snapshot..." btrfs subvolume snapshot -r $MNT $MNT/snap1 echo echo "Adding more files..." add_files $((initial_file_count / 4)) $initial_file_count 4 echo echo "Updating 1/50th of the initial files..." for ((i = 1; i < $initial_file_count; i += 50)); do xfs_io -c "pwrite -S 0xcd 0 20" $MNT/file_$i > /dev/null done echo echo "Creating second snapshot..." btrfs subvolume snapshot -r $MNT $MNT/snap2 umount $MNT echo 3 > /proc/sys/vm/drop_caches blockdev --flushbufs $DEV &> /dev/null hdparm -F $DEV &> /dev/null mount $MOUNT_OPTIONS $DEV $MNT echo echo "Testing full send..." start=$(date +%s) btrfs send $MNT/snap1 > /dev/null end=$(date +%s) echo echo "Full send took $((end - start)) seconds" umount $MNT echo 3 > /proc/sys/vm/drop_caches blockdev --flushbufs $DEV &> /dev/null hdparm -F $DEV &> /dev/null mount $MOUNT_OPTIONS $DEV $MNT echo echo "Testing incremental send..." start=$(date +%s) btrfs send -p $MNT/snap1 $MNT/snap2 > /dev/null end=$(date +%s) echo echo "Incremental send took $((end - start)) seconds" umount $MNT Before this change, full send duration: with $initial_file_count == 200000: 165 seconds with $initial_file_count == 500000: 407 seconds After this change, full send duration: with $initial_file_count == 200000: 149 seconds (-10.2%) with $initial_file_count == 500000: 353 seconds (-14.2%) For $initial_file_count == 200000 there are 62600 nodes and leaves in the btree of the first snapshot, while for $initial_file_count == 500000 there are 152476 nodes and leaves. The roots were at level 2. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: simplify code flow in btrfs_delayed_inode_reserve_metadataNikolay Borisov
btrfs_block_rsv_add can return only ENOSPC since it's called with NO_FLUSH modifier. This so simplify the logic in btrfs_delayed_inode_reserve_metadata to exploit this invariant. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add assert and comment ] Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: remove btrfs_inode parameter from btrfs_delayed_inode_reserve_metadataNikolay Borisov
It's only used for tracepoint to obtain the inode number, but we already have the ino from btrfs_delayed_node::inode_id. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: simplify commit logic in try_flush_qgroupNikolay Borisov
It's no longer expected to call this function with an open transaction so all the workarounds concerning this can be removed. In fact it'll constitute a bug to call this function with a transaction already held so WARN in this case. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: scrub: drop a few function declarationsAnand Jain
Drop function declarations at the beginning of the file scrub.c. These functions are defined before they are used in the same file and don't need forward declaration. No functional changes. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: change return type to bool in btrfs_extent_readonlyAnand Jain
btrfs_extent_readonly() checks if the block group is readonly, the bool return type should be used. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: unexport btrfs_extent_readonly() and make it staticAnand Jain
btrfs_extent_readonly() is used by can_nocow_extent() in inode.c. So move it from extent-tree.c to inode.c and declare it as static. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: replace open coded while loop with proper constructNikolay Borisov
btrfs_inc_block_group_ro wants to ensure that the current transaction is not running dirty block groups, if it is it waits and loops again. That logic is currently implemented using a goto label. Actually using a proper do {} while() construct doesn't hurt readability nor does it introduce excessive nesting and makes the relevant code stand out by being encompassed in the loop construct. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: replace offset_in_entry with in_rangeNikolay Borisov
No point in duplicating the functionality just use the generic helper that has the same semantics. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: make find_desired_extent take btrfs_inodeNikolay Borisov
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: make btrfs_replace_file_extents take btrfs_inodeNikolay Borisov
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: fix comment for btrfs ordered extent flag bitsQu Wenruo
There is small error in comment about BTRFS_ORDERED_* flags, added in commit 3c198fe06449 ("btrfs: rework the order of btrfs_ordered_extent::flags") but the fixup did not get merged in time. The 4 types are for ordered extent itself, not for direct io. Only 3 types support direct io, REGULAR/NOCOW/PREALLOC. Fix the comment to reflect that. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19Merge series "spi: stm32-qspi: Fix and update" from ↵Mark Brown
<patrice.chotard@foss.st.com> Patrice Chotard <patrice.chotard@foss.st.com>: From: Patrice Chotard <patrice.chotard@foss.st.com> Christophe Kerello (1): spi: stm32-qspi: fix pm_runtime usage_count counter Patrice Chotard (2): spi: stm32-qspi: Trigger DMA only if more than 4 bytes to transfer spi: stm32-qspi: Add dirmap support drivers/spi/spi-stm32-qspi.c | 106 +++++++++++++++++++++++++++-------- 1 file changed, 84 insertions(+), 22 deletions(-) -- 2.17.1 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
2021-04-19Merge tag 'qcom-dts-for-5.13-2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/dt More Qualcomm DTS updates for 5.13 This adds CPUfreq, interconnect providers, IPC, remoteproc and IPA to the SDX55 platform and then adds board files for the Telit FN980 TLB and Thundercomm TurboX T55. * tag 'qcom-dts-for-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: ARM: dts: qcom: sdx55: add IPA information ARM: dts: qcom: sdx55: Add basic devicetree support for Thundercomm T55 dt-bindings: arm: qcom: Add binding for Thundercomm T55 kit ARM: dts: qcom: sdx55: Add basic devicetree support for Telit FN980 TLB dt-bindings: arm: qcom: Add binding for Telit FN980 TLB board ARM: dts: qcom: sdx55: Add Modem remoteproc node ARM: dts: qcom: Fix node name for NAND controller node ARM: dts: qcom: sdx55: Add interconnect nodes ARM: dts: qcom: sdx55: Add SCM node dt-bindings: firmware: scm: Add compatible for SDX55 ARM: dts: qcom: sdx55: Add IMEM and PIL info region ARM: dts: qcom: sdx55: Add modem SMP2P node ARM: dts: qcom: sdx55: Add CPUFreq support ARM: dts: qcom: sdx55: Add support for APCS block ARM: dts: qcom: sdx55: Add support for A7 PLL clock Link: https://lore.kernel.org/r/20210419150956.860423-1-bjorn.andersson@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-19arm64: dts: qcom: sc7180: Update iommu property for simultaneous playbackV Sujith Kumar Reddy
Update iommu property in lpass cpu node for supporting simultaneous playback on headset and speaker. Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: V Sujith Kumar Reddy <vsujithk@codeaurora.org> Signed-off-by: Srinivasa Rao Mandadapu <srivasam@codeaurora.org> Link: https://lore.kernel.org/r/20210406163330.11996-1-srivasam@codeaurora.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2021-04-19arm64: dts: qcom: sc7180: pompom: Add "dmic_clk_en" + sound modelDouglas Anderson
Match what's downstream for this board. Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Cc: Srinivasa Rao Mandadapu <srivasam@codeaurora.org> Cc: Ajit Pandey <ajitp@codeaurora.org> Cc: Judy Hsiao <judyhsiao@chromium.org> Cc: Cheng-Yi Chiang <cychiang@chromium.org> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20210315133924.v2.2.If218189eff613a6c48ba12d75fad992377d8f181@changeid Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2021-04-19arm64: dts: qcom: sc7180: coachz: Add "dmic_clk_en"Douglas Anderson
This was present downstream. Add upstream too. NOTE: upstream I managed to get some sort of halfway state and got one pinctrl entry in the coachz-r1 device tree. Remove that as part of this since it's now in the dtsi. Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Cc: Srinivasa Rao Mandadapu <srivasam@codeaurora.org> Cc: Ajit Pandey <ajitp@codeaurora.org> Cc: Judy Hsiao <judyhsiao@chromium.org> Cc: Cheng-Yi Chiang <cychiang@chromium.org> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20210315133924.v2.1.I601a051cad7cfd0923e55b69ef7e5748910a6096@changeid Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2021-04-19ARM: dts: mstar: Add a dts for M5Stack UnitV2Daniel Palmer
M5Stack are releasing a new widget based on the SigmaStar SSD202D. We have some support for the SSD202D so lets add a dts for it. Signed-off-by: Daniel Palmer <daniel@0x0f.com> Link: https://m5stack-store.myshopify.com/products/unitv2-ai-camera-gc2145 Link: https://lore.kernel.org/r/20210417011015.2105280-4-daniel@0x0f.com' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-19dt-bindings: arm: mstar: Add compatible for M5Stack UnitV2Daniel Palmer
Add a compatible for the M5Stack UnitV2 that is based on the SigmaStar SSD202D (inifinity2m). Signed-off-by: Daniel Palmer <daniel@0x0f.com> Link: https://lore.kernel.org/r/20210417011015.2105280-3-daniel@0x0f.com' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-19dt-bindings: vendor-prefixes: Add vendor prefix for M5StackDaniel Palmer
M5Stack make various modules for STEM, Makers, IoT. Their UnitV2 is based on a SigmaStar SSD202D SoC which we already have some minimal support for so add a prefix in preparation for UnitV2 board support. Signed-off-by: Daniel Palmer <daniel@0x0f.com> Link: https://m5stack.com/ Link: https://lore.kernel.org/r/20210417011015.2105280-2-daniel@0x0f.com' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-19arm64: dts: mt8183: fix dtbs_check warningMatthias Brugger
Fix unit names to make dtbs_check happy. Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Link: https://lore.kernel.org/r/20210414144643.17435-2-matthias.bgg@kernel.org Link: https://lore.kernel.org/r/20210416143923.23406-3-matthias.bgg@kernel.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-19arm64: dts: mt8183-pumpkin: fix dtbs_check warningMatthias Brugger
Fix unit names to make dtbs_check happy. Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Link: https://lore.kernel.org/r/20210414144643.17435-1-matthias.bgg@kernel.org Link: https://lore.kernel.org/r/20210416143923.23406-2-matthias.bgg@kernel.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-19Merge tag 'memory-controller-drv-5.13-2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into arm/drivers Memory controller drivers for v5.13, part two 1. Renesas RPC: fix possible NULL pointer. 2. Exynos5422 DMC: add proper error checking for clk_prepare. 3. Mediatek SMI: use device-links instead of explicit PM runtime calls. * tag 'memory-controller-drv-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl: memory: mtk-smi: Add device-link between smi-larb and smi-common memory: samsung: exynos5422-dmc: handle clk_set_parent() failure memory: renesas-rpc-if: fix possible NULL pointer dereference of resource Link: https://lore.kernel.org/r/20210415065514.7385-1-krzysztof.kozlowski@canonical.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-19spi: Handle SPI device setup callback failure.Joe Burmeister
If the setup callback failed, but the controller has auto_runtime_pm and set_cs, the setup failure could be missed. Signed-off-by: Joe Burmeister <joe.burmeister@devtank.co.uk> Link: https://lore.kernel.org/r/20210419130631.4586-1-joe.burmeister@devtank.co.uk Signed-off-by: Mark Brown <broonie@kernel.org>
2021-04-19spi: sync up initial chipselect stateDavid Bauer
When initially probing the SPI slave device, the call for disabling an SPI device without the SPI_CS_HIGH flag is not applied, as the condition for checking whether or not the state to be applied equals the one currently set evaluates to true. This however might not necessarily be the case, as the chipselect might be active. Add a force flag to spi_set_cs which allows to override this early exit condition. Set it to false everywhere except when called from spi_setup to sync up the initial CS state. Fixes commit d40f0b6f2e21 ("spi: Avoid setting the chip select if we don't need to") Signed-off-by: David Bauer <mail@david-bauer.net> Link: https://lore.kernel.org/r/20210416195956.121811-1-mail@david-bauer.net Signed-off-by: Mark Brown <broonie@kernel.org>
2021-04-19spi: stm32-qspi: Add dirmap supportPatrice Chotard
Add stm32_qspi_dirmap_read() and stm32_qspi_dirmap_create() to get dirmap support. Update the exec_op callback which doens't allow anymore memory map access. Memory map access are only available through the dirmap_read callback. Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com> Link: https://lore.kernel.org/r/20210419121541.11617-4-patrice.chotard@foss.st.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-04-19spi: stm32-qspi: Trigger DMA only if more than 4 bytes to transferPatrice Chotard
In order to optimize accesses to spi flashes, trigger a DMA only if more than 4 bytes has to be transferred. DMA transfer preparation's cost becomes negligible above 4 bytes to transfer. Below this threshold, indirect transfer give more throughput. mtd_speedtest shows that page write throughtput increases : - from 779 to 853 KiB/s (~9.5%) with s25fl512s SPI-NOR. - from 5283 to 5666 KiB/s (~7.25%) with Micron SPI-NAND. Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com> Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com> Link: https://lore.kernel.org/r/20210419121541.11617-3-patrice.chotard@foss.st.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-04-19spi: stm32-qspi: fix pm_runtime usage_count counterChristophe Kerello
pm_runtime usage_count counter is not well managed. pm_runtime_put_autosuspend callback drops the usage_counter but this one has never been increased. Add pm_runtime_get_sync callback to bump up the usage counter. It is also needed to use pm_runtime_force_suspend and pm_runtime_force_resume APIs to handle properly the clock. Fixes: 9d282c17b023 ("spi: stm32-qspi: Add pm_runtime support") Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210419121541.11617-2-patrice.chotard@foss.st.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-04-19x86/build: Disable HIGHMEM64G selection for M486SXMaciej W. Rozycki
Fix a regression caused by making the 486SX separately selectable in Kconfig, for which the HIGHMEM64G setting has not been updated and therefore has become exposed as a user-selectable option for the M486SX configuration setting unlike with original M486 and all the other settings that choose non-PAE-enabled processors: High Memory Support > 1. off (NOHIGHMEM) 2. 4GB (HIGHMEM4G) 3. 64GB (HIGHMEM64G) choice[1-3?]: With the fix in place the setting is now correctly removed: High Memory Support > 1. off (NOHIGHMEM) 2. 4GB (HIGHMEM4G) choice[1-2?]: [ bp: Massage commit message. ] Fixes: 87d6021b8143 ("x86/math-emu: Limit MATH_EMULATION to 486SX compatibles") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.5+ Link: https://lkml.kernel.org/r/alpine.DEB.2.21.2104141221340.44318@angie.orcam.me.uk
2021-04-19m68k: sun3x: Remove unneeded semicolonWan Jiabing
Fix the following coccicheck warning: ./arch/m68k/include/asm/sun3xflop.h:109:2-3: Unneeded semicolon Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Link: https://lore.kernel.org/r/20210415031450.23379-1-wanjiabing@vivo.com Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2021-04-19platform/x86: touchscreen_dmi: Add info for the Teclast Tbook 11 tabletHans de Goede
Add touchscreen info for the Teclast Tbook 11 tablet. This includes info for getting the firmware directly from the UEFI, so that the user does not need to manually install the firmware in /lib/firmware/silead. This change will make the touchscreen on these devices work OOTB, without requiring any manual setup. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210417173105.4134-1-hdegoede@redhat.com
2021-04-19platform/x86: intel_pmc_core: Add support for Alder Lake PCH-PDavid E. Box
Alder PCH-P is based on Tiger Lake PCH. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com> Link: https://lore.kernel.org/r/20210417031252.3020837-10-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-19platform/x86: intel_pmc_core: Add LTR registers for Tiger LakeGayatri Kammela
Just like Ice Lake, Tiger Lake uses Cannon Lake's LTR information and supports a few additional registers. Hence add the LTR registers specific to Tiger Lake to the cnp_ltr_show_map[]. Also adjust the number of LTR IPs for Tiger Lake to the correct amount. Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com> Link: https://lore.kernel.org/r/20210417031252.3020837-9-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-19platform/x86: intel_pmc_core: Add option to set/clear LPM modeDavid E. Box
By default the Low Power Mode (LPM or sub-state) status registers will latch condition status on every entry into Package C10. This is configurable in the PMC to allow latching on any achievable sub-state. Add a debugfs file to support this. Also add the option to clear the status registers to 0. Clearing the status registers before testing removes ambiguity around when the current values were set. The new file, latch_lpm_mode, looks like this: [c10] S0i2.0 S0i3.0 S0i2.1 S0i3.1 S0i3.2 clear Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210417031252.3020837-8-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-19platform/x86: intel_pmc_core: Add requirements file to debugfsGayatri Kammela
Add the debugfs file, substate_requirements, to view the low power mode (LPM) requirements for each enabled mode alongside the last latched status of the condition. After this patch, the new file will look like this: Element | S0i2.0 | S0i3.0 | S0i2.1 | S0i3.1 | S0i3.2 | Status | USB2PLL_OFF_STS | Required | Required | Required | Required | Required | | PCIe/USB3.1_Gen2PLL_OFF_STS | Required | Required | Required | Required | Required | | PCIe_Gen3PLL_OFF_STS | Required | Required | Required | Required | Required | Yes | OPIOPLL_OFF_STS | Required | Required | Required | Required | Required | Yes | OCPLL_OFF_STS | Required | Required | Required | Required | Required | Yes | MainPLL_OFF_STS | | Required | | Required | Required | | Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Co-developed-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210417031252.3020837-7-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-19platform/x86: intel_pmc_core: Get LPM requirements for Tiger LakeGayatri Kammela
Platforms that support low power modes (LPM) such as Tiger Lake maintain requirements for each sub-state that a readable in the PMC. However, unlike LPM status registers, requirement registers are not memory mapped but are available from an ACPI _DSM. Collect the requirements for Tiger Lake using the _DSM method and store in a buffer. Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Co-developed-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210417031252.3020837-6-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-19platform/x86: intel_pmc_core: Show LPM residency in microsecondsGayatri Kammela
Modify the low power mode (LPM or sub-state) residency counters to display in microseconds just like the slp_s0_residency counter. The granularity of the counter is approximately 30.5us per tick. Double this value then divide by two to maintain accuracy. Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com> Link: https://lore.kernel.org/r/20210417031252.3020837-5-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-19platform/x86: intel_pmc_core: Handle sub-states genericallyGayatri Kammela
The current implementation of pmc_core_substate_res_show() is written specifically for Tiger Lake. However, new platform will also have sub-states and may support different modes. Therefore rewrite the code to handle sub-states generically. Obtain the number and type of enabled states form the PMC. Use the Low Power Mode (LPM) priority register to store the states in order from shallowest to deepest for displays. Add a for_each macro to simplify this. While changing the sub-state display it makes sense to show only the "enabled" sub-states instead of showing all possible ones. After this patch, the debugfs file looks like this: Substate Residency S0i2.0 0 S0i3.0 0 S0i2.1 9329279 S0i3.1 0 S0i3.2 0 Suggested-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com> Link: https://lore.kernel.org/r/20210417031252.3020837-4-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-19platform/x86: intel_pmc_core: Remove global struct pmc_devDavid E. Box
The intel_pmc_core driver did not always bind to a device which meant it lacked a struct device that could be used to maintain driver data. So a global instance of struct pmc_dev was used for this purpose and functions accessed this directly. Since the driver now binds to an ACPI device, remove the global pmc_dev in favor of one that is allocated during probe. Modify users of the global to obtain the object by argument instead. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com> Link: https://lore.kernel.org/r/20210417031252.3020837-3-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-19platform/x86: intel_pmc_core: Don't use global pmcdev in quirksDavid E. Box
The DMI callbacks, used for quirks, currently access the PMC by getting the address a global pmc_dev struct. Instead, have the callbacks set a global quirk specific variable. In probe, after calling dmi_check_system(), pass pmc_dev to a function that will handle each quirk if its variable condition is met. This allows removing the global pmc_dev later. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com> Link: https://lore.kernel.org/r/20210417031252.3020837-2-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-19mmc: meson-gx: replace WARN_ONCE with dev_warn_once about scatterlist size ↵Neil Armstrong
alignment in block mode Since commit e085b51c74cc ("mmc: meson-gx: check for scatterlist size alignment in block mode"), support for SDIO SD_IO_RW_EXTENDED transferts are properly filtered but some driver like brcmfmac still gives a block sg buffer size not aligned with SDIO block, triggerring a WARN_ONCE() with scary stacktrace even if the transfer works fine but with possible degraded performances. Simply replace with dev_warn_once() to inform user this should be fixed to avoid degraded performance. This should be ultimately fixed in brcmfmac, but since it's only a performance issue the warning should be removed. Fixes: e085b51c74cc ("mmc: meson-gx: check for scatterlist size alignment in block mode") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://lore.kernel.org/r/20210416094347.2015896-1-narmstrong@baylibre.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>