summaryrefslogtreecommitdiff
path: root/kernel/power/swap.c
AgeCommit message (Collapse)Author
2023-12-20PM: hibernate: Repair excess function parameter description warningRandy Dunlap
Function swsusp_close() does not have any parameters, so remove the description of parameter @exclusive to prevent this warning. swap.c:1573: warning: Excess function parameter 'exclusive' description in 'swsusp_close' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-15PM: hibernate: Enforce ordering during image compression/decompressionHongchen Zhang
An S4 (suspend to disk) test on the LoongArch 3A6000 platform sometimes fails with the following error messaged in the dmesg log: Invalid LZO compressed length That happens because when compressing/decompressing the image, the synchronization between the control thread and the compress/decompress/crc thread is based on a relaxed ordering interface, which is unreliable, and the following situation may occur: CPU 0 CPU 1 save_image_lzo lzo_compress_threadfn atomic_set(&d->stop, 1); atomic_read(&data[thr].stop) data[thr].cmp = data[thr].cmp_len; WRITE data[thr].cmp_len Then CPU0 gets a stale cmp_len and writes it to disk. During resume from S4, wrong cmp_len is loaded. To maintain data consistency between the two threads, use the acquire/release variants of atomic set and read operations. Fixes: 081a9d043c98 ("PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> Co-developed-by: Weihao Li <liweihao@loongson.cn> Signed-off-by: Weihao Li <liweihao@loongson.cn> [ rjw: Subject rewrite and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-11PM: hibernate: Do not initialize error in swap_write_page()Li zeming
'error' first receives the function result before it is used, and it does not need to be assigned a value during definition. Signed-off-by: Li zeming <zeming@nfschina.com> [ rjw: Subject rewrite ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-31Merge tag 'pm-6.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add new hardware support (new Qualcomm SoC versions in cpufreq, RK3568/RK3588 in devfreq), extend the OPP (operating performance points) framework, improve cpufreq governors, fix issues and clean up code (most of the changes are in cpufreq and devfreq). Specifics: - Add support for several Qualcomm SoC versions and other similar changes (Christian Marangi, Dmitry Baryshkov, Luca Weiss, Neil Armstrong, Richard Acayan, Robert Marko, Rohit Agarwal, Stephan Gerhold and Varadarajan Narayanan) - Clean up the tegra cpufreq driver (Sumit Gupta) - Use of_property_read_reg() to parse "reg" in pmac32 driver (Rob Herring) - Add support for TI's am62p5 Soc (Bryan Brattlof) - Make ARM_BRCMSTB_AVS_CPUFREQ depends on !ARM_SCMI_CPUFREQ (Florian Fainelli) - Update Kconfig to mention i.MX7 as well (Alexander Stein) - Revise global turbo disable check in intel_pstate (Srinivas Pandruvada) - Carry out initialization of sg_cpu in the schedutil cpufreq governor in one loop (Liao Chang) - Simplify the condition for storing 'down_threshold' in the conservative cpufreq governor (Liao Chang) - Use fine-grained mutex in the userspace cpufreq governor (Liao Chang) - Move is_managed indicator in the userspace cpufreq governor into a per-policy structure (Liao Chang) - Rebuild sched-domains when removing cpufreq driver (Pierre Gondois) - Fix buffer overflow detection in trans_stats() (Christian Marangi) - Switch to dev_pm_opp_find_freq_(ceil/floor)_indexed() APIs to support specific devices like UFS which handle multiple clocks through OPP (Operating Performance Point) framework (Manivannan Sadhasivam) - Add perf support to the Rockchip DFI (DDR Monitor Module) devfreq- event driver: * Generalize rockchip-dfi.c to support new RK3568/RK3588 using different DDR type (Sascha Hauer). * Convert DT binding document format to yaml (Sascha Hauer). * Add perf support for DFI (a unit suitable for measuring DDR utilization) to rockchip-dfi.c to extend DFI usage (Sascha Hauer) - Add locking to the OPP handling code in the Mediatek CCI devfreq driver, because the voltage of shared OPP might be changed by multiple drivers (Mark Tseng, Dan Carpenter) - Use device_get_match_data() in the Samsung Exynos PPMU devfreq-event driver (Rob Herring) - Extend support for the opp-level beyond required-opps (Ulf Hansson) - Add dev_pm_opp_find_level_floor() (Krishna chaitanya chundru) - dt-bindings: Allow opp-peak-kBpsfor kryo CPUs, support Qualcomm Krait SoCs and document named opp-microvolt property (Bjorn Andersson, Dmitry Baryshkov and Christian Marangi) - Fix -Wunsequenced warning _of_add_opp_table_v1() (Nathan Chancellor) - General cleanup of OPP code (Viresh Kumar) - Use __get_safe_page() rather than touching the list in hibernation snapshot code (Brian Geffon) - Fix symbol export for _SIMPLE_ variants of _PM_OPS() (Raag Jadav) - Clean up sync_read handling in snapshot_write_next() (Brian Geffon) - Fix kerneldoc comments for swsusp_check() and swsusp_close() to better match code (Christoph Hellwig) - Downgrade BIOS locked limits pr_warn() in the Intel RAPL power capping driver to pr_debug() (Ville Syrjälä) - Change the minimum python version for the intel_pstate_tracer utility from 2.7 to 3.6 (Doug Smythies)" * tag 'pm-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (82 commits) dt-bindings: cpufreq: qcom-hw: document SM8650 CPUFREQ Hardware cpufreq: arm: Kconfig: Add i.MX7 to supported SoC for ARM_IMX_CPUFREQ_DT cpufreq: qcom-nvmem: add support for IPQ8064 cpufreq: qcom-nvmem: also accept operating-points-v2-krait-cpu cpufreq: qcom-nvmem: drop pvs_ver for format a fuses dt-bindings: cpufreq: qcom-cpufreq-nvmem: Document krait-cpu cpufreq: qcom-nvmem: add support for IPQ6018 dt-bindings: cpufreq: qcom-cpufreq-nvmem: document IPQ6018 cpufreq: qcom-nvmem: Add MSM8909 cpufreq: qcom-nvmem: Simplify driver data allocation powercap: intel_rapl: Downgrade BIOS locked limits pr_warn() to pr_debug() cpufreq: stats: Fix buffer overflow detection in trans_stats() dt-bindings: devfreq: event: rockchip,dfi: Add rk3588 support dt-bindings: devfreq: event: rockchip,dfi: Add rk3568 support dt-bindings: devfreq: event: convert Rockchip DFI binding to yaml PM / devfreq: rockchip-dfi: add support for RK3588 PM / devfreq: rockchip-dfi: account for multiple DDRMON_CTRL registers PM / devfreq: rockchip-dfi: make register stride SoC specific PM / devfreq: rockchip-dfi: Add perf support PM / devfreq: rockchip-dfi: give variable a better name ...
2023-10-28PM: hibernate: Drop unused snapshot_test argumentJan Kara
snapshot_test argument is now unused in swsusp_close() and load_image_and_restore(). Drop it CC: linux-pm@vger.kernel.org Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-17-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28PM: hibernate: Convert to bdev_open_by_dev()Jan Kara
Convert hibernation code to use bdev_open_by_dev(). CC: linux-pm@vger.kernel.org Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-16-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-26PM: hibernate: fix the kerneldoc comment for swsusp_check() and swsusp_close()Christoph Hellwig
The comments for both swsusp_check() and swsusp_close() don't actually describe what they are doing. Just removing the comments would probably better, but as the file is full of useless kerneldoc comments for non-exported symbols this fits in better with the style. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-12PM: hibernate: Rename function parameter from snapshot_test to exclusiveChen Yu
Several functions reply on snapshot_test to decide whether to open the resume device exclusively. However there is no strict connection between the snapshot_test and the open mode. Rename the 'snapshot_test' input parameter to 'exclusive' to better reflect the use case. No functional change is expected. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig
The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: use the holder as indication for exclusive opensChristoph Hellwig
The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12swsusp: don't pass a stack address to blkdev_get_by_pathChristoph Hellwig
holder is just an on-stack pointer that can easily be reused by other calls, replace it with a static variable that doesn't change. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05PM: hibernate: remove the global snapshot_test variableChristoph Hellwig
Passing call dependent variable in global variables is a huge antipattern. Fix it up. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20230531125535.676098-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05block: introduce holder opsChristoph Hellwig
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-27PM: hibernate: Do not get block device exclusively in test_resume modeChen Yu
The system refused to do a test_resume because it found that the swap device has already been taken by someone else. Specifically, the swsusp_check()->blkdev_get_by_dev(FMODE_EXCL) is supposed to do this check. Steps to reproduce: dd if=/dev/zero of=/swapfile bs=$(cat /proc/meminfo | awk '/MemTotal/ {print $2}') count=1024 conv=notrunc mkswap /swapfile swapon /swapfile swap-offset /swapfile echo 34816 > /sys/power/resume_offset echo test_resume > /sys/power/disk echo disk > /sys/power/state PM: Using 3 thread(s) for compression PM: Compressing and saving image data (293150 pages)... PM: Image saving progress: 0% PM: Image saving progress: 10% ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: configured for UDMA/100 ata2: SATA link down (SStatus 0 SControl 300) ata5: SATA link down (SStatus 0 SControl 300) ata6: SATA link down (SStatus 0 SControl 300) ata3: SATA link down (SStatus 0 SControl 300) ata4: SATA link down (SStatus 0 SControl 300) PM: Image saving progress: 20% PM: Image saving progress: 30% PM: Image saving progress: 40% PM: Image saving progress: 50% pcieport 0000:00:02.5: pciehp: Slot(0-5): No device found PM: Image saving progress: 60% PM: Image saving progress: 70% PM: Image saving progress: 80% PM: Image saving progress: 90% PM: Image saving done PM: hibernation: Wrote 1172600 kbytes in 2.70 seconds (434.29 MB/s) PM: S| PM: hibernation: Basic memory bitmaps freed PM: Image not found (code -16) This is because when using the swapfile as the hibernation storage, the block device where the swapfile is located has already been mounted by the OS distribution(usually mounted as the rootfs). This is not an issue for normal hibernation, because software_resume()->swsusp_check() happens before the block device(rootfs) mount. But it is a problem for the test_resume mode. Because when test_resume happens, the block device has been mounted already. Thus remove the FMODE_EXCL for test_resume mode. This would not be a problem because in test_resume stage, the processes have already been frozen, and the race condition described in Commit 39fbef4b0f77 ("PM: hibernate: Get block device exclusively in swsusp_check()") is unlikely to happen. Fixes: 39fbef4b0f77 ("PM: hibernate: Get block device exclusively in swsusp_check()") Reported-by: Yifan Li <yifan2.li@intel.com> Suggested-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Tested-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Tested-by: Wendy Wang <wendy.wang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-20PM: hibernate: swap: don't use /** for non-kernel-doc commentsRandy Dunlap
kernel-doc complains about multiple occurrences of "/**" being used for something that is not a kernel-doc comment, so change all of these to just use "/*" comment style. The warning message for all of these is: FILE:LINE: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst kernel/power/swap.c:585: warning: ... Structure used for CRC32. kernel/power/swap.c:600: warning: ... * CRC32 update function that runs in its own thread. kernel/power/swap.c:627: warning: ... * Structure used for LZO data compression. kernel/power/swap.c:644: warning: ... * Compression function that runs in its own thread. kernel/power/swap.c:952: warning: ... * The following functions allow us to read data using a swap map kernel/power/swap.c:1111: warning: ... * Structure used for LZO data decompression. kernel/power/swap.c:1127: warning: ... * Decompression function that runs in its own thread. Also correct one spello/typo. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-14PM: Use the enum req_op and blk_opf_t typesBart Van Assche
Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for variables that represent request flags. Combine the first two hib_submit_io() arguments into a single argument. Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-62-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-21Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo) - blk-rq-qos completion fix (Tejun) - blk-cgroup merge fix (Tejun) - Add offline error return value to distinguish it from an IO error on the device (Song) - IO stats fixes (Zhang, Christoph) - blkcg refcount fixes (Ming, Yu) - Fix for indefinite dispatch loop softlockup (Shin'ichiro) - blk-mq hardware queue management improvements (Ming) - sbitmap dead code removal (Ming, John) - Plugging merge improvements (me) - Show blk-crypto capabilities in sysfs (Eric) - Multiple delayed queue run improvement (David) - Block throttling fixes (Ming) - Start deprecating auto module loading based on dev_t (Christoph) - bio allocation improvements (Christoph, Chaitanya) - Get rid of bio_devname (Christoph) - bio clone improvements (Christoph) - Block plugging improvements (Christoph) - Get rid of genhd.h header (Christoph) - Ensure drivers use appropriate flush helpers (Christoph) - Refcounting improvements (Christoph) - Queue initialization and teardown improvements (Ming, Christoph) - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng, Lukas, Nian, Yang, Eric, Chengming) * tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits) block: cancel all throttled bios in del_gendisk() block: let blkcg_gq grab request queue's refcnt block: avoid use-after-free on throttle data block: limit request dispatch loop duration block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative" sr: simplify the local variable initialization in sr_block_open() block: don't merge across cgroup boundaries if blkcg is enabled block: fix rq-qos breakage from skipping rq_qos_done_bio() block: flush plug based on hardware and software queue order block: ensure plug merging checks the correct queue at least once block: move rq_qos_exit() into disk_release() block: do more work in elevator_exit block: move blk_exit_queue into disk_release block: move q_usage_counter release into blk_queue_release block: don't remove hctx debugfs dir from blk_mq_exit_queue block: move blkcg initialization/destroy into disk allocation/release handler sr: implement ->free_disk to simplify refcounting sd: implement ->free_disk to simplify refcounting sd: delay calling free_opal_dev sd: call sd_zbc_release_disk before releasing the scsi_device reference ...
2022-03-01PM: hibernate: Clean up non-kernel-doc commentsJiapeng Chong
Address the following W=1 kernel build warning: kernel/power/swap.c:120: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-02block: pass a block_device and opf to bio_allocChristoph Hellwig
Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02block: remove genhd.hChristoph Hellwig
There is no good reason to keep genhd.h separate from the main blkdev.h header that includes it. So fold the contents of genhd.h into blkdev.h and remove genhd.h entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-08PM: hibernate: Allow ACPI hardware signature to be honouredDavid Woodhouse
Theoretically, when the hardware signature in FACS changes, the OS is supposed to gracefully decline to attempt to resume from S4: "If the signature has changed, OSPM will not restore the system context and can boot from scratch" In practice, Windows doesn't do this and many laptop vendors do allow the signature to change especially when docking/undocking, so it would be a bad idea to simply comply with the specification by default in the general case. However, there are use cases where we do want the compliant behaviour and we know it's safe. Specifically, when resuming virtual machines where we know the hypervisor has changed sufficiently that resume will fail. We really want to be able to *tell* the guest kernel not to try, so it boots cleanly and doesn't just crash. This patch provides a way to opt in to the spec-compliant behaviour on the command line. A follow-up patch may do this automatically for certain "known good" machines based on a DMI match, or perhaps just for all hypervisor guests since there's no good reason a hypervisor would change the hardware_signature that it exposes to guests *unless* it wants them to obey the ACPI specification. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-21PM: hibernate: Get block device exclusively in swsusp_check()Ye Bin
The following kernel crash can be triggered: [ 89.266592] ------------[ cut here ]------------ [ 89.267427] kernel BUG at fs/buffer.c:3020! [ 89.268264] invalid opcode: 0000 [#1] SMP KASAN PTI [ 89.269116] CPU: 7 PID: 1750 Comm: kmmpd-loop0 Not tainted 5.10.0-862.14.0.6.x86_64-08610-gc932cda3cef4-dirty #20 [ 89.273169] RIP: 0010:submit_bh_wbc.isra.0+0x538/0x6d0 [ 89.277157] RSP: 0018:ffff888105ddfd08 EFLAGS: 00010246 [ 89.278093] RAX: 0000000000000005 RBX: ffff888124231498 RCX: ffffffffb2772612 [ 89.279332] RDX: 1ffff11024846293 RSI: 0000000000000008 RDI: ffff888124231498 [ 89.280591] RBP: ffff8881248cc000 R08: 0000000000000001 R09: ffffed1024846294 [ 89.281851] R10: ffff88812423149f R11: ffffed1024846293 R12: 0000000000003800 [ 89.283095] R13: 0000000000000001 R14: 0000000000000000 R15: ffff8881161f7000 [ 89.284342] FS: 0000000000000000(0000) GS:ffff88839b5c0000(0000) knlGS:0000000000000000 [ 89.285711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 89.286701] CR2: 00007f166ebc01a0 CR3: 0000000435c0e000 CR4: 00000000000006e0 [ 89.287919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 89.289138] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 89.290368] Call Trace: [ 89.290842] write_mmp_block+0x2ca/0x510 [ 89.292218] kmmpd+0x433/0x9a0 [ 89.294902] kthread+0x2dd/0x3e0 [ 89.296268] ret_from_fork+0x22/0x30 [ 89.296906] Modules linked in: by running the following commands: 1. mkfs.ext4 -O mmp /dev/sda -b 1024 2. mount /dev/sda /home/test 3. echo "/dev/sda" > /sys/power/resume That happens because swsusp_check() calls set_blocksize() on the target partition which confuses the file system: Thread1 Thread2 mount /dev/sda /home/test get s_mmp_bh --> has mapped flag start kmmpd thread echo "/dev/sda" > /sys/power/resume resume_store software_resume swsusp_check set_blocksize truncate_inode_pages_range truncate_cleanup_page block_invalidatepage discard_buffer --> clean mapped flag write_mmp_block submit_bh submit_bh_wbc BUG_ON(!buffer_mapped(bh)) To address this issue, modify swsusp_check() to open the target block device with exclusive access. Signed-off-by: Ye Bin <yebin10@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-21PM: hibernate: swap: Use vzalloc() and kzalloc()Cai Huoqing
Replace vmalloc()/memset() with vzalloc() and kmalloc()/memset() with kzalloc() to simplify the code. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-21PM: hibernate: fix sparse warningsAnders Roxell
When building the kernel with sparse enabled 'C=1' the following warnings shows up: kernel/power/swap.c:390:29: warning: incorrect type in assignment (different base types) kernel/power/swap.c:390:29: expected int ret kernel/power/swap.c:390:29: got restricted blk_status_t This is due to function hib_wait_io() returns a 'blk_status_t' which is a bitwise u8. Commit 5416da01ff6e ("PM: hibernate: Remove blk_status_to_errno in hib_wait_io") seemed to have mixed up the return type. However, the 4e4cbee93d56 ("block: switch bios to blk_status_t") actually broke the behaviour by returning the wrong type. Rework so function hib_wait_io() returns a 'int' instead of 'blk_status_t' and make sure to call function blk_status_to_errno(hb->error)' when returning from function hib_wait_io() a int gets returned. Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") Fixes: 5416da01ff6e ("PM: hibernate: Remove blk_status_to_errno in hib_wait_io") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-15PM: hibernate: Remove blk_status_to_errno in hib_wait_ioFalla Coulibaly
blk_status_to_errno doesn't appear to perform extra work besides converting blk_status_t to integer. This patch removes that unnecessary conversion as the return type of the function is blk_status_t. Signed-off-by: Falla Coulibaly <fallacoulibalyz@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-05-24PM: hibernate: fix spelling mistakesZhen Lei
Fix some spelling mistakes in comments: corresonds ==> corresponds alocated ==> allocated unitialized ==> uninitialized Deompression ==> Decompression Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-08PM: sleep: fix typos in commentsLu Jialin
Change "occured" to "occurred" in kernel/power/autosleep.c. Change "consiting" to "consisting" in kernel/power/snapshot.c. Change "avaiable" to "available" in kernel/power/swap.c. No functionality changed. Signed-off-by: Lu Jialin <lujialin4@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-25PM: hibernate: flush swap writer after markingLaurent Badel
Flush the swap writer after, not before, marking the files, to ensure the signature is properly written. Fixes: 6f612af57821 ("PM / Hibernate: Group swap ops") Signed-off-by: Laurent Badel <laurentbadel@eaton.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-14Merge tag 'pm-5.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These rework the collection of cpufreq statistics to allow it to take place if fast frequency switching is enabled in the governor, rework the frequency invariance handling in the cpufreq core and drivers, add new hardware support to a couple of cpufreq drivers, fix a number of assorted issues and clean up the code all over. Specifics: - Rework cpufreq statistics collection to allow it to take place when fast frequency switching is enabled in the governor (Viresh Kumar). - Make the cpufreq core set the frequency scale on behalf of the driver and update several cpufreq drivers accordingly (Ionela Voinescu, Valentin Schneider). - Add new hardware support to the STI and qcom cpufreq drivers and improve them (Alain Volmat, Manivannan Sadhasivam). - Fix multiple assorted issues in cpufreq drivers (Jon Hunter, Krzysztof Kozlowski, Matthias Kaehlcke, Pali Rohár, Stephan Gerhold, Viresh Kumar). - Fix several assorted issues in the operating performance points (OPP) framework (Stephan Gerhold, Viresh Kumar). - Allow devfreq drivers to fetch devfreq instances by DT enumeration instead of using explicit phandles and modify the devfreq core code to support driver-specific devfreq DT bindings (Leonard Crestez, Chanwoo Choi). - Improve initial hardware resetting in the tegra30 devfreq driver and clean up the tegra cpuidle driver (Dmitry Osipenko). - Update the cpuidle core to collect state entry rejection statistics and expose them via sysfs (Lina Iyer). - Improve the ACPI _CST code handling diagnostics (Chen Yu). - Update the PSCI cpuidle driver to allow the PM domain initialization to occur in the OSI mode as well as in the PC mode (Ulf Hansson). - Rework the generic power domains (genpd) core code to allow domain power off transition to be aborted in the absence of the "power off" domain callback (Ulf Hansson). - Fix two suspend-to-idle issues in the ACPI EC driver (Rafael Wysocki). - Fix the handling of timer_expires in the PM-runtime framework on 32-bit systems and the handling of device links in it (Grygorii Strashko, Xiang Chen). - Add IO requests batching support to the hibernate image saving and reading code and drop a bogus get_gendisk() from there (Xiaoyi Chen, Christoph Hellwig). - Allow PCIe ports to be put into the D3cold power state if they are power-manageable via ACPI (Lukas Wunner). - Add missing header file include to a power capping driver (Pujin Shi). - Clean up the qcom-cpr AVS driver a bit (Liu Shixin). - Kevin Hilman steps down as designated reviwer of adaptive voltage scaling (AVS) drivers (Kevin Hilman)" * tag 'pm-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits) cpufreq: stats: Fix string format specifier mismatch arm: disable frequency invariance for CONFIG_BL_SWITCHER cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale() cpufreq: stats: Add memory barrier to store_reset() cpufreq: schedutil: Simplify sugov_fast_switch() ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe() ACPI: EC: PM: Flush EC work unconditionally after wakeup PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI PM: hibernate: remove the bogus call to get_gendisk() in software_resume() cpufreq: Move traces and update to policy->cur to cpufreq core cpufreq: stats: Enable stats for fast-switch as well cpufreq: stats: Mark few conditionals with unlikely() cpufreq: stats: Remove locking cpufreq: stats: Defer stats update to cpufreq_stats_record_transition() PM: domains: Allow to abort power off when no ->power_off() callback PM: domains: Rename power state enums for genpd PM / devfreq: tegra30: Improve initial hardware resetting PM / devfreq: event: Change prototype of devfreq_event_get_edev_by_phandle function PM / devfreq: Change prototype of devfreq_get_devfreq_by_phandle function PM / devfreq: Add devfreq_get_devfreq_by_node function ...
2020-09-28PM: hibernate: Batch hibernate and resume IO requestsXiaoyi Chen
Hibernate and resume process submits individual IO requests for each page of the data, so use blk_plug to improve the batching of these requests. Testing this change with hibernate and resumes consistently shows merging of the IO requests and more than an order of magnitude improvement in hibernate and resume speed is observed. One hibernate and resume cycle for 16GB RAM out of 32GB in use takes around 21 minutes before the change, and 1 minutes after the change on a system with limited storage IOPS. Signed-off-by: Xiaoyi Chen <cxiaoyi@amazon.com> Co-Developed-by: Anchal Agarwal <anchalag@amazon.com> Signed-off-by: Anchal Agarwal <anchalag@amazon.com> [ rjw: Subject and changelog edits, white space damage fixes ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-23PM: mm: cleanup swsusp_swap_checkChristoph Hellwig
Use blkdev_get_by_dev instead of bdget + blkdev_get. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23mm: split swap_type_ofChristoph Hellwig
swap_type_of is used for two entirely different purposes: (1) check what swap type a given device/offset corresponds to (2) find the first available swap device that can be written to Mixing both in a single function creates an unreadable mess. Create two separate functions instead, and switch both to pass a dev_t instead of a struct block_device to further simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-05PM: hibernate: Add __init annotation to swsusp_header_init()Christophe JAILLET
'swsusp_header_init()' is only called via 'core_initcall'. It can be marked as __init to save a few bytes of memory. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-28kernel: power: swap: use kzalloc() instead of kmalloc() followed by memset()Fuqian Huang
Use zeroing allocator instead of using allocator followed with memset with 0 Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428Thomas Gleixner
Based on 1 normalized pattern(s): this file is released under the gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 68 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-02PM / hibernate: cast PAGE_SIZE to int when comparing with error codeChengguang Xu
If PAGE_SIZE is unsigned type then negative error code will be larger than PAGE_SIZE. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-06-12treewide: Use array_size() in vmalloc()Kees Cook
The vmalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vmalloc(a * b) with: vmalloc(array_size(a, b)) as well as handling cases of: vmalloc(a * b * c) with: vmalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vmalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(char) * COUNT + COUNT , ...) | vmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vmalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vmalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vmalloc(C1 * C2 * C3, ...) | vmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vmalloc(C1 * C2, ...) | vmalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-05-14block: consistently use GFP_NOIO instead of __GFP_NORECLAIMChristoph Hellwig
Same numerical value (for now at least), but a much better documentation of intent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-29Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "This is the main pull request for block IO related changes for the 4.16 kernel. Nothing major in this pull request, but a good amount of improvements and fixes all over the map. This contains: - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and Paolo. - Support for SMR zones for deadline and mq-deadline from Damien and Christoph. - Set of fixes for bcache by way of Michael Lyle, including fixes from himself, Kent, Rui, Tang, and Coly. - Series from Matias for lightnvm with fixes from Hans Holmberg, Javier, and Matias. Mostly centered around pblk, and the removing rrpc 1.2 in preparation for supporting 2.0. - A couple of NVMe pull requests from Christoph. Nothing major in here, just fixes and cleanups, and support for command tracing from Johannes. - Support for blk-throttle for tracking reads and writes separately. From Joseph Qi. A few cleanups/fixes also for blk-throttle from Weiping. - Series from Mike Snitzer that enables dm to register its queue more logically, something that's alwways been problematic on dm since it's a stacked device. - Series from Ming cleaning up some of the bio accessor use, in preparation for supporting multipage bvecs. - Various fixes from Ming closing up holes around queue mapping and quiescing. - BSD partition fix from Richard Narron, fixing a problem where we can't mount newer (10/11) FreeBSD partitions. - Series from Tejun reworking blk-mq timeout handling. The previous scheme relied on atomic bits, but it had races where we would think a request had timed out if it to reused at the wrong time. - null_blk now supports faking timeouts, to enable us to better exercise and test that functionality separately. From me. - Kill the separate atomic poll bit in the request struct. After this, we don't use the atomic bits on blk-mq anymore at all. From me. - sgl_alloc/free helpers from Bart. - Heavily contended tag case scalability improvement from me. - Various little fixes and cleanups from Arnd, Bart, Corentin, Douglas, Eryu, Goldwyn, and myself" * 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits) block: remove smart1,2.h nvme: add tracepoint for nvme_complete_rq nvme: add tracepoint for nvme_setup_cmd nvme-pci: introduce RECONNECTING state to mark initializing procedure nvme-rdma: remove redundant boolean for inline_data nvme: don't free uuid pointer before printing it nvme-pci: Suspend queues after deleting them bsg: use pr_debug instead of hand crafted macros blk-mq-debugfs: don't allow write on attributes with seq_operations set nvme-pci: Fix queue double allocations block: Set BIO_TRACE_COMPLETION on new bio during split blk-throttle: use queue_is_rq_based block: Remove kblockd_schedule_delayed_work{,_on}() blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly() lib/scatterlist: Fix chaining support in sgl_alloc_order() blk-throttle: track read and write request individually block: add bdev_read_only() checks to common helpers block: fail op_is_write() requests to read-only partitions blk-throttle: export io_serviced_recursive, io_service_bytes_recursive ...
2018-01-17PM / hibernate: Drop unused parameter of enough_swapKyungsik Lee
Parameter flags is no longer used, remove it. Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-01-06block: convert to bio_first_bvec_all & bio_first_page_allMing Lei
This patch converts to bio_first_bvec_all() & bio_first_page_all() for retrieving the 1st bvec/page, and prepares for supporting multipage bvec. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-03PM: Use a more common logging styleJoe Perches
Convert printks to pr_<level>. Miscellanea: o Use pr_fmt with "PM:" and remove "PM: " from format strings o Coalesce format strings and realign format arguments o Convert an embedded incorrect function name to "%s: ", __func__ o Convert a couple multi-line formats to multiple pr_<level> calls Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-23block: replace bi_bdev with a gendisk pointer and partitions indexChristoph Hellwig
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-09block: switch bios to blk_status_tChristoph Hellwig
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27PM / Hibernate: Use rb_entry() instead of container_of()Geliang Tang
To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01block,fs: use REQ_* flags directlyChristoph Hellwig
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-26Merge tag 'pm-4.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "Again, the majority of changes go into the cpufreq subsystem, but there are no big features this time. The cpufreq changes that stand out somewhat are the governor interface rework and improvements related to the handling of frequency tables. Apart from those, there are fixes and new device/CPU IDs in drivers, cleanups and an improvement of the new schedutil governor. Next, there are some changes in the hibernation core, including a fix for a nasty problem related to the MONITOR/MWAIT usage by CPU offline during resume from hibernation, a few core improvements related to memory management during resume, a couple of additional debug features and cleanups. Finally, we have some fixes and cleanups in the devfreq subsystem, generic power domains framework improvements related to system suspend/resume, support for some new chips in intel_idle and in the power capping RAPL driver, a new version of the AnalyzeSuspend utility and some assorted fixes and cleanups. Specifics: - Rework the cpufreq governor interface to make it more straightforward and modify the conservative governor to avoid using transition notifications (Rafael Wysocki). - Rework the handling of frequency tables by the cpufreq core to make it more efficient (Viresh Kumar). - Modify the schedutil governor to reduce the number of wakeups it causes to occur in cases when the CPU frequency doesn't need to be changed (Steve Muckle, Viresh Kumar). - Fix some minor issues and clean up code in the cpufreq core and governors (Rafael Wysocki, Viresh Kumar). - Add Intel Broxton support to the intel_pstate driver (Srinivas Pandruvada). - Fix problems related to the config TDP feature and to the validity of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka, Srinivas Pandruvada). - Make intel_pstate update the cpu_frequency tracepoint even if the frequency doesn't change to avoid confusing powertop (Rafael Wysocki). - Clean up the usage of __init/__initdata in intel_pstate, mark some of its internal variables as __read_mostly and drop an unused structure element from it (Jisheng Zhang, Carsten Emde). - Clean up the usage of some duplicate MSR symbols in intel_pstate and turbostat (Srinivas Pandruvada). - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay Adiga, Viresh Kumar, Ben Dooks). - Fix a regression (introduced during the 4.5 cycle) in the pcc-cpufreq driver by reverting the problematic commit (Andreas Herrmann). - Add support for Intel Denverton to intel_idle, clean up Broxton support in it and make it explicitly non-modular (Jacob Pan, Jan Beulich, Paul Gortmaker). - Add support for Denverton and Ivy Bridge server to the Intel RAPL power capping driver and make it more careful about the handing of MSRs that may not be present (Jacob Pan, Xiaolong Wang). - Fix resume from hibernation on x86-64 by making the CPU offline during resume avoid using MONITOR/MWAIT in the "play dead" loop which may lead to an inadvertent "revival" of a "dead" CPU and a page fault leading to a kernel crash from it (Rafael Wysocki). - Make memory management during resume from hibernation more straightforward (Rafael Wysocki). - Add debug features that should help to detect problems related to hibernation and resume from it (Rafael Wysocki, Chen Yu). - Clean up hibernation core somewhat (Rafael Wysocki). - Prevent KASAN from instrumenting the hibernation core which leads to large numbers of false-positives from it (James Morse). - Prevent PM (hibernate and suspend) notifiers from being called during the cleanup phase if they have not been called during the corresponding preparation phase which is possible if one of the other notifiers returns an error at that time (Lianwei Wang). - Improve suspend-related debug printout in the tasks freezer and clean up suspend-related console handling (Roger Lu, Borislav Petkov). - Update the AnalyzeSuspend script in the kernel sources to version 4.2 (Todd Brandt). - Modify the generic power domains framework to make it handle system suspend/resume better (Ulf Hansson). - Make the runtime PM framework avoid resuming devices synchronously when user space changes the runtime PM settings for them and improve its error reporting (Rafael Wysocki, Linus Walleij). - Fix error paths in devfreq drivers (exynos, exynos-ppmu, exynos-bus) and in the core, make some devfreq code explicitly non-modular and change some of it into tristate (Bartlomiej Zolnierkiewicz, Peter Chen, Paul Gortmaker). - Add DT support to the generic PM clocks management code and make it export some more symbols (Jon Hunter, Paul Gortmaker). - Make the PCI PM core code slightly more robust against possible driver errors (Andy Shevchenko). - Make it possible to change DESTDIR and PREFIX in turbostat (Andy Shevchenko)" * tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits) Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency" PM / hibernate: Introduce test_resume mode for hibernation cpufreq: export cpufreq_driver_resolve_freq() cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index() PCI / PM: check all fields in pci_set_platform_pm() cpufreq: acpi-cpufreq: use cached frequency mapping when possible cpufreq: schedutil: map raw required frequency to driver frequency cpufreq: add cpufreq_driver_resolve_freq() cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT intel_pstate: Update cpu_frequency tracepoint every time cpufreq: intel_pstate: clean remnant struct element PM / tools: scripts: AnalyzeSuspend v4.2 x86 / hibernate: Use hlt_play_dead() when resuming from hibernation cpufreq: powernv: Replacing pstate_id with frequency table index intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate() PM / hibernate: Image data protection during restoration PM / hibernate: Add missing braces in __register_nosave_region() PM / hibernate: Clean up comments in snapshot.c PM / hibernate: Clean up function headers in snapshot.c PM / hibernate: Add missing braces in hibernate_setup() ...
2016-07-22PM / hibernate: Introduce test_resume mode for hibernationChen Yu
test_resume mode is to verify if the snapshot data written to swap device can be successfully restored to memory. It is useful to ease the debugging process on hibernation, since this mode can not only bypass the BIOSes/bootloader, but also the system re-initialization. To avoid the risk to break the filesystm on persistent storage, this patch resumes the image with tasks frozen. For example: echo test_resume > /sys/power/disk echo disk > /sys/power/state [ 187.306470] PM: Image saving progress: 70% [ 187.395298] PM: Image saving progress: 80% [ 187.476697] PM: Image saving progress: 90% [ 187.554641] PM: Image saving done. [ 187.558896] PM: Wrote 594600 kbytes in 0.90 seconds (660.66 MB/s) [ 187.566000] PM: S| [ 187.589742] PM: Basic memory bitmaps freed [ 187.594694] PM: Checking hibernation image [ 187.599865] PM: Image signature found, resuming [ 187.605209] PM: Loading hibernation image. [ 187.665753] PM: Basic memory bitmaps created [ 187.691397] PM: Using 3 thread(s) for decompression. [ 187.691397] PM: Loading and decompressing image data (148650 pages)... [ 187.889719] PM: Image loading progress: 0% [ 188.100452] PM: Image loading progress: 10% [ 188.244781] PM: Image loading progress: 20% [ 189.057305] PM: Image loading done. [ 189.068793] PM: Image successfully loaded Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-07pm: use bio op accessorsMike Christie
Separate the op from the rq_flag_bits and have the pm code set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block/fs/drivers: remove rw argument from submit_bioMike Christie
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: Jens Axboe <axboe@fb.com>