summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-01-17blk-mq: make sure hctx->next_cpu is set correctlyMing Lei
When hctx->next_cpu is set from possible online CPUs, there is one race in which hctx->next_cpu may be set as >= nr_cpu_ids, and finally break workqueue. The race can be triggered in the following two sitations: 1) when one CPU is becoming DEAD, blk_mq_hctx_notify_dead() is called to dispatch requests from the DEAD cpu context, but at that time, this DEAD CPU has been cleared from 'cpu_online_mask', so all CPUs in hctx->cpumask may become offline, and cause hctx->next_cpu set a bad value. 2) blk_mq_delay_run_hw_queue() is called from CPU B, and found the queue should be run on the other CPU A, then CPU A may become offline at the same time and all CPUs in hctx->cpumask become offline. This patch deals with this issue by re-selecting next CPU, and making sure it is set correctly. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Stefan Haberland <sth@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: "jianchao.wang" <jianchao.w.wang@oracle.com> Tested-by: "jianchao.wang" <jianchao.w.wang@oracle.com> Fixes: 20e4d81393 ("blk-mq: simplify queue mapping & schedule with each possisble CPU") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17Merge tag 'perf-core-for-mingo-4.16-20180117' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: - Fix various per event 'max-stack' and 'call-graph=dwarf' issues, mostly in 'perf trace', allowing to use 'perf trace --call-graph' with 'dwarf' and 'fp' to setup the callgraph details for the syscall events and make that apply to other events, whilhe allowing to override that on a per-event basis, using '-e sched:*switch/call-graph=dwarf/' for instance (Arnaldo Carvalho de Melo) - Improve the --time percent support in record/report/script (Jin Yao) - Fix copyfile_offset update of output offset (Jiri Olsa) - Add python script to profile and resolve physical mem type (Kan Liang) - Add ARM Statistical Profiling Extensions (SPE) support (Kim Phillips) - Remove trailing semicolon in the evlist code (Luis de Bethencourt) - Fix incorrect handling of type _TERM_DRV_CFG (Mathieu Poirier) - Use asprintf when possible in libtraceevent (Federico Vaga) - Fix bad force_token escape sequence in libtraceevent (Michael Sartain) - Add UL suffix to MISSING_EVENTS in libtraceevent (Michael Sartain) - value of unknown symbolic fields in libtraceevent (Jan Kiszka) - libtraceevent updates: (Steven Rostedt) o Show value of flags that have not been parsed o Simplify pointer print logic and fix %pF o Handle new pointer processing of bprint strings o Show contents (in hex) of data of unrecognized type records o Fix get_field_str() for dynamic strings - Add missing break in FALSE case of pevent_filter_clear_trivial() (Taeung Song) - Fix failed memory allocation for get_cpuid_str (Thomas Richter) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-17Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-17ata: ahci_brcm: Recover from failures to identify devicesFlorian Fainelli
When powering up, the SATA controller may fail to mount the HDD. The SATA controller will lock up, preventing it from negotiating to a lower speed or transmitting data. Root cause is power supply noise creating resonance at 6 Ghz and 3 GHz frequencies, which causes instability in the Clock-Data Recovery (CDR) frontend module, resulting in false acquisition of the clock at SATA 6G/3G speeds. The SATA controller may fail to mount the HDD and lock up, requiring a power cycle. Broadcom chips suspected of being susceptible to this issue include BCM7445, BCM7439, and BCM7366. The Kernel implements an error recovery mechanism that resets the SATA PHY and digital controller when the controller locks up. During this error recovery process, typically there is less activity on the board and Broadcom STB chip, so that the power supply is less noisy, thus allowing the SATA controller to lock correctly. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-01-17phy: brcm-sata: Implement calibrate callbackFlorian Fainelli
Implement the calibration callback to allow turning on the Clock-Data Recovery module clamping when necessary, e.g: during failure to identify a SATA hard drive from ahci_brcm.c::brcm_ahci_read_id. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-01-17aoe: use ktime_t instead of timevalTina Ruchandani
'struct frame' uses two variables to store the sent timestamp - 'struct timeval' and jiffies. jiffies is used to avoid discrepancies caused by updates to system time. 'struct timeval' is deprecated because it uses 32-bit representation for seconds which will overflow in year 2038. This patch does the following: - Replace the use of 'struct timeval' and jiffies with ktime_t, which is the recommended type for timestamping - ktime_t provides both long range (like jiffies) and high resolution (like timeval). Using ktime_get (monotonic time) instead of wall-clock time prevents any discprepancies caused by updates to system time. [updates by Arnd below] The original patch from Tina never went anywhere as we discussed how to keep the impact on performance minimal. I've started over now but arrived at basically the same patch that she had originally, except for an slightly improved tsince_hr() function. I'm making it more robust against overflows, and also optimize explicitly for the common case in which a frame is less than 4.2 seconds old, using only a 32-bit division in that case. This should make the new version more efficient than the old code, since we replace the existing two 32-bit division in do_gettimeofday() plus one multiplication with a single single 32-bit division in tsince_hr() and drop the double bookkeeping. It's also more efficient than the ktime_get_us() API we discussed before, since that would also rely on multiple divisions. Link: https://lists.linaro.org/pipermail/y2038/2015-May/000276.html Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Cc: Ed Cashin <ed.cashin@acm.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17drm/vmwgfx: fix memory corruption with legacy/sou connectorsRob Clark
It looks like in all cases 'struct vmw_connector_state' is used. But only in stdu connectors, was atomic_{duplicate,destroy}_state() properly subclassed. Leading to writes beyond the end of the allocated connector state block and all sorts of fun memory corruption related crashes. Fixes: d7721ca71126 "drm/vmwgfx: Connector atomic state" Cc: <stable@vger.kernel.org> Signed-off-by: Rob Clark <rclark@redhat.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
2018-01-17RDMA/bnxt_re: Add support for query firmware versionSelvin Xavier
The device now reports firmware version thus, removing the hard coded values of the FW version string and redundant fw_rev hook from sysfs. Adding code to query firmware version from underlying device and report it through the kernel verb to get firmware version string. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-17RDMA/bnxt_re: Enable RoCE on virtual functionsSelvin Xavier
RoCE can be used by virtual functions (VFs) as well. Adding code changes to allow resource reservation, initialization and avail the resources to the RDMA applications running on those VFs. Currently, fifty percent of the total available resources are reserved for PF and remaining are equally divided among active VFs. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-17i2c: core-smbus: prevent stack corruption on read I2C_BLOCK_DATAJeremy Compostella
On a I2C_SMBUS_I2C_BLOCK_DATA read request, if data->block[0] is greater than I2C_SMBUS_BLOCK_MAX + 1, the underlying I2C driver writes data out of the msgbuf1 array boundary. It is possible from a user application to run into that issue by calling the I2C_SMBUS ioctl with data.block[0] greater than I2C_SMBUS_BLOCK_MAX + 1. This patch makes the code compliant with Documentation/i2c/dev-interface by raising an error when the requested size is larger than 32 bytes. Call Trace: [<ffffffff8139f695>] dump_stack+0x67/0x92 [<ffffffff811802a4>] panic+0xc5/0x1eb [<ffffffff810ecb5f>] ? vprintk_default+0x1f/0x30 [<ffffffff817456d3>] ? i2cdev_ioctl_smbus+0x303/0x320 [<ffffffff8109a68b>] __stack_chk_fail+0x1b/0x20 [<ffffffff817456d3>] i2cdev_ioctl_smbus+0x303/0x320 [<ffffffff81745aed>] i2cdev_ioctl+0x4d/0x1e0 [<ffffffff811f761a>] do_vfs_ioctl+0x2ba/0x490 [<ffffffff81336e43>] ? security_file_ioctl+0x43/0x60 [<ffffffff811f7869>] SyS_ioctl+0x79/0x90 [<ffffffff81a22e97>] entry_SYSCALL_64_fastpath+0x12/0x6a Signed-off-by: Jeremy Compostella <jeremy.compostella@intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2018-01-17i2c: core: decrease reference count of device node in i2c_unregister_deviceLixin Wang
Reference count of device node was increased in of_i2c_register_device, but without decreasing it in i2c_unregister_device. Then the added device node will never be released. Fix this by adding the of_node_put. Signed-off-by: Lixin Wang <alan.1.wang@nokia-sbell.com> Tested-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2018-01-17dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIREDMing Lei
Avoid using DM_MAPIO_REQUEUE unless absolutely necessary because it results in dm-rq.c:dm_mq_queue_rq() returning BLK_STS_RESOURCE to blk-mq -- doing so should only ever be done if the underlying queue is out of resources. So switch to returning DM_MAPIO_DELAY_REQUEUE from multipath_clone_and_map() if either MPATHF_QUEUE_IO or MPATHF_PG_INIT_REQUIRED are set. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failureMing Lei
blk-mq will rerun queue via RESTART or dispatch wake after one request is completed, so not necessary to wait random time for requeuing, we should trust blk-mq to do it. More importantly, we need to return BLK_STS_RESOURCE to blk-mq so that dequeuing from the I/O scheduler can be stopped, this results in improved I/O merging. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm log writes: fix max length used for kstrndupMa Shimiao
If source string is longer than max, kstrndup will allocate max+1 space. So make sure the result will not exceed max. Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm: backfill missing calls to mutex_destroy()Mike Snitzer
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm snapshot: use mutex instead of rw_semaphoreMikulas Patocka
The rw_semaphore is acquired for read only in two places, neither is performance-critical. So replace it with a mutex -- which is more efficient. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm flakey: check for null arg_name in parse_features()Goldwyn Rodrigues
One can crash dm-flakey by specifying more feature arguments than the number of features supplied. Checking for null in arg_name avoids this. dmsetup create flakey-test --table "0 66076080 flakey /dev/sdb9 0 0 180 2 drop_writes" Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm thin: extend thinpool status format string with omitted fieldsmulhern
Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm thin: fixes in thin-provisioning.txtmulhern
Make the format string for thinpool status more correct. Swap the order of two items to correspond with reality. Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm thin: document representation of <highest mapped sector> when there is nonemulhern
Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm thin: fix documentation relative to low water mark thresholdmulhern
Fixes: 1. The use of "exceeds" when the opposite of exceeds, falls below, was meant. 2. Properly speaking, a table can not exceed a threshold. It emphasizes the important point, which is that it is the userspace daemon's responsibility to check for low free space when a device is resumed, since it won't get a special event indicating low free space in that situation. Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm cache: be consistent in specifying sectors and SI units in cache.txtmulhern
Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm cache: delete obsoleted paragraph in cache.txtmulhern
The 'mq' policy is no longer the default policy, and the default policy, 'smq', does not store hit counts. Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm cache: fix grammar in cache-policies.txtmulhern
Use possessive pronoun where appropriate, instead of contraction. Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm snapshot: improve documentation relative to origin suspend requirementsMikulas Patocka
Add a note to snapshot.txt that the origin target must be suspended when loading or unloading the snapshot target. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm: move dm_table_destroy() to same header as dm_table_create()Brian Norris
If anyone is going to use dm_table_create(), they probably should be able to use dm_table_destroy() too. Move the dm_table_destroy() definition outside the private header, near dm_table_create() Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm raid: make raid_sets symbol staticWei Yongjun
Fixes the following sparse warning: drivers/md/dm-raid.c:33:1: warning: symbol 'raid_sets' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm bufio: eliminate unnecessary labels in dm_bufio_client_create()Mike Snitzer
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm bufio: check result of register_shrinker()Aliaksei Karaliou
dm_bufio_client_create() does not check result of register_shrinker() which was tagged as __must_check recently, reported by sparse. Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm bufio: add missed destroys of client mutexAliaksei Karaliou
The client's mutex needs to be destroyed in dm_bufio_client_destroy() as well as the dm_bufio_client_create() error path. Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm bufio: use REQ_OP_READ and REQ_OP_WRITEMikulas Patocka
Use REQ_OP_READ and REQ_OP_WRITE macros instead of READ and WRITE. They have the same value, but the block layer uses REQ_OP so bufio should too. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm: add unstriped targetScott Bauer
This device mapper "unstriped" target remaps and unstripes I/O so it is issued solely on a single drive in a HW RAID0 or dm-striped target. In a 4 drive HW RAID0 the striped target exposes 1/4th of the LBA range as a virtual drive. Each I/O to that virtual drive will only be issued to the 1 drive that was selected of the 4 drives in the HW RAID0. This unstriped target is most useful for Intel NVMe drives that have multiple cores but that do not have firmware control to pin separate LBA ranges to each discrete cpu core. Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm crypt: fix error return code in crypt_ctr()Wei Yongjun
Fix to return error code -ENOMEM from the mempool_create_kmalloc_pool() error handling case instead of 0, as done elsewhere in this function. Fixes: ef43aa38063a6 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)") Cc: stable@vger.kernel.org # 4.12+ Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm crypt: wipe kernel key copy after IV initializationOndrej Kozina
Loading key via kernel keyring service erases the internal key copy immediately after we pass it in crypto layer. This is wrong because IV is initialized later and we use wrong key for the initialization (instead of real key there's just zeroed block). The bug may cause data corruption if key is loaded via kernel keyring service first and later same crypt device is reactivated using exactly same key in hexbyte representation, or vice versa. The bug (and fix) affects only ciphers using following IVs: essiv, lmk and tcw. Fixes: c538f6ec9f56 ("dm crypt: add ability to use keys from the kernel key retention service") Cc: stable@vger.kernel.org # 4.10+ Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm integrity: don't store cipher request on the stackMikulas Patocka
Some asynchronous cipher implementations may use DMA. The stack may be mapped in the vmalloc area that doesn't support DMA. Therefore, the cipher request and initialization vector shouldn't be on the stack. Fix this by allocating the request and iv with kmalloc. Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm crypt: fix crash by adding missing check for auth key sizeMilan Broz
If dm-crypt uses authenticated mode with separate MAC, there are two concatenated part of the key structure - key(s) for encryption and authentication key. Add a missing check for authenticated key length. If this key length is smaller than actually provided key, dm-crypt now properly fails instead of crashing. Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)") Cc: stable@vger.kernel.org # 4.12+ Reported-by: Salah Coronya <salahx@yahoo.com> Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm btree: fix serious bug in btree_split_beneath()Joe Thornber
When inserting a new key/value pair into a btree we walk down the spine of btree nodes performing the following 2 operations: i) space for a new entry ii) adjusting the first key entry if the new key is lower than any in the node. If the _root_ node is full, the function btree_split_beneath() allocates 2 new nodes, and redistibutes the root nodes entries between them. The root node is left with 2 entries corresponding to the 2 new nodes. btree_split_beneath() then adjusts the spine to point to one of the two new children. This means the first key is never adjusted if the new key was lower, ie. operation (ii) gets missed out. This can result in the new key being 'lost' for a period; until another low valued key is inserted that will uncover it. This is a serious bug, and quite hard to make trigger in normal use. A reproducing test case ("thin create devices-in-reverse-order") is available as part of the thin-provision-tools project: https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593 Fix the issue by changing btree_split_beneath() so it no longer adjusts the spine. Instead it unlocks both the new nodes, and lets the main loop in btree_insert_raw() relock the appropriate one and make any neccessary adjustments. Cc: stable@vger.kernel.org Reported-by: Monty Pavel <monty_pavel@sina.com> Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6Dennis Yang
For btree removal, there is a corner case that a single thread could takes 6 locks which is more than THIN_MAX_CONCURRENT_LOCKS(5) and leads to deadlock. A btree removal might eventually call rebalance_children()->rebalance3() to rebalance entries of three neighbor child nodes when shadow_spine has already acquired two write locks. In rebalance3(), it tries to shadow and acquire the write locks of all three child nodes. However, shadowing a child node requires acquiring a read lock of the original child node and a write lock of the new block. Although the read lock will be released after block shadowing, shadowing the third child node in rebalance3() could still take the sixth lock. (2 write locks for shadow_spine + 2 write locks for the first two child nodes's shadow + 1 write lock for the last child node's shadow + 1 read lock for the last child node) Cc: stable@vger.kernel.org Signed-off-by: Dennis Yang <dennisyang@qnap.com> Acked-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17KVM/x86: Fix wrong macro references of X86_CR0_PG_BIT and X86_CR4_PAE_BIT in ↵Tianyu Lan
kvm_valid_sregs() kvm_valid_sregs() should use X86_CR0_PG and X86_CR4_PAE to check bit status rather than X86_CR0_PG_BIT and X86_CR4_PAE_BIT. This patch is to fix it. Fixes: f29810335965a(KVM/x86: Check input paging mode when cs.l is set) Reported-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-17Merge tag 'kvm-arm-fixes-for-v4.15-3-v2' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Fixes for v4.15, Round 3 (v2) Three more fixes for v4.15 fixing incorrect huge page mappings on systems using the contigious hint for hugetlbfs; supporting an alternative GICv4 init sequence; and correctly implementing the ARM SMCC for HVC and SMC handling.
2018-01-17Turn gfs2_block_truncate_page into gfs2_block_zero_rangeAndreas Gruenbacher
Turn gfs2_block_truncate_page into a function that zeroes a range within a block rather than only the end of a block. This will be used for cleaning the end of the first partial block and the start of the last partial block when punching a hole in a file. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Improve non-recursive delete algorithmAndreas Gruenbacher
In rare cases, the current non-recursive delete algorithm doesn't deallocate empty intermediary indirect blocks. This should have very little practical effect, but deallocating all blocks correctly should still be preferable as it is cleaner and easier to validate. The fix consists of using the first block to deallocate to compute the start marker of the truncate point instead of the last block that needs to be kept. With that change, computing which indirect blocks are still needed becomes relatively easy. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Fix metadata read-ahead during truncateAndreas Gruenbacher
The metadata read-ahead algorithm broke when switching from recursive to non-recursive delete: the current algorithm reads ahead blocks at height N - 1 while deallocating the blocks at hight N. However, deallocating the blocks at height N requires a complete walk of the metadata tree, not only down to height N - 1. Consequently, all blocks below height N - 1 will be accessed without read-ahead. Fix this by issuing read-aheads as early as possible, after each metapath lookup. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Clean up {lookup,fillup}_metapathAndreas Gruenbacher
Split out the entire lookup loop from lookup_metapath and fillup_metapath. Make both functions return the actual height in mp->mp_aheight, and return 0 on success. Handle lookup errors properly in trunc_dealloc. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Remove minor gfs2_journaled_truncate inefficienciesAndreas Gruenbacher
First, this function truncates the file in chunks. When the original file size isn't block aligned, each chunk that is truncated will remain be misaligned. This is inefficient. Second, this function doesn't recognize where holes are, so it loops through them. For each chunk of a hole, it creates a new transaction. At least avoid creating another transactions whe the current one is still empty. (An better fix would be to skip large holes, of course.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: truncate: Remove unnecessary oldsize parametersAndreas Gruenbacher
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Clean up trunc_start error pathAndreas Gruenbacher
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Remove pointless BUG_ONAndreas Gruenbacher
The current transaction is being dereferenced before asserting that is not NULL; that isn't going to help. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Add gfs2_blk2rgrpd comment and fix incorrect useSteven Whitehouse
Document when to use gfs2_blk2rgrpd for "inexact" resource group matching. Based on that, fix an incorrect use of gfs2_blk2rgrpd in sweep_bh_for_rgrps. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17perf record: Fix failed memory allocation for get_cpuid_strThomas Richter
In x86 architecture dependend part function get_cpuid_str() mallocs a 128 byte buffer, but does not check if the memory allocation succeeded or not. When the memory allocation fails, function __get_cpuid() is called with first parameter being a NULL pointer. However this function references its first parameter and operates on a NULL pointer which might cause core dumps. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180117131611.34319-1-tmricht@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>