linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-07-25	bcache: fix bio_{start,end}_io_acct with proper device	Coly Li
	Commit 85750aeb748f ("bcache: use bio_{start,end}_io_acct") moves the io account code to the location after bio_set_dev(bio, dc->bdev) in cached_dev_make_request(). Then the account is performed incorrectly on backing device, indeed the I/O should be counted to bcache device like /dev/bcache0. With the mistaken I/O account, iostat does not display I/O counts for bcache device and all the numbers go to backing device. In writeback mode, the hard drive may have 340K+ IOPS which is impossible and wrong for spinning disk. This patch introduces bch_bio_start_io_acct() and bch_bio_end_io_acct(), which switches bio->bi_disk to bcache device before calling bio_start_io_acct() or bio_end_io_acct(). Now the I/Os are counted to bcache device, and bcache device, cache device and backing device have their correct I/O count information back. Fixes: 85750aeb748f ("bcache: use bio_{start,end}_io_acct") Signed-off-by: Coly Li <colyli@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: avoid extra memory consumption in struct bbio for large bucket size	Coly Li
	Bcache uses struct bbio to do I/Os for meta data pages like uuids, disk_buckets, prio_buckets, and btree nodes. Example writing a btree node onto cache device, the process is, - Allocate a struct bbio from mempool c->bio_meta. - Inside struct bbio embedded a struct bio, initialize bi_inline_vecs for this embedded bio. - Call bch_bio_map() to map each meta data page to each bv from the inlined bi_io_vec table. - Call bch_submit_bbio() to submit the bio into underlying block layer. - When the I/O completed, only release the struct bbio, don't touch the reference counter of the meta data pages. The struct bbio is defined as, 738 struct bbio { 739 unsigned int submit_time_us; [snipped] 748 struct bio bio; 749 }; Because struct bio is embedded at the end of struct bbio, therefore the actual size of struct bbio is sizeof(struct bio) + size of the embedded bio->bi_inline_vecs. Now all the meta data bucket size are limited to meta_bucket_pages(), if the bucket size is large than meta_bucket_pages()PAGE_SECTORS, rested space in the bucket is unused. Therefore the most used space in meta bucket is (1<<MAX_ORDER) pages, or (1<<CONFIG_FORCE_MAX_ZONEORDER) if it is configured. Therefore for large bucket size, it is unnecessary to calculate the allocation size of mempool c->bio_meta as, mempool_init_kmalloc_pool(&c->bio_meta, 2, sizeof(struct bbio) + sizeof(struct bio_vec) bucket_pages(c)) It is too large, neither the Linux buddy allocator cannot allocate so much continuous pages, nor the extra allocated pages are wasted. This patch replace bucket_pages() to meta_bucket_pages() in two places, - In bch_cache_set_alloc(), when initialize mempool c->bio_meta, uses sizeof(struct bbio) + sizeof(struct bio_vec) * bucket_pages(c) to set the allocating object size. - In bch_bbio_alloc(), when calling bio_init() to set inline bvec talbe bi_inline_bvecs, uses meta_bucket_pages() to indicate number of the inline bio vencs number. Now the maximum size of embedded bio inside struct bbio exactly matches the limit of meta_bucket_pages(), no extra page wasted. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: avoid extra memory allocation from mempool c->fill_iter	Coly Li
	Mempool c->fill_iter is used to allocate memory for struct btree_iter in bch_btree_node_read_done() to iterate all keys of a read-in btree node. The allocation size is defined in bch_cache_set_alloc() by, mempool_init_kmalloc_pool(&c->fill_iter, 1, iter_size)) where iter_size is defined by a calculation, (sb->bucket_size / sb->block_size + 1) * sizeof(struct btree_iter_set) For 16bit width bucket_size the calculation is OK, but now the bucket size is extended to 32bit, the bucket size can be 2GB. By the above calculation, iter_size can be 2048 pages (order 11 is still accepted by buddy allocator). But the actual size holds the bkeys in meta data bucket is limited to meta_bucket_pages() already, which is 16MB. By the above calculation, if replace sb->bucket_size by meta_bucket_pages() * PAGE_SECTORS, the result is 16 pages. This is the size large enough for the mempool allocation to struct btree_iter. Therefore in worst case every time mempool c->fill_iter allocates, at most 4080 pages are wasted and won't be used. Therefore this patch uses meta_bucket_pages() * PAGE_SECTORS to calculate the iter size in bch_cache_set_alloc(), to avoid extra memory allocation from mempool c->fill_iter. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: add sysfs file to display feature sets information of cache set	Coly Li
	The following three sysfs files are created to display according feature set information of bcache: /sys/fs/bcache/<cache set UUID>/internal/feature_compat /sys/fs/bcache/<cache set UUID>/internal/feature_ro_compat /sys/fs/bcache/<cache set UUID>/internal/feature_incompat is added by this patch, to display feature sets information of the cache set. Now only an incompat feature 'large_bucket' added in bcache, the sysfs file content is: [large_bucket] string large_bucket means the running bcache drive supports incompat feature 'large_bucket', the wrapping [] means the 'large_bucket' feature is currently enabled on this cache set. This patch is ready to display compat and ro_compat features, in future once bcache code implements such feature sets, the according feature strings will be displayed in their sysfs files too. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: add bucket_size_hi into struct cache_sb_disk for large bucket	Coly Li
	The large bucket feature is to extend bucket_size from 16bit to 32bit. When create cache device on zoned device (e.g. zoned NVMe SSD), making a single bucket cover one or more zones of the zoned device is the simplest way to support zoned device as cache by bcache. But current maximum bucket size is 16MB and a typical zone size of zoned device is 256MB, this is the major motiviation to extend bucket size to a larger bit width. This patch is the basic and first change to support large bucket size, the major changes it makes are, - Add BCH_FEATURE_INCOMPAT_LARGE_BUCKET for the large bucket feature, INCOMPAT means it introduces incompatible on-disk format change. - Add BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LARGE_BUCKET) routines. - Adds __le16 bucket_size_hi into struct cache_sb_disk at offset 0x8d0 for the on-disk super block format. - For the in-memory super block struct cache_sb, member bucket_size is extended from __u16 to __32. - Add get_bucket_size() to combine the bucket_size and bucket_size_hi from struct cache_sb_disk into an unsigned int value. Since we already have large bucket size helpers meta_bucket_pages(), meta_bucket_bytes() and alloc_meta_bucket_pages(), they make sure when bucket size > 8MB, the memory allocation for bcache meta data bucket won't fail no matter how large the bucket size extended. So these meta data buckets are handled properly when the bucket size width increase from 16bit to 32bit, we don't need to worry about them. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: handle btree node memory allocation properly for bucket size > 8MB	Coly Li
	Currently the bcache internal btree node occupies a whole bucket. When loading the btree node from cache device into memory, mca_data_alloc() will call bch_btree_keys_alloc() to allocate memory for the whole bucket size, ilog2(b->c->btree_pages) is send to bch_btree_keys_alloc() as the parameter 'page_order'. c->btree_pages is set as bucket_pages() in bch_cache_set_alloc(), for bucket size > 8MB, ilog2(b->c->btree_pages) is 12 for 4KB page size. By default the maximum page order __get_free_pages() accepts is MAX_ORDER (11), in this condition bch_btree_keys_alloc() will always fail. Because of other over-page-order allocation failure fails the cache device registration, such btree node allocation failure wasn't observed during runtime. After other blocking page allocation failures for bucket size > 8MB, this btree node allocation issue may trigger potentical risk e.g. infinite dead-loop to retry btree node allocation after failure. This patch fixes the potential problem by setting c->btree_pages to meta_bucket_pages() in bch_cache_set_alloc(). In the condition that bucket size > 8MB, meta_bucket_pages() will always return a number which won't exceed the maximum page order of the buddy allocator. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: handle cache set verify_ondisk properly for bucket size > 8MB	Coly Li
	In bch_btree_cache_alloc() when CONFIG_BCACHE_DEBUG is configured, allocate memory for c->verify_ondisk may fail if the bucket size > 8MB, which will require __get_free_pages() to allocate continuous pages with order > 11 (the default MAX_ORDER of Linux buddy allocator). Such over size allocation will fail, and cause 2 problems, - When CONFIG_BCACHE_DEBUG is configured, bch_btree_verify() does not work, because c->verify_ondisk is NULL and bch_btree_verify() returns immediately. - bch_btree_cache_alloc() will fail due to c->verify_ondisk allocation failed, then the whole cache device registration fails. And because of this failure, the first problem of bch_btree_verify() has no chance to be triggered. This patch fixes the above problem by two means, 1) If pages allocation of c->verify_ondisk fails, set it to NULL and returns bch_btree_cache_alloc() with -ENOMEM. 2) When calling __get_free_pages() to allocate c->verify_ondisk pages, use ilog2(meta_bucket_pages(&c->sb)) to make sure ilog2() will always generate a pages order <= MAX_ORDER (or CONFIG_FORCE_MAX_ZONEORDER). Then the buddy system won't directly reject the allocation request. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: handle cache prio_buckets and disk_buckets properly for bucket size ↵	Coly Li
	> 8MB Similar to c->uuids, struct cache's prio_buckets and disk_buckets also have the potential memory allocation failure during cache registration if the bucket size > 8MB. ca->prio_buckets can be stored on cache device in multiple buckets, its in-memory space is allocated by kzalloc() interface but normally allocated by alloc_pages() because the size > KMALLOC_MAX_CACHE_SIZE. So allocation of ca->prio_buckets has the MAX_ORDER restriction too. If the bucket size > 8MB, by default the page allocator will fail because the page order > 11 (default MAX_ORDER value). ca->prio_buckets should also use meta_bucket_bytes(), meta_bucket_pages() to decide its memory size and use alloc_meta_bucket_pages() to allocate pages, to avoid the allocation failure during cache set registration when bucket size > 8MB. ca->disk_buckets is a single bucket size memory buffer, it is used to iterate each bucket of ca->prio_buckets, and compose the bio based on memory of ca->disk_buckets, then write ca->disk_buckets memory to cache disk one-by-one for each bucket of ca->prio_buckets. ca->disk_buckets should have in-memory size exact to the meta_bucket_pages(), this is the size that ca->prio_buckets will be stored into each on-disk bucket. This patch fixes the above issues and handle cache's prio_buckets and disk_buckets properly for bucket size larger than 8MB. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: handle c->uuids properly for bucket size > 8MB	Coly Li
	Bcache allocates a whole bucket to store c->uuids on cache device, and allocates continuous pages to store it in-memory. When the bucket size exceeds maximum allocable continuous pages, bch_cache_set_alloc() will fail and cache device registration will fail. This patch allocates c->uuids by alloc_meta_bucket_pages(), and uses ilog2(meta_bucket_pages(c)) to indicate order of c->uuids pages when free it. When writing c->uuids to cache device, its size is decided by meta_bucket_pages(c) * PAGE_SECTORS. Now c->uuids is properly handled for bucket size > 8MB. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: introduce meta_bucket_pages() related helper routines	Coly Li
	Currently the in-memory meta data like c->uuids or c->disk_buckets are allocated by alloc_bucket_pages(). The macro alloc_bucket_pages() calls __get_free_pages() to allocated continuous pages with order indicated by ilog2(bucket_pages(c)), #define alloc_bucket_pages(gfp, c) \ ((void *) __get_free_pages(__GFP_ZERO\|gfp, ilog2(bucket_pages(c)))) The maximum order is defined as MAX_ORDER, the default value is 11 (and can be overwritten by CONFIG_FORCE_MAX_ZONEORDER). In bcache code the maximum bucket size width is 16bits, this is restricted both by KEY_SIZE size and bucket_size size from struct cache_sb_disk. The maximum 16bits width and power-of-2 value is (1<<15) in unit of sector (512byte). It means the maximum value of bucket size in bytes is (1<<24) bytes a.k.a 4096 pages. When the bucket size is set to maximum permitted value, ilog2(4096) is 12, which exceeds the default maximum order __get_free_pages() can accepted, the failed pages allocation will fail cache set registration procedure and print a kernel oops message for the exceeded pages order. This patch introduces meta_bucket_pages(), meta_bucket_bytes(), and alloc_bucket_pages() helper routines. meta_bucket_pages() indicates the maximum pages can be allocated to meta data bucket, meta_bucket_bytes() indicates the according maximum bytes, and alloc_bucket_pages() does the pages allocation for meta bucket. Because meta_bucket_pages() chooses the smaller value among the bucket size and MAX_ORDER_NR_PAGES, it still works when MAX_ORDER overwritten by CONFIG_FORCE_MAX_ZONEORDER. Following patches will use these helper routines to decide maximum pages can be allocated for different meta data buckets. If the bucket size is larger than meta_bucket_bytes(), the bcache registration can continue to success, just the space more than meta_bucket_bytes() inside the bucket is wasted. Comparing bcache failed for large bucket size, wasting some space for meta data buckets is acceptable at this moment. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: move bucket related code into read_super_common()	Coly Li
	Setting sb->first_bucket and checking sb->keys indeed are only for cache device, it does not make sense to do them in read_super() for backing device too. This patch moves the related code piece into read_super_common() explicitly for cache device and avoid the confusion. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: increase super block version for cache device and backing device	Coly Li
	The new added super block version BCACHE_SB_VERSION_BDEV_WITH_FEATURES (5) BCACHE_SB_VERSION_CDEV_WITH_FEATURES value (6), is for the feature set bits. Devices have super block version equal to the new version will have three new members for feature set bits in the on-disk super block, __le64 feature_compat; __le64 feature_incompat; __le64 feature_ro_compat; They are used for further new features which may introduce on-disk format change, and avoid unncessary super block version increase. The very basic features handling code skeleton is also initialized in this patch. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: fix super block seq numbers comparision in register_cache_set()	Coly Li
	In register_cache_set(), c is pointer to struct cache_set, and ca is pointer to struct cache, if ca->sb.seq > c->sb.seq, it means this registering cache has up to date version and other members, the in- memory version and other members should be updated to the newer value. But current implementation makes a cache set only has a single cache device, so the above assumption works well except for a special case. The execption is when a cache device new created and both ca->sb.seq and c->sb.seq are 0, because the super block is never flushed out yet. In the location for the following if() check, 2156 if (ca->sb.seq > c->sb.seq) { 2157 c->sb.version = ca->sb.version; 2158 memcpy(c->sb.set_uuid, ca->sb.set_uuid, 16); 2159 c->sb.flags = ca->sb.flags; 2160 c->sb.seq = ca->sb.seq; 2161 pr_debug("set version = %llu\n", c->sb.version); 2162 } c->sb.version is not initialized yet and valued 0. When ca->sb.seq is 0, the if() check will fail (because both values are 0), and the cache set version, set_uuid, flags and seq won't be updated. The above problem is hiden for current code, because the bucket size is compatible among different super block version. And the next time when running cache set again, ca->sb.seq will be larger than 0 and cache set super block version will be updated properly. But if the large bucket feature is enabled, sb->bucket_size is the low 16bits of the bucket size. For a power of 2 value, when the actual bucket size exceeds 16bit width, sb->bucket_size will always be 0. Then read_super_common() will fail because the if() check to is_power_of_2(sb->bucket_size) is false. This is how the long time hidden bug is triggered. This patch modifies the if() check to the following way, 2156 if (ca->sb.seq > c->sb.seq \|\| c->sb.seq == 0) { Then cache set's version, set_uuid, flags and seq will always be updated corectly including for a new created cache device. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: disassemble the big if() checks in bch_cache_set_alloc()	Coly Li
	In bch_cache_set_alloc() there is a big if() checks combined by 11 items together. When this big if() statement fails, it is difficult to tell exactly which item fails indeed. This patch disassembles this big if() checks into 11 single if() checks, which makes code debug more easier. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: add more accurate error information in read_super_common()	Coly Li
	The improperly set bucket or block size will trigger error in read_super_common(). For large bucket size, a more accurate error message for invalid bucket or block size is necessary. This patch disassembles the combined if() checks into multiple single if() check, and provide more accurate error message for each check failure condition. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: add read_super_common() to read major part of super block	Coly Li
	Later patches will introduce feature set bits to on-disk super block and increase super block version. Current code in read_super() which reads common part of super block for version BCACHE_SB_VERSION_CDEV and version BCACHE_SB_VERSION_CDEV_WITH_UUID will be shared with the new version. Therefore this patch moves the reusable part into read_super_common(), this preparation patch will make later patches more simplier and only focus on new feature set bits. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: fix overflow in offset_to_stripe()	Coly Li
	offset_to_stripe() returns the stripe number (in type unsigned int) from an offset (in type uint64_t) by the following calculation, do_div(offset, d->stripe_size); For large capacity backing device (e.g. 18TB) with small stripe size (e.g. 4KB), the result is 4831838208 and exceeds UINT_MAX. The actual returned value which caller receives is 536870912, due to the overflow. Indeed in bcache_device_init(), bcache_device->nr_stripes is limited in range [1, INT_MAX]. Therefore all valid stripe numbers in bcache are in range [0, bcache_dev->nr_stripes - 1]. This patch adds a upper limition check in offset_to_stripe(): the max valid stripe number should be less than bcache_device->nr_stripes. If the calculated stripe number from do_div() is equal to or larger than bcache_device->nr_stripe, -EINVAL will be returned. (Normally nr_stripes is less than INT_MAX, exceeding upper limitation doesn't mean overflow, therefore -EOVERFLOW is not used as error code.) This patch also changes nr_stripes' type of struct bcache_device from 'unsigned int' to 'int', and return value type of offset_to_stripe() from 'unsigned int' to 'int', to match their exact data ranges. All locations where bcache_device->nr_stripes and offset_to_stripe() are referenced also get updated for the above type change. Reported-and-tested-by: Ken Raeburn <raeburn@redhat.com> Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Link: https://bugzilla.redhat.com/show_bug.cgi?id=1783075 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: avoid nr_stripes overflow in bcache_device_init()	Coly Li
	For some block devices which large capacity (e.g. 8TB) but small io_opt size (e.g. 8 sectors), in bcache_device_init() the stripes number calcu- lated by, DIV_ROUND_UP_ULL(sectors, d->stripe_size); might be overflow to the unsigned int bcache_device->nr_stripes. This patch uses the uint64_t variable to store DIV_ROUND_UP_ULL() and after the value is checked to be available in unsigned int range, sets it to bache_device->nr_stripes. Then the overflow is avoided. Reported-and-tested-by: Ken Raeburn <raeburn@redhat.com> Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Link: https://bugzilla.redhat.com/show_bug.cgi?id=1783075 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: Use struct_size() in kzalloc()	Gustavo A. R. Silva
	Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: movinggc: Use struct_size() helper in kzalloc()	Gustavo A. R. Silva
	Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: writeback: Remove unneeded variable i	Xu Wang
	Remove unneeded variable i in bch_dirty_init_thread(). Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: journel: use for_each_clear_bit() to simplify the code	Xu Wang
	Using for_each_clear_bit() to simplify the code. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: allocate meta data pages as compound pages	Coly Li
	There are some meta data of bcache are allocated by multiple pages, and they are used as bio bv_page for I/Os to the cache device. for example cache_set->uuids, cache->disk_buckets, journal_write->data, bset_tree->data. For such meta data memory, all the allocated pages should be treated as a single memory block. Then the memory management and underlying I/O code can treat them more clearly. This patch adds __GFP_COMP flag to all the location allocating >0 order pages for the above mentioned meta data. Then their pages are treated as compound pages now. Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	bcache: Fix typo in Kconfig name	Jean Delvare
	registraion -> registration Fixes: 0c8d3fceade2 ("bcache: configure the asynchronous registertion to be experimental") Signed-off-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Coly Li <colyli@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-25	firmware_loader: EFI firmware loader must handle pre-allocated buffer	Kees Cook
	The EFI platform firmware fallback would clobber any pre-allocated buffers. Instead, correctly refuse to reallocate when too small (as already done in the sysfs fallback), or perform allocation normally when needed. Fixes: e4c2c0ff00ec ("firmware: Add new platform fallback mechanism and firmware_request_platform()") Cc: stable@vger.kernel.org Acked-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200724213640.389191-4-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-25	test_firmware: Test platform fw loading on non-EFI systems	Kees Cook
	On non-EFI systems, it wasn't possible to test the platform firmware loader because it will have never set "checked_fw" during __init. Instead, allow the test code to override this check. Additionally split the declarations into a private header file so it there is greater enforcement of the symbol visibility. Fixes: 548193cba2a7 ("test_firmware: add support for firmware_request_platform") Cc: stable@vger.kernel.org Acked-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200724213640.389191-2-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-24	drivers/net/wan: lapb: Corrected the usage of skb_cow	Xie He
	This patch fixed 2 issues with the usage of skb_cow in LAPB drivers "lapbether" and "hdlc_x25": 1) After skb_cow fails, kfree_skb should be called to drop a reference to the skb. But in both drivers, kfree_skb is not called. 2) skb_cow should be called before skb_push so that is can ensure the safety of skb_push. But in "lapbether", it is incorrectly called after skb_push. More details about these 2 issues: 1) The behavior of calling kfree_skb on failure is also the behavior of netif_rx, which is called by this function with "return netif_rx(skb);". So this function should follow this behavior, too. 2) In "lapbether", skb_cow is called after skb_push. This results in 2 logical issues: a) skb_push is not protected by skb_cow; b) An extra headroom of 1 byte is ensured after skb_push. This extra headroom has no use in this function. It also has no use in the upper-layer function that this function passes the skb to (x25_lapb_receive_frame in net/x25/x25_dev.c). So logically skb_cow should instead be called before skb_push. Cc: Eric Dumazet <edumazet@google.com> Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24	Merge tag 'pci-v5.8-fixes-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into master Pull PCI fixes from Bjorn Helgaas: - Reject invalid IRQ 0 command line argument for virtio_mmio because IRQ 0 now generates warnings (Bjorn Helgaas) - Revert "PCI/PM: Assume ports without DLL Link Active train links in 100 ms", which broke nouveau (Bjorn Helgaas) * tag 'pci-v5.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: Revert "PCI/PM: Assume ports without DLL Link Active train links in 100 ms" virtio-mmio: Reject invalid IRQ 0 command line argument
2020-07-24	Merge tag 'wireless-drivers-2020-07-24' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.8 Second set of fixes for v5.8, and hopefully also the last. Three important regressions fixed. ath9k * fix a regression which broke support for all ath9k usb devices ath10k * fix a regression which broke support for all QCA4019 AHB devices iwlwifi * fix a regression which broke support for some Killer Wireless-AC 1550 cards ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24	xen-netfront: fix potential deadlock in xennet_remove()	Andrea Righi
	There's a potential race in xennet_remove(); this is what the driver is doing upon unregistering a network device: 1. state = read bus state 2. if state is not "Closed": 3. request to set state to "Closing" 4. wait for state to be set to "Closing" 5. request to set state to "Closed" 6. wait for state to be set to "Closed" If the state changes to "Closed" immediately after step 1 we are stuck forever in step 4, because the state will never go back from "Closed" to "Closing". Make sure to check also for state == "Closed" in step 4 to prevent the deadlock. Also add a 5 sec timeout any time we wait for the bus state to change, to avoid getting stuck forever in wait_event(). Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24	soc: TI knav_qmss: make symbol 'knav_acc_range_ops' static	Wei Yongjun
	The sparse tool complains as follows: drivers/soc/ti/knav_qmss_acc.c:453:23: warning: symbol 'knav_acc_range_ops' was not declared. Should it be static? 'knav_acc_range_ops' is not used outside of knav_qmss_acc.c, so marks it static. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2020-07-24	firmware: ti_sci: Replace HTTP links with HTTPS ones	Alexander A. Klimov
	Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w\|/)`: If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2020-07-24	soc: ti: k3: fix semicolon.cocci warnings	kernel test robot
	drivers/soc/ti/k3-ringacc.c:616:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Fixes: 3277e8aa2504 ("soc: ti: k3: add navss ringacc driver") CC: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2020-07-24	soc: ti: k3-ringacc: fix: warn: variable dereferenced before check 'ring'	Grygorii Strashko
	Fix build warning in k3_ringacc_ring_cfg(): smatch warnings: drivers/soc/ti/k3-ringacc.c:562 k3_ringacc_ring_cfg() warn: variable dereferenced before check 'ring' (see line 559) 557 int k3_ringacc_ring_cfg(struct k3_ring ring, struct k3_ring_cfg cfg) 558 { @559 struct k3_ringacc *ringacc = ring->parent; ^^^^^^^^^^^^ Dereference. 560 int ret = 0; 561 @562 if (!ring \|\| !cfg) ^^^^ Check too late. Delete it? Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2020-07-24	dmaengine: ti: k3-udma: Switch to k3_ringacc_request_rings_pair	Peter Ujfalusi
	We only request ring pairs via K3 DMA driver, switch to use the new k3_ringacc_request_rings_pair() to simplify the code. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2020-07-24	soc: ti: k3-ringacc: separate soc specific initialization	Grygorii Strashko
	Separate SoC specific initialization and and OF mach data in preparation of adding support for more K3 SoCs Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2020-07-24	soc: ti: k3-ringacc: add request pair of rings api.	Grygorii Strashko
	Add new API k3_ringacc_request_rings_pair() to request pair of rings at once, as in the most cases Rings are used with DMA channels, which need to request pair of rings - one to feed DMA with descriptors (TX/RX FDQ) and one to receive completions (RX/TX CQ). This will allow to simplify Ringacc API users. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2020-07-24	soc: ti: k3-ringacc: add ring's flags to dump	Grygorii Strashko
	Add struct k3_ring *ring->flags to the ring dump. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2020-07-24	soc: ti: k3-ringacc: Move state tracking variables under a struct	Peter Ujfalusi
	Move the free, occ, windex and rindex under a struct. We can use memset to zero them and it will allow a cleaner way to extend driver functionality in the future, Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2020-07-24	mtd: properly check all write ioctls for permissions	Greg Kroah-Hartman
	When doing a "write" ioctl call, properly check that we have permissions to do so before copying anything from userspace or anything else so we can "fail fast". This includes also covering the MEMWRITE ioctl which previously missed checking for this. Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Richard Weinberger <richard@nod.at> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [rw: Fixed locking issue] Signed-off-by: Richard Weinberger <richard@nod.at>
2020-07-24	Merge tag 'iommu-fix-v5.8-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu into master Pull iommu fix from Joerg Roedel: "Fix a NULL-ptr dereference in the QCOM IOMMU driver" * tag 'iommu-fix-v5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/qcom: Use domain rather than dev as tlb cookie
2020-07-24	Merge tag 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma into master Pull rdma fixes from Jason Gunthorpe: "One merge window regression, some corruption bugs in HNS and a few more syzkaller fixes: - Two long standing syzkaller races - Fix incorrect HW configuration in HNS - Restore accidentally dropped locking in IB CM - Fix ODP prefetch bug added in the big rework several versions ago" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/mlx5: Prevent prefetch from racing with implicit destruction RDMA/cm: Protect access to remote_sidr_table RDMA/core: Fix race in rdma_alloc_commit_uobject() RDMA/hns: Fix wrong PBL offset when VA is not aligned to PAGE_SIZE RDMA/hns: Fix wrong assignment of lp_pktn_ini in QPC RDMA/mlx5: Use xa_lock_irq when access to SRQ table
2020-07-24	Merge tag 'for-5.8/dm-fixes-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm into master Pull device mapper fix from Mike Snitzer: "A stable fix for DM integrity target's integrity recalculation that gets skipped when resuming a device. This is a fix for a previous stable@ fix" * tag 'for-5.8/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm integrity: fix integrity recalculation that is improperly skipped
2020-07-24	Merge branch 'i2c/for-current' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into master Pull i2c fixes from Wolfram Sang: "Again some driver bugfixes and some documentation fixes" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: i2c-qcom-geni: Fix DMA transfer race i2c: rcar: always clear ICSAR to avoid side effects MAINTAINERS: i2c: at91: handover maintenance to Codrin Ciubotariu i2c: drop duplicated word in the header file i2c: cadence: Clear HOLD bit at correct time in Rx path Revert "i2c: cadence: Fix the hold bit setting"
2020-07-24	Merge tag 'mmc-v5.8-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc into master Pull MMC fix from Ulf Hansson: "Fix clock divider calculation in the ASPEED SDHCI controller" * tag 'mmc-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-of-aspeed: Fix clock divider calculation
2020-07-24	Merge tag 'drm-fixes-2020-07-24' of git://anongit.freedesktop.org/drm/drm ↵	Linus Torvalds
	into master Pull drm fixes from Dave Airlie: "Quiet fixes, I may have a single regression fix follow up to this for nouveau, but it might be next week, Ben was testing it a bit more . Otherwise two amdgpu fixes, one lima and one sun4i: amdgpu: - Fix crash when overclocking VegaM - Fix possible crash when editing dpm levels sun4i: - Fix inverted HPD result; fixes an earlier fix lima: - fix timeout during reset" * tag 'drm-fixes-2020-07-24' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: Fix NULL dereference in dpm sysfs handlers drm/amd/powerplay: fix a crash when overclocking Vega M drm/lima: fix wait pp reset timeout drm: sun4i: hdmi: Fix inverted HPD result
2020-07-24	Merge tag 'amlogic-dt64-2' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into arm/dt arm64: dts: amlogic: updates for v5.9 (round 2) - new board: WeTek Core2 - audio playback support on more boards - add GPU DVFS * tag 'amlogic-dt64-2' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic: arm64: dts: amlogic: meson-g12: add the Mali OPP table and use DVFS arm64: dts: amlogic: meson-gxm: add the Mali OPP table and use DVFS arm64: dts: amlogic: meson-gx: add the Mali-450 OPP table and use DVFS arm64: dts: meson: add support for the WeTek Core 2 dt-bindings: arm: amlogic: add support for the WeTek Core 2 arm64: dts: meson: add audio playback to khadas-vim3l arm64: dts: meson: add audio playback to odroid-c4 arm64: dts: meson: update spifc node name on Khadas VIM3/VIM3L ARM: dts: meson: Align L2 cache-controller nodename with dtschema arm64: dts: meson-gxl-s805x: reduce initial Mali450 core frequency arm64: dts: meson: add missing gxl rng clock soc: amlogic: meson-gx-socinfo: Fix S905X3 and S905D3 ID's Link: https://lore.kernel.org/r/7h8sf8671u.fsf@baylibre.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-07-24	driver core: Change delimiter in devlink device's name to "--"	Saravana Kannan
	The devlink device name is of the form "supplier:consumer". But ":" is fairly common in device names and makes it visually hard to distinguish supplier and consumer. So, replace it with "--" to make it easier. Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20200724180523.1393383-1-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-24	Merge tag 'memory-controller-drv-5.9' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/drivers Memory controller drivers for v5.9 The drivers/memory directory with memory controller drivers, over the last days grew in numbers but lacked any coordinated care. The generic part (device tree helpers) were pulled in through various trees, depending on driver needs. The patchset is a first try to improve code quality of memory controller drivers. Mostly these are non-intrusive fixes for GCC, checkpatch or sparse warnings. This also fixes missing SPDX tags or improves generic code quality (whitespace, const correctness). Last commit appoints also Krzysztof Kozlowski as a maintainer. * tag 'memory-controller-drv-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: (22 commits) MAINTAINERS: Add Krzysztof Kozlowski as maintainer of memory controllers memory: samsung: exynos-srom: Describe the Kconfig entry memory: Describe the MEMORY Kconfig entry memory: da8xx-ddrctl: Remove unused 'node' variable memory: fsl_ifc: Fix whitespace issues memory: pl172: Add GPLv2 SPDX license header memory: omap-gpmc: Fix whitespace issue memory: omap-gpmc: Include <linux/sizes.h> for SZ_16M memory: mtk-smi: Add argument to function pointer definition memory: brcmstb_dpfe: Remove unneeded braces memory: brcmstb_dpfe: Constify the contents of string memory: ti-emif-pm: Fix cast to iomem pointer memory: ti-aemif: Rename SS to SSTROBE to avoid name conflicts memory: emif: Silence platform_get_irq() error in driver memory: emif: Fix whitespace coding style violations memory: emif: Put constant in comparison on the right side memory: emif-asm-offsets: Add GPLv2 SPDX license header memory: of: Remove unneeded extern from function declarations memory: of: Correct indentation memory: of: Remove __func__ in device related messages ... Link: https://lore.kernel.org/r/20200724160314.8543-1-krzk@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-07-24	Merge tag 'misc-habanalabs-next-2020-07-24' of ↵	Greg Kroah-Hartman
	git://people.freedesktop.org/~gabbayo/linux into char-misc-next Oded writes: This tag contains the following changes for kernel 5.9-rc1: - Remove rate limiters from GAUDI configuration (no longer needed). - Set maximum amount of in-flight CS per ASIC type and increase the maximum amount for GAUDI. - Refactor signal/wait command submissions code - Calculate trace frequency from PLLs to show accurate profiling data - Rephrase error messages to make them more clear to the common user - Add statistics of dropped CS (counter per possible reason for drop) - Get ECC information from firmware - Remove support for partial SoC reset in Gaudi - Halt device CPU only when reset is certain to happen. Sometimes we abort the reset procedure and in that case we can't leave device CPU in halt mode. - set each CQ to its own work queue to prevent a race between completions on different CQs. - Use queue pi/ci in order to determine queue occupancy. This is done to make the code reusable between current and future ASICs. - Add more validations for user inputs. - Refactor PCIe controller configuration to make the code reusable between current and future ASICs. - Update firmware interface headers to latest version - Move all common code to a dedicated common sub-folder * tag 'misc-habanalabs-next-2020-07-24' of git://people.freedesktop.org/~gabbayo/linux: (28 commits) habanalabs: Fix memory leak in error flow of context initialization habanalabs: use no flags on MMU cache invalidation habanalabs: enable device before hw_init() habanalabs: create internal CB pool habanalabs: update hl_boot_if.h from firmware habanalabs: create common folder habanalabs: check for DMA errors when clearing memory habanalabs: verify queue can contain all cs jobs habanalabs: Assign each CQ with its own work queue habanalabs: halt device CPU only upon certain reset habanalabs: remove unused hash habanalabs: use queue pi/ci in order to determine queue occupancy habanalabs: configure maximum queues per asic habanalabs: remove soft-reset support from GAUDI habanalabs: PCIe iATU refactoring habanalabs: Extract ECC information from FW habanalabs: Add dropped cs statistics info struct habanalabs: extract cpu boot status lookup habanalabs: rephrase error messages habanalabs: Increase queues depth ...