linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-08-09	fscache: don't leak cookie access refs if invalidation is in progress or failed	Jeff Layton
	It's possible for a request to invalidate a fscache_cookie will come in while we're already processing an invalidation. If that happens we currently take an extra access reference that will leak. Only call __fscache_begin_cookie_access if the FSCACHE_COOKIE_DO_INVALIDATE bit was previously clear. Also, ensure that we attempt to clear the bit when the cookie is "FAILED" and put the reference to avoid an access leak. Fixes: 85e4ea1049c7 ("fscache: Fix invalidation/lookup race") Suggested-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com>
2022-08-09	can: ems_usb: fix clang's -Wunaligned-access warning	Marc Kleine-Budde
	clang emits a -Wunaligned-access warning on struct __packed ems_cpc_msg. The reason is that the anonymous union msg (not declared as packed) is being packed right after some non naturally aligned variables (38 bits + 232) inside a packed struct: \| struct __packed ems_cpc_msg { \| u8 type; /* type of message / \| u8 length; / length of data within union 'msg' / \| u8 msgid; / confirmation handle / \| __le32 ts_sec; / timestamp in seconds / \| __le32 ts_nsec; / timestamp in nano seconds / \| / ^ not naturally aligned / \| \| union { \| / ^ not declared as packed */ \| u8 generic[64]; \| struct cpc_can_msg can_msg; \| struct cpc_can_params can_params; \| struct cpc_confirm confirmation; \| struct cpc_overrun overrun; \| struct cpc_can_error error; \| struct cpc_can_err_counter err_counter; \| u8 can_state; \| } msg; \| }; Starting from LLVM 14, having an unpacked struct nested in a packed struct triggers a warning. c.f. [1]. Fix the warning by marking the anonymous union as packed. [1] https://github.com/llvm/llvm-project/issues/55520 Fixes: 702171adeed3 ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface") Link: https://lore.kernel.org/all/20220802094021.959858-1-mkl@pengutronix.de Cc: Gerhard Uttenthaler <uttenthaler@ems-wuensche.com> Cc: Sebastian Haas <haas@ems-wuensche.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-08-09	can: j1939: j1939_session_destroy(): fix memory leak of skbs	Fedor Pchelkin
	We need to drop skb references taken in j1939_session_skb_queue() when destroying a session in j1939_session_destroy(). Otherwise those skbs would be lost. Link to Syzkaller info and repro: https://forge.ispras.ru/issues/11743. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. V1: https://lore.kernel.org/all/20220708175949.539064-1-pchelkin@ispras.ru Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Suggested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20220805150216.66313-1-pchelkin@ispras.ru Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-08-09	can: j1939: j1939_sk_queue_activate_next_locked(): replace WARN_ON_ONCE with ↵	Fedor Pchelkin
	netdev_warn_once() We should warn user-space that it is doing something wrong when trying to activate sessions with identical parameters but WARN_ON_ONCE macro can not be used here as it serves a different purpose. So it would be good to replace it with netdev_warn_once() message. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20220729143655.1108297-1-pchelkin@ispras.ru [mkl: fix indention] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-08-08	Merge tag 'for-net-2022-08-08' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fixes various issues related to ISO channel/socket support - Fixes issues when building with C=1 - Fix cancel uninitilized work which blocks syzbot to run * tag 'for-net-2022-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: ISO: Fix not using the correct QoS Bluetooth: don't try to cancel uninitialized works at mgmt_index_removed() Bluetooth: ISO: Fix iso_sock_getsockopt for BT_DEFER_SETUP Bluetooth: MGMT: Fixes build warnings with C=1 Bluetooth: hci_event: Fix build warning with C=1 Bluetooth: ISO: Fix memory corruption Bluetooth: Fix null pointer deref on unexpected status event Bluetooth: ISO: Fix info leak in iso_sock_getsockopt() Bluetooth: hci_conn: Fix updating ISO QoS PHY Bluetooth: ISO: unlock on error path in iso_sock_setsockopt() Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression ==================== Link: https://lore.kernel.org/r/20220809001224.412807-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08	s390/qeth: cache link_info for ethtool	Alexandra Winter
	Since commit e6e771b3d897 ("s390/qeth: detach netdevice while card is offline") there was a timing window during recovery, that qeth_query_card_info could be sent to the card, even before it was ready for it, leading to a failing card recovery. There is evidence that this window was hit, as not all callers of get_link_ksettings() check for netif_device_present. Use cached values in qeth_get_link_ksettings(), instead of calling qeth_query_card_info() and falling back to default values in case it fails. Link info is already updated when the card goes online, e.g. after STARTLAN (physical link up). Set the link info to default values, when the card goes offline or at STOPLAN (physical link down). A follow-on patch will improve values reported for link down. Fixes: e6e771b3d897 ("s390/qeth: detach netdevice while card is offline") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com> Link: https://lore.kernel.org/r/20220805155714.59609-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08	net: phy: dp83867: fix get nvmem cell fail	Nikita Shubin
	If CONFIG_NVMEM is not set of_nvmem_cell_get, of_nvmem_device_get functions will return ERR_PTR(-EOPNOTSUPP) and "failed to get nvmem cell io_impedance_ctrl" error would be reported despite "io_impedance_ctrl" is completely missing in Device Tree and we should use default values. Check -EOPNOTSUPP togather with -ENOENT to avoid this situation. Fixes: 5c2d0a6a0701 ("net: phy: dp83867: implement support for io_impedance_ctrl nvmem cell") Signed-off-by: Nikita Shubin <n.shubin@yadro.com> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220805084843.24542-1-nikita.shubin@maquefel.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08	net: phy: c45 baset1: do not skip aneg configuration if clock role is not ↵	Oleksij Rempel
	specified In case master/slave clock role is not specified (which is default), the aneg registers will not be written. The visible impact of this is missing pause advertisement. So, rework genphy_c45_baset1_an_config_aneg() to be able to write advertisement registers even if clock role is unknown. Fixes: 3da8ffd8545f ("net: phy: Add 10BASE-T1L support in phy-c45") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220805073159.908643-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08	atm: idt77252: fix use-after-free bugs caused by tst_timer	Duoming Zhou
	There are use-after-free bugs caused by tst_timer. The root cause is that there are no functions to stop tst_timer in idt77252_exit(). One of the possible race conditions is shown below: (thread 1) \| (thread 2) \| idt77252_init_one \| init_card \| fill_tst \| mod_timer(&card->tst_timer, ...) idt77252_exit \| (wait a time) \| tst_timer \| \| ... kfree(card) // FREE \| \| card->soft_tst[e] // USE The idt77252_dev is deallocated in idt77252_exit() and used in timer handler. This patch adds del_timer_sync() in idt77252_exit() in order that the timer handler could be stopped before the idt77252_dev is deallocated. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20220805070008.18007-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08	net: dsa: felix: fix min gate len calculation for tc when its first gate is ↵	Vladimir Oltean
	closed min_gate_len[tc] is supposed to track the shortest interval of continuously open gates for a traffic class. For example, in the following case: TC 76543210 t0 00000001b 200000 ns t1 00000010b 200000 ns min_gate_len[0] and min_gate_len[1] should be 200000, while min_gate_len[2-7] should be 0. However what happens is that min_gate_len[0] is 200000, but min_gate_len[1] ends up being 0 (despite gate_len[1] being 200000 at the point where the logic detects the gate close event for TC 1). The problem is that the code considers a "gate close" event whenever it sees that there is a 0 for that TC (essentially it's level rather than edge triggered). By doing that, any time a gate is seen as closed without having been open prior, gate_len, which is 0, will be written into min_gate_len. Once min_gate_len becomes 0, it's impossible for it to track anything higher than that (the length of actually open intervals). To fix this, we make the writing to min_gate_len[tc] be edge-triggered, which avoids writes for gates that are closed in consecutive intervals. However what this does is it makes us need to special-case the permanently closed gates at the end. Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220804202817.1677572-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08	net/x25: fix call timeouts in blocking connects	Martin Schiller
	When a userspace application starts a blocking connect(), a CALL REQUEST is sent, the t21 timer is started and the connect is waiting in x25_wait_for_connection_establishment(). If then for some reason the t21 timer expires before any reaction on the assigned logical channel (e.g. CALL ACCEPT, CLEAR REQUEST), there is sent a CLEAR REQUEST and timer t23 is started waiting for a CLEAR confirmation. If we now receive a CLEAR CONFIRMATION from the peer, x25_disconnect() is called in x25_state2_machine() with reason "0", which means "normal" call clearing. This is ok, but the parameter "reason" is used as sk->sk_err in x25_disconnect() and sock_error(sk) is evaluated in x25_wait_for_connection_establishment() to check if the call is still pending. As "0" is not rated as an error, the connect will stuck here forever. To fix this situation, also check if the sk->sk_state changed form TCP_SYN_SENT to TCP_CLOSE in the meantime, which is also done by x25_disconnect(). Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20220805061810.10824-1-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08	Merge branch 'tsnep-two-fixes-for-the-driver'	Jakub Kicinski
	Gerhard Engleder says: ==================== tsnep: Two fixes for the driver Two simple bugfixes for tsnep driver. ==================== Link: https://lore.kernel.org/r/20220804183935.73763-1-gerhard@engleder-embedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08	tsnep: Fix tsnep_tx_unmap() error path usage	Gerhard Engleder
	If tsnep_tx_map() fails, then tsnep_tx_unmap() shall start at the write index like tsnep_tx_map(). This is different to the normal operation. Thus, add an additional parameter to tsnep_tx_unmap() to enable start at different positions for successful TX and failed TX. Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver") Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08	tsnep: Fix unused warning for 'tsnep_of_match'	Gerhard Engleder
	Kernel test robot found the following warning: drivers/net/ethernet/engleder/tsnep_main.c:1254:34: warning: 'tsnep_of_match' defined but not used [-Wunused-const-variable=] of_match_ptr() compiles into NULL if CONFIG_OF is disabled. tsnep_of_match exists always so use of of_match_ptr() is useless. Fix warning by dropping of_match_ptr(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08	Merge tag '5.20-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd	Linus Torvalds
	Pull ksmbd updates from Steve French: - fixes for memory access bugs (out of bounds access, oops, leak) - multichannel fixes - session disconnect performance improvement, and session register improvement - cleanup * tag '5.20-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix heap-based overflow in set_ntacl_dacl() ksmbd: prevent out of bound read for SMB2_TREE_CONNNECT ksmbd: prevent out of bound read for SMB2_WRITE ksmbd: fix use-after-free bug in smb2_tree_disconect ksmbd: fix memory leak in smb2_handle_negotiate ksmbd: fix racy issue while destroying session on multichannel ksmbd: use wait_event instead of schedule_timeout() ksmbd: fix kernel oops from idr_remove() ksmbd: add channel rwlock ksmbd: replace sessions list in connection with xarray MAINTAINERS: ksmbd: add entry for documentation ksmbd: remove unused ksmbd_share_configs_cleanup function
2022-08-08	Merge tag 'pull-work.iov_iter-rebased' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more iov_iter updates from Al Viro: - more new_sync_{read,write}() speedups - ITER_UBUF introduction - ITER_PIPE cleanups - unification of iov_iter_get_pages/iov_iter_get_pages_alloc and switching them to advancing semantics - making ITER_PIPE take high-order pages without splitting them - handling copy_page_from_iter() for high-order pages properly * tag 'pull-work.iov_iter-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits) fix copy_page_from_iter() for compound destinations hugetlbfs: copy_page_to_iter() can deal with compound pages copy_page_to_iter(): don't split high-order page in case of ITER_PIPE expand those iov_iter_advance()... pipe_get_pages(): switch to append_pipe() get rid of non-advancing variants ceph: switch the last caller of iov_iter_get_pages_alloc() 9p: convert to advancing variant of iov_iter_get_pages_alloc() af_alg_make_sg(): switch to advancing variant of iov_iter_get_pages() iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() block: convert to advancing variants of iov_iter_get_pages{,_alloc}() iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() iov_iter: saner helper for page array allocation fold __pipe_get_pages() into pipe_get_pages() ITER_XARRAY: don't open-code DIV_ROUND_UP() unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts unify xarray_get_pages() and xarray_get_pages_alloc() unify pipe_get_pages() and pipe_get_pages_alloc() iov_iter_get_pages(): sanity-check arguments iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper ...
2022-08-08	fix copy_page_from_iter() for compound destinations	Al Viro
	had been broken for ITER_BVEC et.al. since ever (OK, v3.17 when ITER_BVEC had first appeared)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	hugetlbfs: copy_page_to_iter() can deal with compound pages	Al Viro
	... since April 2021 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	copy_page_to_iter(): don't split high-order page in case of ITER_PIPE	Al Viro
	... just shove it into one pipe_buffer. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	expand those iov_iter_advance()...	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	pipe_get_pages(): switch to append_pipe()	Al Viro
	now that we are advancing the iterator, there's no need to treat the first page separately - just call append_pipe() in a loop. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	get rid of non-advancing variants	Al Viro
	mechanical change; will be further massaged in subsequent commits Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	ceph: switch the last caller of iov_iter_get_pages_alloc()	Al Viro
	here nothing even looks at the iov_iter after the call, so we couldn't care less whether it advances or not. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	9p: convert to advancing variant of iov_iter_get_pages_alloc()	Al Viro
	that one is somewhat clumsier than usual and needs serious testing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	af_alg_make_sg(): switch to advancing variant of iov_iter_get_pages()	Al Viro
	... and adjust the callers Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	iter_to_pipe(): switch to advancing variant of iov_iter_get_pages()	Al Viro
	... and untangle the cleanup on failure to add into pipe. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	block: convert to advancing variants of iov_iter_get_pages{,_alloc}()	Al Viro
	... doing revert if we end up not using some pages Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()	Al Viro
	Most of the users immediately follow successful iov_iter_get_pages() with advancing by the amount it had returned. Provide inline wrappers doing that, convert trivial open-coded uses of those. BTW, iov_iter_get_pages() never returns more than it had been asked to; such checks in cifs ought to be removed someday... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	iov_iter: saner helper for page array allocation	Al Viro
	All call sites of get_pages_array() are essenitally identical now. Replace with common helper... Returns number of slots available in resulting array or 0 on OOM; it's up to the caller to make sure it doesn't ask to zero-entry array (i.e. neither maxpages nor size are allowed to be zero). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	fold __pipe_get_pages() into pipe_get_pages()	Al Viro
	... and don't mangle maxsize there - turn the loop into counting one instead. Easier to see that we won't run out of array that way. Note that special treatment of the partial buffer in that thing is an artifact of the non-advancing semantics of iov_iter_get_pages() - if not for that, it would be append_pipe(), same as the body of the loop that follows it. IOW, once we make iov_iter_get_pages() advancing, the whole thing will turn into calculate how many pages do we want allocate an array (if needed) call append_pipe() that many times. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	ITER_XARRAY: don't open-code DIV_ROUND_UP()	Al Viro
	Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts	Al Viro
	same as for pipes and xarrays; after that iov_iter_get_pages() becomes a wrapper for __iov_iter_get_pages_alloc(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	unify xarray_get_pages() and xarray_get_pages_alloc()	Al Viro
	same as for pipes Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	unify pipe_get_pages() and pipe_get_pages_alloc()	Al Viro
	The differences between those two are * pipe_get_pages() gets a non-NULL struct page ** value pointing to preallocated array + array size. * pipe_get_pages_alloc() gets an address of struct page variable that contains NULL, allocates the array and (on success) stores its address in that variable. Not hard to combine - always pass struct page *, have the previous pipe_get_pages_alloc() caller pass ~0U as cap for array size. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	iov_iter_get_pages(): sanity-check arguments	Al Viro
	zero maxpages is bogus, but best treated as "just return 0"; NULL pages, OTOH, should be treated as a hard bug. get rid of now completely useless checks in xarray_get_pages{,_alloc}(). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into ↵	Al Viro
	wrapper Incidentally, ITER_XARRAY did not free the sucker in case when iter_xarray_populate_pages() returned 0... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	ITER_PIPE: fold data_start() and pipe_space_for_user() together	Al Viro
	All their callers are next to each other; all of them want the total amount of pages and, possibly, the offset in the partial final buffer. Combine into a new helper (pipe_npages()), fix the bogosity in pipe_space_for_user(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	ITER_PIPE: cache the type of last buffer	Al Viro
	We often need to find whether the last buffer is anon or not, and currently it's rather clumsy: check if ->iov_offset is non-zero (i.e. that pipe is not empty) if so, get the corresponding pipe_buffer and check its ->ops if it's &default_pipe_buf_ops, we have an anon buffer. Let's replace the use of ->iov_offset (which is nowhere near similar to its role for other flavours) with signed field (->last_offset), with the following rules: empty, no buffers occupied: 0 anon, with bytes up to N-1 filled: N zero-copy, with bytes up to N-1 filled: -N That way abs(i->last_offset) is equal to what used to be in i->iov_offset and empty vs. anon vs. zero-copy can be distinguished by the sign of i->last_offset. Checks for "should we extend the last buffer or should we start a new one?" become easier to follow that way. Note that most of the operations can only be done in a sane state - i.e. when the pipe has nothing past the current position of iterator. About the only thing that could be done outside of that state is iov_iter_advance(), which transitions to the sane state by truncating the pipe. There are only two cases where we leave the sane state: 1) iov_iter_get_pages()/iov_iter_get_pages_alloc(). Will be dealt with later, when we make get_pages advancing - the callers are actually happier that way. 2) iov_iter copied, then something is put into the copy. Since they share the underlying pipe, the original gets behind. When we decide that we are done with the copy (original is not usable until then) we advance the original. direct_io used to be done that way; nowadays it operates on the original and we do iov_iter_revert() to discard the excessive data. At the moment there's nothing in the kernel that could do that to ITER_PIPE iterators, so this reason for insane state is theoretical right now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	ITER_PIPE: clean iov_iter_revert()	Al Viro
	Fold pipe_truncate() into it, clean up. We can release buffers in the same loop where we walk backwards to the iterator beginning looking for the place where the new position will be. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	ITER_PIPE: clean pipe_advance() up	Al Viro
	instead of setting ->iov_offset for new position and calling pipe_truncate() to adjust ->len of the last buffer and discard everything after it, adjust ->len at the same time we set ->iov_offset and use pipe_discard_from() to deal with buffers past that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	ITER_PIPE: lose iter_head argument of __pipe_get_pages()	Al Viro
	it's only used to get to the partial buffer we can add to, and that's always the last one, i.e. pipe->head - 1. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	ITER_PIPE: fold push_pipe() into __pipe_get_pages()	Al Viro
	Expand the only remaining call of push_pipe() (in __pipe_get_pages()), combine it with the page-collecting loop there. Note that the only reason it's not a loop doing append_pipe() is that append_pipe() is advancing, while iov_iter_get_pages() is not. As soon as it switches to saner semantics, this thing will switch to using append_pipe(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	ITER_PIPE: allocate buffers as we go in copy-to-pipe primitives	Al Viro
	New helper: append_pipe(). Extends the last buffer if possible, allocates a new one otherwise. Returns page and offset in it on success, NULL on failure. iov_iter is advanced past the data we've got. Use that instead of push_pipe() in copy-to-pipe primitives; they get simpler that way. Handling of short copy (in "mc" one) is done simply by iov_iter_revert() - iov_iter is in consistent state after that one, so we can use that. [Fix for braino caught by Liu Xinpeng <liuxp11@chinatelecom.cn> folded in] [another braino fix, this time in copy_pipe_to_iter() and pipe_zero(); caught by testcase from Hugh Dickins] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	ITER_PIPE: helpers for adding pipe buffers	Al Viro
	There are only two kinds of pipe_buffer in the area used by ITER_PIPE. 1) anonymous - copy_to_iter() et.al. end up creating those and copying data there. They have zero ->offset, and their ->ops points to default_pipe_page_ops. 2) zero-copy ones - those come from copy_page_to_iter(), and page comes from caller. ->offset is also caller-supplied - it might be non-zero. ->ops points to page_cache_pipe_buf_ops. Move creation and insertion of those into helpers - push_anon(pipe, size) and push_page(pipe, page, offset, size) resp., separating them from the "could we avoid creating a new buffer by merging with the current head?" logics. Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	ITER_PIPE: helper for getting pipe buffer by index	Al Viro
	pipe_buffer instances of a pipe are organized as a ring buffer, with power-of-2 size. Indices are kept not reduced modulo ring size, so the buffer refered to by index N is pipe->bufs[N & (pipe->ring_size - 1)]. Ring size can change over the lifetime of a pipe, but not while the pipe is locked. So for any iov_iter primitives it's a constant. Original conversion of pipes to this layout went overboard trying to microoptimize that - calculating pipe->ring_size - 1, storing it in a local variable and using through the function. In some cases it might be warranted, but most of the times it only obfuscates what's going on in there. Introduce a helper (pipe_buf(pipe, N)) that would encapsulate that and use it in the obvious cases. More will follow... Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	splice: stop abusing iov_iter_advance() to flush a pipe	Al Viro
	Use pipe_discard_from() explicitly in generic_file_read_iter(); don't bother with rather non-obvious use of iov_iter_advance() in there. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	switch new_sync_{read,write}() to ITER_UBUF	Al Viro
	Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	new iov_iter flavour - ITER_UBUF	Al Viro
	Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones. We are going to expose the things like ->write_iter() et.al. to those in subsequent commits. New predicate (user_backed_iter()) that is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use that for checking that pages we modify after getting them from iov_iter_get_pages() would need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts and for that the predicate replacement obviously won't suffice. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08	Documentation/mm: add details about kmap_local_page() and preemption	Fabio M. De Francesco
	What happens if a thread is preempted after mapping pages with kmap_local_page() was questioned recently.[1] Commit f3ba3c710ac5 ("mm/highmem: Provide kmap_local*") from Thomas Gleixner explains clearly that on context switch, the maps of an outgoing task are removed and the map of the incoming task are restored and that kmap_local_page() can be invoked from both preemptible and atomic contexts.[2] Therefore, for the purpose to make it clearer that users can call kmap_local_page() from contexts that allow preemption, rework a couple of sentences and add further information in highmem.rst. [1] https://lore.kernel.org/lkml/5303077.Sb9uPGUboI@opensuse/ [2] https://lore.kernel.org/all/20201118204007.468533059@linutronix.de/ Link: https://lkml.kernel.org/r/20220728154844.10874-8-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Collingbourne <pcc@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08	highmem: delete a sentence from kmap_local_page() kdocs	Fabio M. De Francesco
	kmap_local_page() should always be preferred in place of kmap() and kmap_atomic(). "Only use when really necessary." is not consistent with the Documentation/mm/highmem.rst and these kdocs it embeds. Therefore, delete the above-mentioned sentence from kdocs. Link: https://lkml.kernel.org/r/20220728154844.10874-7-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Collingbourne <pcc@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>