summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-06wifi: iwlwifi: mvm: Fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()Gustavo A. R. Silva
kmemdup() at line 2735 is not duplicating enough memory for notif->tid_tear_down and notif->station_id. As it only duplicates 612 bytes: up to offsetofend(struct iwl_wowlan_info_notif, received_beacons), this is the range of [0, 612) bytes. 2735 notif = kmemdup(notif_v1, 2736 offsetofend(struct iwl_wowlan_info_notif, 2737 received_beacons), 2738 GFP_ATOMIC); which evidently does not cover bytes 612 and 613 for members tid_tear_down and station_id in struct iwl_wowlan_info_notif. See below: $ pahole -C iwl_wowlan_info_notif drivers/net/wireless/intel/iwlwifi/mvm/d3.o struct iwl_wowlan_info_notif { struct iwl_wowlan_gtk_status_v3 gtk[2]; /* 0 488 */ /* --- cacheline 7 boundary (448 bytes) was 40 bytes ago --- */ struct iwl_wowlan_igtk_status igtk[2]; /* 488 80 */ /* --- cacheline 8 boundary (512 bytes) was 56 bytes ago --- */ __le64 replay_ctr; /* 568 8 */ /* --- cacheline 9 boundary (576 bytes) --- */ __le16 pattern_number; /* 576 2 */ __le16 reserved1; /* 578 2 */ __le16 qos_seq_ctr[8]; /* 580 16 */ __le32 wakeup_reasons; /* 596 4 */ __le32 num_of_gtk_rekeys; /* 600 4 */ __le32 transmitted_ndps; /* 604 4 */ __le32 received_beacons; /* 608 4 */ u8 tid_tear_down; /* 612 1 */ u8 station_id; /* 613 1 */ u8 reserved2[2]; /* 614 2 */ /* size: 616, cachelines: 10, members: 13 */ /* last cacheline: 40 bytes */ }; Therefore, when the following assignments take place, actually no memory has been allocated for those objects: 2743 notif->tid_tear_down = notif_v1->tid_tear_down; 2744 notif->station_id = notif_v1->station_id; Fix this by allocating space for the whole notif object and zero out the remaining space in memory after member station_id. This also fixes the following -Warray-bounds issues: CC drivers/net/wireless/intel/iwlwifi/mvm/d3.o drivers/net/wireless/intel/iwlwifi/mvm/d3.c: In function ‘iwl_mvm_wait_d3_notif’: drivers/net/wireless/intel/iwlwifi/mvm/d3.c:2743:30: warning: array subscript ‘struct iwl_wowlan_info_notif[0]’ is partly outside array bounds of ‘unsigned char[612]’ [-Warray-bounds=] 2743 | notif->tid_tear_down = notif_v1->tid_tear_down; | from drivers/net/wireless/intel/iwlwifi/mvm/d3.c:7: In function ‘kmemdup’, inlined from ‘iwl_mvm_wait_d3_notif’ at drivers/net/wireless/intel/iwlwifi/mvm/d3.c:2735:12: include/linux/fortify-string.h:765:16: note: object of size 612 allocated by ‘__real_kmemdup’ 765 | return __real_kmemdup(p, size, gfp); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/wireless/intel/iwlwifi/mvm/d3.c: In function ‘iwl_mvm_wait_d3_notif’: drivers/net/wireless/intel/iwlwifi/mvm/d3.c:2744:30: warning: array subscript ‘struct iwl_wowlan_info_notif[0]’ is partly outside array bounds of ‘unsigned char[612]’ [-Warray-bounds=] 2744 | notif->station_id = notif_v1->station_id; | ^~ In function ‘kmemdup’, inlined from ‘iwl_mvm_wait_d3_notif’ at drivers/net/wireless/intel/iwlwifi/mvm/d3.c:2735:12: include/linux/fortify-string.h:765:16: note: object of size 612 allocated by ‘__real_kmemdup’ 765 | return __real_kmemdup(p, size, gfp); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ Link: https://github.com/KSPP/linux/issues/306 Fixes: 905d50ddbc83 ("wifi: iwlwifi: mvm: support wowlan info notification version 2") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/ZHpGN555FwAKGduH@work Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-06wifi: mac80211: fix switch count in EMA beaconsAditya Kumar Singh
Currently, whenever an EMA beacon is formed, due to is_template argument being false from the caller, the switch count is always decremented once which is wrong. Also if switch count is equal to profile periodicity, this makes the switch count to reach till zero which triggers a WARN_ON_ONCE. [ 261.593915] CPU: 1 PID: 800 Comm: kworker/u8:3 Not tainted 5.4.213 #0 [ 261.616143] Hardware name: Qualcomm Technologies, Inc. IPQ9574 [ 261.622666] Workqueue: phy0 ath12k_get_link_bss_conf [ath12k] [ 261.629771] pstate: 60400005 (nZCv daif +PAN -UAO) [ 261.635595] pc : ieee80211_next_txq+0x1ac/0x1b8 [mac80211] [ 261.640282] lr : ieee80211_beacon_update_cntdwn+0x64/0xb4 [mac80211] [...] [ 261.729683] Call trace: [ 261.734986] ieee80211_next_txq+0x1ac/0x1b8 [mac80211] [ 261.737156] ieee80211_beacon_cntdwn_is_complete+0xa28/0x1194 [mac80211] [ 261.742365] ieee80211_beacon_cntdwn_is_complete+0xef4/0x1194 [mac80211] [ 261.749224] ieee80211_beacon_get_template_ema_list+0x38/0x5c [mac80211] [ 261.755908] ath12k_get_link_bss_conf+0xf8/0x33b4 [ath12k] [ 261.762590] ath12k_get_link_bss_conf+0x390/0x33b4 [ath12k] [ 261.767881] process_one_work+0x194/0x270 [ 261.773346] worker_thread+0x200/0x314 [ 261.777514] kthread+0x140/0x150 [ 261.781158] ret_from_fork+0x10/0x18 Fix this issue by making the is_template argument as true when fetching the EMA beacons. Fixes: bd54f3c29077 ("wifi: mac80211: generate EMA beacons in AP mode") Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://lore.kernel.org/r/20230531062012.4537-1-quic_adisi@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-06wifi: mac80211: don't translate beacon/presp addrsJohannes Berg
Don't do link address translation for beacons and probe responses, this leads to reporting multiple scan list entries for the same AP (one with the MLD address) which just breaks things. We might need to extend this in the future for some other (action) frames that aren't MLD addressed. Fixes: 42fb9148c078 ("wifi: mac80211: do link->MLD address translation on RX") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230604120651.62adead1b43a.Ifc25eed26ebf3b269f60b1ec10060156d0e7ec0d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-06wifi: mac80211: mlme: fix non-inheritence elementJohannes Berg
There were two bugs when creating the non-inheritence element: 1) 'at_extension' needs to be declared outside the loop, otherwise the value resets every iteration and we can never really switch properly 2) 'added' never got set to true, so we always cut off the extension element again at the end of the function This shows another issue that we might add a list but no extension list, but we need to make the extension list a zero-length one in that case. Fix all these issues. While at it, add a comment explaining the trim. Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230604120651.3addaa5c4782.If3a78f9305997ad7ef4ba7ffc17a8234c956f613@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-06wifi: cfg80211: reject bad AP MLD addressJohannes Berg
When trying to authenticate, if the AP MLD address isn't a valid address, mac80211 can throw a warning. Avoid that by rejecting such addresses. Fixes: d648c23024bd ("wifi: nl80211: support MLO in auth/assoc") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230604120651.89188912bd1d.I8dbc6c8ee0cb766138803eec59508ef4ce477709@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-06wifi: mac80211: use correct iftype HE capJohannes Berg
We already check that the right iftype capa exists, but then don't use it. Assign it to a variable so we can actually use it, and then do that. Fixes: bac2fd3d7534 ("mac80211: remove use of ieee80211_get_he_sta_cap()") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230604120651.0e908e5c5fdd.Iac142549a6144ac949ebd116b921a59ae5282735@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-05Bluetooth: L2CAP: Add missing checks for invalid DCIDSungwoo Kim
When receiving a connect response we should make sure that the DCID is within the valid range and that we don't already have another channel allocated for the same DCID. Missing checks may violate the specification (BLUETOOTH CORE SPECIFICATION Version 5.4 | Vol 3, Part A, Page 1046). Fixes: 40624183c202 ("Bluetooth: L2CAP: Add missing checks for invalid LE DCID") Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05Bluetooth: ISO: use correct CIS order in Set CIG Parameters eventPauli Virtanen
The order of CIS handle array in Set CIG Parameters response shall match the order of the CIS_ID array in the command (Core v5.3 Vol 4 Part E Sec 7.8.97). We send CIS_IDs mainly in the order of increasing CIS_ID (but with "last" CIS first if it has fixed CIG_ID). In handling of the reply, we currently assume this is also the same as the order of hci_conn in hdev->conn_hash, but that is not true. Match the correct hci_conn to the correct handle by matching them based on the CIG+CIS combination. The CIG+CIS combination shall be unique for ISO_LINK hci_conn at state >= BT_BOUND, which we maintain in hci_le_set_cig_params. Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05Bluetooth: ISO: don't try to remove CIG if there are bound CIS leftPauli Virtanen
Consider existing BOUND & CONNECT state CIS to block CIG removal. Otherwise, under suitable timing conditions we may attempt to remove CIG while Create CIS is pending, which fails. Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05Bluetooth: Fix l2cap_disconnect_req deadlockYing Hsu
L2CAP assumes that the locks conn->chan_lock and chan->lock are acquired in the order conn->chan_lock, chan->lock to avoid potential deadlock. For example, l2sock_shutdown acquires these locks in the order: mutex_lock(&conn->chan_lock) l2cap_chan_lock(chan) However, l2cap_disconnect_req acquires chan->lock in l2cap_get_chan_by_scid first and then acquires conn->chan_lock before calling l2cap_chan_del. This means that these locks are acquired in unexpected order, which leads to potential deadlock: l2cap_chan_lock(c) mutex_lock(&conn->chan_lock) This patch releases chan->lock before acquiring the conn_chan_lock to avoid the potential deadlock. Fixes: a2a9339e1c9d ("Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}") Signed-off-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05Bluetooth: hci_qca: fix debugfs registrationJohan Hovold
Since commit 3e4be65eb82c ("Bluetooth: hci_qca: Add poweroff support during hci down for wcn3990"), the setup callback which registers the debugfs interface can be called multiple times. This specifically leads to the following error when powering on the controller: debugfs: Directory 'ibs' with parent 'hci0' already present! Add a driver flag to avoid trying to register the debugfs interface more than once. Fixes: 3e4be65eb82c ("Bluetooth: hci_qca: Add poweroff support during hci down for wcn3990") Cc: stable@vger.kernel.org # 4.20 Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05Bluetooth: fix debugfs registrationJohan Hovold
Since commit ec6cef9cd98d ("Bluetooth: Fix SMP channel registration for unconfigured controllers") the debugfs interface for unconfigured controllers will be created when the controller is configured. There is however currently nothing preventing a controller from being configured multiple time (e.g. setting the device address using btmgmt) which results in failed attempts to register the already registered debugfs entries: debugfs: File 'features' in directory 'hci0' already present! debugfs: File 'manufacturer' in directory 'hci0' already present! debugfs: File 'hci_version' in directory 'hci0' already present! ... debugfs: File 'quirk_simultaneous_discovery' in directory 'hci0' already present! Add a controller flag to avoid trying to register the debugfs interface more than once. Fixes: ec6cef9cd98d ("Bluetooth: Fix SMP channel registration for unconfigured controllers") Cc: stable@vger.kernel.org # 4.0 Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05Bluetooth: hci_sync: add lock to protect HCI_UNREGISTERZhengping Jiang
When the HCI_UNREGISTER flag is set, no jobs should be scheduled. Fix potential race when HCI_UNREGISTER is set after the flag is tested in hci_cmd_sync_queue. Fixes: 0b94f2651f56 ("Bluetooth: hci_sync: Fix queuing commands when HCI_UNREGISTER is set") Signed-off-by: Zhengping Jiang <jiangzp@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irkLuiz Augusto von Dentz
Similar to commit 0f7d9b31ce7a ("netfilter: nf_tables: fix use-after-free in nft_set_catchall_destroy()"). We can not access k after kfree_rcu() call. Cc: stable@vger.kernel.org Signed-off-by: Min Li <lm0963hack@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05Bluetooth: ISO: Fix CIG auto-allocation to select configurable CIGPauli Virtanen
Make CIG auto-allocation to select the first CIG_ID that is still configurable. Also use correct CIG_ID range (see Core v5.3 Vol 4 Part E Sec 7.8.97 p.2553). Previously, it would always select CIG_ID 0 regardless of anything, because cis_list with data.cis == 0xff (BT_ISO_QOS_CIS_UNSET) would not count any CIS. Since we are not adding CIS here, use find_cis instead. Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05Bluetooth: ISO: consider right CIS when removing CIG at cleanupPauli Virtanen
When looking for CIS blocking CIG removal, consider only the CIS with the right CIG ID. Don't try to remove CIG with unset CIG ID. Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05bpf: netfilter: Add BPF_NETFILTER bpf_attach_typeFlorian Westphal
Andrii Nakryiko writes: And we currently don't have an attach type for NETLINK BPF link. Thankfully it's not too late to add it. I see that link_create() in kernel/bpf/syscall.c just bypasses attach_type check. We shouldn't have done that. Instead we need to add BPF_NETLINK attach type to enum bpf_attach_type. And wire all that properly throughout the kernel and libbpf itself. This adds BPF_NETFILTER and uses it. This breaks uabi but this wasn't in any non-rc release yet, so it should be fine. v2: check link_attack prog type in link_create too Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs") Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/CAEf4BzZ69YgrQW7DHCJUT_X+GqMq_ZQQPBwopaJJVGFD5=d5Vg@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20230605131445.32016-1-fw@strlen.de
2023-06-05HID: hidpp: terminate retry loop on successBenjamin Tissoires
It seems we forgot the normal case to terminate the retry loop, making us asking 3 times each command, which is probably a little bit too much. And remove the ugly "goto exit" that can be replaced by a simpler "break" Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy") Suggested-by: Mark Lord <mlord@pobox.com> Tested-by: Mark Lord <mlord@pobox.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2023-06-05Merge tag 'asym-keys-fix-for-linus-v6.4-rc5' of ↵Linus Torvalds
https://github.com/robertosassu/linux Pull asymmetric keys fix from Roberto Sassu: "Here is a small fix to make an unconditional copy of the buffer passed to crypto operations, to take into account the case of the stack not in the linear mapping area. It has been tested and verified to fix the bug" Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David Howells <dhowells@redhat.com> * tag 'asym-keys-fix-for-linus-v6.4-rc5' of https://github.com/robertosassu/linux: KEYS: asymmetric: Copy sig and digest in public_key_verify_signature()
2023-06-05Merge branch 'mptcp-addr-adv-fixes'David S. Miller
Mat Martineau says: ==================== mptcp: Fixes for address advertisement Patches 1 and 2 allow address advertisements to be removed without affecting current connected subflows, and updates associated self tests. Patches 3 and 4 correctly track (and allow removal of) addresses that were implicitly announced as part of subflow creation. Also updates associated self tests. Patch 5 makes subflow and address announcement counters work consistently between the userspace and in-kernel path managers. ==================== Signed-off-by: Mat Martineau <martineau@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mptcp: update userspace pm infosGeliang Tang
Increase pm subflows counter on both server side and client side when userspace pm creates a new subflow, and decrease the counter when it closes a subflow. Increase add_addr_signaled counter in mptcp_nl_cmd_announce() when the address is announced by userspace PM. This modification is similar to how the in-kernel PM is updating the counter: when additional subflows are created/removed. Fixes: 9ab4807c84a4 ("mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE") Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/329 Cc: stable@vger.kernel.org Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05selftests: mptcp: update userspace pm subflow testsGeliang Tang
To align with what is done by the in-kernel PM, update userspace pm subflow selftests, by sending the a remove_addrs command together before the remove_subflows command. This will get a RM_ADDR in chk_rm_nr(). Fixes: d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE") Fixes: 5e986ec46874 ("selftests: mptcp: userspace pm subflow tests") Link: https://github.com/multipath-tcp/mptcp_net-next/issues/379 Cc: stable@vger.kernel.org Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mptcp: add address into userspace pm listGeliang Tang
Add the address into userspace_pm_local_addr_list when the subflow is created. Make sure it can be found in mptcp_nl_cmd_remove(). And delete it in the new helper mptcp_userspace_pm_delete_local_addr(). By doing this, the "REMOVE" command also works with subflows that have been created via the "SUB_CREATE" command instead of restricting to the addresses that have been announced via the "ANNOUNCE" command. Fixes: d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE") Link: https://github.com/multipath-tcp/mptcp_net-next/issues/379 Cc: stable@vger.kernel.org Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05selftests: mptcp: update userspace pm addr testsGeliang Tang
This patch is linked to the previous commit ("mptcp: only send RM_ADDR in nl_cmd_remove"). To align with what is done by the in-kernel PM, update userspace pm addr selftests, by sending a remove_subflows command together after the remove_addrs command. Fixes: d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE") Fixes: 97040cf9806e ("selftests: mptcp: userspace pm address tests") Cc: stable@vger.kernel.org Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mptcp: only send RM_ADDR in nl_cmd_removeGeliang Tang
The specifications from [1] about the "REMOVE" command say: Announce that an address has been lost to the peer It was then only supposed to send a RM_ADDR and not trying to delete associated subflows. A new helper mptcp_pm_remove_addrs() is then introduced to do just that, compared to mptcp_pm_remove_addrs_and_subflows() also removing subflows. To delete a subflow, the userspace daemon can use the "SUB_DESTROY" command, see mptcp_nl_cmd_sf_destroy(). Fixes: d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE") Link: https://github.com/multipath-tcp/mptcp/blob/mptcp_v0.96/include/uapi/linux/mptcp.h [1] Cc: stable@vger.kernel.org Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05net: stmmac: dwmac-qcom-ethqos: fix a regression on EMAC < 3Bartosz Golaszewski
We must not assign plat_dat->dwmac4_addrs unconditionally as for structures which don't set them, this will result in the core driver using zeroes everywhere and breaking the driver for older HW. On EMAC < 2 the address should remain NULL. Fixes: b68376191c69 ("net: stmmac: dwmac-qcom-ethqos: Add EMAC3 support") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05Merge tag 'linux-can-fixes-for-6.4-20230605' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== this is a pull request of 3 patches for net/master. All 3 patches target the j1939 stack. The 1st patch is by Oleksij Rempel and fixes the error queue handling for (E)TP sessions that run into timeouts. The last 2 patches are by Fedor Pchelkin and fix a potential use-after-free in j1939_netdev_start() if j1939_can_rx_register() fails. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05net/sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM valuesEric Dumazet
We got multiple syzbot reports, all duplicates of the following [1] syzbot managed to install fq_pie with a zero TCA_FQ_PIE_QUANTUM, thus triggering infinite loops. Use limits similar to sch_fq, with commits 3725a269815b ("pkt_sched: fq: avoid hang when quantum 0") and d9e15a273306 ("pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM") [1] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:0] Modules linked in: irq event stamp: 172817 hardirqs last enabled at (172816): [<ffff80001242fde4>] __el1_irq arch/arm64/kernel/entry-common.c:476 [inline] hardirqs last enabled at (172816): [<ffff80001242fde4>] el1_interrupt+0x58/0x68 arch/arm64/kernel/entry-common.c:486 hardirqs last disabled at (172817): [<ffff80001242fdb0>] __el1_irq arch/arm64/kernel/entry-common.c:468 [inline] hardirqs last disabled at (172817): [<ffff80001242fdb0>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:486 softirqs last enabled at (167634): [<ffff800008020c1c>] softirq_handle_end kernel/softirq.c:414 [inline] softirqs last enabled at (167634): [<ffff800008020c1c>] __do_softirq+0xac0/0xd54 kernel/softirq.c:600 softirqs last disabled at (167701): [<ffff80000802a660>] ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:80 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc3-syzkaller-geb0f1697d729 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/28/2023 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : fq_pie_qdisc_dequeue+0x10c/0x8ac net/sched/sch_fq_pie.c:246 lr : fq_pie_qdisc_dequeue+0xe4/0x8ac net/sched/sch_fq_pie.c:240 sp : ffff800008007210 x29: ffff800008007280 x28: ffff0000c86f7890 x27: ffff0000cb20c2e8 x26: ffff0000cb20c2f0 x25: dfff800000000000 x24: ffff0000cb20c2e0 x23: ffff0000c86f7880 x22: 0000000000000040 x21: 1fffe000190def10 x20: ffff0000cb20c2e0 x19: ffff0000cb20c2e0 x18: ffff800008006e60 x17: 0000000000000000 x16: ffff80000850af6c x15: 0000000000000302 x14: 0000000000000100 x13: 0000000000000000 x12: 0000000000000001 x11: 0000000000000302 x10: 0000000000000100 x9 : 0000000000000000 x8 : 0000000000000000 x7 : ffff80000841c468 x6 : 0000000000000000 x5 : 0000000000000001 x4 : 0000000000000001 x3 : 0000000000000000 x2 : ffff0000cb20c2e0 x1 : ffff0000cb20c2e0 x0 : 0000000000000001 Call trace: fq_pie_qdisc_dequeue+0x10c/0x8ac net/sched/sch_fq_pie.c:246 dequeue_skb net/sched/sch_generic.c:292 [inline] qdisc_restart net/sched/sch_generic.c:397 [inline] __qdisc_run+0x1fc/0x231c net/sched/sch_generic.c:415 __dev_xmit_skb net/core/dev.c:3868 [inline] __dev_queue_xmit+0xc80/0x3318 net/core/dev.c:4210 dev_queue_xmit include/linux/netdevice.h:3085 [inline] neigh_connected_output+0x2f8/0x38c net/core/neighbour.c:1581 neigh_output include/net/neighbour.h:544 [inline] ip6_finish_output2+0xd60/0x1a1c net/ipv6/ip6_output.c:134 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline] ip6_finish_output+0x538/0x8c8 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:292 [inline] ip6_output+0x270/0x594 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:458 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] ndisc_send_skb+0xc30/0x1790 net/ipv6/ndisc.c:508 ndisc_send_rs+0x47c/0x5d4 net/ipv6/ndisc.c:718 addrconf_rs_timer+0x300/0x58c net/ipv6/addrconf.c:3936 call_timer_fn+0x19c/0x8cc kernel/time/timer.c:1700 expire_timers kernel/time/timer.c:1751 [inline] __run_timers+0x55c/0x734 kernel/time/timer.c:2022 run_timer_softirq+0x7c/0x114 kernel/time/timer.c:2035 __do_softirq+0x2d0/0xd54 kernel/softirq.c:571 ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:80 call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:882 do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:85 invoke_softirq kernel/softirq.c:452 [inline] __irq_exit_rcu+0x28c/0x534 kernel/softirq.c:650 irq_exit_rcu+0x14/0x84 kernel/softirq.c:662 __el1_irq arch/arm64/kernel/entry-common.c:472 [inline] el1_interrupt+0x38/0x68 arch/arm64/kernel/entry-common.c:486 el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:491 el1h_64_irq+0x64/0x68 arch/arm64/kernel/entry.S:587 __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:33 [inline] arch_local_irq_enable+0x8/0xc arch/arm64/include/asm/irqflags.h:55 cpuidle_idle_call kernel/sched/idle.c:170 [inline] do_idle+0x1f0/0x4e8 kernel/sched/idle.c:282 cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:379 rest_init+0x2dc/0x2f4 init/main.c:735 start_kernel+0x0/0x55c init/main.c:834 start_kernel+0x3f0/0x55c init/main.c:1088 __primary_switched+0xb8/0xc0 arch/arm64/kernel/head.S:523 Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05Merge patch series "can: j1939: avoid possible use-after-free when ↵Marc Kleine-Budde
j1939_can_rx_register fails" Fedor Pchelkin <pchelkin@ispras.ru> says: The patch series fixes a possible racy use-after-free scenario described in 2/2: if j1939_can_rx_register() fails then the concurrent thread may have already read the invalid priv structure. The 1/2 makes j1939_netdev_lock a mutex so that access to j1939_can_rx_register() can be serialized without changing GFP_KERNEL to GFP_ATOMIC inside can_rx_register(). This seems to be safe. Note that the patch series has been tested only via Syzkaller and not with a real device. Link: https://lore.kernel.org/r/20230526171910.227615-1-pchelkin@ispras.ru Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-06-05can: j1939: avoid possible use-after-free when j1939_can_rx_register failsFedor Pchelkin
Syzkaller reports the following failure: BUG: KASAN: use-after-free in kref_put include/linux/kref.h:64 [inline] BUG: KASAN: use-after-free in j1939_priv_put+0x25/0xa0 net/can/j1939/main.c:172 Write of size 4 at addr ffff888141c15058 by task swapper/3/0 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.144-syzkaller #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x107/0x167 lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x220 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:186 [inline] check_memory_region+0x145/0x190 mm/kasan/generic.c:192 instrument_atomic_read_write include/linux/instrumented.h:101 [inline] atomic_fetch_sub_release include/asm-generic/atomic-instrumented.h:220 [inline] __refcount_sub_and_test include/linux/refcount.h:272 [inline] __refcount_dec_and_test include/linux/refcount.h:315 [inline] refcount_dec_and_test include/linux/refcount.h:333 [inline] kref_put include/linux/kref.h:64 [inline] j1939_priv_put+0x25/0xa0 net/can/j1939/main.c:172 j1939_sk_sock_destruct+0x44/0x90 net/can/j1939/socket.c:374 __sk_destruct+0x4e/0x820 net/core/sock.c:1784 rcu_do_batch kernel/rcu/tree.c:2485 [inline] rcu_core+0xb35/0x1a30 kernel/rcu/tree.c:2726 __do_softirq+0x289/0x9a3 kernel/softirq.c:298 asm_call_irq_on_stack+0x12/0x20 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0xaa/0xe0 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:393 [inline] __irq_exit_rcu kernel/softirq.c:423 [inline] irq_exit_rcu+0x136/0x200 kernel/softirq.c:435 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1095 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:635 Allocated by task 1141: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xc9/0xd0 mm/kasan/common.c:461 kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:664 [inline] j1939_priv_create net/can/j1939/main.c:131 [inline] j1939_netdev_start+0x111/0x860 net/can/j1939/main.c:268 j1939_sk_bind+0x8ea/0xd30 net/can/j1939/socket.c:485 __sys_bind+0x1f2/0x260 net/socket.c:1645 __do_sys_bind net/socket.c:1656 [inline] __se_sys_bind net/socket.c:1654 [inline] __x64_sys_bind+0x6f/0xb0 net/socket.c:1654 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Freed by task 1141: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0x112/0x170 mm/kasan/common.c:422 slab_free_hook mm/slub.c:1542 [inline] slab_free_freelist_hook+0xad/0x190 mm/slub.c:1576 slab_free mm/slub.c:3149 [inline] kfree+0xd9/0x3b0 mm/slub.c:4125 j1939_netdev_start+0x5ee/0x860 net/can/j1939/main.c:300 j1939_sk_bind+0x8ea/0xd30 net/can/j1939/socket.c:485 __sys_bind+0x1f2/0x260 net/socket.c:1645 __do_sys_bind net/socket.c:1656 [inline] __se_sys_bind net/socket.c:1654 [inline] __x64_sys_bind+0x6f/0xb0 net/socket.c:1654 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x61/0xc6 It can be caused by this scenario: CPU0 CPU1 j1939_sk_bind(socket0, ndev0, ...) j1939_netdev_start() j1939_sk_bind(socket1, ndev0, ...) j1939_netdev_start() mutex_lock(&j1939_netdev_lock) j1939_priv_set(ndev0, priv) mutex_unlock(&j1939_netdev_lock) if (priv_new) kref_get(&priv_new->rx_kref) return priv_new; /* inside j1939_sk_bind() */ jsk->priv = priv j1939_can_rx_register(priv) // fails j1939_priv_set(ndev, NULL) kfree(priv) j1939_sk_sock_destruct() j1939_priv_put() // <- uaf To avoid this, call j1939_can_rx_register() under j1939_netdev_lock so that a concurrent thread cannot process j1939_priv before j1939_can_rx_register() returns. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20230526171910.227615-3-pchelkin@ispras.ru Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-06-05can: j1939: change j1939_netdev_lock type to mutexFedor Pchelkin
It turns out access to j1939_can_rx_register() needs to be serialized, otherwise j1939_priv can be corrupted when parallel threads call j1939_netdev_start() and j1939_can_rx_register() fails. This issue is thoroughly covered in other commit which serializes access to j1939_can_rx_register(). Change j1939_netdev_lock type to mutex so that we do not need to remove GFP_KERNEL from can_rx_register(). j1939_netdev_lock seems to be used in normal contexts where mutex usage is not prohibited. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Suggested-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20230526171910.227615-2-pchelkin@ispras.ru Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-06-05can: j1939: j1939_sk_send_loop_abort(): improved error queue handling in ↵Oleksij Rempel
J1939 Socket This patch addresses an issue within the j1939_sk_send_loop_abort() function in the j1939/socket.c file, specifically in the context of Transport Protocol (TP) sessions. Without this patch, when a TP session is initiated and a Clear To Send (CTS) frame is received from the remote side requesting one data packet, the kernel dispatches the first Data Transport (DT) frame and then waits for the next CTS. If the remote side doesn't respond with another CTS, the kernel aborts due to a timeout. This leads to the user-space receiving an EPOLLERR on the socket, and the socket becomes active. However, when trying to read the error queue from the socket with sock.recvmsg(, , socket.MSG_ERRQUEUE), it returns -EAGAIN, given that the socket is non-blocking. This situation results in an infinite loop: the user-space repeatedly calls epoll(), epoll() returns the socket file descriptor with EPOLLERR, but the socket then blocks on the recv() of ERRQUEUE. This patch introduces an additional check for the J1939_SOCK_ERRQUEUE flag within the j1939_sk_send_loop_abort() function. If the flag is set, it indicates that the application has subscribed to receive error queue messages. In such cases, the kernel can communicate the current transfer state via the error queue. This allows for the function to return early, preventing the unnecessary setting of the socket into an error state, and breaking the infinite loop. It is crucial to note that a socket error is only needed if the application isn't using the error queue, as, without it, the application wouldn't be aware of transfer issues. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Reported-by: David Jander <david@protonic.nl> Tested-by: David Jander <david@protonic.nl> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20230526081946.715190-1-o.rempel@pengutronix.de Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-06-05xfs: collect errors from inodegc for unlinked inode recoveryDave Chinner
Unlinked list recovery requires errors removing the inode the from the unlinked list get fed back to the main recovery loop. Now that we offload the unlinking to the inodegc work, we don't get errors being fed back when we trip over a corruption that prevents the inode from being removed from the unlinked list. This means we never clear the corrupt unlinked list bucket, resulting in runtime operations eventually tripping over it and shutting down. Fix this by collecting inodegc worker errors and feed them back to the flush caller. This is largely best effort - the only context that really cares is log recovery, and it only flushes a single inode at a time so we don't need complex synchronised handling. Essentially the inodegc workers will capture the first error that occurs and the next flush will gather them and clear them. The flush itself will only report the first gathered error. In the cases where callers can return errors, propagate the collected inodegc flush error up the error handling chain. In the case of inode unlinked list recovery, there are several superfluous calls to flush queued unlinked inodes - xlog_recover_iunlink_bucket() guarantees that it has flushed the inodegc and collected errors before it returns. Hence nothing in the calling path needs to run a flush, even when an error is returned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: validate block number being freed before adding to xefiDave Chinner
Bad things happen in defered extent freeing operations if it is passed a bad block number in the xefi. This can come from a bogus agno/agbno pair from deferred agfl freeing, or just a bad fsbno being passed to __xfs_free_extent_later(). Either way, it's very difficult to diagnose where a null perag oops in EFI creation is coming from when the operation that queued the xefi has already been completed and there's no longer any trace of it around.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: validity check agbnos on the AGFLDave Chinner
If the agfl or the indexing in the AGF has been corrupted, getting a block form the AGFL could return an invalid block number. If this happens, bad things happen. Check the agbno we pull off the AGFL and return -EFSCORRUPTED if we find somethign bad. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: fix agf/agfl verification on v4 filesystemsDave Chinner
When a v4 filesystem has fl_last - fl_first != fl_count, we do not not detect the corruption and allow the AGF to be used as it if was fully valid. On V5 filesystems, we reset the AGFL to empty in these cases and avoid the corruption at a small cost of leaked blocks. If we don't catch the corruption on V4 filesystems, bad things happen later when an allocation attempts to trim the free list and either double-frees stale entries in the AGFl or tries to free NULLAGBNO entries. Either way, this is bad. Prevent this from happening by using the AGFL_NEED_RESET logic for v4 filesysetms, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag()Dave Chinner
xfs_bmap_longest_free_extent() can return an error when accessing the AGF fails. In this case, the behaviour of xfs_filestream_pick_ag() is conditional on the error. We may continue the loop, or break out of it. The error handling after the loop cleans up the perag reference held when the break occurs. If we continue, the next loop iteration handles cleaning up the perag reference. EIther way, we don't need to release the active perag reference when xfs_bmap_longest_free_extent() fails. Doing so means we do a double decrement on the active reference count, and this causes tha active reference count to fall to zero. At this point, new active references will fail. This leads to unmount hanging because it tries to grab active references to that perag, only for it to fail. This happens inside a loop that retries until a inode tree radix tree tag is cleared, which cannot happen because we can't get an active reference to the perag. The unmount livelocks in this path: xfs_reclaim_inodes+0x80/0xc0 xfs_unmount_flush_inodes+0x5b/0x70 xfs_unmountfs+0x5b/0x1a0 xfs_fs_put_super+0x49/0x110 generic_shutdown_super+0x7c/0x1a0 kill_block_super+0x27/0x50 deactivate_locked_super+0x30/0x90 deactivate_super+0x3c/0x50 cleanup_mnt+0xc2/0x160 __cleanup_mnt+0x12/0x20 task_work_run+0x5e/0xa0 exit_to_user_mode_prepare+0x1bc/0x1c0 syscall_exit_to_user_mode+0x16/0x40 do_syscall_64+0x40/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Reported-by: Pengfei Xu <pengfei.xu@intel.com> Fixes: eb70aa2d8ed9 ("xfs: use for_each_perag_wrap in xfs_filestream_pick_ag") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: fix broken logic when detecting mergeable bmap recordsDarrick J. Wong
Commit 6bc6c99a944c was a well-intentioned effort to initiate consolidation of adjacent bmbt mapping records by setting the PREEN flag. Consolidation can only happen if the length of the combined record doesn't overflow the 21-bit blockcount field of the bmbt recordset. Unfortunately, the length test is inverted, leading to it triggering on data forks like these: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..16777207]: 76110848..92888055 0 (76110848..92888055) 16777208 1: [16777208..20639743]: 92888056..96750591 0 (92888056..96750591) 3862536 Note that record 0 has a length of 16777208 512b blocks. This corresponds to 2097151 4k fsblocks, which is the maximum. Hence the two records cannot be merged. However, the logic is still wrong even if we change the in-loop comparison, because the scope of our examination isn't broad enough inside the loop to detect mappings like this: 0: [0..9]: 76110838..76110847 0 (76110838..76110847) 10 1: [10..16777217]: 76110848..92888055 0 (76110848..92888055) 16777208 2: [16777218..20639753]: 92888056..96750591 0 (92888056..96750591) 3862536 These three records could be merged into two, but one cannot determine this purely from looking at records 0-1 or 1-2 in isolation. Hoist the mergability detection outside the loop, and base its decision making on whether or not a merged mapping could be expressed in fewer bmbt records. While we're at it, fix the incorrect return type of the iter function. Fixes: 336642f79283 ("xfs: alert the user about data/attr fork mappings that could be merged") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: Fix undefined behavior of shift into sign bitGeert Uytterhoeven
With gcc-5: In file included from ./include/trace/define_trace.h:102:0, from ./fs/xfs/scrub/trace.h:988, from fs/xfs/scrub/trace.c:40: ./fs/xfs/./scrub/trace.h: In function ‘trace_raw_output_xchk_fsgate_class’: ./fs/xfs/scrub/scrub.h:111:28: error: initializer element is not constant #define XREP_ALREADY_FIXED (1 << 31) /* checking our repair work */ ^ Shifting the (signed) value 1 into the sign bit is undefined behavior. Fix this for all definitions in the file by shifting "1U" instead of "1". This was exposed by the first user added in commit 466c525d6d35e691 ("xfs: minimize overhead of drain wakeups by using jump labels"). Fixes: 160b5a784525e8a4 ("xfs: hoist the already_fixed variable to the scrub context") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: fix AGF vs inode cluster buffer deadlockDave Chinner
Lock order in XFS is AGI -> AGF, hence for operations involving inode unlinked list operations we always lock the AGI first. Inode unlinked list operations operate on the inode cluster buffer, so the lock order there is AGI -> inode cluster buffer. For O_TMPFILE operations, this now means the lock order set down in xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the unlinked ops are done before the directory modifications that may allocate space and lock the AGF. Unfortunately, we also now lock the inode cluster buffer when logging an inode so that we can attach the inode to the cluster buffer and pin it in memory. This creates a lock order of AGF -> inode cluster buffer in directory operations as we have to log the inode after we've allocated new space for it. This creates a lock inversion between the AGF and the inode cluster buffer. Because the inode cluster buffer is shared across multiple inodes, the inversion is not specific to individual inodes but can occur when inodes in the same cluster buffer are accessed in different orders. To fix this we need move all the inode log item cluster buffer interactions to the end of the current transaction. Unfortunately, xfs_trans_log_inode() calls are littered throughout the transactions with no thought to ordering against other items or locking. This makes it difficult to do anything that involves changing the call sites of xfs_trans_log_inode() to change locking orders. However, we do now have a mechanism that allows is to postpone dirty item processing to just before we commit the transaction: the ->iop_precommit method. This will be called after all the modifications are done and high level objects like AGI and AGF buffers have been locked and modified, thereby providing a mechanism that guarantees we don't lock the inode cluster buffer before those high level objects are locked. This change is largely moving the guts of xfs_trans_log_inode() to xfs_inode_item_precommit() and providing an extra flag context in the inode log item to track the dirty state of the inode in the current transaction. This also means we do a lot less repeated work in xfs_trans_log_inode() by only doing it once per transaction when all the work is done. Fixes: 298f7bec503f ("xfs: pin inode backing buffer to the inode log item") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: defered work could create precommitsDave Chinner
To fix a AGI-AGF-inode cluster buffer deadlock, we need to move inode cluster buffer operations to the ->iop_precommit() method. However, this means that deferred operations can require precommits to be run on the final transaction that the deferred ops pass back to xfs_trans_commit() context. This will be exposed by attribute handling, in that the last changes to the inode in the attr set state machine "disappear" because the precommit operation is not run. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: restore allocation trylock iterationDave Chinner
It was accidentally dropped when refactoring the allocation code, resulting in the AG iteration always doing blocking AG iteration. This results in a small performance regression for a specific fsmark test that runs more user data writer threads than there are AGs. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 2edf06a50f5b ("xfs: factor xfs_alloc_vextent_this_ag() for _iterate_ags()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: buffer pins need to hold a buffer referenceDave Chinner
When a buffer is unpinned by xfs_buf_item_unpin(), we need to access the buffer after we've dropped the buffer log item reference count. This opens a window where we can have two racing unpins for the buffer item (e.g. shutdown checkpoint context callback processing racing with journal IO iclog completion processing) and both attempt to access the buffer after dropping the BLI reference count. If we are unlucky, the "BLI freed" context wins the race and frees the buffer before the "BLI still active" case checks the buffer pin count. This results in a use after free that can only be triggered in active filesystem shutdown situations. To fix this, we need to ensure that buffer existence extends beyond the BLI reference count checks and until the unpin processing is complete. This implies that a buffer pin operation must also take a buffer reference to ensure that the buffer cannot be freed until the buffer unpin processing is complete. Reported-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-04Linux 6.4-rc5Linus Torvalds
2023-06-04Merge tag 'irq_urgent_for_v6.4_rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Borislav Petkov: - Fix open firmware quirks validation so that they don't get applied wrongly * tag 'irq_urgent_for_v6.4_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic: Correctly validate OF quirk descriptors
2023-06-04net: sched: wrap tc_skip_wrapper with CONFIG_RETPOLINEMin-Hua Chen
This patch fixes the following sparse warning: net/sched/sch_api.c:2305:1: sparse: warning: symbol 'tc_skip_wrapper' was not declared. Should it be static? No functional change intended. Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com> Acked-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04Merge branch 'enetc-fixes'David S. Miller
Wei Fang says: ==================== net: enetc: correct the statistics of rx bytes The purpose of this patch set is to fix the issue of rx bytes statistics. The first patch corrects the rx bytes statistics of normal kernel protocol stack path, and the second patch is used to correct the rx bytes statistics of XDP. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04net: enetc: correct rx_bytes statistics of XDPWei Fang
The rx_bytes statistics of XDP are always zero, because rx_byte_cnt is not updated after it is initialized to 0. So fix it. Fixes: d1b15102dd16 ("net: enetc: add support for XDP_DROP and XDP_PASS") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04net: enetc: correct the statistics of rx bytesWei Fang
The rx_bytes of struct net_device_stats should count the length of ethernet frames excluding the FCS. However, there are two problems with the rx_bytes statistics of the current enetc driver. one is that the length of VLAN header is not counted if the VLAN extraction feature is enabled. The other is that the length of L2 header is not counted, because eth_type_trans() is invoked before updating rx_bytes which will subtract the length of L2 header from skb->len. BTW, the rx_bytes statistics of XDP path also have similar problem, I will fix it in another patch. Fixes: a800abd3ecb9 ("net: enetc: move skb creation into enetc_build_skb") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04Merge tag 'media/v6.4-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "Some driver fixes: - a regression fix for the verisilicon driver - uvcvideo: don't expose unsupported video formats to userspace - camss-video: don't zero subdev format after init - mediatek: some fixes for 4K decoder formats - fix a Sphinx build warning (missing doc for client_caps) - some fixes for imx and atomisp staging drivers And two CEC core fixes: - don't set last_initiator if TX in progress - disable adapter in cec_devnode_unregister" * tag 'media/v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: uvcvideo: Don't expose unsupported formats to userspace media: v4l2-subdev: Fix missing kerneldoc for client_caps media: staging: media: imx: initialize hs_settle to avoid warning media: v4l2-mc: Drop subdev check in v4l2_create_fwnode_links_to_pad() media: staging: media: atomisp: init high & low vars media: cec: core: don't set last_initiator if tx in progress media: cec: core: disable adapter in cec_devnode_unregister media: mediatek: vcodec: Only apply 4K frame sizes on decoder formats media: camss: camss-video: Don't zero subdev format again after initialization media: verisilicon: Additional fix for the crash when opening the driver