linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-08-10	netfilter: nf_tables: remove busy mark and gc batch API	Pablo Neira Ayuso
	Ditch it, it has been replace it by the GC transaction API and it has no clients anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-10	netfilter: nf_tables: GC transaction API to avoid race with control plane	Pablo Neira Ayuso
	The set types rhashtable and rbtree use a GC worker to reclaim memory. From system work queue, in periodic intervals, a scan of the table is done. The major caveat here is that the nft transaction mutex is not held. This causes a race between control plane and GC when they attempt to delete the same element. We cannot grab the netlink mutex from the work queue, because the control plane has to wait for the GC work queue in case the set is to be removed, so we get following deadlock: cpu 1 cpu2 GC work transaction comes in , lock nft mutex `acquire nft mutex // BLOCKS transaction asks to remove the set set destruction calls cancel_work_sync() cancel_work_sync will now block forever, because it is waiting for the mutex the caller already owns. This patch adds a new API that deals with garbage collection in two steps: 1) Lockless GC of expired elements sets on the NFT_SET_ELEM_DEAD_BIT so they are not visible via lookup. Annotate current GC sequence in the GC transaction. Enqueue GC transaction work as soon as it is full. If ruleset is updated, then GC transaction is aborted and retried later. 2) GC work grabs the mutex. If GC sequence has changed then this GC transaction lost race with control plane, abort it as it contains stale references to objects and let GC try again later. If the ruleset is intact, then this GC transaction deactivates and removes the elements and it uses call_rcu() to destroy elements. Note that no elements are removed from GC lockless path, the _DEAD bit is set and pointers are collected. GC catchall does not remove the elements anymore too. There is a new set->dead flag that is set on to abort the GC transaction to deal with set->ops->destroy() path which removes the remaining elements in the set from commit_release, where no mutex is held. To deal with GC when mutex is held, which allows safe deactivate and removal, add sync GC API which releases the set element object via call_rcu(). This is used by rbtree and pipapo backends which also perform garbage collection from control plane path. Since element removal from sets can happen from control plane and element garbage collection/timeout, it is necessary to keep the set structure alive until all elements have been deactivated and destroyed. We cannot do a cancel_work_sync or flush_work in nft_set_destroy because its called with the transaction mutex held, but the aforementioned async work queue might be blocked on the very mutex that nft_set_destroy() callchain is sitting on. This gives us the choice of ABBA deadlock or UaF. To avoid both, add set->refs refcount_t member. The GC API can then increment the set refcount and release it once the elements have been free'd. Set backends are adapted to use the GC transaction API in a follow up patch entitled: ("netfilter: nf_tables: use gc transaction API in set backends") This is joint work with Florian Westphal. Fixes: cfed7e1b1f8e ("netfilter: nf_tables: add set garbage collection helpers") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-10	media: ipu-bridge: Add a runtime-pm device-link between VCM and sensor	Hans de Goede
	In most cases when a VCM is used there is a single integrated module with the sensor + VCM + lens. This means that the sensor and VCM often share regulators and possibly also something like a powerdown pin. In the ACPI tables this is modelled as a single ACPI device with multiple I2cSerialBus resources. On atomisp devices the regulators and clks are modelled as ACPI power-resources, which are controlled by the (ACPI) power state of the sensor. So the sensor must be in D0 power state for the VCM to work. To make this work add a device-link with DL_FLAG_PM_RUNTIME flag so that the sensor will automatically be runtime-resumed whenever the VCM is runtime-resumed. This requires the probing of the VCM and thus the creation of the VCM I2C-client to be delayed till after the sensor driver has bound. Move the instantiation of the VCM I2C-client to the v4l2_async_notifier bound op, so that it is done after the sensor driver has bound; and add code to add the device-link. This fixes the problem with the shared ACPI power-resources on atomisp2 and this avoids the need for VCM related workarounds on IPU3 / IPU6. E.g. until now the dw9719 driver needed to get and control a Vsio (V sensor IO) regulator since that needs to be enabled to enable I2C pass-through on the PMIC on the sensor module. So the driver was controlling this regulator even though the actual dw9719 chip has no Vsio pin / power-plane. This also removes the need for ipu_bridge_init() to return -EPROBE_DEFER since the VCM is now instantiated later. Reviewed-by: Andy Shevchenko <andy@kernel.org> Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com> Tested-by: Daniel Scally <dan.scally@ideasonboard.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-08-10	media: ipu-bridge: Move ipu-bridge.h to include/media/	Hans de Goede
	Move ipu-bridge.h to include/media/, so that it can also be used by the atomisp code. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-08-10	media: Remove ov_16bit_addr_reg_helpers.h	Hans de Goede
	The helpers in this header are not used anywhere anymore, they have been superseded by the new CCI register access helpers. Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-08-10	media: Add MIPI CCI register access helper functions	Hans de Goede
	The CSI2 specification specifies a standard method to access camera sensor registers called "Camera Control Interface (CCI)". This uses either 8 or 16 bit (big-endian wire order) register addresses and supports 8, 16, 24 or 32 bit (big-endian wire order) register widths. Currently a lot of Linux camera sensor drivers all have their own custom helpers for this, often copy and pasted from other drivers. Add a set of generic helpers for this so that all sensor drivers can switch to a single common implementation. These helpers take an extra optional "int *err" function parameter, this can be used to chain a bunch of register accesses together with only a single error check at the end, rather than needing to error check each individual register access. The first failing call will set the contents of err to a non 0 value and all other calls will then become no-ops. Link: https://lore.kernel.org/linux-media/59aefa7f-7bf9-6736-6040-39551329cd0a@redhat.com/ Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by: Tommaso Merciai <tomm.merciai@gmail.com> Reviewed-by: Tommaso Merciai <tomm.merciai@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-08-10	media: subdev: Constify v4l2_subdev_set_routing_with_fmt() param	Tomi Valkeinen
	The routing parameter of v4l2_subdev_set_routing_with_fmt() is missing 'const'. Add it. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-08-10	media: mediatek: vcodec: Add capture format to support 10bit raster mode	Mingjia Zhang
	Define one uncompressed capture format V4L2_PIX_FMT_MT2110R in order to support 10bit for H264 in mt8195. Signed-off-by: Mingjia Zhang <mingjia.zhang@mediatek.com> Co-developed-by: Yunfei Dong <yunfei.dong@mediatek.com> Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-08-10	media: mediatek: vcodec: Add capture format to support 10bit tile mode	Mingjia Zhang
	Define one uncompressed capture format V4L2_PIX_FMT_MT2110T in order to support 10bit for AV1/VP9/HEVC in mt8195. Signed-off-by: Mingjia Zhang <mingjia.zhang@mediatek.com> Co-developed-by: Yunfei Dong <yunfei.dong@mediatek.com> Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-08-10	media: cec: core: add adap_unconfigured() callback	Hans Verkuil
	The adap_configured() callback was called with the adap->lock mutex held if the 'configured' argument was false, and without the adap->lock mutex held if that argument was true. That was very confusing, and so split this up in a adap_unconfigured() callback and a high-level configured() callback. This also makes it easier to understand when the mutex is held: all low-level adap_* callbacks are called with the mutex held. All other callbacks are called without that mutex held. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: f1b57164305d ("media: cec: add optional adap_configured callback") Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-08-10	media: cec: core: add adap_nb_transmit_canceled() callback	Hans Verkuil
	A potential deadlock was found by Zheng Zhang with a local syzkaller instance. The problem is that when a non-blocking CEC transmit is canceled by calling cec_data_cancel, that in turn can call the high-level received() driver callback, which can call cec_transmit_msg() to transmit a new message. The cec_data_cancel() function is called with the adap->lock mutex held, and cec_transmit_msg() tries to take that same lock. The root cause is that the received() callback can either be used to pass on a received message (and then adap->lock is not held), or to report a canceled transmit (and then adap->lock is held). This is confusing, so create a new low-level adap_nb_transmit_canceled callback that reports back that a non-blocking transmit was canceled. And the received() callback is only called when a message is received, as was the case before commit f9d0ecbf56f4 ("media: cec: correctly pass on reply results") complicated matters. Reported-by: Zheng Zhang <zheng.zhang@email.ucr.edu> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: f9d0ecbf56f4 ("media: cec: correctly pass on reply results") Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-08-10	media: v4l: async: Set v4l2_device and subdev in async notifier init	Sakari Ailus
	Set the v4l2_device already in async notifier init, so struct device related to it will be available before the notifier is registered. This requires separating notifier initialisation into two functions, one that takes v4l2_device as its argument, v4l2_async_nf_init and v4l2_async_subdev_nf_init, for sub-device notifiers. Registering the notifier will use a single function, v4l2_async_nf_register. This is done in order to make struct device available earlier, during construction of the async connections, for sensible debug prints. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Philipp Zabel <p.zabel@pengutronix.de> # imx6qp Tested-by: Niklas Söderlund <niklas.soderlund@ragnatech.se> # rcar + adv746x Tested-by: Aishwarya Kothari <aishwarya.kothari@toradex.com> # Apalis i.MX6Q with TC358743 Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # Renesas RZ/G2L SMARC Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-08-09	bpf, sockmap: Fix bug that strp_done cannot be called	Xu Kuohai
	strp_done is only called when psock->progs.stream_parser is not NULL, but stream_parser was set to NULL by sk_psock_stop_strp(), called by sk_psock_drop() earlier. So, strp_done can never be called. Introduce SK_PSOCK_RX_ENABLED to mark whether there is strp on psock. Change the condition for calling strp_done from judging whether stream_parser is set to judging whether this flag is set. This flag is only set once when strp_init() succeeds, and will never be cleared later. Fixes: c0d95d3380ee ("bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230804073740.194770-3-xukuohai@huaweicloud.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-09	net: ptp: create a mock-up PTP Hardware Clock driver	Vladimir Oltean
	There are several cases where virtual net devices may benefit from having a PTP clock, and these have to do with testing. I can see at least netdevsim and veth as potential users of a common mock-up PTP hardware clock driver. The proposed idea is to create an object which emulates PTP clock operations on top of the unadjustable CLOCK_MONOTONIC_RAW plus a software-controlled time domain via a timecounter/cyclecounter and then link that PHC to the netdevsim device. The driver is fully functional for its intended purpose, and it successfully passes the PTP selftests. $ cd tools/testing/selftests/ptp/ $ ./phc.sh /dev/ptp2 TEST: settime [ OK ] TEST: adjtime [ OK ] TEST: adjfreq [ OK ] Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230807193324.4128292-7-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09	net/mlx5: Expose NIC temperature via hardware monitoring kernel API	Adham Faris
	Expose NIC temperature by implementing hwmon kernel API, which turns current thermal zone kernel API to redundant. For each one of the supported and exposed thermal diode sensors, expose the following attributes: 1) Input temperature. 2) Highest temperature. 3) Temperature label: Depends on the firmware capability, if firmware doesn't support sensors naming, the fallback naming convention would be: "sensorX", where X is the HW spec (MTMP register) sensor index. 4) Temperature critical max value: refers to the high threshold of Warning Event. Will be exposed as `tempY_crit` hwmon attribute (RO attribute). For example for ConnectX5 HCA's this temperature value will be 105 Celsius, 10 degrees lower than the HW shutdown temperature). 5) Temperature reset history: resets highest temperature. For example, for dualport ConnectX5 NIC with a single IC thermal diode sensor will have 2 hwmon directories (one for each PCI function) under "/sys/class/hwmon/hwmon[X,Y]". Listing one of the directories above (hwmonX/Y) generates the corresponding output below: $ grep -H -d skip . /sys/class/hwmon/hwmon0/* Output ======================================================================= /sys/class/hwmon/hwmon0/name:mlx5 /sys/class/hwmon/hwmon0/temp1_crit:105000 /sys/class/hwmon/hwmon0/temp1_highest:48000 /sys/class/hwmon/hwmon0/temp1_input:46000 /sys/class/hwmon/hwmon0/temp1_label:asic grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied In addition, displaying the sensors data via lm_sensors generates the corresponding output below: $ sensors Output ======================================================================= mlx5-pci-0800 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) mlx5-pci-0801 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) CC: Jean Delvare <jdelvare@suse.com> Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09	net: annotate data-races around sock->ops	Eric Dumazet
	IPV6_ADDRFORM socket option is evil, because it can change sock->ops while other threads might read it. Same issue for sk->sk_family being set to AF_INET. Adding READ_ONCE() over sock->ops reads is needed for sockets that might be impacted by IPV6_ADDRFORM. Note that mptcp_is_tcpsk() can also overwrite sock->ops. Adding annotations for all sk->sk_family reads will require more patches :/ BUG: KCSAN: data-race in ____sys_sendmsg / do_ipv6_setsockopt write to 0xffff888109f24ca0 of 8 bytes by task 4470 on cpu 0: do_ipv6_setsockopt+0x2c5e/0x2ce0 net/ipv6/ipv6_sockglue.c:491 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1690 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3663 __sys_setsockopt+0x1c3/0x230 net/socket.c:2273 __do_sys_setsockopt net/socket.c:2284 [inline] __se_sys_setsockopt net/socket.c:2281 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2281 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff888109f24ca0 of 8 bytes by task 4469 on cpu 1: sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x349/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmmsg+0x263/0x500 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffffffff850e32b8 -> 0xffffffff850da890 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 4469 Comm: syz-executor.1 Not tainted 6.4.0-rc5-syzkaller-00313-g4c605260bc60 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230808135809.2300241-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09	Merge tag 'wireless-2023-08-09' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Just a few small updates: * fix an integer overflow in nl80211 * fix rtw89 8852AE disconnections * fix a buffer overflow in ath12k * fix AP_VLAN configuration lookups * fix allocation failure handling in brcm80211 * update MAINTAINERS for some drivers * tag 'wireless-2023-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: ath12k: Fix buffer overflow when scanning with extraie wifi: nl80211: fix integer overflow in nl80211_parse_mbssid_elems() wifi: cfg80211: fix sband iftype data lookup for AP_VLAN wifi: rtw89: fix 8852AE disconnection caused by RX full flags MAINTAINERS: Remove tree entry for rtl8180 MAINTAINERS: Update entry for rtl8187 wifi: brcm80211: handle params_v1 allocation failure ==================== Link: https://lore.kernel.org/r/20230809124818.167432-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09	block: don't make REQ_POLLED imply REQ_NOWAIT	Jens Axboe
	Normally these two flags do go together, as the issuer of polled IO generally cannot wait for resources that will get freed as part of IO completion. This is because that very task is the one that will complete the request and free those resources, hence that would introduce a deadlock. But it is possible to have someone else issue the polled IO, eg via io_uring if the request is punted to io-wq. For that case, it's fine to have the task block on IO submission, as it is not the same task that will be completing the IO. It's completely up to the caller to ask for both polled and nowait IO separately! If we don't allow polled IO where IOCB_NOWAIT isn't set in the kiocb, then we can run into repeated -EAGAIN submissions and not make any progress. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09	Merge tag 'nf-next-2023-08-08' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for net-next First 4 Patches, from Yue Haibing, remove unused prototypes in various netfilter headers. Last patch makes nfnetlink_log to always include a packet timestamp, up to now it was only included if the skb had assigned previously. From Maciej Żenczykowski. * tag 'nf-next-2023-08-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nfnetlink_log: always add a timestamp netfilter: h323: Remove unused function declarations netfilter: conntrack: Remove unused function declarations netfilter: helper: Remove unused function declarations netfilter: gre: Remove unused function declaration nf_ct_gre_keymap_flush() ==================== Link: https://lore.kernel.org/r/20230808124159.19046-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09	tcp: add missing family to tcp_set_ca_state() tracepoint	Eric Dumazet
	Before this code is copied, add the missing family, as we did in commit 3dd344ea84e1 ("net: tracepoint: exposing sk_family in all tcp:tracepoints") Fixes: 15fcdf6ae116 ("tcp: Add tracepoint for tcp_set_ca_state") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ping Gan <jacky_gam_2001@163.com> Cc: Manjusaka <me@manjusaka.me> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230808084923.2239142-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09	net: switchdev: Remove unused declaration switchdev_port_fwd_mark_set()	Yue Haibing
	Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20230808145955.2176-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09	net: phy: Remove two unused function declarations	Yue Haibing
	Commit 1e2dc14509fd ("net: ethtool: Add helpers for reporting test results") declared but never implemented these function. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230808144610.19096-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09	PCI: switchtec: Add support for PCIe Gen5 devices	Kelvin Cao
	Advertise support of Gen5 devices in the driver's device ID table and add the same IDs for the switchtec quirks. Also update driver code to accommodate them. Link: https://lore.kernel.org/r/20230624000003.2315364-3-kelvin.cao@microchip.com Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2023-08-09	io_uring: Add io_uring command support for sockets	Breno Leitao
	Enable io_uring commands on network sockets. Create two new SOCKET_URING_OP commands that will operate on sockets. In order to call ioctl on sockets, use the file_operations->io_uring_cmd callbacks, and map it to a uring socket function, which handles the SOCKET_URING_OP accordingly, and calls socket ioctls. This patches was tested by creating a new test case in liburing. Link: https://github.com/leitao/liburing/tree/io_uring_cmd Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230627134424.2784797-1-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09	iommu/vt-d: Remove unused extern declaration dmar_parse_dev_scope()	YueHaibing
	Since commit 2e4552893038 ("iommu/vt-d: Unify the way to process DMAR device scope array") this is not used anymore, so can remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230802133934.19712-1-yuehaibing@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09	iommu: Prevent RESV_DIRECT devices from blocking domains	Lu Baolu
	The IOMMU_RESV_DIRECT flag indicates that a memory region must be mapped 1:1 at all times. This means that the region must always be accessible to the device, even if the device is attached to a blocking domain. This is equal to saying that IOMMU_RESV_DIRECT flag prevents devices from being attached to blocking domains. This also implies that devices that implement RESV_DIRECT regions will be prevented from being assigned to user space since taking the DMA ownership immediately switches to a blocking domain. The rule of preventing devices with the IOMMU_RESV_DIRECT regions from being assigned to user space has existed in the Intel IOMMU driver for a long time. Now, this rule is being lifted up to a general core rule, as other architectures like AMD and ARM also have RMRR-like reserved regions. This has been discussed in the community mailing list and refer to below link for more details. Other places using unmanaged domains for kernel DMA must follow the iommu_get_resv_regions() and setup IOMMU_RESV_DIRECT - we do not restrict them in the core code. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20230724060352.113458-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09	iommu: Move global PASID allocation from SVA to core	Jacob Pan
	Intel ENQCMD requires a single PASID to be shared between multiple devices, as the PASID is stored in a single MSR register per-process and userspace can use only that one PASID. This means that the PASID allocation for any ENQCMD using device driver must always come from a shared global pool, regardless of what kind of domain the PASID will be used with. Split the code for the global PASID allocator into iommu_alloc/free_global_pasid() so that drivers can attach non-SVA domains to PASIDs as well. This patch moves global PASID allocation APIs from SVA to IOMMU APIs. Reserved PASIDs, currently only RID_PASID, are excluded from the global PASID allocation. It is expected that device drivers will use the allocated PASIDs to attach to appropriate IOMMU domains for use. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-3-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09	iommu: Generalize PASID 0 for normal DMA w/o PASID	Jacob Pan
	PCIe Process address space ID (PASID) is used to tag DMA traffic, it provides finer grained isolation than requester ID (RID). For each device/RID, 0 is a special PASID for the normal DMA (no PASID). This is universal across all architectures that supports PASID, therefore warranted to be reserved globally and declared in the common header. Consequently, we can avoid the conflict between different PASID use cases in the generic code. e.g. SVA and DMA API with PASIDs. This paved away for device drivers to choose global PASID policy while continue doing normal DMA. Noting that VT-d could support none-zero RID/NO_PASID, but currently not used. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-2-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09	USB: Remove remnants of Wireless USB and UWB	Alan Stern
	Wireless USB has long been defunct, and kernel support for it was removed in 2020 by commit caa6772db4c1 ("Staging: remove wusbcore and UWB from the kernel tree."). Nevertheless, some vestiges of the old implementation still clutter up the USB subsystem and one or two other places. Let's get rid of them once and for all. The only parts still left are the user-facing APIs in include/uapi/linux/usb/ch9.h. (There are also a couple of misleading instances, such as the Sierra Wireless USB modem, which is a USB modem made by Sierra Wireless.) Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/b4f2710f-a2de-4fb0-b50f-76776f3a961b@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-09	usb: chipidea: add workaround for chipidea PEC bug	Xu Yang
	Some NXP processors using ChipIdea USB IP have a bug when frame babble is detected. Issue description: In USB camera test, our controller is host in HS mode. In ISOC IN, when device sends data across the micro frame, it causes the babble in host controller. This will clear the PE bit. In spec, it also requires to set the PEC bit and then set the PCI bit. Without the PCI interrupt, the software does not know the PE is cleared. This will add a flag CI_HDRC_HAS_PORTSC_PEC_MISSED to some impacted platform datas. And the ehci host driver will assert PEC by SW when specific conditions are satisfied. Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20230809024432.535160-2-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-09	accel/ivpu: Refactor memory ranges logic	Karol Wachowski
	Add new dma range and change naming convention for virtual address memory ranges managed by KMD. New available ranges are named as follows: * global range - global context accessible by FW * aliased range - user context accessible by FW * dma range - user context accessible by DMA * shave range - user context accessible by shaves * global shave range - global context accessible by shave nn Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230731161258.2987564-6-stanislaw.gruszka@linux.intel.com
2023-08-09	accel/ivpu: Extend get_param ioctl to identify capabilities	Stanislaw Gruszka
	Add DRM_IVPU_PARAM_CAPABILITIES parameters to get_param ioctl to query driver capabilities. For now use it for identify metric streamer and new dma memory range features. Currently upstream version of intel_vpu does not have those, they will be added it the future. Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230731161258.2987564-5-stanislaw.gruszka@linux.intel.com
2023-08-09	tmpfs: track free_ispace instead of free_inodes	Hugh Dickins
	In preparation for assigning some inode space to extended attributes, keep track of free_ispace instead of number of free_inodes: as if one tmpfs inode (and accompanying dentry) occupies very approximately 1KiB. Unsigned long is large enough for free_ispace, on 64-bit and on 32-bit: but take care to enforce the maximum. And fix the nr_blocks maximum on 32-bit: S64_MAX would be too big for it there, so say LONG_MAX instead. Delete the incorrect limited<->unlimited blocks/inodes comment above shmem_reconfigure(): leave it to the error messages below to describe. Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Message-Id: <4fe1739-d9e7-8dfd-5bce-12e7339711da@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	xattr: simple_xattr_set() return old_xattr to be freed	Hugh Dickins
	tmpfs wants to support limited user extended attributes, but kernfs (or cgroupfs, the only kernfs with KERNFS_ROOT_SUPPORT_USER_XATTR) already supports user extended attributes through simple xattrs: but limited by a policy (128KiB per inode) too liberal to be used on tmpfs. To allow a different limiting policy for tmpfs, without affecting the policy for kernfs, change simple_xattr_set() to return the replaced or removed xattr (if any), leaving the caller to update their accounting then free the xattr (by simple_xattr_free(), renamed from the static free_simple_xattr()). Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Message-Id: <158c6585-2aa7-d4aa-90ff-f7c3f8fe407c@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	shmem: stable directory offsets	Chuck Lever
	The current cursor-based directory offset mechanism doesn't work when a tmpfs filesystem is exported via NFS. This is because NFS clients do not open directories. Each server-side READDIR operation has to open the directory, read it, then close it. The cursor state for that directory, being associated strictly with the opened struct file, is thus discarded after each NFS READDIR operation. Directory offsets are cached not only by NFS clients, but also by user space libraries on those clients. Essentially there is no way to invalidate those caches when directory offsets have changed on an NFS server after the offset-to-dentry mapping changes. Thus the whole application stack depends on unchanging directory offsets. The solution we've come up with is to make the directory offset for each file in a tmpfs filesystem stable for the life of the directory entry it represents. shmem_readdir() and shmem_dir_llseek() now use an xarray to map each directory offset (an loff_t integer) to the memory address of a struct dentry. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Message-Id: <168814734331.530310.3911190551060453102.stgit@manet.1015granger.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	libfs: Add directory operations for stable offsets	Chuck Lever
	Create a vector of directory operations in fs/libfs.c that handles directory seeks and readdir via stable offsets instead of the current cursor-based mechanism. For the moment these are unused. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Message-Id: <168814732984.530310.11190772066786107220.stgit@manet.1015granger.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	shmem: Add default quota limit mount options	Lukas Czerner
	Allow system administrator to set default global quota limits at tmpfs mount time. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230725144510.253763-7-cem@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	shmem: quota support	Carlos Maiolino
	Now the basic infra-structure is in place, enable quota support for tmpfs. This offers user and group quotas to tmpfs (project quotas will be added later). Also, as other filesystems, the tmpfs quota is not supported within user namespaces yet, so idmapping is not translated. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230725144510.253763-6-cem@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	shmem: prepare shmem quota infrastructure	Carlos Maiolino
	Add new shmem quota format, its quota_format_ops together with dquot_operations Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230725144510.253763-5-cem@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	fs: drop the timespec64 arg from generic_update_time	Jeff Layton
	In future patches we're going to change how the ctime is updated to keep track of when it has been queried. The way that the update_time operation works (and a lot of its callers) make this difficult, since they grab a timestamp early and then pass it down to eventually be copied into the inode. All of the existing update_time callers pass in the result of current_time() in some fashion. Drop the "time" parameter from generic_update_time, and rework it to fetch its own timestamp. This change means that an update_time could fetch a different timestamp than was seen in inode_needs_update_time. update_time is only ever called with one of two flag combinations: Either S_ATIME is set, or S_MTIME\|S_CTIME\|S_VERSION are set. With this change we now treat the flags argument as an indicator that some value needed to be updated when last checked, rather than an indication to update specific timestamps. Rework the logic for updating the timestamps and put it in a new inode_update_timestamps helper that other update_time routines can use. S_ATIME is as treated as we always have, but if any of the other three are set, then we attempt to update all three. Also, some callers of generic_update_time need to know what timestamps were actually updated. Change it to return an S_* flag mask to indicate that and rework the callers to expect it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230807-mgctime-v7-3-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	fs: pass the request_mask to generic_fillattr	Jeff Layton
	generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE_COOKIE) that can have side effects when it is reported, and we're looking at adding more with the addition of multigrain timestamps. Add a request_mask argument to generic_fillattr and have most callers just pass in the value that is passed to getattr. Have other callers (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of STATX_CHANGE_COOKIE into generic_fillattr. Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	fs, block: remove bdev->bd_super	Christoph Hellwig
	bdev->bd_super is unused now, remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Message-Id: <20230807112625.652089-5-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	fsi: core: Add trace events for scan and unregister	Eddie James
	Add more trace events for the scanning and unregistration functions for debug purposes. Signed-off-by: Eddie James <eajames@linux.ibm.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/20230612195657.245125-9-eajames@linux.ibm.com Signed-off-by: Joel Stanley <joel@jms.id.au>
2023-08-09	fsi: sbefifo: Add configurable in-command timeout	Eddie James
	A new use case for the SBEFIFO requires a long in-command timeout as the SBE processes each part of the command before clearing the upstream FIFO for the next part of the command. Signed-off-by: Eddie James <eajames@linux.ibm.com> Link: https://lore.kernel.org/r/20230612195657.245125-6-eajames@linux.ibm.com Signed-off-by: Joel Stanley <joel@jms.id.au>
2023-08-08	bpf: btf: Remove two unused function declarations	Yue Haibing
	Commit db559117828d ("bpf: Consolidate spin_lock, timer management into btf_record") removed the implementations but leave declarations. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230808145741.33292-1-yuehaibing@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-08	Merge tag 'mlx5-updates-2023-08-07' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-08-07 1) Few cleanups 2) Dynamic completion EQs The driver creates completion EQs for all vectors directly on driver load, even if those EQs will not be utilized later on. To allow more flexibility in managing completion EQs and to reduce the memory overhead of driver load, this series will adjust completion EQs creation to be dynamic. Meaning, completion EQs will be created only when needed. Patch #1 introduces a counter for tracking the current number of completion EQs. Patches #2-6 refactor the existing infrastructure of managing completion EQs and completion IRQs to be compatible with per-vector allocation/release requests. Patches #7-8 modify the CPU-to-IRQ affinity calculation to be resilient in case the affinity is requested but completion IRQ is not allocated yet. Patch #9 function rename. Patch #10 handles the corner case of SF performing an IRQ request when no SF IRQ pool is found, and no PF IRQ exists for the same vector. Patch #11 modify driver to use dynamically allocate completion EQs. * tag 'mlx5-updates-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Bridge, Only handle registered netdev bridge events net/mlx5: E-Switch, Remove redundant arg ignore_flow_lvl net/mlx5: Fix typo reminder -> remainder net/mlx5: remove many unnecessary NULL values net/mlx5: Allocate completion EQs dynamically net/mlx5: Handle SF IRQ request in the absence of SF IRQ pool net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max() net/mlx5: Add IRQ vector to CPU lookup function net/mlx5: Introduce mlx5_cpumask_default_spread net/mlx5: Implement single completion EQ create/destroy methods net/mlx5: Use xarray to store and manage completion EQs net/mlx5: Refactor completion IRQ request/release handlers in EQ layer net/mlx5: Use xarray to store and manage completion IRQs net/mlx5: Refactor completion IRQ request/release API net/mlx5: Track the current number of completion EQs ==================== Link: https://lore.kernel.org/r/20230807175642.20834-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08	docs: net: page_pool: de-duplicate the intro comment	Jakub Kicinski
	In commit 82e896d992fa ("docs: net: page_pool: use kdoc to avoid duplicating the information") I shied away from using the DOC: comments when moving to kdoc for documenting page_pool API, because I wasn't sure how familiar people are with it. Turns out there is already a DOC: comment for the intro, which is the same in both places, modulo what looks like minor rewording. Use the version from Documentation/ but keep the contents with the code. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230807210051.1014580-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08	devlink: Remove unused devlink_dpipe_table_resource_set() declaration	Yue Haibing
	Commit f655dacb59ac ("net: devlink: remove unused locked functions") removed this but leave the declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230807143214.46648-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08	net: fq: Remove unused typedef fq_flow_get_default_t	Yue Haibing
	Commitbf9009bf21b5 ("net/fq_impl: drop get_default_func, move default flow to fq_tin") remove its last user, so can remove it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807142111.33524-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08	team: change the getter function in the team_option structure to void	Zhengchao Shao
	Because the getter function in the team_option structure always returns 0, so change the getter function to void and remove redundant code. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230807012556.3146071-5-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>