linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-06-26	power: supply: samsung-sdi-battery: Constify struct power_supply_vbat_ri_table	Christophe JAILLET
	'struct power_supply_vbat_ri_table' are not modified in this driver. Constifying these structures moves some data to a read-only section, so increase overall security. In order to do it, some code also needs to be adjusted to this new const qualifier. On a x86_64, with allmodconfig: Before: ====== $ size drivers/power/supply/samsung-sdi-battery.o text data bss dec hex filename 955 7664 0 8619 21ab drivers/power/supply/samsung-sdi-battery.o After: ===== $ size drivers/power/supply/samsung-sdi-battery.o text data bss dec hex filename 4055 4584 0 8639 21bf drivers/power/supply/samsung-sdi-battery.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/d01818abd880bf435d1106a9a6cc11a7a8a3e661.1719125040.git.christophe.jaillet@wanadoo.fr Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-06-26	block: change rq_integrity_vec to respect the iterator	Mikulas Patocka
	If we allocate a bio that is larger than NVMe maximum request size, attach integrity metadata to it and send it to the NVMe subsystem, the integrity metadata will be corrupted. Splitting the bio works correctly. The function bio_split will clone the bio, trim the iterator of the first bio and advance the iterator of the second bio. However, the function rq_integrity_vec has a bug - it returns the first vector of the bio's metadata and completely disregards the metadata iterator that was advanced when the bio was split. Thus, the second bio uses the same metadata as the first bio and this leads to metadata corruption. This commit changes rq_integrity_vec, so that it calls mp_bvec_iter_bvec instead of returning the first vector. mp_bvec_iter_bvec reads the iterator and uses it to build a bvec for the current position in the iterator. The "queue_max_integrity_segments(rq->q) > 1" check was removed, because the updated rq_integrity_vec function works correctly with multiple segments. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/49d1afaa-f934-6ed2-a678-e0d428c63a65@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-26	mfd: stm32-timers: Drop unused TIM_DIER_CC_IE	Uwe Kleine-König
	This macro is misleading as TIM_DIER_CC_IE(1) == TIM_DIER_CC2IE . The only user was updated to use TIM_DIER_CCxIE() instead which doesn't suffer from this mismatch, so TIM_DIER_CC_IE can be dropped. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://lore.kernel.org/r/6c8fcc4ed159992a1dbb0796087e6ceb10c39c96.1718791090.git.u.kleine-koenig@baylibre.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-06-26	mfd: stm32-timers: Add some register definitions with a parameter	Uwe Kleine-König
	There are some registers that belong together and are numbered from 1 to 4. Introduce a macro definition for these that takes the channel number as parameter and define the previously available constants using the new ones. This allows to simplify some users that up to now use constructs like TIM_CCER_CC1NE << (ch * 4) which is an ugly mix of using a predefined value and still knowing internal details about it. Note that there are several decrements by 1 involved. These are necessary because software guys start counting at 0 while the hardware designer started at 1 (and having TIM_CCER_CCxE(1) be TIM_CCER_CC2E isn't a sane option). The compiler is expected to optimize these out nicely. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://lore.kernel.org/r/05df15f61dde81033407d3b4fcb67ee403ecc8db.1718791090.git.u.kleine-koenig@baylibre.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-06-26	mfd: stm32-timers: Unify alignment of register definition	Uwe Kleine-König
	Use tabs consistently for indention and properly align register names, values and comments. This improves readability (at least for my eyes). Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://lore.kernel.org/r/da3b7f9af5794d7463aa62cbaa7251abf1af2018.1718791090.git.u.kleine-koenig@baylibre.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-06-26	nvme-pci: do not directly handle subsys reset fallout	Keith Busch
	Scheduling reset_work after a nvme subsystem reset is expected to fail on pcie, but this also prevents potential handling the platform's pcie services may provide that might successfully recovering the link without re-enumeration. Such examples include AER, DPC, and power's EEH. Provide a pci specific operation that safely initiates a subsystem reset, and instead of scheduling reset work, read back the status register to trigger a pcie read error. Since this only affects pci, the other fabrics drivers subscribe to a generic nvmf subsystem reset that is exactly the same as before. The loop fabric doesn't use it because nvmet doesn't support setting that property anyway. And since we're using the magic NSSR value in two places now, provide a symbolic define for it. Reported-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-06-26	IB/core: add support for draining Shared receive queues	Max Gurtovoy
	To avoid leakage for QPs assocoated with SRQ, according to IB spec (section 10.3.1): "Note, for QPs that are associated with an SRQ, the Consumer should take the QP through the Error State before invoking a Destroy QP or a Modify QP to the Reset State. The Consumer may invoke the Destroy QP without first performing a Modify QP to the Error State and waiting for the Affiliated Asynchronous Last WQE Reached Event. However, if the Consumer does not wait for the Affiliated Asynchronous Last WQE Reached Event, then WQE and Data Segment leakage may occur. Therefore, it is good programming practice to teardown a QP that is associated with an SRQ by using the following process: - Put the QP in the Error State; - wait for the Affiliated Asynchronous Last WQE Reached Event; - either: - drain the CQ by invoking the Poll CQ verb and either wait for CQ to be empty or the number of Poll CQ operations has exceeded CQ capacity size; or - post another WR that completes on the same CQ and wait for this WR to return as a WC; - and then invoke a Destroy QP or Reset QP." Catch the Last WQE Reached Event in the core layer during drain QP flow. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20240619171153.34631-2-mgurtovoy@nvidia.com Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-26	RDMA/mlx5: Use sq timestamp as QP timestamp when RoCE is disabled	Or Har-Toov
	When creating a QP, one of the attributes is TS format (timestamp). In some devices, we have a limitation that all QPs should have the same ts_format. The ts_format is chosen based on the device's capability. The qp_ts_format cap resides under the RoCE caps table, and the cap will be 0 when RoCE is disabled. So when RoCE is disabled, the value that should be queried is sq_ts_format under HCA caps. Consider the case when the system supports REAL_TIME_TS format (0x2), some QPs are created with REAL_TIME_TS as ts_format, and afterwards RoCE gets disabled. When trying to construct a new QP, we can't use the qp_ts_format, that is queried from the RoCE caps table, Since it leads to passing 0x0 (FREE_RUNNING_TS) as the value of the qp_ts_format, which is different than the ts_format of the previously allocated QPs REAL_TIME_TS format (0x2). Thus, to resolve this, read the sq_ts_format, which also reflect the supported ts format for the QP when RoCE is disabled. Fixes: 4806f1e2fee8 ("net/mlx5: Set QP timestamp mode to default") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Link: https://lore.kernel.org/r/32801966eb767c7fd62b8dea3b63991d5fbfe213.1718554199.git.leon@kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-26	Merge tag 'platform-drivers-x86-ib-lenovo-c630-v6.11-2'	Sebastian Reichel
	Immutable branch between pdx86 lenovo c630 branch, power/supply and USB subsystems due for the v6.11 merge window, which is required for the Lenovo C630 battery driver. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-06-26	xfrm: support sending NAT keepalives in ESP in UDP states	Eyal Birger
	Add the ability to send out RFC-3948 NAT keepalives from the xfrm stack. To use, Userspace sets an XFRM_NAT_KEEPALIVE_INTERVAL integer property when creating XFRM outbound states which denotes the number of seconds between keepalive messages. Keepalive messages are sent from a per net delayed work which iterates over the xfrm states. The logic is guarded by the xfrm state spinlock due to the xfrm state walk iterator. Possible future enhancements: - Adding counters to keep track of sent keepalives. - deduplicate NAT keepalives between states sharing the same nat keepalive parameters. - provisioning hardware offloads for devices capable of implementing this. - revise xfrm state list to use an rcu list in order to avoid running this under spinlock. Suggested-by: Paul Wouters <paul.wouters@aiven.io> Tested-by: Paul Wouters <paul.wouters@aiven.io> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-06-26	wifi: mac80211: inform the low level if drv_stop() is a suspend	Emmanuel Grumbach
	This will allow the low level driver to take different actions for different flows. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240618192529.739036208b6e.Ie18a2fe8e02bf2717549d39420b350cfdaf3d317@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-26	media: v4l: subdev: Fix typo in documentation	Laurent Pinchart
	Replace the incorrect reference to the v4l2_subdev_enable_stream() function with the correct v4l2_subdev_enable_streams() spelling. Fixes: d0749adb3070 ("media: v4l2-subdev: Add subdev .(enable\|disable)_streams() operations") Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://lore.kernel.org/r/20240619225343.15873-1-laurent.pinchart@ideasonboard.com Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
2024-06-26	OPP: Introduce an OF helper function to inform if required-opps is used	Ulf Hansson
	As being shown from a subsequent change to genpd, it's useful to understand if a device's OF node has an OPP-table described and whether it contains OPP nodes that makes use of the required-opps DT property. For this reason, let's introduce an OPP OF helper function called dev_pm_opp_of_has_required_opp(). Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2024-06-25	Merge branch '20240602114439.1611-1-quic_jkona@quicinc.com' into arm64-for-6.11	Bjorn Andersson
	Merge the SM8650 video and clock controller drivers to gain access to the constants from the DeviceTree binding.
2024-06-25	Merge branch '20240602114439.1611-1-quic_jkona@quicinc.com' into clk-for-6.11	Bjorn Andersson
	Merge SM8650 video and camera clock drivers through topic branch, to make available the DeviceTree binding includes to the DeviceTree source branches as well.
2024-06-25	PCI: Add Edimax Vendor ID to pci_ids.h	FUJITA Tomonori
	Add the Edimax Vendor ID (0x1432) for an ethernet driver for Tehuti Networks TN40xx chips. This ID can be used for Realtek 8180 and Ralink rt28xx wireless drivers. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20240623235507.108147-2-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25	dim: add new interfaces for initialization and getting results	Heng Qi
	DIM-related mode and work have been collected in one same place, so new interfaces are added to provide convenience. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240621101353.107425-5-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25	ethtool: provide customized dim profile management	Heng Qi
	The NetDIM library, currently leveraged by an array of NICs, delivers excellent acceleration benefits. Nevertheless, NICs vary significantly in their dim profile list prerequisites. Specifically, virtio-net backends may present diverse sw or hw device implementation, making a one-size-fits-all parameter list impractical. On Alibaba Cloud, the virtio DPU's performance under the default DIM profile falls short of expectations, partly due to a mismatch in parameter configuration. I also noticed that ice/idpf/ena and other NICs have customized profilelist or placed some restrictions on dim capabilities. Motivated by this, I tried adding new params for "ethtool -C" that provides a per-device control to modify and access a device's interrupt parameters. Usage ======== The target NIC is named ethx. Assume that ethx only declares support for rx profile setting (with DIM_PROFILE_RX flag set in profile_flags) and supports modification of usec and pkt fields. 1. Query the currently customized list of the device $ ethtool -c ethx ... rx-profile: {.usec = 1, .pkts = 256, .comps = n/a,}, {.usec = 8, .pkts = 256, .comps = n/a,}, {.usec = 64, .pkts = 256, .comps = n/a,}, {.usec = 128, .pkts = 256, .comps = n/a,}, {.usec = 256, .pkts = 256, .comps = n/a,} tx-profile: n/a 2. Tune $ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n "n" means do not modify this field. $ ethtool -c ethx ... rx-profile: {.usec = 1, .pkts = 1, .comps = n/a,}, {.usec = 2, .pkts = 256, .comps = n/a,}, {.usec = 3, .pkts = 3, .comps = n/a,}, {.usec = 4, .pkts = 4, .comps = n/a,}, {.usec = 256, .pkts = 5, .comps = n/a,} tx-profile: n/a 3. Hint If the device does not support some type of customized dim profiles, the corresponding "n/a" will display. If the "n/a" field is being modified, -EOPNOTSUPP will be reported. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240621101353.107425-4-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25	linux/dim: move useful macros to .h file	Heng Qi
	Useful macros will be used effectively elsewhere. These will be utilized in subsequent patches. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240621101353.107425-2-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25	dt-bindings: clock: qcom: Add SM8650 camera clock controller	Jagadeesh Kona
	Add device tree bindings for the camera clock controller on Qualcomm SM8650 platform. Signed-off-by: Jagadeesh Kona <quic_jkona@quicinc.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Link: https://lore.kernel.org/r/20240602114439.1611-7-quic_jkona@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-06-25	dt-bindings: clock: qcom: Add SM8650 video clock controller	Jagadeesh Kona
	SM8650 video clock controller has most clocks same as SM8450, but it also has few additional clocks and resets. Add device tree bindings for the video clock controller on Qualcomm SM8650 platform by defining these additional clocks and resets on top of SM8450. Signed-off-by: Jagadeesh Kona <quic_jkona@quicinc.com> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Link: https://lore.kernel.org/r/20240602114439.1611-3-quic_jkona@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-06-26	netfilter: nf_tables: rise cap on SELinux secmark context	Pablo Neira Ayuso
	secmark context is artificially limited 256 bytes, rise it to 4Kbytes. Fixes: fb961945457f ("netfilter: nf_tables: add SECMARK support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-25	nsfs: add pid translation ioctls	Christian Brauner
	Add ioctl()s to translate pids between pid namespaces. LXCFS is a tiny fuse filesystem used to virtualize various aspects of procfs. LXCFS is run on the host. The files and directories it creates can be bind-mounted by e.g. a container at startup and mounted over the various procfs files the container wishes to have virtualized. When e.g. a read request for uptime is received, LXCFS will receive the pid of the reader. In order to virtualize the corresponding read, LXCFS needs to know the pid of the init process of the reader's pid namespace. In order to do this, LXCFS first needs to fork() two helper processes. The first helper process setns() to the readers pid namespace. The second helper process is needed to create a process that is a proper member of the pid namespace. The second helper process then creates a ucred message with ucred.pid set to 1 and sends it back to LXCFS. The kernel will translate the ucred.pid field to the corresponding pid number in LXCFS's pid namespace. This way LXCFS can learn the init pid number of the reader's pid namespace and can go on to virtualize. Since these two forks() are costly LXCFS maintains an init pid cache that caches a given pid for a fixed amount of time. The cache is pruned during new read requests. However, even with the cache the hit of the two forks() is singificant when a very large number of containers are running. With this simple patch we add an ns ioctl that let's a caller retrieve the init pid nr of a pid namespace through its pid namespace fd. This significantly improves performance with a very simple change. Support translation of pids and tgids. Other concepts can be added but there are no obvious users for this right now. To protect against races pidfds can be used to check whether the process is still valid. If needed, this can also be extended to work on pidfds directly. Link: https://lore.kernel.org/r/20240619-work-ns_ioctl-v1-1-7c0097e6bb6b@kernel.org Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-25	Merge remote-tracking branch ↵	Rob Clark
	'qcom/20240430-a750-raytracing-v3-2-7f57c5ac082d@gmail.com' into msm-next-robclark Merge qcom drivers to pick up dependency for SMEM based speedbin. Signed-off-by: Rob Clark <robdclark@chromium.org>
2024-06-25	iio: imu: adis: remove legacy lock helpers	Nuno Sa
	Since all users were converted to the new cleanup based helper, adis_dev_lock() and adis_dev_unlock() can now be removed from the lib. Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://patch.msgid.link/20240618-dev-iio-adis-cleanup-v1-9-bd93ce7845c7@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-06-25	iio: imu: adis: add cleanup based lock helpers	Nuno Sa
	Add two new lock helpers that make use of the cleanup guard() and scoped_guard() macros. Thus, users won't have to worry about unlocking which is less prone to errors and allows for simpler error paths. Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://patch.msgid.link/20240618-dev-iio-adis-cleanup-v1-3-bd93ce7845c7@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-06-25	iio: imu: adis: move to the cleanup magic	Nuno Sa
	This makes locking and handling error paths simpler. Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://patch.msgid.link/20240618-dev-iio-adis-cleanup-v1-2-bd93ce7845c7@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-06-25	iio: adc: ad_sigma_delta: add disable_one callback	Dumitru Ceclan
	Sigma delta ADCs with a sequencer need to disable the previously enabled channel when reading using ad_sigma_delta_single_conversion(). This was done manually in drivers for devices with sequencers. This patch implements handling of single channel disabling after a single conversion. Reviewed-by: Nuno Sa <nuno.sa@analog.com> Signed-off-by: Dumitru Ceclan <dumitru.ceclan@analog.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20240607-ad4111-v7-3-97e3855900a0@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-06-25	netfilter: nf_tables: do not store nft_ctx in transaction objects	Florian Westphal
	nft_ctx is huge and most of the information stored within isn't used at all. Remove nft_ctx member from the base transaction structure and store only what is needed. After this change, relevant struct sizes are: struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 / struct nft_trans_elem { / size: 72 (-40), cachelines: 2, members: 4 / struct nft_trans_flowtable { / size: 80 (-48), cachelines: 2, members: 5 / struct nft_trans_obj { / size: 72 (-40), cachelines: 2, members: 4 / struct nft_trans_rule { / size: 80 (-32), cachelines: 2, members: 6 / struct nft_trans_set { / size: 96 (-24), cachelines: 2, members: 8 / struct nft_trans_table { / size: 56 (-40), cachelines: 1, members: 2 */ struct nft_trans_elem can now be allocated from kmalloc-96 instead of kmalloc-128 slab. A further reduction by 8 bytes would even allow for kmalloc-64. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-25	netfilter: nf_tables: store chain pointer in rule transaction	Florian Westphal
	Currently the chain can be derived from trans->ctx.chain, but the ctx will go away soon. Thus add the chain pointer to nft_trans_rule structure itself. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-25	netfilter: nf_tables: pass nft_chain to destroy function, not nft_ctx	Florian Westphal
	It would be better to not store nft_ctx inside nft_trans object, the netlink ctx strucutre is huge and most of its information is never needed in places that use trans->ctx. Avoid/reduce its usage if possible, no runtime behaviour change intended. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-25	netfilter: nf_tables: compact chain+ft transaction objects	Florian Westphal
	Cover holes to reduce both structures by 8 byte. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-25	netfilter: nf_tables: move bind list_head into relevant subtypes	Florian Westphal
	Only nft_trans_chain and nft_trans_set subtypes use the trans->binding_list member. Add a new common binding subtype and move the member there. This reduces size of all other subtypes by 16 bytes on 64bit platforms. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-25	netfilter: nf_tables: make struct nft_trans first member of derived subtypes	Florian Westphal
	There is 'struct nft_trans', the basic structure for all transactional objects, and the the various different transactional objects, such as nft_trans_table, chain, set, set_elem and so on. Right now 'struct nft_trans' uses a flexible member at the tail (data[]), and casting is needed to access the actual type-specific members. Change this to make the hierarchy visible in source code, i.e. make struct nft_trans the first member of all derived subtypes. This has several advantages: 1. pahole output reflects the real size needed by the particular subtype 2. allows to use container_of() to convert the base type to the actual object type instead of casting ->data to the overlay structure. 3. It makes it easy to add intermediate types. 'struct nft_trans' contains a 'binding_list' that is only needed by two subtypes, so it should be part of the two subtypes, not in the base structure. But that makes it hard to interate over the binding_list, because there is no common base structure. A follow patch moves the bind list to a new struct: struct nft_trans_binding { struct nft_trans nft_trans; struct list_head binding_list; }; ... and makes that structure the new 'first member' for both nft_trans_chain and nft_trans_set. No functional change intended in this patch. Some numbers: struct nft_trans { /* size: 88, cachelines: 2, members: 5 / struct nft_trans_chain { / size: 152, cachelines: 3, members: 10 / struct nft_trans_elem { / size: 112, cachelines: 2, members: 4 / struct nft_trans_flowtable { / size: 128, cachelines: 2, members: 5 / struct nft_trans_obj { / size: 112, cachelines: 2, members: 4 / struct nft_trans_rule { / size: 112, cachelines: 2, members: 5 / struct nft_trans_set { / size: 120, cachelines: 2, members: 8 / struct nft_trans_table { / size: 96, cachelines: 2, members: 2 */ Of particular interest is nft_trans_elem, which needs to be allocated once for each pending (to be added or removed) set element. Add BUILD_BUG_ON to check struct nft_trans is placed at the top of the container structure. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-25	linux/syscalls.h: add missing __user annotations	Arnd Bergmann
	A couple of declarations in linux/syscalls.h are missing __user annotations on their pointers, which can lead to warnings from sparse because these don't match the implementation that have the correct address space annotations. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-06-25	syscalls: mmap(): use unsigned offset type consistently	Arnd Bergmann
	Most architectures that implement the old-style mmap() with byte offset use 'unsigned long' as the type for that offset, but microblaze and riscv have the off_t type that is shared with userspace, matching the prototype in include/asm-generic/syscalls.h. Make this consistent by using an unsigned argument everywhere. This changes the behavior slightly, as the argument is shifted to a page number, and an user input with the top bit set would result in a negative page offset rather than a large one as we use elsewhere. For riscv, the 32-bit sys_mmap2() definition actually used a custom type that is different from the global declaration, but this was missed due to an incorrect type check. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-06-25	syscalls: fix compat_sys_io_pgetevents_time64 usage	Arnd Bergmann
	Using sys_io_pgetevents() as the entry point for compat mode tasks works almost correctly, but misses the sign extension for the min_nr and nr arguments. This was addressed on parisc by switching to compat_sys_io_pgetevents_time64() in commit 6431e92fc827 ("parisc: io_pgetevents_time64() needs compat syscall in 32-bit compat mode"), as well as by using more sophisticated system call wrappers on x86 and s390. However, arm64, mips, powerpc, sparc and riscv still have the same bug. Change all of them over to use compat_sys_io_pgetevents_time64() like parisc already does. This was clearly the intention when the function was originally added, but it got hooked up incorrectly in the tables. Cc: stable@vger.kernel.org Fixes: 48166e6ea47d ("y2038: add 64-bit time_t syscalls to all 32-bit architectures") Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-06-25	firewire: ohci: add tracepoints event for hardIRQ event	Takashi Sakamoto
	1394 OHCI hardware triggers PCI interrupts to notify any events to software. Current driver for the hardware is programmed by the typical way to utilize top- and bottom- halves, thus it has a timing gap to handle the notification in softIRQ (tasklet). This commit adds a tracepoint event for the hardIRQ event. The comparison of the tracepoint event to tracepoints events in firewire subsystem is helpful to diagnose the timing gap. Link: https://lore.kernel.org/r/20240625031806.956650-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-25	firewire: ohci: add support for Linux kernel tracepoints	Takashi Sakamoto
	The Linux Kernel Tracepoints framework is enough useful to trace the interaction between 1394 OHCI hardware and its driver. This commit adds firewire_ohci subsystem to use the framework. It is defined as the different subsystem from the existing firewire subsystem. The definition file for the existing subsystem is slightly changed so that both subsystems are available in 1394 OHCI driver. Link: https://lore.kernel.org/r/20240625031806.956650-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-25	fbdev: mmp: Constify struct mmp_overlay_ops	Christophe JAILLET
	'struct mmp_overlay_ops' is not modified in this driver. Constifying this structure moves some data to a read-only section, so increase overall security. On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 11798 555 16 12369 3051 drivers/video/fbdev/mmp/hw/mmp_ctrl.o After: ===== text data bss dec hex filename 11834 507 16 12357 3045 drivers/video/fbdev/mmp/hw/mmp_ctrl.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Helge Deller <deller@gmx.de>
2024-06-25	drm/connector: hdmi: shorten too long function name	Dmitry Baryshkov
	If CONFIG_MODVERSIONS is enabled, then using the HDMI Connector framework can result in build failures. Rename the function to make it fit into the name requirements. ERROR: modpost: too long symbol "drm_atomic_helper_connector_hdmi_disable_audio_infoframe" [drivers/gpu/drm/msm/msm.ko] Reported-by: Mark Brown <broonie@kernel.org> Reviewed-by: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240624-hdmi-connector-shorten-name-v1-1-5bd3410138db@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2024-06-25	Fix race for duplicate reqsk on identical SYN	luoxuanqiang
	When bonding is configured in BOND_MODE_BROADCAST mode, if two identical SYN packets are received at the same time and processed on different CPUs, it can potentially create the same sk (sock) but two different reqsk (request_sock) in tcp_conn_request(). These two different reqsk will respond with two SYNACK packets, and since the generation of the seq (ISN) incorporates a timestamp, the final two SYNACK packets will have different seq values. The consequence is that when the Client receives and replies with an ACK to the earlier SYNACK packet, we will reset(RST) it. ======================================================================== This behavior is consistently reproducible in my local setup, which comprises: \| NETA1 ------ NETB1 \| PC_A --- bond --- \| \| --- bond --- PC_B \| NETA2 ------ NETB2 \| - PC_A is the Server and has two network cards, NETA1 and NETA2. I have bonded these two cards using BOND_MODE_BROADCAST mode and configured them to be handled by different CPU. - PC_B is the Client, also equipped with two network cards, NETB1 and NETB2, which are also bonded and configured in BOND_MODE_BROADCAST mode. If the client attempts a TCP connection to the server, it might encounter a failure. Capturing packets from the server side reveals: 10.10.10.10.45182 > localhost: Flags [S], seq 320236027, 10.10.10.10.45182 > localhost: Flags [S], seq 320236027, localhost > 10.10.10.10.45182: Flags [S.], seq 2967855116, localhost > 10.10.10.10.45182: Flags [S.], seq 2967855123, <== 10.10.10.10.45182 > localhost: Flags [.], ack 4294967290, 10.10.10.10.45182 > localhost: Flags [.], ack 4294967290, localhost > 10.10.10.10.45182: Flags [R], seq 2967855117, <== localhost > 10.10.10.10.45182: Flags [R], seq 2967855117, Two SYNACKs with different seq numbers are sent by localhost, resulting in an anomaly. ======================================================================== The attempted solution is as follows: Add a return value to inet_csk_reqsk_queue_hash_add() to confirm if the ehash insertion is successful (Up to now, the reason for unsuccessful insertion is that a reqsk for the same connection has already been inserted). If the insertion fails, release the reqsk. Due to the refcnt, Kuniyuki suggests also adding a return value check for the DCCP module; if ehash insertion fails, indicating a successful insertion of the same connection, simply release the reqsk as well. Simultaneously, In the reqsk_queue_hash_req(), the start of the req->rsk_timer is adjusted to be after successful insertion. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: luoxuanqiang <luoxuanqiang@kylinos.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240621013929.1386815-1-luoxuanqiang@kylinos.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-25	fs: Export in_group_or_capable()	Youling Tang
	Export in_group_or_capable() as a VFS helper function. Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Link: https://lore.kernel.org/r/20240620032335.147136-1-youling.tang@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-25	af_unix: Remove U_LOCK_GC_LISTENER.	Kuniyuki Iwashima
	Commit 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") added U_LOCK_GC_LISTENER for the old GC, but it's no longer needed for the new GC. Let's remove U_LOCK_GC_LISTENER and unix_state_lock_nested() as there's no user. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-25	af_unix: Remove U_LOCK_DIAG.	Kuniyuki Iwashima
	sk_diag_dump_icons() acquires embryo's lock by unix_state_lock_nested() to fetch its peer. The embryo's ->peer is set to NULL only when its parent listener is close()d. Then, unix_release_sock() is called for each embryo after unlinking skb by skb_dequeue(). In sk_diag_dump_icons(), we hold the parent's recvq lock, so we need not acquire unix_state_lock_nested(), and peer is always non-NULL. Let's remove unnecessary unix_state_lock_nested() and non-NULL test for peer. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-25	af_unix: Define locking order for U_LOCK_SECOND in unix_stream_connect().	Kuniyuki Iwashima
	While a SOCK_(STREAM\|SEQPACKET) socket connect()s to another, we hold two locks of them by unix_state_lock() and unix_state_lock_nested() in unix_stream_connect(). Before unix_state_lock_nested(), the following is guaranteed by checking sk->sk_state: 1. The first socket is TCP_LISTEN 2. The second socket is not the first one 3. Simultaneous connect() must fail So, the client state can be TCP_CLOSE or TCP_LISTEN or TCP_ESTABLISHED. Let's define the expected states as unix_state_lock_cmp_fn() instead of using unix_state_lock_nested(). Note that 2. is detected by debug_spin_lock_before() and 3. cannot be expressed as lock_cmp_fn. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-25	xfrm: Fix unregister netdevice hang on hardware offload.	Steffen Klassert
	When offloading xfrm states to hardware, the offloading device is attached to the skbs secpath. If a skb is free is deferred, an unregister netdevice hangs because the netdevice is still refcounted. Fix this by removing the netdevice from the xfrm states when the netdevice is unregistered. To find all xfrm states that need to be cleared we add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-06-25	Revert "printk: Save console options for add_preferred_console_match()"	Greg Kroah-Hartman
	This reverts commit f03e8c1060f86c23eb49bafee99d9fcbd1c1bd77. Let's roll back all of the serial core and printk console changes that went into 6.10-rc1 as there still are problems with them that need to be sorted out. Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1 Reported-by: Petr Mladek <pmladek@suse.com> Reported-by: Tony Lindgren <tony@atomide.com> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-24	kernel/panic: add verbose logging of kernel taints in backtraces	Jani Nikula
	With nearly 20 taint flags and respective characters, it's getting a bit difficult to remember what each taint flag character means. Add verbose logging of the set taints in the format: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN in dump_stack_print_info() when there are taints. Note that the "negative flag" G is not included. Link: https://lkml.kernel.org/r/7321e306166cb2ca2807ab8639e665baa2462e9c.1717146197.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24	cpumask: make core headers including cpumask_types.h where possible	Yury Norov
	Now that cpumask types are split out to a separate smaller header, many frequently included core headers may switch to using it. Link: https://lkml.kernel.org/r/20240528005648.182376-7-yury.norov@gmail.com Signed-off-by: Yury Norov <yury.norov@gmail.com> Cc: Amit Daniel Kachhap <amit.kachhap@gmail.com> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Yury Norov <yury.norov@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>