summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-10-14pid: add pidfd_get_task() helperChristian Brauner
The number of system calls making use of pidfds is constantly increasing. Some of those new system calls duplicate the code to turn a pidfd into task_struct it refers to. Give them a simple helper for this. Link: https://lore.kernel.org/r/20211004125050.1153693-2-christian.brauner@ubuntu.com Link: https://lore.kernel.org/r/20211011133245.1703103-2-brauner@kernel.org Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Matthew Bobrowski <repnop@google.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Reviewed-by: Matthew Bobrowski <repnop@google.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-10-14sched: Fill unconditional hole induced by sched_entityKees Cook
With struct sched_entity before the other sched entities, its alignment won't induce a struct hole. This saves 64 bytes in defconfig task_struct: Before: ... unsigned int rt_priority; /* 120 4 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ const struct sched_class * sched_class; /* 128 8 */ /* XXX 56 bytes hole, try to pack */ /* --- cacheline 3 boundary (192 bytes) --- */ struct sched_entity se __attribute__((__aligned__(64))); /* 192 448 */ /* --- cacheline 10 boundary (640 bytes) --- */ struct sched_rt_entity rt; /* 640 48 */ struct sched_dl_entity dl __attribute__((__aligned__(8))); /* 688 224 */ /* --- cacheline 14 boundary (896 bytes) was 16 bytes ago --- */ After: ... unsigned int rt_priority; /* 120 4 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ struct sched_entity se __attribute__((__aligned__(64))); /* 128 448 */ /* --- cacheline 9 boundary (576 bytes) --- */ struct sched_rt_entity rt; /* 576 48 */ struct sched_dl_entity dl __attribute__((__aligned__(8))); /* 624 224 */ /* --- cacheline 13 boundary (832 bytes) was 16 bytes ago --- */ Summary diff: - /* size: 7040, cachelines: 110, members: 188 */ + /* size: 6976, cachelines: 109, members: 188 */ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210924025450.4138503-1-keescook@chromium.org
2021-10-14kernel/sched: Fix sched_fork() access an invalid sched_task_groupZhang Qiao
There is a small race between copy_process() and sched_fork() where child->sched_task_group point to an already freed pointer. parent doing fork() | someone moving the parent | to another cgroup -------------------------------+------------------------------- copy_process() + dup_task_struct()<1> parent move to another cgroup, and free the old cgroup. <2> + sched_fork() + __set_task_cpu()<3> + task_fork_fair() + sched_slice()<4> In the worst case, this bug can lead to "use-after-free" and cause panic as shown above: (1) parent copy its sched_task_group to child at <1>; (2) someone move the parent to another cgroup and free the old cgroup at <2>; (3) the sched_task_group and cfs_rq that belong to the old cgroup will be accessed at <3> and <4>, which cause a panic: [] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [] PGD 8000001fa0a86067 P4D 8000001fa0a86067 PUD 2029955067 PMD 0 [] Oops: 0000 [#1] SMP PTI [] CPU: 7 PID: 648398 Comm: ebizzy Kdump: loaded Tainted: G OE --------- - - 4.18.0.x86_64+ #1 [] RIP: 0010:sched_slice+0x84/0xc0 [] Call Trace: [] task_fork_fair+0x81/0x120 [] sched_fork+0x132/0x240 [] copy_process.part.5+0x675/0x20e0 [] ? __handle_mm_fault+0x63f/0x690 [] _do_fork+0xcd/0x3b0 [] do_syscall_64+0x5d/0x1d0 [] entry_SYSCALL_64_after_hwframe+0x65/0xca [] RIP: 0033:0x7f04418cd7e1 Between cgroup_can_fork() and cgroup_post_fork(), the cgroup membership and thus sched_task_group can't change. So update child's sched_task_group at sched_post_fork() and move task_fork() and __set_task_cpu() (where accees the sched_task_group) from sched_fork() to sched_post_fork(). Fixes: 8323f26ce342 ("sched: Fix race in task_group") Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lkml.kernel.org/r/20210915064030.2231-1-zhangqiao22@huawei.com
2021-10-14sched,livepatch: Use wake_up_if_idle()Peter Zijlstra
Make sure to prod idle CPUs so they call klp_update_patch_state(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Petr Mladek <pmladek@suse.com> Tested-by: Vasily Gorbik <gor@linux.ibm.com> # on s390 Link: https://lkml.kernel.org/r/20210929151723.162004989@infradead.org
2021-10-14device property: Add missed header in fwnode.hAndy Shevchenko
When adding some stuff to the header file we must not rely on implicit dependencies that are happen by luck or bugs in other headers. Hence fwnode.h needs to use bits.h directly. Fixes: c2c724c868c4 ("driver core: Add fw_devlink_parse_fwtree()") Cc: Saravana Kannan <saravanak@google.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20211013143707.80222-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-13net: delete redundant function declarationChen Wandun
The implement of function netdev_all_upper_get_next_dev_rcu has been removed in: commit f1170fd462c6 ("net: Remove all_adj_list and its references") so delete redundant declaration in header file. Fixes: f1170fd462c6 ("net: Remove all_adj_list and its references") Signed-off-by: Chen Wandun <chenwandun@huawei.com> Link: https://lore.kernel.org/r/20211013094702.3931071-1-chenwandun@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13dt-bindings: clock: Add YAML schemas for CAMCC clocks on SC7280Taniya Das
The camera clock controller clock provider have a bunch of generic properties that are needed in a device tree. Add the CAMCC clock IDs for camera client to request for the clocks. Signed-off-by: Taniya Das <tdas@codeaurora.org> Link: https://lore.kernel.org/r/1633567425-11953-1-git-send-email-tdas@codeaurora.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-10-13dt-bindings: clock: Add YAML schemas for LPASS clocks on SC7280Taniya Das
The LPASS(Low Power Audio Subsystem) clock provider have a bunch of generic properties that are needed in a device tree. Add the LPASS clock IDs for LPASS PIL client to request for the clocks. Signed-off-by: Taniya Das <tdas@codeaurora.org> Link: https://lore.kernel.org/r/1633484416-27852-2-git-send-email-tdas@codeaurora.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-10-13clk: qcom: gcc-msm8994: Add modem resetKonrad Dybcio
This will be required to support the modem. Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org> Link: https://lore.kernel.org/r/20210923162645.23257-7-konrad.dybcio@somainline.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-10-13clk: qcom: gcc-msm8994: Add missing clocksKonrad Dybcio
This should be the last "add missing clocks" commit, as to my knowledge there are no more clocks registered within gcc. Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org> Link: https://lore.kernel.org/r/20210923162645.23257-5-konrad.dybcio@somainline.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-10-13clk: qcom: gcc-msm8994: Add missing NoC clocksKonrad Dybcio
Add necessary NoC clocks to provide frequency sources for relevant branch clocks. Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org> Link: https://lore.kernel.org/r/20210923162645.23257-4-konrad.dybcio@somainline.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-10-13clk: qcom: smd-rpm: Add QCM2290 RPM clock supportShawn Guo
Add support for RPM-managed clocks on the QCM2290 platform. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Link: https://lore.kernel.org/r/20210917030434.19859-4-shawn.guo@linaro.org Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-10-13Merge tag 'mlx5-fixes-2021-10-12' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-10-12 * tag 'mlx5-fixes-2021-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Fix division by 0 in mlx5e_select_queue for representors net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp net/mlx5e: Switchdev representors are not vlan challenged net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path net/mlx5e: Allow only complete TXQs partition in MQPRIO channel mode net/mlx5: Fix cleanup of bridge delayed work ==================== Link: https://lore.kernel.org/r/20211012205323.20123-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13Merge tag 'qcom-drivers-for-5.16' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/drivers Qualcomm driver updates for v5.16 This drops the use of power-domains for exposing the load_state from the QMP driver to clients, to avoid issues related to system suspend. SMP2P becomes wakeup capable, to allow dying remoteprocs to wake up Linux from suspend to perform recovery. It adds RPM power-domain support for SM6350 and MSM8953 and base RPM support for MSM8953 and QCM2290. It adds support for MSM8996, SDM630 and SDM660 in the SPM driver, which will enable the introduction of proper voltage scaling of the CPU subsystem. Support for releasing secondary CPUs on MSM8226 is introduced. The Asynchronous Packet Router (APR) driver is extended to support the new Generic Packet Router (GPR) variant, which is used to communicate with the firmware in the new AudioReach audio driver. Lastly it transitions a number of drivers to safer string functions, as well as switching things to use devm_platform_ioremap_resource(). * tag 'qcom-drivers-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (40 commits) soc: qcom: apr: Add GPR support soc: dt-bindings: qcom: add gpr bindings soc: qcom: apr: make code more reuseable soc: dt-bindings: qcom: apr: deprecate qcom,apr-domain property soc: dt-bindings: qcom: apr: convert to yaml dt-bindings: soc: qcom: aoss: Delete unused power-domain definitions dt-bindings: msm/dp: Remove aoss-qmp header soc: qcom: aoss: Drop power domain support dt-bindings: soc: qcom: aoss: Drop the load state power-domain soc: qcom: smp2p: Add wakeup capability to SMP2P IRQ dt-bindings: power: rpmpd: Add SM6350 to rpmpd binding dt-bindings: soc: qcom: aoss: Add SM6350 compatible soc: qcom: llcc: Disable MMUHWT retention soc: qcom: smd-rpm: Add QCM2290 compatible dt-bindings: soc: qcom: smd-rpm: Add QCM2290 compatible firmware: qcom_scm: Add compatible for MSM8953 SoC dt-bindings: firmware: qcom-scm: Document msm8953 bindings soc: qcom: pdr: Prefer strscpy over strcpy soc: qcom: rpmh-rsc: Make use of the helper function devm_platform_ioremap_resource_byname() soc: qcom: gsbi: Make use of the helper function devm_platform_ioremap_resource() ... Link: https://lore.kernel.org/r/20211012173442.1017010-1-bjorn.andersson@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-13marvell: octeontx2: build error: unknown type name 'u64'Anders Roxell
Building an allmodconfig kernel arm64 kernel, the following build error shows up: In file included from drivers/crypto/marvell/octeontx2/cn10k_cpt.c:4: include/linux/soc/marvell/octeontx2/asm.h:38:15: error: unknown type name 'u64' 38 | static inline u64 otx2_atomic64_fetch_add(u64 incr, u64 *ptr) | ^~~ Include linux/types.h in asm.h so the compiler knows what the type 'u64' are. Fixes: af3826db74d1 ("octeontx2-pf: Use hardware register for CQE count") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Link: https://lore.kernel.org/r/20211013135743.3826594-1-anders.roxell@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13clk: qcom: Add Global Clock Controller driver for QCM2290Shawn Guo
Add Global Clock Controller (GCC) driver for QCM2290. This is a porting of gcc-scuba driver from CAF msm-4.19, with GDSC support added on top. Because the alpha_pll on the platform has a different register layout (offsets), its own clk_alpha_pll_regs_offset[] is used in the driver. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Link: https://lore.kernel.org/r/20210919023308.24498-3-shawn.guo@linaro.org Acked-by: Rob Herring <robh@kernel.org> [sboyd@kernel.org: Drop duplicate includes, clk.h include, module alias] Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-10-13RDMA/core: Set sgtable nents when using ib_dma_virt_map_sg()Logan Gunthorpe
ib_dma_map_sgtable_attrs() should be mapping the sgls and setting nents but the ib_uses_virt_dma() path falls back to ib_dma_virt_map_sg() which will not set the nents in the sgtable. Check the return value (per the map_sg calling convention) and set sgt->nents appropriately on success. Fixes: 79fbd3e1241c ("RDMA: Use the sg_table directly and remove the opencoded version from umem") Link: https://lore.kernel.org/r/20211013165942.89806-1-logang@deltatee.com Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-13gpio: max730x: Make __max730x_remove() return voidUwe Kleine-König
An spi or i2c remove callback is only called for devices that probed successfully. In this case this implies that __max730x_probe() set a non-NULL driver data. So the check ts == NULL is never true. With this check dropped, __max730x_remove() returns zero unconditionally. Make it return void instead which makes it easier to see in the callers that there is no error to handle. Also the return value of i2c and spi remove callbacks is ignored anyway. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-10-13netdevice: demote the type of some dev_addr_set() helpersJakub Kicinski
__dev_addr_set() and dev_addr_mod() and pretty low level, let the arguments be void, there's no chance for confusion in callers converted to use them. Keep u8 in dev_addr_set() because some of the callers are converted from a loop and we want to make sure assignments are not from an array of a different type. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13decnet: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in decnet constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ipv6: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in ndisc constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13llc/snap: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in LLC and SNAP constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13rose: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in rose constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ax25: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in AX25 constant. Modify callers as well (netrom, rose). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13SUNRPC: Change return value type of .pc_encodeChuck Lever
Returning an undecorated integer is an age-old trope, but it's not clear (even to previous experts in this code) that the only valid return values are 1 and 0. These functions do not return a negative errno, rpc_stat value, or a positive length. Document there are only two valid return values by having .pc_encode return only true or false. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13SUNRPC: Replace the "__be32 *p" parameter to .pc_encodeChuck Lever
The passed-in value of the "__be32 *p" parameter is now unused in every server-side XDR encoder, and can be removed. Note also that there is a line in each encoder that sets up a local pointer to a struct xdr_stream. Passing that pointer from the dispatcher instead saves one line per encoder function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13SUNRPC: Change return value type of .pc_decodeChuck Lever
Returning an undecorated integer is an age-old trope, but it's not clear (even to previous experts in this code) that the only valid return values are 1 and 0. These functions do not return a negative errno, rpc_stat value, or a positive length. Document there are only two valid return values by having .pc_decode return only true or false. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13SUNRPC: Replace the "__be32 *p" parameter to .pc_decodeChuck Lever
The passed-in value of the "__be32 *p" parameter is now unused in every server-side XDR decoder, and can be removed. Note also that there is a line in each decoder that sets up a local pointer to a struct xdr_stream. Passing that pointer from the dispatcher instead saves one line per decoder function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13nvmem: core: add nvmem cell post processing callbackSrinivas Kandagatla
Some NVMEM providers have certain nvmem cells encoded, which requires post processing before actually using it. For example mac-address is stored in either in ascii or delimited or reverse-order. Having a post-process callback hook to provider drivers would enable them to do this vendor specific post processing before nvmem consumers see it. Tested-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20211013131957.30271-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-13fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)Claudio Suarez
Scroll acceleration is disabled in fbcon by hard-wiring p->scrollmode = SCROLL_REDRAW. Remove the obsolete code in fbcon.c and fbdev/core/ Signed-off-by: Claudio Suarez <cssk@net-c.es> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YVXTYqszZix9TxjJ@gineta.localdomain
2021-10-13drm/locking: add backtrace for locking contended locks without backoffJani Nikula
If drm_modeset_lock() returns -EDEADLK, the caller is supposed to drop all currently held locks using drm_modeset_backoff(). Failing to do so will result in warnings and backtraces on the paths trying to lock a contended lock. Add support for optionally printing the backtrace on the path that hit the deadlock and didn't gracefully handle the situation. For example, the patch [1] inadvertently dropped the return value check and error return on replacing calc_watermark_data() with intel_compute_global_watermarks(). The backtraces on the subsequent locking paths hitting WARN_ON(ctx->contended) were unhelpful, but adding the backtrace to the deadlock path produced this helpful printout: <7> [98.002465] drm_modeset_lock attempting to lock a contended lock without backoff: drm_modeset_lock+0x107/0x130 drm_atomic_get_plane_state+0x76/0x150 skl_compute_wm+0x251d/0x2b20 [i915] intel_atomic_check+0x1942/0x29e0 [i915] drm_atomic_check_only+0x554/0x910 drm_atomic_nonblocking_commit+0xe/0x50 drm_mode_atomic_ioctl+0x8c2/0xab0 drm_ioctl_kernel+0xac/0x140 Add new CONFIG_DRM_DEBUG_MODESET_LOCK to enable modeset lock debugging with stack depot and trace. [1] https://lore.kernel.org/r/20210924114741.15940-4-jani.nikula@intel.com v2: - default y if DEBUG_WW_MUTEX_SLOWPATH (Daniel) - depends on DEBUG_KERNEL Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211001091444.8177-1-jani.nikula@intel.com
2021-10-13drm/ttm_bo_api: update the description for @placement and @sgAmos Kong
Correct the argument name of @placement and added @sg description for ttm_bo_init() and ttm_bo_init_reserved(). Argument @flags was replaced to @placement by Jerome in commit 09855acb1c2e3779f25317ec9a8ffe1b1784a4a8 Argument @sg was added by Dave in commit 129b78bfca591e736e56a294f0e357d73d938f7e Signed-off-by: Amos Kong <amos@sietium.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Dave Airlie <airlied@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/38eac09bf2ddd6088cc8a126e6bc4792eaa2dc88.1633462176.git.amos@sietium.com Signed-off-by: Christian König <christian.koenig@amd.com>
2021-10-12scsi: libsas: Export sas_phy_enable()Luo Jiaxing
Export sas_phy_enable() so LLDDs can directly use it to control remote phys. We already do this for companion function sas_phy_reset(). Link: https://lore.kernel.org/r/1634041588-74824-4-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-12net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch libVladimir Oltean
Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driverVladimir Oltean
As explained here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ DSA tagging protocol drivers cannot depend on symbols exported by switch drivers, because this creates a circular dependency that breaks module autoloading. The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function exported by the common ocelot switch lib. This function looks at OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the DSA tag, for PTP timestamping (the command: one-step/two-step, and the TX timestamp identifier). None of that requires deep insight into the driver, it is quite stateless, as it only depends upon the skb->cb. So let's make it a static inline function and put it in include/linux/dsa/ocelot.h, a file that despite its name is used by the ocelot switch driver for populating the injection header too - since commit 40d3f295b5fe ("net: mscc: ocelot: use common tag parsing code with DSA"). With that function declared as static inline, its body is expanded inside each call site, so the dependency is broken and the DSA tagger can be built without the switch library, upon which the felix driver depends. Fixes: 39e5308b3250 ("net: mscc: ocelot: support PTP Sync one-step timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with ↵Vladimir Oltean
the skb PTP header The sad reality is that when a PTP frame with a TX timestamping request is transmitted, it isn't guaranteed that it will make it all the way to the wire (due to congestion inside the switch), and that a timestamp will be taken by the hardware and placed in the timestamp FIFO where an IRQ will be raised for it. The implication is that if enough PTP frames are silently dropped by the hardware such that the timestamp ID has rolled over, it is possible to match a timestamp to an old skb. Furthermore, nobody will match on the real skb corresponding to this timestamp, since we stupidly matched on a previous one that was stale in the queue, and stopped there. So PTP timestamping will be broken and there will be no way to recover. It looks like the hardware parses the sequenceID from the PTP header, and also provides that metadata for each timestamp. The driver currently ignores this, but it shouldn't. As an extra resiliency measure, do the following: - check whether the PTP sequenceID also matches between the skb and the timestamp, treat the skb as stale otherwise and free it - if we see a stale skb, don't stop there and try to match an skb one more time, chances are there's one more skb in the queue with the same timestamp ID, otherwise we wouldn't have ever found the stale one (it is by timestamp ID that we matched it). While this does not prevent PTP packet drops, it at least prevents the catastrophic consequences of incorrect timestamp matching. Since we already call ptp_classify_raw in the TX path, save the result in the skb->cb of the clone, and just use that result in the interrupt code path. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: avoid overflowing the PTP timestamp FIFOVladimir Oltean
PTP packets with 2-step TX timestamp requests are matched to packets based on the egress port number and a 6-bit timestamp identifier. All PTP timestamps are held in a common FIFO that is 128 entry deep. This patch ensures that back-to-back timestamping requests cannot exceed the hardware FIFO capacity. If that happens, simply send the packets without requesting a TX timestamp to be taken (in the case of felix, since the DSA API has a void return code in ds->ops->port_txtstamp) or drop them (in the case of ocelot). I've moved the ts_id_lock from a per-port basis to a per-switch basis, because we need separate accounting for both numbers of PTP frames in flight. And since we need locking to inc/dec the per-switch counter, that also offers protection for the per-port counter and hence there is no reason to have a per-port counter anymore. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: make use of all 63 PTP timestamp identifiersVladimir Oltean
At present, there is a problem when user space bombards a port with PTP event frames which have TX timestamping requests (or when a tc-taprio offload is installed on a port, which delays the TX timestamps by a significant amount of time). The driver will happily roll over the 2-bit timestamp ID and this will cause incorrect matches between an skb and the TX timestamp collected from the FIFO. The Ocelot switches have a 6-bit PTP timestamp identifier, and the value 63 is reserved, so that leaves identifiers 0-62 to be used. The timestamp identifiers are selected by the REW_OP packet field, and are actually shared between CPU-injected frames and frames which match a VCAP IS2 rule that modifies the REW_OP. The hardware supports partitioning between the two uses of the REW_OP field through the PTP_ID_LOW and PTP_ID_HIGH registers, and by default reserves the PTP IDs 0-3 for CPU-injected traffic and the rest for VCAP IS2. The driver does not use VCAP IS2 to set REW_OP for 2-step timestamping, and it also writes 0xffffffff to both PTP_ID_HIGH and PTP_ID_LOW in ocelot_init_timestamp() which makes all timestamp identifiers available to CPU injection. Therefore, we can make use of all 63 timestamp identifiers, which should allow more timestampable packets to be in flight on each port. This is only part of the solution, more issues will be addressed in future changes. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch ↵Vladimir Oltean
driver It's nice to be able to test a tagging protocol with dsa_loop, but not at the cost of losing the ability of building the tagging protocol and switch driver as modules, because as things stand, there is a circular dependency between the two. Tagging protocol drivers cannot depend on switch drivers, that is a hard fact. The reasoning behind the blamed patch was that accessing dp->priv should first make sure that the structure behind that pointer is what we really think it is. Currently the "sja1105" and "sja1110" tagging protocols only operate with the sja1105 switch driver, just like any other tagging protocol and switch combination. The only way to mix and match them is by modifying the code, and this applies to dsa_loop as well (by default that uses DSA_TAG_PROTO_NONE). So while in principle there is an issue, in practice there isn't one. Until we extend dsa_loop to allow user space configuration, treat the problem as a non-issue and just say that DSA ports found by tag_sja1105 are always sja1105 ports, which is in fact true. But keep the dsa_port_is_sja1105 function so that it's easy to patch it during testing, and rely on dead code elimination. Fixes: 994d2cbb08ca ("net: dsa: tag_sja1105: be dsa_loop-safe") Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driverVladimir Oltean
The problem is that DSA tagging protocols really must not depend on the switch driver, because this creates a circular dependency at insmod time, and the switch driver will effectively not load when the tagging protocol driver is missing. The code was structured in the way it was for a reason, though. The DSA driver-facing API for PTP timestamping relies on the assumption that two-step TX timestamps are provided by the hardware in an out-of-band manner, typically by raising an interrupt and making that timestamp available inside some sort of FIFO which is to be accessed over SPI/MDIO/etc. So the API puts .port_txtstamp into dsa_switch_ops, because it is expected that the switch driver needs to save some state (like put the skb into a queue until its TX timestamp arrives). On SJA1110, TX timestamps are provided by the switch as Ethernet packets, so this makes them be received and processed by the tagging protocol driver. This in itself is great, because the timestamps are full 64-bit and do not require reconstruction, and since Ethernet is the fastest I/O method available to/from the switch, PTP timestamps arrive very quickly, no matter how bottlenecked the SPI connection is, because SPI interaction is not needed at all. DSA's code structure and strict isolation between the tagging protocol driver and the switch driver break the natural code organization. When the tagging protocol driver receives a packet which is classified as a metadata packet containing timestamps, it passes those timestamps one by one to the switch driver, which then proceeds to compare them based on the recorded timestamp ID that was generated in .port_txtstamp. The communication between the tagging protocol and the switch driver is done through a method exported by the switch driver, sja1110_process_meta_tstamp. To satisfy build requirements, we force a dependency to build the tagging protocol driver as a module when the switch driver is a module. However, as explained in the first paragraph, that causes the circular dependency. To solve this, move the skb queue from struct sja1105_private :: struct sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data. The latter is a data structure for which hacks have already been put into place to be able to create persistent storage per switch that is accessible from the tagging protocol driver (see sja1105_setup_ports). With the skb queue directly accessible from the tagging protocol driver, we can now move sja1110_process_meta_tstamp into the tagging driver itself, and avoid exporting a symbol. Fixes: 566b18c8b752 ("net: dsa: sja1105: implement TX timestamping for SJA1110") Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Delete reload enable/disable interfaceLeon Romanovsky
Commit a0c76345e3d3 ("devlink: disallow reload operation during device cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload operation from racing with device probe/dismantle. After recent changes to move devlink_register() to the end of device probe and devlink_unregister() to the beginning of device dismantle, these races can no longer happen. Reload operations will be denied if the devlink instance is unregistered and devlink_unregister() will block until all in-flight operations are done. Therefore, remove these devlink_reload_{enable,disable}() APIs. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Allow control devlink ops behavior through feature maskLeon Romanovsky
Introduce new devlink call to set feature mask to control devlink behavior during device initialization phase after devlink_alloc() is already called. This allows us to set reload ops based on device property which is not known at the beginning of driver initialization. For the sake of simplicity, this API lacks any type of locking and needs to be called before devlink_register() to make sure that no parallel access to the ops is possible at this stage. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Move netdev_to_devlink helpers to devlink.cLeon Romanovsky
Both netdev_to_devlink and netdev_to_devlink_port are used in devlink.c only, so move them in order to reduce their scope. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Reduce struct devlink exposureLeon Romanovsky
The declaration of struct devlink in general header provokes the situation where internal fields can be accidentally used by the driver authors. In order to reduce such possible situations, let's reduce the namespace exposure of struct devlink. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12PCI: Return NULL for to_pci_driver(NULL)Bjorn Helgaas
to_pci_driver() takes a pointer to a struct device_driver and uses container_of() to find the struct pci_driver that contains it. If given a NULL pointer to a struct device_driver, return a NULL pci_driver pointer instead of applying container_of() to NULL. This simplifies callers that would otherwise have to check for a NULL pointer first. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-10-12net/mlx5e: Mutually exclude RX-FCS and RX-port-timestampAya Levin
Due to current HW arch limitations, RX-FCS (scattering FCS frame field to software) and RX-port-timestamp (improved timestamp accuracy on the receive side) can't work together. RX-port-timestamp is not controlled by the user and it is enabled by default when supported by the HW/FW. This patch sets RX-port-timestamp opposite to RX-FCS configuration. Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-12RDMA/rxe: Create AH index and return to user spaceBob Pearson
Make changes to rdma_user_rxe.h to allow indexing AH objects, passing the index in UD send WRs to the driver and returning the index to the rxe provider. Modify rxe_create_ah() to add an index to AH when created and if called from a new user provider return it to user space. If called from an old provider mark the AH as not having a useful index. Modify rxe_destroy_ah to drop the index before deleting the object. Link: https://lore.kernel.org/r/20211007204051.10086-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wrBob Pearson
Move the struct rxe_av av from struct rxe_send_wqe to struct rxe_send_wr placing it in wr.ud at the same offset as it was previously. This has the effect of increasing the size of struct rxe_send_wr while keeping the size of struct rxe_send_wqe the same. This better reflects the use of this field which is only used for UD sends. This change has no effect on ABI compatibility so the modified rxe driver will operate with older versions of rdma-core. Link: https://lore.kernel.org/r/20211007204051.10086-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12Merge branch '5.15/scsi-fixes' into 5.16/scsi-stagingMartin K. Petersen
Merge the 5.15/scsi-fixes branch into the staging tree to resolve UFS conflict reported by sfr. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-12RDMA/mlx5: Add modify_op_stat() supportAharon Landau
Add support for ib callback modify_op_stat() to add or remove an optional counter. When adding, a steering flow table is created with a rule that catches and counts all the matching packets. When removing, the table and flow counter are destroyed. Link: https://lore.kernel.org/r/20211008122439.166063-13-markzhang@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>