summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-08-09block: return ELEVATOR_DISCARD_MERGE if possibleMing Lei
When merging one bio to request, if they are discard IO and the queue supports multi-range discard, we need to return ELEVATOR_DISCARD_MERGE because both block core and related drivers(nvme, virtio-blk) doesn't handle mixed discard io merge(traditional IO merge together with discard merge) well. Fix the issue by returning ELEVATOR_DISCARD_MERGE in this situation, so both blk-mq and drivers just need to handle multi-range discard. Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Fixes: 2705dfb20947 ("block: fix discard request merge") Link: https://lore.kernel.org/r/20210729034226.1591070-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09SUNRPC: Record timeout value in xprt_retransmit tracepointChuck Lever
The client can alter the timeout value after each retransmit. Record the updated timeout value in the trace log. Suggested-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09SUNRPC: xprt_retransmit() displays the the NULL procedure incorrectlyChuck Lever
Currently: xprt_retransmit: task:11@1 xid=0x55a7ffac nfsv4 (null) ntrans=2 should be: xprt_retransmit: task:11@1 xid=0x55a7ffac nfsv4 NULL ntrans=2 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09SUNRPC: Update trace flagsChuck Lever
Recent patches added RPC_TASK_MOVEABLE, XPRT_OFFLINE, and XPRT_REMOVE. Update the tracepoint display macros to display these flags properly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09SUNRPC: Remove unneeded TRACE_DEFINE_ENUMsChuck Lever
Clean up: TRACE_DEFINE_ENUM is needed only for enums, not for C macros. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09bpf: Add _kernel suffix to internal lockdown_bpf_readDaniel Borkmann
Rename LOCKDOWN_BPF_READ into LOCKDOWN_BPF_READ_KERNEL so we have naming more consistent with a LOCKDOWN_BPF_WRITE_USER option that we are adding. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org>
2021-08-09counter: Rename counter_count_function to counter_functionWilliam Breathitt Gray
The phrase "Counter Count function" is verbose and unintentionally implies that function is a Count extension. This patch adjusts the Counter subsystem code to use the more direct "Counter function" phrase to make the intent of this code clearer. Cc: Jarkko Nikula <jarkko.nikula@linux.intel.com> Cc: Patrick Havelange <patrick.havelange@essensium.com> Cc: Oleksij Rempel <o.rempel@pengutronix.de> Cc: Kamel Bouhara <kamel.bouhara@bootlin.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Cc: David Lechner <david@lechnology.com> Acked-by: Syed Nayyar Waris <syednwaris@gmail.com> Reviewed-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com> Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Link: https://lore.kernel.org/r/8268c54d6f42075a19bb08151a37831e22652499.1627990337.git.vilhelm.gray@gmail.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2021-08-09counter: Rename counter_signal_value to counter_signal_levelWilliam Breathitt Gray
Signal values will always be levels so let's be explicit it about it to make the intent of the code clear. Cc: Oleksij Rempel <o.rempel@pengutronix.de> Cc: Kamel Bouhara <kamel.bouhara@bootlin.com> Acked-by: Syed Nayyar Waris <syednwaris@gmail.com> Reviewed-by: David Lechner <david@lechnology.com> Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Link: https://lore.kernel.org/r/3f17010abe2415859cea9a5fddabd3c97f635ff5.1627990337.git.vilhelm.gray@gmail.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2021-08-09block: remove the bd_bdi in struct block_deviceChristoph Hellwig
Just retrieve the bdi from the disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09block: move the bdi from the request_queue to the gendiskChristoph Hellwig
The backing device information only makes sense for file system I/O, and thus belongs into the gendisk and not the lower level request_queue structure. Move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09block: add a queue_has_disk helperChristoph Hellwig
Add a helper to check if a gendisk is associated with a request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09block: pass a gendisk to blk_queue_update_readaheadChristoph Hellwig
.. and rename the function to disk_update_readahead. This is in preparation for moving the BDI from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09block: remove support for delayed queue registrationsChristoph Hellwig
Now that device mapper has been changed to register the disk once it is fully ready all this code is unused. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210804094147.459763-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09block: support delayed holder registrationChristoph Hellwig
device mapper needs to register holders before it is ready to do I/O. Currently it does so by registering the disk early, which can leave the disk and queue in a weird half state where the queue is registered with the disk, except for sysfs and the elevator. And this state has been a bit promlematic before, and will get more so when sorting out the responsibilities between the queue and the disk. Support registering holders on an initialized but not registered disk instead by delaying the sysfs registration until the disk is registered. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210804094147.459763-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09block: look up holders by bdevChristoph Hellwig
Invert they way the holder relations are tracked. This very slightly reduces the memory overhead for partitioned devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210804094147.459763-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09block: make the block holder code optionalChristoph Hellwig
Move the block holder code into a separate file as it is not in any way related to the other block_dev.c code, and add a new selectable config option for it so that we don't have to build it without any remapped drivers selected. The Kconfig symbol contains a _DEPRECATED suffix to match the comments added in commit 49731baa41df ("block: restore multiple bd_link_disk_holder() support"). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210804094147.459763-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09iommu: return full error code from iommu_map_sg[_atomic]()Logan Gunthorpe
Convert to ssize_t return code so the return code from __iommu_map() can be returned all the way down through dma_iommu_map_sg(). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-08-09dma-mapping: allow map_sg() ops to return negative error codesLogan Gunthorpe
Allow dma_map_sgtable() to pass errors from the map_sg() ops. This will be required for returning appropriate error codes when mapping P2PDMA memory. Introduce __dma_map_sg_attrs() which will return the raw error code from the map_sg operation (whether it be negative or zero). Then add a dma_map_sg_attrs() wrapper to convert any negative errors to zero to satisfy the existing calling convention. dma_map_sgtable() defines three error codes that .map_sg implementations are allowed to return: -EINVAL, -ENOMEM and -EIO. The latter of which is a generic return for cases that are passing DMA_MAPPING_ERROR through. dma_map_sgtable() will convert a zero error return for old map_sg() ops into a -EIO return and return any negative errors as reported. This allows map_sg implementations to start returning multiple negative error codes. Legacy map_sg implementations can continue to return zero until they are all converted. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-08-09dt-bindings: interconnect: Add Qualcomm SC8180x DT bindingsGeorgi Djakov
Add compatibles and port definitions for the SC8180x RPMH interconnect providers. Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org> [bjorn: Split defines from driver patch and added binding update] Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210723194243.3675795-1-bjorn.andersson@linaro.org Signed-off-by: Georgi Djakov <djakov@kernel.org>
2021-08-09thunderbolt: Add vendor specific NHI quirk for auto-clearing interrupt statusSanjay R Mehta
Introduce nhi_check_quirks() routine to handle any vendor specific quirks to manage a hardware specific implementation. On Intel hardware the USB4 controller supports clearing the interrupt status register automatically right after it is being issued. For this reason add a new quirk that does that on all Intel hardware. Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2021-08-09devlink: Set device as early as possibleLeon Romanovsky
All kernel devlink implementations call to devlink_alloc() during initialization routine for specific device which is used later as a parent device for devlink_register(). Such late device assignment causes to the situation which requires us to call to device_register() before setting other parameters, but that call opens devlink to the world and makes accessible for the netlink users. Any attempt to move devlink_register() to be the last call generates the following error due to access to the devlink->dev pointer. [ 8.758862] devlink_nl_param_fill+0x2e8/0xe50 [ 8.760305] devlink_param_notify+0x6d/0x180 [ 8.760435] __devlink_params_register+0x2f1/0x670 [ 8.760558] devlink_params_register+0x1e/0x20 The simple change of API to set devlink device in the devlink_alloc() instead of devlink_register() fixes all this above and ensures that prior to call to devlink_register() everything already set. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-09mfd: db8500-prcmu: Handle missing FW variantLinus Walleij
There was an "unknown" firmware variant turning up in the wild causing problems in the clock driver. Add this missing variant and clarify that varian 11 and 15 are Samsung variants, as this is now very well known from released products. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-08-09netfilter: x_tables: never register tables by defaultFlorian Westphal
For historical reasons x_tables still register tables by default in the initial namespace. Only newly created net namespaces add the hook on demand. This means that the init_net always pays hook cost, even if no filtering rules are added (e.g. only used inside a single netns). Note that the hooks are added even when 'iptables -L' is called. This is because there is no way to tell 'iptables -A' and 'iptables -L' apart at kernel level. The only solution would be to register the table, but delay hook registration until the first rule gets added (or policy gets changed). That however means that counters are not hooked either, so 'iptables -L' would always show 0-counters even when traffic is flowing which might be unexpected. This keeps table and hook registration consistent with what is already done in non-init netns: first iptables(-save) invocation registers both table and hooks. This applies the same solution adopted for ebtables. All tables register a template that contains the l3 family, the name and a constructor function that is called when the initial table has to be added. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-09Merge 5.14-rc5 into driver-core-nextGreg Kroah-Hartman
We need the driver core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-09Merge 5.14-rc5 into staging-nextGreg Kroah-Hartman
We need the staging fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-09Merge 5.14-rc5 into char-misc-nextGreg Kroah-Hartman
We need the fixes in here as well, and resolves some merge issues with the mhi codebase. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-09Merge 5.14-rc5 into tty-nextGreg Kroah-Hartman
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-09Merge 5.14-rc5 into usb-nextGreg Kroah-Hartman
We need the usb fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-09Merge tag 'v5.14-rc5' into nextVinod Koul
Linux 5.14-rc5
2021-08-08net: dsa: sja1105: rely on DSA core tracking of port learning stateVladimir Oltean
Now that DSA keeps track of the port learning state, it becomes superfluous to keep an additional variable with this information in the sja1105 driver. Remove it. The DSA core's learning state is present in struct dsa_port *dp. To avoid the antipattern where we iterate through a DSA switch's ports and then call dsa_to_port to obtain the "dp" reference (which is bad because dsa_to_port iterates through the DSA switch tree once again), just iterate through the dst->ports and operate on those directly. The sja1105 had an extra use of priv->learn_ena on non-user ports. DSA does not touch the learning state of those ports - drivers are free to do what they wish on them. Mark that information with a comment in struct dsa_port and let sja1105 set dp->learning for cascade ports. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-08net: dsa: centralize fast ageing when address learning is turned offVladimir Oltean
Currently DSA leaves it down to device drivers to fast age the FDB on a port when address learning is disabled on it. There are 2 reasons for doing that in the first place: - when address learning is disabled by user space, through IFLA_BRPORT_LEARNING or the brport_attr_learning sysfs, what user space typically wants to achieve is to operate in a mode with no dynamic FDB entry on that port. But if the port is already up, some addresses might have been already learned on it, and it seems silly to wait for 5 minutes for them to expire until something useful can be done. - when a port leaves a bridge and becomes standalone, DSA turns off address learning on it. This also has the nice side effect of flushing the dynamically learned bridge FDB entries on it, which is a good idea because standalone ports should not have bridge FDB entries on them. We let drivers manage fast ageing under this condition because if DSA were to do it, it would need to track each port's learning state, and act upon the transition, which it currently doesn't. But there are 2 reasons why doing it is better after all: - drivers might get it wrong and not do it (see b53_port_set_learning) - we would like to flush the dynamic entries from the software bridge too, and letting drivers do that would be another pain point So track the port learning state and trigger a fast age process automatically within DSA. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-08drm/gem: Provide offset-adjusted framebuffer BO mappingsThomas Zimmermann
Add an additional argument to drm_gem_fb_vmap() to return each BO's mapping adjusted by the respective offset. Update all callers. The newly returned values point to the first byite of the data stored in the framebuffer BOs. Drivers that access the BO data should use it. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210803125928.27780-2-tzimmermann@suse.de
2021-08-08drm/simple-kms: Support custom CRTC stateThomas Zimmermann
Simple KMS helpers already support custom state for planes. Extend the helpers to support custom CRTC state as well. Drivers can set the reset, duplicate and destroy callbacks for the display pipeline's CRTC state and inherit from struct drm_crtc_state by embedding an instance. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210714142240.21979-12-tzimmermann@suse.de
2021-08-08Merge tag 'tty-5.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are some small tty/serial driver fixes for 5.14-rc5 to resolve a number of reported problems. They include: - mips serial driver fixes - 8250 driver fixes for reported problems - fsl_lpuart driver fixes - other tiny driver fixes All have been in linux-next for a while with no reported problems" * tag 'tty-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: 8250_pci: Avoid irq sharing for MSI(-X) interrupts. serial: 8250_mtk: fix uart corruption issue when rx power off tty: serial: fsl_lpuart: fix the wrong return value in lpuart32_get_mctrl serial: 8250_pci: Enumerate Elkhart Lake UARTs via dedicated driver serial: 8250: fix handle_irq locking serial: tegra: Only print FIFO error message when an error occurs MIPS: Malta: Do not byte-swap accesses to the CBUS UART serial: 8250: Mask out floating 16/32-bit bus bits serial: max310x: Unprepare and disable clock in error path
2021-08-08Merge tag 'usb-5.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB driver fixes from Greg KH: "Here are some small USB driver fixes for 5.14-rc5. They resolve a number of small reported issues, including: - cdnsp driver fixes - usb serial driver fixes and device id updates - usb gadget hid fixes - usb host driver fixes - usb dwc3 driver fixes - other usb gadget driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (21 commits) usb: typec: tcpm: Keep other events when receiving FRS and Sourcing_vbus events usb: dwc3: gadget: Avoid runtime resume if disabling pullup usb: dwc3: gadget: Use list_replace_init() before traversing lists USB: serial: ftdi_sio: add device ID for Auto-M3 OP-COM v2 USB: serial: pl2303: fix GT type detection USB: serial: option: add Telit FD980 composition 0x1056 USB: serial: pl2303: fix HX type detection USB: serial: ch341: fix character loss at high transfer rates usb: cdnsp: Fix the IMAN_IE_SET and IMAN_IE_CLEAR macro usb: cdnsp: Fixed issue with ZLP usb: cdnsp: Fix incorrect supported maximum speed usb: cdns3: Fixed incorrect gadget state usb: gadget: f_hid: idle uses the highest byte for duration Revert "thunderbolt: Hide authorized attribute if router does not support PCIe tunnels" usb: otg-fsm: Fix hrtimer list corruption usb: host: ohci-at91: suspend/resume ports after/before OHCI accesses usb: musb: Fix suspend and resume issues for PHYs on I2C and SPI usb: gadget: f_hid: added GET_IDLE and SET_IDLE handlers usb: gadget: f_hid: fixed NULL pointer dereference usb: gadget: remove leaked entry from udc driver list ...
2021-08-08once: Fix panic when module unloadKefeng Wang
DO_ONCE DEFINE_STATIC_KEY_TRUE(___once_key); __do_once_done once_disable_jump(once_key); INIT_WORK(&w->work, once_deferred); struct once_work *w; w->key = key; schedule_work(&w->work); module unload //*the key is destroy* process_one_work once_deferred BUG_ON(!static_key_enabled(work->key)); static_key_count((struct static_key *)x) //*access key, crash* When module uses DO_ONCE mechanism, it could crash due to the above concurrency problem, we could reproduce it with link[1]. Fix it by add/put module refcount in the once work process. [1] https://lore.kernel.org/netdev/eaa6c371-465e-57eb-6be9-f4b16b9d7cbf@huawei.com/ Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Reported-by: Minmin chen <chenmingmin@huawei.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-08devlink: Simplify devlink port API callsLeon Romanovsky
Devlink port already has pointer to the devlink instance and all API calls that forward these devlink ports to the drivers perform same "devlink_port->devlink" assignment before actual call. This patch removes useless parameter and allows us in the future to create specific devlink_port_ops to manage user space access with reliable ops assignment. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-07dt-bindings: msm: dsi: document phy-type property for 7nm dsi phyJonathan Marek
Document a new phy-type property which will be used to determine whether the phy should operate in D-PHY or C-PHY mode. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/20210617144349.28448-3-jonathan@marek.ca Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-08-06srcutiny: Mark read-side data racesPaul E. McKenney
This commit marks some interrupt-induced read-side data races in __srcu_read_lock(), __srcu_read_unlock(), and srcu_torture_stats_print(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rculist: Unify documentation about missing list_empty_rcu()Julian Wiedmann
We have two separate sections that talk about why list_empty_rcu() is not needed, so this commit consolidates them. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> [ paulmck: The usual wordsmithing. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Mark accesses to ->rcu_read_lock_nestingPaul E. McKenney
KCSAN flags accesses to ->rcu_read_lock_nesting as data races, but in the past, the overhead of marked accesses was excessive. However, that was long ago, and much has changed since then, both in terms of hardware and of compilers. Here is data taken on an eight-core laptop using Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz with a kernel built using gcc version 9.3.0, with all data in nanoseconds. Unmarked accesses (status quo), measured by three refscale runs: Minimum reader duration: 3.286 2.851 3.395 Median reader duration: 3.698 3.531 3.4695 Maximum reader duration: 4.481 5.215 5.157 Marked accesses, also measured by three refscale runs: Minimum reader duration: 3.501 3.677 3.580 Median reader duration: 4.053 3.723 3.895 Maximum reader duration: 7.307 4.999 5.511 This focused microbenhmark shows only sub-nanosecond differences which are unlikely to be visible at the system level. This commit therefore marks data-racing accesses to ->rcu_read_lock_nesting. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Remove special bit at the bottom of the ->dynticks counterJoel Fernandes (Google)
Commit b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter") reserved a bit at the bottom of the ->dynticks counter to defer flushing of TLBs, but this facility never has been used. This commit therefore removes this capability along with the rcu_eqs_special_set() function used to trigger it. Link: https://lore.kernel.org/linux-doc/CALCETrWNPOOdTrFabTDd=H7+wc6xJ9rJceg6OL1S0rTV5pfSsA@mail.gmail.com/ Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: "Joel Fernandes (Google)" <joel@joelfernandes.org> [ paulmck: Forward-port to v5.13-rc1. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06drm/amdkfd: Allow querying SVM attributes that are clearFelix Kuehling
Currently the SVM get_attr call allows querying, which flags are set in the entire address range. Add the opposite query, which flags are clear in the entire address range. Both queries can be combined in a single get_attr call, which allows answering questions such as, "is this address range coherent, non-coherent, or a mix of both"? Proposed userspace for UAPI: https://github.com/RadeonOpenCompute/ROCR-Runtime/tree/memory_model_queries Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yand <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-06Merge tag 'soc-fixes-5.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "Lots of small fixes for Arm SoCs this time, nothing too worrying: - omap/beaglebone boot regression fix in gpt12 timer - revert for i.mx8 soc driver breaking as a platform_driver - kexec/kdump fixes for op-tee - various fixes for incorrect DT settings on imx, mvebu, omap, stm32, and tegra causing problems. - device tree fixes for static checks in nomadik, versatile, stm32 - code fixes for issues found in build testing and with static checking on tegra, ixp4xx, imx, omap" * tag 'soc-fixes-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (36 commits) soc: ixp4xx/qmgr: fix invalid __iomem access soc: ixp4xx: fix printing resources ARM: ixp4xx: goramo_mlr depends on old PCI driver ARM: ixp4xx: fix compile-testing soc drivers soc/tegra: Make regulator couplers depend on CONFIG_REGULATOR ARM: dts: nomadik: Fix up interrupt controller node names ARM: dts: stm32: Fix touchscreen IRQ line assignment on DHCOM ARM: dts: stm32: Disable LAN8710 EDPD on DHCOM ARM: dts: stm32: Prefer HW RTC on DHCOM SoM omap5-board-common: remove not physically existing vdds_1v8_main fixed-regulator ARM: dts: am437x-l4: fix typo in can@0 node ARM: dts: am43x-epos-evm: Reduce i2c0 bus speed for tps65218 bus: ti-sysc: AM3: RNG is GP only ARM: omap2+: hwmod: fix potential NULL pointer access arm64: dts: armada-3720-turris-mox: remove mrvl,i2c-fast-mode arm64: dts: armada-3720-turris-mox: fixed indices for the SDHC controllers ARM: dts: imx: Swap M53Menlo pinctrl_power_button/pinctrl_power_out pins ARM: imx: fix missing 3rd argument in macro imx_mmdc_perf_init ARM: dts: colibri-imx6ull: limit SDIO clock to 25MHz arm64: dts: ls1028: sl28: fix networking for variant 2 ...
2021-08-06xfs: introduce CPU hotplug infrastructureDave Chinner
We need to move to per-cpu state for both deferred inode inactivation and CIL tracking, but to do that we need to handle CPUs being removed from the system by the hot-plug code. Introduce generic XFS infrastructure to handle CPU hotplug events that is set up at module init time and torn down at module exit time. Initially, we only need CPU dead notifications, so we only set up a callback for these notifications. The infrastructure can be updated in future for other CPU hotplug state machine notifications easily if ever needed. Signed-off-by: Dave Chinner <dchinner@redhat.com> [djwong: rearrange some macros, fix function prototypes] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Restrict range element expansion in ipset to avoid soft lockup, from Jozsef Kadlecsik. 2) Memleak in error path for nf_conntrack_bridge for IPv4 packets, from Yajun Deng. 3) Simplify conntrack garbage collection strategy to avoid frequent wake-ups, from Florian Westphal. 4) Fix NFNLA_HOOK_FUNCTION_NAME string, do not include module name. 5) Missing chain family netlink attribute in chain description in nfnetlink_hook. 6) Incorrect sequence number on nfnetlink_hook dumps. 7) Use netlink request family in reply message for consistency. 8) Remove offload_pickup sysctl, use conntrack for established state instead, from Florian Westphal. 9) Translate NFPROTO_INET/ingress to NFPROTO_NETDEV/ingress, since NFPROTO_INET is not exposed through nfnetlink_hook. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: nfnetlink_hook: translate inet ingress to netdev netfilter: conntrack: remove offload_pickup sysctl again netfilter: nfnetlink_hook: Use same family as request message netfilter: nfnetlink_hook: use the sequence number of the request message netfilter: nfnetlink_hook: missing chain family netfilter: nfnetlink_hook: strip off module name from hookfn netfilter: conntrack: collect all entries in one cycle netfilter: nf_conntrack_bridge: Fix memory leak when error netfilter: ipset: Limit the maximal range of consecutive elements to add/delete ==================== Link: https://lore.kernel.org/r/20210806151149.6356-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-06netfilter: conntrack: remove offload_pickup sysctl againFlorian Westphal
These two sysctls were added because the hardcoded defaults (2 minutes, tcp, 30 seconds, udp) turned out to be too low for some setups. They appeared in 5.14-rc1 so it should be fine to remove it again. Marcelo convinced me that there should be no difference between a flow that was offloaded vs. a flow that was not wrt. timeout handling. Thus the default is changed to those for TCP established and UDP stream, 5 days and 120 seconds, respectively. Marcelo also suggested to account for the timeout value used for the offloading, this avoids increase beyond the value in the conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls. Example: nf_conntrack_udp_timeout_stream=60 nf_flowtable_udp_timeout=60 This will remove offloaded udp flows after one minute, rather than two. An earlier version of this patch also cleared the ASSURED bit to allow nf_conntrack to evict the entry via early_drop (i.e., table full). However, it looks like we can safely assume that connection timed out via HW is still in established state, so this isn't needed. Quoting Oz: [..] the hardware sends all packets with a set FIN flags to sw. [..] Connections that are aged in hardware are expected to be in the established state. In case it turns out that back-to-sw-path transition can occur for 'dodgy' connections too (e.g., one side disappeared while software-path would have been in RETRANS timeout), we can adjust this later. Cc: Oz Shlomo <ozsh@nvidia.com> Cc: Paul Blakey <paulb@nvidia.com> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-06netfilter: nfnetlink_hook: missing chain familyPablo Neira Ayuso
The family is relevant for pseudo-families like NFPROTO_INET otherwise the user needs to rely on the hook function name to differentiate it from NFPROTO_IPV4 and NFPROTO_IPV6 names. Add nfnl_hook_chain_desc_attributes instead of using the existing NFTA_CHAIN_* attributes, since these do not provide a family number. Fixes: e2cf17d3774c ("netfilter: add new hook nfnl subsystem") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-06dmaengine: dw: Convert members to u32 in platform dataAndy Shevchenko
u32 is a type that is used for properties retrieval from DT. With the type change it allows to clean up properties reading routine. While at it, order the fields in way how they are parsed. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/20210802184355.49879-2-andriy.shevchenko@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-08-06PM: EM: Increase energy calculation precisionLukasz Luba
The Energy Model (EM) provides useful information about device power in each performance state to other subsystems like: Energy Aware Scheduler (EAS). The energy calculation in EAS does arithmetic operation based on the EM em_cpu_energy(). Current implementation of that function uses em_perf_state::cost as a pre-computed cost coefficient equal to: cost = power * max_frequency / frequency. The 'power' is expressed in milli-Watts (or in abstract scale). There are corner cases when the EAS energy calculation for two Performance Domains (PDs) return the same value. The EAS compares these values to choose smaller one. It might happen that this values are equal due to rounding error. In such scenario, we need better resolution, e.g. 1000 times better. To provide this possibility increase the resolution in the em_perf_state::cost for 64-bit architectures. The cost of increasing resolution on 32-bit is pretty high (64-bit division) and is not justified since there are no new 32bit big.LITTLE EAS systems expected which would benefit from this higher resolution. This patch allows to avoid the rounding to milli-Watt errors, which might occur in EAS energy estimation for each PD. The rounding error is common for small tasks which have small utilization value. There are two places in the code where it makes a difference: 1. In the find_energy_efficient_cpu() where we are searching for best_delta. We might suffer there when two PDs return the same result, like in the example below. Scenario: Low utilized system e.g. ~200 sum_util for PD0 and ~220 for PD1. There are quite a few small tasks ~10-15 util. These tasks would suffer for the rounding error. These utilization values are typical when running games on Android. One of our partners has reported 5..10mA less battery drain when running with increased resolution. Some details: We have two PDs: PD0 (big) and PD1 (little) Let's compare w/o patch set ('old') and w/ patch set ('new') We are comparing energy w/ task and w/o task placed in the PDs a) 'old' w/o patch set, PD0 task_util = 13 cost = 480 sum_util_w/o_task = 215 sum_util_w_task = 228 scale_cpu = 1024 energy_w/o_task = 480 * 215 / 1024 = 100.78 => 100 energy_w_task = 480 * 228 / 1024 = 106.87 => 106 energy_diff = 106 - 100 = 6 (this is equal to 'old' PD1's energy_diff in 'c)') b) 'new' w/ patch set, PD0 task_util = 13 cost = 480 * 1000 = 480000 sum_util_w/o_task = 215 sum_util_w_task = 228 energy_w/o_task = 480000 * 215 / 1024 = 100781 energy_w_task = 480000 * 228 / 1024 = 106875 energy_diff = 106875 - 100781 = 6094 (this is not equal to 'new' PD1's energy_diff in 'd)') c) 'old' w/o patch set, PD1 task_util = 13 cost = 160 sum_util_w/o_task = 283 sum_util_w_task = 293 scale_cpu = 355 energy_w/o_task = 160 * 283 / 355 = 127.55 => 127 energy_w_task = 160 * 296 / 355 = 133.41 => 133 energy_diff = 133 - 127 = 6 (this is equal to 'old' PD0's energy_diff in 'a)') d) 'new' w/ patch set, PD1 task_util = 13 cost = 160 * 1000 = 160000 sum_util_w/o_task = 283 sum_util_w_task = 293 scale_cpu = 355 energy_w/o_task = 160000 * 283 / 355 = 127549 energy_w_task = 160000 * 296 / 355 = 133408 energy_diff = 133408 - 127549 = 5859 (this is not equal to 'new' PD0's energy_diff in 'b)') 2. Difference in the 6% energy margin filter at the end of find_energy_efficient_cpu(). With this patch the margin comparison also has better resolution, so it's possible to have better task placement thanks to that. Fixes: 27871f7a8a341ef ("PM: Introduce an Energy Model management framework") Reported-by: CCJ Yeh <CCj.Yeh@mediatek.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>