linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-10-02	ethernet: use eth_hw_addr_set() instead of ether_addr_copy()	Jakub Kicinski
	Convert Ethernet from ether_addr_copy() to eth_hw_addr_set(): @@ expression dev, np; @@ - ether_addr_copy(dev->dev_addr, np) + eth_hw_addr_set(dev, np) Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02	net: mscc: ocelot: write full VLAN TCI in the injection header	Vladimir Oltean
	The VLAN TCI contains more than the VLAN ID, it also has the VLAN PCP and Drop Eligibility Indicator. If the ocelot driver is going to write the VLAN header inside the DSA tag, it could just as well write the entire TCI. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02	net: mscc: ocelot: support egress VLAN rewriting via VCAP ES0	Vladimir Oltean
	Currently the ocelot driver does support the 'vlan modify' action, but in the ingress chain, and it is offloaded to VCAP IS1. This action changes the classified VLAN before the packet enters the bridging service, and the bridging works with the classified VLAN modified by VCAP IS1. That is good for some use cases, but there are others where the VLAN must be modified at the stage of the egress port, after the packet has exited the bridging service. One example is simulating IEEE 802.1CB active stream identification filters ("active" means that not only the rule matches on a packet flow, but it is also able to change some headers). For example, a stream is replicated on two egress ports, but they must have different VLAN IDs on egress ports A and B. This seems like a task for the VCAP ES0, but that currently only supports pushing the ES0 tag A, which is specified in the rule. Pushing another VLAN header is not what we want, but rather overwriting the existing one. It looks like when we push the ES0 tag A, it is actually possible to not only take the ES0 tag A's value from the rule itself (VID_A_VAL), but derive it from the following formula: ES0_TAG_A = Classified VID + VID_A_VAL Otherwise said, ES0_TAG_A can be used to increment with a given value the VLAN ID that the packet was already classified to, and the packet will have this value as an outer VLAN tag. This new VLAN ID value then gets stripped on egress (or not) according to the value of the native VLAN from the bridging service. While the hardware will happily increment the classified VLAN ID for all packets that match the ES0 rule, in practice this would be rather insane, so we only allow this kind of ES0 action if the ES0 filter contains a VLAN ID too, so as to restrict the matching on a known classified VLAN. If we program VID_A_VAL with the delta between the desired final VLAN (ES0_TAG_A) and the classified VLAN, we obtain the desired behavior. It doesn't look like it is possible with the tc-vlan action to modify the VLAN ID but not the PCP. In hardware it is possible to leave the PCP to the classified value, but we unconditionally program it to overwrite it with the PCP value from the rule. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf	David S. Miller
	Pablo Neira Ayuso says: ==================== Netfilter fixes for net (v2) The following patchset contains Netfilter fixes for net: 1) Move back the defrag users fields to the global netns_nf area. Kernel fails to boot if conntrack is builtin and kernel is booted with: nf_conntrack.enable_hooks=1. From Florian Westphal. 2) Rule event notification is missing relevant context such as the position handle and the NLM_F_APPEND flag. 3) Rule replacement is expanded to add + delete using the existing rule handle, reverse order of this operation so it makes sense from rule notification standpoint. 4) Propagate to userspace the NLM_F_CREATE and NLM_F_EXCL flags from the rule notification path. Patches #2, #3 and #4 are used by 'nft monitor' and 'iptables-monitor' userspace utilities which are not correctly representing the following operations through netlink notifications: - rule insertions - rule addition/insertion from position handle - create table/chain/set/map/flowtable/... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02	netfilter: nf_tables: honor NLM_F_CREATE and NLM_F_EXCL in event notification	Pablo Neira Ayuso
	Include the NLM_F_CREATE and NLM_F_EXCL flags in netlink event notifications, otherwise userspace cannot distiguish between create and add commands. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-01	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	Jakub Kicinski
	Daniel Borkmann says: ==================== bpf-next 2021-10-02 We've added 85 non-merge commits during the last 15 day(s) which contain a total of 132 files changed, 13779 insertions(+), 6724 deletions(-). The main changes are: 1) Massive update on test_bpf.ko coverage for JITs as preparatory work for an upcoming MIPS eBPF JIT, from Johan Almbladh. 2) Add a batched interface for RX buffer allocation in AF_XDP buffer pool, with driver support for i40e and ice from Magnus Karlsson. 3) Add legacy uprobe support to libbpf to complement recently merged legacy kprobe support, from Andrii Nakryiko. 4) Add bpf_trace_vprintk() as variadic printk helper, from Dave Marchevsky. 5) Support saving the register state in verifier when spilling <8byte bounded scalar to the stack, from Martin Lau. 6) Add libbpf opt-in for stricter BPF program section name handling as part of libbpf 1.0 effort, from Andrii Nakryiko. 7) Add a document to help clarifying BPF licensing, from Alexei Starovoitov. 8) Fix skel_internal.h to propagate errno if the loader indicates an internal error, from Kumar Kartikeya Dwivedi. 9) Fix build warnings with -Wcast-function-type so that the option can later be enabled by default for the kernel, from Kees Cook. 10) Fix libbpf to ignore STT_SECTION symbols in legacy map definitions as it otherwise errors out when encountering them, from Toke Høiland-Jørgensen. 11) Teach libbpf to recognize specialized maps (such as for perf RB) and internally remove BTF type IDs when creating them, from Hengqi Chen. 12) Various fixes and improvements to BPF selftests. ==================== Link: https://lore.kernel.org/r/20211002001327.15169-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01	Bluetooth: Rename driver .prevent_wake to .wakeup	Luiz Augusto von Dentz
	prevent_wake logic is backward since what it is really checking is if the device may wakeup the system or not, not that it will prevent the to be awaken. Also looking on how other subsystems have the entry as power/wakeup this also renames the force_prevent_wake to force_wakeup in vhci driver. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-01	Merge series "Add support for on demand pipeline setup/destroy" from Peter ↵	Mark Brown
	Ujfalusi <peter.ujfalusi@linux.intel.com>: Hi, The previous, v2 of this series was sent by Daniel Baluta: https://lore.kernel.org/alsa-devel/20210917143659.401102-1-daniel.baluta@oss.nxp.com/ We have agreed that it might be better that someone from Intel is going to take it from here as we already have the infrastructure up to test and verify the dynamic pipelines support. Changes since v2 (sent by Daniel Baluta): - patch 10: Fix NULL point dereference in hda_dai_update_config() - I have kept Daniel's SoB for the series. Changes since v1: - Signed-off-by tag added by Daniel This series implements initial support for dynamic pipelines to setup/teardown pipeline as needed when a PCM is open/closed. Initially dynamic pipelines are only supported with single core setup which will be expanded with a follow-up series. Review with SOF community at https://github.com/thesofproject/linux/pull/2794 The feature has been merged on 1st of April to sof-dev, all issues found since has been fixed and squashed to this upstream series. Regards, Peter --- Ranjani Sridharan (12): ASoC: topology: change the complete op in snd_soc_tplg_ops to return int ASoC: SOF: control: Add access field in struct snd_sof_control ASoC: SOF: topology: Add new token for dynamic pipeline ASoC: SOF: sof-audio: add helpers for widgets, kcontrols and dai config set up AsoC: dapm: export a couple of functions ASoC: SOF: Add new fields to snd_sof_route ASoC: SOF: restore kcontrols for widget during set up ASoC: SOF: Don't set up widgets during topology parsing ASoC: SOF: Introduce widget use_count ASoC: SOF: Intel: hda: make sure DAI widget is set up before IPC ASoC: SOF: Add support for dynamic pipelines ASoC: SOF: topology: Add kernel parameter for topology verification include/sound/soc-dpcm.h \| 1 + include/sound/soc-topology.h \| 2 +- include/uapi/sound/sof/tokens.h \| 1 + sound/soc/intel/skylake/skl-topology.c \| 6 +- sound/soc/soc-dapm.c \| 2 + sound/soc/soc-pcm.c \| 4 +- sound/soc/soc-topology.c \| 10 +- sound/soc/sof/intel/hda-dai.c \| 174 +++--- sound/soc/sof/intel/hda.c \| 177 ++++-- sound/soc/sof/intel/hda.h \| 5 + sound/soc/sof/ipc.c \| 22 + sound/soc/sof/pcm.c \| 58 +- sound/soc/sof/pm.c \| 4 +- sound/soc/sof/sof-audio.c \| 709 +++++++++++++++++++------ sound/soc/sof/sof-audio.h \| 32 +- sound/soc/sof/sof-priv.h \| 1 + sound/soc/sof/topology.c \| 362 +++++-------- 17 files changed, 1032 insertions(+), 538 deletions(-) -- 2.33.0
2021-10-01	net: mscc: ocelot: fix VCAP filters remaining active after being deleted	Vladimir Oltean
	When ocelot_flower.c calls ocelot_vcap_filter_add(), the filter has a given filter->id.cookie. This filter is added to the block->rules list. However, when ocelot_flower.c calls ocelot_vcap_block_find_filter_by_id() which passes the cookie as argument, the filter is never found by filter->id.cookie when searching through the block->rules list. This is unsurprising, since the filter->id.cookie is an unsigned long, but the cookie argument provided to ocelot_vcap_block_find_filter_by_id() is a signed int, and the comparison fails. Fixes: 50c6cc5b9283 ("net: mscc: ocelot: store a namespaced VCAP filter ID") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210930125330.2078625-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01	audit: add support for the openat2 syscall	Richard Guy Briggs
	The openat2(2) syscall was added in kernel v5.6 with commit fddb5d430ad9 ("open: introduce openat2(2) syscall"). Add the openat2(2) syscall to the audit syscall classifier. Link: https://github.com/linux-audit/audit-kernel/issues/67 Link: https://lore.kernel.org/r/f5f1a4d8699613f8c02ce762807228c841c2e26f.1621363275.git.rgb@redhat.com Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> [PM: merge fuzz due to previous header rename, commit line wraps] Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-01	audit: replace magic audit syscall class numbers with macros	Richard Guy Briggs
	Replace audit syscall class magic numbers with macros. This required putting the macros into new header file include/linux/audit_arch.h since the syscall macros were included for both 64 bit and 32 bit in any compat code, causing redefinition warnings. Link: https://lore.kernel.org/r/2300b1083a32aade7ae7efb95826e8f3f260b1df.1621363275.git.rgb@redhat.com Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> [PM: renamed header to audit_arch.h after consulting with Richard] Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-01	firmware: xilinx: Add OSPI Mux selection support	Sai Krishna Potthuri
	Add OSPI Mux selection API support to select the AXI interface to OSPI. Signed-off-by: Sai Krishna Potthuri <lakshmi.sai.krishna.potthuri@xilinx.com> Link: https://lore.kernel.org/r/1632478031-12242-2-git-send-email-lakshmi.sai.krishna.potthuri@xilinx.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-01	AsoC: dapm: export a couple of functions	Ranjani Sridharan
	Export a couple of DAPM functions that can be used by ASoC drivers to determine connected widgets when a PCM is started. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://lore.kernel.org/r/20210927120517.20505-6-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-01	ASoC: SOF: topology: Add new token for dynamic pipeline	Ranjani Sridharan
	Today, we set up all widgets required for all PCM streams at the time of topology parsing even if they are not used. An optimization would be to only set up the widgets required for currently active PCM streams. This would give the FW the opportunity to power gate unused memory blocks, thereby saving power. For dynamic pipelines, the widgets in the connected DAPM path for each PCM will need to be set up at runtime. This patch introduces a new token, DYNAMIC_PIPELINE, for scheduler type widgets that indicate whether a pipeline should be set up statically during topology load or at runtime when the PCM is opened. Introduce a new field called dynamic_pipeline_widget in struct snd_sof_widget to save the value of the parsed token. The token is set only for the pipeline (scheduler type) widget and must be propagated to all widgets in the same pipeline during topology load. Introduce another field called pipe_widget in struct snd_sof_widget that saves the pointer to the scheduler widget with the same pipeline ID as that of the widget. This field is populated when the pipeline completion callback is invoked during topology loading. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://lore.kernel.org/r/20210927120517.20505-4-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-01	ASoC: topology: change the complete op in snd_soc_tplg_ops to return int	Ranjani Sridharan
	In the SOF driver, the operations performed in the complete callback can fail and therefore topology loading should return an error in such cases. So, change the signature of the complete op in struct snd_soc_tplg_ops to return an int to return the error. Also, amend the complete callback functions in the SOF driver and the SKL driver to conform with the new signature. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://lore.kernel.org/r/20210927120517.20505-2-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-01	net: add kerneldoc comment for sk_peer_lock	Eric Dumazet
	Fixes following warning: include/net/sock.h:533: warning: Function parameter or member 'sk_peer_lock' not described in 'sock' Fixes: 35306eb23814 ("af_unix: fix races in sk_peer_pid and sk_peer_cred accesses") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20211001164622.58520-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01	ASoC: soc-component: Remove conditional definition of debugfs data members	Simon Trimmer
	This simplification allows the use of the standard kernel pattern of static inline dummy functions for debugfs code. Most systems will only have a small number of snd_soc_components so the memory impact is minimal. Signed-off-by: Simon Trimmer <simont@opensource.cirrus.com> Suggested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210930142116.528878-1-simont@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-01	drm: cleanup: remove drm_modeset_(un)lock_all()	Fernando Ramos
	Functions drm_modeset_lock_all() and drm_modeset_unlock_all() are no longer used anywhere and can be removed. Signed-off-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210924064324.229457-17-greenfoo@u92.eu
2021-10-01	drm/lease: allow empty leases	Simon Ser
	This can be used to create a separate DRM file description, thus creating a new GEM handle namespace. My use-case is wlroots. The library splits responsibilities between separate components: the GBM allocator creates buffers, the GLES2 renderer uses EGL to import them and render to them, the DRM backend imports the buffers and displays them. wlroots has a modular architecture, and any of these components can be swapped and replaced with something else. For instance, the pipeline can be set up so that the DRM dumb buffer allocator is used instead of GBM and the Pixman renderer is used instead of GLES2. Library users can also replace any of these components with their own custom one. DMA-BUFs are used to pass buffer references across components. We could use GEM handles instead, but this would result in pain if multiple GPUs are in use: wlroots copies buffers across GPUs as needed. Importing a GEM handle created on one GPU into a completely different GPU will blow up (fail at best, mix unrelated buffers otherwise). Everything is fine if all components use Mesa. However, this isn't always desirable. For instance when running with DRM dumb buffers and the Pixman software renderer it's unfortunate to depend on GBM in the DRM backend just to turn DMA-BUFs into FB IDs. GBM loads Mesa drivers to perform an action which has nothing driver-specific. Additionally, drivers will fail the import if the 3D engine can't use the imported buffer, for instance amdgpu will refuse to import DRM dumb buffers [1]. We might also want to be running with a Vulkan renderer and a Vulkan allocator in the future, and GBM wouldn't be welcome in this setup. To address this, GBM can be side-stepped in the DRM backend, and can be replaced with drmPrimeFDToHandle calls. However because of GEM handle reference counting issues, care must be taken to avoid double-closing the same GEM handle. In particular, it's not possible to share a DRM FD with GBM or EGL and perform some drmPrimeFDToHandle calls manually. So wlroots needs to re-open the DRM FD to create a new GEM handle namespace. However there's no guarantee that the file-system permissions will be set up so that the primary FD can be opened by the compsoitor. On modern systems seatd or logind is a privileged process responsible for doing this, and other processes aren't expected to do it. For historical reasons systemd still allows physically logged in users to open primary DRM nodes, but this doesn't work on non-systemd setups and it's desirable to lock them down at some point. Some might suggest to open the render node instead of re-opening the primary node. However some systems don't have a render node at all (e.g. no GPU, or a split render/display SoC). Solutions to this issue have been discussed in [2]. One solution would be to open the magic /proc/self/fd/<fd> file, but it's a Linux-specific hack (wlroots supports BSDs too). Another solution is to add support for re-opening a DRM primary node to seatd/logind, but they don't support it now and really haven't been designed for this (logind would need to grow a completely new API, because it assumes unique dev_t IDs). Also this seems like pushing down a kernel limitation to user-space a bit too hard. Another solution is to allow creating empty DRM leases. The lessee FD would have its own GEM handle namespace, so wouldn't conflict wth GBM/EGL. It would have the master bit set, but would be able to manage zero resources. wlroots doesn't intend to share this FD with any other process. All in all IMHO that seems like a pretty reasonable solution to the issue at hand. Note, I've discussed with Jonas Ådahl and Mutter plans to adopt a similar design in the future. Example usage in wlroots is available at [3]. IGT test available at [4]. [1]: https://github.com/swaywm/wlroots/issues/2916 [2]: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/110 [3]: https://github.com/swaywm/wlroots/pull/3158 [4]: https://patchwork.freedesktop.org/series/94323/ Signed-off-by: Simon Ser <contact@emersion.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Daniel Stone <daniels@collabora.com> Cc: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Cc: Michel Dänzer <michel@daenzer.net> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Keith Packard <keithp@keithp.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Dave Airlie <airlied@redhat.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210903130000.1590-2-contact@emersion.fr
2021-10-01	devlink: report maximum number of snapshots with regions	Jacob Keller
	Each region has an independently configurable number of maximum snapshots. This information is not reported to userspace, making it not very discoverable. Fix this by adding a new DEVLINK_ATTR_REGION_MAX_SNAPSHOST attribute which is used to report this maximum. Ex: $devlink region pci/0000:af:00.0/nvm-flash: size 10485760 snapshot [] max 1 pci/0000:af:00.0/device-caps: size 4096 snapshot [] max 10 pci/0000:af:00.1/nvm-flash: size 10485760 snapshot [] max 1 pci/0000:af:00.1/device-caps: size 4096 snapshot [] max 10 This information enables users to understand why a new region command may fail due to having too many existing snapshots. Reported-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-01	sched: Always inline is_percpu_thread()	Peter Zijlstra
	vmlinux.o: warning: objtool: check_preemption_disabled()+0x81: call to is_percpu_thread() leaves .noinstr.text section Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210928084218.063371959@infradead.org
2021-10-01	perf/core: fix userpage->time_enabled of inactive events	Song Liu
	Users of rdpmc rely on the mmapped user page to calculate accurate time_enabled. Currently, userpage->time_enabled is only updated when the event is added to the pmu. As a result, inactive event (due to counter multiplexing) does not have accurate userpage->time_enabled. This can be reproduced with something like: /* open 20 task perf_event "cycles", to create multiplexing / fd = perf_event_open(); / open task perf_event "cycles" / userpage = mmap(fd); / use mmap and rdmpc / while (true) { time_enabled_mmap = xxx; / use logic in perf_event_mmap_page */ time_enabled_read = read(fd).time_enabled; if (time_enabled_mmap > time_enabled_read) BUG(); } Fix this by updating userpage for inactive events in merge_sched_in. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-and-tested-by: Lucian Grijincu <lucian@fb.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210929194313.2398474-1-songliubraving@fb.com
2021-10-01	sched: Make cond_resched_lock() variants RT aware	Thomas Gleixner
	The __might_resched() checks in the cond_resched_lock() variants use PREEMPT_LOCK_OFFSET for preempt count offset checking which takes the preemption disable by the spin_lock() which is still held at that point into account. On PREEMPT_RT enabled kernels spin/rw_lock held sections stay preemptible which means PREEMPT_LOCK_OFFSET is 0, but that still triggers the __might_resched() check because that takes RCU read side nesting into account. On RT enabled kernels spin/read/write_lock() issue rcu_read_lock() to resemble the !RT semantics, which means in cond_resched_lock() the might resched check will see preempt_count() == 0 and rcu_preempt_depth() == 1. Introduce PREEMPT_LOCK_SCHED_OFFSET for those might resched checks and map them depending on CONFIG_PREEMPT_RT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210923165358.305969211@linutronix.de
2021-10-01	sched: Make RCU nest depth distinct in __might_resched()	Thomas Gleixner
	For !RT kernels RCU nest depth in __might_resched() is always expected to be 0, but on RT kernels it can be non zero while the preempt count is expected to be always 0. Instead of playing magic games in interpreting the 'preempt_offset' argument, rename it to 'offsets' and use the lower 8 bits for the expected preempt count, allow to hand in the expected RCU nest depth in the upper bits and adopt the __might_resched() code and related checks and printks. The affected call sites are updated in subsequent steps. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210923165358.243232823@linutronix.de
2021-10-01	sched: Remove preempt_offset argument from __might_sleep()	Thomas Gleixner
	All callers hand in 0 and never will hand in anything else. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210923165358.054321586@linutronix.de
2021-10-01	sched: Make cond_resched_*lock() variants consistent vs. might_sleep()	Thomas Gleixner
	Commit 3427445afd26 ("sched: Exclude cond_resched() from nested sleep test") removed the task state check of __might_sleep() for cond_resched_lock() because cond_resched_lock() is not a voluntary scheduling point which blocks. It's a preemption point which requires the lock holder to release the spin lock. The same rationale applies to cond_resched_rwlock_read/write(), but those were not touched. Make it consistent and use the non-state checking __might_resched() there as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210923165357.991262778@linutronix.de
2021-10-01	sched: Clean up the might_sleep() underscore zoo	Thomas Gleixner
	__might_sleep() vs. ___might_sleep() is hard to distinguish. Aside of that the three underscore variant is exposed to provide a checkpoint for rescheduling points which are distinct from blocking points. They are semantically a preemption point which means that scheduling is state preserving. A real blocking operation, e.g. mutex_lock(), wait*(), which cannot preserve a task state which is not equal to RUNNING. While technically blocking on a "sleeping" spinlock in RT enabled kernels falls into the voluntary scheduling category because it has to wait until the contended spin/rw lock becomes available, the RT lock substitution code can semantically be mapped to a voluntary preemption because the RT lock substitution code and the scheduler are providing mechanisms to preserve the task state and to take regular non-lock related wakeups into account. Rename ___might_sleep() to __might_resched() to make the distinction of these functions clear. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210923165357.928693482@linutronix.de
2021-10-01	kvm: use kvfree() in kvm_arch_free_vm()	Juergen Gross
	By switching from kfree() to kvfree() in kvm_arch_free_vm() Arm64 can use the common variant. This can be accomplished by adding another macro __KVM_HAVE_ARCH_VM_FREE, which will be used only by x86 for now. Further simplification can be achieved by adding __kvm_arch_free_vm() doing the common part. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-Id: <20210903130808.30142-5-jgross@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01	dma-buf: fix and rework dma_buf_poll v7	Christian König
	Daniel pointed me towards this function and there are multiple obvious problems in the implementation. First of all the retry loop is not working as intended. In general the retry makes only sense if you grab the reference first and then check the sequence values. Then we should always also wait for the exclusive fence. It's also good practice to keep the reference around when installing callbacks to fences you don't own. And last the whole implementation was unnecessary complex and rather hard to understand which could lead to probably unexpected behavior of the IOCTL. Fix all this by reworking the implementation from scratch. Dropping the whole RCU approach and taking the lock instead. Only mildly tested and needs a thoughtful review of the code. Pushing through drm-misc-next to avoid merge conflicts and give the code another round of testing. v2: fix the reference counting as well v3: keep the excl fence handling as is for stable v4: back to testing all fences, drop RCU v5: handle in and out separately v6: add missing clear of events v7: change coding style as suggested by Michel, drop unused variables Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Tested-by: Michel Dänzer <mdaenzer@redhat.com> CC: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20210720131110.88512-1-christian.koenig@amd.com
2021-09-30	x86/kprobes: Fixup return address in generic trampoline handler	Masami Hiramatsu
	In x86, the fake return address on the stack saved by __kretprobe_trampoline() will be replaced with the real return address after returning from trampoline_handler(). Before fixing the return address, the real return address can be found in the 'current->kretprobe_instances'. However, since there is a window between updating the 'current->kretprobe_instances' and fixing the address on the stack, if an interrupt happens at that timing and the interrupt handler does stacktrace, it may fail to unwind because it can not get the correct return address from 'current->kretprobe_instances'. This will eliminate that window by fixing the return address right before updating 'current->kretprobe_instances'. Link: https://lkml.kernel.org/r/163163057094.489837.9044470370440745866.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	objtool: Add frame-pointer-specific function ignore	Josh Poimboeuf
	Add a CONFIG_FRAME_POINTER-specific version of STACK_FRAME_NON_STANDARD() for the case where a function is intentionally missing frame pointer setup, but otherwise needs objtool/ORC coverage when frame pointers are disabled. Link: https://lkml.kernel.org/r/163163047364.489837.17377799909553689661.stgit@devnote2 Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	kprobes: Add kretprobe_find_ret_addr() for searching return address	Masami Hiramatsu
	Introduce kretprobe_find_ret_addr() and is_kretprobe_trampoline(). These APIs will be used by the ORC stack unwinder and ftrace, so that they can check whether the given address points kretprobe trampoline code and query the correct return address in that case. Link: https://lkml.kernel.org/r/163163046461.489837.1044778356430293962.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	kprobes: treewide: Make it harder to refer kretprobe_trampoline directly	Masami Hiramatsu
	Since now there is kretprobe_trampoline_addr() for referring the address of kretprobe trampoline code, we don't need to access kretprobe_trampoline directly. Make it harder to refer by renaming it to __kretprobe_trampoline(). Link: https://lkml.kernel.org/r/163163045446.489837.14510577516938803097.stgit@devnote2 Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()	Masami Hiramatsu
	The __kretprobe_trampoline_handler() callback, called from low level arch kprobes methods, has the 'trampoline_address' parameter, which is entirely superfluous as it basically just replicates: dereference_kernel_function_descriptor(kretprobe_trampoline) In fact we had bugs in arch code where it wasn't replicated correctly. So remove this superfluous parameter and use kretprobe_trampoline_addr() instead. Link: https://lkml.kernel.org/r/163163044546.489837.13505751885476015002.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	kprobes: treewide: Replace arch_deref_entry_point() with ↵	Masami Hiramatsu
	dereference_symbol_descriptor() ~15 years ago kprobes grew the 'arch_deref_entry_point()' __weak function: 3d7e33825d87: ("jprobes: make jprobes a little safer for users") But this is just open-coded dereference_symbol_descriptor() in essence, and its obscure nature was causing bugs. Just use the real thing and remove arch_deref_entry_point(). Link: https://lkml.kernel.org/r/163163043630.489837.7924988885652708696.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	kprobes: Use bool type for functions which returns boolean value	Masami Hiramatsu
	Use the 'bool' type instead of 'int' for the functions which returns a boolean value, because this makes clear that those functions don't return any error code. Link: https://lkml.kernel.org/r/163163041649.489837.17311187321419747536.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	kprobes: treewide: Use 'kprobe_opcode_t *' for the code address in ↵	Masami Hiramatsu
	get_optimized_kprobe() Since get_optimized_kprobe() is only used inside kprobes, it doesn't need to use 'unsigned long' type for 'addr' parameter. Make it use 'kprobe_opcode_t ' for the 'addr' parameter and subsequent call of arch_within_optimized_kprobe() also should use 'kprobe_opcode_t '. Note that MAX_OPTIMIZED_LENGTH and RELATIVEJUMP_SIZE are defined by byte-size, but the size of 'kprobe_opcode_t' depends on the architecture. Therefore, we must be careful when calculating addresses using those macros. Link: https://lkml.kernel.org/r/163163040680.489837.12133032364499833736.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	kprobes: Use IS_ENABLED() instead of kprobes_built_in()	Masami Hiramatsu
	Use IS_ENABLED(CONFIG_KPROBES) instead of kprobes_built_in(). This inline function is introduced only for avoiding #ifdef. But since now we have IS_ENABLED(), it is no longer needed. Link: https://lkml.kernel.org/r/163163038581.489837.2805250706507372658.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	kprobes: Fix coding style issues	Masami Hiramatsu
	Fix coding style issues reported by checkpatch.pl and update comments to quote variable names and add "()" to function name. One TODO comment in __disarm_kprobe() is removed because it has been done by following commit. Link: https://lkml.kernel.org/r/163163037468.489837.4282347782492003960.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	kprobes: Make arch_check_ftrace_location static	Punit Agrawal
	arch_check_ftrace_location() was introduced as a weak function in commit f7f242ff004499 ("kprobes: introduce weak arch_check_ftrace_location() helper function") to allow architectures to handle kprobes call site on their own. Recently, the only architecture (csky) to implement arch_check_ftrace_location() was migrated to using the common version. As a result, further cleanup the code to drop the weak attribute and rename the function to remove the architecture specific implementation. Link: https://lkml.kernel.org/r/163163035673.489837.2367816318195254104.stgit@devnote2 Signed-off-by: Punit Agrawal <punitagrawal@gmail.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	kprobe: Simplify prepare_kprobe() by dropping redundant version	Punit Agrawal
	The function prepare_kprobe() is called during kprobe registration and is responsible for ensuring any architecture related preparation for the kprobe is done before returning. One of two versions of prepare_kprobe() is chosen depending on the availability of KPROBE_ON_FTRACE in the kernel configuration. Simplify the code by dropping the version when KPROBE_ON_FTRACE is not selected - instead relying on kprobe_ftrace() to return false when KPROBE_ON_FTRACE is not set. No functional change. Link: https://lkml.kernel.org/r/163163033696.489837.9264661820279300788.stgit@devnote2 Signed-off-by: Punit Agrawal <punitagrawal@gmail.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	drivers/net/phy/bcm7xxx.c d88fd1b546ff ("net: phy: bcm7xxx: Fixed indirect MMD operations") f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165") net/sched/sch_api.c b193e15ac69d ("net: prevent user from passing illegal stab size") 69508d43334e ("net_sched: Use struct_size() and flex_array_size() helpers") Both cases trivial - adjacent code additions. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30	Merge tag 'net-5.15-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from mac80211, netfilter and bpf. Current release - regressions: - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from interrupt - mdio: revert mechanical patches which broke handling of optional resources - dev_addr_list: prevent address duplication Previous releases - regressions: - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb (NULL deref) - Revert "mac80211: do not use low data rates for data frames with no ack flag", fixing broadcast transmissions - mac80211: fix use-after-free in CCMP/GCMP RX - netfilter: include zone id in tuple hash again, minimize collisions - netfilter: nf_tables: unlink table before deleting it (race -> UAF) - netfilter: log: work around missing softdep backend module - mptcp: don't return sockets in foreign netns - sched: flower: protect fl_walk() with rcu (race -> UAF) - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup - smsc95xx: fix stalled rx after link change - enetc: fix the incorrect clearing of IF_MODE bits - ipv4: fix rtnexthop len when RTA_FLOW is present - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this SKU - e100: fix length calculation & buffer overrun in ethtool::get_regs Previous releases - always broken: - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx - mac80211: drop frames from invalid MAC address in ad-hoc mode - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race -> UAF) - bpf, x86: Fix bpf mapping of atomic fetch implementation - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog - netfilter: ip6_tables: zero-initialize fragment offset - mhi: fix error path in mhi_net_newlink - af_unix: return errno instead of NULL in unix_create1() when over the fs.file-max limit Misc: - bpf: exempt CAP_BPF from checks against bpf_jit_limit - netfilter: conntrack: make max chain length random, prevent guessing buckets by attackers - netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic, defer conntrack walk to work queue (prevent hogging RTNL lock)" * tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) af_unix: fix races in sk_peer_pid and sk_peer_cred accesses net: stmmac: fix EEE init issue when paired with EEE capable PHYs net: dev_addr_list: handle first address in __hw_addr_add_ex net: sched: flower: protect fl_walk() with rcu net: introduce and use lock_sock_fast_nested() net: phy: bcm7xxx: Fixed indirect MMD operations net: hns3: disable firmware compatible features when uninstall PF net: hns3: fix always enable rx vlan filter problem after selftest net: hns3: PF enable promisc for VF when mac table is overflow net: hns3: fix show wrong state when add existing uc mac address net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE net: hns3: don't rollback when destroy mqprio fail net: hns3: remove tc enable checking net: hns3: do not allow call hns3_nic_net_open repeatedly ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup net: bridge: mcast: Associate the seqcount with its protecting lock. net: mdio-ipq4019: Fix the error for an optional regs resource net: hns3: fix hclge_dbg_dump_tm_pg() stack usage net: mdio: mscc-miim: Fix the mdio controller af_unix: Return errno instead of NULL in unix_create1(). ...
2021-09-30	bpf, xdp, docs: Correct some English grammar and spelling	Kev Jackson
	Header DOC on include/net/xdp.h contained a few English grammer and spelling errors. Signed-off-by: Kev Jackson <foamdino@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/YVVaWmKqA8l9Tm4J@kev-VirtualBox
2021-09-30	drm/dp: Add Additional DP2 Headers	Fangzhi Zuo
	Include FEC, DSC, Link Training related headers. Change since v2 - Align with the spec for DP_DSC_SUPPORT_AND_DSC_DECODER_COUNT Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210927192324.5428-1-Jerry.Zuo@amd.com
2021-09-30	Merge branch 'v5.16/vfio/hch-cleanup-vfio-iommu_group-creation-v6' into ↵	Alex Williamson
	v5.16/vfio/next
2021-09-30	vfio: remove the unused mdev iommu hook	Christoph Hellwig
	The iommu_device field in struct mdev_device has never been used since it was added more than 2 years ago. This is a manual revert of commit 7bd50f0cd2 ("vfio/type1: Add domain at(de)taching group helpers"). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-11-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30	vfio: move the vfio_iommu_driver_ops interface out of <linux/vfio.h>	Christoph Hellwig
	Create a new private drivers/vfio/vfio.h header for the interface between the VFIO core and the iommu drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-10-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30	vfio: remove unused method from vfio_iommu_driver_ops	Christoph Hellwig
	The read, write and mmap methods are never implemented, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-9-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30	vfio: simplify iommu group allocation for mediated devices	Christoph Hellwig
	Reuse the logic in vfio_noiommu_group_alloc to allocate a fake single-device iommu group for mediated devices by factoring out a common function, and replacing the noiommu boolean field in struct vfio_group with an enum to distinguish the three different kinds of groups. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-8-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>