linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-05-13	agpgart.h: do not include <stdlib.h> from exported header	Masahiro Yamada
	Commit 35d0f1d54ecd ("include/uapi/linux/agpgart.h: include stdlib.h in userspace") included <stdlib.h> to fix the unknown size_t error, but I do not think it is the right fix. This header already uses __kernel_size_t a few lines below. Replace the remaining size_t, and stop including <stdlib.h>. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
2022-05-13	extcon: Fix extcon_get_extcon_dev() error handling	Dan Carpenter
	The extcon_get_extcon_dev() function returns error pointers on error, NULL when it's a -EPROBE_DEFER defer situation, and ERR_PTR(-ENODEV) when the CONFIG_EXTCON option is disabled. This is very complicated for the callers to handle and a number of them had bugs that would lead to an Oops. In real life, there are two things which prevented crashes. First, error pointers would only be returned if there was bug in the caller where they passed a NULL "extcon_name" and none of them do that. Second, only two out of the eight drivers will build when CONFIG_EXTCON is disabled. The normal way to write this would be to return -EPROBE_DEFER directly when appropriate and return NULL when CONFIG_EXTCON is disabled. Then the error handling is simple and just looks like: dev->edev = extcon_get_extcon_dev(acpi_dev_name(adev)); if (IS_ERR(dev->edev)) return PTR_ERR(dev->edev); For the two drivers which can build with CONFIG_EXTCON disabled, then extcon_get_extcon_dev() will now return NULL which is not treated as an error and the probe will continue successfully. Those two drivers are "typec_fusb302" and "max8997-battery". In the original code, the typec_fusb302 driver had an 800ms hang in tcpm_get_current_limit() but now that function is a no-op. For the max8997-battery driver everything should continue working as is. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2022-05-12	ELF, uapi: fixup ELF_ST_TYPE definition	Alexey Dobriyan
	This is very theoretical compile failure: ELF_ST_TYPE(st_info = A) Cast will bind first and st_info will stop being lvalue: error: lvalue required as left operand of assignment Given that the only use of this macro is ELF_ST_TYPE(sym->st_info) where st_info is "unsigned char" I've decided to remove cast especially given that companion macro ELF_ST_BIND doesn't use cast. Link: https://lkml.kernel.org/r/Ymv7G1BeX4kt3obz@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13	Merge tag 'drm/tegra/for-5.19-rc1' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/tegra into drm-next drm/tegra: Changes for v5.19-rc1 Only a few fixes this time, and some debuggability improvements. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thierry Reding <thierry.reding@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220506164004.3922226-1-thierry.reding@gmail.com
2022-05-12	net: inet: Retire port only listening_hash	Martin KaFai Lau
	The listen sk is currently stored in two hash tables, listening_hash (hashed by port) and lhash2 (hashed by port and address). After commit 0ee58dad5b06 ("net: tcp6: prefer listeners bound to an address") and commit d9fbc7f6431f ("net: tcp: prefer listeners bound to an address"), the TCP-SYN lookup fast path does not use listening_hash. The commit 05c0b35709c5 ("tcp: seq_file: Replace listening_hash with lhash2") also moved the seq_file (/proc/net/tcp) iteration usage from listening_hash to lhash2. There are still a few listening_hash usages left. One of them is inet_reuseport_add_sock() which uses the listening_hash to search a listen sk during the listen() system call. This turns out to be very slow on use cases that listen on many different VIPs at a popular port (e.g. 443). [ On top of the slowness in adding to the tail in the IPv6 case ]. The latter patch has a selftest to demonstrate this case. This patch takes this chance to move all remaining listening_hash usages to lhash2 and then retire listening_hash. Since most changes need to be done together, it is hard to cut the listening_hash to lhash2 switch into small patches. The changes in this patch is highlighted here for the review purpose. 1. Because of the listening_hash removal, lhash2 can use the sk->sk_nulls_node instead of the icsk->icsk_listen_portaddr_node. This will also keep the sk_unhashed() check to work as is after stop adding sk to listening_hash. The union is removed from inet_listen_hashbucket because only nulls_head is needed. 2. icsk->icsk_listen_portaddr_node and its helpers are removed. 3. The current lhash2 users needs to iterate with sk_nulls_node instead of icsk_listen_portaddr_node. One case is in the inet[6]_lhash2_lookup(). Another case is the seq_file iterator in tcp_ipv4.c. One thing to note is sk_nulls_next() is needed because the old inet_lhash2_for_each_icsk_continue() does a "next" first before iterating. 4. Move the remaining listening_hash usage to lhash2 inet_reuseport_add_sock() which this series is trying to improve. inet_diag.c and mptcp_diag.c are the final two remaining use cases and is moved to lhash2 now also. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: inet: Remove count from inet_listen_hashbucket	Martin KaFai Lau
	After commit 0ee58dad5b06 ("net: tcp6: prefer listeners bound to an address") and commit d9fbc7f6431f ("net: tcp: prefer listeners bound to an address"), the count is no longer used. This patch removes it. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: mscc: ocelot: move ocelot_port_private :: chip_port to ocelot_port :: index	Vladimir Oltean
	Currently the ocelot switch lib is unaware of the index of a struct ocelot_port, since that is kept in the encapsulating structures of outer drivers (struct dsa_port :: index, struct ocelot_port_private :: chip_port). With the upcoming increase in complexity associated with assigning DSA tag_8021q CPU ports to certain user ports, it becomes necessary for the switch lib to be able to retrieve the index of a certain ocelot_port. Therefore, introduce a new u8 to ocelot_port (same size as the chip_port used by the ocelot switchdev driver) and rework the existing code to populate and use it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: mscc: ocelot: minimize holes in struct ocelot_port	Vladimir Oltean
	Reorder members of struct ocelot_port to eliminate holes and reduce structure size. Pahole says: Before: struct ocelot_port { struct ocelot * ocelot; /* 0 8 / struct regmap target; /* 8 8 / bool vlan_aware; / 16 1 / / XXX 7 bytes hole, try to pack / const struct ocelot_bridge_vlan pvid_vlan; /* 24 8 / unsigned int ptp_skbs_in_flight; / 32 4 / u8 ptp_cmd; / 36 1 / / XXX 3 bytes hole, try to pack / struct sk_buff_head tx_skbs; / 40 96 / / --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- / u8 ts_id; / 136 1 / / XXX 3 bytes hole, try to pack / phy_interface_t phy_mode; / 140 4 / bool is_dsa_8021q_cpu; / 144 1 / bool learn_ena; / 145 1 / / XXX 6 bytes hole, try to pack / struct net_device bond; /* 152 8 / bool lag_tx_active; / 160 1 / / XXX 1 byte hole, try to pack / u16 mrp_ring_id; / 162 2 / / XXX 4 bytes hole, try to pack / struct net_device bridge; /* 168 8 / int bridge_num; / 176 4 / u8 stp_state; / 180 1 / / XXX 3 bytes hole, try to pack / int speed; / 184 4 / / size: 192, cachelines: 3, members: 18 / / sum members: 161, holes: 7, sum holes: 27 / / padding: 4 / }; After: struct ocelot_port { struct ocelot ocelot; /* 0 8 / struct regmap target; /* 8 8 / struct net_device bond; /* 16 8 / struct net_device bridge; /* 24 8 / const struct ocelot_bridge_vlan pvid_vlan; /* 32 8 / phy_interface_t phy_mode; / 40 4 / unsigned int ptp_skbs_in_flight; / 44 4 / struct sk_buff_head tx_skbs; / 48 96 / / --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- / u16 mrp_ring_id; / 144 2 / u8 ptp_cmd; / 146 1 / u8 ts_id; / 147 1 / u8 stp_state; / 148 1 / bool vlan_aware; / 149 1 / bool is_dsa_8021q_cpu; / 150 1 / bool learn_ena; / 151 1 / bool lag_tx_active; / 152 1 / / XXX 3 bytes hole, try to pack / int bridge_num; / 156 4 / int speed; / 160 4 / / size: 168, cachelines: 3, members: 18 / / sum members: 161, holes: 1, sum holes: 3 / / padding: 4 / / last cacheline: 40 bytes */ }; Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: mscc: ocelot: delete ocelot_port :: xmit_template	Vladimir Oltean
	This is no longer used since commit 7c4bb540e917 ("net: dsa: tag_ocelot: create separate tagger for Seville"). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: remove port argument from ->change_tag_protocol()	Vladimir Oltean
	DSA has not supported (and probably will not support in the future either) independent tagging protocols per CPU port. Different switch drivers have different requirements, some may need to replicate some settings for each CPU port, some may need to apply some settings on a single CPU port, while some may have to configure some global settings and then some per-CPU-port settings. In any case, the current model where DSA calls ->change_tag_protocol for each CPU port turns out to be impractical for drivers where there are global things to be done. For example, felix calls dsa_tag_8021q_register(), which makes no sense per CPU port, so it suppresses the second call. Let drivers deal with replication towards all CPU ports, and remove the CPU port argument from the function prototype. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: felix: manage host flooding using a specific driver callback	Vladimir Oltean
	At the time - commit 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") - not introducing a dedicated switch callback for host flooding made sense, because for the only user, the felix driver, there was nothing different to do for the CPU port than set the flood flags on the CPU port just like on any other bridge port. There are 2 reasons why this approach is not good enough, however. (1) Other drivers, like sja1105, support configuring flooding as a function of {ingress port, egress port}, whereas the DSA ->port_bridge_flags() function only operates on an egress port. So with that driver we'd have useless host flooding from user ports which don't need it. (2) Even with the felix driver, support for multiple CPU ports makes it difficult to piggyback on ->port_bridge_flags(). The way in which the felix driver is going to support host-filtered addresses with multiple CPU ports is that it will direct these addresses towards both CPU ports (in a sort of multicast fashion), then restrict the forwarding to only one of the two using the forwarding masks. Consequently, flooding will also be enabled towards both CPU ports. However, ->port_bridge_flags() gets passed the index of a single CPU port, and that leaves the flood settings out of sync between the 2 CPU ports. This is to say, it's better to have a specific driver method for host flooding, which takes the user port as argument. This solves problem (1) by allowing the driver to do different things for different user ports, and problem (2) by abstracting the operation and letting the driver do whatever, rather than explicitly making the DSA core point to the CPU port it thinks needs to be touched. This new method also creates a problem, which is that cross-chip setups are not handled. However I don't have hardware right now where I can test what is the proper thing to do, and there isn't hardware compatible with multi-switch trees that supports host flooding. So it remains a problem to be tackled in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: introduce the dsa_cpu_ports() helper	Vladimir Oltean
	Similar to dsa_user_ports() which retrieves a port mask of all user ports, introduce dsa_cpu_ports() which retrieves the mask of all CPU ports of a switch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	skbuff: replace a BUG_ON() with the new DEBUG_NET_WARN_ON_ONCE()	Jakub Kicinski
	Very few drivers actually have Kconfig knobs for adding -DDEBUG. 8 according to a quick grep, while there are 93 users of skb_checksum_none_assert(). Switch to the new DEBUG_NET_WARN_ON_ONCE() to catch bad skbs. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220511172305.1382810-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters	Jaegeuk Kim
	No functional change. - remove checkpoint=disable check for f2fs_write_checkpoint - get sec_freed all the time Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-12	Merge tag 'net-5.18-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless, and bluetooth. No outstanding fires. Current release - regressions: - eth: atlantic: always deep reset on pm op, fix null-deref Current release - new code bugs: - rds: use maybe_get_net() when acquiring refcount on TCP sockets [refinement of a previous fix] - eth: ocelot: mark traps with a bool instead of guessing type based on list membership Previous releases - regressions: - net: fix skipping features in for_each_netdev_feature() - phy: micrel: fix null-derefs on suspend/resume and probe - bcmgenet: check for Wake-on-LAN interrupt probe deferral Previous releases - always broken: - ipv4: drop dst in multicast routing path, prevent leaks - ping: fix address binding wrt vrf - net: fix wrong network header length when BPF protocol translation is used on skbs with a fraglist - bluetooth: fix the creation of hdev->name - rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition - wifi: iwlwifi: iwl-dbg: use del_timer_sync() before freeing - wifi: ath11k: reduce the wait time of 11d scan and hw scan while adding an interface - mac80211: fix rx reordering with non explicit / psmp ack policy - mac80211: reset MBSSID parameters upon connection - nl80211: fix races in nl80211_set_tx_bitrate_mask() - tls: fix context leak on tls_device_down - sched: act_pedit: really ensure the skb is writable - batman-adv: don't skb_split skbuffs with frag_list - eth: ocelot: fix various issues with TC actions (null-deref; bad stats; ineffective drops; ineffective filter removal)" * tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits) tls: Fix context leak on tls_device_down net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe() net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down() mlxsw: Avoid warning during ip6gre device removal net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral net: ethernet: mediatek: ppe: fix wrong size passed to memset() Bluetooth: Fix the creation of hdev->name i40e: i40e_main: fix a missing check on list iterator net/sched: act_pedit: really ensure the skb is writable s390/lcs: fix variable dereferenced before check s390/ctcm: fix potential memory leak s390/ctcm: fix variable dereferenced before check net: atlantic: verify hw_head_ lies within TX buffer ring net: atlantic: add check for MAX_SKB_FRAGS net: atlantic: reduce scope of is_rsc_complete net: atlantic: fix "frag[0] not initialized" net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe() net: phy: micrel: Fix incorrect variable type in micrel decnet: Use container_of() for struct dn_neigh casts ...
2022-05-12	module.h: simplify MODULE_IMPORT_NS	Greg Kroah-Hartman
	In commit ca321ec74322 ("module.h: allow #define strings to work with MODULE_IMPORT_NS") I fixed up the MODULE_IMPORT_NS() macro to allow defined strings to work with it. Unfortunatly I did it in a two-stage process, when it could just be done with the __stringify() macro as pointed out by Masahiro Yamada. Clean this up to only be one macro instead of two steps to achieve the same end result. Fixes: ca321ec74322 ("module.h: allow #define strings to work with MODULE_IMPORT_NS") Reported-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Matthias Maennich <maennich@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-05-12	kunit: Rework kunit_resource allocation policy	David Gow
	KUnit's test-managed resources can be created in two ways: - Using the kunit_add_resource() family of functions, which accept a struct kunit_resource pointer, typically allocated statically or on the stack during the test. - Using the kunit_alloc_resource() family of functions, which allocate a struct kunit_resource using kzalloc() behind the scenes. Both of these families of functions accept a 'free' function to be called when the resource is finally disposed of. At present, KUnit will kfree() the resource if this 'free' function is specified, and will not if it is NULL. However, this can lead kunit_alloc_resource() to leak memory (if no 'free' function is passed in), or kunit_add_resource() to incorrectly kfree() memory which was allocated by some other means (on the stack, as part of a larger allocation, etc), if a 'free' function is provided. Instead, always kfree() if the resource was allocated with kunit_alloc_resource(), and never kfree() if it was passed into kunit_add_resource() by the user. (If the user of kunit_add_resource() wishes the resource be kfree()ed, they can call kfree() on the resource from within the 'free' function. This is implemented by adding a 'should_free' member to struct kunit_resource and setting it appropriately. To facilitate this, the various resource add/alloc functions have been refactored somewhat, making them all call a __kunit_add_resource() helper after setting the 'should_free' member appropriately. In the process, all other functions have been made static inline functions. Signed-off-by: David Gow <davidgow@google.com> Tested-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-05-12	f2fs: change the current atomic write way	Daeho Jeong
	Current atomic write has three major issues like below. - keeps the updates in non-reclaimable memory space and they are even hard to be migrated, which is not good for contiguous memory allocation. - disk spaces used for atomic files cannot be garbage collected, so this makes it difficult for the filesystem to be defragmented. - If atomic write operations hit the threshold of either memory usage or garbage collection failure count, All the atomic write operations will fail immediately. To resolve the issues, I will keep a COW inode internally for all the updates to be flushed from memory, when we need to flush them out in a situation like high memory pressure. These COW inodes will be tagged as orphan inodes to be reclaimed in case of sudden power-cut or system failure during atomic writes. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-12	ipmi: Add an intializer for ipmi_recv_msg struct	Corey Minyard
	Don't hand-initialize the struct here, create a macro to initialize it so new fields added don't get forgotten in places. Signed-off-by: Corey Minyard <cminyard@mvista.com>
2022-05-12	ipmi: Add an intializer for ipmi_smi_msg struct	Corey Minyard
	There was a "type" element added to this structure, but some static values were missed. The default value will be zero, which is correct, but create an initializer for the type and initialize the type properly in the initializer to avoid future issues. Reported-by: Joe Wiese <jwiese@rackspace.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2022-05-12	spi: Doc fix - Describe add_lock and dma_map_dev in spi_controller	Siddh Raman Pant
	This fixes the corresponding warnings during building the docs. Signed-off-by: Siddh Raman Pant <siddhpant.gh@gmail.com> Link: https://lore.kernel.org/r/4e6187a4-d0f8-4750-e407-e09cc1c91789@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-12	trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations	Tony Luck
	Add tracing support which may be useful for debugging systems that fail to complete In Field Scan tests. Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20220506225410.1652287-11-tony.luck@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12	stop_machine: Add stop_core_cpuslocked() for per-core operations	Peter Zijlstra
	Hardware core level testing features require near simultaneous execution of WRMSR instructions on all threads of a core to initiate a test. Provide a customized cut down version of stop_machine_cpuslocked() that just operates on the threads of a single core. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220506225410.1652287-4-tony.luck@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12	block: Fix the bio.bi_opf comment	Bart Van Assche
	Commit ef295ecf090d modified the Linux kernel such that the bottom bits of the bi_opf member contain the operation instead of the topmost bits. That commit did not update the comment next to bi_opf. Hence this patch. From commit ef295ecf090d: -#define bio_op(bio) ((bio)->bi_opf >> BIO_OP_SHIFT) +#define bio_op(bio) ((bio)->bi_opf & REQ_OP_MASK) Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Fixes: ef295ecf090d ("block: better op and flags encoding") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220511235152.1082246-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-12	block: reorder the REQ_ flags	Christoph Hellwig
	Keep the op-specific flag last so that they are clearly separate from the generic flags. Various recent commits just kept adding new flags at the end. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220512061408.1826595-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-12	io_uring: fix ordering of args in io_uring_queue_async_work	Dylan Yudaken
	Fix arg ordering in TP_ARGS macro, which fixes the output. Fixes: 502c87d65564c ("io-uring: Make tracepoints consistent.") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220512091834.728610-2-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-12	usb: core: hcd: Add support for deferring roothub registration	Kishon Vijay Abraham I
	It has been observed with certain PCIe USB cards (like Inateck connected to AM64 EVM or J7200 EVM) that as soon as the primary roothub is registered, port status change is handled even before xHC is running leading to cold plug USB devices not detected. For such cases, registering both the root hubs along with the second HCD is required. Add support for deferring roothub registration in usb_add_hcd(), so that both primary and secondary roothubs are registered along with the second HCD. This patch has been added and reverted earier as it triggered a race in usb device enumeration. That race is now fixed in 5.16-rc3, and in stable back to 5.4 commit 6cca13de26ee ("usb: hub: Fix locking issues with address0_mutex") commit 6ae6dc22d2d1 ("usb: hub: Fix usb enumeration issue due to address0 race") CC: stable@vger.kernel.org # 5.4+ Suggested-by: Mathias Nyman <mathias.nyman@linux.intel.com> Tested-by: Chris Chiu <chris.chiu@canonical.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Link: https://lore.kernel.org/r/20220510091630.16564-2-kishon@ti.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12	ASoC: SOF: Add header for IPC4 manifest	Ranjani Sridharan
	Add the header for the IPC4 manifest. Co-developed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Rander Wang <rander.wang@intel.com> Co-developed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://lore.kernel.org/r/20220511171648.1622993-4-ranjani.sridharan@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-12	fortify: Provide a memcpy trap door for sharp corners	Kees Cook
	As we continue to narrow the scope of what the FORTIFY memcpy() will accept and build alternative APIs that give the compiler appropriate visibility into more complex memcpy scenarios, there is a need for "unfortified" memcpy use in rare cases where combinations of compiler behaviors, source code layout, etc, result in cases where the stricter memcpy checks need to be bypassed until appropriate solutions can be developed (i.e. fix compiler bugs, code refactoring, new API, etc). The intention is for this to be used only if there's no other reasonable solution, for its use to include a justification that can be used to assess future solutions, and for it to be temporary. Example usage included, based on analysis and discussion from: https://lore.kernel.org/netdev/CANn89iLS_2cshtuXPyNUGDPaic=sJiYfvTb_wNLgWrZRyBxZ_g@mail.gmail.com Cc: Jakub Kicinski <kuba@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220511025301.3636666-1-keescook@chromium.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-11	bpf: add bpf_map_lookup_percpu_elem for percpu map	Feng Zhou
	Add new ebpf helpers bpf_map_lookup_percpu_elem. The implementation method is relatively simple, refer to the implementation method of map_lookup_elem of percpu map, increase the parameters of cpu, and obtain it according to the specified cpu. Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Link: https://lore.kernel.org/r/20220511093854.411-2-zhoufeng.zf@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-11	Merge tag 'for-net-2022-05-11' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix the creation of hdev->name when index is greater than 9999 * tag 'for-net-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: Fix the creation of hdev->name ==================== Link: https://lore.kernel.org/r/20220512002901.823647-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11	Merge tag 'wireless-2022-05-11' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v5.18 Second set of fixes for v5.18 and hopefully the last one. We have a new iwlwifi maintainer, a fix to rfkill ioctl interface and important fixes to both stack and two drivers. * tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition nl80211: fix locking in nl80211_set_tx_bitrate_mask() mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection mac80211_hwsim: fix RCU protected chanctx access mailmap: update Kalle Valo's email mac80211: Reset MBSSID parameters upon connection cfg80211: retrieve S1G operating channel number nl80211: validate S1G channel width mac80211: fix rx reordering with non explicit / psmp ack policy ath11k: reduce the wait time of 11d scan and hw scan while add interface MAINTAINERS: update iwlwifi driver maintainer iwlwifi: iwl-dbg: Use del_timer_sync() before freeing ==================== Link: https://lore.kernel.org/r/20220511154535.A1A12C340EE@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11	Bluetooth: Fix the creation of hdev->name	Itay Iellin
	Set a size limit of 8 bytes of the written buffer to "hdev->name" including the terminating null byte, as the size of "hdev->name" is 8 bytes. If an id value which is greater than 9999 is allocated, then the "snprintf(hdev->name, sizeof(hdev->name), "hci%d", id)" function call would lead to a truncation of the id value in decimal notation. Set an explicit maximum id parameter in the id allocation function call. The id allocation function defines the maximum allocated id value as the maximum id parameter value minus one. Therefore, HCI_MAX_ID is defined as 10000. Signed-off-by: Itay Iellin <ieitayie@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-05-11	bpf: Fix sparse warning for bpf_kptr_xchg_proto	Kumar Kartikeya Dwivedi
	Kernel Test Robot complained about missing static storage class annotation for bpf_kptr_xchg_proto variable. sparse: symbol 'bpf_kptr_xchg_proto' was not declared. Should it be static? This caused by missing extern definition in the header. Add it to suppress the sparse warning. Fixes: c0a5a21c25f3 ("bpf: Allow storing referenced kptr in map") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220511194654.765705-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-12	sched/tracing: Append prev_state to tp args instead	Delyan Kratunov
	Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20) added a new prev_state argument to the sched_switch tracepoint, before the prev task_struct pointer. This reordering of arguments broke BPF programs that use the raw tracepoint (e.g. tp_btf programs). The type of the second argument has changed and existing programs that assume a task_struct* argument (e.g. for bpf_task_storage access) will now fail to verify. If we instead append the new argument to the end, all existing programs would continue to work and can conditionally extract the prev_state argument on supported kernel versions. Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20) Signed-off-by: Delyan Kratunov <delyank@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com
2022-05-11	net/sched: act_pedit: really ensure the skb is writable	Paolo Abeni
	Currently pedit tries to ensure that the accessed skb offset is writable via skb_unclone(). The action potentially allows touching any skb bytes, so it may end-up modifying shared data. The above causes some sporadic MPTCP self-test failures, due to this code: tc -n $ns2 filter add dev ns2eth$i egress \ protocol ip prio 1000 \ handle 42 fw \ action pedit munge offset 148 u8 invert \ pipe csum tcp \ index 100 The above modifies a data byte outside the skb head and the skb is a cloned one, carrying a TCP output packet. This change addresses the issue by keeping track of a rough over-estimate highest skb offset accessed by the action and ensuring such offset is really writable. Note that this may cause performance regressions in some scenarios, but hopefully pedit is not in the critical path. Fixes: db2c24175d14 ("act_pedit: access skb->data safely") Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/1fcf78e6679d0a287dd61bb0f04730ce33b3255d.1652194627.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11	sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state	Peter Zijlstra
	Currently ptrace_stop() / do_signal_stop() rely on the special states TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this state exists only in task->__state and nowhere else. There's two spots of bother with this: - PREEMPT_RT has task->saved_state which complicates matters, meaning task_is_{traced,stopped}() needs to check an additional variable. - An alternative freezer implementation that itself relies on a special TASK state would loose TASK_TRACED/TASK_STOPPED and will result in misbehaviour. As such, add additional state to task->jobctl to track this state outside of task->__state. NOTE: this doesn't actually fix anything yet, just adds extra state. --EWB * didn't add a unnecessary newline in signal.h * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up instead of in signal_wake_up_state. This prevents the clearing of TASK_STOPPED and TASK_TRACED from getting lost. * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-12-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-05-11	ptrace: Don't change __state	Eric W. Biederman
	Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace command is executing. Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and implement a new jobctl flag TASK_PTRACE_FROZEN. This new flag is set in jobctl_freeze_task and cleared when ptrace_stop is awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep). In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL when the wake up is for a fatal signal. Skip adding __TASK_TRACED when TASK_PTRACE_FROZEN is not set. This has the same effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use TASK_KILLABLE go through signal_wake_up. Handle a ptrace_stop being called with a pending fatal signal. Previously it would have been handled by schedule simply failing to sleep. As TASK_WAKEKILL is no longer part of TASK_TRACED schedule will sleep with a fatal_signal_pending. The code in signal_wake_up guarantees that the code will be awaked by any fatal signal that codes after TASK_TRACED is set. Previously the __state value of __TASK_TRACED was changed to TASK_RUNNING when woken up or back to TASK_TRACED when the code was left in ptrace_stop. Now when woken up ptrace_stop now clears JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced clears JOBCTL_PTRACE_FROZEN. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-10-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11	ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP	Eric W. Biederman
	xtensa is the last user of the PT_SINGLESTEP flag. Changing tsk->ptrace in user_enable_single_step and user_disable_single_step without locking could potentiallly cause problems. So use a thread info flag instead of a flag in tsk->ptrace. Use TIF_SINGLESTEP that xtensa already had defined but unused. Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users. Cc: stable@vger.kernel.org Acked-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-4-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11	ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP	Eric W. Biederman
	User mode linux is the last user of the PT_DTRACE flag. Using the flag to indicate single stepping is a little confusing and worse changing tsk->ptrace without locking could potentionally cause problems. So use a thread info flag with a better name instead of flag in tsk->ptrace. Remove the definition PT_DTRACE as uml is the last user. Cc: stable@vger.kernel.org Acked-by: Johannes Berg <johannes@sipsolutions.net> Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-3-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11	signal: Replace __group_send_sig_info with send_signal_locked	Eric W. Biederman
	The function __group_send_sig_info is just a light wrapper around send_signal_locked with one parameter fixed to a constant value. As the wrapper adds no real value update the code to directly call the wrapped function. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-2-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11	signal: Rename send_signal send_signal_locked	Eric W. Biederman
	Rename send_signal and __send_signal to send_signal_locked and __send_signal_locked to make send_signal usable outside of signal.c. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11	vfio/pci: Remove vfio_device_get_from_dev()	Jason Gunthorpe
	The last user of this function is in PCI callbacks that want to convert their struct pci_dev to a vfio_device. Instead of searching use the vfio_device available trivially through the drvdata. When a callback in the device_driver is called, the caller must hold the device_lock() on dev. The purpose of the device_lock is to prevent remove() from being called (see __device_release_driver), and allow the driver to safely interact with its drvdata without races. The PCI core correctly follows this and holds the device_lock() when calling error_detected (see report_error_detected) and sriov_configure (see sriov_numvfs_store). Further, since the drvdata holds a positive refcount on the vfio_device any access of the drvdata, under the device_lock(), from a driver callback needs no further protection or refcounting. Thus the remark in the vfio_device_get_from_dev() comment does not apply here, VFIO PCI drivers all call vfio_unregister_group_dev() from their remove callbacks under the device_lock() and cannot race with the remaining callers. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	vfio: Remove dead code	Jason Gunthorpe
	Now that callers have been updated to use the vfio_device APIs the driver facing group interface is no longer used, delete it: - vfio_group_get_external_user_from_dev() - vfio_group_pin_pages() - vfio_group_unpin_pages() - vfio_group_iommu_domain() -- Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	vfio/mdev: Pass in a struct vfio_device * to vfio_dma_rw()	Jason Gunthorpe
	Every caller has a readily available vfio_device pointer, use that instead of passing in a generic struct device. Change vfio_dma_rw() to take in the struct vfio_device and move the container users that would have been held by vfio_group_get_external_user_from_dev() to vfio_dma_rw() directly, like vfio_pin/unpin_pages(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	vfio/mdev: Pass in a struct vfio_device * to vfio_pin/unpin_pages()	Jason Gunthorpe
	Every caller has a readily available vfio_device pointer, use that instead of passing in a generic struct device. The struct vfio_device already contains the group we need so this avoids complexity, extra refcountings, and a confusing lifecycle model. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	vfio: Make vfio_(un)register_notifier accept a vfio_device	Jason Gunthorpe
	All callers have a struct vfio_device trivially available, pass it in directly and avoid calling the expensive vfio_group_get_from_dev(). Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	Merge tag 'gvt-next-2022-04-29' into v5.19/vfio/next	Alex Williamson
	Merge GVT-g dependencies for vfio. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	Merge tag 'mlx5-lm-parallel' of ↵	Alex Williamson
	https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into v5.19/vfio/next Improve mlx5 live migration driver From Yishai: This series improves mlx5 live migration driver in few aspects as of below. Refactor to enable running migration commands in parallel over the PF command interface. To achieve that we exposed from mlx5_core an API to let the VF be notified before that the PF command interface goes down/up. (e.g. PF reload upon health recovery). Once having the above functionality in place mlx5 vfio doesn't need any more to obtain the global PF lock upon using the command interface but can rely on the above mechanism to be in sync with the PF. This can enable parallel VFs migration over the PF command interface from kernel driver point of view. In addition, Moved to use the PF async command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Alex, as this series touches mlx5_core we may need to send this in a pull request format to VFIO to avoid conflicts before acceptance. Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com Signed-of-by: Leon Romanovsky <leonro@nvidia.com>