summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-12-02ipv4: convert fib_num_tclassid_users to atomic_tEric Dumazet
Before commit faa041a40b9f ("ipv4: Create cleanup helper for fib_nh") changes to net->ipv4.fib_num_tclassid_users were protected by RTNL. After the change, this is no longer the case, as free_fib_info_rcu() runs after rcu grace period, without rtnl being held. Fixes: faa041a40b9f ("ipv4: Create cleanup helper for fib_nh") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-02drm: Return error codes from struct drm_driver.gem_create_objectThomas Zimmermann
GEM helper libraries use struct drm_driver.gem_create_object to let drivers override GEM object allocation. On failure, the call returns NULL. Change the semantics to make the calls return a pointer-encoded error. This aligns the callback with its callers. Fixes the ingenic driver, which already returns an error pointer. Also update the callers to handle the involved types more strictly. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20211130095255.26710-1-tzimmermann@suse.de
2021-12-01net: avoid uninit-value from tcp_conn_requestEric Dumazet
A recent change triggers a KMSAN warning, because request sockets do not initialize @sk_rx_queue_mapping field. Add sk_rx_queue_update() helper to make our intent clear. BUG: KMSAN: uninit-value in sk_rx_queue_set include/net/sock.h:1922 [inline] BUG: KMSAN: uninit-value in tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922 sk_rx_queue_set include/net/sock.h:1922 [inline] tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922 tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528 tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406 tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738 tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100 ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:460 [inline] ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline] ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline] ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609 ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644 __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline] __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553 __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605 netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696 gro_normal_list net/core/dev.c:5850 [inline] napi_complete_done+0x579/0xdd0 net/core/dev.c:6587 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline] virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557 __napi_poll+0x14e/0xbc0 net/core/dev.c:7020 napi_poll net/core/dev.c:7087 [inline] net_rx_action+0x824/0x1880 net/core/dev.c:7174 __do_softirq+0x1fe/0x7eb kernel/softirq.c:558 invoke_softirq+0xa4/0x130 kernel/softirq.c:432 __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0x76/0x130 kernel/softirq.c:648 common_interrupt+0xb6/0xd0 arch/x86/kernel/irq.c:240 asm_common_interrupt+0x1e/0x40 smap_restore arch/x86/include/asm/smap.h:67 [inline] get_shadow_origin_ptr mm/kmsan/instrumentation.c:31 [inline] __msan_metadata_ptr_for_load_1+0x28/0x30 mm/kmsan/instrumentation.c:63 tomoyo_check_acl+0x1b0/0x630 security/tomoyo/domain.c:173 tomoyo_path_permission security/tomoyo/file.c:586 [inline] tomoyo_check_open_permission+0x61f/0xe10 security/tomoyo/file.c:777 tomoyo_file_open+0x24f/0x2d0 security/tomoyo/tomoyo.c:311 security_file_open+0xb1/0x1f0 security/security.c:1635 do_dentry_open+0x4e4/0x1bf0 fs/open.c:809 vfs_open+0xaf/0xe0 fs/open.c:957 do_open fs/namei.c:3426 [inline] path_openat+0x52f1/0x5dd0 fs/namei.c:3559 do_filp_open+0x306/0x760 fs/namei.c:3586 do_sys_openat2+0x263/0x8f0 fs/open.c:1212 do_sys_open fs/open.c:1228 [inline] __do_sys_open fs/open.c:1236 [inline] __se_sys_open fs/open.c:1232 [inline] __x64_sys_open+0x314/0x380 fs/open.c:1232 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was created at: __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409 alloc_pages+0x8a5/0xb80 alloc_slab_page mm/slub.c:1810 [inline] allocate_slab+0x287/0x1c20 mm/slub.c:1947 new_slab mm/slub.c:2010 [inline] ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039 __slab_alloc mm/slub.c:3126 [inline] slab_alloc_node mm/slub.c:3217 [inline] slab_alloc mm/slub.c:3259 [inline] kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264 reqsk_alloc include/net/request_sock.h:91 [inline] inet_reqsk_alloc+0xaf/0x8b0 net/ipv4/tcp_input.c:6712 tcp_conn_request+0x910/0x4dc0 net/ipv4/tcp_input.c:6852 tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528 tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406 tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738 tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100 ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:460 [inline] ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline] ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline] ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609 ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644 __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline] __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553 __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605 netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696 gro_normal_list net/core/dev.c:5850 [inline] napi_complete_done+0x579/0xdd0 net/core/dev.c:6587 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline] virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557 __napi_poll+0x14e/0xbc0 net/core/dev.c:7020 napi_poll net/core/dev.c:7087 [inline] net_rx_action+0x824/0x1880 net/core/dev.c:7174 __do_softirq+0x1fe/0x7eb kernel/softirq.c:558 Fixes: 342159ee394d ("net: avoid dirtying sk->sk_rx_queue_mapping") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20211130182939.2584764-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-01net: annotate data-races on txq->xmit_lock_ownerEric Dumazet
syzbot found that __dev_queue_xmit() is reading txq->xmit_lock_owner without annotations. No serious issue there, let's document what is happening there. BUG: KCSAN: data-race in __dev_queue_xmit / __dev_queue_xmit write to 0xffff888139d09484 of 4 bytes by interrupt on cpu 0: __netif_tx_unlock include/linux/netdevice.h:4437 [inline] __dev_queue_xmit+0x948/0xf70 net/core/dev.c:4229 dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265 macvlan_queue_xmit drivers/net/macvlan.c:543 [inline] macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567 __netdev_start_xmit include/linux/netdevice.h:4987 [inline] netdev_start_xmit include/linux/netdevice.h:5001 [inline] xmit_one+0x105/0x2f0 net/core/dev.c:3590 dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606 sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342 __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817 __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194 dev_queue_xmit+0x13/0x20 net/core/dev.c:4259 neigh_hh_output include/net/neighbour.h:511 [inline] neigh_output include/net/neighbour.h:525 [inline] ip6_finish_output2+0x995/0xbb0 net/ipv6/ip6_output.c:126 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline] ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224 dst_output include/net/dst.h:450 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508 ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702 addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898 call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421 expire_timers+0x116/0x240 kernel/time/timer.c:1466 __run_timers+0x368/0x410 kernel/time/timer.c:1734 run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747 __do_softirq+0x158/0x2de kernel/softirq.c:558 __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0x37/0x70 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x3e/0xb0 arch/x86/kernel/apic/apic.c:1097 asm_sysvec_apic_timer_interrupt+0x12/0x20 read to 0xffff888139d09484 of 4 bytes by interrupt on cpu 1: __dev_queue_xmit+0x5e3/0xf70 net/core/dev.c:4213 dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265 macvlan_queue_xmit drivers/net/macvlan.c:543 [inline] macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567 __netdev_start_xmit include/linux/netdevice.h:4987 [inline] netdev_start_xmit include/linux/netdevice.h:5001 [inline] xmit_one+0x105/0x2f0 net/core/dev.c:3590 dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606 sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342 __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817 __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194 dev_queue_xmit+0x13/0x20 net/core/dev.c:4259 neigh_resolve_output+0x3db/0x410 net/core/neighbour.c:1523 neigh_output include/net/neighbour.h:527 [inline] ip6_finish_output2+0x9be/0xbb0 net/ipv6/ip6_output.c:126 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline] ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224 dst_output include/net/dst.h:450 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508 ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702 addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898 call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421 expire_timers+0x116/0x240 kernel/time/timer.c:1466 __run_timers+0x368/0x410 kernel/time/timer.c:1734 run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747 __do_softirq+0x158/0x2de kernel/softirq.c:558 __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0x37/0x70 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x8d/0xb0 arch/x86/kernel/apic/apic.c:1097 asm_sysvec_apic_timer_interrupt+0x12/0x20 kcsan_setup_watchpoint+0x94/0x420 kernel/kcsan/core.c:443 folio_test_anon include/linux/page-flags.h:581 [inline] PageAnon include/linux/page-flags.h:586 [inline] zap_pte_range+0x5ac/0x10e0 mm/memory.c:1347 zap_pmd_range mm/memory.c:1467 [inline] zap_pud_range mm/memory.c:1496 [inline] zap_p4d_range mm/memory.c:1517 [inline] unmap_page_range+0x2dc/0x3d0 mm/memory.c:1538 unmap_single_vma+0x157/0x210 mm/memory.c:1583 unmap_vmas+0xd0/0x180 mm/memory.c:1615 exit_mmap+0x23d/0x470 mm/mmap.c:3170 __mmput+0x27/0x1b0 kernel/fork.c:1113 mmput+0x3d/0x50 kernel/fork.c:1134 exit_mm+0xdb/0x170 kernel/exit.c:507 do_exit+0x608/0x17a0 kernel/exit.c:819 do_group_exit+0xce/0x180 kernel/exit.c:929 get_signal+0xfc3/0x1550 kernel/signal.c:2852 arch_do_signal_or_restart+0x8c/0x2e0 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300 do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00000000 -> 0xffffffff Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 28712 Comm: syz-executor.0 Tainted: G W 5.16.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20211130170155.2331929-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-01Revert "net: snmp: add statistics for tcp small queue check"Eric Dumazet
This reverts commit aeeecb889165617a841e939117f9a8095d0e7d80. The new SNMP variable (TCPSmallQueueFailure) can be incremented for good reasons, even on a 100Gbit single TCP_STREAM flow. If we really wanted to ease driver debugging [1], this would require something more sophisticated. [1] Usually, if a driver is delaying TX completions too much, this can lead to stalls in TCP output. Various work arounds have been used in the past, like skb_orphan() in ndo_start_xmit(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Menglong Dong <imagedong@tencent.com> Link: https://lore.kernel.org/r/20211201033246.2826224-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-01net: dsa: replace phylink_get_interfaces() with phylink_get_caps()Russell King (Oracle)
Phylink needs slightly more information than phylink_get_interfaces() allows us to get from the DSA drivers - we need the MAC capabilities. Replace the phylink_get_interfaces() method with phylink_get_caps() to allow DSA drivers to fill in the phylink_config MAC capabilities field as well. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-01kprobes: Limit max data_size of the kretprobe instancesMasami Hiramatsu
The 'kprobe::data_size' is unsigned, thus it can not be negative. But if user sets it enough big number (e.g. (size_t)-8), the result of 'data_size + sizeof(struct kretprobe_instance)' becomes smaller than sizeof(struct kretprobe_instance) or zero. In result, the kretprobe_instance are allocated without enough memory, and kretprobe accesses outside of allocated memory. To avoid this issue, introduce a max limitation of the kretprobe::data_size. 4KB per instance should be OK. Link: https://lkml.kernel.org/r/163836995040.432120.10322772773821182925.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: f47cd9b553aa ("kprobes: kretprobe user entry-handler") Reported-by: zhangyue <zhangyue1@kylinos.cn> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-02Merge tag 'drm-intel-next-2021-11-30' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next drm/i915 feature pull for v5.17: Features and functionality: - Implement per-lane DP drive settings for ICL+ (Ville) - Enable runtime pm autosuspend by default (Tilak Tangudu) - ADL-P DSI support (Vandita) - Add support for pipe C and D DMC firmware (Anusha) - Implement (near)atomic gamma LUT updates via vblank workers (Ville) - Split plane updates to noarm+arm phases (Ville) - Remove the CCS FB stride restrictions on ADL-P (Imre) - Add PSR selective fetch support for biplanar formats (Jouni) - Add support for display audio codec keepalive (Kai) - VRR platform support for display 11 (Manasi) Refactoring and cleanups: - FBC refactoring and cleanups preparing for multiple FBC instances (Ville) - PCH modeset refactoring, move to its own file (Ville) - Refactor and simplify handling of modifiers (Imre) - PXP cleanups (Ville) - Display header and include refactoring (Jani) - Some register macro cleanups (Ville) - Refactor DP HDMI DFP limit code (Ville) Fixes: - Disable DSB usage for now due to incorrect gamma LUT updates (Ville) - Check async flip state of every crtc and plane only once (José) - Fix DPT FB suspend/resume (Imre) - Fix black screen on reboot due to disabled DP++ TMDS output buffers (Ville) - Don't request GMBUS to generate irqs when called while irqs are off (Ville) - Fix type1 DVI DP dual mode adapter heuristics for modern platforms (Ville) - Fix fix integer overflow in 128b/132b data rate calculation (Jani) - Fix bigjoiner state readout (Ville) - Build fix for non-x86 (Siva) - PSR fixes (José, Jouni, Ville) - Disable ADL-P underrun recovery (José) - Fix DP link parameter usage before valid DPCD (Imre) - VRR vblank and frame counter fixes (Ville) - Fix fastsets on TypeC ports following a non-blocking modeset (Imre) - Compiler warning fixes (Nathan Chancellor) - Fix DSI HS mode commands (William Tseng) - Error return fixes (Dan Carpenter) - Update memory bandwidth calculations (Radhakrishna) - Implement WM0 cursor WA for DG2 (Stan) - Fix DSI Double pixelclock on read-back for dual-link panels (Hans de Goede) - HDMI 2.1 PCON FRL configuration fixes (Ankit) Merges: - DP link training delay helpers, via topic branch (Jani) - Backmerge drm-next (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87v909it0t.fsf@intel.com
2021-12-01ACPI: Change acpi_device_always_present() into acpi_device_override_status()Hans de Goede
Currently, acpi_bus_get_status() calls acpi_device_always_present() to allow platform quirks to override the _STA return to report that a device is present (status = ACPI_STA_DEFAULT) independent of the _STA return. In some cases it might also be useful to have the opposite functionality and have a platform quirk which marks a device as not present (status = 0) to work around ACPI table bugs. Change acpi_device_always_present() into a more generic acpi_device_override_status() function to allow this. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-01Merge tag 'sound-5.16-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes. A large series is found for ASoC tegra drivers to correct the control element handlings, while others are mostly for device-specific quirks and fix-ups" * tag 'sound-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits) ALSA: hda/hdmi: fix HDA codec entry table order for ADL-P ALSA: hda: Add Intel DG2 PCI ID and HDMI codec vid ALSA: hda/cs8409: Set PMSG_ON earlier inside cs8409 driver ASoC: SOF: hda: reset DAI widget before reconfiguring it ASoC: cs35l41: Set the max SPI speed for the whole device ALSA: intel-dsp-config: add quirk for CML devices based on ES8336 codec ASoC: Intel: soc-acpi: add entry for ESSX8336 on CML ASoC: rk817: Add module alias for rk817-codec ASoC: soc-acpi: Set mach->id field on comp_ids matches ASoC: tegra: Fix kcontrol put callback in Mixer ASoC: tegra: Fix kcontrol put callback in ADX ASoC: tegra: Fix kcontrol put callback in AMX ASoC: tegra: Fix kcontrol put callback in SFC ASoC: tegra: Fix kcontrol put callback in MVC ASoC: tegra: Fix kcontrol put callback in AHUB ASoC: tegra: Fix kcontrol put callback in DSPK ASoC: tegra: Fix kcontrol put callback in DMIC ASoC: tegra: Fix kcontrol put callback in I2S ASoC: tegra: Fix kcontrol put callback in ADMAIF ASoC: tegra: Fix wrong value type in MVC ...
2021-12-01cgroup: Trace event cgroup id fields should be u64William Kucharski
Various trace event fields that store cgroup IDs were declared as ints, but cgroup_id(() returns a u64 and the structures and associated TP_printk() calls were not updated to reflect this. Fixes: 743210386c03 ("cgroup: use cgrp->kn->id as the cgroup ID") Signed-off-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-12-01drm/ttm: Clarify that the TTM_PL_SYSTEM is under TTMs controlZack Rusin
TTM takes full control over TTM_PL_SYSTEM placed buffers. This makes driver internal usage of TTM_PL_SYSTEM prone to errors because it requires the drivers to manually handle all interactions between TTM which can swap out those buffers whenever it thinks it's the right thing to do and driver. CPU buffers which need to be fenced and shared with accelerators should be placed in driver specific placements that can explicitly handle CPU/accelerator buffer fencing. Currently, apart, from things silently failing nothing is enforcing that requirement which means that it's easy for drivers and new developers to get this wrong. To avoid the confusion we can document this requirement and clarify the solution. This came up during a discussion on dri-devel: https://lore.kernel.org/dri-devel/232f45e9-8748-1243-09bf-56763e6668b3@amd.com Signed-off-by: Zack Rusin <zackr@vmware.com> Cc: Christian König <christian.koenig@amd.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211110145034.487512-1-zackr@vmware.com
2021-12-01cgroup: fix a typo in commentWei Yang
In commit 8699b7762a62 ("cgroup: s/child_subsys_mask/subtree_ss_mask/"), we rename child_subsys_mask to subtree_ss_mask. While it missed to rename this in comment. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-11-30net/mlx5: Fix access to a non-supported registerAya Levin
Validate MRTC register is supported before triggering a delayed work which accesses it. Fixes: 5a1023deeed0 ("net/mlx5: Add periodic update of host time to firmware") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-30rcu: in_irq() cleanupChangbin Du
This commit replaces the obsolete and ambiguous macro in_irq() with its shiny new in_hardirq() equivalent. Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30rcu: Replace ________p1 and _________p1 with __UNIQUE_ID(rcu)Chun-Hung Tseng
This commit replaces both ________p1 and _________p1 with __UNIQUE_ID(rcu), and also adjusts the callers of the affected macros. __UNIQUE_ID(rcu) will generate unique variable names during compilation, which eliminates the need of ________p1 and _________p1 (both having 4 occurrences prior to the code change). This also avoids the variable name shadowing issue, or at least makes those wishing to cause shadowing problems work much harder to do so. The same idea is used for the min/max macros (commit 589a978 and commit e9092d0). Signed-off-by: Jim Huang <jserv@ccns.ncku.edu.tw> Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-01entry: Snapshot thread flagsMark Rutland
Some thread flags can be set remotely, and so even when IRQs are disabled, the flags can change under our feet. Generally this is unlikely to cause a problem in practice, but it is somewhat unsound, and KCSAN will legitimately warn that there is a data race. To avoid such issues, a snapshot of the flags has to be taken prior to using them. Some places already use READ_ONCE() for that, others do not. Convert them all to the new flag accessor helpers. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20211129130653.2037928-3-mark.rutland@arm.com
2021-12-01thread_info: Add helpers to snapshot thread flagsMark Rutland
In <linux/thread_info.h> there are helpers to manipulate individual thread flags, but where code wants to check several flags at once, it must open code reading current_thread_info()->flags and operating on a snapshot. As some flags can be set remotely it's necessary to use READ_ONCE() to get a consistent snapshot even when IRQs are disabled, but some code forgets to do this. Generally this is unlike to cause a problem in practice, but it is somewhat unsound, and KCSAN will legitimately warn that there is a data race. To make it easier to do the right thing, and to highlight that concurrent modification is possible, add new helpers to snapshot the flags, which should be used in preference to plain reads. Subsequent patches will move existing code to use the new helpers. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marco Elver <elver@google.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211129130653.2037928-2-mark.rutland@arm.com
2021-11-30bpf: Add bpf_loop helperJoanne Koong
This patch adds the kernel-side and API changes for a new helper function, bpf_loop: long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx, u64 flags); where long (*callback_fn)(u32 index, void *ctx); bpf_loop invokes the "callback_fn" **nr_loops** times or until the callback_fn returns 1. The callback_fn can only return 0 or 1, and this is enforced by the verifier. The callback_fn index is zero-indexed. A few things to please note: ~ The "u64 flags" parameter is currently unused but is included in case a future use case for it arises. ~ In the kernel-side implementation of bpf_loop (kernel/bpf/bpf_iter.c), bpf_callback_t is used as the callback function cast. ~ A program can have nested bpf_loop calls but the program must still adhere to the verifier constraint of its stack depth (the stack depth cannot exceed MAX_BPF_STACK)) ~ Recursive callback_fns do not pass the verifier, due to the call stack for these being too deep. ~ The next patch will include the tests and benchmark Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211130030622.4131246-2-joannekoong@fb.com
2021-11-30bpf: Make sure bpf_disable_instrumentation() is safe vs preemption.Sebastian Andrzej Siewior
The initial implementation of migrate_disable() for mainline was a wrapper around preempt_disable(). RT kernels substituted this with a real migrate disable implementation. Later on mainline gained true migrate disable support, but neither documentation nor affected code were updated. Remove stale comments claiming that migrate_disable() is PREEMPT_RT only. Don't use __this_cpu_inc() in the !PREEMPT_RT path because preemption is not disabled and the RMW operation can be preempted. Fixes: 74d862b682f51 ("sched: Make migrate_disable/enable() independent of RT") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211127163200.10466-3-bigeasy@linutronix.de
2021-11-30devlink: Simplify devlink resources unregister callLeon Romanovsky
The devlink_resources_unregister() used second parameter as an entry point for the recursive removal of devlink resources. None of the callers outside of devlink core needed to use this field, so let's remove it. As part of this removal, the "struct devlink_resource" was moved from .h to .c file as it is not possible to use in any place in the code except devlink.c. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-30Bonding: add arp_missed_max optionHangbin Liu
Currently, we use hard code number to verify if we are in the arp_interval timeslice. But some user may want to reduce/extend the verify timeslice. With the similar team option 'missed_max' the uers could change that number based on their own environment. Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-30net: stmmac: Add platform level debug register dump featureBhupesh Sharma
dwmac-qcom-ethqos currently exposes a mechanism to dump rgmii registers after the 'stmmac_dvr_probe()' returns. However with commit 5ec55823438e ("net: stmmac: add clocks management for gmac driver"), we now let 'pm_runtime_put()' disable the clocks before returning from 'stmmac_dvr_probe()'. This causes a crash when 'rgmii_dump()' register dumps are enabled, as the clocks are already off. Since other dwmac drivers (possible future users as well) might require a similar register dump feature, introduce a platform level callback to allow the same. This fixes the crash noticed while enabling rgmii_dump() dumps in dwmac-qcom-ethqos driver as well. It also allows future changes to keep a invoking the register dump callback from the correct place inside 'stmmac_dvr_probe()'. Fixes: 5ec55823438e ("net: stmmac: add clocks management for gmac driver") Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-30drm/cma-helper: Pass GEM CMA object in public interfacesThomas Zimmermann
Change all GEM CMA object functions that receive a GEM object of type struct drm_gem_object to expect an object of type struct drm_gem_cma_object instead. This change reduces the number of upcasts from struct drm_gem_object by moving them into callers. The C compiler can now verify that the GEM CMA functions are called with the correct type. For consistency, the patch also renames drm_gem_cma_free_object to drm_gem_cma_free. It further updates documentation for a number of functions. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20211115120148.21766-4-tzimmermann@suse.de
2021-11-30drm/cma-helper: Export dedicated wrappers for GEM object functionsThomas Zimmermann
Wrap GEM CMA functions for struct drm_gem_object_funcs and update all callers. This will allow for an update of the public interfaces of the GEM CMA helper library. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20211115120148.21766-3-tzimmermann@suse.de
2021-11-30drm/cma-helper: Move driver and file ops to the end of headerThomas Zimmermann
Restructure the header file for CMA helpers by moving declarations for driver and file operations to the end of the file. No functional changes. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211115120148.21766-2-tzimmermann@suse.de
2021-11-30drm: Declare hashtable as legacyThomas Zimmermann
The DRM hashtable code is only used by internal functions for legacy UMS drivers. Move the implementation behind CONFIG_DRM_LEGACY and the declarations into legacy header files. Unexport the symbols. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211129094841.22499-4-tzimmermann@suse.de
2021-11-30drm/ttm: Don't include drm_hashtab.hThomas Zimmermann
Remove the include statement for drm_hashtab.h. It's not required by TTM. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211129094841.22499-2-tzimmermann@suse.de
2021-11-29siphash: use _unaligned version by defaultArnd Bergmann
On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because the ordinary load/store instructions (ldr, ldrh, ldrb) can tolerate any misalignment of the memory address. However, load/store double and load/store multiple instructions (ldrd, ldm) may still only be used on memory addresses that are 32-bit aligned, and so we have to use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we may end up with a severe performance hit due to alignment traps that require fixups by the kernel. Testing shows that this currently happens with clang-13 but not gcc-11. In theory, any compiler version can produce this bug or other problems, as we are dealing with undefined behavior in C99 even on architectures that support this in hardware, see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363. Fortunately, the get_unaligned() accessors do the right thing: when building for ARMv6 or later, the compiler will emit unaligned accesses using the ordinary load/store instructions (but avoid the ones that require 32-bit alignment). When building for older ARM, those accessors will emit the appropriate sequence of ldrb/mov/orr instructions. And on architectures that can truly tolerate any kind of misalignment, the get_unaligned() accessors resolve to the leXX_to_cpup accessors that operate on aligned addresses. Since the compiler will in fact emit ldrd or ldm instructions when building this code for ARM v6 or later, the solution is to use the unaligned accessors unconditionally on architectures where this is known to be fast. The _aligned version of the hash function is however still needed to get the best performance on architectures that cannot do any unaligned access in hardware. This new version avoids the undefined behavior and should produce the fastest hash on all architectures we support. Link: https://lore.kernel.org/linux-arm-kernel/20181008211554.5355-4-ard.biesheuvel@linaro.org/ Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEAqT=8dQfa0=cqQ@mail.gmail.com/ Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Fixes: 2c956a60778c ("siphash: add cryptographically secure PRF") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29wireguard: device: reset peer src endpoint when netns exitsJason A. Donenfeld
Each peer's endpoint contains a dst_cache entry that takes a reference to another netdev. When the containing namespace exits, we take down the socket and prevent future sockets from being created (by setting creating_net to NULL), which removes that potential reference on the netns. However, it doesn't release references to the netns that a netdev cached in dst_cache might be taking, so the netns still might fail to exit. Since the socket is gimped anyway, we can simply clear all the dst_caches (by way of clearing the endpoint src), which will release all references. However, the current dst_cache_reset function only releases those references lazily. But it turns out that all of our usages of wg_socket_clear_peer_endpoint_src are called from contexts that are not exactly high-speed or bottle-necked. For example, when there's connection difficulty, or when userspace is reconfiguring the interface. And in particular for this patch, when the netns is exiting. So for those cases, it makes more sense to call dst_release immediately. For that, we add a small helper function to dst_cache. This patch also adds a test to netns.sh from Hangbin Liu to ensure this doesn't regress. Tested-by: Hangbin Liu <liuhangbin@gmail.com> Reported-by: Xiumei Mu <xmu@redhat.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Paolo Abeni <pabeni@redhat.com> Fixes: 900575aa33a3 ("wireguard: device: avoid circular netns references") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29ipv6: fix memory leak in fib6_rule_suppressmsizanoen1
The kernel leaks memory when a `fib` rule is present in IPv6 nftables firewall rules and a suppress_prefix rule is present in the IPv6 routing rules (used by certain tools such as wg-quick). In such scenarios, every incoming packet will leak an allocation in `ip6_dst_cache` slab cache. After some hours of `bpftrace`-ing and source code reading, I tracked down the issue to ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule"). The problem with that change is that the generic `args->flags` always have `FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag `RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not decreasing the refcount when needed. How to reproduce: - Add the following nftables rule to a prerouting chain: meta nfproto ipv6 fib saddr . mark . iif oif missing drop This can be done with: sudo nft create table inet test sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }' sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop - Run: sudo ip -6 rule add table main suppress_prefixlength 0 - Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase with every incoming ipv6 packet. This patch exposes the protocol-specific flags to the protocol specific `suppress` function, and check the protocol-specific `flags` argument for RT6_LOOKUP_F_DST_NOREF instead of the generic FIB_LOOKUP_NOREF when decreasing the refcount, like this. [1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71 [2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99 Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105 Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29net: snmp: add statistics for tcp small queue checkMenglong Dong
Once tcp small queue check failed in tcp_small_queue_check(), the throughput of tcp will be limited, and it's hard to distinguish whether it is out of tcp congestion control. Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29net: dsa: ocelot: felix: utilize shared mscc-miim driver for indirect MDIO ↵Colin Foster
access Switch to a shared MDIO access implementation by way of the mdio-mscc-miim driver. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29net: vxlan: add macro definition for number of IANA VXLAN-GPE portHao Chen
Add macro definition for number of IANA VXLAN-GPE port for generic use. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29tcp: fix page frag corruption on page faultPaolo Abeni
Steffen reported a TCP stream corruption for HTTP requests served by the apache web-server using a cifs mount-point and memory mapping the relevant file. The root cause is quite similar to the one addressed by commit 20eb4f29b602 ("net: fix sk_page_frag() recursion from memory reclaim"). Here the nested access to the task page frag is caused by a page fault on the (mmapped) user-space memory buffer coming from the cifs file. The page fault handler performs an smb transaction on a different socket, inside the same process context. Since sk->sk_allaction for such socket does not prevent the usage for the task_frag, the nested allocation modify "under the hood" the page frag in use by the outer sendmsg call, corrupting the stream. The overall relevant stack trace looks like the following: httpd 78268 [001] 3461630.850950: probe:tcp_sendmsg_locked: ffffffff91461d91 tcp_sendmsg_locked+0x1 ffffffff91462b57 tcp_sendmsg+0x27 ffffffff9139814e sock_sendmsg+0x3e ffffffffc06dfe1d smb_send_kvec+0x28 [...] ffffffffc06cfaf8 cifs_readpages+0x213 ffffffff90e83c4b read_pages+0x6b ffffffff90e83f31 __do_page_cache_readahead+0x1c1 ffffffff90e79e98 filemap_fault+0x788 ffffffff90eb0458 __do_fault+0x38 ffffffff90eb5280 do_fault+0x1a0 ffffffff90eb7c84 __handle_mm_fault+0x4d4 ffffffff90eb8093 handle_mm_fault+0xc3 ffffffff90c74f6d __do_page_fault+0x1ed ffffffff90c75277 do_page_fault+0x37 ffffffff9160111e page_fault+0x1e ffffffff9109e7b5 copyin+0x25 ffffffff9109eb40 _copy_from_iter_full+0xe0 ffffffff91462370 tcp_sendmsg_locked+0x5e0 ffffffff91462370 tcp_sendmsg_locked+0x5e0 ffffffff91462b57 tcp_sendmsg+0x27 ffffffff9139815c sock_sendmsg+0x4c ffffffff913981f7 sock_write_iter+0x97 ffffffff90f2cc56 do_iter_readv_writev+0x156 ffffffff90f2dff0 do_iter_write+0x80 ffffffff90f2e1c3 vfs_writev+0xa3 ffffffff90f2e27c do_writev+0x5c ffffffff90c042bb do_syscall_64+0x5b ffffffff916000ad entry_SYSCALL_64_after_hwframe+0x65 The cifs filesystem rightfully sets sk_allocations to GFP_NOFS, we can avoid the nesting using the sk page frag for allocation lacking the __GFP_FS flag. Do not define an additional mm-helper for that, as this is strictly tied to the sk page frag usage. v1 -> v2: - use a stricted sk_page_frag() check instead of reordering the code (Eric) Reported-by: Steffen Froemer <sfroemer@redhat.com> Fixes: 5640f7685831 ("net: use a per task frag allocator") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-29drm/virtgpu api: define a dummy fence signaled eventGurchetan Singh
The current virtgpu implementation of poll(..) drops events when VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK is enabled (otherwise it's like a normal DRM driver). This is because paravirtualized userspaces receives responses in a buffer of type BLOB_MEM_GUEST, not by read(..). To be in line with other DRM drivers and avoid specialized behavior, it is possible to define a dummy event for virtgpu. Paravirtualized userspace will now have to call read(..) on the DRM fd to receive the dummy event. Fixes: b10790434cf2 ("drm/virtgpu api: create context init feature") Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20211122232210.602-2-gurchetansingh@google.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2021-11-28ieee80211: change HE nominal packet padding value definesMiri Korenblit
It's easier to use and understand, and to extend for EHT later, if we use the values here instead of the shifted values. Unfortunately, we need to add _POS so that we can use it in places like iwlwifi/mvm where constants are needed. While at it, fix the typo ("NOMIMAL") which also helps catch any conflicts. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://lore.kernel.org/r/20211126104817.7c29a05b8eb5.I2ca9faf06e177e3035bec91e2ae53c2f91d41774@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-28Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull vhost,virtio,vdpa bugfixes from Michael Tsirkin: "Misc fixes all over the place. Revert of virtio used length validation series: the approach taken does not seem to work, breaking too many guests in the process. We'll need to do length validation using some other approach" [ This merge also ends up reverting commit f7a36b03a732 ("vsock/virtio: suppress used length validation"), which came in through the networking tree in the meantime, and was part of that whole used length validation series - Linus ] * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa_sim: avoid putting an uninitialized iova_domain vhost-vdpa: clean irqs before reseting vdpa device virtio-blk: modify the value type of num in virtio_queue_rq() vhost/vsock: cleanup removing `len` variable vhost/vsock: fix incorrect used length reported to the guest Revert "virtio_ring: validate used buffer length" Revert "virtio-net: don't let virtio core to validate used length" Revert "virtio-blk: don't let virtio core to validate used length" Revert "virtio-scsi: don't let virtio core to validate used buffer length"
2021-11-28drm/dp: Add macro to check max_downspread capabilitySankeerth Billakanti
Add a macro to check for the max_downspread capability in drm_dp_helper. Signed-off-by: Sankeerth Billakanti <quic_sbillaka@quicinc.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> changes in v4: - Return 1 for DPCD version >= v1.1 (Stephen Boyd) Link: https://lore.kernel.org/r/1635839325-401-4-git-send-email-quic_sbillaka@quicinc.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-27Merge tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client fixes from Trond Myklebust: "Highlights include: Stable fixes: - NFSv42: Fix pagecache invalidation after COPY/CLONE Bugfixes: - NFSv42: Don't fail clone() just because the server failed to return post-op attributes - SUNRPC: use different lockdep keys for INET6 and LOCAL - NFSv4.1: handle NFS4ERR_NOSPC from CREATE_SESSION - SUNRPC: fix header include guard in trace header" * tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: use different lock keys for INET6 and LOCAL sunrpc: fix header include guard in trace header NFSv4.1: handle NFS4ERR_NOSPC by CREATE_SESSION NFSv42: Fix pagecache invalidation after COPY/CLONE NFS: Add a tracepoint to show the results of nfs_set_cache_invalid() NFSv42: Don't fail clone() unless the OP_CLONE operation failed
2021-11-27drm: Decouple nomodeset from CONFIG_VGA_CONSOLEJavier Martinez Canillas
This relationship was only for historical reasons and the nomodeset option should be available even on platforms that don't enable CONFIG_VGA_CONSOLE. Suggested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211112133230.1595307-5-javierm@redhat.com
2021-11-27drm: Move nomodeset kernel parameter to the DRM subsystemJavier Martinez Canillas
The "nomodeset" kernel cmdline parameter is handled by the vgacon driver but the exported vgacon_text_force() symbol is only used by DRM drivers. It makes much more sense for the parameter logic to be in the subsystem of the drivers that are making use of it. Let's move the vgacon_text_force() function and related logic to the DRM subsystem. While doing that, rename it to drm_firmware_drivers_only() and make it return true if "nomodeset" was used and false otherwise. This is a better description of the condition that the drivers are testing for. Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211112133230.1595307-4-javierm@redhat.com
2021-11-26af_unix: Replace the big lock with small locks.Kuniyuki Iwashima
The hash table of AF_UNIX sockets is protected by the single lock. This patch replaces it with per-hash locks. The effect is noticeable when we handle multiple sockets simultaneously. Here is a test result on an EC2 c5.24xlarge instance. It shows latency (under 10us only) in unix_insert_unbound_socket() while 64 CPUs creating 1024 sockets for each in parallel. Without this patch: nsec : count distribution 0 : 179 | | 500 : 3021 |********* | 1000 : 6271 |******************* | 1500 : 6318 |******************* | 2000 : 5828 |***************** | 2500 : 5124 |*************** | 3000 : 4426 |************* | 3500 : 3672 |*********** | 4000 : 3138 |********* | 4500 : 2811 |******** | 5000 : 2384 |******* | 5500 : 2023 |****** | 6000 : 1954 |***** | 6500 : 1737 |***** | 7000 : 1749 |***** | 7500 : 1520 |**** | 8000 : 1469 |**** | 8500 : 1394 |**** | 9000 : 1232 |*** | 9500 : 1138 |*** | 10000 : 994 |*** | With this patch: nsec : count distribution 0 : 1634 |**** | 500 : 13170 |****************************************| 1000 : 13156 |*************************************** | 1500 : 9010 |*************************** | 2000 : 6363 |******************* | 2500 : 4443 |************* | 3000 : 3240 |********* | 3500 : 2549 |******* | 4000 : 1872 |***** | 4500 : 1504 |**** | 5000 : 1247 |*** | 5500 : 1035 |*** | 6000 : 889 |** | 6500 : 744 |** | 7000 : 634 |* | 7500 : 498 |* | 8000 : 433 |* | 8500 : 355 |* | 9000 : 336 |* | 9500 : 284 | | 10000 : 243 | | Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26af_unix: Save hash in sk_hash.Kuniyuki Iwashima
To replace unix_table_lock with per-hash locks in the next patch, we need to save a hash in each socket because /proc/net/unix or BPF prog iterate sockets while holding a hash table lock and release it later in a different function. Currently, we store a real/pseudo hash in struct unix_address. However, we do not allocate it to unbound sockets, nor should we do just for that. For this purpose, we can use sk_hash. Then, we no longer use the hash field in struct unix_address and can remove it. Also, this patch does - rename unix_insert_socket() to unix_insert_unbound_socket() - remove the redundant list argument from __unix_insert_socket() and unix_insert_unbound_socket() - use 'unsigned int' instead of 'unsigned' in __unix_set_addr_hash() - remove 'inline' from unix_remove_socket() and unix_insert_unbound_socket(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ipa/ipa_main.c 8afc7e471ad3 ("net: ipa: separate disabling setup from modem stop") 76b5fbcd6b47 ("net: ipa: kill ipa_modem_init()") Duplicated include, drop one. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26Merge tag 'net-5.16-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from netfilter. Current release - regressions: - r8169: fix incorrect mac address assignment - vlan: fix underflow for the real_dev refcnt when vlan creation fails - smc: avoid warning of possible recursive locking Current release - new code bugs: - vsock/virtio: suppress used length validation - neigh: fix crash in v6 module initialization error path Previous releases - regressions: - af_unix: fix change in behavior in read after shutdown - igb: fix netpoll exit with traffic, avoid warning - tls: fix splice_read() when starting mid-record - lan743x: fix deadlock in lan743x_phy_link_status_change() - marvell: prestera: fix bridge port operation Previous releases - always broken: - tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows - nexthop: fix refcount issues when replacing IPv6 groups - nexthop: fix null pointer dereference when IPv6 is not enabled - phylink: force link down and retrigger resolve on interface change - mptcp: fix delack timer length calculation and incorrect early clearing - ieee802154: handle iftypes as u32, prevent shift-out-of-bounds - nfc: virtual_ncidev: change default device permissions - netfilter: ctnetlink: fix error codes and flags used for kernel side filtering of dumps - netfilter: flowtable: fix IPv6 tunnel addr match - ncsi: align payload to 32-bit to fix dropped packets - iavf: fix deadlock and loss of config during VF interface reset - ice: avoid bpf_prog refcount underflow - ocelot: fix broken PTP over IP and PTP API violations Misc: - marvell: mvpp2: increase MTU limit when XDP enabled" * tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits) net: dsa: microchip: implement multi-bridge support net: mscc: ocelot: correctly report the timestamping RX filters in ethtool net: mscc: ocelot: set up traps for PTP packets net: ptp: add a definition for the UDP port for IEEE 1588 general messages net: mscc: ocelot: create a function that replaces an existing VCAP filter net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP net: hns3: fix incorrect components info of ethtool --reset command net: hns3: fix one incorrect value of page pool info when queried by debugfs net: hns3: add check NULL address for page pool net: hns3: fix VF RSS failed problem after PF enable multi-TCs net: qed: fix the array may be out of bound net/smc: Don't call clcsock shutdown twice when smc shutdown net: vlan: fix underflow for the real_dev refcnt ptp: fix filter names in the documentation ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce() nfc: virtual_ncidev: change default device permissions net/sched: sch_ets: don't peek at classes beyond 'nbands' net: stmmac: Disable Tx queues when reconfiguring the interface selftests: tls: test for correct proto_ops tls: fix replacing proto_ops ...
2021-11-26Merge tag 'acpi-5.16-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix a NULL pointer dereference in the CPPC library code and a locking issue related to printing the names of ACPI device nodes in the device properties framework. Specifics: - Fix NULL pointer dereference in the CPPC library code occuring on hybrid systems without CPPC support (Rafael Wysocki). - Avoid attempts to acquire a semaphore with interrupts off when printing the names of ACPI device nodes and clean up code on top of that fix (Sakari Ailus)" * tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: CPPC: Add NULL pointer check to cppc_get_perf() ACPI: Make acpi_node_get_parent() local ACPI: Get acpi_device's parent from the parent field
2021-11-26net: ptp: add a definition for the UDP port for IEEE 1588 general messagesVladimir Oltean
As opposed to event messages (Sync, PdelayReq etc) which require timestamping, general messages (Announce, FollowUp etc) do not. In PTP they are part of different streams of data. IEEE 1588-2008 Annex D.2 "UDP port numbers" states that the UDP destination port assigned by IANA is 319 for event messages, and 320 for general messages. Yet the kernel seems to be missing the definition for general messages. This patch adds it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26net: mscc: ocelot: create a function that replaces an existing VCAP filterVladimir Oltean
VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind tc flower offload on ocelot, among other things. The ingress port mask on which VCAP rules match is present as a bit field in the actual key of the rule. This means that it is possible for a rule to be shared among multiple source ports. When the rule is added one by one on each desired port, that the ingress port mask of the key must be edited and rewritten to hardware. But the API in ocelot_vcap.c does not allow for this. For one thing, ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric, because ocelot_vcap_filter_add() works with a preallocated and prepopulated filter and programs it to hardware, and ocelot_vcap_filter_del() does both the job of removing the specified filter from hardware, as well as kfreeing it. That is to say, the only option of editing a filter in place, which is to delete it, modify the structure and add it back, does not work because it results in use-after-free. This patch introduces ocelot_vcap_filter_replace, which trivially reprograms a VCAP entry to hardware, at the exact same index at which it existed before, without modifying any list or allocating any memory. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26Merge tag 'for-linus-5.16c-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - Kconfig fix to make it possible to control building of the privcmd driver - three fixes for issues identified by the kernel test robot - a five-patch series to simplify timeout handling for Xen PV driver initialization - two patches to fix error paths in xenstore/xenbus driver initialization * tag 'for-linus-5.16c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: make HYPERVISOR_set_debugreg() always_inline xen: make HYPERVISOR_get_debugreg() always_inline xen: detect uninitialized xenbus in xenbus_init xen: flag xen_snd_front to be not essential for system boot xen: flag pvcalls-front to be not essential for system boot xen: flag hvc_xen to be not essential for system boot xen: flag xen_drm_front to be not essential for system boot xen: add "not_essential" flag to struct xenbus_driver xen/pvh: add missing prototype to header xen: don't continue xenstore initialization in case of errors xen/privcmd: make option visible in Kconfig