summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2019-08-09clk: core: introduce clk_hw_set_parent()Neil Armstrong
Introduce the clk_hw_set_parent() provider call to change parent of a clock by using the clk_hw pointers. This eases the clock reparenting from clock rate notifiers and implementing DVFS with simpler code avoiding the boilerplates functions as __clk_lookup(clk_hw_get_name()) then clk_set_parent(). Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
2019-08-08net/tls: prevent skb_orphan() from leaking TLS plain text with offloadJakub Kicinski
sk_validate_xmit_skb() and drivers depend on the sk member of struct sk_buff to identify segments requiring encryption. Any operation which removes or does not preserve the original TLS socket such as skb_orphan() or skb_clone() will cause clear text leaks. Make the TCP socket underlying an offloaded TLS connection mark all skbs as decrypted, if TLS TX is in offload mode. Then in sk_validate_xmit_skb() catch skbs which have no socket (or a socket with no validation) and decrypted flag set. Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and sk->sk_validate_xmit_skb are slightly interchangeable right now, they all imply TLS offload. The new checks are guarded by CONFIG_TLS_DEVICE because that's the option guarding the sk_buff->decrypted member. Second, smaller issue with orphaning is that it breaks the guarantee that packets will be delivered to device queues in-order. All TLS offload drivers depend on that scheduling property. This means skb_orphan_partial()'s trick of preserving partial socket references will cause issues in the drivers. We need a full orphan, and as a result netem delay/throttling will cause all TLS offload skbs to be dropped. Reusing the sk_buff->decrypted flag also protects from leaking clear text when incoming, decrypted skb is redirected (e.g. by TC). See commit 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect through ULP") for justification why the internal flag is safe. The only location which could leak the flag in is tcp_bpf_sendmsg(), which is taken care of by clearing the previously unused bit. v2: - remove superfluous decrypted mark copy (Willem); - remove the stale doc entry (Boris); - rely entirely on EOR marking to prevent coalescing (Boris); - use an internal sendpages flag instead of marking the socket (Boris). v3 (Willem): - reorganize the can_skb_orphan_partial() condition; - fix the flag leak-in through tcp_bpf_sendmsg. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08inet: frags: re-introduce skb coalescing for local deliveryGuillaume Nault
Before commit d4289fcc9b16 ("net: IP6 defrag: use rbtrees for IPv6 defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus generating many fragments) and running over an IPsec tunnel, reported more than 6Gbps throughput. After that patch, the same test gets only 9Mbps when receiving on a be2net nic (driver can make a big difference here, for example, ixgbe doesn't seem to be affected). By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing (IPv4 fragment coalescing was dropped by commit 14fe22e33462 ("Revert "ipv4: use skb coalescing in defragmentation"")). Without fragment coalescing, be2net runs out of Rx ring entries and starts to drop frames (ethtool reports rx_drops_no_frags errors). Since the netperf traffic is only composed of UDP fragments, any lost packet prevents reassembly of the full datagram. Therefore, fragments which have no possibility to ever get reassembled pile up in the reassembly queue, until the memory accounting exeeds the threshold. At that point no fragment is accepted anymore, which effectively discards all netperf traffic. When reassembly timeout expires, some stale fragments are removed from the reassembly queue, so a few packets can be received, reassembled and delivered to the netperf receiver. But the nic still drops frames and soon the reassembly queue gets filled again with stale fragments. These long time frames where no datagram can be received explain why the performance drop is so significant. Re-introducing fragment coalescing is enough to get the initial performances again (6.6Gbps with be2net): driver doesn't drop frames anymore (no more rx_drops_no_frags errors) and the reassembly engine works at full speed. This patch is quite conservative and only coalesces skbs for local IPv4 and IPv6 delivery (in order to avoid changing skb geometry when forwarding). Coalescing could be extended in the future if need be, as more scenarios would probably benefit from it. [0]: Test configuration Sender: ip xfrm policy flush ip xfrm state flush ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1 ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1 ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow netserver -D -L fc00:2::1 Receiver: ip xfrm policy flush ip xfrm state flush ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1 ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1 ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6 Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08net/mlx5e: kTLS, Fix progress params context WQE layoutTariq Toukan
The TLS progress params context WQE should not include an Eth segment, drop it. In addition, align the tls_progress_params layout with the HW specification document: - fix the tisn field name. - remove the valid bit. Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures") Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-08net/mlx5: kTLS, Fix wrong TIS opmod constantsTariq Toukan
Fix the used constants for TLS TIS opmods, per the HW specification. Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-08auxdisplay: charlcd: move charlcd.h to drivers/auxdisplayMasahiro Yamada
This header is included in drivers/auxdisplay/. Make it a local header. Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2019-08-08efi: Export Runtime Configuration Interface table to sysfsNarendra K
System firmware advertises the address of the 'Runtime Configuration Interface table version 2 (RCI2)' via an EFI Configuration Table entry. This code retrieves the RCI2 table from the address and exports it to sysfs as a binary attribute 'rci2' under /sys/firmware/efi/tables directory. The approach adopted is similar to the attribute 'DMI' under /sys/firmware/dmi/tables. RCI2 table contains BIOS HII in XML format and is used to populate BIOS setup page in Dell EMC OpenManage Server Administrator tool. The BIOS setup page contains BIOS tokens which can be configured. Signed-off-by: Narendra K <Narendra.K@dell.com> Reviewed-by: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-08-08efi: ia64: move SAL systab handling out of generic EFI codeArd Biesheuvel
The SAL systab is an Itanium specific EFI configuration table, so move its handling into arch/ia64 where it belongs. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-08-08efi/x86: move UV_SYSTAB handling into arch/x86Ard Biesheuvel
The SGI UV UEFI machines are tightly coupled to the x86 architecture so there is no need to keep any awareness of its existence in the generic EFI layer, especially since we already have the infrastructure to handle arch-specific configuration tables, and were even already using it to some extent. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-08-08efi: x86: move efi_is_table_address() into arch/x86Ard Biesheuvel
The function efi_is_table_address() and the associated array of table pointers is specific to x86. Since we will be adding some more x86 specific tables, let's move this code out of the generic code first. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-08-08mutex: Fix up mutex_waiter usagePeter Zijlstra
The patch moving bits into mutex.c was a little too much; by also moving struct mutex_waiter a few less common CONFIGs would no longer build. Fixes: 5f35d5a66b3e ("locking/mutex: Make __mutex_owner static to mutex.c") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2019-08-08Merge tag 'drm-fixes-5.3-2019-08-07' of ↵Dave Airlie
git://people.freedesktop.org/~agd5f/linux into drm-fixes drm-fixes-5.3-2019-08-07: amdgpu: - Fixes VCN to handle the latest navi10 firmware - Fixes for fan control on navi10 - Properly handle SMU metrics table on navi10 - Fix a resume regression on Stoney amdkfd: - Revert new GWS ioctl. It's not ready. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190807184221.3323-1-alexander.deucher@amd.com
2019-08-07Revert "drm/amdkfd: New IOCTL to allocate queue GWS"Alex Deucher
This reverts commit 1a058c3376765ee31d65e28cbbb9d4ff15120056. This interface is still in too much flux. Revert until it's sorted out. Acked-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-07irqdomain/debugfs: Use PAs to generate fwnode namesMarc Zyngier
Booting a large arm64 server (HiSi D05) leads to the following shouting at boot time: [ 20.722132] debugfs: File 'irqchip@(____ptrval____)-3' in directory 'domains' already present! [ 20.730851] debugfs: File 'irqchip@(____ptrval____)-3' in directory 'domains' already present! [ 20.739560] debugfs: File 'irqchip@(____ptrval____)-3' in directory 'domains' already present! [ 20.748267] debugfs: File 'irqchip@(____ptrval____)-3' in directory 'domains' already present! [ 20.756975] debugfs: File 'irqchip@(____ptrval____)-3' in directory 'domains' already present! [ 20.765683] debugfs: File 'irqchip@(____ptrval____)-3' in directory 'domains' already present! [ 20.774391] debugfs: File 'irqchip@(____ptrval____)-3' in directory 'domains' already present! and many more... Evidently, we expect something a bit more informative than ____ptrval____, and certainly we want all of our domains, not just the first one. For that, turn the %p used to generate the fwnode name into something that won't be repainted (%pa). Given that we've now fixed all users to pass a pointer to a PA, it will actually do the right thing. Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-07error-injection: Consolidate override function definitionLeo Yan
The function override_function_with_return() is defined separately for each architecture and every architecture's definition is almost same with each other. E.g. x86 and powerpc both define function in its own asm/error-injection.h header and override_function_with_return() has the same definition, the only difference is that x86 defines an extra function just_return_func() but it is specific for x86 and is only used by x86's override_function_with_return(), so don't need to export this function. This patch consolidates override_function_with_return() definition into asm-generic/error-injection.h header, thus all architectures can use the common definition. As result, the architecture specific headers are removed; the include/linux/error-injection.h header also changes to include asm-generic/error-injection.h header rather than architecture header, furthermore, it includes linux/compiler.h for successful compilation. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: "Yeah I should have sent a pull request last week, so there is a lot more here than usual: 1) Fix memory leak in ebtables compat code, from Wenwen Wang. 2) Several kTLS bug fixes from Jakub Kicinski (circular close on disconnect etc.) 3) Force slave speed check on link state recovery in bonding 802.3ad mode, from Thomas Falcon. 4) Clear RX descriptor bits before assigning buffers to them in stmmac, from Jose Abreu. 5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF loops, from Nishka Dasgupta. 6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean. 7) Need to hold sock across skb->destructor invocation, from Cong Wang. 8) IP header length needs to be validated in ipip tunnel xmit, from Haishuang Yan. 9) Use after free in ip6 tunnel driver, also from Haishuang Yan. 10) Do not use MSI interrupts on r8169 chips before RTL8168d, from Heiner Kallweit. 11) Upon bridge device init failure, we need to delete the local fdb. From Nikolay Aleksandrov. 12) Handle erros from of_get_mac_address() properly in stmmac, from Martin Blumenstingl. 13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef Kadlecsik. 14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with some devices, so revert. From Johannes Berg. 15) Fix deadlock in rxrpc, from David Howells. 16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue Haibing. 17) Fix mvpp2 crash on module removal, from Matteo Croce. 18) Fix race in genphy_update_link, from Heiner Kallweit. 19) bpf_xdp_adjust_head() stopped working with generic XDP when we fixes generic XDP to support stacked devices properly, fix from Jesper Dangaard Brouer. 20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from David Ahern. 21) Several memory leaks in new sja1105 driver, from Vladimir Oltean" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits) net: dsa: sja1105: Fix memory leak on meta state machine error path net: dsa: sja1105: Fix memory leak on meta state machine normal path net: dsa: sja1105: Really fix panic on unregistering PTP clock net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well net: dsa: sja1105: Fix broken learning with vlan_filtering disabled net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus() net: sched: sample: allow accessing psample_group with rtnl net: sched: police: allow accessing police->params with rtnl net: hisilicon: Fix dma_map_single failed on arm64 net: hisilicon: fix hip04-xmit never return TX_BUSY net: hisilicon: make hip04_tx_reclaim non-reentrant tc-testing: updated vlan action tests with batch create/delete net sched: update vlan action for batched events operations net: stmmac: tc: Do not return a fragment entry net: stmmac: Fix issues when number of Queues >= 4 net: stmmac: xgmac: Fix XGMAC selftests be2net: disable bh with spin_lock in be_process_mcc net: cxgb3_main: Fix a resource leak in a error path in 'init_one()' net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER ...
2019-08-06net: sched: sample: allow accessing psample_group with rtnlVlad Buslov
Recently implemented support for sample action in flow_offload infra leads to following rcu usage warning: [ 1938.234856] ============================= [ 1938.234858] WARNING: suspicious RCU usage [ 1938.234863] 5.3.0-rc1+ #574 Not tainted [ 1938.234866] ----------------------------- [ 1938.234869] include/net/tc_act/tc_sample.h:47 suspicious rcu_dereference_check() usage! [ 1938.234872] other info that might help us debug this: [ 1938.234875] rcu_scheduler_active = 2, debug_locks = 1 [ 1938.234879] 1 lock held by tc/19540: [ 1938.234881] #0: 00000000b03cb918 (rtnl_mutex){+.+.}, at: tc_new_tfilter+0x47c/0x970 [ 1938.234900] stack backtrace: [ 1938.234905] CPU: 2 PID: 19540 Comm: tc Not tainted 5.3.0-rc1+ #574 [ 1938.234908] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 1938.234911] Call Trace: [ 1938.234922] dump_stack+0x85/0xc0 [ 1938.234930] tc_setup_flow_action+0xed5/0x2040 [ 1938.234944] fl_hw_replace_filter+0x11f/0x2e0 [cls_flower] [ 1938.234965] fl_change+0xd24/0x1b30 [cls_flower] [ 1938.234990] tc_new_tfilter+0x3e0/0x970 [ 1938.235021] ? tc_del_tfilter+0x720/0x720 [ 1938.235028] rtnetlink_rcv_msg+0x389/0x4b0 [ 1938.235038] ? netlink_deliver_tap+0x95/0x400 [ 1938.235044] ? rtnl_dellink+0x2d0/0x2d0 [ 1938.235053] netlink_rcv_skb+0x49/0x110 [ 1938.235063] netlink_unicast+0x171/0x200 [ 1938.235073] netlink_sendmsg+0x224/0x3f0 [ 1938.235091] sock_sendmsg+0x5e/0x60 [ 1938.235097] ___sys_sendmsg+0x2ae/0x330 [ 1938.235111] ? __handle_mm_fault+0x12cd/0x19e0 [ 1938.235125] ? __handle_mm_fault+0x12cd/0x19e0 [ 1938.235138] ? find_held_lock+0x2b/0x80 [ 1938.235147] ? do_user_addr_fault+0x22d/0x490 [ 1938.235160] __sys_sendmsg+0x59/0xa0 [ 1938.235178] do_syscall_64+0x5c/0xb0 [ 1938.235187] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1938.235192] RIP: 0033:0x7ff9a4d597b8 [ 1938.235197] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54 [ 1938.235200] RSP: 002b:00007ffcfe381c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 1938.235205] RAX: ffffffffffffffda RBX: 000000005d4497f9 RCX: 00007ff9a4d597b8 [ 1938.235208] RDX: 0000000000000000 RSI: 00007ffcfe381cb0 RDI: 0000000000000003 [ 1938.235211] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006 [ 1938.235214] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000001 [ 1938.235217] R13: 0000000000480640 R14: 0000000000000012 R15: 0000000000000001 Change tcf_sample_psample_group() helper to allow using it from both rtnl and rcu protected contexts. Fixes: a7a7be6087b0 ("net/sched: add sample action to the hardware intermediate representation") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06net: sched: police: allow accessing police->params with rtnlVlad Buslov
Recently implemented support for police action in flow_offload infra leads to following rcu usage warning: [ 1925.881092] ============================= [ 1925.881094] WARNING: suspicious RCU usage [ 1925.881098] 5.3.0-rc1+ #574 Not tainted [ 1925.881100] ----------------------------- [ 1925.881104] include/net/tc_act/tc_police.h:57 suspicious rcu_dereference_check() usage! [ 1925.881106] other info that might help us debug this: [ 1925.881109] rcu_scheduler_active = 2, debug_locks = 1 [ 1925.881112] 1 lock held by tc/18591: [ 1925.881115] #0: 00000000b03cb918 (rtnl_mutex){+.+.}, at: tc_new_tfilter+0x47c/0x970 [ 1925.881124] stack backtrace: [ 1925.881127] CPU: 2 PID: 18591 Comm: tc Not tainted 5.3.0-rc1+ #574 [ 1925.881130] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 1925.881132] Call Trace: [ 1925.881138] dump_stack+0x85/0xc0 [ 1925.881145] tc_setup_flow_action+0x1771/0x2040 [ 1925.881155] fl_hw_replace_filter+0x11f/0x2e0 [cls_flower] [ 1925.881175] fl_change+0xd24/0x1b30 [cls_flower] [ 1925.881200] tc_new_tfilter+0x3e0/0x970 [ 1925.881231] ? tc_del_tfilter+0x720/0x720 [ 1925.881243] rtnetlink_rcv_msg+0x389/0x4b0 [ 1925.881250] ? netlink_deliver_tap+0x95/0x400 [ 1925.881257] ? rtnl_dellink+0x2d0/0x2d0 [ 1925.881264] netlink_rcv_skb+0x49/0x110 [ 1925.881275] netlink_unicast+0x171/0x200 [ 1925.881284] netlink_sendmsg+0x224/0x3f0 [ 1925.881299] sock_sendmsg+0x5e/0x60 [ 1925.881305] ___sys_sendmsg+0x2ae/0x330 [ 1925.881309] ? task_work_add+0x43/0x50 [ 1925.881314] ? fput_many+0x45/0x80 [ 1925.881329] ? __lock_acquire+0x248/0x1930 [ 1925.881342] ? find_held_lock+0x2b/0x80 [ 1925.881347] ? task_work_run+0x7b/0xd0 [ 1925.881359] __sys_sendmsg+0x59/0xa0 [ 1925.881375] do_syscall_64+0x5c/0xb0 [ 1925.881381] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1925.881384] RIP: 0033:0x7feb245047b8 [ 1925.881388] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54 [ 1925.881391] RSP: 002b:00007ffc2d2a5788 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 1925.881395] RAX: ffffffffffffffda RBX: 000000005d4497ed RCX: 00007feb245047b8 [ 1925.881398] RDX: 0000000000000000 RSI: 00007ffc2d2a57f0 RDI: 0000000000000003 [ 1925.881400] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006 [ 1925.881403] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000001 [ 1925.881406] R13: 0000000000480640 R14: 0000000000000012 R15: 0000000000000001 Change tcf_police_rate_bytes_ps() and tcf_police_tcfp_burst() helpers to allow using them from both rtnl and rcu protected contexts. Fixes: 8c8cfc6ed274 ("net/sched: add police action to the hardware intermediate representation") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06arm64: Introduce prctl() options to control the tagged user addresses ABICatalin Marinas
It is not desirable to relax the ABI to allow tagged user addresses into the kernel indiscriminately. This patch introduces a prctl() interface for enabling or disabling the tagged ABI with a global sysctl control for preventing applications from enabling the relaxed ABI (meant for testing user-space prctl() return error checking without reconfiguring the kernel). The ABI properties are inherited by threads of the same application and fork()'ed children but cleared on execve(). A Kconfig option allows the overall disabling of the relaxed ABI. The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle MTE-specific settings like imprecise vs precise exceptions. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-06locking/mutex: Make __mutex_owner static to mutex.cMukesh Ojha
__mutex_owner() should only be used by the mutex api's. So, to put this restiction let's move the __mutex_owner() function definition from linux/mutex.h to mutex.c file. There exist functions that uses __mutex_owner() like mutex_is_locked() and mutex_trylock_recursive(), So to keep legacy thing intact move them as well and export them. Move mutex_waiter structure also to keep it private to the file. Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: mingo@redhat.com Cc: will@kernel.org Link: https://lkml.kernel.org/r/1564585504-3543-1-git-send-email-mojha@codeaurora.org
2019-08-06locking/rwsem: Check for operations on an uninitialized rwsemDavidlohr Bueso
Currently rwsems is the only locking primitive that lacks this debug feature. Add it under CONFIG_DEBUG_RWSEMS and do the magic checking in the locking fastpath (trylock) operation such that we cover all cases. The unlocking part is pretty straightforward. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Cc: mingo@kernel.org Cc: Davidlohr Bueso <dave@stgolabs.net> Link: https://lkml.kernel.org/r/20190729044735.9632-1-dave@stgolabs.net
2019-08-06Merge tag 'asoc-fix-v5.3-rc3' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.3 A relatively large batch of mostly unremarkable fixes here, a couple of small core fixes for fairly obscure issues, more comment/email updates with no code impact than usual and a bunch of small driver fixes. The support for new sample rates in the max98373 driver is a fix for the fact that the driver declared support for those rates but would in fact return an error if these rates were selected.
2019-08-05base: soc: Add serial_number attribute to socBjorn Andersson
Add new attribute named "serial_number" as a standard interface for user space to acquire the serial number of the device. For ST-Ericsson SoCs this is exposed by the cryptically named "soc_id" attribute, but this provides a human readable standardized name for this property. Tested-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Vaishali Thakkar <vaishali.thakkar@linaro.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-08-05net/tls: partially revert fix transition through disconnect with closeJakub Kicinski
Looks like we were slightly overzealous with the shutdown() cleanup. Even though the sock->sk_state can reach CLOSED again, socket->state will not got back to SS_UNCONNECTED once connections is ESTABLISHED. Meaning we will see EISCONN if we try to reconnect, and EINVAL if we try to listen. Only listen sockets can be shutdown() and reused, but since ESTABLISHED sockets can never be re-connected() or used for listen() we don't need to try to clean up the ULP state early. Fixes: 32857cf57f92 ("net/tls: fix transition through disconnect with close") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to blockMarc Zyngier
Since commit commit 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or its GICv2 equivalent) loaded as long as we can, only syncing it back when we're scheduled out. There is a small snag with that though: kvm_vgic_vcpu_pending_irq(), which is indirectly called from kvm_vcpu_check_block(), needs to evaluate the guest's view of ICC_PMR_EL1. At the point were we call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever changes to PMR is not visible in memory until we do a vcpu_put(). Things go really south if the guest does the following: mov x0, #0 // or any small value masking interrupts msr ICC_PMR_EL1, x0 [vcpu preempted, then rescheduled, VMCR sampled] mov x0, #ff // allow all interrupts msr ICC_PMR_EL1, x0 wfi // traps to EL2, so samping of VMCR [interrupt arrives just after WFI] Here, the hypervisor's view of PMR is zero, while the guest has enabled its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no interrupts are pending (despite an interrupt being received) and we'll block for no reason. If the guest doesn't have a periodic interrupt firing once it has blocked, it will stay there forever. To avoid this unfortuante situation, let's resync VMCR from kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block() will observe the latest value of PMR. This has been found by booting an arm64 Linux guest with the pseudo NMI feature, and thus using interrupt priorities to mask interrupts instead of the usual PSTATE masking. Cc: stable@vger.kernel.org # 4.12 Fixes: 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put") Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-05KVM: no need to check return value of debugfs_create functionsGreg KH
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Also, when doing this, change kvm_arch_create_vcpu_debugfs() to return void instead of an integer, as we should not care at all about if this function actually does anything or not. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org> Cc: <kvm@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-05KVM: remove kvm_arch_has_vcpu_debugfs()Paolo Bonzini
There is no need for this function as all arches have to implement kvm_arch_create_vcpu_debugfs() no matter what. A #define symbol let us actually simplify the code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-05KVM: Fix leak vCPU's VMCS value into other pCPUWanpeng Li
After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a five years old bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting in the VMs after stress testing: INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073) Call Trace: flush_tlb_mm_range+0x68/0x140 tlb_flush_mmu.part.75+0x37/0xe0 tlb_finish_mmu+0x55/0x60 zap_page_range+0x142/0x190 SyS_madvise+0x3cd/0x9c0 system_call_fastpath+0x1c/0x21 swait_active() sustains to be true before finish_swait() is called in kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account by kvm_vcpu_on_spin() loop greatly increases the probability condition kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv is enabled the yield-candidate vCPU's VMCS RVI field leaks(by vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current VMCS. This patch fixes it by checking conservatively a subset of events. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Marc Zyngier <Marc.Zyngier@arm.com> Cc: stable@vger.kernel.org Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop) Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-03net/socket: fix GCC8+ Wpacked-not-aligned warningsQian Cai
There are a lot of those warnings with GCC8+ 64-bit, In file included from ./include/linux/sctp.h:42, from net/core/skbuff.c:47: ./include/uapi/linux/sctp.h:395:1: warning: alignment 4 of 'struct sctp_paddr_change' is less than 8 [-Wpacked-not-aligned] } __attribute__((packed, aligned(4))); ^ ./include/uapi/linux/sctp.h:728:1: warning: alignment 4 of 'struct sctp_setpeerprim' is less than 8 [-Wpacked-not-aligned] } __attribute__((packed, aligned(4))); ^ ./include/uapi/linux/sctp.h:727:26: warning: 'sspp_addr' offset 4 in 'struct sctp_setpeerprim' isn't aligned to 8 [-Wpacked-not-aligned] struct sockaddr_storage sspp_addr; ^~~~~~~~~ ./include/uapi/linux/sctp.h:741:1: warning: alignment 4 of 'struct sctp_prim' is less than 8 [-Wpacked-not-aligned] } __attribute__((packed, aligned(4))); ^ ./include/uapi/linux/sctp.h:740:26: warning: 'ssp_addr' offset 4 in 'struct sctp_prim' isn't aligned to 8 [-Wpacked-not-aligned] struct sockaddr_storage ssp_addr; ^~~~~~~~ ./include/uapi/linux/sctp.h:792:1: warning: alignment 4 of 'struct sctp_paddrparams' is less than 8 [-Wpacked-not-aligned] } __attribute__((packed, aligned(4))); ^ ./include/uapi/linux/sctp.h:784:26: warning: 'spp_address' offset 4 in 'struct sctp_paddrparams' isn't aligned to 8 [-Wpacked-not-aligned] struct sockaddr_storage spp_address; ^~~~~~~~~~~ ./include/uapi/linux/sctp.h:905:1: warning: alignment 4 of 'struct sctp_paddrinfo' is less than 8 [-Wpacked-not-aligned] } __attribute__((packed, aligned(4))); ^ ./include/uapi/linux/sctp.h:899:26: warning: 'spinfo_address' offset 4 in 'struct sctp_paddrinfo' isn't aligned to 8 [-Wpacked-not-aligned] struct sockaddr_storage spinfo_address; ^~~~~~~~~~~~~~ This is because the commit 20c9c825b12f ("[SCTP] Fix SCTP socket options to work with 32-bit apps on 64-bit kernels.") added "packed, aligned(4)" GCC attributes to some structures but one of the members, i.e, "struct sockaddr_storage" in those structures has the attribute, "aligned(__alignof__ (struct sockaddr *)" which is 8-byte on 64-bit systems, so the commit overwrites the designed alignments for "sockaddr_storage". To fix this, "struct sockaddr_storage" needs to be aligned to 4-byte as it is only used in those packed sctp structure which is part of UAPI, and "struct __kernel_sockaddr_storage" is used in some other places of UAPI that need not to change alignments in order to not breaking userspace. Use an implicit alignment for "struct __kernel_sockaddr_storage" so it can keep the same alignments as a member in both packed and un-packed structures without breaking UAPI. Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "17 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: drivers/acpi/scan.c: document why we don't need the device_hotplug_lock memremap: move from kernel/ to mm/ lib/test_meminit.c: use GFP_ATOMIC in RCU critical section asm-generic: fix -Wtype-limits compiler warnings cgroup: kselftest: relax fs_spec checks mm/memory_hotplug.c: remove unneeded return for void function mm/migrate.c: initialize pud_entry in migrate_vma() coredump: split pipe command whitespace before expanding template page flags: prioritize kasan bits over last-cpuid ubsan: build ubsan.c more conservatively kasan: remove clang version check for KASAN_STACK mm: compaction: avoid 100% CPU usage during compaction when a task is killed mm: migrate: fix reference check race between __find_get_block() and migration mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker ocfs2: remove set but not used variable 'last_hash' Revert "kmemleak: allow to coexist with fault injection" kernel/signal.c: fix a kernel-doc markup
2019-08-03asm-generic: fix -Wtype-limits compiler warningsQian Cai
Commit d66acc39c7ce ("bitops: Optimise get_order()") introduced a compilation warning because "rx_frag_size" is an "ushort" while PAGE_SHIFT here is 16. The commit changed the get_order() to be a multi-line macro where compilers insist to check all statements in the macro even when __builtin_constant_p(rx_frag_size) will return false as "rx_frag_size" is a module parameter. In file included from ./arch/powerpc/include/asm/page_64.h:107, from ./arch/powerpc/include/asm/page.h:242, from ./arch/powerpc/include/asm/mmu.h:132, from ./arch/powerpc/include/asm/lppaca.h:47, from ./arch/powerpc/include/asm/paca.h:17, from ./arch/powerpc/include/asm/current.h:13, from ./include/linux/thread_info.h:21, from ./arch/powerpc/include/asm/processor.h:39, from ./include/linux/prefetch.h:15, from drivers/net/ethernet/emulex/benet/be_main.c:14: drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_rx_cqs_create': ./include/asm-generic/getorder.h:54:9: warning: comparison is always true due to limited range of data type [-Wtype-limits] (((n) < (1UL << PAGE_SHIFT)) ? 0 : \ ^ drivers/net/ethernet/emulex/benet/be_main.c:3138:33: note: in expansion of macro 'get_order' adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE; ^~~~~~~~~ Fix it by moving all of this multi-line macro into a proper function, and killing __get_order() off. [akpm@linux-foundation.org: remove __get_order() altogether] [cai@lca.pw: v2] Link: http://lkml.kernel.org/r/1564000166-31428-1-git-send-email-cai@lca.pw Link: http://lkml.kernel.org/r/1563914986-26502-1-git-send-email-cai@lca.pw Fixes: d66acc39c7ce ("bitops: Optimise get_order()") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Howells <dhowells@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Bill Wendling <morbo@google.com> Cc: James Y Knight <jyknight@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-03page flags: prioritize kasan bits over last-cpuidArnd Bergmann
ARM64 randdconfig builds regularly run into a build error, especially when NUMA_BALANCING and SPARSEMEM are enabled but not SPARSEMEM_VMEMMAP: #error "KASAN: not enough bits in page flags for tag" The last-cpuid bits are already contitional on the available space, so the result of the calculation is a bit random on whether they were already left out or not. Adding the kasan tag bits before last-cpuid makes it much more likely to end up with a successful build here, and should be reliable for randconfig at least, as long as that does not randomize NR_CPUS or NODES_SHIFT but uses the defaults. In order for the modified check to not trigger in the x86 vdso32 code where all constants are wrong (building with -m32), enclose all the definitions with an #ifdef. [arnd@arndb.de: build fix] Link: http://lkml.kernel.org/r/CAK8P3a3Mno1SWTcuAOT0Wa9VS15pdU6EfnkxLbDpyS55yO04+g@mail.gmail.com Link: http://lkml.kernel.org/r/20190722115520.3743282-1-arnd@arndb.de Link: https://lore.kernel.org/lkml/20190618095347.3850490-1-arnd@arndb.de/ Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-03clk: imx8: Add DSP related clocksDaniel Baluta
i.MX8QXP contains Hifi4 DSP. There are four clocks associated with DSP: * dsp_lpcg_core_clk * dsp_lpcg_ipg_clk * dsp_lpcg_adb_aclk * ocram_lpcg_ipg_clk Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com> Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-08-03dt-bindings: imx: Add clock binding doc for i.MX8MNAnson Huang
Add the clock binding doc for i.MX8MN. Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Reviewed-by: Maxime Ripard <maxime.ripard@bootlin.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-08-02Merge tag 'drm-fixes-2019-08-02-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull more drm fixes from Daniel Vetter: "Dave sends his pull, everyone realizes they've been asleep at the wheel and hits send on their own pulls :-/ Normally I'd just ignore these all because w/e for me and Dave. But this time around the latecomers also included drm-intel-fixes, which failed to send out a -fixes pull thus far for this release (screwed up vacation coverage, despite that 2/3 maintainers were around ... they all look appropriately guilty), and that really is overdue to get landed. And since I had to do a pull request anyway I pulled the other two late ones too. intel fixes (didn't have any ever since the main merge window pull): - gvt fixes (2 cc: stable) - fix gpu reset vs mm-shrinker vs wakeup fun (needed a few patches) - two gem locking fixes (one cc: stable) - pile of misc fixes all over with minor impact, 6 cc: stable, others from this window exynos: - misc minor fixes misc: - some build/Kconfig fixes - regression fix for vm scalability perf test which seems to mostly exercise dmesg/console logging ... - the vgem cache flush fix for arm64 broke the world on x86, so that's reverted again * tag 'drm-fixes-2019-08-02-1' of git://anongit.freedesktop.org/drm/drm: (42 commits) Revert "drm/vgem: fix cache synchronization on arm/arm64" drm/exynos: fix missing decrement of retry counter drm/exynos: add CONFIG_MMU dependency drm/exynos: remove redundant assignment to pointer 'node' drm/exynos: using dev_get_drvdata directly drm/bochs: Use shadow buffer for bochs framebuffer console drm/fb-helper: Instanciate shadow FB if configured in device's mode_config drm/fb-helper: Map DRM client buffer only when required drm/client: Support unmapping of DRM client buffers drm/i915: Only recover active engines drm/i915: Add a wakeref getter for iff the wakeref is already active drm/i915: Lift intel_engines_resume() to callers drm/vgem: fix cache synchronization on arm/arm64 drm/i810: Use CONFIG_PREEMPTION drm/bridge: tc358764: Fix build error drm/bridge: lvds-encoder: Fix build error while CONFIG_DRM_KMS_HELPER=m drm/i915/gvt: Adding ppgtt to GVT GEM context after shadow pdps settled. drm/i915/gvt: grab runtime pm first for forcewake use drm/i915/gvt: fix incorrect cache entry for guest page mapping drm/i915/gvt: Checking workload's gma earlier ...
2019-08-02Merge tag 'for-linus-5.3a-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a small cleanup - a fix for a build error on ARM with some configs - a fix of a patch for the Xen gntdev driver - three patches for fixing a potential problem in the swiotlb-xen driver which Konrad was fine with me carrying them through the Xen tree * tag 'for-linus-5.3a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/swiotlb: remember having called xen_create_contiguous_region() xen/swiotlb: simplify range_straddles_page_boundary() xen/swiotlb: fix condition for calling xen_destroy_contiguous_region() xen: avoid link error on ARM xen/gntdev.c: Replace vm_map_pages() with vm_map_pages_zero() xen/pciback: remove set but not used variable 'old_state'
2019-08-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Seven fixes to four drivers with no core changes. The mpt3sas one is theoretical until we get a CPU that goes up to 64 bits physical, the qla2xxx one fixes an oops in a driver initialization error leg and the others are mostly cosmetic" [ The fcoe patches may be worth highlighting - they may be "just" cleanups, but they simplify and fix the odd fc_rport_priv structure handling rules so that the new gcc-9 warnings about memset crossing structure boundaries are gone. The old code was hard for humans to understand too, and really confused the compiler sanity checks - Linus ] * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: qla2xxx: Fix possible fcport null-pointer dereferences scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA scsi: hpsa: remove printing internal cdb on tag collision scsi: hpsa: correct scsi command status issue after reset scsi: fcoe: pass in fcoe_rport structure instead of fc_rport_priv scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure scsi: libfc: Whitespace cleanup in libfc.h
2019-08-02Merge tag 'for-linus-20190802' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Here's a small collection of fixes that should go into this series. This contains: - io_uring potential use-after-free fix (Jackie) - loop regression fix (Jan) - O_DIRECT fragmented bio regression fix (Damien) - Mark Denis as the new floppy maintainer (Denis) - ataflop switch fall-through annotation (Gustavo) - libata zpodd overflow fix (Kees) - libata ahci deferred probe fix (Miquel) - nbd invalidation BUG_ON() fix (Munehisa) - dasd endless loop fix (Stefan)" * tag 'for-linus-20190802' of git://git.kernel.dk/linux-block: s390/dasd: fix endless loop after read unit address configuration block: Fix __blkdev_direct_IO() for bio fragments MAINTAINERS: floppy: take over maintainership nbd: replace kill_bdev() with __invalidate_device() again ata: libahci: do not complain in case of deferred probe io_uring: fix KASAN use after free in io_sq_wq_submit_work loop: Fix mount(2) failure due to race with LOOP_SET_FD libata: zpodd: Fix small read overflow in zpodd_get_mech_type() ataflop: Mark expected switch fall-through
2019-08-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Doug Ledford: "Here's our second -rc pull request. Nothing particularly special in this one. The client removal deadlock fix is kindy tricky, but we had multiple eyes on it and no one could find a fault in it. A couple Spectre V1 fixes too. Otherwise, all just normal -rc fodder: - A couple Spectre V1 fixes (umad, hfi1) - Fix a tricky deadlock in the rdma core code with refcounting instead of locks (client removal patches) - Build errors (hns) - Fix a scheduling while atomic issue (mlx5) - Use after free fix (mad) - Fix error path return code (hns) - Null deref fix (siw_crypto_hash) - A few other misc. minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/hns: Fix error return code in hns_roce_v1_rsv_lp_qp() RDMA/mlx5: Release locks during notifier unregister IB/hfi1: Fix Spectre v1 vulnerability IB/mad: Fix use-after-free in ib mad completion handling RDMA/restrack: Track driver QP types in resource tracker IB/mlx5: Fix MR registration flow to use UMR properly RDMA/devices: Remove the lock around remove_client_context RDMA/devices: Do not deadlock during client removal IB/core: Add mitigation for Spectre V1 Do not dereference 'siw_crypto_shash' before checking RDMA/qedr: Fix the hca_type and hca_rev returned in device attributes RDMA/hns: Fix build error
2019-08-02Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A few fixes for code that came in during the merge window or that started getting exercised differently this time around: - Select regmap MMIO kconfig in spreadtrum driver to avoid compile errors - Complete kerneldoc on devm_clk_bulk_get_optional() - Register an essential clk earlier on mediatek mt8183 SoCs so the clocksource driver can use it - Fix divisor math in the at91 driver - Plug a race in Renesas reset control logic" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: renesas: cpg-mssr: Fix reset control race condition clk: sprd: Select REGMAP_MMIO to avoid compile errors clk: mediatek: mt8183: Register 13MHz clock earlier for clocksource clk: Add missing documentation of devm_clk_bulk_get_optional() argument clk: at91: generated: Truncate divisor to GENERATED_MAX_DIV + 1
2019-08-02crypto: ccp - Add support for valid authsize values less than 16Gary R Hook
AES GCM encryption allows for authsize values of 4, 8, and 12-16 bytes. Validate the requested authsize, and retain it to save in the request context. Fixes: 36cf515b9bbe2 ("crypto: ccp - Enable support for AES GCM on v5 CCPs") Cc: <stable@vger.kernel.org> Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-01treewide: Rename rcu_dereference_raw_notrace() to _check()Joel Fernandes (Google)
The rcu_dereference_raw_notrace() API name is confusing. It is equivalent to rcu_dereference_raw() except that it also does sparse pointer checking. There are only a few users of rcu_dereference_raw_notrace(). This patches renames all of them to be rcu_dereference_raw_check() with the "_check()" indicating sparse checking. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Fix checkpatch warnings about parentheses. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01pidfd: add P_PIDFD to waitid()Christian Brauner
This adds the P_PIDFD type to waitid(). One of the last remaining bits for the pidfd api is to make it possible to wait on pidfds. With P_PIDFD added to waitid() the parts of userspace that want to use the pidfd api to exclusively manage processes can do so now. One of the things this will unblock in the future is the ability to make it possible to retrieve the exit status via waitid(P_PIDFD) for non-parent processes if handed a _suitable_ pidfd that has this feature set. This is similar to what you can do on FreeBSD with kqueue(). It might even end up being possible to wait on a process as a non-parent if an appropriate property is enabled on the pidfd. With P_PIDFD no scoping of the process identified by the pidfd is possible, i.e. it explicitly blocks things such as wait4(-1), wait4(0), waitid(P_ALL), waitid(P_PGID) etc. It only allows for semantics equivalent to wait4(pid), waitid(P_PID). Users that need scoping should rely on pid-based wait*() syscalls for now. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20190727222229.6516-2-christian@brauner.io
2019-08-01posix-timers: Move rcu_head out of it unionSebastian Andrzej Siewior
Timer deletion on PREEMPT_RT is prone to priority inversion and live locks. The hrtimer code has a synchronization mechanism for this. Posix CPU timers will grow one. But that mechanism cannot be invoked while holding the k_itimer lock because that can deadlock against the running timer callback. So the lock must be dropped which allows the timer to be freed. The timer free can be prevented by taking RCU readlock before dropping the lock, but because the rcu_head is part of the 'it' union a concurrent free will overwrite the hrtimer on which the task is trying to synchronize. Move the rcu_head out of the union to prevent this. [ tglx: Fixed up kernel-doc. Rewrote changelog ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190730223828.965541887@linutronix.de
2019-08-01timers: Prepare support for PREEMPT_RTAnna-Maria Gleixner
When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If the soft interrupt thread is preempted in the middle of a timer callback, then calling del_timer_sync() can lead to two issues: - If the caller is on a remote CPU then it has to spin wait for the timer handler to complete. This can result in unbound priority inversion. - If the caller originates from the task which preempted the timer handler on the same CPU, then spin waiting for the timer handler to complete is never going to end. To avoid these issues, add a new lock to the timer base which is held around the execution of the timer callbacks. If del_timer_sync() detects that the timer callback is currently running, it blocks on the expiry lock. When the callback is finished, the expiry lock is dropped by the softirq thread which wakes up the waiter and the system makes progress. This addresses both the priority inversion and the life lock issues. This mechanism is not used for timers which are marked IRQSAFE as for those preemption is disabled accross the callback and therefore this situation cannot happen. The callbacks for such timers need to be individually audited for RT compliance. The same issue can happen in virtual machines when the vCPU which runs a timer callback is scheduled out. If a second vCPU of the same guest calls del_timer_sync() it will spin wait for the other vCPU to be scheduled back in. The expiry lock mechanism would avoid that. It'd be trivial to enable this when paravirt spinlocks are enabled in a guest, but it's not clear whether this is an actual problem in the wild, so for now it's an RT only mechanism. As the softirq thread can be preempted with PREEMPT_RT=y, the SMP variant of del_timer_sync() needs to be used on UP as well. [ tglx: Refactored it for mainline ] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.832418500@linutronix.de
2019-08-01hrtimer: Prepare support for PREEMPT_RTAnna-Maria Gleixner
When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If the soft interrupt thread is preempted in the middle of a timer callback, then calling hrtimer_cancel() can lead to two issues: - If the caller is on a remote CPU then it has to spin wait for the timer handler to complete. This can result in unbound priority inversion. - If the caller originates from the task which preempted the timer handler on the same CPU, then spin waiting for the timer handler to complete is never going to end. To avoid these issues, add a new lock to the timer base which is held around the execution of the timer callbacks. If hrtimer_cancel() detects that the timer callback is currently running, it blocks on the expiry lock. When the callback is finished, the expiry lock is dropped by the softirq thread which wakes up the waiter and the system makes progress. This addresses both the priority inversion and the life lock issues. The same issue can happen in virtual machines when the vCPU which runs a timer callback is scheduled out. If a second vCPU of the same guest calls hrtimer_cancel() it will spin wait for the other vCPU to be scheduled back in. The expiry lock mechanism would avoid that. It'd be trivial to enable this when paravirt spinlocks are enabled in a guest, but it's not clear whether this is an actual problem in the wild, so for now it's an RT only mechanism. [ tglx: Refactored it for mainline ] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.737767218@linutronix.de
2019-08-01hrtimer: Make enqueue mode check work on RTThomas Gleixner
hrtimer_start_range_ns() has a WARN_ONCE() which verifies that a timer which is marker for softirq expiry is not queued in the hard interrupt base and vice versa. When PREEMPT_RT is enabled, timers which are not explicitely marked to expire in hard interrupt context are deferrred to the soft interrupt. So the regular check would trigger. Change the check, so when PREEMPT_RT is enabled, it is verified that the timers marked for hard interrupt expiry are not tried to be queued for soft interrupt expiry or any of the unmarked and softirq marked is tried to be expired in hard interrupt context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-08-01RDMA/devices: Remove the lock around remove_client_contextJason Gunthorpe
Due to the complexity of client->remove() callbacks it is desirable to not hold any locks while calling them. Remove the last one by tracking only the highest client ID and running backwards from there over the xarray. Since the only purpose of that lock was to protect the linked list, we can drop the lock. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190731081841.32345-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-01RDMA/devices: Do not deadlock during client removalJason Gunthorpe
lockdep reports: WARNING: possible circular locking dependency detected modprobe/302 is trying to acquire lock: 0000000007c8919c ((wq_completion)ib_cm){+.+.}, at: flush_workqueue+0xdf/0x990 but task is already holding lock: 000000002d3d2ca9 (&device->client_data_rwsem){++++}, at: remove_client_context+0x79/0xd0 [ib_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&device->client_data_rwsem){++++}: down_read+0x3f/0x160 ib_get_net_dev_by_params+0xd5/0x200 [ib_core] cma_ib_req_handler+0x5f6/0x2090 [rdma_cm] cm_process_work+0x29/0x110 [ib_cm] cm_req_handler+0x10f5/0x1c00 [ib_cm] cm_work_handler+0x54c/0x311d [ib_cm] process_one_work+0x4aa/0xa30 worker_thread+0x62/0x5b0 kthread+0x1ca/0x1f0 ret_from_fork+0x24/0x30 -> #1 ((work_completion)(&(&work->work)->work)){+.+.}: process_one_work+0x45f/0xa30 worker_thread+0x62/0x5b0 kthread+0x1ca/0x1f0 ret_from_fork+0x24/0x30 -> #0 ((wq_completion)ib_cm){+.+.}: lock_acquire+0xc8/0x1d0 flush_workqueue+0x102/0x990 cm_remove_one+0x30e/0x3c0 [ib_cm] remove_client_context+0x94/0xd0 [ib_core] disable_device+0x10a/0x1f0 [ib_core] __ib_unregister_device+0x5a/0xe0 [ib_core] ib_unregister_device+0x21/0x30 [ib_core] mlx5_ib_stage_ib_reg_cleanup+0x9/0x10 [mlx5_ib] __mlx5_ib_remove+0x3d/0x70 [mlx5_ib] mlx5_ib_remove+0x12e/0x140 [mlx5_ib] mlx5_remove_device+0x144/0x150 [mlx5_core] mlx5_unregister_interface+0x3f/0xf0 [mlx5_core] mlx5_ib_cleanup+0x10/0x3a [mlx5_ib] __x64_sys_delete_module+0x227/0x350 do_syscall_64+0xc3/0x6a4 entry_SYSCALL_64_after_hwframe+0x49/0xbe Which is due to the read side of the client_data_rwsem being obtained recursively through a work queue flush during cm client removal. The lock is being held across the remove in remove_client_context() so that the function is a fence, once it returns the client is removed. This is required so that the two callers do not proceed with destruction until the client completes removal. Instead of using client_data_rwsem use the existing device unregistration refcount and add a similar client unregistration (client->uses) refcount. This will fence the two unregistration paths without holding any locks. Cc: <stable@vger.kernel.org> Fixes: 921eab1143aa ("RDMA/devices: Re-organize device.c locking") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190731081841.32345-2-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-01hrtimer: Introduce HARD expiry modeSebastian Andrzej Siewior
On PREEMPT_RT not all hrtimers can be expired in hard interrupt context even if that is perfectly fine on a PREEMPT_RT=n kernel, e.g. because they take regular spinlocks. Also for latency reasons PREEMPT_RT tries to defer most hrtimers' expiry into soft interrupt context. But there are hrtimers which must be expired in hard interrupt context even when PREEMPT_RT is enabled: - hrtimers which must expiry in hard interrupt context, e.g. scheduler, perf, watchdog related hrtimers - latency critical hrtimers, e.g. nanosleep, ..., kvm lapic timer Add a new mode flag HRTIMER_MODE_HARD which allows to mark these timers so PREEMPT_RT will not move them into softirq expiry mode. [ tglx: Split out of a larger combo patch. Added changelog ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185752.981398465@linutronix.de