summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-10-06block: move 'q_usage_counter' into front of 'request_queue'Ming Lei
The field of 'q_usage_counter' is always fetched in fast path of every block driver, and move it into front of 'request_queue', so it can be fetched into 1st cacheline of 'request_queue' instance. Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Veronika Kabatova <vkabatov@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06percpu_ref: reduce memory footprint of percpu_ref in fast pathMing Lei
'struct percpu_ref' is often embedded into one user structure, and the instance is usually referenced in fast path, however actually only 'percpu_count_ptr' is needed in fast path. So move other fields into one new structure of 'percpu_ref_data', and allocate it dynamically via kzalloc(), then memory footprint of 'percpu_ref' in fast path is reduced a lot and becomes suitable to put into hot cacheline of user structure. Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Veronika Kabatova <vkabatov@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06Merge branch 'ethtool-allow-dumping-policies-to-user-space'David S. Miller
Jakub Kicinski says: ==================== ethtool: allow dumping policies to user space This series wires up ethtool policies to ops, so they can be dumped to user space for feature discovery. First patch wires up GET commands, and second patch wires up SETs. The policy tables are trimmed to save space and LoC. Next - take care of linking up nested policies for the header (which is the policy what we actually care about). And once header policy is linked make sure that attribute range validation for flags is done by policy, not a conditions in the code. New type of policy is needed to validate masks (patch 6). Netlink as always staying a step ahead of all the other kernel API interfaces :) v2: - merge patches 1 & 2 -> 1 - add patch 3 & 5 - remove .max_attr from struct ethnl_request_ops ==================== Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ethtool: specify which header flags are supported per commandJakub Kicinski
Perform header flags validation through the policy. Only pause command supports ETHTOOL_FLAG_STATS. Create a separate policy to be able to express that in policy dumps to user space. Note that even though the core will validate the header policy, it cannot record multiple layers of attributes and we have to re-parse header sub-attrs. When doing so we could skip attribute validation, or use most permissive policy. Opt for the former. We will no longer return the extack cookie for flags but since we only added first new flag in this release it's not expected that any user space had a chance to make use of it. v2: - remove the re-validation in ethnl_parse_header_dev_get() Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06netlink: add mask validationJakub Kicinski
We don't have good validation policy for existing unsigned int attrs which serve as flags (for new ones we could use NLA_BITFIELD32). With increased use of policy dumping having the validation be expressed as part of the policy is important. Add validation policy in form of a mask of supported/valid bits. Support u64 in the uAPI to be future-proof, but really for now the embedded mask member can only hold 32 bits, so anything with bit 32+ set will always fail validation. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06netlink: create helpers for checking type is an intJakub Kicinski
There's a number of policies which check if type is a uint or sint. Factor the checking against the list of value sizes to a helper for easier reuse. v2: - new patch Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ethtool: link up ethnl_header_policy as a nested policyJakub Kicinski
To get the most out of parsing by the core, and to allow dumping full policies we need to specify which policy applies to nested attrs. For headers it's ethnl_header_policy. $ sed -i 's@\(ETHTOOL_A_.*HEADER\].*=\) { .type = NLA_NESTED },@\1\n\t\tNLA_POLICY_NESTED(ethnl_header_policy),@' net/ethtool/* Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ethtool: trim policy tablesJakub Kicinski
Since ethtool uses strict attribute validation there's no need to initialize all attributes in policy tables. 0 is NLA_UNSPEC which is going to be rejected. Remove the NLA_REJECTs. Similarly attributes above maxattrs are rejected, so there's no need to always size the policy tables to ETHTOOL_A_..._MAX. v2: - new patch Suggested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ethtool: wire up set policies to opsJakub Kicinski
Similarly to get commands wire up the policies of set commands to get parsing by the core and policy dumps. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ethtool: wire up get policies to opsJakub Kicinski
Wire up policies for get commands in struct nla_policy of the ethtool family. Make use of genetlink code attr validation and parsing, as well as allow dumping policies to user space. For every ETHTOOL_MSG_*_GET: - add 'ethnl_' prefix to policy name - add extern declaration in net/ethtool/netlink.h - wire up the policy & attr in ethtool_genl_ops[]. - remove .request_policy and .max_attr from ethnl_request_ops. Obviously core only records the first "layer" of parsed attrs so we still need to parse the sub-attrs of the nested header attribute. v2: - merge of patches 1 and 2 from v1 - remove stray empty lines in ops - also remove .max_attr Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06Merge branch 'drivers-net-add-sw_netstats_rx_add-helper'David S. Miller
Fabian Frederick says: ==================== drivers/net: add sw_netstats_rx_add helper This small patchset creates netstats addition dev_sw_netstats_rx_add() based on dev_lstats_add() and replaces some open coding in both drivers/net and net branches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ipv4: use dev_sw_netstats_rx_add()Fabian Frederick
use new helper for netstats settings Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: openvswitch: use dev_sw_netstats_rx_add()Fabian Frederick
use new helper for netstats settings Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06xfrm: use dev_sw_netstats_rx_add()Fabian Frederick
use new helper for netstats settings Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ipv6: use dev_sw_netstats_rx_add()Fabian Frederick
use new helper for netstats settings Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06gtp: use dev_sw_netstats_rx_add()Fabian Frederick
use new helper for netstats settings Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06bareudp: use dev_sw_netstats_rx_add()Fabian Frederick
use new helper for netstats settings Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06geneve: use dev_sw_netstats_rx_add()Fabian Frederick
use new helper for netstats settings Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06vxlan: use dev_sw_netstats_rx_add()Fabian Frederick
use new helper for netstats settings Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: netdevice.h: sw_netstats_rx_add helperFabian Frederick
some drivers/network protocols update rx bytes/packets under u64_stats_update_begin/end sequence. Add a specific helper like dev_lstats_add() Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06Merge tag 'rxrpc-fixes-20201005' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Miscellaneous fixes Here are some miscellaneous rxrpc fixes: (1) Fix the xdr encoding of the contents read from an rxrpc key. (2) Fix a BUG() for a unsupported encoding type. (3) Fix missing _bh lock annotations. (4) Fix acceptance handling for an incoming call where the incoming call is encrypted. (5) The server token keyring isn't network namespaced - it belongs to the server, so there's no need. Namespacing it means that request_key() fails to find it. (6) Fix a leak of the server keyring. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06perf/x86: Fix n_metric for cancelled txnPeter Zijlstra
When a group that has TopDown members is failed to be scheduled, any later TopDown groups will not return valid values. Here is an example. A background perf that occupies all the GP counters and the fixed counter 1. $perf stat -e "{cycles,cycles,cycles,cycles,cycles,cycles,cycles, cycles,cycles}:D" -a A user monitors a TopDown group. It works well, because the fixed counter 3 and the PERF_METRICS are available. $perf stat -x, --topdown -- ./workload retiring,bad speculation,frontend bound,backend bound, 18.0,16.1,40.4,25.5, Then the user tries to monitor a group that has TopDown members. Because of the cycles event, the group is failed to be scheduled. $perf stat -x, -e '{slots,topdown-retiring,topdown-be-bound, topdown-fe-bound,topdown-bad-spec,cycles}' -- ./workload <not counted>,,slots,0,0.00,, <not counted>,,topdown-retiring,0,0.00,, <not counted>,,topdown-be-bound,0,0.00,, <not counted>,,topdown-fe-bound,0,0.00,, <not counted>,,topdown-bad-spec,0,0.00,, <not counted>,,cycles,0,0.00,, The user tries to monitor a TopDown group again. It doesn't work anymore. $perf stat -x, --topdown -- ./workload ,,,,, In a txn, cancel_txn() is to truncate the event_list for a canceled group and update the number of events added in this transaction. However, the number of TopDown events added in this transaction is not updated. The kernel will probably fail to add new Topdown events. Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown metrics") Reported-by: Andi Kleen <ak@linux.intel.com> Reported-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20201005082611.GH2628@hirez.programming.kicks-ass.net
2020-10-06perf/x86: Fix n_pair for cancelled txnPeter Zijlstra
Kan reported that n_metric gets corrupted for cancelled transactions; a similar issue exists for n_pair for AMD's Large Increment thing. The problem was confirmed and confirmed fixed by Kim using: sudo perf stat -e "{cycles,cycles,cycles,cycles}:D" -a sleep 10 & # should succeed: sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload # should fail: sudo perf stat -e "{fp_ret_sse_avx_ops.all,fp_ret_sse_avx_ops.all,cycles}:D" -a workload # previously failed, now succeeds with this patch: sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload Fixes: 5738891229a2 ("perf/x86/amd: Add support for Large Increment per Cycle Events") Reported-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kim Phillips <kim.phillips@amd.com> Link: https://lkml.kernel.org/r/20201005082516.GG2628@hirez.programming.kicks-ass.net
2020-10-06Merge branch 'net-atlantic-phy-tunables-from-mac-driver'David S. Miller
Igor Russkikh says: ==================== net: atlantic: phy tunables from mac driver This series implements phy tunables settings via MAC driver callbacks. AQC 10G devices use integrated MAC+PHY solution, where PHY is fully controlled by MAC firmware. Therefore, it is not possible to implement separate phy driver for these. We use ethtool ops callbacks to implement downshift and EDPC tunables. v3: fixed flaw in EDPD logic, from Andrew v2: comments from Andrew ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: atlantic: implement media detect feature via phy tunablesIgor Russkikh
Mediadetect is another name for the EDPD (energy detect power down). This feature allows device to save extra power when no link is available. PHY goes into the extreme power saving mode and only periodically wakes up and checks for the link. AQC devices has fixed check period of 6 seconds The feature may increase linkup time. Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: atlantic: implement phy downshift featureIgor Russkikh
PHY downshift allows phy to try renegotiate if link is unstable and can carry higher speed. AQC devices has integrated PHY which is controlled by MAC firmware. Thus, driver defines new ethtool callbacks to implement phy tunables via netdev. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ethtool: allow netdev driver to define phy tunablesIgor Russkikh
Define get/set phy tunable callbacks in ethtool ops. This will allow MAC drivers with integrated PHY still to implement these tunables. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: always dump full packets with skb_dumpVladimir Oltean
Currently skb_dump has a restriction to only dump full packet for the first 5 socket buffers, then only headers will be printed. Remove this arbitrary and confusing restriction, which is only documented vaguely ("up to") in the comments above the prototype. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06tcp: fix receive window update in tcp_add_backlog()Eric Dumazet
We got reports from GKE customers flows being reset by netfilter conntrack unless nf_conntrack_tcp_be_liberal is set to 1. Traces seemed to suggest ACK packet being dropped by the packet capture, or more likely that ACK were received in the wrong order. wscale=7, SYN and SYNACK not shown here. This ACK allows the sender to send 1871*128 bytes from seq 51359321 : New right edge of the window -> 51359321+1871*128=51598809 09:17:23.389210 IP A > B: Flags [.], ack 51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389212 IP B > A: Flags [.], seq 51422681:51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408 09:17:23.389214 IP A > B: Flags [.], ack 51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389253 IP B > A: Flags [.], seq 51424089:51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768 09:17:23.389272 IP A > B: Flags [.], ack 51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389275 IP B > A: Flags [.], seq 51488857:51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384 Receiver now allows to send 606*128=77568 from seq 51521241 : New right edge of the window -> 51521241+606*128=51598809 09:17:23.389296 IP A > B: Flags [.], ack 51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389308 IP B > A: Flags [.], seq 51521241:51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384 It seems the sender exceeds RWIN allowance, since 51611353 > 51598809 09:17:23.389346 IP B > A: Flags [.], seq 51553625:51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728 09:17:23.389356 IP B > A: Flags [.], seq 51611353:51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040 09:17:23.389367 IP A > B: Flags [.], ack 51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0 netfilter conntrack is not happy and sends RST 09:17:23.389389 IP A > B: Flags [R], seq 92176528, win 0, length 0 09:17:23.389488 IP B > A: Flags [R], seq 174478967, win 0, length 0 Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet. New right edge of the window -> 51521241+859*128=51631193 Normally TCP stack handles OOO packets just fine, but it turns out tcp_add_backlog() does not. It can update the window field of the aggregated packet even if the ACK sequence of the last received packet is too old. Many thanks to Alexandre Ferrieux for independently reporting the issue and suggesting a fix. Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ASoC: mediatek: mt8183-da7219: fix wrong ops for I2S3Tzung-Bi Shih
DA7219 uses I2S2 and I2S3 for input and output respectively. Commit 9e30251fb22e ("ASoC: mediatek: mt8183-da7219: support machine driver with rt1015") introduces a bug that: - If using I2S2 solely, MCLK to DA7219 is 256FS. - If using I2S3 solely, MCLK to DA7219 is 128FS. - If using I2S3 first and then I2S2, the MCLK changes from 128FS to 256FS. As a result, no sound output to the headset. Also no sound input from the headset microphone. Both I2S2 and I2S3 should set MCLK to 256FS. Fixes the wrong ops for I2S3. Fixes: 9e30251fb22e ("ASoC: mediatek: mt8183-da7219: support machine driver with rt1015") Signed-off-by: Tzung-Bi Shih <tzungbi@google.com> Link: https://lore.kernel.org/r/20201006101252.1890385-1-tzungbi@google.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-10-06net: usb: rtl8150: set random MAC address when set_ethernet_addr() failsAnant Thazhemadam
When get_registers() fails in set_ethernet_addr(),the uninitialized value of node_id gets copied over as the address. So, check the return value of get_registers(). If get_registers() executed successfully (i.e., it returns sizeof(node_id)), copy over the MAC address using ether_addr_copy() (instead of using memcpy()). Else, if get_registers() failed instead, a randomly generated MAC address is set as the MAC address instead. Reported-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com Tested-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com Acked-by: Petko Manolov <petkan@nucleusys.com> Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06mptcp: don't skip needed ackPaolo Abeni
Currently we skip calling tcp_cleanup_rbuf() when packets are moved into the OoO queue or simply dropped. In both cases we still increment tp->copied_seq, and we should ask the TCP stack to check for ack. Fixes: c76c6956566f ("mptcp: call tcp_cleanup_rbuf on subflows") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06mptcp: more DATA FIN fixesPaolo Abeni
Currently data fin on data packet are not handled properly: the 'rcv_data_fin_seq' field is interpreted as the last sequence number carrying a valid data, but for data fin packet with valid maps we currently store map_seq + map_len, that is, the next value. The 'write_seq' fields carries instead the value subseguent to the last valid byte, so in mptcp_write_data_fin() we never detect correctly the last DSS map. Fixes: 7279da6145bb ("mptcp: Use MPTCP-level flag for sending DATA_FIN") Fixes: 1a49b2c2a501 ("mptcp: Handle incoming 32-bit DATA_FIN values") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06Merge branch 'Fix-tail-dropping-watermarks-for-Ocelot-switches'David S. Miller
Vladimir Oltean says: ==================== Fix tail dropping watermarks for Ocelot switches This series adds a missing division by 60, and a warning to prevent that in the future. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: mscc: ocelot: warn when encoding an out-of-bounds watermark valueVladimir Oltean
There is an upper bound to the value that a watermark may hold. That upper bound is not immediately obvious during configuration, and it might be possible to have accidental truncation. Actually this has happened already, add a warning to prevent it from happening again. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: mscc: ocelot: divide watermark value by 60 when writing to SYS_ATOPVladimir Oltean
Tail dropping is enabled for a port when: 1. A source port consumes more packet buffers than the watermark encoded in SYS:PORT:ATOP_CFG.ATOP. AND 2. Total memory use exceeds the consumption watermark encoded in SYS:PAUSE_CFG:ATOP_TOT_CFG. The unit of these watermarks is a 60 byte memory cell. That unit is programmed properly into ATOP_TOT_CFG, but not into ATOP. Actually when written into ATOP, it would get truncated and wrap around. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: qrtr: ns: Fix the incorrect usage of rcu_read_lock()Manivannan Sadhasivam
The rcu_read_lock() is not supposed to lock the kernel_sendmsg() API since it has the lock_sock() in qrtr_sendmsg() which will sleep. Hence, fix it by excluding the locking for kernel_sendmsg(). While at it, let's also use radix_tree_deref_retry() to confirm the validity of the pointer returned by radix_tree_deref_slot() and use radix_tree_iter_resume() to resume iterating the tree properly before releasing the lock as suggested by Doug. Fixes: a7809ff90ce6 ("net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks") Reported-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Alex Elder <elder@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe()Rafael J. Wysocki
It turns out that in some cases there are EC events to flush in acpi_ec_dispatch_gpe() even though the ec_no_wakeup kernel parameter is set and the EC GPE is disabled while sleeping, so drop the ec_no_wakeup check that prevents those events from being processed from acpi_ec_dispatch_gpe(). Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-06ACPI: EC: PM: Flush EC work unconditionally after wakeupRafael J. Wysocki
Commit 607b9df63057 ("ACPI: EC: PM: Avoid flushing EC work when EC GPE is inactive") has been reported to cause some power button wakeup events to be missed on some systems, so modify acpi_ec_dispatch_gpe() to call acpi_ec_flush_work() unconditionally to effectively reverse the changes made by that commit. Also note that the problem which prompted commit 607b9df63057 is not reproducible any more on the affected machine. Fixes: 607b9df63057 ("ACPI: EC: PM: Avoid flushing EC work when EC GPE is inactive") Reported-by: Raymond Tan <raymond.tan@intel.com> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-06Merge branch 'irq/qcom-pdc-wakeup' into irq/irqchip-nextMarc Zyngier
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-10-06Merge branch 'cpufreq/arm/linux-next' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq updates for 5.10-rc1 from Viresh Kumar: "- STI cpufreq driver updates to allow new hardware (Alain Volmat). - Minor tegra driver fixes around initial frequency mismatch warnings (Jon Hunter). - dev_err simplification for s5pv210 driver (Krzysztof Kozlowski). - Qcom driver updates to allow new hardware and minor cleanup (Manivannan Sadhasivam and Matthias Kaehlcke). - Add missing MODULE_DEVICE_TABLE for armada driver (Pali Rohár). - Improved defer-probe handling in cpufreq-dt driver (Stephan Gerhold). - Call dev_pm_opp_of_remove_table() unconditionally for imx driver (Viresh Kumar)." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: qcom: Don't add frequencies without an OPP cpufreq: qcom-hw: Add cpufreq support for SM8250 SoC cpufreq: qcom-hw: Use of_device_get_match_data for offsets and row size cpufreq: qcom-hw: Use devm_platform_ioremap_resource() to simplify code dt-bindings: cpufreq: cpufreq-qcom-hw: Document Qcom EPSS compatible cpufreq: qcom-hw: Make use of cpufreq driver_data for passing pdev cpufreq: armada-37xx: Add missing MODULE_DEVICE_TABLE cpufreq: arm: Kconfig: add CPUFREQ_DT depend for STI CPUFREQ cpufreq: dt-platdev: Blacklist st,stih418 SoC cpufreq: sti-cpufreq: add stih418 support cpufreq: s5pv210: Use dev_err instead of pr_err in probe cpufreq: s5pv210: Simplify with dev_err_probe() cpufreq: tegra186: Fix initial frequency cpufreq: dt: Refactor initialization to handle probe deferral properly opp: Handle multiple calls for same OPP table in _of_add_opp_table_v1() cpufreq: imx6q: Unconditionally call dev_pm_opp_of_remove_table() opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER
2020-10-06irqchip/qcom-pdc: Reset PDC interrupts during initMaulik Shah
Kexec can directly boot into a new kernel without going to complete reboot. This can leave the previous kernel's configuration for PDC interrupts as is. Clear previous kernel's configuration during init by setting interrupts in enable bank to zero. The IRQs specified in qcom,pdc-ranges property are the only ones that can be used by the new kernel so clear only those IRQs. The remaining ones may be in use by a different kernel and should not be set by new kernel. Suggested-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-7-git-send-email-mkshah@codeaurora.org
2020-10-06irqchip/qcom-pdc: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flagMaulik Shah
Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag to enable/unmask the wakeirqs during suspend entry. Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-6-git-send-email-mkshah@codeaurora.org
2020-10-06pinctrl: qcom: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flagMaulik Shah
Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag to enable/unmask the wakeirqs during suspend entry. Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-5-git-send-email-mkshah@codeaurora.org
2020-10-06genirq/PM: Introduce IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flagMaulik Shah
An interrupt that is disabled/masked but set for wakeup may still need to be able to wake up the system from sleep states like "suspend to RAM". To that effect, introduce the IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag. If the irqchip have this flag set, the irq PM code will enable/unmask the irqs that are marked for wakeup, but that are in a disabled state. On resume, such irqs will be restored back to their disabled state. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Maulik Shah <mkshah@codeaurora.org> [maz: commit message fix-up] Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/1601267524-20199-4-git-send-email-mkshah@codeaurora.org
2020-10-06pinctrl: qcom: Use return value from irq_set_wake() callMaulik Shah
msmgpio irqchip was not using return value of irq_set_irq_wake() callback since previously GIC-v3 irqchip neither had IRQCHIP_SKIP_SET_WAKE flag nor it implemented .irq_set_wake callback. This lead to irq_set_irq_wake() return error -ENXIO. However from 'commit 4110b5cbb014 ("irqchip/gic-v3: Allow interrupt to be configured as wake-up sources")' GIC irqchip has IRQCHIP_SKIP_SET_WAKE flag. Use return value from irq_set_irq_wake() and irq_chip_set_wake_parent() instead of always returning success. Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy") Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-3-git-send-email-mkshah@codeaurora.org
2020-10-06pinctrl: qcom: Set IRQCHIP_SET_TYPE_MASKED and IRQCHIP_MASK_ON_SUSPEND flagsMaulik Shah
Both IRQCHIP_SET_TYPE_MASKED and IRQCHIP_MASK_ON_SUSPEND flags are already set for msmgpio's parent PDC irqchip but GPIO interrupts do not get masked during suspend or during setting irq type since genirq checks irqchip flag of msmgpio irqchip which forwards these calls to its parent PDC irqchip. Add irqchip specific flags for msmgpio irqchip to mask non wakeirqs during suspend and mask before setting irq type. Masking before changing type make sures any spurious interrupt is not detected during this operation. Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy") Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-2-git-send-email-mkshah@codeaurora.org
2020-10-06PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPILukas Wunner
Recent laptops with dual AMD GPUs fail to suspend the discrete GPU, thus causing lockups on system sleep and high power consumption at runtime. The discrete GPU would normally be suspended to D3cold by turning off ACPI _PR3 Power Resources of the Root Port above the GPU. However on affected systems, the Root Port is hotplug-capable and pci_bridge_d3_possible() only allows hotplug ports to go to D3 if they belong to a Thunderbolt device or if the Root Port possesses a "HotPlugSupportInD3" ACPI property. Neither is the case on affected laptops. The reason for whitelisting only specific, known to work hotplug ports for D3 is that there have been reports of SkyLake Xeon-SP systems raising Hardware Error NMIs upon suspending their hotplug ports: https://lore.kernel.org/linux-pci/20170503180426.GA4058@otc-nc-03/ But if a hotplug port is power manageable by ACPI (as can be detected through presence of Power Resources and corresponding _PS0 and _PS3 methods) then it ought to be safe to suspend it to D3. To this end, amend acpi_pci_bridge_d3() to whitelist such ports for D3. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1222 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1252 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1304 Reported-and-tested-by: Arthur Borsboom <arthurborsboom@gmail.com> Reported-and-tested-by: matoro <matoro@airmail.cc> Reported-by: Aaron Zakhrov <aaron.zakhrov@gmail.com> Reported-by: Michal Rostecki <mrostecki@suse.com> Reported-by: Shai Coleman <git@shaicoleman.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-06x86/copy_mc: Introduce copy_mc_enhanced_fast_string()Dan Williams
The motivations to go rework memcpy_mcsafe() are that the benefit of doing slow and careful copies is obviated on newer CPUs, and that the current opt-in list of CPUs to instrument recovery is broken relative to those CPUs. There is no need to keep an opt-in list up to date on an ongoing basis if pmem/dax operations are instrumented for recovery by default. With recovery enabled by default the old "mcsafe_key" opt-in to careful copying can be made a "fragile" opt-out. Where the "fragile" list takes steps to not consume poison across cachelines. The discussion with Linus made clear that the current "_mcsafe" suffix was imprecise to a fault. The operations that are needed by pmem/dax are to copy from a source address that might throw #MC to a destination that may write-fault, if it is a user page. So copy_to_user_mcsafe() becomes copy_mc_to_user() to indicate the separate precautions taken on source and destination. copy_mc_to_kernel() is introduced as a non-SMAP version that does not expect write-faults on the destination, but is still prepared to abort with an error code upon taking #MC. The original copy_mc_fragile() implementation had negative performance implications since it did not use the fast-string instruction sequence to perform copies. For this reason copy_mc_to_kernel() fell back to plain memcpy() to preserve performance on platforms that did not indicate the capability to recover from machine check exceptions. However, that capability detection was not architectural and now that some platforms can recover from fast-string consumption of memory errors the memcpy() fallback now causes these more capable platforms to fail. Introduce copy_mc_enhanced_fast_string() as the fast default implementation of copy_mc_to_kernel() and finalize the transition of copy_mc_fragile() to be a platform quirk to indicate 'copy-carefully'. With this in place, copy_mc_to_kernel() is fast and recovery-ready by default regardless of hardware capability. Thanks to Vivek for identifying that copy_user_generic() is not suitable as the copy_mc_to_user() backend since the #MC handler explicitly checks ex_has_fault_handler(). Thanks to the 0day robot for catching a performance bug in the x86/copy_mc_to_user implementation. [ bp: Add the "why" for this change from the 0/2th message, massage. ] Fixes: 92b0729c34ca ("x86/mm, x86/mce: Add memcpy_mcsafe()") Reported-by: Erwin Tsaur <erwin.tsaur@intel.com> Reported-by: 0day robot <lkp@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Erwin Tsaur <erwin.tsaur@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/160195562556.2163339.18063423034951948973.stgit@dwillia2-desk3.amr.corp.intel.com
2020-10-06x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()Dan Williams
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled. Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case: On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function. Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel(). Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch. One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks. [ bp: Massage a bit. ] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com