summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-10-06hwmon: (pmbus/max20730) adjust the vout reading given voltage dividerChu Lin
Problem: We use voltage dividers so that the voltage presented at the voltage sense pins is confusing. We might need to convert these readings to more meaningful readings given the voltage divider. Solution: Read the voltage divider resistance from dts and convert the voltage reading to a more meaningful reading. Testing: max20730 with voltage divider Signed-off-by: Chu Lin <linchuyuan@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20201004031445.2321090-3-linchuyuan@google.com [groeck: Return -EINVAL instead of -ENODEV on bad deevicetree data] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-10-06dt-bindings: hwmon: max20730: adding device tree doc for max20730Chu Lin
max20730 Integrated, Step-Down Switching Regulator with PMBus Signed-off-by: Chu Lin <linchuyuan@google.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20201004031445.2321090-2-linchuyuan@google.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-10-06hwmon: Add hardware monitoring driver for Moortec MR75203 PVT controllerRahul Tanwar
PVT controller (MR75203) is used to configure & control Moortec embedded analog IP which contains temprature sensor(TS), voltage monitor(VM) & process detector(PD) modules. Add hardware monitoring driver to support MR75203 PVT controller. Signed-off-by: Rahul Tanwar <rahul.tanwar@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Link: https://lore.kernel.org/r/05b59cd860d2a1aa0a68ab300829efe709645184.1601889876.git.rahul.tanwar@linux.intel.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-10-06hwmon: Add DT bindings schema for PVT controllerRahul Tanwar
PVT controller (MR75203) is used to configure & control Moortec embedded analog IP which contains temprature sensor(TS), voltage monitor(VM) & process detector(PD) modules. Add DT bindings schema for PVT controller. Signed-off-by: Rahul Tanwar <rahul.tanwar@linux.intel.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/b540b49ca47d75c5f716f8a4e4eed0664a1116bf.1601889876.git.rahul.tanwar@linux.intel.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-10-06block: Consider only dispatched requests for inflight statisticGabriel Krisman Bertazi
According to Documentation/block/stat.rst, inflight should not include I/O requests that are in the queue but not yet dispatched to the device, but blk-mq identifies as inflight any request that has a tag allocated, which, for queues without elevator, happens at request allocation time and before it is queued in the ctx (default case in blk_mq_submit_bio). In addition, current behavior is different for queues with elevator from queues without it, since for the former the driver tag is allocated at dispatch time. A more precise approach would be to only consider requests with state MQ_RQ_IN_FLIGHT. This effectively reverts commit 6131837b1de6 ("blk-mq: count allocated but not started requests in iostats inflight") to consolidate blk-mq behavior with itself (elevator case) and with original documentation, but it differs from the behavior used by the legacy path. This version differs from v1 by using blk_mq_rq_state to access the state attribute. Avoid using blk_mq_request_started, which was suggested, since we don't want to include MQ_RQ_COMPLETE. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06dt-bindings: hwmon: Add the +vs supply to the lm75 bindingsAlban Bedel
Some boards might have a regulator that control the +VS supply, add it to the bindings. Signed-off-by: Alban Bedel <alban.bedel@aerq.com> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20201001145738.17326-3-alban.bedel@aerq.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-10-06dt-bindings: hwmon: Convert lm75 bindings to yamlAlban Bedel
In order to automate the verification of DT nodes convert lm75.txt to lm75.yaml. Signed-off-by: Alban Bedel <alban.bedel@aerq.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20201001145738.17326-2-alban.bedel@aerq.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-10-06Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Fix a kernel panic in the AES crypto code caused by a BR tail call not matching the target BTI instruction (when branch target identification is enabled)" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: crypto: arm64: Use x16 with indirect branch to bti_c
2020-10-06Merge tag 'platform-drivers-x86-v5.9-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull another x86 platform driver fix from Hans de Goede: "One final pdx86 fix for Tablet Mode reporting regressions (which make the keyboard and touchpad unusable) on various Asus notebooks. These regressions were caused by the asus-nb-wmi and the intel-vbtn drivers both receiving recent patches to start reporting Tablet Mode / to report it on more models. Due to a miscommunication between Andy and me, Andy's earlier pull-req only contained the fix for the intel-vbtn driver and not the fix for the asus-nb-wmi code. This fix has been tested as a downstream patch in Fedora kernels for approx two weeks with no problems being reported" * tag 'platform-drivers-x86-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: asus-wmi: Fix SW_TABLET_MODE always reporting 1 on many different models
2020-10-06Merge tag 'drm-fixes-2020-10-06-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Daniel queued these up last week and I took a long weekend so didn't get them out, but fixing the OOB access on get font seems like something we should land and it's cc'ed stable as well. The other big change is a partial revert for a regression on android on the clcd fbdev driver, and one other docs fix. fbdev: - Re-add FB_ARMCLCD for android - Fix global-out-of-bounds read in fbcon_get_font() core: - Small doc fix" * tag 'drm-fixes-2020-10-06-1' of git://anongit.freedesktop.org/drm/drm: drm: drm_dsc.h: fix a kernel-doc markup Partially revert "video: fbdev: amba-clcd: Retire elder CLCD driver" fbcon: Fix global-out-of-bounds read in fbcon_get_font() Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h
2020-10-06usermodehelper: reset umask to default before executing user processLinus Torvalds
Kernel threads intentionally do CLONE_FS in order to follow any changes that 'init' does to set up the root directory (or cwd). It is admittedly a bit odd, but it avoids the situation where 'init' does some extensive setup to initialize the system environment, and then we execute a usermode helper program, and it uses the original FS setup from boot time that may be very limited and incomplete. [ Both Al Viro and Eric Biederman point out that 'pivot_root()' will follow the root regardless, since it fixes up other users of root (see chroot_fs_refs() for details), but overmounting root and doing a chroot() would not. ] However, Vegard Nossum noticed that the CLONE_FS not only means that we follow the root and current working directories, it also means we share umask with whatever init changed it to. That wasn't intentional. Just reset umask to the original default (0022) before actually starting the usermode helper program. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-06splice: teach splice pipe reading about empty pipe buffersLinus Torvalds
Tetsuo Handa reports that splice() can return 0 before the real EOF, if the data in the splice source pipe is an empty pipe buffer. That empty pipe buffer case doesn't happen in any normal situation, but you can trigger it by doing a write to a pipe that fails due to a page fault. Tetsuo has a test-case to show the behavior: #define _GNU_SOURCE #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> int main(int argc, char *argv[]) { const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600); int pipe_fd[2] = { -1, -1 }; pipe(pipe_fd); write(pipe_fd[1], NULL, 4096); /* This splice() should wait unless interrupted. */ return !splice(pipe_fd[0], NULL, fd, NULL, 65536, 0); } which results in write(5, NULL, 4096) = -1 EFAULT (Bad address) splice(4, NULL, 3, NULL, 65536, 0) = 0 and this can confuse splice() users into believing they have hit EOF prematurely. The issue was introduced when the pipe write code started pre-allocating the pipe buffers before copying data from user space. This is modified verion of Tetsuo's original patch. Fixes: a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot") Link:https://lore.kernel.org/linux-fsdevel/20201005121339.4063-1-penguin-kernel@I-love.SAKURA.ne.jp/ Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-06crypto: arm64: Use x16 with indirect branch to bti_cJeremy Linton
The AES code uses a 'br x7' as part of a function called by a macro. That branch needs a bti_j as a target. This results in a panic as seen below. Using x16 (or x17) with an indirect branch keeps the target bti_c. Bad mode in Synchronous Abort handler detected on CPU1, code 0x34000003 -- BTI CPU: 1 PID: 265 Comm: cryptomgr_test Not tainted 5.8.11-300.fc33.aarch64 #1 pstate: 20400c05 (nzCv daif +PAN -UAO BTYPE=j-) pc : aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs] lr : aesbs_xts_encrypt+0x48/0xe0 [aes_neon_bs] sp : ffff80001052b730 aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs] __xts_crypt+0xb0/0x2dc [aes_neon_bs] xts_encrypt+0x28/0x3c [aes_neon_bs] crypto_skcipher_encrypt+0x50/0x84 simd_skcipher_encrypt+0xc8/0xe0 crypto_skcipher_encrypt+0x50/0x84 test_skcipher_vec_cfg+0x224/0x5f0 test_skcipher+0xbc/0x120 alg_test_skcipher+0xa0/0x1b0 alg_test+0x3dc/0x47c cryptomgr_test+0x38/0x60 Fixes: 0e89640b640d ("crypto: arm64 - Use modern annotations for assembly functions") Cc: <stable@vger.kernel.org> # 5.6.x- Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Suggested-by: Dave P Martin <Dave.Martin@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20201006163326.2780619-1-jeremy.linton@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-10-06spi: spi-mtk-nor: Add power management supportIkjoon Jang
This patch adds dev_pm_ops to mtk-nor to support suspend/resume, auto suspend delay is set to -1 by default. Accessing registers are only permitted after its clock is enabled to deal with unknown state of operating clk at probe time. Signed-off-by: Ikjoon Jang <ikjn@chromium.org> Link: https://lore.kernel.org/r/20201006155010.v5.4.I68983b582d949a91866163bab588ff3c2a0d0275@changeid Signed-off-by: Mark Brown <broonie@kernel.org>
2020-10-06spi: spi-mtk-nor: support 36bit dma addressingIkjoon Jang
This patch enables 36bit dma address support to spi-mtk-nor. Currently this is enabled only for mt8192-nor. Signed-off-by: Ikjoon Jang <ikjn@chromium.org> Link: https://lore.kernel.org/r/20201006155010.v5.3.Id1cb208392928afc7ceed4de06924243c7858cd0@changeid Signed-off-by: Mark Brown <broonie@kernel.org>
2020-10-06spi: spi-mtk-nor: use dma_alloc_coherent() for bounce bufferIkjoon Jang
Use dma_alloc_coherent() for bounce buffer instead of kmalloc() to make sure the bounce buffer to be allocated within its DMAable range. Signed-off-by: Ikjoon Jang <ikjn@chromium.org> Link: https://lore.kernel.org/r/20201006155010.v5.2.I06cb65401ab5ad63ea30c4788d26633928d80f38@changeid Signed-off-by: Mark Brown <broonie@kernel.org>
2020-10-06dt-bindings: spi: add mt8192-nor compatible stringIkjoon Jang
Add MT8192 spi-nor controller support. Signed-off-by: Ikjoon Jang <ikjn@chromium.org> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20201006155010.v5.1.I4cd089ef1fe576535c6b6e4f1778eaab1c4441cf@changeid Signed-off-by: Mark Brown <broonie@kernel.org>
2020-10-06scsi: megaraid_sas: Added support for shared host tagset for cpuhotplugKashyap Desai
Fusion adapters can steer completions to individual queues, and we now have support for shared host-wide tags. So we can enable multiqueue support for fusion adapters. Once driver enable shared host-wide tags, cpu hotplug feature is also supported as it was enabled using below patchsets - commit bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline") Currently driver has provision to disable host-wide tags using "host_tagset_enable" module parameter. Once we do not have any major performance regression using host-wide tags, we will drop the hand-crafted interrupt affinity settings. Performance is also meeting the expecatation - (used both none and mq-deadline scheduler) 24 Drive SSD on Aero with/without this patch can get 3.1M IOPs 3 VDs consist of 8 SAS SSD on Aero with/without this patch can get 3.1M IOPs. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06scsi: scsi_debug: Support host tagsetJohn Garry
When host_max_queue is set (> 0), set the Scsi_Host.host_tagset such that blk-mq will use a hostwide tagset over all SCSI host submission queues. This means that we may expose all submission queues and always use the hwq chosen by blk-mq. And since if sdebug_host_max_queue is set, sdebug_max_queue is fixed to the same value, we can simplify how sdebug_driver_template.can_queue is set. Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06scsi: hisi_sas: Switch v3 hw to MQJohn Garry
Now that the block layer provides a shared tag, we can switch the driver to expose all HW queues. Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06scsi: core: Show nr_hw_queues in sysfsJohn Garry
So that we don't use a value of 0 for when Scsi_Host.nr_hw_queues is unset, use the tag_set->nr_hw_queues (which holds 1 for this case). Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06scsi: Add host and host template flag 'host_tagset'Hannes Reinecke
Add Host and host template flag 'host_tagset' so hostwide tagset can be shared on multiple reply queues after the SCSI device's reply queue is converted to blk-mq hw queue. [jpg: Update comment on .can_queue and add Scsi_Host.host_tagset] Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used Tested-by: Douglas Gilbert <dgilbert@interlog.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06objtool: Allow nested externs to enable BUILD_BUG()Vasily Gorbik
Currently BUILD_BUG() macro is expanded to smth like the following: do { extern void __compiletime_assert_0(void) __attribute__((error("BUILD_BUG failed"))); if (!(!(1))) __compiletime_assert_0(); } while (0); If used in a function body this obviously would produce build errors with -Wnested-externs and -Werror. Build objtool with -Wno-nested-externs to enable BUILD_BUG() usage. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-10-06block: move blk_mq_sched_try_merge to blk-merge.cChristoph Hellwig
Move blk_mq_sched_try_merge to blk-merge.c, which allows to mark a lot of the merge infrastructure static there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06block: remove the unused blk_integrity_merge_bio exportChristoph Hellwig
Also move the definition from the public blkdev.h to the private block/blk.h header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06block: remove the unused blk_integrity_merge_rq exportChristoph Hellwig
Also move the definition from the public blkdev.h to the private block/blk.h header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06block: move 'q_usage_counter' into front of 'request_queue'Ming Lei
The field of 'q_usage_counter' is always fetched in fast path of every block driver, and move it into front of 'request_queue', so it can be fetched into 1st cacheline of 'request_queue' instance. Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Veronika Kabatova <vkabatov@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06percpu_ref: reduce memory footprint of percpu_ref in fast pathMing Lei
'struct percpu_ref' is often embedded into one user structure, and the instance is usually referenced in fast path, however actually only 'percpu_count_ptr' is needed in fast path. So move other fields into one new structure of 'percpu_ref_data', and allocate it dynamically via kzalloc(), then memory footprint of 'percpu_ref' in fast path is reduced a lot and becomes suitable to put into hot cacheline of user structure. Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Veronika Kabatova <vkabatov@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06Merge tag 'rxrpc-fixes-20201005' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Miscellaneous fixes Here are some miscellaneous rxrpc fixes: (1) Fix the xdr encoding of the contents read from an rxrpc key. (2) Fix a BUG() for a unsupported encoding type. (3) Fix missing _bh lock annotations. (4) Fix acceptance handling for an incoming call where the incoming call is encrypted. (5) The server token keyring isn't network namespaced - it belongs to the server, so there's no need. Namespacing it means that request_key() fails to find it. (6) Fix a leak of the server keyring. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06perf/x86: Fix n_metric for cancelled txnPeter Zijlstra
When a group that has TopDown members is failed to be scheduled, any later TopDown groups will not return valid values. Here is an example. A background perf that occupies all the GP counters and the fixed counter 1. $perf stat -e "{cycles,cycles,cycles,cycles,cycles,cycles,cycles, cycles,cycles}:D" -a A user monitors a TopDown group. It works well, because the fixed counter 3 and the PERF_METRICS are available. $perf stat -x, --topdown -- ./workload retiring,bad speculation,frontend bound,backend bound, 18.0,16.1,40.4,25.5, Then the user tries to monitor a group that has TopDown members. Because of the cycles event, the group is failed to be scheduled. $perf stat -x, -e '{slots,topdown-retiring,topdown-be-bound, topdown-fe-bound,topdown-bad-spec,cycles}' -- ./workload <not counted>,,slots,0,0.00,, <not counted>,,topdown-retiring,0,0.00,, <not counted>,,topdown-be-bound,0,0.00,, <not counted>,,topdown-fe-bound,0,0.00,, <not counted>,,topdown-bad-spec,0,0.00,, <not counted>,,cycles,0,0.00,, The user tries to monitor a TopDown group again. It doesn't work anymore. $perf stat -x, --topdown -- ./workload ,,,,, In a txn, cancel_txn() is to truncate the event_list for a canceled group and update the number of events added in this transaction. However, the number of TopDown events added in this transaction is not updated. The kernel will probably fail to add new Topdown events. Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown metrics") Reported-by: Andi Kleen <ak@linux.intel.com> Reported-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20201005082611.GH2628@hirez.programming.kicks-ass.net
2020-10-06perf/x86: Fix n_pair for cancelled txnPeter Zijlstra
Kan reported that n_metric gets corrupted for cancelled transactions; a similar issue exists for n_pair for AMD's Large Increment thing. The problem was confirmed and confirmed fixed by Kim using: sudo perf stat -e "{cycles,cycles,cycles,cycles}:D" -a sleep 10 & # should succeed: sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload # should fail: sudo perf stat -e "{fp_ret_sse_avx_ops.all,fp_ret_sse_avx_ops.all,cycles}:D" -a workload # previously failed, now succeeds with this patch: sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload Fixes: 5738891229a2 ("perf/x86/amd: Add support for Large Increment per Cycle Events") Reported-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kim Phillips <kim.phillips@amd.com> Link: https://lkml.kernel.org/r/20201005082516.GG2628@hirez.programming.kicks-ass.net
2020-10-06tcp: fix receive window update in tcp_add_backlog()Eric Dumazet
We got reports from GKE customers flows being reset by netfilter conntrack unless nf_conntrack_tcp_be_liberal is set to 1. Traces seemed to suggest ACK packet being dropped by the packet capture, or more likely that ACK were received in the wrong order. wscale=7, SYN and SYNACK not shown here. This ACK allows the sender to send 1871*128 bytes from seq 51359321 : New right edge of the window -> 51359321+1871*128=51598809 09:17:23.389210 IP A > B: Flags [.], ack 51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389212 IP B > A: Flags [.], seq 51422681:51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408 09:17:23.389214 IP A > B: Flags [.], ack 51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389253 IP B > A: Flags [.], seq 51424089:51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768 09:17:23.389272 IP A > B: Flags [.], ack 51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389275 IP B > A: Flags [.], seq 51488857:51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384 Receiver now allows to send 606*128=77568 from seq 51521241 : New right edge of the window -> 51521241+606*128=51598809 09:17:23.389296 IP A > B: Flags [.], ack 51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389308 IP B > A: Flags [.], seq 51521241:51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384 It seems the sender exceeds RWIN allowance, since 51611353 > 51598809 09:17:23.389346 IP B > A: Flags [.], seq 51553625:51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728 09:17:23.389356 IP B > A: Flags [.], seq 51611353:51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040 09:17:23.389367 IP A > B: Flags [.], ack 51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0 netfilter conntrack is not happy and sends RST 09:17:23.389389 IP A > B: Flags [R], seq 92176528, win 0, length 0 09:17:23.389488 IP B > A: Flags [R], seq 174478967, win 0, length 0 Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet. New right edge of the window -> 51521241+859*128=51631193 Normally TCP stack handles OOO packets just fine, but it turns out tcp_add_backlog() does not. It can update the window field of the aggregated packet even if the ACK sequence of the last received packet is too old. Many thanks to Alexandre Ferrieux for independently reporting the issue and suggesting a fix. Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: usb: rtl8150: set random MAC address when set_ethernet_addr() failsAnant Thazhemadam
When get_registers() fails in set_ethernet_addr(),the uninitialized value of node_id gets copied over as the address. So, check the return value of get_registers(). If get_registers() executed successfully (i.e., it returns sizeof(node_id)), copy over the MAC address using ether_addr_copy() (instead of using memcpy()). Else, if get_registers() failed instead, a randomly generated MAC address is set as the MAC address instead. Reported-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com Tested-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com Acked-by: Petko Manolov <petkan@nucleusys.com> Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06mptcp: more DATA FIN fixesPaolo Abeni
Currently data fin on data packet are not handled properly: the 'rcv_data_fin_seq' field is interpreted as the last sequence number carrying a valid data, but for data fin packet with valid maps we currently store map_seq + map_len, that is, the next value. The 'write_seq' fields carries instead the value subseguent to the last valid byte, so in mptcp_write_data_fin() we never detect correctly the last DSS map. Fixes: 7279da6145bb ("mptcp: Use MPTCP-level flag for sending DATA_FIN") Fixes: 1a49b2c2a501 ("mptcp: Handle incoming 32-bit DATA_FIN values") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06Merge branch 'Fix-tail-dropping-watermarks-for-Ocelot-switches'David S. Miller
Vladimir Oltean says: ==================== Fix tail dropping watermarks for Ocelot switches This series adds a missing division by 60, and a warning to prevent that in the future. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: mscc: ocelot: warn when encoding an out-of-bounds watermark valueVladimir Oltean
There is an upper bound to the value that a watermark may hold. That upper bound is not immediately obvious during configuration, and it might be possible to have accidental truncation. Actually this has happened already, add a warning to prevent it from happening again. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: mscc: ocelot: divide watermark value by 60 when writing to SYS_ATOPVladimir Oltean
Tail dropping is enabled for a port when: 1. A source port consumes more packet buffers than the watermark encoded in SYS:PORT:ATOP_CFG.ATOP. AND 2. Total memory use exceeds the consumption watermark encoded in SYS:PAUSE_CFG:ATOP_TOT_CFG. The unit of these watermarks is a 60 byte memory cell. That unit is programmed properly into ATOP_TOT_CFG, but not into ATOP. Actually when written into ATOP, it would get truncated and wrap around. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06net: qrtr: ns: Fix the incorrect usage of rcu_read_lock()Manivannan Sadhasivam
The rcu_read_lock() is not supposed to lock the kernel_sendmsg() API since it has the lock_sock() in qrtr_sendmsg() which will sleep. Hence, fix it by excluding the locking for kernel_sendmsg(). While at it, let's also use radix_tree_deref_retry() to confirm the validity of the pointer returned by radix_tree_deref_slot() and use radix_tree_iter_resume() to resume iterating the tree properly before releasing the lock as suggested by Doug. Fixes: a7809ff90ce6 ("net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks") Reported-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Alex Elder <elder@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe()Rafael J. Wysocki
It turns out that in some cases there are EC events to flush in acpi_ec_dispatch_gpe() even though the ec_no_wakeup kernel parameter is set and the EC GPE is disabled while sleeping, so drop the ec_no_wakeup check that prevents those events from being processed from acpi_ec_dispatch_gpe(). Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-06ACPI: EC: PM: Flush EC work unconditionally after wakeupRafael J. Wysocki
Commit 607b9df63057 ("ACPI: EC: PM: Avoid flushing EC work when EC GPE is inactive") has been reported to cause some power button wakeup events to be missed on some systems, so modify acpi_ec_dispatch_gpe() to call acpi_ec_flush_work() unconditionally to effectively reverse the changes made by that commit. Also note that the problem which prompted commit 607b9df63057 is not reproducible any more on the affected machine. Fixes: 607b9df63057 ("ACPI: EC: PM: Avoid flushing EC work when EC GPE is inactive") Reported-by: Raymond Tan <raymond.tan@intel.com> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-06Merge branch 'irq/qcom-pdc-wakeup' into irq/irqchip-nextMarc Zyngier
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-10-06Merge branch 'cpufreq/arm/linux-next' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq updates for 5.10-rc1 from Viresh Kumar: "- STI cpufreq driver updates to allow new hardware (Alain Volmat). - Minor tegra driver fixes around initial frequency mismatch warnings (Jon Hunter). - dev_err simplification for s5pv210 driver (Krzysztof Kozlowski). - Qcom driver updates to allow new hardware and minor cleanup (Manivannan Sadhasivam and Matthias Kaehlcke). - Add missing MODULE_DEVICE_TABLE for armada driver (Pali Rohár). - Improved defer-probe handling in cpufreq-dt driver (Stephan Gerhold). - Call dev_pm_opp_of_remove_table() unconditionally for imx driver (Viresh Kumar)." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: qcom: Don't add frequencies without an OPP cpufreq: qcom-hw: Add cpufreq support for SM8250 SoC cpufreq: qcom-hw: Use of_device_get_match_data for offsets and row size cpufreq: qcom-hw: Use devm_platform_ioremap_resource() to simplify code dt-bindings: cpufreq: cpufreq-qcom-hw: Document Qcom EPSS compatible cpufreq: qcom-hw: Make use of cpufreq driver_data for passing pdev cpufreq: armada-37xx: Add missing MODULE_DEVICE_TABLE cpufreq: arm: Kconfig: add CPUFREQ_DT depend for STI CPUFREQ cpufreq: dt-platdev: Blacklist st,stih418 SoC cpufreq: sti-cpufreq: add stih418 support cpufreq: s5pv210: Use dev_err instead of pr_err in probe cpufreq: s5pv210: Simplify with dev_err_probe() cpufreq: tegra186: Fix initial frequency cpufreq: dt: Refactor initialization to handle probe deferral properly opp: Handle multiple calls for same OPP table in _of_add_opp_table_v1() cpufreq: imx6q: Unconditionally call dev_pm_opp_of_remove_table() opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER
2020-10-06irqchip/qcom-pdc: Reset PDC interrupts during initMaulik Shah
Kexec can directly boot into a new kernel without going to complete reboot. This can leave the previous kernel's configuration for PDC interrupts as is. Clear previous kernel's configuration during init by setting interrupts in enable bank to zero. The IRQs specified in qcom,pdc-ranges property are the only ones that can be used by the new kernel so clear only those IRQs. The remaining ones may be in use by a different kernel and should not be set by new kernel. Suggested-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-7-git-send-email-mkshah@codeaurora.org
2020-10-06irqchip/qcom-pdc: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flagMaulik Shah
Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag to enable/unmask the wakeirqs during suspend entry. Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-6-git-send-email-mkshah@codeaurora.org
2020-10-06pinctrl: qcom: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flagMaulik Shah
Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag to enable/unmask the wakeirqs during suspend entry. Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-5-git-send-email-mkshah@codeaurora.org
2020-10-06genirq/PM: Introduce IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flagMaulik Shah
An interrupt that is disabled/masked but set for wakeup may still need to be able to wake up the system from sleep states like "suspend to RAM". To that effect, introduce the IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag. If the irqchip have this flag set, the irq PM code will enable/unmask the irqs that are marked for wakeup, but that are in a disabled state. On resume, such irqs will be restored back to their disabled state. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Maulik Shah <mkshah@codeaurora.org> [maz: commit message fix-up] Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/1601267524-20199-4-git-send-email-mkshah@codeaurora.org
2020-10-06pinctrl: qcom: Use return value from irq_set_wake() callMaulik Shah
msmgpio irqchip was not using return value of irq_set_irq_wake() callback since previously GIC-v3 irqchip neither had IRQCHIP_SKIP_SET_WAKE flag nor it implemented .irq_set_wake callback. This lead to irq_set_irq_wake() return error -ENXIO. However from 'commit 4110b5cbb014 ("irqchip/gic-v3: Allow interrupt to be configured as wake-up sources")' GIC irqchip has IRQCHIP_SKIP_SET_WAKE flag. Use return value from irq_set_irq_wake() and irq_chip_set_wake_parent() instead of always returning success. Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy") Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-3-git-send-email-mkshah@codeaurora.org
2020-10-06pinctrl: qcom: Set IRQCHIP_SET_TYPE_MASKED and IRQCHIP_MASK_ON_SUSPEND flagsMaulik Shah
Both IRQCHIP_SET_TYPE_MASKED and IRQCHIP_MASK_ON_SUSPEND flags are already set for msmgpio's parent PDC irqchip but GPIO interrupts do not get masked during suspend or during setting irq type since genirq checks irqchip flag of msmgpio irqchip which forwards these calls to its parent PDC irqchip. Add irqchip specific flags for msmgpio irqchip to mask non wakeirqs during suspend and mask before setting irq type. Masking before changing type make sures any spurious interrupt is not detected during this operation. Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy") Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-2-git-send-email-mkshah@codeaurora.org
2020-10-06PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPILukas Wunner
Recent laptops with dual AMD GPUs fail to suspend the discrete GPU, thus causing lockups on system sleep and high power consumption at runtime. The discrete GPU would normally be suspended to D3cold by turning off ACPI _PR3 Power Resources of the Root Port above the GPU. However on affected systems, the Root Port is hotplug-capable and pci_bridge_d3_possible() only allows hotplug ports to go to D3 if they belong to a Thunderbolt device or if the Root Port possesses a "HotPlugSupportInD3" ACPI property. Neither is the case on affected laptops. The reason for whitelisting only specific, known to work hotplug ports for D3 is that there have been reports of SkyLake Xeon-SP systems raising Hardware Error NMIs upon suspending their hotplug ports: https://lore.kernel.org/linux-pci/20170503180426.GA4058@otc-nc-03/ But if a hotplug port is power manageable by ACPI (as can be detected through presence of Power Resources and corresponding _PS0 and _PS3 methods) then it ought to be safe to suspend it to D3. To this end, amend acpi_pci_bridge_d3() to whitelist such ports for D3. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1222 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1252 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1304 Reported-and-tested-by: Arthur Borsboom <arthurborsboom@gmail.com> Reported-and-tested-by: matoro <matoro@airmail.cc> Reported-by: Aaron Zakhrov <aaron.zakhrov@gmail.com> Reported-by: Michal Rostecki <mrostecki@suse.com> Reported-by: Shai Coleman <git@shaicoleman.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-06x86/copy_mc: Introduce copy_mc_enhanced_fast_string()Dan Williams
The motivations to go rework memcpy_mcsafe() are that the benefit of doing slow and careful copies is obviated on newer CPUs, and that the current opt-in list of CPUs to instrument recovery is broken relative to those CPUs. There is no need to keep an opt-in list up to date on an ongoing basis if pmem/dax operations are instrumented for recovery by default. With recovery enabled by default the old "mcsafe_key" opt-in to careful copying can be made a "fragile" opt-out. Where the "fragile" list takes steps to not consume poison across cachelines. The discussion with Linus made clear that the current "_mcsafe" suffix was imprecise to a fault. The operations that are needed by pmem/dax are to copy from a source address that might throw #MC to a destination that may write-fault, if it is a user page. So copy_to_user_mcsafe() becomes copy_mc_to_user() to indicate the separate precautions taken on source and destination. copy_mc_to_kernel() is introduced as a non-SMAP version that does not expect write-faults on the destination, but is still prepared to abort with an error code upon taking #MC. The original copy_mc_fragile() implementation had negative performance implications since it did not use the fast-string instruction sequence to perform copies. For this reason copy_mc_to_kernel() fell back to plain memcpy() to preserve performance on platforms that did not indicate the capability to recover from machine check exceptions. However, that capability detection was not architectural and now that some platforms can recover from fast-string consumption of memory errors the memcpy() fallback now causes these more capable platforms to fail. Introduce copy_mc_enhanced_fast_string() as the fast default implementation of copy_mc_to_kernel() and finalize the transition of copy_mc_fragile() to be a platform quirk to indicate 'copy-carefully'. With this in place, copy_mc_to_kernel() is fast and recovery-ready by default regardless of hardware capability. Thanks to Vivek for identifying that copy_user_generic() is not suitable as the copy_mc_to_user() backend since the #MC handler explicitly checks ex_has_fault_handler(). Thanks to the 0day robot for catching a performance bug in the x86/copy_mc_to_user implementation. [ bp: Add the "why" for this change from the 0/2th message, massage. ] Fixes: 92b0729c34ca ("x86/mm, x86/mce: Add memcpy_mcsafe()") Reported-by: Erwin Tsaur <erwin.tsaur@intel.com> Reported-by: 0day robot <lkp@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Erwin Tsaur <erwin.tsaur@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/160195562556.2163339.18063423034951948973.stgit@dwillia2-desk3.amr.corp.intel.com