summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-07-18rq-qos: set ourself TASK_UNINTERRUPTIBLE after we scheduleJosef Bacik
In case we get a spurious wakeup we need to make sure to re-set ourselves to TASK_UNINTERRUPTIBLE so we don't busy wait. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18rq-qos: don't reset has_sleepers on spurious wakeupsJosef Bacik
If we raced with somebody else getting an inflight counter we could fail to get an inflight counter with no sleepers on the list, and thus need to go to sleep. In this case has_sleepers should be true because we are now relying on the waker to get our inflight counter for us. And in the case of spurious wakeups we'd still want this to be the case. So set has_sleepers to true if we went to sleep to make sure we're woken up the proper way. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18rq-qos: fix missed wake-ups in rq_qos_throttleJosef Bacik
We saw a hang in production with WBT where there was only one waiter in the throttle path and no outstanding IO. This is because of the has_sleepers optimization that is used to make sure we don't steal an inflight counter for new submitters when there are people already on the list. We can race with our check to see if the waitqueue has any waiters (this is done locklessly) and the time we actually add ourselves to the waitqueue. If this happens we'll go to sleep and never be woken up because nobody is doing IO to wake us up. Fix this by checking if the waitqueue has a single sleeper on the list after we add ourselves, that way we have an uptodate view of the list. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18wait: add wq_has_single_sleeper helperJosef Bacik
rq-qos sits in the io path so we want to take locks as sparingly as possible. To accomplish this we try not to take the waitqueue head lock unless we are sure we need to go to sleep, and we have an optimization to make sure that we don't starve out existing waiters. Since we check if there are existing waiters locklessly we need to be able to update our view of the waitqueue list after we've added ourselves to the waitqueue. Accomplish this by adding this helper to see if there is more than just ourselves on the list. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18Merge tag 'acpi-5.3-rc1-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "These get rid of two clang warnings, add a new quirk mechanism to the ACPI backlight driver (and apply it to one machine) and update the table load object initialization in ACPICA (this is a replacement for a previously reverted ACPICA commit). Specifics: - Make ACPI table loading work more consistently regardless of the exact mechanism used for loading a table (Erik Schmauss). - Get rid of two clang warnings (Arnd Bergmann). - Add new quirk mechanism to the ACPI backlight driver and use it to add a quirk for PB Easynote MZ35 (Hans de Goede)" * tag 'acpi-5.3-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: video: Add new hw_changes_brightness quirk, set it on PB Easynote MZ35 ACPI: fix false-positive -Wuninitialized warning ACPI: blacklist: fix clang warning for unused DMI table ACPICA: Update table load object initialization
2019-07-18Merge branch 'floppy'Linus Torvalds
Merge floppy ioctl verification fixes from Denis Efremov. This also marks the floppy driver as orphaned - it turns out that Jiri no longer has working hardware. Actual working physical floppy hardware is getting hard to find, and while Willy was able to test this, I think the driver can be considered pretty much dead from an actual hardware standpoint. The hardware that is still sold seems to be mainly USB-based, which doesn't use this legacy driver at all. The old floppy disk controller is still emulated in various VM environments, so the driver isn't going away, but let's see if anybody is interested to step up to maintain it. The lack of hardware also likely means that the ioctl range verification fixes are probably mostly relevant to anybody using floppies in a virtual environment. Which is probably also going away in favor of USB storage emulation, but who knows. Will Decon reviewed the patches but I'm not rebasing them just for that, so I'll add a Reviewed-by: Will Deacon <will@kernel.org> here instead. * floppy: MAINTAINERS: mark floppy.c orphaned floppy: fix out-of-bounds read in copy_buffer floppy: fix invalid pointer dereference in drive_name floppy: fix out-of-bounds read in next_valid_format floppy: fix div-by-zero in setup_format_params
2019-07-18MAINTAINERS: mark floppy.c orphanedJiri Kosina
I volunteered myself to maintain it quite some time ago back when I fixed the concurrency issues which exhibited itself only with VM-emulated devices, and at the same time I still had the physical 3.5" reader to test all the changes. The reader doesn't work any more though, so I guess it's time to step down from this super-prestigious role :p and mark floppy.c as Orphaned. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18riscv: fix build break after macro-to-function conversion in generic ↵Paul Walmsley
cacheflush.h Commit c296d4dc13ae ("asm-generic: fix a compilation warning") converted the various flush_*cache_* macros in asm-generic/cacheflush.h to static inline functions. This breaks RISC-V builds, since RISC-V's cacheflush.h includes the generic cacheflush.h and then undefines the macros to be overridden. Fix by copying the subset of the no-op functions that are reused from the generic cacheflush.h into the RISC-V cacheflush.h, and dropping the include of the generic cacheflush.h. Fixes: c296d4dc13ae ("asm-generic: fix a compilation warning") Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Cc: Qian Cai <cai@lca.pw> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18stacktrace: Force USER_DS for stack_trace_save_user()Peter Zijlstra
When walking userspace stacks, USER_DS needs to be set, otherwise access_ok() will not function as expected. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Link: https://lkml.kernel.org/r/20190718085754.GM3402@hirez.programming.kicks-ass.net
2019-07-18powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask()Gautham R. Shenoy
xive_find_target_in_mask() has the following for(;;) loop which has a bug when @first == cpumask_first(@mask) and condition 1 fails to hold for every CPU in @mask. In this case we loop forever in the for-loop. first = cpu; for (;;) { if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1 return cpu; cpu = cpumask_next(cpu, mask); if (cpu == first) // condition 2 break; if (cpu >= nr_cpu_ids) // condition 3 cpu = cpumask_first(mask); } This is because, when @first == cpumask_first(@mask), we never hit the condition 2 (cpu == first) since prior to this check, we would have executed "cpu = cpumask_next(cpu, mask)" which will set the value of @cpu to a value greater than @first or to nr_cpus_ids. When this is coupled with the fact that condition 1 is not met, we will never exit this loop. This was discovered by the hard-lockup detector while running LTP test concurrently with SMT switch tests. watchdog: CPU 12 detected hard LOCKUP on other CPUs 68 watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago) watchdog: CPU 68 Hard LOCKUP watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago) CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1 NIP: c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000 REGS: c000201fff3c7d80 TRAP: 0100 Not tainted (4.18.0-100.el8.ppc64le) MSR: 9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 24028424 XER: 00000000 CFAR: c0000000006f558c IRQMASK: 1 GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18 GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8 GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8 GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001 GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18 GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0 GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f NIP [c0000000006f5578] find_next_bit+0x38/0x90 LR [c000000000cba9ec] cpumask_next+0x2c/0x50 Call Trace: [c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable) [c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240 [c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0 [c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260 [c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0 [c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0 [c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90 [c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220 [c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x] [c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x] [c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0 [c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50 [c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0 [c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0 [c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70 [c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160 [c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70 Instruction dump: 78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039 4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040 To fix this, move the check for condition 2 after the check for condition 3, so that we are able to break out of the loop soon after iterating through all the CPUs in the @mask in the problem case. Use do..while() to achieve this. Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Reported-by: Indira P. Joga <indira.priya@in.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.com
2019-07-18block, bfq: check also in-flight I/O in dispatch pluggingPaolo Valente
Consider a sync bfq_queue Q that remains empty while in service, and suppose that, when this happens, there is a fair amount of already in-flight I/O not belonging to Q. In such a situation, I/O dispatching may need to be plugged (until new I/O arrives for Q), for the following reason. The drive may decide to serve in-flight non-Q's I/O requests before Q's ones, thereby delaying the arrival of new I/O requests for Q (recall that Q is sync). If I/O-dispatching is not plugged, then, while Q remains empty, a basically uncontrolled amount of I/O from other queues may be dispatched too, possibly causing the service of Q's I/O to be delayed even longer in the drive. This problem gets more and more serious as the speed and the queue depth of the drive grow, because, as these two quantities grow, the probability to find no queue busy but many requests in flight grows too. If Q has the same weight and priority as the other queues, then the above delay is unlikely to cause any issue, because all queues tend to undergo the same treatment. So, since not plugging I/O dispatching is convenient for throughput, it is better not to plug. Things change in case Q has a higher weight or priority than some other queue, because Q's service guarantees may simply be violated. For this reason, commit 1de0c4cd9ea6 ("block, bfq: reduce idling only in symmetric scenarios") does plug I/O in such an asymmetric scenario. Plugging minimizes the delay induced by already in-flight I/O, and enables Q to recover the bandwidth it may lose because of this delay. Yet the above commit does not cover the case of weight-raised queues, for efficiency concerns. For weight-raised queues, I/O-dispatch plugging is activated simply if not all bfq_queues are weight-raised. But this check does not handle the case of in-flight requests, because a bfq_queue may become non busy *before* all its in-flight requests are completed. This commit performs I/O-dispatch plugging for weight-raised queues if there are some in-flight requests. As a practical example of the resulting recover of control, under write load on a Samsung SSD 970 PRO, gnome-terminal starts in 1.5 seconds after this fix, against 15 seconds before the fix (as a reference, gnome-terminal takes about 35 seconds to start with any of the other I/O schedulers). Fixes: 1de0c4cd9ea6 ("block, bfq: reduce idling only in symmetric scenarios") Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1Kai-Heng Feng
Commit 7b9584fa1c0b ("staging: line6: Move altsetting to properties") set a wrong altsetting for LINE6_PODHD500_1 during refactoring. Set the correct altsetting number to fix the issue. BugLink: https://bugs.launchpad.net/bugs/1790595 Fixes: 7b9584fa1c0b ("staging: line6: Move altsetting to properties") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-07-18ALSA: hda - Optimize resume for codecs without jack detectionTakashi Iwai
The codecs without jack detection also don't have to be resumed forcibly because, obviously, they have no jack. Skip the forced resume in such a case as optimization as well. Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-07-18Merge branches 'acpi-misc' and 'acpi-video'Rafael J. Wysocki
* acpi-misc: ACPI: fix false-positive -Wuninitialized warning ACPI: blacklist: fix clang warning for unused DMI table * acpi-video: ACPI: video: Add new hw_changes_brightness quirk, set it on PB Easynote MZ35
2019-07-18Merge branch 'pm-cpufreq'Rafael J. Wysocki
* pm-cpufreq: cpufreq: Make cpufreq_generic_init() return void cpufreq: imx-cpufreq-dt: Add i.MX8MN support cpufreq: Add QoS requests for userspace constraints cpufreq: intel_pstate: Reuse refresh_frequency_limits() cpufreq: Register notifiers with the PM QoS framework PM / QoS: Add support for MIN/MAX frequency constraints PM / QOS: Pass request type to dev_pm_qos_read_value() PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value() PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier()
2019-07-18padata: use smp_mb in padata_reorder to avoid orphaned padata jobsDaniel Jordan
Testing padata with the tcrypt module on a 5.2 kernel... # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3 # modprobe tcrypt mode=211 sec=1 ...produces this splat: INFO: task modprobe:10075 blocked for more than 120 seconds. Not tainted 5.2.0-base+ #16 modprobe D 0 10075 10064 0x80004080 Call Trace: ? __schedule+0x4dd/0x610 ? ring_buffer_unlock_commit+0x23/0x100 schedule+0x6c/0x90 schedule_timeout+0x3b/0x320 ? trace_buffer_unlock_commit_regs+0x4f/0x1f0 wait_for_common+0x160/0x1a0 ? wake_up_q+0x80/0x80 { crypto_wait_req } # entries in braces added by hand { do_one_aead_op } { test_aead_jiffies } test_aead_speed.constprop.17+0x681/0xf30 [tcrypt] do_test+0x4053/0x6a2b [tcrypt] ? 0xffffffffa00f4000 tcrypt_mod_init+0x50/0x1000 [tcrypt] ... The second modprobe command never finishes because in padata_reorder, CPU0's load of reorder_objects is executed before the unlocking store in spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment: CPU0 CPU1 padata_reorder padata_do_serial LOAD reorder_objects // 0 INC reorder_objects // 1 padata_reorder TRYLOCK pd->lock // failed UNLOCK pd->lock CPU0 deletes the timer before returning from padata_reorder and since no other job is submitted to padata, modprobe waits indefinitely. Add a pair of full barriers to guarantee proper ordering: CPU0 CPU1 padata_reorder padata_do_serial UNLOCK pd->lock smp_mb() LOAD reorder_objects INC reorder_objects smp_mb__after_atomic() padata_reorder TRYLOCK pd->lock smp_mb__after_atomic is needed so the read part of the trylock operation comes after the INC, as Andrea points out. Thanks also to Andrea for help with writing a litmus test. Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: <stable@vger.kernel.org> Cc: Andrea Parri <andrea.parri@amarulasolutions.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-arch@vger.kernel.org Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-18crypto: ccp - Fix SEV_VERSION_GREATER_OR_EQUALDavid Rientjes
SEV_VERSION_GREATER_OR_EQUAL() will fail if upgrading from 2.2 to 3.1, for example, because the minor version is not equal to or greater than the major. Fix this and move to a static inline function for appropriate type checking. Fixes: edd303ff0e9e ("crypto: ccp - Add DOWNLOAD_FIRMWARE SEV command") Reported-by: Cfir Cohen <cfir@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-18crypto: ccp/gcm - use const time tag comparison.Cfir Cohen
Avoid leaking GCM tag through timing side channel. Fixes: 36cf515b9bbe ("crypto: ccp - Enable support for AES GCM on v5 CCPs") Cc: <stable@vger.kernel.org> # v4.12+ Signed-off-by: Cfir Cohen <cfir@google.com> Acked-by: Gary R Hook <ghook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-18SUNRPC: Fix up backchannel slot table accountingTrond Myklebust
Add a per-transport maximum limit in the socket case, and add helpers to allow the NFSv4 code to discover that limit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18SUNRPC: Fix initialisation of struct rpc_xprt_switchTrond Myklebust
Ensure that we do initialise the fields xps_nactive, xps_queuelen and xps_net. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18Merge tag 'drm-misc-next-fixes-2019-07-11' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next Pull request for drm-misc-fixes-next for v5.3: - Revert properties exposed in komeda that need improvement before they become ABI. - Only add modes from the cmdline if they are valid. - Add orientation quirk for GPD MicroPC. - Reduce stack usage in drm selftests. - Fix bochs framebuffer setup. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/e6b84ce4-2728-fb02-87c1-6a6b87703c0b@linux.intel.com
2019-07-18xen: let alloc_xenballooned_pages() fail if not enough memory freeJuergen Gross
Instead of trying to allocate pages with GFP_USER in add_ballooned_pages() check the available free memory via si_mem_available(). GFP_USER is far less limiting memory exhaustion than the test via si_mem_available(). This will avoid dom0 running out of memory due to excessive foreign page mappings especially on ARM and on x86 in PVH mode, as those don't have a pre-ballooned area which can be used for foreign mappings. As the normal ballooning suffers from the same problem don't balloon down more than si_mem_available() pages in one iteration. At the same time limit the default maximum number of retries. This is part of XSA-300. Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-18objtool: Rename elf_open() to prevent conflict with libelf from elftoolchainMichael Forney
The elftoolchain version of libelf has a function named elf_open(). The function name isn't quite accurate anyway, since it also reads all the ELF data. Rename it to elf_read(), which is more accurate. [ jpoimboe: rename to elf_read(); write commit description ] Signed-off-by: Michael Forney <mforney@mforney.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/7ce2d1b35665edf19fd0eb6fbc0b17b81a48e62f.1562793604.git.jpoimboe@redhat.com
2019-07-18objtool: Use Elf_Scn typedef instead of assuming struct nameMichael Forney
The libelf implementation might use a different struct name, and the Elf_Scn typedef is already used throughout the rest of objtool. Signed-off-by: Michael Forney <mforney@mforney.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/d270e1be2835fc2a10acf67535ff2ebd2145bf43.1562793448.git.jpoimboe@redhat.com
2019-07-18Merge tag 'perf-core-for-mingo-5.3-20190715' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf db-export: Adrian Hunter: - Improvements in how COMM details are exported to databases for post processing and use in the sql-viewer.py UI. - Export switch events to the database. BPF: Arnaldo Carvalho de Melo: - Bump rlimit(MEMLOCK) for 'perf test bpf' and 'perf trace', just like selftests/bpf/bpf_rlimit.h do, which makes errors due to exhaustion of this limit, which are kinda cryptic (EPERM sometimes) less frequent. perf version: Ravi Bangoria: - Fix segfault due to missing OPT_END(), noticed on PowerPC. perf vendor events: Thomas Richter: - Add JSON files for IBM s/390 machine type 8561. perf cs-etm (ARM): YueHaibing: - Fix two cases of error returns not bing done properly: Invalid ERR_PTR() use and loss of propagation error codes.
2019-07-17ipv6: rt6_check should return NULL if 'from' is NULLDavid Ahern
Paul reported that l2tp sessions were broken after the commit referenced in the Fixes tag. Prior to this commit rt6_check returned NULL if the rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB entry. Restore that behavior. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Paul Donohue <linux-kernel@PaulSD.com> Tested-by: Paul Donohue <linux-kernel@PaulSD.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17tipc: initialize 'validated' field of received packetsJon Maloy
The tipc_msg_validate() function leaves a boolean flag 'validated' in the validated buffer's control block, to avoid performing this action more than once. However, at reception of new packets, the position of this field may already have been set by lower layer protocols, so that the packet is erroneously perceived as already validated by TIPC. We fix this by initializing the said field to 'false' before performing the initial validation. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17Merge branch 'ipv4-relax-source-validation-check-for-loopback-packets'David S. Miller
Cong Wang says: ==================== ipv4: relax source validation check for loopback packets This patchset fixes a corner case when loopback packets get dropped by rp_filter when we route them from veth to lo. Patch 1 is the fix and patch 2 provides a simplified test case for this scenario. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17selftests: add a test case for rp_filterCong Wang
Add a test case to simulate the loopback packet case fixed in the previous patch. This test gets passed after the fix: IPv4 rp_filter tests TEST: rp_filter passes local packets [ OK ] TEST: rp_filter passes loopback packets [ OK ] Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17fib: relax source validation check for loopback packetsCong Wang
In a rare case where we redirect local packets from veth to lo, these packets fail to pass the source validation when rp_filter is turned on, as the tracing shows: <...>-311708 [040] ..s1 7951180.957825: fib_table_lookup: table 254 oif 0 iif 1 src 10.53.180.130 dst 10.53.180.130 tos 0 scope 0 flags 0 <...>-311708 [040] ..s1 7951180.957826: fib_table_lookup_nh: nexthop dev eth0 oif 4 src 10.53.180.130 So, the fib table lookup returns eth0 as the nexthop even though the packets are local and should be routed to loopback nonetheless, but they can't pass the dev match check in fib_info_nh_uses_dev() without this patch. It should be safe to relax this check for this special case, as normally packets coming out of loopback device still have skb_dst so they won't even hit this slow path. Cc: Julian Anastasov <ja@ssi.bg> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17Merge branch 'mlxsw-Two-fixes'David S. Miller
Ido Schimmel says: ==================== mlxsw: Two fixes This patchset contains two fixes for mlxsw. Patch #1 from Petr fixes an issue in which DSCP rewrite can occur even if the egress port was switched to Trust L2 mode where priority mapping is based on PCP. Patch #2 fixes a problem where packets can be learned on a non-existing FID if a tc filter with a redirect action is configured on a bridged port. The problem and fix are explained in detail in the commit message. Please consider both patches for 5.2.y ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17mlxsw: spectrum: Do not process learned records with a dummy FIDIdo Schimmel
The switch periodically sends notifications about learned FDB entries. Among other things, the notification includes the FID (Filtering Identifier) and the port on which the MAC was learned. In case the driver does not have the FID defined on the relevant port, the following error will be periodically generated: mlxsw_spectrum2 0000:06:00.0 swp32: Failed to find a matching {Port, VID} following FDB notification This is not supposed to happen under normal conditions, but can happen if an ingress tc filter with a redirect action is installed on a bridged port. The redirect action will cause the packet's FID to be changed to the dummy FID and a learning notification will be emitted with this FID - which is not defined on the bridged port. Fix this by having the driver ignore learning notifications generated with the dummy FID and delete them from the device. Another option is to chain an ignore action after the redirect action which will cause the device to disable learning, but this means that we need to consume another action whenever a redirect action is used. In addition, the scenario described above is merely a corner case. Fixes: cedbb8b25948 ("mlxsw: spectrum_flower: Set dummy FID before forward action") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Kushnarov <alexanderk@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Alex Kushnarov <alexanderk@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removedPetr Machata
Spectrum systems use DSCP rewrite map to update DSCP field in egressing packets to correspond to priority that the packet has. Whether rewriting will take place is determined at the point when the packet ingresses the switch: if the port is in Trust L3 mode, packet priority is determined from the DSCP map at the port, and DSCP rewrite will happen. If the port is in Trust L2 mode, 802.1p is used for packet prioritization, and no DSCP rewrite will happen. The driver determines the port trust mode based on whether any DSCP prioritization rules are in effect at given port. If there are any, trust level is L3, otherwise it's L2. When the last DSCP rule is removed, the port is switched to trust L2. Under that scenario, if DSCP of a packet should be rewritten, it should be rewritten to 0. However, when switching to Trust L2, the driver neglects to also update the DSCP rewrite map. The last DSCP rule thus remains in effect, and packets egressing through this port, if they have the right priority, will have their DSCP set according to this rule. Fix by first configuring the rewrite map, and only then switching to trust L2 and bailing out. Fixes: b2b1dab6884e ("mlxsw: spectrum: Support ieee_setapp, ieee_delapp") Signed-off-by: Petr Machata <petrm@mellanox.com> Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17net: ag71xx: Add missing headerRosen Penev
ag71xx uses devm_ioremap_nocache. This fixes usage of an implicit function Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Signed-off-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17floppy: fix out-of-bounds read in copy_bufferDenis Efremov
This fixes a global out-of-bounds read access in the copy_buffer function of the floppy driver. The FDDEFPRM ioctl allows one to set the geometry of a disk. The sect and head fields (unsigned int) of the floppy_drive structure are used to compute the max_sector (int) in the make_raw_rw_request function. It is possible to overflow the max_sector. Next, max_sector is passed to the copy_buffer function and used in one of the memcpy calls. An unprivileged user could trigger the bug if the device is accessible, but requires a floppy disk to be inserted. The patch adds the check for the .sect * .head multiplication for not overflowing in the set_geometry function. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-17floppy: fix invalid pointer dereference in drive_nameDenis Efremov
This fixes the invalid pointer dereference in the drive_name function of the floppy driver. The native_format field of the struct floppy_drive_params is used as floppy_type array index in the drive_name function. Thus, the field should be checked the same way as the autodetect field. To trigger the bug, one could use a value out of range and set the drive parameters with the FDSETDRVPRM ioctl. Next, FDGETDRVTYP ioctl should be used to call the drive_name. A floppy disk is not required to be inserted. CAP_SYS_ADMIN is required to call FDSETDRVPRM. The patch adds the check for a value of the native_format field to be in the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-17floppy: fix out-of-bounds read in next_valid_formatDenis Efremov
This fixes a global out-of-bounds read access in the next_valid_format function of the floppy driver. The values from autodetect field of the struct floppy_drive_params are used as indices for the floppy_type array in the next_valid_format function 'floppy_type[DP->autodetect[probed_format]].sect'. To trigger the bug, one could use a value out of range and set the drive parameters with the FDSETDRVPRM ioctl. A floppy disk is not required to be inserted. CAP_SYS_ADMIN is required to call FDSETDRVPRM. The patch adds the check for values of the autodetect field to be in the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-17floppy: fix div-by-zero in setup_format_paramsDenis Efremov
This fixes a divide by zero error in the setup_format_params function of the floppy driver. Two consecutive ioctls can trigger the bug: The first one should set the drive geometry with such .sect and .rate values for the F_SECT_PER_TRACK to become zero. Next, the floppy format operation should be called. A floppy disk is not required to be inserted. An unprivileged user could trigger the bug if the device is accessible. The patch checks F_SECT_PER_TRACK for a non-zero value in the set_geometry function. The proper check should involve a reasonable upper limit for the .sect and .rate fields, but it could change the UAPI. The patch also checks F_SECT_PER_TRACK in the setup_format_params, and cancels the formatting operation in case of zero. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-17x86/mm, tracing: Fix CR2 corruptionPeter Zijlstra
Despite the current efforts to read CR2 before tracing happens there still exist a number of possible holes: idtentry page_fault do_page_fault has_error_code=1 call error_entry TRACE_IRQS_OFF call trace_hardirqs_off* #PF // modifies CR2 CALL_enter_from_user_mode __context_tracking_exit() trace_user_exit(0) #PF // modifies CR2 call do_page_fault address = read_cr2(); /* whoopsie */ And similar for i386. Fix it by pulling the CR2 read into the entry code, before any of that stuff gets a chance to run and ruin things. Reported-by: He Zhe <zhe.he@windriver.com> Reported-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: rostedt@goodmis.org Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: joel@joelfernandes.org Link: https://lkml.kernel.org/r/20190711114336.116812491@infradead.org Debugged-by: Steven Rostedt <rostedt@goodmis.org>
2019-07-17x86/entry/64: Update comments and sanity tests for create_gapPeter Zijlstra
Commit 2700fefdb2d9 ("x86_64: Add gap to int3 to allow for call emulation") forgot to update the comment, do so now. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114336.059780563@infradead.org
2019-07-17x86/entry/64: Simplify idtentry a littlePeter Zijlstra
There's a bunch of duplication in idtentry, namely the .Lfrom_usermode_switch_stack is a paranoid=0 copy of the normal flow. Make this explicit by creating a idtentry_part helper macro. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114336.002429503@infradead.org
2019-07-17x86/entry/32: Simplify common_exceptionPeter Zijlstra
Adding one more option to SAVE_ALL can be used in common_exception to simplify things. This also saves duplication later where page_fault will no longer use common_exception. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114335.945136187@infradead.org
2019-07-17x86/paravirt: Make read_cr2() CALLEE_SAVEPeter Zijlstra
The one paravirt read_cr2() implementation (Xen) is actually quite trivial and doesn't need to clobber anything other than the return register. Making read_cr2() CALLEE_SAVE avoids all the PUSH/POP nonsense and allows more convenient use from assembly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: bp@alien8.de Cc: rostedt@goodmis.org Cc: luto@kernel.org Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114335.887392493@infradead.org
2019-07-17parisc: Wire up clone3 syscallHelge Deller
Signed-off-by: Helge Deller <deller@gmx.de> Tested-by: Sven Schnelle <svens@stackframe.org> Acked-by: Christian Brauner <christian@brauner.io>
2019-07-17parisc: Avoid kernel panic triggered by invalid kprobeHelge Deller
When running gdb I was able to trigger this kernel panic: Kernel Fault: Code=26 (Data memory access rights trap) at addr 0000000000000060 CPU: 0 PID: 1401 Comm: gdb-crash Not tainted 5.2.0-rc7-64bit+ #1053 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001000000000000001111 Not tainted r00-03 000000000804000f 0000000040dee1a0 0000000040c78cf0 00000000b8d50160 r04-07 0000000040d2b1a0 000000004360a098 00000000bbbe87b8 0000000000000003 r08-11 00000000fac20a70 00000000fac24160 00000000fac1bbe0 0000000000000000 r12-15 00000000fabfb79a 00000000fac244a4 0000000000010000 0000000000000001 r16-19 00000000bbbe87b8 00000000f8f02910 0000000000010034 0000000000000000 r20-23 00000000fac24630 00000000fac24630 000000006474e552 00000000fac1aa52 r24-27 0000000000000028 00000000bbbe87b8 00000000bbbe87b8 0000000040d2b1a0 r28-31 0000000000000000 00000000b8d501c0 00000000b8d501f0 0000000003424000 sr00-03 0000000000423000 0000000000000000 0000000000000000 0000000000423000 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040c78cf0 0000000040c78cf4 IIR: 539f00c0 ISR: 0000000000000000 IOR: 0000000000000060 CPU: 0 CR30: 00000000b8d50000 CR31: 00000000d22345e2 ORIG_R28: 0000000040250798 IAOQ[0]: parisc_kprobe_ss_handler+0x58/0x170 IAOQ[1]: parisc_kprobe_ss_handler+0x5c/0x170 RP(r2): parisc_kprobe_ss_handler+0x58/0x170 Backtrace: [<0000000040206ff8>] handle_interruption+0x178/0xbb8 Kernel panic - not syncing: Kernel Fault Avoid this panic by checking the return value of kprobe_running() and skip kprobe if none is currently active. Cc: <stable@vger.kernel.org> # v5.2 Acked-by: Sven Schnelle <svens@stackframe.org> Tested-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Helge Deller <deller@gmx.de>
2019-07-17parisc: Ensure userspace privilege for ptraced processes in regset functionsHelge Deller
On parisc the privilege level of a process is stored in the lowest two bits of the instruction pointers (IAOQ0 and IAOQ1). On Linux we use privilege level 0 for the kernel and privilege level 3 for user-space. So userspace should not be allowed to modify IAOQ0 or IAOQ1 of a ptraced process to change it's privilege level to e.g. 0 to try to gain kernel privileges. This patch prevents such modifications in the regset support functions by always setting the two lowest bits to one (which relates to privilege level 3 for user-space) if IAOQ0 or IAOQ1 are modified via ptrace regset calls. Link: https://bugs.gentoo.org/481768 Cc: <stable@vger.kernel.org> # v4.7+ Tested-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Helge Deller <deller@gmx.de>
2019-07-17parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1Helge Deller
On parisc the privilege level of a process is stored in the lowest two bits of the instruction pointers (IAOQ0 and IAOQ1). On Linux we use privilege level 0 for the kernel and privilege level 3 for user-space. So userspace should not be allowed to modify IAOQ0 or IAOQ1 of a ptraced process to change it's privilege level to e.g. 0 to try to gain kernel privileges. This patch prevents such modifications by always setting the two lowest bits to one (which relates to privilege level 3 for user-space) if IAOQ0 or IAOQ1 are modified via ptrace calls in the native and compat ptrace paths. Link: https://bugs.gentoo.org/481768 Reported-by: Jeroen Roovers <jer@gentoo.org> Cc: <stable@vger.kernel.org> Tested-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Helge Deller <deller@gmx.de>
2019-07-17net_sched: unset TCQ_F_CAN_BYPASS when adding filtersCong Wang
For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS, notably fq_codel, it makes no sense to let packets bypass the TC filters we setup in any scenario, otherwise our packets steering policy could not be enforced. This can be reproduced easily with the following script: ip li add dev dummy0 type dummy ifconfig dummy0 up tc qd add dev dummy0 root fq_codel tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo ping -I dummy0 192.168.112.1 Without this patch, packets are sent directly to dummy0 without hitting any of the filters. With this patch, packets are redirected to loopback as expected. This fix is not perfect, it only unsets the flag but does not set it back because we have to save the information somewhere in the qdisc if we really want that. Note, both fq_codel and sfq clear this flag in their ->bind_tcf() but this is clearly not sufficient when we don't use any class ID. Fixes: 23624935e0c4 ("net_sched: TCQ_F_CAN_BYPASS generalization") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17Merge tag 'platform-drivers-x86-v5.3-2' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull another x86 platform driver update from Andy Shevchenko: "Provide better naming for ABI, i.e. tell that we have fan boost mode. It won't break any ABI, but has to be done now to avoid confusion in the future" * tag 'platform-drivers-x86-v5.3-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: asus: Rename "fan mode" to "fan boost mode"
2019-07-17Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: - Convert thermal documents to ReST (Mauro Carvalho Chehab) - Fix a cyclic depedency in between thermal core and governors (Daniel Lezcano) - Fix processor_thermal_device driver to re-evaluate power limits after resume (Srinivas Pandruvada, Zhang Rui) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: drivers: thermal: processor_thermal_device: Fix build warning docs: thermal: convert to ReST thermal/drivers/core: Use governor table to initialize thermal/drivers/core: Add init section table for self-encapsulation drivers: thermal: processor_thermal: Read PPCC on resume