summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-04-18sparc64: Use LOCKDEP_SMALL, not PROVE_LOCKING_SMALLDaniel Jordan
CONFIG_PROVE_LOCKING_SMALL shrinks the memory usage of lockdep so the kernel text, data, and bss fit in the required 32MB limit, but this option is not set for every config that enables lockdep. A 4.10 kernel fails to boot with the console output Kernel: Using 8 locked TLB entries for main kernel image. hypervisor_tlb_lock[2000000:0:8000000071c007c3:1]: errors with f Program terminated with these config options CONFIG_LOCKDEP=y CONFIG_LOCK_STAT=y CONFIG_PROVE_LOCKING=n To fix, rename CONFIG_PROVE_LOCKING_SMALL to CONFIG_LOCKDEP_SMALL, and enable this option with CONFIG_LOCKDEP=y so we get the reduced memory usage every time lockdep is turned on. Tested that CONFIG_LOCKDEP_SMALL is set to 'y' if and only if CONFIG_LOCKDEP is set to 'y'. When other lockdep-related config options that select CONFIG_LOCKDEP are enabled (e.g. CONFIG_LOCK_STAT or CONFIG_PROVE_LOCKING), verified that CONFIG_LOCKDEP_SMALL is also enabled. Fixes: e6b5f1be7afe ("config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Babu Moger <babu.moger@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-18mmc: dw_mmc: Don't allow Runtime PM for SDIO cardsDouglas Anderson
According to the SDIO standard interrupts are normally signalled in a very complicated way. They require the card clock to be running and require the controller to be paying close attention to the signals coming from the card. This simply can't happen with the clock stopped or with the controller in a low power mode. To that end, we'll disable runtime_pm when we detect that an SDIO card was inserted. This is much like with what we do with the special "SDMMC_CLKEN_LOW_PWR" bit that dw_mmc supports. NOTE: we specifically do this Runtime PM disabling at card init time rather than in the enable_sdio_irq() callback. This is _different_ than how SDHCI does it. Why do we do it differently? - Unlike SDHCI, dw_mmc uses the standard sdio_irq code in Linux (AKA dw_mmc doesn't set MMC_CAP2_SDIO_IRQ_NOTHREAD). - Because we use the standard sdio_irq code: - We see a constant stream of enable_sdio_irq(0) and enable_sdio_irq(1) calls. This is because the standard code disables interrupts while processing and re-enables them after. - While interrupts are disabled, there's technically a period where we could get runtime disabled while processing interrupts. - If we are runtime disabled while processing interrupts, we'll reset the controller at resume time (see dw_mci_runtime_resume), which seems like a terrible idea because we could possibly have another interrupt pending. To fix the above isues we'd want to put something in the standard sdio_irq code that makes sure to call pm_runtime get/put when interrupts are being actively being processed. That's possible to do, but it seems like a more complicated mechanism when we really just want the runtime pm disabled always for SDIO cards given that all the other bits needed to get Runtime PM vs. SDIO just aren't there. NOTE: at some point in time someone might come up with a fancy way to do SDIO interrupts and still allow (some) amount of runtime PM. Technically we could turn off the card clock if we used an alternate way of signaling SDIO interrupts (and out of band interrupt is one way to do this). We probably wouldn't actually want to fully runtime suspend in this case though--at least not with the current dw_mci_runtime_resume() which basically fully resets the controller at resume time. Fixes: e9ed8835e990 ("mmc: dw_mmc: add runtime PM callback") Cc: <stable@vger.kernel.org> Reported-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Jaehoon Chung <jh80.chung@samsung.com> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-18Input: elantech - add Fujitsu Lifebook E547 to force crc_enabledThorsten Leemhuis
Temporary got a Lifebook E547 into my hands and noticed the touchpad only works after running: echo "1" > /sys/devices/platform/i8042/serio2/crc_enabled Add it to the list of machines that need this workaround. Cc: stable@vger.kernel.org Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info> Reviewed-by: Ulrik De Bie <ulrik.debie-os@e2big.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-04-18Merge tag 'trace-v4.11-rc5-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace testcase update from Steven Rostedt: "While testing my development branch, without the fix for the pid use after free bug, the selftest that Namhyung added triggers it. I figured it would be good to add the test for the bug after the fix, such that it does not exist without the fix. I added another patch that lets the test only test part of the pid filtering, and ignores the function-fork (filtering on children as well) if the function-fork feature does not exist. This feature is added by Namhyung just before he added this test. But since the test tests both with and without the feature, it would be good to let it not fail if the feature does not exist" * tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests: ftrace: Add check for function-fork before running pid filter test selftests: ftrace: Add a testcase for function PID filter
2017-04-18mmc: sdio: fix alignment issue in struct sdio_funcHeiner Kallweit
Certain 64-bit systems (e.g. Amlogic Meson GX) require buffers to be used for DMA to be 8-byte-aligned. struct sdio_func has an embedded small DMA buffer not meeting this requirement. When testing switching to descriptor chain mode in meson-gx driver SDIO is broken therefore. Fix this by allocating the small DMA buffer separately as kmalloc ensures that the returned memory area is properly aligned for every basic data type. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Helmut Klein <hgkr.klein@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-18selftests: ftrace: Add check for function-fork before running pid filter testSteven Rostedt (VMware)
Have the func-filter-pid test check for the function-fork option before testing it. It can still test the pid filtering, but will stop before testing the function-fork option for children inheriting the pids. This allows the test to be added before the function-fork feature, but after a bug fix that triggers one of the bugs the test can cause. Cc: Namhyung Kim <namhyung@kernel.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18Merge tag 'trace-v4.11-rc5-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "Namhyung Kim discovered a use after free bug. It has to do with adding a pid filter to function tracing in an instance, and then freeing the instance" * tag 'trace-v4.11-rc5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix function pid filter on instances
2017-04-18Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following problems: - regression in new XTS/LRW code when used with async crypto - long-standing bug in ahash API when used with certain algos - bogus memory dereference in async algif_aead with certain algos" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: algif_aead - Fix bogus request dereference in completion function crypto: ahash - Fix EINPROGRESS notification callback crypto: lrw - Fix use-after-free on EINPROGRESS crypto: xts - Fix use-after-free on EINPROGRESS
2017-04-18selftests: ftrace: Add a testcase for function PID filterNamhyung Kim
Like event pid filtering test, add function pid filtering test with the new "function-fork" option. It also tests it on an instance directory so that it can verify the bug related pid filtering on instances. Link: http://lkml.kernel.org/r/20170417024430.21194-5-namhyung@kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18blkfront: add uevent for size changeMarc Olson
When a blkfront device is resized from dom0, emit a KOBJ_CHANGE uevent to notify the guest about the change. This allows for custom udev rules, such as automatically resizing a filesystem, when an event occurs. With this patch you get these udev KERNEL[577.206230] change /devices/vbd-51728/block/xvdb (block) UDEV [577.226218] change /devices/vbd-51728/block/xvdb (block) Signed-off-by: Marc Olson <marcolso@amazon.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2017-04-18ACPI / platform: Update platform device NUMA node based on _PXM methodShanker Donthineni
The optional _PXM method evaluates to an integer that identifies the proximity domain of a device object. On ACPI based kernel boot, the field numa_node in 'struct device' is always set to -1 irrespective of _PXM value that is specified in the ACPI device object. But in case of device-tree based kernel boot the numa_node field is populated and reflects to a DT property that is specified in DTS according to the below document. https://www.kernel.org/doc/Documentation/devicetree/bindings/numa.txt http://lxr.free-electrons.com/source/drivers/of/device.c#L54 Without this patch dev_to_node() always returns -1 for all platform devices. This patch adds support for the ACPI _PXM method and updates the platform device NUMA node id using acpi_get_node() which provides the PXM-to-NUMA mapping information. The individual platform device drivers should be able to use the NUMA-aware memory allocation functions kmalloc_node() and alloc_pages_node() to improve performance. Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Jiandi An <anjiandi@codeaurora.org> [ rjw: Subject / changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-04-18ACPI / Processor: Drop setup_max_cpus check from acpi_processor_add()Prakash, Prashanth
When maxcpus=X kernel argument is used, we parse the ACPI tables only for the first X cpus. The per-cpu ACPI data include the _PSD tables which gives information about related cpus. cppc_cpufreq and acpi-cpufreq parses the table once during init to deduce the related cpus. If a user brings a new cpu online after boot the related cpu data becomes incorrect. acpi_get_psd_map() in acpi_cppc.c returns error if it fails to find the parsed ACPI data for possible CPU resulting in cppc_cpufreq initialization failure. With this change we will probe all possible CPUs prior to cpufreq initialization, but will bring only setup_max_cpus online. nr_cpus kernel parameter can be used to restict even parsing per-cpu ACPI tables. Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-04-18KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyringsEric Biggers
This fixes CVE-2017-7472. Running the following program as an unprivileged user exhausts kernel memory by leaking thread keyrings: #include <keyutils.h> int main() { for (;;) keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_THREAD_KEYRING); } Fix it by only creating a new thread keyring if there wasn't one before. To make things more consistent, make install_thread_keyring_to_cred() and install_process_keyring_to_cred() both return 0 if the corresponding keyring is already present. Fixes: d84f4f992cbd ("CRED: Inaugurate COW credentials") Cc: stable@vger.kernel.org # 2.6.29+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-04-18KEYS: Change the name of the dead type to ".dead" to prevent user accessDavid Howells
This fixes CVE-2017-6951. Userspace should not be able to do things with the "dead" key type as it doesn't have some of the helper functions set upon it that the kernel needs. Attempting to use it may cause the kernel to crash. Fix this by changing the name of the type to ".dead" so that it's rejected up front on userspace syscalls by key_get_type_from_user(). Though this doesn't seem to affect recent kernels, it does affect older ones, certainly those prior to: commit c06cfb08b88dfbe13be44a69ae2fdc3a7c902d81 Author: David Howells <dhowells@redhat.com> Date: Tue Sep 16 17:36:06 2014 +0100 KEYS: Remove key_type::match in favour of overriding default by match_preparse which went in before 3.18-rc1. Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
2017-04-18KEYS: Disallow keyrings beginning with '.' to be joined as session keyringsDavid Howells
This fixes CVE-2016-9604. Keyrings whose name begin with a '.' are special internal keyrings and so userspace isn't allowed to create keyrings by this name to prevent shadowing. However, the patch that added the guard didn't fix KEYCTL_JOIN_SESSION_KEYRING. Not only can that create dot-named keyrings, it can also subscribe to them as a session keyring if they grant SEARCH permission to the user. This, for example, allows a root process to set .builtin_trusted_keys as its session keyring, at which point it has full access because now the possessor permissions are added. This permits root to add extra public keys, thereby bypassing module verification. This also affects kexec and IMA. This can be tested by (as root): keyctl session .builtin_trusted_keys keyctl add user a a @s keyctl list @s which on my test box gives me: 2 keys in keyring: 180010936: ---lswrv 0 0 asymmetric: Build time autogenerated kernel key: ae3d4a31b82daa8e1a75b49dc2bba949fd992a05 801382539: --alswrv 0 0 user: a Fix this by rejecting names beginning with a '.' in the keyctl. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> cc: linux-ima-devel@lists.sourceforge.net cc: stable@vger.kernel.org
2017-04-18powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=yMichael Ellerman
Prior to commit 2337d207288f ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts"), the branch from hmi_exception_early() to hmi_exception_realmode() was just a bl hmi_exception_realmode, which the linker would turn into a bl to the local entry point of hmi_exception_realmode. This was broken when CONFIG_RELOCATABLE=y because hmi_exception_realmode() is not in the low part of the kernel text that is copied down to 0x0. But in fixing that, we added a new bug on little endian kernels. Because the branch is now a bctrl when CONFIG_RELOCATABLE=y, we branch to the global entry point of hmi_exception_realmode(). The global entry point must be called with r12 containing the address of hmi_exception_realmode(), because it uses that value to calculate the TOC value (r2). This may manifest as a checkstop, because we take a junk value from r12 which came from HSRR1, add a small constant to it and then use that as the TOC pointer. The HSRR1 value will have 0x9 as the top nibble, which puts it above RAM and somewhere in MMIO space. Fix it by changing the BRANCH_LINK_TO_FAR() macro to always use r12 to load the label we're branching to. This means r12 will be setup correctly on LE, fixing this bug, and r12 is also volatile across function calls on BE so it's a good choice anyway. Fixes: 2337d207288f ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts") Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-18powerpc/kprobe: Fix oops when kprobed on 'stdu' instructionRavi Bangoria
If we set a kprobe on a 'stdu' instruction on powerpc64, we see a kernel OOPS: Bad kernel stack pointer cd93c840 at c000000000009868 Oops: Bad kernel stack pointer, sig: 6 [#1] ... GPR00: c000001fcd93cb30 00000000cd93c840 c0000000015c5e00 00000000cd93c840 ... NIP [c000000000009868] resume_kernel+0x2c/0x58 LR [c000000000006208] program_check_common+0x108/0x180 On a 64-bit system when the user probes on a 'stdu' instruction, the kernel does not emulate actual store in emulate_step() because it may corrupt the exception frame. So the kernel does the actual store operation in exception return code i.e. resume_kernel(). resume_kernel() loads the saved stack pointer from memory using lwz, which only loads the low 32-bits of the address, causing the kernel crash. Fix this by loading the 64-bit value instead. Fixes: be96f63375a1 ("powerpc: Split out instruction analysis part of emulate_step()") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> [mpe: Change log massage, add stable tag] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-18x86: Enable KASLR by defaultIngo Molnar
KASLR is mature (and important) enough to be enabled by default on x86. Also enable it by default in the defconfigs. Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: dan.j.williams@intel.com Cc: dave.jiang@intel.com Cc: dyoung@redhat.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-18boot/param: Move next_arg() function to lib/cmdline.c for later reuseBaoquan He
next_arg() will be used to parse boot parameters in the x86/boot/compressed code, so move it to lib/cmdline.c for better code reuse. No change in functionality. Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Cc: Jens Axboe <axboe@fb.com> Cc: Jessica Yu <jeyu@redhat.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Cc: dave.jiang@intel.com Cc: dyoung@redhat.com Cc: keescook@chromium.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/1492436099-4017-2-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-18x86/unwind: Ensure stack pointer is alignedJosh Poimboeuf
With frame pointers disabled, on some older versions of GCC (like 4.8.3), it's possible for the stack pointer to get aligned at a half-word boundary: 00000000000004d0 <fib_table_lookup>: 4d0: 41 57 push %r15 4d2: 41 56 push %r14 4d4: 41 55 push %r13 4d6: 41 54 push %r12 4d8: 55 push %rbp 4d9: 53 push %rbx 4da: 48 83 ec 24 sub $0x24,%rsp In such a case, the unwinder ends up reading the entire stack at the wrong alignment. Then the last read goes past the end of the stack, hitting the stack guard page: BUG: stack guard page was hit at ffffc900217c4000 (stack is ffffc900217c0000..ffffc900217c3fff) kernel stack overflow (page fault): 0000 [#1] SMP ... Fix it by ensuring the stack pointer is properly aligned before unwinding. Reported-by: Jirka Hladky <jhladky@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 7c7900f89770 ("x86/unwind: Add new unwind interface and implementations") Link: http://lkml.kernel.org/r/cff33847cc9b02fa548625aa23268ac574460d8d.1492436590.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-18x86/mce: Update notifier priority checkBorislav Petkov
Update the check which enforces the registration of MCE decoder notifier callbacks with valid priority only, to include mcelog's priority. Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lkp@01.org Link: http://lkml.kernel.org/r/20170418073820.i6kl5tggcntwlisa@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-18af_key: Fix sadb_x_ipsecrequest parsingHerbert Xu
The parsing of sadb_x_ipsecrequest is broken in a number of ways. First of all we're not verifying sadb_x_ipsecrequest_len. This is needed when the structure carries addresses at the end. Worse we don't even look at the length when we parse those optional addresses. The migration code had similar parsing code that's better but it also has some deficiencies. The length is overcounted first of all as it includes the header itself. It also fails to check the length before dereferencing the sa_family field. This patch fixes those problems in parse_sockaddr_pair and then uses it in parse_ipsecrequest. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-17Merge branch 'parisc-4.11-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "One patch which fixes get_user() for 64-bit values on 32-bit kernels. Up to now we lost the upper 32-bits of the returned 64-bit value" * 'parisc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix get_user() for 64-bit value on 32-bit kernel
2017-04-17cifs: Do not send echoes before Negotiate is completeSachin Prabhu
commit 4fcd1813e640 ("Fix reconnect to not defer smb3 session reconnect long after socket reconnect") added support for Negotiate requests to be initiated by echo calls. To avoid delays in calling echo after a reconnect, I added the patch introduced by the commit b8c600120fc8 ("Call echo service immediately after socket reconnect"). This has however caused a regression with cifs shares which do not have support for echo calls to trigger Negotiate requests. On connections which need to call Negotiation, the echo calls trigger an error which triggers a reconnect which in turn triggers another echo call. This results in a loop which is only broken when an operation is performed on the cifs share. For an idle share, it can DOS a server. The patch uses the smb_operation can_echo() for cifs so that it is called only if connection has been already been setup. kernel bz: 194531 Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Tested-by: Jonathan Liu <net147@gmail.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-17ftrace: Fix function pid filter on instancesNamhyung Kim
When function tracer has a pid filter, it adds a probe to sched_switch to track if current task can be ignored. The probe checks the ftrace_ignore_pid from current tr to filter tasks. But it misses to delete the probe when removing an instance so that it can cause a crash due to the invalid tr pointer (use-after-free). This is easily reproducible with the following: # cd /sys/kernel/debug/tracing # mkdir instances/buggy # echo $$ > instances/buggy/set_ftrace_pid # rmdir instances/buggy ============================================================================ BUG: KASAN: use-after-free in ftrace_filter_pid_sched_switch_probe+0x3d/0x90 Read of size 8 by task kworker/0:1/17 CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G B 4.11.0-rc3 #198 Call Trace: dump_stack+0x68/0x9f kasan_object_err+0x21/0x70 kasan_report.part.1+0x22b/0x500 ? ftrace_filter_pid_sched_switch_probe+0x3d/0x90 kasan_report+0x25/0x30 __asan_load8+0x5e/0x70 ftrace_filter_pid_sched_switch_probe+0x3d/0x90 ? fpid_start+0x130/0x130 __schedule+0x571/0xce0 ... To fix it, use ftrace_clear_pids() to unregister the probe. As instance_rmdir() already updated ftrace codes, it can just free the filter safely. Link: http://lkml.kernel.org/r/20170417024430.21194-2-namhyung@kernel.org Fixes: 0c8916c34203 ("tracing: Add rmdir to remove multibuffer instances") Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17Merge branch 'bpf-fixes'David S. Miller
Daniel Borkmann says: ==================== Two BPF fixes The set fixes cb_access and xdp_adjust_head bits in struct bpf_prog, that are used for requirement checks on the program rather than f.e. heuristics. Thus, for tail calls, we cannot make any assumptions and are forced to set them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17bpf: fix checking xdp_adjust_head on tail callsDaniel Borkmann
Commit 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog") added the xdp_adjust_head bit to the BPF prog in order to tell drivers that the program that is to be attached requires support for the XDP bpf_xdp_adjust_head() helper such that drivers not supporting this helper can reject the program. There are also drivers that do support the helper, but need to check for xdp_adjust_head bit in order to move packet metadata prepended by the firmware away for making headroom. For these cases, the current check for xdp_adjust_head bit is insufficient since there can be cases where the program itself does not use the bpf_xdp_adjust_head() helper, but tail calls into another program that uses bpf_xdp_adjust_head(). As such, the xdp_adjust_head bit is still set to 0. Since the first program has no control over which program it calls into, we need to assume that bpf_xdp_adjust_head() helper is used upon tail calls. Thus, for the very same reasons in cb_access, set the xdp_adjust_head bit to 1 when the main program uses tail calls. Fixes: 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17bpf: fix cb access in socket filter programs on tail callsDaniel Borkmann
Commit ff936a04e5f2 ("bpf: fix cb access in socket filter programs") added a fix for socket filter programs such that in i) AF_PACKET the 20 bytes of skb->cb[] area gets zeroed before use in order to not leak data, and ii) socket filter programs attached to TCP/UDP sockets need to save/restore these 20 bytes since they are also used by protocol layers at that time. The problem is that bpf_prog_run_save_cb() and bpf_prog_run_clear_cb() only look at the actual attached program to determine whether to zero or save/restore the skb->cb[] parts. There can be cases where the actual attached program does not access the skb->cb[], but the program tail calls into another program which does access this area. In such a case, the zero or save/restore is currently not performed. Since the programs we tail call into are unknown at verification time and can dynamically change, we need to assume that whenever the attached program performs a tail call, that later programs could access the skb->cb[], and therefore we need to always set cb_access to 1. Fixes: ff936a04e5f2 ("bpf: fix cb access in socket filter programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17ipv6: drop non loopback packets claiming to originate from ::1Florian Westphal
We lack a saddr check for ::1. This causes security issues e.g. with acls permitting connections from ::1 because of assumption that these originate from local machine. Assuming a source address of ::1 is local seems reasonable. RFC4291 doesn't allow such a source address either, so drop such packets. Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17fix nfs O_DIRECT advancing iov_iter too muchAl Viro
It leaves the iterator advanced by the amount of IO it has requested instead of the amount actually transferred. Among other things, that confuses the hell out of generic_file_splice_read(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-17p9_client_readdir() fixAl Viro
Don't assume that server is sane and won't return more data than asked for. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-17orangefs_bufmap_copy_from_iovec(): fix EFAULT handlingAl Viro
short copy here should mean instant EFAULT, not "move to the next page and hope it fails there, this time with nothing copied" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-17Merge tag 'sunxi-clk-fixes-for-4.11-2-bis' of ↵Stephen Boyd
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-fixes Pull Allwinner clock fixes for 4.11 from Maxime Ripard: Two build errors fixes for the sunxi-ng drivers. The two other patches fix random CPU crashes happening on the A33 since CPUFreq has been enabled in 4.11. * tag 'sunxi-clk-fixes-for-4.11-2-bis' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: clk: sunxi-ng: a33: gate then ungate PLL CPU clk after rate change clk: sunxi-ng: Add clk notifier to gate then ungate PLL clocks clk: sunxi-ng: fix build failure in ccu-sun9i-a80 driver clk: sunxi-ng: fix build error without CONFIG_RESET_CONTROLLER
2017-04-17Merge branch 'mediatek-tx-bugs'David S. Miller
Sean Wang says: ==================== mediatek: Fix crash caused by reporting inconsistent skb->len to BQL Changes since v1: - fix inconsistent enumeration which easily causes the potential bug The series fixes kernel BUG caused by inconsistent SKB length reported into BQL. The reason for inconsistent length comes from hardware BUG which results in different port number carried on the TXD within the lifecycle of SKB. So patch 2) is proposed for use a software way to track which port the SKB involving instead of hardware way. And patch 1) is given for another issue I found which causes TXD and SKB inconsistency that is not expected in the initial logic, so it is also being corrected it in the series. The log for the kernel BUG caused by the issue is posted as below. [ 120.825955] kernel BUG at ... lib/dynamic_queue_limits.c:26! [ 120.837684] Internal error: Oops - BUG: 0 [#1] SMP ARM [ 120.842778] Modules linked in: [ 120.845811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-191576-gdbcef47 #35 [ 120.853488] Hardware name: Mediatek Cortex-A7 (Device Tree) [ 120.859012] task: c1007480 task.stack: c1000000 [ 120.863510] PC is at dql_completed+0x108/0x17c [ 120.867915] LR is at 0x46 [ 120.870512] pc : [<c03c19c8>] lr : [<00000046>] psr: 80000113 [ 120.870512] sp : c1001d58 ip : c1001d80 fp : c1001d7c [ 120.881895] r10: 0000003e r9 : df6b3400 r8 : 0ed86506 [ 120.887075] r7 : 00000001 r6 : 00000001 r5 : 0ed8654c r4 : df0135d8 [ 120.893546] r3 : 00000001 r2 : df016800 r1 : 0000fece r0 : df6b3480 [ 120.900018] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 120.907093] Control: 10c5387d Table: 9e27806a DAC: 00000051 [ 120.912789] Process swapper/0 (pid: 0, stack limit = 0xc1000218) [ 120.918744] Stack: (0xc1001d58 to 0xc1002000) .... 121.085331] 1fc0: 00000000 c0a52a28 00000000 c10855d4 c1003c58 c0a52a24 c100885c 8000406a [ 121.093444] 1fe0: 410fc073 00000000 00000000 c1001ff8 8000807c c0a009cc 00000000 00000000 [ 121.101575] [<c03c19c8>] (dql_completed) from [<c04cb010>] (mtk_napi_tx+0x1d0/0x37c) [ 121.109263] [<c04cb010>] (mtk_napi_tx) from [<c05e28cc>] (net_rx_action+0x24c/0x3b8) [ 121.116951] [<c05e28cc>] (net_rx_action) from [<c010152c>] (__do_softirq+0xe4/0x35c) [ 121.124638] [<c010152c>] (__do_softirq) from [<c012a624>] (irq_exit+0xe8/0x150) [ 121.131895] [<c012a624>] (irq_exit) from [<c017750c>] (__handle_domain_irq+0x70/0xc4) [ 121.139666] [<c017750c>] (__handle_domain_irq) from [<c0101404>] (gic_handle_irq+0x58/0x9c) [ 121.147953] [<c0101404>] (gic_handle_irq) from [<c010e18c>] (__irq_svc+0x6c/0x90) [ 121.155373] Exception stack(0xc1001ef8 to 0xc1001f40) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net: ethernet: mediatek: fix inconsistency of port number carried in TXDSean Wang
Fix port inconsistency on TXD due to hardware BUG that would cause different port number is carried on the same TXD between tx_map() and tx_unmap() with the iperf test. It would cause confusing BQL logic which leads to kernel panic when dual GMAC runs concurrently. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net: ethernet: mediatek: fix inconsistency between TXD and the used bufferSean Wang
Fix inconsistency between the TXD descriptor and the used buffer that would cause unexpected logic at mtk_tx_unmap() during skb housekeeping. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net: phy: micrel: fix crash when statistic requested for KSZ9031 phyGrygorii Strashko
Now the command: ethtool --phy-statistics eth0 will cause system crash with meassage "Unable to handle kernel NULL pointer dereference at virtual address 00000010" from: (kszphy_get_stats) from [<c069f1d8>] (ethtool_get_phy_stats+0xd8/0x210) (ethtool_get_phy_stats) from [<c06a0738>] (dev_ethtool+0x5b8/0x228c) (dev_ethtool) from [<c06b5484>] (dev_ioctl+0x3fc/0x964) (dev_ioctl) from [<c0679f7c>] (sock_ioctl+0x170/0x2c0) (sock_ioctl) from [<c02419d4>] (do_vfs_ioctl+0xa8/0x95c) (do_vfs_ioctl) from [<c02422c4>] (SyS_ioctl+0x3c/0x64) (SyS_ioctl) from [<c0107d60>] (ret_fast_syscall+0x0/0x44) The reason: phy_driver structure for KSZ9031 phy has no .probe() callback defined. As result, struct phy_device *phydev->priv pointer will not be initializes (null). This issue will affect also following phys: KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737 Fix it by: - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021 phys. The kszphy_probe() can be re-used as it doesn't do any phy specific settings. - removing statistic callbacks from other phys (KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, KS8737) as they doesn't have corresponding statistic counters. Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev ruleDavid Ahern
Only need 1 l3mdev FIB rule. Fix setting NLM_F_EXCL in the nlmsghdr. Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net: thunderx: Fix set_max_bgx_per_node for 81xx rgxGeorge Cherian
Add the PCI_SUBSYS_DEVID_81XX_RGX and use the same to set the max bgx per node count. This fixes the issue intoduced by following commit 78aacb6f6 net: thunderx: Fix invalid mac addresses for node1 interfaces With this commit the max_bgx_per_node for 81xx is set as 2 instead of 3 because of which num_vfs is always calculated as zero. Signed-off-by: George Cherian <george.cherian@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net-timestamp: avoid use-after-free in ip_recv_errorWillem de Bruijn
Syzkaller reported a use-after-free in ip_recv_error at line info->ipi_ifindex = skb->dev->ifindex; This function is called on dequeue from the error queue, at which point the device pointer may no longer be valid. Save ifindex on enqueue in __skb_complete_tx_timestamp, when the pointer is valid or NULL. Store it in temporary storage skb->cb. It is safe to reference skb->dev here, as called from device drivers or dev_queue_xmit. The exception is when called from tcp_ack_tstamp; in that case it is NULL and ifindex is set to 0 (invalid). Do not return a pktinfo cmsg if ifindex is 0. This maintains the current behavior of not returning a cmsg if skb->dev was NULL. On dequeue, the ipv4 path will cast from sock_exterr_skb to in_pktinfo. Both have ifindex as their first element, so no explicit conversion is needed. This is by design, introduced in commit 0b922b7a829c ("net: original ingress device index in PKTINFO"). For ipv6 ip6_datagram_support_cmsg converts to in6_pktinfo. Fixes: 829ae9d61165 ("net-timestamp: allow reading recv cmsg on errqueue with origin tstamp") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17ipv4: fix a deadlock in ip_ra_controlWANG Cong
Similar to commit 87e9f0315952 ("ipv4: fix a potential deadlock in mcast getsockopt() path"), there is a deadlock scenario for IP_ROUTER_ALERT too: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(sk_lock-AF_INET); lock(rtnl_mutex); lock(sk_lock-AF_INET); Fix this by always locking RTNL first on all setsockopt() paths. Note, after this patch ip_ra_lock is no longer needed either. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17cpufreq: schedutil: Use policy-dependent transition delaysRafael J. Wysocki
Make the schedutil governor take the initial (default) value of the rate_limit_us sysfs attribute from the (new) transition_delay_us policy parameter (to be set by the scaling driver). That will allow scaling drivers to make schedutil use smaller default values of rate_limit_us and reduce the default average time interval between consecutive frequency changes. Make intel_pstate set transition_delay_us to 500. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-04-17nbd: add a flag to destroy an nbd device on disconnectJosef Bacik
For ease of management it would be nice for users to specify that the device node for a nbd device is destroyed once it is disconnected and there are no more users. Add a client flag and enable this operation to happen. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add device refcountingJosef Bacik
In order to support deleting the device on disconnect we need to refcount the actual nbd_device struct. So add the refcounting framework and change how we free the normal devices at rmmod time so we can catch reference leaks. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a status netlink commandJosef Bacik
Allow users to query the status of existing nbd devices. Right now this only returns whether or not the device is connected, but could be extended in the future to include more information. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: handle dead connectionsJosef Bacik
Sometimes we like to upgrade our server without making all of our clients freak out and reconnect. This patch provides a way to specify a dead connection timeout to allow us to pause all requests and wait for new connections to be opened. With this in place I can take down the nbd server for less than the dead connection timeout time and bring it back up and everything resumes gracefully. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: only clear the queue on device teardownJosef Bacik
When running a disconnect torture test I noticed that sometimes we would crash with a negative ref count on our queue. This was because we were ending the same request twice. Turns out we were racing with NBD_CLEAR_SOCK clearing the requests as well as the teardown of the device clearing the requests. So instead make the ioctl only shutdown the sockets and make it so that we only ever run nbd_clear_que from the device teardown. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: multicast dead link notificationsJosef Bacik
Provide a mechanism to notify userspace that there's been a link problem on a NBD device. This will allow userspace to re-establish a connection and provide the new socket to the device without disrupting the device. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a reconfigure netlink commandJosef Bacik
We want to be able to reconnect dead connections to existing block devices, so add a reconfigure netlink command. We will also allow users to change their timeout on the fly, but everything else will require a disconnect and reconnect. You won't be able to add more connections either, simply replace dead connections with new more lively connections. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a basic netlink interfaceJosef Bacik
The existing ioctl interface for configuring NBD devices is a bit cumbersome and hard to extend. The other problem is we leave a userspace app sitting in it's syscall until the device disconnects, which is less than ideal. This patch introduces a netlink interface for adding and disconnecting nbd devices. This has the benefits of being easily extendable without breaking older userspace applications, and allows us to configure a nbd device without leaving a userspace app sitting waiting for the device to disconnect. With this interface we also gain the ability to configure more devices than are preallocated at insmod time. We also have gained the ability to not specify a particular device and be provided one for us so that userspace doesn't need to find a free device to configure. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>