Age | Commit message (Collapse) | Author |
|
Add the ability to offload PRIO qdisc by using ndo_setup_tc.
There are three commands for PRIO offloading:
* TC_PRIO_REPLACE: handles set and tune
* TC_PRIO_DESTROY: handles qdisc destroy
* TC_PRIO_STATS: updates the qdiscs counters (given as reference)
Like RED qdisc, the indication of whether PRIO is being offloaded is being
set and updated as part of the dump function. It is so because the driver
could decide to offload or not based on the qdisc parent, which could
change without notifying the qdisc.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When routing ip packets, the kernel is setting the SKB's priority
based on the tos field of the packet.
Imitate this behavior in the mlxsw router, having the internal
switch priority of a routed packet determined according to its DS
field.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add rdpm definition - router DSCP to priority mapping register.
This register will be utilized later to align the default mapping between
packet DSCP and switch-priority to the kernel's mapping between
packet priority and skb priority.
This is the first non-bit indexed register where the entries are arranged
in descending order, i.e., entry at offset 0 matches configuration for
dscp[63]. As a result, the item's step is converted into a signed variable
to support descending arrays [where step would be negative].
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrew Lunn says:
====================
mv88e6xxx: ATU and VTU interrupts
Both the ATU and VTU of Mavell switches can generate interrupts when
violations occur. Trap this interrupts and print what violation
occurred.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When there is a problem with the VTU, an interrupt can be
generated. Trap this interrupt and decode the registers to determine
what the problem was, then log the error.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When there is a problem with the ATU, an interrupt can be
generated. Trap this interrupt and decode the registers to determine
what the problem was, then log the error.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit eb789980d0aa ("mlxsw: spectrum_router: Populate adjacency
entries according to weights") the driver includes support for
non-equal-cost multipath, but IPv4 nexthops were the only user.
Now that the kernel supports weighted IPv6 nexthops, we can extend the
driver to support it as well.
This is done by assigning each nexthop its configured weight, so that it
will be populated accordingly in the device's adjacency table. The
`weight` parameter is also taken into account when comparing nexthop
groups in order not to consolidate non-identical groups.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Divides by zero are not nice, lets avoid them if possible.
Also do_div() seems not needed when dealing with 32bit operands,
but this seems a minor detail.
Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
On targets that have different sizes for phys_addr_t and dma_addr_t,
we get a type mismatch error:
drivers/net/ethernet/socionext/netsec.c: In function 'netsec_alloc_dring':
drivers/net/ethernet/socionext/netsec.c:970:9: error: passing argument 3 of 'dma_zalloc_coherent' from incompatible pointer type [-Werror=incompatible-pointer-types]
The code is otherwise correct, as the address is never actually used as a
physical address but only passed into a DMA register. For consistently,
I'm changing the variable name as well, to clarify that this is a DMA
address.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2018-01-13
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Follow-up fix to the recent BPF out-of-bounds speculation
fix that prevents max_entries overflows and an undefined
behavior on 32 bit archs on index_mask calculation, from
Daniel.
2) Reject unsupported BPF_ARSH opcode in 32 bit ALU mode that
was otherwise throwing an unknown opcode warning in the
interpreter, from Daniel.
3) Typo fix in one of the user facing verbose() messages that
was added during the BPF out-of-bounds speculation fix,
from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit b371ae0d4a194b178817b0edfb6a7395c7aec37a. It causes
boot hangs on old P3/P4 systems when the local APIC is enforced in UP mode.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/20171128145350.21560-1-ville.syrjala@linux.intel.com
|
|
SEGV_PKUERR is a signal specific si_code which happens to have the same
numeric value as several others: BUS_MCEERR_AR, ILL_ILLTRP, FPE_FLTOVF,
TRAP_HWBKPT, CLD_TRAPPED, POLL_ERR, SEGV_THREAD_ID, as such it is not safe
to just test the si_code the signal number must also be tested to prevent a
false positive in fill_sig_info_pkey.
This error was by inspection, and BUS_MCEERR_AR appears to be a real
candidate for confusion. So pass in si_signo and check for SIG_SEGV to
verify that it is actually a SEGV_PKUERR
Fixes: 019132ff3daf ("x86/mm/pkeys: Fill in pkey field in siginfo")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180112203135.4669-2-ebiederm@xmission.com
|
|
If CPU and TSC frequency are the same the printout of the CPU frequency is
valid for the TSC as well:
tsc: Detected 2900.000 MHz processor
If the TSC frequency is different there is no information in dmesg. Add a
conditional printout:
tsc: Detected 2904.000 MHz TSC
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Link: https://lkml.kernel.org/r/537b342debcd8e8aebc8d631015dcdf9f9ba8a26.1513920414.git.len.brown@intel.com
|
|
The INTEL_FAM6_SKYLAKE_X hardcoded crystal_khz value of 25MHZ is
problematic:
- SKX workstations (with same model # as server variants) use a 24 MHz
crystal. This results in a -4.0% time drift rate on SKX workstations.
- SKX servers subject the crystal to an EMI reduction circuit that reduces its
actual frequency by (approximately) -0.25%. This results in -1 second per
10 minute time drift as compared to network time.
This issue can also trigger a timer and power problem, on configurations
that use the LAPIC timer (versus the TSC deadline timer). Clock ticks
scheduled with the LAPIC timer arrive a few usec before the time they are
expected (according to the slow TSC). This causes Linux to poll-idle, when
it should be in an idle power saving state. The idle and clock code do not
graciously recover from this error, sometimes resulting in significant
polling and measurable power impact.
Stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
native_calibrate_tsc() will return 0, boot will run with tsc_khz = cpu_khz,
and the TSC refined calibration will update tsc_khz to correct for the
difference.
[ tglx: Sanitized change log ]
Fixes: 6baf3d61821f ("x86/tsc: Add additional Intel CPU models to the crystal quirk list")
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/ff6dcea166e8ff8f2f6a03c17beab2cb436aa779.1513920414.git.len.brown@intel.com
|
|
If the crystal frequency cannot be determined via CPUID(15).crystal_khz or
the built-in table then native_calibrate_tsc() will still set the
X86_FEATURE_TSC_KNOWN_FREQ flag which prevents the refined TSC calibration.
As a consequence such systems use cpu_khz for the TSC frequency which is
incorrect when cpu_khz != tsc_khz resulting in time drift.
Return early when the crystal frequency cannot be retrieved without setting
the X86_FEATURE_TSC_KNOWN_FREQ flag. This ensures that the refined TSC
calibration is invoked.
[ tglx: Steam-blastered changelog. Sigh ]
Fixes: 4ca4df0b7eb0 ("x86/tsc: Mark TSC frequency determined by CPUID as known")
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: Bin Gao <bin.gao@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/0fe2503aa7d7fc69137141fc705541a78101d2b9.1513920414.git.len.brown@intel.com
|
|
The intel_bts driver does not use the 'normal' BTS buffer which is exposed
through the cpu_entry_area but instead uses the memory allocated for the
perf AUX buffer.
This obviously comes apart when using PTI because then the kernel mapping;
which includes that AUX buffer memory; disappears. Fixing this requires to
expose a mapping which is visible in all context and that's not trivial.
As a quick fix disable this driver when PTI is enabled to prevent
malfunction.
Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Robert Święcki <robert@swiecki.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: greg@kroah.com
Cc: hughd@google.com
Cc: luto@amacapital.net
Cc: Vince Weaver <vince@deater.net>
Cc: torvalds@linux-foundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180114102713.GB6166@worktop.programming.kicks-ass.net
|
|
When the config option for PTI was added a reference to documentation was
added as well. But the documentation did not exist at that point. The final
documentation has a different file name.
Fix it up to point to the proper file.
Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig")
Signed-off-by: W. Trevor King <wking@tremily.us>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/3009cc8ccbddcd897ec1e0cb6dda524929de0d14.1515799398.git.wking@tremily.us
|
|
The switch to the user space page tables in the low level ASM code sets
unconditionally bit 12 and bit 11 of CR3. Bit 12 is switching the base
address of the page directory to the user part, bit 11 is switching the
PCID to the PCID associated with the user page tables.
This fails on a machine which lacks PCID support because bit 11 is set in
CR3. Bit 11 is reserved when PCID is inactive.
While the Intel SDM claims that the reserved bits are ignored when PCID is
disabled, the AMD APM states that they should be cleared.
This went unnoticed as the AMD APM was not checked when the code was
developed and reviewed and test systems with Intel CPUs never failed to
boot. The report is against a Centos 6 host where the guest fails to boot,
so it's not yet clear whether this is a virt issue or can happen on real
hardware too, but thats irrelevant as the AMD APM clearly ask for clearing
the reserved bits.
Make sure that on non PCID machines bit 11 is not set by the page table
switching code.
Andy suggested to rename the related bits and masks so they are clearly
describing what they should be used for, which is done as well for clarity.
That split could have been done with alternatives but the macro hell is
horrible and ugly. This can be done on top if someone cares to remove the
extra orq. For now it's a straight forward fix.
Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes and device ids for 4.15-rc8
Nothing major, small fixes for various devices, some resolutions for
bugs found by fuzzers, and the usual handful of new device ids.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Documentation: usb: fix typo in UVC gadgetfs config command
usb: misc: usb3503: make sure reset is low for at least 100us
uas: ignore UAS for Norelsys NS1068(X) chips
USB: UDC core: fix double-free in usb_add_gadget_udc_release
USB: fix usbmon BUG trigger
usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
usbip: remove kernel addresses from usb device and urb debug msgs
usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
USB: serial: cp210x: add new device ID ELV ALC 8xxx
USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fix from Greg KH:
"Here is a single android ashmem bugfix that resolves a reported issue
in that interface. It's been in linux-next this week with no reported
issues"
* tag 'staging-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are two bugfixes for some driver bugs for 4.15-rc8
The first is a bluetooth security bug that has been ignored by the
Bluetooth developers for months for no obvious reason at all, so I've
taken it through my tree.
The second is a simple double-free bug in the mux subsystem.
Both have been in linux-next for a while with no reported issues"
* tag 'char-misc-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mux: core: fix double get_device()
Bluetooth: Prevent stack info leak from the EFS element.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix cross-compilation for architectures that setup CROSS_COMPILE in
their arch Makefile
- fix Kconfig rational operators for bool / tristate
- drop a gperf-generated file from .gitignore
* tag 'kbuild-fixes-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
genksyms: drop *.hash.c from .gitignore
kconfig: fix relational operators for bool and tristate symbols
kbuild: move cc-option and cc-disable-warning after incl. arch Makefile
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor regression fixes from John Johansen:
"This fixes a couple bugs I have been working with Matthew Garrett on
this week. Specifically a regression in the handling of a conflicting
profile attachment and label match restrictions for ptrace when
profiles are stacked.
Summary:
- fix ptrace label match when matching stacked labels
- fix regression in profile conflict logic"
* tag 'apparmor-pr-2018-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: Fix regression in profile conflict logic
apparmor: fix ptrace label match when matching stacked labels
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Fix AMD boot regression due to 64-bit window conflicting with system
memory (Christian König)"
* tag 'pci-v4.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/PCI: Move and shrink AMD 64-bit window to avoid conflict
x86/PCI: Add "pci=big_root_window" option for AMD 64-bit windows
|
|
Merge misc fixlets from Andrew Morton:
"4 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
tools/objtool/Makefile: don't assume sync-check.sh is executable
kdump: write correct address of mem_section into vmcoreinfo
kmemleak: allow to coexist with fault injection
MAINTAINERS, nilfs2: change project home URLs
|
|
patch(1) loses the x bit. So if a user follows our patching
instructions in Documentation/admin-guide/README.rst, their kernel will
not compile.
Fixes: 3bd51c5a371de ("objtool: Move kernel headers/code sync check to a script")
Reported-by: Nicolas Bock <nicolasbock@gentoo.org>
Reported-by Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Depending on configuration mem_section can now be an array or a pointer
to an array allocated dynamically. In most cases, we can continue to
refer to it as 'mem_section' regardless of what it is.
But there's one exception: '&mem_section' means "address of the array"
if mem_section is an array, but if mem_section is a pointer, it would
mean "address of the pointer".
We've stepped onto this in kdump code. VMCOREINFO_SYMBOL(mem_section)
writes down address of pointer into vmcoreinfo, not array as we wanted.
Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the
situation correctly for both cases.
Link: http://lkml.kernel.org/r/20180112162532.35896-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
kmemleak does one slab allocation per user allocation. So if slab fault
injection is enabled to any degree, kmemleak instantly fails to allocate
and turns itself off. However, it's useful to use kmemleak with fault
injection to find leaks on error paths. On the other hand, checking
kmemleak itself is not so useful because (1) it's a debugging tool and
(2) it has a very regular allocation pattern (basically a single
allocation site, so it either works or not).
Turn off fault injection for kmemleak allocations.
Link: http://lkml.kernel.org/r/20180109192243.19316-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The domain of NILFS project home was changed to "nilfs.sourceforge.io"
to enable https access (the previous domain "nilfs.sourceforge.net" is
redirected to the new one). Modify URLs of the project home to reflect
this change and to replace their protocol from http to https.
Link: http://lkml.kernel.org/r/1515416141-5614-1-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is a left-over of commit bb3290d91695 ("Remove gperf usage from
toolchain").
We do not generate a hash function any more.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Depending on configuration mem_section can now be an array or a pointer
to an array allocated dynamically. In most cases, we can continue to refer
to it as 'mem_section' regardless of what it is.
But there's one exception: '&mem_section' means "address of the array" if
mem_section is an array, but if mem_section is a pointer, it would mean
"address of the pointer".
We've stepped onto this in the kdump code: VMCOREINFO_SYMBOL(mem_section)
writes down the address of pointer into vmcoreinfo, not the array as we wanted,
breaking kdump.
Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the
situation correctly for both cases.
Mike Galbraith <efault@gmx.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
Link: http://lkml.kernel.org/r/20180112162532.35896-1-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This tests that the vsyscall entries do what they're expected to do.
It also confirms that attempts to read the vsyscall page behave as
expected.
If changes are made to the vsyscall code or its memory map handling,
running this test in all three of vsyscall=none, vsyscall=emulate,
and vsyscall=native are helpful.
(Because it's easy, this also compares the vsyscall results to their
vDSO equivalents.)
Note to KAISER backporters: please test this under all three
vsyscall modes. Also, in the emulate and native modes, make sure
that test_vsyscall_64 agrees with the command line or config
option as to which mode you're in. It's quite easy to mess up
the kernel such that native mode accidentally emulates
or vice versa.
Greg, etc: please backport this to all your Meltdown-patched
kernels. It'll help make sure the patches didn't regress
vsyscalls.
CSigned-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/2b9c5a174c1d60fd7774461d518aa75598b1d8fd.1515719552.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Masami Hiramatsu says:
====================
Here are the 5th version of patches to moving error injection
table from kprobes. This version fixes a bug and update
fail-function to support multiple function error injection.
Here is the previous version:
https://patchwork.ozlabs.org/cover/858663/
Changes in v5:
- [3/5] Fix a bug that within_error_injection returns false always.
- [5/5] Update to support multiple function error injection.
Thank you,
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Support in-kernel fault-injection framework via debugfs.
This allows you to inject a conditional error to specified
function using debugfs interfaces.
Here is the result of test script described in
Documentation/fault-injection/fault-injection.txt
===========
# ./test_fail_function.sh
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0227404 s, 46.1 MB/s
btrfs-progs v4.4
See http://btrfs.wiki.kernel.org for more information.
Label: (null)
UUID: bfa96010-12e9-4360-aed0-42eec7af5798
Node size: 16384
Sector size: 4096
Filesystem size: 1001.00MiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 58.00MiB
System: DUP 12.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 1
Devices:
ID SIZE PATH
1 1001.00MiB /dev/loop2
mount: mount /dev/loop2 on /opt/tmpmnt failed: Cannot allocate memory
SUCCESS!
===========
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add injectable error types for each error-injectable function.
One motivation of error injection test is to find software flaws,
mistakes or mis-handlings of expectable errors. If we find such
flaws by the test, that is a program bug, so we need to fix it.
But if the tester miss input the error (e.g. just return success
code without processing anything), it causes unexpected behavior
even if the caller is correctly programmed to handle any errors.
That is not what we want to test by error injection.
To clarify what type of errors the caller must expect for each
injectable function, this introduces injectable error types:
- EI_ETYPE_NULL : means the function will return NULL if it
fails. No ERR_PTR, just a NULL.
- EI_ETYPE_ERRNO : means the function will return -ERRNO
if it fails.
- EI_ETYPE_ERRNO_NULL : means the function will return -ERRNO
(ERR_PTR) or NULL.
ALLOW_ERROR_INJECTION() macro is expanded to get one of
NULL, ERRNO, ERRNO_NULL to record the error type for
each function. e.g.
ALLOW_ERROR_INJECTION(open_ctree, ERRNO)
This error types are shown in debugfs as below.
====
/ # cat /sys/kernel/debug/error_injection/list
open_ctree [btrfs] ERRNO
io_ctl_init [btrfs] ERRNO
====
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since error-injection framework is not limited to be used
by kprobes, nor bpf. Other kernel subsystems can use it
freely for checking safeness of error-injection, e.g.
livepatch, ftrace etc.
So this separate error-injection framework from kprobes.
Some differences has been made:
- "kprobe" word is removed from any APIs/structures.
- BPF_ALLOW_ERROR_INJECTION() is renamed to
ALLOW_ERROR_INJECTION() since it is not limited for BPF too.
- CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
feature. It is automatically enabled if the arch supports
error injection feature for kprobe or ftrace etc.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Compare instruction pointer with original one on the
stack instead using per-cpu bpf_kprobe_override flag.
This patch also consolidates reset_current_kprobe() and
preempt_enable_no_resched() blocks. Those can be done
in one place.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Check whether error injectable event is on function entry or not.
Currently it checks the event is ftrace-based kprobes or not,
but that is wrong. It should check if the event is on the entry
of target function. Since error injection will override a function
to just return with modified return value, that operation must
be done before the target function starts making stackframe.
As a side effect, bpf error injection is no need to depend on
function-tracer. It can work with sw-breakpoint based kprobe
events too.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The intended behaviour in apparmor profile matching is to flag a
conflict if two profiles match equally well. However, right now a
conflict is generated if another profile has the same match length even
if that profile doesn't actually match. Fix the logic so we only
generate a conflict if the profiles match.
Fixes: 844b8292b631 ("apparmor: ensure that undecidable profile attachments fail")
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Given a label with a profile stack of
A//&B or A//&C ...
A ptrace rule should be able to specify a generic trace pattern with
a rule like
ptrace trace A//&**,
however this is failing because while the correct label match routine
is called, it is being done post label decomposition so it is always
being done against a profile instead of the stacked label.
To fix this refactor the cross check to pass the full peer label in to
the label_match.
Fixes: 290f458a4f16 ("apparmor: allow ptrace checks to be finer grained than just capability")
Cc: Stable <stable@vger.kernel.org>
Reported-by: Matthew Garrett <mjg59@google.com>
Tested-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
As pointed out by Daniel Borkmann, using bpf_target_off() is not
necessary for xdp_rxq_info when extracting queue_index and
ifindex, as these members are u32 like BPF_W.
Also fix trivial spelling mistake introduced in same commit.
Fixes: 02dd3291b2f0 ("bpf: finally expose xdp_rxq_info to XDP bpf-programs")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
show_workqueue_state() can print out a lot of messages while being in
atomic context, e.g. sysrq-t -> show_workqueue_state(). If the console
device is slow it may end up triggering NMI hard lockup watchdog.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v4.5+
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two pending (non-PTI) x86 fixes:
- an Intel-MID crash fix
- and an Intel microcode loader blacklist quirk to avoid a
problematic revision"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/intel-mid: Revert "Make 'bt_sfi_data' const"
x86/microcode/intel: Extend BDW late-loading with a revision check
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"A Kconfig fix, a build fix and a membarrier bug fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
membarrier: Disable preemption when calling smp_call_function_many()
sched/isolation: Make CONFIG_CPU_ISOLATION=y depend on SMP or COMPILE_TEST
ia64, sched/cputime: Fix build error if CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"No functional effects intended: removes leftovers from recent lockdep
and refcounts work"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/refcounts: Remove stale comment from the ARCH_HAS_REFCOUNT Kconfig entry
locking/lockdep: Remove cross-release leftovers
locking/Documentation: Remove stale crossrelease_fullstack parameter
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"This contains two build fixes for clang and two fixes for rather
unlikely situations in the Xen gntdev driver"
* tag 'for-linus-4.15-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/gntdev: Fix partial gntdev_mmap() cleanup
xen/gntdev: Fix off-by-one error when unmapping with holes
x86: xen: remove the use of VLAIS
x86/xen/time: fix section mismatch for xen_init_time_ops()
|
|
Pull KVM fixes from Paolo Bonzini:
"PPC:
- user-triggerable use-after-free in HPT resizing
- stale TLB entries in the guest
- trap-and-emulate (PR) KVM guests failing to start under pHyp
x86:
- Another "Spectre" fix.
- async pagefault fix
- Revert an old fix for x86 nested virtualization, which turned out
to do more harm than good
- Check shrinker registration return code, to avoid warnings from
upcoming 4.16 -mm patches"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Add memory barrier on vmcs field lookup
KVM: x86: emulate #UD while in guest mode
x86: kvm: propagate register_shrinker return code
KVM MMU: check pending exception before injecting APF
KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt()
KVM: PPC: Book3S PR: Fix WIMG handling under pHyp
KVM: PPC: Book3S HV: Fix use after free in case of multiple resize requests
KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hpt
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a NULL pointer dereference in crypto_remove_spawns that can
be triggered through af_alg"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algapi - fix NULL dereference in crypto_remove_spawns()
|
|
Pull a single NVMe fix from Christoph for 4.15.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- s3mci: mark debug_regs[] as static
- renesas_sdhi: Add MODULE_LICENSE
* tag 'mmc-v4.15-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: s3mci: mark debug_regs[] as static
mmc: renesas_sdhi: Add MODULE_LICENSE
|