Age | Commit message (Collapse) | Author |
|
The classification operations that are used for RSS make use of several
lookup tables. Having hit counters for these tables is really helpful
to determine what flows were matched by ingress traffic, and see the
path of packets among all the classifier tables.
This commit adds hit counters for the 3 tables used at the moment :
- The decoding table (also called lookup_id table), that links flows
identified by the Header Parser to the flow table.
There's one entry per flow, located at :
.../mvpp2/<controller>/flows/XX/dec_hits
Note that there are 21 flows in the decoding table, whereas there are
52 flows in the Header Parser. That's because there are several kind
of traffic that will match a given flow. Reading the hit counter from
one sub-flow will clear all hit counter that have the same flow_id.
This also applies to the flow_hits.
- The flow table, that contains all the different lookups to be
performed by the classifier for each packet of a given flow. The match
is done on the first entry of the flow sequence.
- The C2 engine entries, that are used to assign the default rx queue,
and enable or disable RSS for a given port.
There's one entry per flow, located at:
.../mvpp2/<controller>/flows/XX/flow_hits
There is one C2 entry per port, so the c2 hit counter is located at :
.../mvpp2/<controller>/ethX/c2_hits
All hit counter values are 16-bits clear-on-read values.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The classifier configuration for RSS is quite complex, with several
lookup tables being used. This commit adds useful info in debugfs to
see how the different tables are configured :
Added 2 new entries in the per-port directory :
- .../eth0/default_rxq : The default rx queue on that port
- .../eth0/rss_enable : Indicates if RSS is enabled in the C2 entry
Added the 'flows' directory :
It contains one entry per sub-flow. a 'sub-flow' is a unique path from
Header Parser to the flow table. Multiple sub-flows can point to the
same 'flow' (each flow has an id from 8 to 29, which is its index in the
Lookup Id table) :
- .../flows/00/...
/01/...
...
/51/id : The flow id. There are 21 unique flows. There's one
flow per combination of the following parameters :
- L4 protocol (TCP, UDP, none)
- L3 protocol (IPv4, IPv6)
- L3 parameters (Fragmented or not)
- L2 parameters (Vlan tag presence or not)
.../type : The flow type. This is an even higher level flow,
that we manipulate with ethtool. It can be :
"udp4" "tcp4" "udp6" "tcp6" "ipv4" "ipv6" "other".
.../eth0/...
.../eth1/engine : The hash generation engine used for this
flow on the given port
.../hash_opts : The hash generation options indicating on
what data we base the hash (vlan tag, src
IP, src port, etc.)
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
One helpful feature to help debug the Header Parser TCAM filter in PPv2
is to be able to see if the entries did match something when a packet
comes in. This can be done by using the built-in hit counter for TCAM
entries.
This commit implements reading the counter, and exposing its value on
debugfs for each filter entry.
The counter is a 16-bits clear-on-read value, located at:
.../mvpp2/<controller>/parser/XXX/hits
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Marvell PPv2 Packer Header Parser has a TCAM based filter, that is not
trivial to configure and debug. Being able to dump TCAM entries from
userspace can be really helpful to help development of new features
and debug existing ones.
This commit adds a basic debugfs interface for the PPv2 driver, focusing
on TCAM related features.
<mnt>/mvpp2/ --- f2000000.ethernet
\- f4000000.ethernet --- parser --- 000 ...
| \- 001
| \- ...
| \- 255 --- ai
| \- header_data
| \- lookup_id
| \- sram
| \- valid
\- eth1 ...
\- eth2 --- mac_filter
\- parser_entries
\- vid_filter
There's one directory per PPv2 instance, named after pdev->name to make
sure names are uniques. In each of these directories, there's :
- one directory per interface on the controller, each containing :
- "mac_filter", which lists all filtered addresses for this port
(based on TCAM, not on the kernel's uc / mc lists)
- "parser_entries", which lists the indices of all valid TCAM
entries that have this port in their port map
- "vid_filter", which lists the vids allowed on this port, based on
TCAM
- one "parser" directory (the parser is common to all ports), containing :
- one directory per TCAM entry (256 of them, from 0 to 255), each
containing :
- "ai" : Contains the 1 byte Additional Info field from TCAM, and
- "header_data" : Contains the 8 bytes Header Data extracted from
the packet
- "lookup_id" : Contains the 4 bits LU_ID
- "sram" : contains the raw SRAM data, which is the result of the TCAM
lookup. This readonly at the moment.
- "valid" : Indicates if the entry is valid of not.
All entries are read-only, and everything is output in hex form.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the appropriate SPDX license identifiers and drop the license text.
This patch is only cosmetic.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
I already pulled the first fix, pull the GVT fixes.
- GVT fix for KBL vGPU hang to update virtual register from LRI.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180713070922.GA19840@intel.com
|
|
into drm-fixes
Two armada fixes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180713075427.GA16160@rmk-PC.armlinux.org.uk
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Fixes for v4.18-rc5:
- Single fix for a build error when the driver is builtin,
but the backend is a loadable module.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9c596cf5-3f24-070e-74f2-c59bfbaf68fa@linux.intel.com
|
|
git://anongit.freedesktop.org/tegra/linux into drm-fixes
drm/tegra: Fixes for v4.18-rc5
This contains a couple of one- or two-line fixes for various minor
issues in the Tegra driver.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712070142.15571-1-thierry.reding@gmail.com
|
|
into drm-fixes
A few display and GPUVM fixes for 4.18.
A few more fixes for 4.18. Two display fixes and a fix to avoid a segfault if
the GPU does not power up properly on resume. These are on top of my pull
from earlier this week.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712043820.2877-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix hotplug irq ack on i965/g4x (Ville)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180710213249.GA16479@intel.com
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
- A fix for OMAP5 and DRA7 to make the branch predictor hardening
settings take proper effect on secondary cores
- Disable USB OTG on am3517 since current driver isn't working
- Fix thermal sensor register settings on Armada 38x
- Fix suspend/resume IRQs on pxa3xx
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: am3517.dtsi: Disable reference to OMAP3 OTG controller
ARM: DRA7/OMAP5: Enable ACTLR[0] (Enable invalidates of BTB) for secondary cores
ARM: pxa: irq: fix handling of ICMR registers in suspend/resume
ARM: dts: armada-38x: use the new thermal binding
|
|
pvti_cpu0_va is the address of shared kvmclock data structure.
pvti_cpu0_va is currently kept unset (1) on 32 bit systems, (2) when
kvmclock vsyscall is disabled, and (3) if kvmclock is not stable.
This poses a problem, because kvm_ptp needs pvti_cpu0_va, but (1) can
work on 32 bit, (2) has little relation to the vsyscall, and (3) does
not need stable kvmclock (although kvmclock won't be used for system
clock if it's not stable, so kvm_ptp is pointless in that case).
Expose pvti_cpu0_va whenever kvmclock is enabled to allow all users to
work with it.
This fixes a regression found on Gentoo: https://bugs.gentoo.org/658544.
Fixes: 9f08890ab906 ("x86/pvclock: add setter for pvclock_pvti_cpu0_va")
Cc: stable@vger.kernel.org
Reported-by: Andreas Steinmetz <ast@domdv.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Prevent a config where KVM_AMD=y and CRYPTO_DEV_CCP_DD=m thereby ensuring
that AMD Secure Processor device driver will be built-in when KVM_AMD is
also built-in.
v1->v2:
* Removed usage of 'imply' Kconfig option.
* Change patch commit message.
Fixes: 505c9e94d832 ("KVM: x86: prefer "depends on" to "select" for SEV")
Cc: <stable@vger.kernel.org> # 4.16.x
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This exit qualification was inadvertently dropped when the two
VM-entry failure blocks were coalesced.
Fixes: e79f245ddec1 ("X86/KVM: Properly update 'tsc_offset' to represent the running guest")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When we switched from doing rdmsr() to reading FS/GS base values from
current->thread we completely forgot about legacy 32-bit userspaces which
we still support in KVM (why?). task->thread.{fsbase,gsbase} are only
synced for 64-bit processes, calling save_fsgs_for_kvm() and using
its result from current is illegal for legacy processes.
There's no ARCH_SET_FS/GS prctls for legacy applications. Base MSRs are,
however, not always equal to zero. Intel's manual says (3.4.4 Segment
Loading Instructions in IA-32e Mode):
"In order to set up compatibility mode for an application, segment-load
instructions (MOV to Sreg, POP Sreg) work normally in 64-bit mode. An
entry is read from the system descriptor table (GDT or LDT) and is loaded
in the hidden portion of the segment register.
...
The hidden descriptor register fields for FS.base and GS.base are
physically mapped to MSRs in order to load all address bits supported by
a 64-bit implementation.
"
The issue was found by strace test suite where 32-bit ioctl_kvm_run test
started segfaulting.
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Bisected-by: Masatake YAMATO <yamato@redhat.com>
Fixes: 42b933b59721 ("x86/kvm/vmx: read MSR_{FS,KERNEL_GS}_BASE from current->thread")
Cc: stable@vger.kernel.org
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all
requested features are available on the host.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-07-15
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Various different arm32 JIT improvements in order to optimize code emission
and make the JIT code itself more robust, from Russell.
2) Support simultaneous driver and offloaded XDP in order to allow for advanced
use-cases where some work is offloaded to the NIC and some to the host. Also
add ability for bpftool to load programs and maps beyond just the cgroup case,
from Jakub.
3) Add BPF JIT support in nfp for multiplication as well as division. For the
latter in particular, it uses the reciprocal algorithm to emulate it, from Jiong.
4) Add BTF pretty print functionality to bpftool in plain and JSON output
format, from Okash.
5) Add build and installation to the BPF helper man page into bpftool, from Quentin.
6) Add a TCP BPF callback for listening sockets which is triggered right after
the socket transitions to TCP_LISTEN state, from Andrey.
7) Add a new cgroup tree command to bpftool which iterates over the whole cgroup
tree and prints all attached programs, from Roman.
8) Improve xdp_redirect_cpu sample to support parsing of double VLAN tagged
packets, from Jesper.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
"Two fixes for 4.18:
- an important core fix for RTCs using the core offsetting only one
driver is affected
- a fix for the error path of mrst"
* tag 'rtc-4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: fix alarm read and set offset
rtc: mrst: fix error code in probe()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Two omap fixes for v4.18-rc cycle
Turns out the recent patches for ARM branch predictor hardening are
not working on omap5 and dra7 as planned because the secondary CPU
is parked to the bootrom code. We can't configure it in the bootloader.
So we must enable invalidates of BTB for omap5 and dra7 secondary
core in the kernel.
And there's a fix for reserved register access for am3517. The
usb otg module on am3517 is not the same as for other omap3.
* tag 'omap-for-v4.18/fixes-rc4-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am3517.dtsi: Disable reference to OMAP3 OTG controller
ARM: DRA7/OMAP5: Enable ACTLR[0] (Enable invalidates of BTB) for secondary cores
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
mvebu fixes for 4.18 (part 1)
Use the new thermal binding on Armada 38x allowing to use a driver fix
which is already part of the kernel.
* tag 'mvebu-fixes-4.18-1' of git://git.infradead.org/linux-mvebu:
ARM: dts: armada-38x: use the new thermal binding
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
This is the fixes set for v4.18 cycle.
This is a fix for suspending all pxa3xx platforms, where high
number interrupts are not reenabled.
* tag 'pxa-fixes-4.18' of https://github.com/rjarzmik/linux:
ARM: pxa: irq: fix handling of ICMR registers in suspend/resume
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
Andrey Ignatov says:
====================
This patchset adds TCP-BPF callback for listening sockets.
Patch 0001 provides more details and is the main patch in the set.
Patch 0006 adds selftest for the new callback.
Other patches are bug fixes and improvements in TCP-BPF selftest
to make it easier to extend in 0006.
====================
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Cover new TCP-BPF callback in test_tcpbpf: when listen() is called on
socket, set BPF_SOCK_OPS_STATE_CB_FLAG so that BPF_SOCK_OPS_STATE_CB
callback can be called on future state transition, and when such a
transition happens (TCP_LISTEN -> TCP_CLOSE), track it in the map and
verify it in user space later.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Reduce amount of copy/paste for debug info when result is verified in
the test and keep that info together with values being checked so that
they won't get out of sync.
It also improves debug experience: instead of checking manually what
doesn't match in debug output for all fields, only unexpected field is
printed.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Switch to cgroup_helpers to simplify the code and fix cgroup cleanup:
before cgroup was not cleaned up after the test.
It also removes SYSTEM macro, that only printed error, but didn't
terminate the test.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Lack of const in cgroup helpers signatures forces to write ugly client
code. Fix it.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Sync BPF_SOCK_OPS_TCP_LISTEN_CB related UAPI changes to tools/.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Add new TCP-BPF callback that is called on listen(2) right after socket
transition to TCP_LISTEN state.
It fills the gap for listening sockets in TCP-BPF. For example BPF
program can set BPF_SOCK_OPS_STATE_CB_FLAG when socket becomes listening
and track later transition from TCP_LISTEN to TCP_CLOSE with
BPF_SOCK_OPS_STATE_CB callback.
Before there was no way to do it with TCP-BPF and other options were
much harder to work with. E.g. socket state tracking can be done with
tracepoints (either raw or regular) but they can't be attached to cgroup
and their lifetime has to be managed separately.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two related fixes for a boot failure of Xen PV guests"
* tag 'for-linus-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: setup pv irq ops vector earlier
xen: remove global bit from __default_kernel_pte_mask for pv guests
|
|
Pull block fix from Jens Axboe:
"Just a single regression fix (from 4.17) for bsg, fixing an EINVAL
return on non-data commands"
* tag 'for-linus-20180713' of git://git.kernel.dk/linux-block:
bsg: fix bogus EINVAL on non-data commands
|
|
Ido Schimmel says:
====================
mlxsw: Add VRRP support
When a router that is acting as the default gateway of a host stops
functioning, the host will encounter packet loss until the router starts
functioning again.
To increase the reliability of the default gateway without performing
reconfiguration on the host, a host can use a Virtual Router Redundancy
Protocol (VRRP) Router. This virtual router is composed from several
routers where only one is actually forwarding packets from the host (the
master router) while the other routers act as backup routers. The
election of the master router is determined by the VRRP protocol [1].
Packets addressed to the virtual router are always sent to the virtual
router MAC address (IPv4: 00-00-5E-00-01-XX, IPv6: 00-00-5E-00-02-XX).
Such packets can only be accepted by the master router and must be
discarded by the backup routers.
In Linux, VRRP is usually implemented by configuring a macvlan with the
virtual router MAC on top of the router interface that is connected to
the host / LAN. The macvlan on the master router is assigned the virtual
IP (VIP) that the host uses as its gateway.
In order to support VRRP in mlxsw, we first need to enable macvlan upper
devices on top of mlxsw netdevs and their uppers. This is done by the
first patch, which also takes care of sanitizing macvlan configurations
that are not currently supported by the driver.
The second patch directs packets with destination MAC addresses as the
macvlans to the router so that they will undergo an L3 lookup. This is
consistent with the kernel's behavior where the macvlan's Rx handler
will re-inject such packets to the Rx path so that they will be picked
up by the IPvX protocol handlers and undergo an L3 lookup. Note that the
driver prevents the macvlans from being enslaved to other devices, to
ensure the packets will be picked up by the protocol handler and not by
another Rx handler.
The third patch adds packet traps for VRRP control packets for both IPv4
and IPv6. Finally, the last patch optimizes the reception of VRRP MACs
by potentially skipping one L2 lookup for them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hosts using a VRRP router send their packets with a destination MAC of
the VRRP router which is of the following form [1]:
IPv4 - 00-00-5E-00-01-{VRID}
IPv6 - 00-00-5E-00-02-{VRID}
Where VRID is the ID of the virtual router. Such packets are directed to
the router block in the ASIC by an FDB entry that was added in the
previous patch.
However, in certain cases it is possible to skip this FDB lookup and
send such packets directly to the router. This is accomplished by adding
these special MAC addresses to the RIF cache. If the cache is hit, the
packet will skip the L2 lookup and ingress the router with the RIF
specified in the cache entry.
1. https://tools.ietf.org/html/rfc5798#section-7.3
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Virtual Router Redundancy Protocol packets are used to communicate the
state of the Master router associated with the virtual router ID (VRID).
These are link-local multicast packets sent with IP protocol 112 that
are trapped in the router block in the ASIC.
Add a trap for these packets and mark the trapped packets to prevent
them from potentially being re-flooded by the bridge driver.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
An IP packet received on a netdev with a macvlan upper whose MAC matches
the packet's destination MAC will be re-injected to the Rx path as if it
was received by the macvlan, and perform an L3 lookup.
Reflect this functionality to the ASIC by programming FDB entries that
will direct MACs of macvlan uppers to the router.
In a similar fashion to router interfaces (RIFs) that are programmed
upon the addition of the first IP address on an interface and destroyed
upon the removal of the last IP address, the FDB entries for the macvlan
are added and destroyed based on the addition of the first and removal
of the last IP address on the macvlan.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to allow more unicast MAC addresses (e.g., VRRP virtual MAC) to
be directed to the router we need to enable macvlan uppers on top of
mlxsw netdevs.
Allow macvlan upper devices on top of mlxsw netdevs and sanitize
configurations that can't work. For example, a macvlan can't be enslaved
to a bridge as without ACLs the device doesn't take the destination MAC
into account when classifying a packet to a bridge instance (i.e., a
FID).
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcp_rcv_nxt_update() is already executed in tcp_data_queue().
This line is redundant.
See bellow,
tcp_queue_rcv
tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq); <<<< redundant
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Merge misc fixes from Andrew Morton:
"11 fixes"
* emailed patches form Andrew Morton <akpm@linux-foundation.org>:
reiserfs: fix buffer overflow with long warning messages
checkpatch: fix duplicate invalid vsprintf pointer extension '%p<foo>' messages
mm: do not bug_on on incorrect length in __mm_populate()
mm/memblock.c: do not complain about top-down allocations for !MEMORY_HOTREMOVE
fs, elf: make sure to page align bss in load_elf_library
x86/purgatory: add missing FORCE to Makefile target
net/9p/client.c: put refcount of trans_mod in error case in parse_opts()
mm: allow arch to supply p??_free_tlb functions
autofs: fix slab out of bounds read in getname_kernel()
fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
mm: do not drop unused pages when userfaultd is running
|
|
ReiserFS prepares log messages into a 1024-byte buffer with no bounds
checks. Long messages, such as the "unknown mount option" warning when
userspace passes a crafted mount options string, overflow this buffer.
This causes KASAN to report a global-out-of-bounds write.
Fix it by truncating messages to the buffer size.
Link: http://lkml.kernel.org/r/20180707203621.30922-1-ebiggers3@gmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+b890b3335a4d8c608963@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Multiline statements with invalid %p<foo> uses produce multiple
warnings. Fix that.
e.g.:
$ cat t_block.c
void foo(void)
{
MY_DEBUG(drv->foo,
"%pk",
foo->boo);
}
$ ./scripts/checkpatch.pl -f t_block.c
WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
#1: FILE: t_block.c:1:
+void foo(void)
WARNING: Invalid vsprintf pointer extension '%pk'
#3: FILE: t_block.c:3:
+ MY_DEBUG(drv->foo,
+ "%pk",
+ foo->boo);
WARNING: Invalid vsprintf pointer extension '%pk'
#3: FILE: t_block.c:3:
+ MY_DEBUG(drv->foo,
+ "%pk",
+ foo->boo);
total: 0 errors, 3 warnings, 6 lines checked
NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.
t_block.c has style problems, please review.
NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Link: http://lkml.kernel.org/r/9e8341bbe4c9877d159cb512bb701043cbfbb10b.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Tobin C. Harding" <me@tobin.cc>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
syzbot has noticed that a specially crafted library can easily hit
VM_BUG_ON in __mm_populate
kernel BUG at mm/gup.c:1242!
invalid opcode: 0000 [#1] SMP
CPU: 2 PID: 9667 Comm: a.out Not tainted 4.18.0-rc3 #644
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
RIP: 0010:__mm_populate+0x1e2/0x1f0
Code: 55 d0 65 48 33 14 25 28 00 00 00 89 d8 75 21 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 75 18 f1 ff 0f 0b e8 6e 18 f1 ff <0f> 0b 31 db eb c9 e8 93 06 e0 ff 0f 1f 00 55 48 89 e5 53 48 89 fb
Call Trace:
vm_brk_flags+0xc3/0x100
vm_brk+0x1f/0x30
load_elf_library+0x281/0x2e0
__ia32_sys_uselib+0x170/0x1e0
do_fast_syscall_32+0xca/0x420
entry_SYSENTER_compat+0x70/0x7f
The reason is that the length of the new brk is not page aligned when we
try to populate the it. There is no reason to bug on that though.
do_brk_flags already aligns the length properly so the mapping is
expanded as it should. All we need is to tell mm_populate about it.
Besides that there is absolutely no reason to to bug_on in the first
place. The worst thing that could happen is that the last page wouldn't
get populated and that is far from putting system into an inconsistent
state.
Fix the issue by moving the length sanitization code from do_brk_flags
up to vm_brk_flags. The only other caller of do_brk_flags is brk
syscall entry and it makes sure to provide the proper length so t here
is no need for sanitation and so we can use do_brk_flags without it.
Also remove the bogus BUG_ONs.
[osalvador@techadventures.net: fix up vm_brk_flags s@request@len@]
Link: http://lkml.kernel.org/r/20180706090217.GI32658@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: syzbot <syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Mike Rapoport is converting architectures from bootmem to nobootmem
allocator. While doing so for m68k Geert has noticed that he gets a
scary looking warning:
WARNING: CPU: 0 PID: 0 at mm/memblock.c:230
memblock_find_in_range_node+0x11c/0x1be
memblock: bottom-up allocation failed, memory hotunplug may be affected
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted
4.18.0-rc3-atari-01343-gf2fb5f2e09a97a3c-dirty #7
Call Trace: __warn+0xa8/0xc2
kernel_pg_dir+0x0/0x1000
netdev_lower_get_next+0x2/0x22
warn_slowpath_fmt+0x2e/0x36
memblock_find_in_range_node+0x11c/0x1be
memblock_find_in_range_node+0x11c/0x1be
memblock_find_in_range_node+0x0/0x1be
vprintk_func+0x66/0x6e
memblock_virt_alloc_internal+0xd0/0x156
netdev_lower_get_next+0x2/0x22
netdev_lower_get_next+0x2/0x22
kernel_pg_dir+0x0/0x1000
memblock_virt_alloc_try_nid_nopanic+0x58/0x7a
netdev_lower_get_next+0x2/0x22
kernel_pg_dir+0x0/0x1000
kernel_pg_dir+0x0/0x1000
EXPTBL+0x234/0x400
EXPTBL+0x234/0x400
alloc_node_mem_map+0x4a/0x66
netdev_lower_get_next+0x2/0x22
free_area_init_node+0xe2/0x29e
EXPTBL+0x234/0x400
paging_init+0x430/0x462
kernel_pg_dir+0x0/0x1000
printk+0x0/0x1a
EXPTBL+0x234/0x400
setup_arch+0x1b8/0x22c
start_kernel+0x4a/0x40a
_sinittext+0x344/0x9e8
The warning is basically saying that a top-down allocation can break
memory hotremove because memblock allocation is not movable. But m68k
doesn't even support MEMORY_HOTREMOVE so there is no point to warn about
it.
Make the warning conditional only to configurations that care.
Link: http://lkml.kernel.org/r/20180706061750.GH32658@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Sam Creasey <sammy@sammy.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The current code does not make sure to page align bss before calling
vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate() due to
the requested lenght not being correctly aligned.
Let us make sure to align it properly.
Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured
for libc5.
Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reported-by: syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
- Build the kernel without the fix
- Add some flag to the purgatories KBUILD_CFLAGS,I used
-fno-asynchronous-unwind-tables
- Re-build the kernel
When you look at makes output you see that sha256.o is not re-build in the
last step. Also readelf -S still shows the .eh_frame section for
sha256.o.
With the fix sha256.o is rebuilt in the last step.
Without FORCE make does not detect changes only made to the command line
options. So object files might not be re-built even when they should be.
Fix this by adding FORCE where it is missing.
Link: http://lkml.kernel.org/r/20180704110044.29279-2-prudo@linux.ibm.com
Fixes: df6f2801f511 ("kernel/kexec_file.c: move purgatories sha256 to common code")
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> [4.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In my testing, the second mount will fail after umounting successfully.
The reason is that we put refcount of trans_mod in the correct case
rather than the error case in parse_opts() at last. That will cause the
refcount decrease to -1, and when we try to get trans_mod again in
try_module_get(), we could only increase refcount to 0 which will cause
failure as follows:
parse_opts
v9fs_get_trans_by_name
try_module_get : return NULL to caller which cause error
So we should put refcount of trans_mod in error case.
Link: http://lkml.kernel.org/r/5B3F39A0.2030509@huawei.com
Fixes: 9421c3e64137ec ("net/9p/client.c: fix potential refcnt problem of trans module")
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Dominique Martinet <dominique.martinet@cea.fr>
Tested-by: Dominique Martinet <dominique.martinet@cea.fr>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The mmu_gather APIs keep track of the invalidated address range
including the span covered by invalidated page table pages. Ranges
covered by page tables but not ptes (and therefore no TLBs) still need
to be invalidated because some architectures (x86) can cache
intermediate page table entries, and invalidate those with normal TLB
invalidation instructions to be almost-backward-compatible.
Architectures which don't cache intermediate page table entries, or
which invalidate these caches separately from TLB invalidation, do not
require TLB invalidation range expanded over page tables.
Allow architectures to supply their own p??_free_tlb functions, which
can avoid the __tlb_adjust_range.
Link: http://lkml.kernel.org/r/20180703013131.2807-1-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The autofs subsystem does not check that the "path" parameter is present
for all cases where it is required when it is passed in via the "param"
struct.
In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD
ioctl command.
To solve it, modify validate_dev_ioctl(function to check that a path has
been provided for ioctl commands that require it.
Link: http://lkml.kernel.org/r/153060031527.26631.18306637892746301555.stgit@pluto.themaw.net
Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Reported-by: syzbot+60c837b428dc84e83a93@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Thomas reports:
"While looking around in /proc on my v4.14.52 system I noticed that all
processes got a lot of "Locked" memory in /proc/*/smaps. A lot more
memory than a regular user can usually lock with mlock().
Commit 493b0e9d945f (in v4.14-rc1) seems to have changed the behavior
of "Locked".
Before that commit the code was like this. Notice the VM_LOCKED check.
(vma->vm_flags & VM_LOCKED) ?
(unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);
After that commit Locked is now the same as Pss:
(unsigned long)(mss->pss >> (10 + PSS_SHIFT)));
This looks like a mistake."
Indeed, the commit has added mss->pss_locked with the correct value that
depends on VM_LOCKED, but forgot to actually use it. Fix it.
Link: http://lkml.kernel.org/r/ebf6c7fb-fec3-6a26-544f-710ed193c154@suse.cz
Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
KVM guests on s390 can notify the host of unused pages. This can result
in pte_unused callbacks to be true for KVM guest memory.
If a page is unused (checked with pte_unused) we might drop this page
instead of paging it. This can have side-effects on userfaultd, when
the page in question was already migrated:
The next access of that page will trigger a fault and a user fault
instead of faulting in a new and empty zero page. As QEMU does not
expect a userfault on an already migrated page this migration will fail.
The most straightforward solution is to ignore the pte_unused hint if a
userfault context is active for this VMA.
Link: http://lkml.kernel.org/r/20180703171854.63981-1-borntraeger@de.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|