Age | Commit message (Collapse) | Author |
|
When PCID is enabled, CR3's PCID bits can change during context
switches, so KVM won't be able to treat CR3 as a per-mm constant any
more.
I structured this like the existing CR4 handling. Under ordinary
circumstances (PCID disabled or if the current PCID and the value
that's already in the VMCS match), then we won't do an extra VMCS
write, and we'll never do an extra direct CR3 read. The overhead
should be minimal.
I disallowed using the new helper in non-atomic context because
PCID support will cause CR3 to stop being constant in non-atomic
process context.
(Frankly, it also scares me a bit that KVM ever treated CR3 as
constant, but it looks like it was okay before.)
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Nadav pointed out that some code used PAGE_SIZE and other code used
PAGE_SHIFT. Use PAGE_SHIFT instead of multiplying or dividing by
PAGE_SIZE.
Requested-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Lazy TLB state is currently managed in a rather baroque manner.
AFAICT, there are three possible states:
- Non-lazy. This means that we're running a user thread or a
kernel thread that has called use_mm(). current->mm ==
current->active_mm == cpu_tlbstate.active_mm and
cpu_tlbstate.state == TLBSTATE_OK.
- Lazy with user mm. We're running a kernel thread without an mm
and we're borrowing an mm_struct. We have current->mm == NULL,
current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state
!= TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0). The current cpu is set
in mm_cpumask(current->active_mm). CR3 points to
current->active_mm->pgd. The TLB is up to date.
- Lazy with init_mm. This happens when we call leave_mm(). We
have current->mm == NULL, current->active_mm ==
cpu_tlbstate.active_mm, but that mm is only relelvant insofar as
the scheduler is tracking it for refcounting. cpu_tlbstate.state
!= TLBSTATE_OK. The current cpu is clear in
mm_cpumask(current->active_mm). CR3 points to swapper_pg_dir,
i.e. init_mm->pgd.
This patch simplifies the situation. Other than perf, x86 stops
caring about current->active_mm at all. We have
cpu_tlbstate.loaded_mm pointing to the mm that CR3 references. The
TLB is always up to date for that mm. leave_mm() just switches us
to init_mm. There are no longer any special cases for mm_cpumask,
and switch_mm() switches mms without worrying about laziness.
After this patch, cpu_tlbstate.state serves only to tell the TLB
flush code whether it may switch to init_mm instead of doing a
normal flush.
This makes fairly extensive changes to xen_exit_mmap(), which used
to look a bit like black magic.
Perf is unchanged. With or without this change, perf may behave a bit
erratically if it tries to read user memory in kernel thread context.
We should build on this patch to teach perf to never look at user
memory when cpu_tlbstate.loaded_mm != current->mm.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The UP asm/tlbflush.h generates somewhat nicer code than the SMP version.
Aside from that, it's fallen quite a bit behind the SMP code:
- flush_tlb_mm_range() didn't flush individual pages if the range
was small.
- The lazy TLB code was much weaker. This usually wouldn't matter,
but, if a kernel thread flushed its lazy "active_mm" more than
once (due to reclaim or similar), it wouldn't be unlazied and
would instead pointlessly flush repeatedly.
- Tracepoints were missing.
Aside from that, simply having the UP code around was a maintanence
burden, since it means that any change to the TLB flush code had to
make sure not to break it.
Simplify everything by deleting the UP code.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Now there's only one copy of the local tlb flush logic for
non-kernel pages on SMP kernels.
The only functional change is that arch_tlbbatch_flush() will now
leave_mm() on the local CPU if that CPU is in the batch and is in
TLBSTATE_LAZY mode.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The local flush path is very similar to the remote flush path.
Merge them.
This is intended to make no difference to behavior whatsoever. It
removes some code and will make future changes to the flushing
mechanics simpler.
This patch does remove one small optimization: flush_tlb_mm_range()
now has an unconditional smp_mb() instead of using MOV to CR3 or
INVLPG as a full barrier when applicable. I think this is okay for
a few reasons. First, smp_mb() is quite cheap compared to the cost
of a TLB flush. Second, this rearrangement makes a bigger
optimization available: with some work on the SMP function call
code, we could do the local and remote flushes in parallel. Third,
I'm planning a rework of the TLB flush algorithm that will require
an atomic operation at the beginning of each flush, and that
operation will replace the smp_mb().
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
On a remote TLB flush, we leave_mm() if we're TLBSTATE_LAZY. For a
local flush_tlb_mm_range(), we leave_mm() if !current->mm. These
are approximately the same condition -- the scheduler sets lazy TLB
mode when switching to a thread with no mm.
I'm about to merge the local and remote flush code, but for ease of
verifying and bisecting the patch, I want the local and remote flush
behavior to match first. This patch changes the local code to match
the remote code.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Rather than passing all the contents of flush_tlb_info to
flush_tlb_others(), pass a pointer to the structure directly. For
consistency, this also removes the unnecessary cpu parameter from
uv_flush_tlb_others() to make its signature match the other
*flush_tlb_others() functions.
This serves two purposes:
- It will dramatically simplify future patches that change struct
flush_tlb_info, which I'm planning to do.
- struct flush_tlb_info is an adequate description of what to do
for a local flush, too, so by reusing it we can remove duplicated
code between local and remove flushes in a future patch.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
[ Fix build warning. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
A SoC variant of Geode GX1, notably NSC branded SC1100, seems to
report an inverted Device ID in its DIR0 configuration register,
specifically 0xb instead of the expected 0x4.
Catch this presumably quirky version so it's properly recognized
as GX1 and has its cache switched to write-back mode, which provides
a significant performance boost in most workloads.
SC1100's datasheet "Geode™ SC1100 Information Appliance On a Chip",
states in section 1.1.7.1 "Device ID" that device identification
values are specified in SC1100's device errata. These, however,
seem to not have been publicly released.
Wading through a number of boot logs and /proc/cpuinfo dumps found on
pastebin and blogs, this patch should mostly be relevant for a number
of now admittedly aging Soekris NET4801 and PC Engines WRAP devices,
the latter being the platform this issue was discovered on.
Performance impact was verified using "openssl speed", with
write-back caching scaling throughput between -3% and +41%.
Signed-off-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1496596719.26725.14.camel@student.kit.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently tsk->thread->load_vec and load_fp are not initialized during
task creation, which can lead to garbage values in these variables (non-zero
values).
These variables will be checked later in restore_math() to validate if the
FP and vector registers are being utilized. Since these values might be
non-zero, the restore_math() will continue to save the FP and vectors even if
they were never utilized by the userspace application. load_fp and load_vec
counters will then overflow (they wrap at 255) and the FP and Altivec will be
finally disabled, but before that condition is reached (counter overflow)
several context switches will have restored FP and vector registers without
need, causing a performance degradation.
Fixes: 70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gusbromero@gmail.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Our previous patch (cited below) introduced a regression
for RAW Eth QPs.
Fix it by checking if the QP number provided by user-space
exists, hence allowing steering rules to be added for valid
QPs only.
Fixes: 89c557687a32 ("net/mlx4_en: Avoid adding steering rules with invalid ring")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since iptunnel_pull_header() can call pskb_may_pull(),
we must reload any pointer that was related to skb->head.
Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexander reported various KASAN messages triggered in recent kernels
The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Acked-By: Lorenzo Colitti <lorenzo@google.com>
Tested-By: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]")
replaced the use of dst->ds[0] with dst->cpu_switch since that is
functionally equivalent, however, we can now run into an use after free
scenario after unbinding then rebinding the switch driver.
The use after free happens because we do correctly initialize
dst->cpu_switch the first time we probe in dsa_cpu_parse(), then we
unbind the driver: dsa_dst_unapply() is called, and we rebind again.
dst->cpu_switch now points to a freed "ds" structure, and so when we
finally dereference it in dsa_cpu_port_ethtool_setup(), we oops.
To fix this, simply set dst->cpu_switch to NULL in dsa_dst_unapply()
which guarantees that we always correctly re-assign dst->cpu_switch in
dsa_cpu_parse().
Fixes: 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If ip6_find_1stfragopt() fails and we return an error we have to free
up 'segs' because nobody else is going to.
Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
when using COLLECT_METADATA geneve devices are created with too small of
a needed_headroom and too large of a max_mtu. This is because
ip_tunnel_info_af() is not valid with the device level info when using
COLLECT_METADATA and we mistakenly fall into the IPv4 case.
For COLLECT_METADATA, always use the worst case of ipv6 since both
sockets are created.
Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prior to f5f99309fa74 (sock: do not set sk_err in
sock_dequeue_err_skb), sk_err was reset to the error of
the skb on the head of the error queue.
Applications, most notably ping, are relying on this
behavior to reset sk_err for ICMP packets.
Set sk_err to the ICMP error when there is an ICMP packet
at the head of the error queue.
Fixes: f5f99309fa74 (sock: do not set sk_err in sock_dequeue_err_skb)
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Tested-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
xgbe_map_rx_buffer is rather confused about what PAGE_ALLOC_COSTLY_ORDER
means. It uses PAGE_ALLOC_COSTLY_ORDER-1 assuming that
PAGE_ALLOC_COSTLY_ORDER is the first costly order which is not the case
actually because orders larger than that are costly. And even that
applies only to sleeping allocations which is not the case here. We
simply do not perform any costly operations like reclaim or compaction
for those. Simplify the code by dropping the order calculation and use
PAGE_ALLOC_COSTLY_ORDER directly.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ip6_route_output() requires that the flowlabel contains the traffic
class for policy routing.
Commit 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets") removed the code which previously added the
traffic class to the flowlabel.
The traffic class is added here because only route lookup needs the
flowlabel to contain the traffic class.
Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
Signed-off-by: Liam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: Peter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
This fixes a problem with reading files larger than 2GB from a UFS-2
file system:
https://bugzilla.kernel.org/show_bug.cgi?id=195721
The incorrect UFS s_maxsize limit became a problem as of commit
c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
which started using s_maxbytes to avoid a page index overflow in
do_generic_file_read().
That caused files to be truncated on UFS-2 file systems because the
default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it.
Here I simply increase the default to a common value used by other file
systems.
Signed-off-by: Richard Narron <comet.berkeley@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will B <will.brokenbourgh2877@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: <stable@vger.kernel.org> # v4.9 and backports of c2a9737f45e2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use software polling (PHY_POLL) to check for link state changes instead
of relying on the EMAC's hardware polling feature. Some PHY drivers
are unable to get a functioning link because the HW polling is not
robust enough.
The EMAC is able to poll the PHY on the MDIO bus looking for link state
changes (via the Link Status bit in the Status Register at address 0x1).
When the link state changes, the EMAC triggers an interrupt and tells the
driver what the new state is. The feature eliminates the need for
software to poll the MDIO bus.
Unfortunately, this feature is incompatible with phylib, because it
ignores everything that the PHY core and PHY drivers are trying to do.
In particular:
1. It assumes a compatible register set, so PHYs with different registers
may not work.
2. It doesn't allow for hardware errata that have work-arounds implemented
in the PHY driver.
3. It doesn't support multiple register pages. If the PHY core switches
the register set to another page, the EMAC won't know the page has
changed and will still attempt to read the same PHY register.
4. It only checks the copper side of the link, not the SGMII side. Some
PHY drivers (e.g. at803x) may also check the SGMII side, and
report the link as not ready during autonegotiation if the SGMII link
is still down. Phylib then waits for another interrupt to query
the PHY again, but the EMAC won't send another interrupt because it
thinks the link is up.
Cc: stable@vger.kernel.org # 4.11.x
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull NFS client bugfixes from Trond Myklebust:
"Bugfixes include:
- Fix a typo in commit e092693443b ("NFS append COMMIT after
synchronous COPY") that breaks copy offload
- Fix the connect error propagation in xs_tcp_setup_socket()
- Fix a lock leak in nfs40_walk_client_list
- Verify that pNFS requests lie within the offset range of the layout
segment"
* tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: Mark unnecessarily extern functions as static
SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()
NFSv4.0: Fix a lock leak in nfs40_walk_client_list
pnfs: Fix the check for requests in range of layout segment
xprtrdma: Delete an error message for a failed memory allocation in xprt_rdma_bc_setup()
pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()
NFS fix COMMIT after COPY
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fix from Greg KH:
"Here is a single tty core fix for 4.12-rc4. It reverts a patch that a
lot of people reported as causing lockdep and other warnings.
Right after I reverted this in my tree, it seems like another
"correct" fix might have shown up, but it's too late in the release
cycle to be messing with tty core locking, so let's just revert this
for now to go back how things always have been and try it again for
4.13.
This has not been in linux-next as I only reverted it a few hours ago"
* tag 'tty-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "tty: fix port buffer locking"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
- a couple of regression fixes in synaptics and axp20x-pek drivers
- try to ease transition from PS/2 to RMI for Synaptics touchpad users
by ensuring we do not try to activate RMI mode when RMI SMBus support
is not enabled, and nag users a bit to enable it
- plus a couple of other changes that seemed worthwhile for this
release
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: axp20x-pek - switch to acpi_dev_present and check for ACPI0011 too
Input: axp20x-pek - only check for "INTCFD9" ACPI device on Cherry Trail
Input: tm2-touchkey - use LEN_ON as boolean value instead of LED_FULL
Input: synaptics - tell users to report when they should be using rmi-smbus
Input: synaptics - warn the users when there is a better mode
Input: synaptics - keep PS/2 around when RMI4_SMB is not enabled
Input: synaptics - clear device info before filling in
Input: silead - disable interrupt during suspend
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC fixlet from Alexandre Belloni:
"A single patch, not really a fix but I don't think there is any reason
to delay it.
Change the mailing list address"
* tag 'rtc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
MAINTAINERS: update RTC mailing list
|
|
A rc device can call ir_raw_event_handle() after rc_allocate_device(),
but before rc_register_device() has completed. This is racey because
rcdev->raw is set before rcdev->raw->thread has a valid value.
Cc: stable@kernel.org
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
Just depend on DEBUG_FS, no need to invent a new kernel config.
Especially since CEC can be enabled by drm without enabling
MEDIA_SUPPORT.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
This config option is strictly speaking independent of the
media subsystem since it can be used by drm as well.
Besides, it looks odd when drivers select CEC_CORE and
MEDIA_CEC_NOTIFIER, that's inconsistent naming.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
The CEC framework is used by both drm and media. That makes it tricky
to get the dependencies right.
This patch moves the CEC_CORE and MEDIA_CEC_NOTIFIER config options
out of the media menu and instead drivers that want to use CEC should
select CEC_CORE and MEDIA_CEC_NOTIFIER (if needed).
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
exit_loop is not being initialized, so it contains garbage. Ensure it is
initialized to false.
Detected by CoverityScan, CID#1436409 ("Uninitialized scalar variable")
Fixes: ea6a69defd3311 ("[media] rainshadow-cec: avoid -Wmaybe-uninitialized warning")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
Several atomisp files use:
ccflags-y += -Werror
As, on media, our usual procedure is to use W=1, and atomisp
has *a lot* of warnings with such flag enabled,like:
./drivers/staging/media/atomisp/pci/atomisp2/css2400/hive_isp_css_common/host/system_local.h:62:26: warning: 'DDR_BASE' defined but not used [-Wunused-const-variable=]
At the end, it causes our build to fail, impacting our workflow.
So, remove this crap. If one wants to force -Werror, he
can still build with it enabled by passing a parameter to
make.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is nine fixes, seven of which are for the qedi driver (new as of
4.10) the other two are a use after free in the cxgbi drivers and a
potential NULL dereference in the rdac device handler"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: libcxgbi: fix skb use after free
scsi: qedi: Fix endpoint NULL panic during recovery.
scsi: qedi: set max_fin_rt default value
scsi: qedi: Set firmware tcp msl timer value.
scsi: qedi: Fix endpoint NULL panic in qedi_set_path.
scsi: qedi: Set dma_boundary to 0xfff.
scsi: qedi: Correctly set firmware max supported BDs.
scsi: qedi: Fix bad pte call trace when iscsiuio is stopped.
scsi: scsi_dh_rdac: Use ctlr directly in rdac_failover_get()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"For the most part this is just a minor -rc cycle for the rdma
subsystem. Even given that this is all of the -rc patches since the
merge window closed, it's still only about 25 patches:
- Multiple i40iw, nes, iw_cxgb4, hfi1, qib, mlx4, mlx5 fixes
- A few upper layer protocol fixes (IPoIB, iSER, SRP)
- A modest number of core fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (26 commits)
RDMA/SA: Fix kernel panic in CMA request handler flow
RDMA/umem: Fix missing mmap_sem in get umem ODP call
RDMA/core: not to set page dirty bit if it's already set.
RDMA/uverbs: Declare local function static and add brackets to sizeof
RDMA/netlink: Reduce exposure of RDMA netlink functions
RDMA/srp: Fix NULL deref at srp_destroy_qp()
RDMA/IPoIB: Limit the ipoib_dev_uninit_default scope
RDMA/IPoIB: Replace netdev_priv with ipoib_priv for ipoib_get_link_ksettings
RDMA/qedr: add null check before pointer dereference
RDMA/mlx5: set UMR wqe fence according to HCA cap
net/mlx5: Define interface bits for fencing UMR wqe
RDMA/mlx4: Fix MAD tunneling when SRIOV is enabled
RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
RDMA/hfi1: Defer setting VL15 credits to link-up interrupt
RDMA/hfi1: change PCI bar addr assignments to Linux API functions
RDMA/hfi1: fix array termination by appending NULL to attr array
RDMA/iw_cxgb4: fix the calculation of ipv6 header size
RDMA/iw_cxgb4: calculate t4_eq_status_entries properly
RDMA/iw_cxgb4: Avoid touch after free error in ARP failure handlers
RDMA/nes: ACK MPA Reply frame
...
|
|
The alarmtimer code has another source of potentially rearming itself too
fast. Interval timers with a very samll interval have a similar CPU hog
effect as the previously fixed overflow issue.
The reason is that alarmtimers do not implement the normal protection
against this kind of problem which the other posix timer use:
timer expires -> queue signal -> deliver signal -> rearm timer
This scheme brings the rearming under scheduler control and prevents
permanently firing timers which hog the CPU.
Bringing this scheme to the alarm timer code is a major overhaul because it
lacks all the necessary mechanisms completely.
So for a quick fix limit the interval to one jiffie. This is not
problematic in practice as alarmtimers are usually backed by an RTC for
suspend which have 1 second resolution. It could be therefor argued that
the resolution of this clock should be set to 1 second in general, but
that's outside the scope of this fix.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kostya Serebryany <kcc@google.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170530211655.896767100@linutronix.de
|
|
Andrey reported a alartimer related RCU stall while fuzzing the kernel with
syzkaller.
The reason for this is an overflow in ktime_add() which brings the
resulting time into negative space and causes immediate expiry of the
timer. The following rearm with a small interval does not bring the timer
back into positive space due to the same issue.
This results in a permanent firing alarmtimer which hogs the CPU.
Use ktime_add_safe() instead which detects the overflow and clamps the
result to KTIME_SEC_MAX.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kostya Serebryany <kcc@google.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170530211655.802921648@linutronix.de
|
|
This reverts commit 925bb1ce47f429f69aad35876df7ecd8c53deb7e.
It causes lots of warnings and problems so for now, let's just revert
it.
Reported-by: <valdis.kletnieks@vt.edu>
Reported-by: Russell King <linux@armlinux.org.uk>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
nfs_initialise_sb() and nfs_clone_super() are declared as extern even
though they are used only in fs/nfs/super.c. Mark them as static.
Also remove explicit 'inline' directive from nfs_initialise_sb() and
leave it upto compiler to decide whether inlining is worth it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"A couple of patches for the aspeed pwm fan driver"
* tag 'hwmon-for-linus-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (aspeed-pwm-tacho) make fan/pwm names start with index 1
hwmon: (aspeed-pwm-tacho) Call of_node_put() on a node not claimed
hwmon: (aspeed-pwm-tacho) On read failure return -ETIMEDOUT
hwmon: (aspeed-pwm-tacho) Select REGMAP
|
|
Pull MTD fixes from Brian Norris:
"NAND updates from Boris:
tango fixes:
- Add missing MODULE_DEVICE_TABLE() in tango_nand.c
- Update the number of corrected bitflips
core fixes:
- Fix a long standing memory leak in nand_scan_tail()
- Fix several bugs introduced by the per-vendor init/detection
infrastructure (introduced in 4.12)
- Add a static specifier to nand_ooblayout_lp_hamming_ops definition"
* tag 'for-linus-20170602' of git://git.infradead.org/linux-mtd:
mtd: nand: make nand_ooblayout_lp_hamming_ops static
mtd: nand: tango: Update ecc_stats.corrected
mtd: nand: tango: Export OF device ID table as module aliases
mtd: nand: samsung: warn about un-parseable ECC info
mtd: nand: free vendor-specific resources in init failure paths
mtd: nand: drop unneeded module.h include
mtd: nand: don't leak buffers when ->scan_bbt() fails
|
|
If bio has no data, such as ones from blkdev_issue_flush(),
then we have nothing to protect.
This patch prevent bugon like follows:
kfree_debugcheck: out of range ptr ac1fa1d106742a5ah
kernel BUG at mm/slab.c:2773!
invalid opcode: 0000 [#1] SMP
Modules linked in: bcache
CPU: 0 PID: 4428 Comm: xfs_io Tainted: G W 4.11.0-rc4-ext4-00041-g2ef0043-dirty #43
Hardware name: Virtuozzo KVM, BIOS seabios-1.7.5-11.vz7.4 04/01/2014
task: ffff880137786440 task.stack: ffffc90000ba8000
RIP: 0010:kfree_debugcheck+0x25/0x2a
RSP: 0018:ffffc90000babde0 EFLAGS: 00010082
RAX: 0000000000000034 RBX: ac1fa1d106742a5a RCX: 0000000000000007
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88013f3ccb40
RBP: ffffc90000babde8 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000fcb76420 R11: 00000000725172ed R12: 0000000000000282
R13: ffffffff8150e766 R14: ffff88013a145e00 R15: 0000000000000001
FS: 00007fb09384bf40(0000) GS:ffff88013f200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd0172f9e40 CR3: 0000000137fa9000 CR4: 00000000000006f0
Call Trace:
kfree+0xc8/0x1b3
bio_integrity_free+0xc3/0x16b
bio_free+0x25/0x66
bio_put+0x14/0x26
blkdev_issue_flush+0x7a/0x85
blkdev_fsync+0x35/0x42
vfs_fsync_range+0x8e/0x9f
vfs_fsync+0x1c/0x1e
do_fsync+0x31/0x4a
SyS_fsync+0x10/0x14
entry_SYSCALL_64_fastpath+0x1f/0xc2
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Make fan and pwm names in sysfs start with index 1 in accordance to
Documentation/hwmon/sysfs-interface conventions.
Current implementation starts with index 0, making tools such as
sensors(1) skip the first fan.
Signed-off-by: Stefan Schaeckeler <sschaeck@cisco.com>
Fixes: 2d7a548a3eff ("drivers: hwmon: Support for ASPEED PWM/Fan tach")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Call of_node_put() on a node claimed with of_node_get() or by any other
means such as for_each_child_of_node().
Signed-off-by: Stefan Schaeckeler <sschaeck@cisco.com>
Fixes: 2d7a548a3eff ("drivers: hwmon: Support for ASPEED PWM/Fan tach")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
modprobe is not able to resolve sysfs modalias for mei devices.
# cat
/sys/class/watchdog/watchdog0/device/watchdog/watchdog0/device/modalias
mei::05b79a6f-4628-4d7f-899d-a91514cb32ab:
# modprobe --set-version 4.9.6-200.fc25.x86_64 -R
mei::05b79a6f-4628-4d7f-899d-a91514cb32ab:
modprobe: FATAL: Module mei::05b79a6f-4628-4d7f-899d-a91514cb32ab: not
found in directory /lib/modules/4.9.6-200.fc25.x86_64
# cat /lib/modules/4.9.6-200.fc25.x86_64/modules.alias | grep
05b79a6f-4628-4d7f-899d-a91514cb32ab
alias mei:*:05b79a6f-4628-4d7f-899d-a91514cb32ab:*:* mei_wdt
commit b26864cad1c9 ("mei: bus: add client protocol
version to the device alias"), however sysfs modalias
is still in formmat mei:S:uuid:*.
This patch equates format of uevent and sysfs modalias so that modprobe
is able to resolve the aliases.
Cc: <stable@vger.kernel.org> 4.7+
Fixes: commit b26864cad1c9 ("mei: bus: add client protocol version to the device alias")
Signed-off-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A recent fix to /dev/mem prevents mappings from wrapping around the end
of physical address space. However, the check was written in a way that
also prevents a mapping reaching just up to the end of physical address
space, which may be a valid use case (especially on 32-bit systems).
This patch fixes it by checking the last mapped address (instead of the
first address behind that) for overflow.
Fixes: b299cde245 ("drivers: char: mem: Check for address space wraparound with mmap()")
Cc: <stable@vger.kernel.org>
Reported-by: Nico Huber <nico.h@gmx.de>
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In case of error, the function devm_ioremap() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check should
be replaced with NULL test. Also add NULL test for iores.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
|
|
Starting from MPU6500, accelerometer dlpf is set in a separate
register named ACCEL_CONFIG_2.
Add this new register in the map and set it for the corresponding
chips.
Signed-off-by: Jean-Baptiste Maneyrol <jmaneyrol@invensense.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
|
|
lov_getstripe() calls set_fs(KERNEL_DS) so that it can handle a struct
lov_user_md pointer from user- or kernel-space. This changes the
behavior of copy_from_user() on SPARC and may result in a misaligned
access exception which in turn oopses the kernel. In fact the
relevant argument to lov_getstripe() is never called with a
kernel-space pointer and so changing the address limits is unnecessary
and so we remove the calls to save, set, and restore the address
limits.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/6150
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3221
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If a custom CPU target is specified and that one is not available _or_
can't be interrupted then the code returns to userland without dropping a
lock as notices by lockdep:
|echo 133 > /sys/devices/system/cpu/cpu7/hotplug/target
| ================================================
| [ BUG: lock held when returning to user space! ]
| ------------------------------------------------
| bash/503 is leaving the kernel with locks still held!
| 1 lock held by bash/503:
| #0: (device_hotplug_lock){+.+...}, at: [<ffffffff815b5650>] lock_device_hotplug_sysfs+0x10/0x40
So release the lock then.
Fixes: 757c989b9994 ("cpu/hotplug: Make target state writeable")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170602142714.3ogo25f2wbq6fjpj@linutronix.de
|