summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-02-11perf/x86/intel: Fix inaccurate period in context switch for auto-reloadKan Liang
Perf doesn't take the left period into account when auto-reload is enabled with fixed period sampling mode in context switch. Here is the MSR trace of the perf command as below. (The MSR trace is simplified from a ftrace log.) #perf record -e cycles:p -c 2000000 -- ./triad_loop //The MSR trace of task schedule out //perf disable all counters, disable PEBS, disable GP counter 0, //read GP counter 0, and re-enable all counters. //The counter 0 stops at 0xfffffff82840 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0 write_msr: MSR_P6_EVNTSEL0(186), value 40003003c rdpmc: 0, value fffffff82840 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff //The MSR trace of the same task schedule in again //perf disable all counters, enable and set GP counter 0, //enable PEBS, and re-enable all counters. //0xffffffe17b80 (-2000000) is written to GP counter 0. write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PMC0(4c1), value ffffffe17b80 write_msr: MSR_P6_EVNTSEL0(186), value 40043003c write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff When the same task schedule in again, the counter should starts from previous left. However, it starts from the fixed period -2000000 again. A special variant of intel_pmu_save_and_restart() is used for auto-reload, which doesn't update the hwc->period_left. When the monitored task schedules in again, perf doesn't know the left period. The fixed period is used, which is inaccurate. With auto-reload, the counter always has a negative counter value. So the left period is -value. Update the period_left in intel_pmu_save_and_restart_reload(). With the patch: //The MSR trace of task schedule out write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0 write_msr: MSR_P6_EVNTSEL0(186), value 40003003c rdpmc: 0, value ffffffe25cbc write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff //The MSR trace of the same task schedule in again write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PMC0(4c1), value ffffffe25cbc write_msr: MSR_P6_EVNTSEL0(186), value 40043003c write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff Fixes: d31fc13fdcb2 ("perf/x86/intel: Fix event update for auto-reload") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200121190125.3389-1-kan.liang@linux.intel.com
2020-02-11perf/x86/amd: Add missing L2 misses event spec to AMD Family 17h's event mapKim Phillips
Commit 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h"), claimed L2 misses were unsupported, due to them not being found in its referenced documentation, whose link has now moved [1]. That old documentation listed PMCx064 unit mask bit 3 as: "LsRdBlkC: LS Read Block C S L X Change to X Miss." and bit 0 as: "IcFillMiss: IC Fill Miss" We now have new public documentation [2] with improved descriptions, that clearly indicate what events those unit mask bits represent: Bit 3 now clearly states: "LsRdBlkC: Data Cache Req Miss in L2 (all types)" and bit 0 is: "IcFillMiss: Instruction Cache Req Miss in L2." So we can now add support for L2 misses in perf's genericised events as PMCx064 with both the above unit masks. [1] The commit's original documentation reference, "Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors", originally available here: https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf is now available here: https://developer.amd.com/wordpress/media/2017/11/54945_PPR_Family_17h_Models_00h-0Fh.pdf [2] "Processor Programming Reference (PPR) for Family 17h Model 31h, Revision B0 Processors", available here: https://developer.amd.com/wp-content/resources/55803_0.54-PUB.pdf Fixes: 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h") Reported-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Babu Moger <babu.moger@amd.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200121171232.28839-1-kim.phillips@amd.com
2020-02-11perf/x86/msr: Add Tremont supportKan Liang
Tremont is Intel's successor to Goldmont Plus. SMI_COUNT MSR is also supported. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1580236279-35492-3-git-send-email-kan.liang@linux.intel.com
2020-02-11perf/x86/cstate: Add Tremont supportKan Liang
Tremont is Intel's successor to Goldmont Plus. From the perspective of Intel cstate residency counters, there is nothing changed compared with Goldmont Plus and Goldmont. Share glm_cstates with Goldmont Plus and Goldmont. Update the comments for Tremont. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1580236279-35492-2-git-send-email-kan.liang@linux.intel.com
2020-02-11perf/x86/intel: Add Elkhart Lake supportKan Liang
Elkhart Lake also uses Tremont CPU. From the perspective of Intel PMU, there is nothing changed compared with Jacobsville. Share the perf code with Jacobsville. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1580236279-35492-1-git-send-email-kan.liang@linux.intel.com
2020-02-11locking/percpu-rwsem: Add might_sleep() for writer lockingDavidlohr Bueso
We are missing this annotation in percpu_down_write(). Correct this. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lkml.kernel.org/r/20200108013305.7732-1-dave@stgolabs.net
2020-02-11locking/percpu-rwsem: Fold __percpu_up_read()Davidlohr Bueso
Now that __percpu_up_read() is only ever used from percpu_up_read() merge them, it's a small function. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lkml.kernel.org/r/20200131151540.212415454@infradead.org
2020-02-11locking/rwsem: Remove RWSEM_OWNER_UNKNOWNPeter Zijlstra
Remove the now unused RWSEM_OWNER_UNKNOWN hack. This hack breaks PREEMPT_RT and getting rid of it was the entire motivation for re-writing the percpu rwsem. The biggest problem is that it is fundamentally incompatible with any form of Priority Inheritance, any exclusively held lock must have a distinct owner. Requested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200204092228.GP14946@hirez.programming.kicks-ass.net
2020-02-11locking/percpu-rwsem: Remove the embedded rwsemPeter Zijlstra
The filesystem freezer uses percpu-rwsem in a way that is effectively write_non_owner() and achieves this with a few horrible hacks that rely on the rwsem (!percpu) implementation. When PREEMPT_RT replaces the rwsem implementation with a PI aware variant this comes apart. Remove the embedded rwsem and implement it using a waitqueue and an atomic_t. - make readers_block an atomic, and use it, with the waitqueue for a blocking test-and-set write-side. - have the read-side wait for the 'lock' state to clear. Have the waiters use FIFO queueing and mark them (reader/writer) with a new WQ_FLAG. Use a custom wake_function to wake either a single writer or all readers until a writer. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200204092403.GB14879@hirez.programming.kicks-ass.net
2020-02-11locking/percpu-rwsem: Extract __percpu_down_read_trylock()Peter Zijlstra
In preparation for removing the embedded rwsem and building a custom lock, extract the read-trylock primitive. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200131151540.098485539@infradead.org
2020-02-11locking/percpu-rwsem: Move __this_cpu_inc() into the slowpathPeter Zijlstra
As preparation to rework __percpu_down_read() move the __this_cpu_inc() into it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200131151540.041600199@infradead.org
2020-02-11locking/percpu-rwsem: Convert to boolPeter Zijlstra
Use bool where possible. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200131151539.984626569@infradead.org
2020-02-11locking/percpu-rwsem, lockdep: Make percpu-rwsem use its own lockdep_mapPeter Zijlstra
As preparation for replacing the embedded rwsem, give percpu-rwsem its own lockdep_map. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200131151539.927625541@infradead.org
2020-02-11locking/lockdep: Reuse freed chain_hlocks entriesWaiman Long
Once a lock class is zapped, all the lock chains that include the zapped class are essentially useless. The lock_chain structure itself can be reused, but not the corresponding chain_hlocks[] entries. Over time, we will run out of chain_hlocks entries while there are still plenty of other lockdep array entries available. To fix this imbalance, we have to make chain_hlocks entries reusable just like the others. As the freed chain_hlocks entries are in blocks of various lengths. A simple bitmap like the one used in the other reusable lockdep arrays isn't applicable. Instead the chain_hlocks entries are put into bucketed lists (MAX_CHAIN_BUCKETS) of chain blocks. Bucket 0 is the variable size bucket which houses chain blocks of size larger than MAX_CHAIN_BUCKETS sorted in decreasing size order. Initially, the whole array is in one chain block (the primordial chain block) in bucket 0. The minimum size of a chain block is 2 chain_hlocks entries. That will be the minimum allocation size. In other word, allocation requests for one chain_hlocks entry will cause 2-entry block to be returned and hence 1 entry will be wasted. Allocation requests for the chain_hlocks are fulfilled first by looking for chain block of matching size. If not found, the first chain block from bucket[0] (the largest one) is split. That can cause hlock entries fragmentation and reduce allocation efficiency if a chain block of size > MAX_CHAIN_BUCKETS is ever zapped and put back to after the primordial chain block. So the MAX_CHAIN_BUCKETS must be large enough that this should seldom happen. By reusing the chain_hlocks entries, we are able to handle workloads that add and zap a lot of lock classes without the risk of running out of chain_hlocks entries as long as the total number of outstanding lock classes at any time remain within a reasonable limit. Two new tracking counters, nr_free_chain_hlocks & nr_large_chain_blocks, are added to track the total number of chain_hlocks entries in the free bucketed lists and the number of large chain blocks in buckets[0] respectively. The nr_free_chain_hlocks replaces nr_chain_hlocks. The nr_large_chain_blocks counter enables to see if we should increase the number of buckets (MAX_CHAIN_BUCKETS) available so as to avoid to avoid the fragmentation problem in bucket[0]. An internal nfsd test that ran for more than an hour and kept on loading and unloading kernel modules could cause the following message to be displayed. [ 4318.443670] BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low! The patched kernel was able to complete the test with a lot of free chain_hlocks entries to spare: # cat /proc/lockdep_stats : dependency chains: 18867 [max: 65536] dependency chain hlocks: 74926 [max: 327680] dependency chain hlocks lost: 0 : zapped classes: 1541 zapped lock chains: 56765 large chain blocks: 1 By changing MAX_CHAIN_BUCKETS to 3 and add a counter for the size of the largest chain block. The system still worked and We got the following lockdep_stats data: dependency chains: 18601 [max: 65536] dependency chain hlocks used: 73133 [max: 327680] dependency chain hlocks lost: 0 : zapped classes: 1541 zapped lock chains: 56702 large chain blocks: 45165 large chain block size: 20165 By running the test again, I was indeed able to cause chain_hlocks entries to get lost: dependency chain hlocks used: 74806 [max: 327680] dependency chain hlocks lost: 575 : large chain blocks: 48737 large chain block size: 7 Due to the fragmentation, it is possible that the "MAX_LOCKDEP_CHAIN_HLOCKS too low!" error can happen even if a lot of of chain_hlocks entries appear to be free. Fortunately, a MAX_CHAIN_BUCKETS value of 16 should be big enough that few variable sized chain blocks, other than the initial one, should ever be present in bucket 0. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200206152408.24165-7-longman@redhat.com
2020-02-11locking/lockdep: Track number of zapped lock chainsWaiman Long
Add a new counter nr_zapped_lock_chains to track the number lock chains that have been removed. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200206152408.24165-6-longman@redhat.com
2020-02-11locking/lockdep: Throw away all lock chains with zapped classWaiman Long
If a lock chain contains a class that is zapped, the whole lock chain is likely to be invalid. If the zapped class is at the end of the chain, the partial chain without the zapped class should have been stored already as the current code will store all its predecessor chains. If the zapped class is somewhere in the middle, there is no guarantee that the partial chain will actually happen. It may just clutter up the hash and make searching slower. I would rather prefer storing the chain only when it actually happens. So just dump the corresponding chain_hlocks entries for now. A latter patch will try to reuse the freed chain_hlocks entries. This patch also changes the type of nr_chain_hlocks to unsigned integer to be consistent with the other counters. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200206152408.24165-5-longman@redhat.com
2020-02-11locking/lockdep: Track number of zapped classesWaiman Long
The whole point of the lockdep dynamic key patch is to allow unused locks to be removed from the lockdep data buffers so that existing buffer space can be reused. However, there is no way to find out how many unused locks are zapped and so we don't know if the zapping process is working properly. Add a new nr_zapped_classes counter to track that and show it in /proc/lockdep_stats. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200206152408.24165-4-longman@redhat.com
2020-02-11locking/lockdep: Display irq_context names in /proc/lockdep_chainsWaiman Long
Currently, the irq_context field of a lock chains displayed in /proc/lockdep_chains is just a number. It is likely that many people may not know what a non-zero number means. To make the information more useful, print the actual irq names ("softirq" and "hardirq") instead. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200206152408.24165-3-longman@redhat.com
2020-02-11locking/lockdep: Decrement IRQ context counters when removing lock chainWaiman Long
There are currently three counters to track the IRQ context of a lock chain - nr_hardirq_chains, nr_softirq_chains and nr_process_chains. They are incremented when a new lock chain is added, but they are not decremented when a lock chain is removed. That causes some of the statistic counts reported by /proc/lockdep_stats to be incorrect. IRQ Fix that by decrementing the right counter when a lock chain is removed. Since inc_chains() no longer accesses hardirq_context and softirq_context directly, it is moved out from the CONFIG_TRACE_IRQFLAGS conditional compilation block. Fixes: a0b0fd53e1e6 ("locking/lockdep: Free lock classes that are no longer in use") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200206152408.24165-2-longman@redhat.com
2020-02-11sched/fair: Fix kernel-doc warning in attach_entity_load_avg()Randy Dunlap
Fix kernel-doc warning in kernel/sched/fair.c, caused by a recent function parameter removal: ../kernel/sched/fair.c:3526: warning: Excess function parameter 'flags' description in 'attach_entity_load_avg' Fixes: a4f9a0e51bbf ("sched/fair: Remove redundant call to cpufreq_update_util()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/cbe964e4-6879-fd08-41c9-ef1917414af4@infradead.org
2020-02-11sched/core: Annotate curr pointer in rq with __rcuMadhuparna Bhowmik
This patch fixes the following sparse warnings in sched/core.c and sched/membarrier.c: kernel/sched/core.c:2372:27: error: incompatible types in comparison expression kernel/sched/core.c:4061:17: error: incompatible types in comparison expression kernel/sched/core.c:6067:9: error: incompatible types in comparison expression kernel/sched/membarrier.c:108:21: error: incompatible types in comparison expression kernel/sched/membarrier.c:177:21: error: incompatible types in comparison expression kernel/sched/membarrier.c:243:21: error: incompatible types in comparison expression Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200201125803.20245-1-madhuparnabhowmik10@gmail.com
2020-02-11sched/psi: Fix OOB write when writing 0 bytes to PSI filesSuren Baghdasaryan
Issuing write() with count parameter set to 0 on any file under /proc/pressure/ will cause an OOB write because of the access to buf[buf_size-1] when NUL-termination is performed. Fix this by checking for buf_size to be non-zero. Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lkml.kernel.org/r/20200203212216.7076-1-surenb@google.com
2020-02-11arm/patch: Fix !MMU compilePeter Zijlstra
Now that patch.o is unconditionally selected for ftrace, it can also get compiled for !MMU kernels. These (obviously) lack {set,clear}_fixmap() support. Also remove the superfluous __acquire/__release nonsense. Fixes: 42e51f187f86 ("arm/ftrace: Use __patch_text()") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-02-11arm/ftrace: Fix BE text pokingPeter Zijlstra
The __patch_text() function already applies __opcode_to_mem_*(), so when __opcode_to_mem_*() is not the identity (BE*), it is applied twice, wrecking the instruction. Fixes: 42e51f187f86 ("arm/ftrace: Use __patch_text()") Reported-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Dmitry Osipenko <digetx@gmail.com>
2020-02-11ALSA: usb-audio: Apply 48kHz fixed rate playback for Jabra Evolve 65 headsetTakashi Iwai
Jabra Evolve 65 headset appears as if supporting lower rates than 48kHz, but it actually doesn't work but with 48kHz for playback. This patch applies a workaround to enforce the 48kHz like LINE6 devices already did. The workaround is put in a unified helper function, set_fixed_rate(), to be called from both places now. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206149 Link: https://lore.kernel.org/r/20200211111419.5895-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-02-11netfilter: conntrack: split resolve_clash functionFlorian Westphal
Followup patch will need a helper function with the 'clashing entries refer to the identical tuple in both directions' resolution logic. This patch will add another resolve_clash helper where loser_ct must not be added to the dying list because it will be inserted into the table. Therefore this also moves the stat counters and dying-list insertion of the losing ct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-11netfilter: conntrack: place confirm-bit setting in a helperFlorian Westphal
... so it can be re-used from clash resolution in followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-11netfilter: conntrack: remove two args from resolve_clashFlorian Westphal
ctinfo is whats taken from the skb, i.e. ct = nf_ct_get(skb, &ctinfo). We do not pass 'ct' and instead re-fetch it from the skb. Just do the same for both netns and ctinfo. Also add a comment on what clash resolution is supposed to do. While at it, one indent level can be removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-11drm/i915: Check activity on i915_vma after confirming pin_count==0Chris Wilson
Only assert that the i915_vma is now idle if and only if no other pins are present. If another user has the i915_vma pinned, they may submit more work to the i915_vma skipping the vm->mutex used to serialise the unbind. We need to wait again, if we want to continue and unbind this vma. However, if we own the i915_vma (we hold the vm->mutex for the unbind and the pin_count is 0), we can assert that the vma remains idle as we unbind. Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex") Closes: https://gitlab.freedesktop.org/drm/intel/issues/530 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200123224459.38128-1-chris@chris-wilson.co.uk (cherry picked from commit 60e94557fff1f5514c7fc4da7ddc2c7a13ffff26) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-11drm/i915/gem: Detect overflow in calculating dumb buffer sizeChris Wilson
To multiply 2 u32 numbers to generate a u64 in C requires a bit of forewarning for the compiler. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ramalingam C <ramalingam.c@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200123125934.1401755-1-chris@chris-wilson.co.uk (cherry picked from commit 0f8f8a64300092852b9361cd835395ee71e6a7d6) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-11drm/i915: Don't show the blank process name for internal/simulated errorsChris Wilson
For a simulated preemption reset, we don't populate the request and so do not fill in the guilty context name. [ 79.991294] i915 0000:00:02.0: GPU HANG: ecode 9:1:e757fefe, in [0] Just don't mention the empty string in the logs! Fixes: 742379c0c400 ("drm/i915: Start chopping up the GPU error capture") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200121132107.267709-1-chris@chris-wilson.co.uk (cherry picked from commit 29baf3ae8daa4c673de58106ff41c7236dff57f4) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-11drm/i915/gem: Store mmap_offsets in an rbtree rather than a plain listChris Wilson
Currently we create a new mmap_offset for every call to mmap_offset_ioctl. This exposes ourselves to an abusive client that may simply create new mmap_offsets ad infinitum, which will exhaust physical memory and the virtual address space. In addition to the exhaustion, a very long linear list of mmap_offsets causes other clients using the object to incur long list walks -- these long lists can also be generated by simply having many clients generate their own mmap_offset. However, we can simply use the drm_vma_node itself to manage the file association (allow/revoke) dropping our need to keep an mmo per-file. Then if we keep a small rbtree of per-type mmap_offsets, we can lookup duplicate requests quickly. Fixes: cc662126b413 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200120104924.4000706-3-chris@chris-wilson.co.uk (cherry picked from commit 7865559872074a9ab169c87915504661d630addf) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-11drm/i915/execlists: Leave resetting ring to intel_ringChris Wilson
We need to allow concurrent intel_context_unpin, which means avoiding doing destructive operations like intel_ring_reset(). This was already fixed for intel_ring_unpin() in commit 0725d9a31869 ("drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint"), but I overlooked that execlists_context_unpin() also made the same mistake. Reported-by: Matthew Brost <matthew.brost@intel.com> Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin") References: 0725d9a31869 ("drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200115175829.2761329-1-chris@chris-wilson.co.uk (cherry picked from commit f3c0efc9fe7a4e61544034f525348a3aa86ac5aa) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-11arm64: Fix CONFIG_ARCH_RANDOM=n buildRobin Murphy
The entire asm/archrandom.h header is generically included via linux/archrandom.h only when CONFIG_ARCH_RANDOM is already set, so the stub definitions of __arm64_rndr() and __early_cpu_has_rndr() are only visible to KASLR if it explicitly includes the arch-internal header. Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-02-11habanalabs: patched cb equals user cb in device memsetOded Gabbay
During device memory memset, the driver allocates and use a CB (command buffer). To reuse existing code, it keeps a pointer to the CB in two variables, user_cb and patched_cb. Therefore, there is no need to "put" both the user_cb and patched_cb, as it will cause an underflow of the refcnt of the CB. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-02-11habanalabs: do not halt CoreSight during hard resetOmer Shpigelman
During hard reset we must not write to the device. Hence avoid halting CoreSight during user context close if it is done during hard reset. In addition, we must not re-enable clock gating afterwards as it was deliberately disabled in the beginning of the hard reset flow. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-02-11habanalabs: halt the engines before hard-resetOded Gabbay
The driver must halt the engines before doing hard-reset, otherwise the device can go into undefined state. There is a place where the driver didn't do that and this patch fixes it. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-02-11ACPI: PM: s2idle: Avoid possible race related to the EC GPERafael J. Wysocki
It is theoretically possible for the ACPI EC GPE to be set after the s2idle_ops->wake() called from s2idle_loop() has returned and before the subsequent pm_wakeup_pending() check is carried out. If that happens, the resulting wakeup event will cause the system to resume even though it may be a spurious one. To avoid that race, first make the ->wake() callback in struct platform_s2idle_ops return a bool value indicating whether or not to let the system resume and rearrange s2idle_loop() to use that value instad of the direct pm_wakeup_pending() call if ->wake() is present. Next, rework acpi_s2idle_wake() to process EC events and check pm_wakeup_pending() before re-arming the SCI for system wakeup to prevent it from triggering prematurely and add comments to that function to explain the rationale for the new code flow. Fixes: 56b991849009 ("PM: sleep: Simplify suspend-to-idle control flow") Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11ACPI: EC: Fix flushing of pending workRafael J. Wysocki
Commit 016b87ca5c8c ("ACPI: EC: Rework flushing of pending work") introduced a subtle bug into the flushing of pending EC work while suspended to idle, which may cause the EC driver to fail to re-enable the EC GPE after handling a non-wakeup event (like a battery status change event, for example). The problem is that the work item flushed by flush_scheduled_work() in __acpi_ec_flush_work() may disable the EC GPE and schedule another work item expected to re-enable it, but that new work item is not flushed, so __acpi_ec_flush_work() returns with the EC GPE disabled and the CPU running it goes into an idle state subsequently. If all of the other CPUs are in idle states at that point, the EC GPE won't be re-enabled until at least one CPU is woken up by another interrupt source, so system wakeup events that would normally come from the EC then don't work. This is reproducible on a Dell XPS13 9360 in my office which sometimes stops reacting to power button and lid events (triggered by the EC on that machine) after switching from AC power to battery power or vice versa while suspended to idle (each of those switches causes the EC GPE to trigger for several times in a row, but they are not system wakeup events). To avoid this problem, it is necessary to drain the workqueue entirely in __acpi_ec_flush_work(), but that cannot be done with respect to system_wq, because work items may be added to it from other places while __acpi_ec_flush_work() is running. For this reason, make the EC driver use a dedicated workqueue for EC events processing (let that workqueue be ordered so that EC events are processed sequentially) and use drain_workqueue() on it in __acpi_ec_flush_work(). Fixes: 016b87ca5c8c ("ACPI: EC: Rework flushing of pending work") Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11platform/chrome: wilco_ec: Include asm/unaligned instead of linux/ pathStephen Boyd
It seems that we shouldn't try to include the include/linux/ path to unaligned functions. Just include asm/unaligned.h instead so that we don't run into compilation warnings like below. In file included from drivers/platform/chrome/wilco_ec/properties.c:8:0: include/linux/unaligned/le_memmove.h:7:19: error: redefinition of 'get_unaligned_le16' static inline u16 get_unaligned_le16(const void *p) ^~~~~~~~~~~~~~~~~~ In file included from arch/ia64/include/asm/unaligned.h:5:0, from arch/ia64/include/asm/io.h:23, from arch/ia64/include/asm/smp.h:21, from include/linux/smp.h:68, from include/linux/percpu.h:7, from include/linux/arch_topology.h:9, from include/linux/topology.h:30, from include/linux/gfp.h:9, from include/linux/xarray.h:14, from include/linux/radix-tree.h:18, from include/linux/idr.h:15, from include/linux/kernfs.h:13, from include/linux/sysfs.h:16, from include/linux/kobject.h:20, from include/linux/device.h:16, from include/linux/platform_data/wilco-ec.h:11, from drivers/platform/chrome/wilco_ec/properties.c:6: include/linux/unaligned/le_struct.h:7:19: note: previous definition of 'get_unaligned_le16' was here static inline u16 get_unaligned_le16(const void *p) ^~~~~~~~~~~~~~~~~~ Reported-by: kbuild test robot <lkp@intel.com> Fixes: 60fb8a8e93ca ("platform/chrome: wilco_ec: Allow wilco to be compiled in COMPILE_TEST") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
2020-02-11usb: dwc3: debug: fix string position formatting mixup with ret and lenColin Ian King
Currently the string formatting is mixing up the offset of ret and len. Re-work the code to use just len, remove ret and use scnprintf instead of snprintf and len position accumulation where required. Remove the -ve return check since scnprintf never returns a failure -ve size. Also break overly long lines to clean up checkpatch warnings. Addresses-Coverity: ("Unused value") Fixes: 1381a5113caf ("usb: dwc3: debug: purge usage of strcat") Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-02-11usb: gadget: serial: fix Tx stall after buffer overflowSergey Organov
Symptom: application opens /dev/ttyGS0 and starts sending (writing) to it while either USB cable is not connected, or nobody listens on the other side of the cable. If driver circular buffer overflows before connection is established, no data will be written to the USB layer until/unless /dev/ttyGS0 is closed and re-opened again by the application (the latter besides having no means of being notified about the event of establishing of the connection.) Fix: on open and/or connect, kick Tx to flush circular buffer data to USB layer. Signed-off-by: Sergey Organov <sorganov@gmail.com> Reviewed-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-02-11usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flagsLars-Peter Clausen
ffs_aio_cancel() can be called from both interrupt and thread context. Make sure that the current IRQ state is saved and restored by using spin_{un,}lock_irq{save,restore}(). Otherwise undefined behavior might occur. Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-02-11usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flowsMinas Harutyunyan
SET/CLEAR_FEATURE for Remote Wakeup allowance not handled correctly. GET_STATUS handling provided not correct data on DATA Stage. Issue seen when gadget's dr_mode set to "otg" mode and connected to MacOS. Both are fixed and tested using USBCV Ch.9 tests. Signed-off-by: Minas Harutyunyan <hminas@synopsys.com> Fixes: fa389a6d7726 ("usb: dwc2: gadget: Add remote_wakeup_allowed flag") Tested-by: Jack Mitchell <ml@embed.me.uk> Cc: stable@vger.kernel.org Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-02-11usb: dwc2: Fix in ISOC request length checkingMinas Harutyunyan
Moved ISOC request length checking from dwc2_hsotg_start_req() function to dwc2_hsotg_ep_queue(). Fixes: 4fca54aa58293 ("usb: gadget: s3c-hsotg: add multi count support") Signed-off-by: Minas Harutyunyan <hminas@synopsys.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-02-11usb: gadget: composite: Support more than 500mA MaxPowerJack Pham
USB 3.x SuperSpeed peripherals can draw up to 900mA of VBUS power when in configured state. However, if a configuration wanting to take advantage of this is added with MaxPower greater than 500 (currently possible if using a ConfigFS gadget) the composite driver fails to accommodate this for a couple reasons: - usb_gadget_vbus_draw() when called from set_config() and composite_resume() will be passed the MaxPower value without regard for the current connection speed, resulting in a violation for USB 2.0 since the max is 500mA. - the bMaxPower of the configuration descriptor would be incorrectly encoded, again if the connection speed is only at USB 2.0 or below, likely wrapping around U8_MAX since the 2mA multiplier corresponds to a maximum of 510mA. Fix these by adding checks against the current gadget->speed when the c->MaxPower value is used (set_config() and composite_resume()) and appropriately limit based on whether it is currently at a low-/full-/high- or super-speed connection. Because 900 is not divisible by 8, with the round-up division currently used in encode_bMaxPower() a MaxPower of 900mA will result in an encoded value of 0x71. When a host stack (including Linux and Windows) enumerates this on a single port root hub, it reads this value back and decodes (multiplies by 8) to get 904mA which is strictly greater than 900mA that is typically budgeted for that port, causing it to reject the configuration. Instead, we should be using the round-down behavior of normal integral division so that 900 / 8 -> 0x70 or 896mA to stay within range. And we might as well change it for the high/full/low case as well for consistency. N.B. USB 3.2 Gen N x 2 allows for up to 1500mA but there doesn't seem to be any any peripheral controller supported by Linux that does two lane operation, so for now keeping the clamp at 900 should be fine. Signed-off-by: Jack Pham <jackp@codeaurora.org> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-02-11usb: gadget: composite: Fix bMaxPower for SuperSpeedPlusJack Pham
SuperSpeedPlus peripherals must report their bMaxPower of the configuration descriptor in units of 8mA as per the USB 3.2 specification. The current switch statement in encode_bMaxPower() only checks for USB_SPEED_SUPER but not USB_SPEED_SUPER_PLUS so the latter falls back to USB 2.0 encoding which uses 2mA units. Replace the switch with a simple if/else. Fixes: eae5820b852f ("usb: gadget: composite: Write SuperSpeedPlus config descriptors") Signed-off-by: Jack Pham <jackp@codeaurora.org> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-02-11usb: gadget: u_audio: Fix high-speed max packet sizeJohn Keeping
Prior to commit eb9fecb9e69b ("usb: gadget: f_uac2: split out audio core") the maximum packet size was calculated only from the high-speed descriptor but now we use the largest of the full-speed and high-speed descriptors. This is correct, but the full-speed value is likely to be higher than that for high-speed and this leads to submitting requests for OUT transfers (received by the gadget) which are larger than the endpoint's maximum packet size. These are rightly rejected by the gadget core. config_ep_by_speed() already sets up the correct maximum packet size for the enumerated speed in the usb_ep structure, so we can simply use this instead of the overall value that has been used to allocate buffers for requests. Note that the minimum period for ALSA is still set from the largest value, and this is unavoidable because it's possible to open the audio device before the gadget has been enumerated. Tested-by: Pavel Hofman <pavel.hofman@ivitera.com> Signed-off-by: John Keeping <john@metanate.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-02-11usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fieldsAnurag Kumar Vulisha
The current code in dwc3_gadget_ep_reclaim_completed_trb() will check for IOC/LST bit in the event->status and returns if IOC/LST bit is set. This logic doesn't work if multiple TRBs are queued per request and the IOC/LST bit is set on the last TRB of that request. Consider an example where a queued request has multiple queued TRBs and IOC/LST bit is set only for the last TRB. In this case, the core generates XferComplete/XferInProgress events only for the last TRB (since IOC/LST are set only for the last TRB). As per the logic in dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for IOC/LST bit and returns on the first TRB. This leaves the remaining TRBs left unhandled. Similarly, if the gadget function enqueues an unaligned request with sglist already in it, it should fail the same way, since we will append another TRB to something that already uses more than one TRB. To aviod this, this patch changes the code to check for IOC/LST bits in TRB->ctrl instead. At a practical level, this patch resolves USB transfer stalls seen with adb on dwc3 based HiKey960 after functionfs gadget added scatter-gather support around v4.20. Cc: Felipe Balbi <balbi@kernel.org> Cc: Yang Fei <fei.yang@intel.com> Cc: Thinh Nguyen <thinhn@synopsys.com> Cc: Tejas Joglekar <tejas.joglekar@synopsys.com> Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Cc: Jack Pham <jackp@codeaurora.org> Cc: Todd Kjos <tkjos@google.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Linux USB List <linux-usb@vger.kernel.org> Cc: stable <stable@vger.kernel.org> Tested-by: Tejas Joglekar <tejas.joglekar@synopsys.com> Reviewed-by: Thinh Nguyen <thinhn@synopsys.com> Signed-off-by: Anurag Kumar Vulisha <anurag.kumar.vulisha@xilinx.com> [jstultz: forward ported to mainline, reworded commit log, reworked to only check trb->ctrl as suggested by Felipe] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-02-10tracing: Consolidate trace() functionsTom Zanussi
Move the checking, buffer reserve and buffer commit code in synth_event_trace_start/end() into inline functions __synth_event_trace_start/end() so they can also be used by synth_event_trace() and synth_event_trace_array(), and then have all those functions use them. Also, change synth_event_trace_state.enabled to disabled so it only needs to be set if the event is disabled, which is not normally the case. Link: http://lkml.kernel.org/r/b1f3108d0f450e58192955a300e31d0405ab4149.1581374549.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>