summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-12-09ASoC: add simple-muxAlexandre Belloni
Add a driver for simple mux driven by gpios. It currently only supports one gpio, muxing one of two inputs to a single output. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201205001508.346439-2-alexandre.belloni@bootlin.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-12-09ASoC: add simple-audio-mux bindingAlexandre Belloni
Add devicetree documentation for simple audio multiplexers Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201205001508.346439-1-alexandre.belloni@bootlin.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-12-09ASoC: SOF: Intel: add SoundWire support for ADL-SKai Vehmanen
Expand SOF support for Alder Lake by adding ACPI machine tables for ADL-S systems with SoundWire codecs. Modify kernel config to choose SND_SOC_SOF_INTEL_SOUNDWIRE_LINK_BASELINE for these platforms. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20201209153102.3028310-2-kai.vehmanen@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-12-09ASoC: Intel: common: add ACPI matching tables for Alder LakeKai Vehmanen
Initial support for ADL w/ RT711 Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20201209153102.3028310-1-kai.vehmanen@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-12-09can: isotp: isotp_setsockopt(): block setsockopt on bound socketsOliver Hartkopp
The isotp socket can be widely configured in its behaviour regarding addressing types, fill-ups, receive pattern tests and link layer length. Usually all these settings need to be fixed before bind() and can not be changed afterwards. This patch adds a check to enforce the common usage pattern. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Thomas Wagner <thwa1@web.de> Link: https://lore.kernel.org/r/20201203140604.25488-2-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/r/20201204133508.742120-3-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09ice: Add space to unknown speedSimon Perron Caissy
Add space to the end of 'Unknown' string in order to avoid concatenation with 'bps' string when formatting netdev log message. Signed-off-by: Simon Perron Caissy <simon.perron.caissy@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-12-09ice: join format strings to same line as ice_debugJacob Keller
When printing messages with ice_debug, align the printed string to the origin line of the message in order to ease debugging and tracking messages back to their source. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-12-09ice: silence static analysis warningBruce Allan
sparse warns about cast to/from restricted types which is not an actual problem; silence the warning. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-12-09ice: cleanup misleading commentBruce Allan
The maximum Admin Queue buffer size and NVM shadow RAM sector size are both 4 Kilobytes. Some comments refer to those as 4Kb which can be confused with 4 Kilobits. Update the comments to use the commonly used KB symbol instead. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-12-09ice: Remove vlan_ena from vsi structureNick Nunley
vlan_ena was introduced to track whether VLAN filters are enabled on the device, but 1) checking for num_vlan > 1 already gives us this information, and is currently used in this way throughout the code 2) the logic for vlan_ena is broken when multiple VLANs are active Just remove vlan_ena and use num_vlan instead. Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-12-09ice: Remove gate to OROM initJeb Cramer
Remove the gate that prevents the OROM and netlist info from being populated. The NVM now has the appropriate section for software to reference the versioning info. Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-12-09ice: Enable Support for FW Override (E82X)Jeb Cramer
The driver is able to override the firmware when it comes to supporting a more lenient link mode. This feature was limited to E810 devices. It is now extended to E82X devices. Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-12-09ice: don't always return an error for Get PHY Abilities AQ commandPaul M Stillwell Jr
There are times when the driver shouldn't return an error when the Get PHY abilities AQ command (0x0600) returns an error. Instead the driver should log that the error occurred and continue on. This allows the driver to load even though the AQ command failed. The user can then later determine the reason for the failure and correct it. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-12-09ice: cleanup stack hogBruce Allan
In ice_flow_add_prof_sync(), struct ice_flow_prof_params has recently grown in size hogging stack space when allocated there. Hogging stack space should be avoided. Change allocation to be on the heap when needed. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Harikumar Bokkena <harikumarx.bokkena@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-12-09perf/x86/intel: Add Tremont Topdown supportKan Liang
Tremont has four L1 Topdown events, TOPDOWN_FE_BOUND.ALL, TOPDOWN_BAD_SPECULATION.ALL, TOPDOWN_BE_BOUND.ALL and TOPDOWN_RETIRING.ALL. They are available on GP counters. Export them to sysfs and facilitate the perf stat tool. $perf stat --topdown -- sleep 1 Performance counter stats for 'sleep 1': retiring bad speculation frontend bound backend bound 24.9% 16.8% 31.7% 26.6% 1.001224610 seconds time elapsed 0.001150000 seconds user 0.000000000 seconds sys Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1607457952-3519-1-git-send-email-kan.liang@linux.intel.com
2020-12-09uprobes/x86: Fix fall-through warnings for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of letting the code fall through to the next case. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://github.com/KSPP/linux/issues/115
2020-12-09perf/x86: Fix fall-through warnings for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a fallthrough pseudo-keyword as a replacement for a /* fall through */ comment, instead of letting the code fall through to the next case. Notice that Clang doesn't recognize /* fall through */ comments as implicit fall-through markings. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://github.com/KSPP/linux/issues/115
2020-12-09kprobes/x86: Fix fall-through warnings for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of just letting the code fall through to the next case. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://github.com/KSPP/linux/issues/115
2020-12-09perf/x86/intel/lbr: Fix the return type of get_lbr_cycles()Kan Liang
The cycle count of a timed LBR is always 1 in perf record -D. The cycle count is stored in the first 16 bits of the IA32_LBR_x_INFO register, but the get_lbr_cycles() return Boolean type. Use u16 to replace the Boolean type. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20201125213720.15692-2-kan.liang@linux.intel.com
2020-12-09perf/x86/intel: Fix rtm_abort_event encoding on Ice LakeKan Liang
According to the event list from icelake_core_v1.09.json, the encoding of the RTM_RETIRED.ABORTED event on Ice Lake should be, "EventCode": "0xc9", "UMask": "0x04", "EventName": "RTM_RETIRED.ABORTED", Correct the wrong encoding. Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20201125213720.15692-1-kan.liang@linux.intel.com
2020-12-09x86/kprobes: Restore BTF if the single-stepping is cancelledMasami Hiramatsu
Fix to restore BTF if single-stepping causes a page fault and it is cancelled. Usually the BTF flag was restored when the single stepping is done (in resume_execution()). However, if a page fault happens on the single stepping instruction, the fault handler is invoked and the single stepping is cancelled. Thus, the BTF flag is not restored. Fixes: 1ecc798c6764 ("x86: debugctlmsr kprobes") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/160389546985.106936.12727996109376240993.stgit@devnote2
2020-12-09perf: Break deadlock involving exec_update_mutexpeterz@infradead.org
Syzbot reported a lock inversion involving perf. The sore point being perf holding exec_update_mutex() for a very long time, specifically across a whole bunch of filesystem ops in pmu::event_init() (uprobes) and anon_inode_getfile(). This then inverts against procfs code trying to take exec_update_mutex. Move the permission checks later, such that we need to hold the mutex over less code. Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-12-09sparc64/mm: Implement pXX_leaf_size() supportPeter Zijlstra
Sparc64 has non-pagetable aligned large page support; wire up the pXX_leaf_size() functions to report the correct pagetable page size. This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate pagetable leaf sizes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201126121121.301768209@infradead.org
2020-12-09powerpc/8xx: Implement pXX_leaf_size() supportPeter Zijlstra
Christophe Leroy wrote: > I can help with powerpc 8xx. It is a 32 bits powerpc. The PGD has 1024 > entries, that means each entry maps 4M. > > Page sizes are 4k, 16k, 512k and 8M. > > For the 8M pages we use hugepd with a single entry. The two related PGD > entries point to the same hugepd. > > For the other sizes, they are in standard page tables. 16k pages appear > 4 times in the page table. 512k entries appear 128 times in the page > table. > > When the PGD entry has _PMD_PAGE_8M bits, the PMD entry points to a > hugepd with holds the single 8M entry. > > In the PTE, we have two bits: _PAGE_SPS and _PAGE_HUGE > > _PAGE_HUGE means it is a 512k page > _PAGE_SPS means it is not a 4k page > > The kernel can by build either with 4k pages as standard page size, or > 16k pages. It doesn't change the page table layout though. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201126121121.364451610@infradead.org
2020-12-09seqlock: kernel-doc: Specify when preemption is automatically alteredAhmed S. Darwish
The kernel-doc annotations for sequence counters write side functions are incomplete: they do not specify when preemption is automatically disabled and re-enabled. This has confused a number of call-site developers. Fix it. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/CAHk-=wikhGExmprXgaW+MVXG1zsGpztBbVwOb23vetk41EtTBQ@mail.gmail.com
2020-12-09seqlock: Prefix internal seqcount_t-only macros with a "do_"Ahmed S. Darwish
When the seqcount_LOCKNAME_t group of data types were introduced, two classes of seqlock.h sequence counter macros were added: - An external public API which can either take a plain seqcount_t or any of the seqcount_LOCKNAME_t variants. - An internal API which takes only a plain seqcount_t. To distinguish between the two groups, the "*_seqcount_t_*" pattern was used for the latter. This confused a number of mm/ call-site developers, and Linus also commented that it was not a standard practice for marking seqlock.h internal APIs. Distinguish the latter group of macros by prefixing a "do_". Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/CAHk-=wikhGExmprXgaW+MVXG1zsGpztBbVwOb23vetk41EtTBQ@mail.gmail.com
2020-12-09Documentation: seqlock: s/LOCKTYPE/LOCKNAME/gAhmed S. Darwish
Sequence counters with an associated write serialization lock are called seqcount_LOCKNAME_t. Fix the documentation accordingly. While at it, remove a paragraph that inappropriately discussed a seqlock.h implementation detail. Fixes: 6dd699b13d53 ("seqlock: seqcount_LOCKNAME_t: Standardize naming convention") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20201206162143.14387-2-a.darwish@linutronix.de
2020-12-09locking/rwsem: Remove reader optimistic spinningWaiman Long
Reader optimistic spinning is helpful when the reader critical section is short and there aren't that many readers around. It also improves the chance that a reader can get the lock as writer optimistic spinning disproportionally favors writers much more than readers. Since commit d3681e269fff ("locking/rwsem: Wake up almost all readers in wait queue"), all the waiting readers are woken up so that they can all get the read lock and run in parallel. When the number of contending readers is large, allowing reader optimistic spinning will likely cause reader fragmentation where multiple smaller groups of readers can get the read lock in a sequential manner separated by writers. That reduces reader parallelism. One possible way to address that drawback is to limit the number of readers (preferably one) that can do optimistic spinning. These readers act as representatives of all the waiting readers in the wait queue as they will wake up all those waiting readers once they get the lock. Alternatively, as reader optimistic lock stealing has already enhanced fairness to readers, it may be easier to just remove reader optimistic spinning and simplifying the optimistic spinning code as a result. Performance measurements (locking throughput kops/s) using a locking microbenchmark with 50/50 reader/writer distribution and turbo-boost disabled was done on a 2-socket Cascade Lake system (48-core 96-thread) to see the impacts of these changes: 1) Vanilla - 5.10-rc3 kernel 2) Before - 5.10-rc3 kernel with previous patches in this series 2) limit-rspin - 5.10-rc3 kernel with limited reader spinning patch 3) no-rspin - 5.10-rc3 kernel with reader spinning disabled # of threads CS Load Vanilla Before limit-rspin no-rspin ------------ ------- ------- ------ ----------- -------- 2 1 5,185 5,662 5,214 5,077 4 1 5,107 4,983 5,188 4,760 8 1 4,782 4,564 4,720 4,628 16 1 4,680 4,053 4,567 3,402 32 1 4,299 1,115 1,118 1,098 64 1 3,218 983 1,001 957 96 1 1,938 944 957 930 2 20 2,008 2,128 2,264 1,665 4 20 1,390 1,033 1,046 1,101 8 20 1,472 1,155 1,098 1,213 16 20 1,332 1,077 1,089 1,122 32 20 967 914 917 980 64 20 787 874 891 858 96 20 730 836 847 844 2 100 372 356 360 355 4 100 492 425 434 392 8 100 533 537 529 538 16 100 548 572 568 598 32 100 499 520 527 537 64 100 466 517 526 512 96 100 406 497 506 509 The column "CS Load" represents the number of pause instructions issued in the locking critical section. A CS load of 1 is extremely short and is not likey in real situations. A load of 20 (moderate) and 100 (long) are more realistic. It can be seen that the previous patches in this series have reduced performance in general except in highly contended cases with moderate or long critical sections that performance improves a bit. This change is mostly caused by the "Prevent potential lock starvation" patch that reduce reader optimistic spinning and hence reduce reader fragmentation. The patch that further limit reader optimistic spinning doesn't seem to have too much impact on overall performance as shown in the benchmark data. The patch that disables reader optimistic spinning shows reduced performance at lightly loaded cases, but comparable or slightly better performance on with heavier contention. This patch just removes reader optimistic spinning for now. As readers are not going to do optimistic spinning anymore, we don't need to consider if the OSQ is empty or not when doing lock stealing. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lkml.kernel.org/r/20201121041416.12285-6-longman@redhat.com
2020-12-09locking/rwsem: Enable reader optimistic lock stealingWaiman Long
If the optimistic spinning queue is empty and the rwsem does not have the handoff or write-lock bits set, it is actually not necessary to call rwsem_optimistic_spin() to spin on it. Instead, it can steal the lock directly as its reader bias is in the count already. If it is the first reader in this state, it will try to wake up other readers in the wait queue. With this patch applied, the following were the lock event counts after rebooting a 2-socket system and a "make -j96" kernel rebuild. rwsem_opt_rlock=4437 rwsem_rlock=29 rwsem_rlock_steal=19 So lock stealing represents about 0.4% of all the read locks acquired in the slow path. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lkml.kernel.org/r/20201121041416.12285-4-longman@redhat.com
2020-12-09locking/rwsem: Prevent potential lock starvationWaiman Long
The lock handoff bit is added in commit 4f23dbc1e657 ("locking/rwsem: Implement lock handoff to prevent lock starvation") to avoid lock starvation. However, allowing readers to do optimistic spinning does introduce an unlikely scenario where lock starvation can happen. The lock handoff bit may only be set when a waiter is being woken up. In the case of reader unlock, wakeup happens only when the reader count reaches 0. If there is a continuous stream of incoming readers acquiring read lock via optimistic spinning, it is possible that the reader count may never reach 0 and so the handoff bit will never be asserted. One way to prevent this scenario from happening is to disallow optimistic spinning if the rwsem is currently owned by readers. If the previous or current owner is a writer, optimistic spinning will be allowed. If the previous owner is a reader but the reader count has reached 0 before, a wakeup should have been issued. So the handoff mechanism will be kicked in to prevent lock starvation. As a result, it should be OK to do optimistic spinning in this case. This patch may have some impact on reader performance as it reduces reader optimistic spinning especially if the lock critical sections are short the number of contending readers are small. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lkml.kernel.org/r/20201121041416.12285-3-longman@redhat.com
2020-12-09locking/rwsem: Pass the current atomic count to rwsem_down_read_slowpath()Waiman Long
The atomic count value right after reader count increment can be useful to determine the rwsem state at trylock time. So the count value is passed down to rwsem_down_read_slowpath() to be used when appropriate. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lkml.kernel.org/r/20201121041416.12285-2-longman@redhat.com
2020-12-09locking/rwsem: Fold __down_{read,write}*()Peter Zijlstra
There's a lot needless duplication in __down_{read,write}*(), cure that with a helper. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
2020-12-09locking/rwsem: Introduce rwsem_write_trylock()Peter Zijlstra
One copy of this logic is better than three. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
2020-12-09locking/rwsem: Better collate rwsem_read_trylock()Peter Zijlstra
All users of rwsem_read_trylock() do rwsem_set_reader_owned(sem) on success, move it into rwsem_read_trylock() proper. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
2020-12-09Merge branch 'locking/rwsem'Peter Zijlstra
2020-12-09rwsem: Implement down_read_interruptibleEric W. Biederman
In preparation for converting exec_update_mutex to a rwsem so that multiple readers can execute in parallel and not deadlock, add down_read_interruptible. This is needed for perf_event_open to be converted (with no semantic changes) from working on a mutex to wroking on a rwsem. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/87k0tybqfy.fsf@x220.int.ebiederm.org
2020-12-09rwsem: Implement down_read_killable_nestedEric W. Biederman
In preparation for converting exec_update_mutex to a rwsem so that multiple readers can execute in parallel and not deadlock, add down_read_killable_nested. This is needed so that kcmp_lock can be converted from working on a mutexes to working on rw_semaphores. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/87o8jabqh3.fsf@x220.int.ebiederm.org
2020-12-09Merge branch 'bpf-xsk-selftests'Daniel Borkmann
Weqaar Janjua says: ==================== This patch set adds AF_XDP selftests based on veth to selftests/bpf. # Topology: # --------- # ----------- # _ | Process | _ # / ----------- \ # / | \ # / | \ # ----------- | ----------- # | Thread1 | | | Thread2 | # ----------- | ----------- # | | | # ----------- | ----------- # | xskX | | | xskY | # ----------- | ----------- # | | | # ----------- | ---------- # | vethX | --------- | vethY | # ----------- peer ---------- # | | | # namespaceX | namespaceY These selftests test AF_XDP SKB and Native/DRV modes using veth Virtual Ethernet interfaces. The test program contains two threads, each thread is single socket with a unique UMEM. It validates in-order packet delivery and packet content by sending packets to each other. Prerequisites setup by script test_xsk.sh: Set up veth interfaces as per the topology shown ^^: * setup two veth interfaces and one namespace ** veth<xxxx> in root namespace ** veth<yyyy> in af_xdp<xxxx> namespace ** namespace af_xdp<xxxx> * create a spec file veth.spec that includes this run-time configuration *** xxxx and yyyy are randomly generated 4 digit numbers used to avoid conflict with any existing interface Adds xsk framework test to validate veth xdp DRV and SKB modes. The following tests are provided: 1. AF_XDP SKB mode Generic mode XDP is driver independent, used when the driver does not have support for XDP. Works on any netdevice using sockets and generic XDP path. XDP hook from netif_receive_skb(). a. nopoll - soft-irq processing b. poll - using poll() syscall c. Socket Teardown Create a Tx and a Rx socket, Tx from one socket, Rx on another. Destroy both sockets, then repeat multiple times. Only nopoll mode is used d. Bi-directional Sockets Configure sockets as bi-directional tx/rx sockets, sets up fill and completion rings on each socket, tx/rx in both directions. Only nopoll mode is used 2. AF_XDP DRV/Native mode Works on any netdevice with XDP_REDIRECT support, driver dependent. Processes packets before SKB allocation. Provides better performance than SKB. Driver hook available just after DMA of buffer descriptor. a. nopoll b. poll c. Socket Teardown d. Bi-directional Sockets * Only copy mode is supported because veth does not currently support zero-copy mode Total tests: 8 Flow: * Single process spawns two threads: Tx and Rx * Each of these two threads attach to a veth interface within their assigned namespaces * Each thread creates one AF_XDP socket connected to a unique umem for each veth interface * Tx thread transmits 10k packets from veth<xxxx> to veth<yyyy> * Rx thread verifies if all 10k packets were received and delivered in-order, and have the right content v2 changes: * Move selftests/xsk to selftests/bpf * Remove Makefiles under selftests/xsk, and utilize selftests/bpf/Makefile v3 changes: * merge all test scripts test_xsk_*.sh into test_xsk.sh v4 changes: * merge xsk_env.sh into xsk_prereqs.sh * test_xsk.sh add cliarg -c for color-coded output * test_xsk.sh PREREQUISITES disables IPv6 on veth interfaces * test_xsk.sh PREREQUISITES adds xsk framework test * test_xsk.sh is independently executable * xdpxceiver.c Tx/Rx validates only IPv4 packets with TOS 0x9, ignores others ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2020-12-09selftests/bpf: Xsk selftests - Bi-directional Sockets - SKB, DRVWeqaar Janjua
Adds following tests: 1. AF_XDP SKB mode d. Bi-directional Sockets Configure sockets as bi-directional tx/rx sockets, sets up fill and completion rings on each socket, tx/rx in both directions. Only nopoll mode is used 2. AF_XDP DRV/Native mode d. Bi-directional Sockets * Only copy mode is supported because veth does not currently support zero-copy mode Signed-off-by: Weqaar Janjua <weqaar.a.janjua@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Yonghong Song <yhs@fb.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20201207215333.11586-6-weqaar.a.janjua@intel.com
2020-12-09selftests/bpf: Xsk selftests - Socket Teardown - SKB, DRVWeqaar Janjua
Adds following tests: 1. AF_XDP SKB mode c. Socket Teardown Create a Tx and a Rx socket, Tx from one socket, Rx on another. Destroy both sockets, then repeat multiple times. Only nopoll mode is used 2. AF_XDP DRV/Native mode c. Socket Teardown * Only copy mode is supported because veth does not currently support zero-copy mode Signed-off-by: Weqaar Janjua <weqaar.a.janjua@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Yonghong Song <yhs@fb.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20201207215333.11586-5-weqaar.a.janjua@intel.com
2020-12-09selftests/bpf: Xsk selftests - DRV POLL, NOPOLLWeqaar Janjua
Adds following tests: 2. AF_XDP DRV/Native mode Works on any netdevice with XDP_REDIRECT support, driver dependent. Processes packets before SKB allocation. Provides better performance than SKB. Driver hook available just after DMA of buffer descriptor. a. nopoll b. poll * Only copy mode is supported because veth does not currently support zero-copy mode Signed-off-by: Weqaar Janjua <weqaar.a.janjua@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Yonghong Song <yhs@fb.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20201207215333.11586-4-weqaar.a.janjua@intel.com
2020-12-09selftests/bpf: Xsk selftests - SKB POLL, NOPOLLWeqaar Janjua
Adds following tests: 1. AF_XDP SKB mode Generic mode XDP is driver independent, used when the driver does not have support for XDP. Works on any netdevice using sockets and generic XDP path. XDP hook from netif_receive_skb(). a. nopoll - soft-irq processing b. poll - using poll() syscall Signed-off-by: Weqaar Janjua <weqaar.a.janjua@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Yonghong Song <yhs@fb.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20201207215333.11586-3-weqaar.a.janjua@intel.com
2020-12-09selftests/bpf: Xsk selftests frameworkWeqaar Janjua
This patch adds AF_XDP selftests framework under selftests/bpf. Topology: --------- ----------- ----------- | xskX | --------- | xskY | ----------- | ----------- | | | ----------- | ---------- | vethX | --------- | vethY | ----------- peer ---------- | | | namespaceX | namespaceY Prerequisites setup by script test_xsk.sh: Set up veth interfaces as per the topology shown ^^: * setup two veth interfaces and one namespace ** veth<xxxx> in root namespace ** veth<yyyy> in af_xdp<xxxx> namespace ** namespace af_xdp<xxxx> * create a spec file veth.spec that includes this run-time configuration *** xxxx and yyyy are randomly generated 4 digit numbers used to avoid conflict with any existing interface * tests the veth and xsk layers of the topology Signed-off-by: Weqaar Janjua <weqaar.a.janjua@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Yonghong Song <yhs@fb.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20201207215333.11586-2-weqaar.a.janjua@intel.com
2020-12-09Merge branch 'bpf-xdp-offload-fixes'Daniel Borkmann
Toke Høiland-Jørgensen says: ==================== This series restores the test_offload.py selftest to working order. It seems a number of subtle behavioural changes have crept into various subsystems which broke test_offload.py in a number of ways. Most of these are fairly benign changes where small adjustments to the test script seems to be the best fix, but one is an actual kernel bug that I've observed in the wild caused by a bad interaction between xdp_attachment_flags_ok() and the rework of XDP program handling in the core netdev code. Patch 1 fixes the bug by removing xdp_attachment_flags_ok(), and the reminder of the patches are adjustments to test_offload.py, including a new feature for netdevsim to force a BPF verification fail. Please see the individual patches for details. Changelog: v4: - Accidentally truncated the Fixes: hashes in patches 3/4 to 11 chars v3: - Add Fixes: tags v2: - Replace xdp_attachment_flags_ok() with a check in dev_xdp_attach() - Better packing of struct nsim_dev ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2020-12-09selftests/bpf/test_offload.py: Filter bpftool internal map when counting mapsToke Høiland-Jørgensen
A few of the tests in test_offload.py expects to see a certain number of maps created, and checks this by counting the number of maps returned by bpftool. There is already a filter that will remove any maps already there at the beginning of the test, but bpftool now creates a map for the PID iterator rodata on each invocation, which makes the map count wrong. Fix this by also filtering the pid_iter.rodata map by name when counting. Fixes: d53dee3fe013 ("tools/bpftool: Show info for processes holding BPF map/prog/link/btf FDs") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/160752226387.110217.9887866138149423444.stgit@toke.dk
2020-12-09selftests/bpf/test_offload.py: Reset ethtool features after failed settingToke Høiland-Jørgensen
When setting the ethtool feature flag fails (as expected for the test), the kernel now tracks that the feature was requested to be 'off' and refuses to subsequently disable it again. So reset it back to 'on' so a subsequent disable (that's not supposed to fail) can succeed. Fixes: 417ec26477a5 ("selftests/bpf: add offload test based on netdevsim") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/160752226280.110217.10696241563705667871.stgit@toke.dk
2020-12-09selftests/bpf/test_offload.py: Fix expected case of extack messagesToke Høiland-Jørgensen
Commit 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device") changed the case of some of the extack messages being returned when attaching of XDP programs failed. This broke test_offload.py, so let's fix the test to reflect this. Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/160752226175.110217.11214100824416344952.stgit@toke.dk
2020-12-09selftests/bpf/test_offload.py: Only check verifier log on verification failsToke Høiland-Jørgensen
Since commit 6f8a57ccf851 ("bpf: Make verifier log more relevant by default"), the verifier discards log messages for successfully-verified programs. This broke test_offload.py which is looking for a verification message from the driver callback. Change test_offload.py to use the toggle in netdevsim to make the verification fail before looking for the verification message. Fixes: 6f8a57ccf851 ("bpf: Make verifier log more relevant by default") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/160752226069.110217.12370824996153348073.stgit@toke.dk
2020-12-09netdevsim: Add debugfs toggle to reject BPF programs in verifierToke Høiland-Jørgensen
This adds a new debugfs toggle ('bpf_bind_verifier_accept') that can be used to make netdevsim reject BPF programs from being accepted by the verifier. If this toggle (which defaults to true) is set to false, nsim_bpf_verify_insn() will return EOPNOTSUPP on the last instruction (after outputting the 'Hello from netdevsim' verifier message). This makes it possible to check the verification callback in the driver from test_offload.py in selftests, since the verifier now clears the verifier log on a successful load, hiding the message from the driver. Fixes: 6f8a57ccf851 ("bpf: Make verifier log more relevant by default") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/160752225964.110217.12584017165318065332.stgit@toke.dk
2020-12-09selftests/bpf/test_offload.py: Remove check for program load flags matchToke Høiland-Jørgensen
Since we just removed the xdp_attachment_flags_ok() callback, also remove the check for it in test_offload.py, and replace it with a test for the new ambiguity-avoid check when multiple programs are loaded. Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/160752225858.110217.13036901876869496246.stgit@toke.dk