summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-01-07octeontx2-af: Fix interrupt name stringsSunil Goutham
Fixed interrupt name string logic which currently results in wrong memory location being accessed while dumping /proc/interrupts. Fixes: 4826090719d4 ("octeontx2-af: Enable CPT HW interrupts") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://lore.kernel.org/r/1641538505-28367-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07Merge branch 'mptcp-refactoring-for-one-selftest-and-csum-validation'Jakub Kicinski
Mat Martineau says: ==================== mptcp: Refactoring for one selftest and csum validation Patch 1 changes the MPTCP join self tests to depend more on events rather than delays, so the script runs faster and has more consistent results. Patches 2 and 3 get rid of some duplicate code in MPTCP's checksum validation by modifying and leveraging an existing helper function. ==================== Link: https://lore.kernel.org/r/20220107192524.445137-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07mptcp: reuse __mptcp_make_csum in validate_data_csumGeliang Tang
This patch reused __mptcp_make_csum() in validate_data_csum() instead of open-coding. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07mptcp: change the parameter of __mptcp_make_csumGeliang Tang
This patch changed the type of the last parameter of __mptcp_make_csum() from __sum16 to __wsum. And export this function in protocol.h. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07selftests: mptcp: more stable join tests-casesPaolo Abeni
MPTCP join self-tests are a bit fragile as they reply on delays instead of events to catch-up with the expected sockets states. Replace the delay with state checking where possible and reduce the number of sleeps in the most complex scenarios. This will both reduce the tests run-time and will improve stability. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07net: dsa: felix: add port fast age supportVladimir Oltean
Add support for flushing the MAC table on a given port in the ocelot switch library, and use this functionality in the felix DSA driver. This operation is needed when a port leaves a bridge to become standalone, and when the learning is disabled, and when the STP state changes to a state where no FDB entry should be present. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220107144229.244584-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07net: mscc: ocelot: fix incorrect balancing with down LAG portsVladimir Oltean
Assuming the test setup described here: https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/ (swp1 and swp2 are in bond0, and bond0 is in a bridge with swp0) it can be seen that when swp1 goes down (on either board A or B), then traffic that should go through that port isn't forwarded anywhere. A dump of the PGID table shows the following: PGID_DST[0] = ports 0 PGID_DST[1] = ports 1 PGID_DST[2] = ports 2 PGID_DST[3] = ports 3 PGID_DST[4] = ports 4 PGID_DST[5] = ports 5 PGID_DST[6] = no ports PGID_AGGR[0] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[1] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[2] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[3] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[4] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[5] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[6] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[7] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[8] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[9] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[10] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[11] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[12] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[13] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[14] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[15] = ports 0, 1, 2, 3, 4, 5 PGID_SRC[0] = ports 1, 2 PGID_SRC[1] = ports 0 PGID_SRC[2] = ports 0 PGID_SRC[3] = no ports PGID_SRC[4] = no ports PGID_SRC[5] = no ports PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5 Whereas a "good" PGID configuration for that setup should have looked like this: PGID_DST[0] = ports 0 PGID_DST[1] = ports 1, 2 PGID_DST[2] = ports 1, 2 PGID_DST[3] = ports 3 PGID_DST[4] = ports 4 PGID_DST[5] = ports 5 PGID_DST[6] = no ports PGID_AGGR[0] = ports 0, 2, 3, 4, 5 PGID_AGGR[1] = ports 0, 2, 3, 4, 5 PGID_AGGR[2] = ports 0, 2, 3, 4, 5 PGID_AGGR[3] = ports 0, 2, 3, 4, 5 PGID_AGGR[4] = ports 0, 2, 3, 4, 5 PGID_AGGR[5] = ports 0, 2, 3, 4, 5 PGID_AGGR[6] = ports 0, 2, 3, 4, 5 PGID_AGGR[7] = ports 0, 2, 3, 4, 5 PGID_AGGR[8] = ports 0, 2, 3, 4, 5 PGID_AGGR[9] = ports 0, 2, 3, 4, 5 PGID_AGGR[10] = ports 0, 2, 3, 4, 5 PGID_AGGR[11] = ports 0, 2, 3, 4, 5 PGID_AGGR[12] = ports 0, 2, 3, 4, 5 PGID_AGGR[13] = ports 0, 2, 3, 4, 5 PGID_AGGR[14] = ports 0, 2, 3, 4, 5 PGID_AGGR[15] = ports 0, 2, 3, 4, 5 PGID_SRC[0] = ports 1, 2 PGID_SRC[1] = ports 0 PGID_SRC[2] = ports 0 PGID_SRC[3] = no ports PGID_SRC[4] = no ports PGID_SRC[5] = no ports PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5 In other words, in the "bad" configuration, the attempt is to remove the inactive swp1 from the destination ports via PGID_DST. But when a MAC table entry is learned, it is learned towards PGID_DST 1, because that is the logical port id of the LAG itself (it is equal to the lowest numbered member port). So when swp1 becomes inactive, if we set PGID_DST[1] to contain just swp1 and not swp2, the packet will not have any chance to reach the destination via swp2. The "correct" way to remove swp1 as a destination is via PGID_AGGR (remove swp1 from the aggregation port groups for all aggregation codes). This means that PGID_DST[1] and PGID_DST[2] must still contain both swp1 and swp2. This makes the MAC table still treat packets destined towards the single-port LAG as "multicast", and the inactive ports are removed via the aggregation code tables. The change presented here is a design one: the ocelot_get_bond_mask() function used to take an "only_active_ports" argument. We don't need that. The only call site that specifies only_active_ports=true, ocelot_set_aggr_pgids(), must retrieve the entire bonding mask, because it must program that into PGID_DST. Additionally, it must also clear the inactive ports from the bond mask here, which it can't do if bond_mask just contains the active ports: ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i); ac &= ~bond_mask; <---- here /* Don't do division by zero if there was no active * port. Just make all aggregation codes zero. */ if (num_active_ports) ac |= BIT(aggr_idx[i % num_active_ports]); ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i); So it becomes the responsibility of ocelot_set_aggr_pgids() to take ocelot_port->lag_tx_active into consideration when populating the aggr_idx array. Fixes: 23ca3b727ee6 ("net: mscc: ocelot: rebalance LAGs on link up/down events") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220107164332.402133-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2022-01-07 This series contains updates to i40e and iavf drivers. Karen limits per VF MAC filters so that one VF does not consume all filters for i40e. Jedrzej reduces busy wait time for admin queue calls for i40e. Mateusz updates firmware versions to reflect new supported NVM images and renames an error to remove non-inclusive language for i40e. Yang Li fixes a set but not used warning for i40e. Jason Wang removes an unneeded variable for iavf. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: iavf: remove an unneeded variable i40e: remove variables set but not used i40e: Remove non-inclusive language i40e: Update FW API version i40e: Minimize amount of busy-waiting during AQ send i40e: Add ensurance of MacVlan resources for every trusted VF ==================== Link: https://lore.kernel.org/r/20220107175704.438387-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07net/tls: Fix skb memory leak when running kTLS trafficGal Pressman
The cited Fixes commit introduced a memory leak when running kTLS traffic (with/without hardware offloads). I'm running nginx on the server side and wrk on the client side and get the following: unreferenced object 0xffff8881935e9b80 (size 224): comm "softirq", pid 0, jiffies 4294903611 (age 43.204s) hex dump (first 32 bytes): 80 9b d0 36 81 88 ff ff 00 00 00 00 00 00 00 00 ...6............ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000efe2a999>] build_skb+0x1f/0x170 [<00000000ef521785>] mlx5e_skb_from_cqe_mpwrq_linear+0x2bc/0x610 [mlx5_core] [<00000000945d0ffe>] mlx5e_handle_rx_cqe_mpwrq+0x264/0x9e0 [mlx5_core] [<00000000cb675b06>] mlx5e_poll_rx_cq+0x3ad/0x17a0 [mlx5_core] [<0000000018aac6a9>] mlx5e_napi_poll+0x28c/0x1b60 [mlx5_core] [<000000001f3369d1>] __napi_poll+0x9f/0x560 [<00000000cfa11f72>] net_rx_action+0x357/0xa60 [<000000008653b8d7>] __do_softirq+0x282/0x94e [<00000000644923c6>] __irq_exit_rcu+0x11f/0x170 [<00000000d4085f8f>] irq_exit_rcu+0xa/0x20 [<00000000d412fef4>] common_interrupt+0x7d/0xa0 [<00000000bfb0cebc>] asm_common_interrupt+0x1e/0x40 [<00000000d80d0890>] default_idle+0x53/0x70 [<00000000f2b9780e>] default_idle_call+0x8c/0xd0 [<00000000c7659e15>] do_idle+0x394/0x450 I'm not familiar with these areas of the code, but I've added this sk_defer_free_flush() to tls_sw_recvmsg() based on a hunch and it resolved the issue. Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220102081253.9123-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07Merge branch 'for-5.16-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "This contains the cgroup.procs permission check fixes so that they use the credentials at the time of open rather than write, which also fixes the cgroup namespace lifetime bug" * 'for-5.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: selftests: cgroup: Test open-time cgroup namespace usage for migration checks selftests: cgroup: Test open-time credential usage for migration checks selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644 cgroup: Use open-time cgroup namespace for process migration perm checks cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv cgroup: Use open-time credentials for process migraton perm checks
2022-01-07cpuset: convert 'allowed' in __cpuset_node_allowed() to be booleanQi Zheng
Convert 'allowed' in __cpuset_node_allowed() to be boolean since the return types of node_isset() and __cpuset_node_allowed() are both boolean. Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-01-07Merge tag 'block-5.16-2022-01-07' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fix from Jens Axboe: "Just the md bitmap regression this time" * tag 'block-5.16-2022-01-07' of git://git.kernel.dk/linux-block: md/raid1: fix missing bitmap update w/o WriteMostly devices
2022-01-07Merge tag 'edac_urgent_for_v5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fix from Tony Luck: "Fix 10nm EDAC driver to release and unmap resources on systems without HBM" * tag 'edac_urgent_for_v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/i10nm: Release mdev/mbase when failing to detect HBM
2022-01-07Revert "i2c: core: support bus regulator controlling in adapter"Wolfram Sang
This largely reverts commit 5a7b95fb993ec399c8a685552aa6a8fc995c40bd. It breaks suspend with AMD GPUs, and we couldn't incrementally fix it. So, let's remove the code and go back to the drawing board. We keep the header extension to not break drivers already populating the regulator. We expect to re-add the code handling it soon. Fixes: 5a7b95fb993e ("i2c: core: support bus regulator controlling in adapter") Reported-by: "Tareque Md.Hanif" <tarequemd.hanif@yahoo.com> Link: https://lore.kernel.org/r/1295184560.182511.1639075777725@mail.yahoo.com Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru> Link: https://lore.kernel.org/r/7143a7147978f4104171072d9f5225d2ce355ec1.camel@yandex.ru BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1850 Tested-by: "Tareque Md.Hanif" <tarequemd.hanif@yahoo.com> Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru> Signed-off-by: Wolfram Sang <wsa@kernel.org> Cc: <stable@vger.kernel.org> # 5.14+
2022-01-07Revert "libtraceevent: Increase libtraceevent logging when verbose"Arnaldo Carvalho de Melo
This reverts commit 08efcb4a638d260ef7fcbae64ecf7ceceb3f1841. This breaks the build as it will prefer using libbpf-devel header files, even when not using LIBBPF_DYNAMIC=1, breaking the build. This was detected on OpenSuSE Tumbleweed with libtraceevent-devel 1.3.0, as described by Jiri Slaby: ======================================================================= It breaks build with LIBTRACEEVENT_DYNAMIC and version 1.3.0: > util/debug.c: In function ‘perf_debug_option’: > util/debug.c:243:17: error: implicit declaration of function ‘tep_set_loglevel’ [-Werror=implicit-function-declaration] > 243 | tep_set_loglevel(TEP_LOG_INFO); > | ^~~~~~~~~~~~~~~~ > util/debug.c:243:34: error: ‘TEP_LOG_INFO’ undeclared (first use in this function); did you mean ‘TEP_PRINT_INFO’? > 243 | tep_set_loglevel(TEP_LOG_INFO); > | ^~~~~~~~~~~~ > | TEP_PRINT_INFO > util/debug.c:243:34: note: each undeclared identifier is reported only once for each function it appears in > util/debug.c:245:34: error: ‘TEP_LOG_DEBUG’ undeclared (first use in this function) > 245 | tep_set_loglevel(TEP_LOG_DEBUG); > | ^~~~~~~~~~~~~ > util/debug.c:247:34: error: ‘TEP_LOG_ALL’ undeclared (first use in this function) > 247 | tep_set_loglevel(TEP_LOG_ALL); > | ^~~~~~~~~~~ It is because the gcc's command line looks like: gcc ... -I/home/abuild/rpmbuild/BUILD/tools/lib/ ... -DLIBTRACEEVENT_VERSION=65790 ... ======================================================================= The proper way to fix this is more involved and so not suitable for this late in the 5.16-rc stage. Reported-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/lkml/bc2b0786-8965-1bcd-2316-9d9bb37b9c31@kernel.org Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/lkml/YddGjjmlMZzxUZbN@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-07perf trace: Avoid early exit due to running SIGCHLD handler before it makes ↵Jiri Olsa
sense to When running 'perf trace' with an BPF object like: # perf trace -e openat,tools/perf/examples/bpf/hello.c the event parsing eventually calls llvm__get_kbuild_opts() that runs a script and that ends up with SIGCHLD delivered to the 'perf trace' handler, which assumes the workload process is done and quits 'perf trace'. Move the SIGCHLD handler setup directly to trace__run(), where the event is parsed and the object is already compiled. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Christy Lee <christyc.y.lee@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220106222030.227499-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Two small fixes for x86: - lockdep WARN due to missing lock nesting annotation - NULL pointer dereference when accessing debugfs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Check for rmaps allocation KVM: SEV: Mark nested locking of kvm->lock
2022-01-07Merge tag 'drm-fixes-2022-01-07' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "There is only the amdgpu runtime pm regression fix in here: amdgpu: - suspend/resume fix - fix runtime PM regression" * tag 'drm-fixes-2022-01-07' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: disable runpm if we are the primary adapter fbdev: fbmem: add a helper to determine if an aperture is used by a fw fb drm/amd/pm: keep the BACO feature enabled for suspend
2022-01-07iavf: remove an unneeded variableJason Wang
The variable `ret_code' used for returning is never changed in function `iavf_shutdown_adminq'. So that it can be removed and just return its initial value 0 at the end of `iavf_shutdown_adminq' function. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07i40e: remove variables set but not usedYang Li
The code that uses variables pe_cntx_size and pe_filt_size has been removed, so they should be removed as well. Eliminate the following clang warnings: drivers/net/ethernet/intel/i40e/i40e_common.c:4139:20: warning: variable 'pe_filt_size' set but not used. drivers/net/ethernet/intel/i40e/i40e_common.c:4139:6: warning: variable 'pe_cntx_size' set but not used. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07i40e: Remove non-inclusive languageMateusz Palczewski
Remove non-inclusive language from the driver. Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07i40e: Update FW API versionMateusz Palczewski
Update FW API versions to the newest supported NVM images. Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07i40e: Minimize amount of busy-waiting during AQ sendJedrzej Jagielski
The i40e_asq_send_command will now use a non blocking usleep_range if possible (non-atomic context), instead of busy-waiting udelay. The usleep_range function uses hrtimers to provide better performance and removes the negative impact of busy-waiting in time-critical environments. 1. Rename i40e_asq_send_command to i40e_asq_send_command_atomic and add 5th parameter to inform if called from an atomic context. Call inside usleep_range (if non-atomic) or udelay (if atomic). 2. Change i40e_asq_send_command to invoke i40e_asq_send_command_atomic(..., false). 3. Change two functions: - i40e_aq_set_vsi_uc_promisc_on_vlan - i40e_aq_set_vsi_mc_promisc_on_vlan to explicitly use i40e_asq_send_command_atomic(..., true) instead of i40e_asq_send_command, as they use spinlocks and do some work in an atomic context. All other calls to i40e_asq_send_command remain unchanged. Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07KVM: x86: Check for rmaps allocationNikunj A Dadhania
With TDP MMU being the default now, access to mmu_rmaps_stat debugfs file causes following oops: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 7 PID: 3185 Comm: cat Not tainted 5.16.0-rc4+ #204 RIP: 0010:pte_list_count+0x6/0x40 Call Trace: <TASK> ? kvm_mmu_rmaps_stat_show+0x15e/0x320 seq_read_iter+0x126/0x4b0 ? aa_file_perm+0x124/0x490 seq_read+0xf5/0x140 full_proxy_read+0x5c/0x80 vfs_read+0x9f/0x1a0 ksys_read+0x67/0xe0 __x64_sys_read+0x19/0x20 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fca6fc13912 Return early when rmaps are not present. Reported-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220105040337.4234-1-nikunj@amd.com> Cc: stable@vger.kernel.org Fixes: 3bcd0662d66f ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07i40e: Add ensurance of MacVlan resources for every trusted VFKaren Sornek
Trusted VF can use up every resource available, leaving nothing to other trusted VFs. Introduce define, which calculates MacVlan resources available based on maximum available MacVlan resources, bare minimum for each VF and number of currently allocated VFs. Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07KVM: SEV: Mark nested locking of kvm->lockWanpeng Li
Both source and dest vms' kvm->locks are held in sev_lock_two_vms. Mark one with a different subtype to avoid false positives from lockdep. Fixes: c9d61dcb0bc26 (KVM: SEV: accept signals in sev_lock_two_vms) Reported-by: Yiru Xu <xyru1999@gmail.com> Tested-by: Jinrong Liang <cloudliang@tencent.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1641364863-26331-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07x86/sgx: Fix NULL pointer dereference on non-SGX systemsDave Hansen
== Problem == Nathan Chancellor reported an oops when aceessing the 'sgx_total_bytes' sysfs file: https://lore.kernel.org/all/YbzhBrimHGGpddDM@archlinux-ax161/ The sysfs output code accesses the sgx_numa_nodes[] array unconditionally. However, this array is allocated during SGX initialization, which only occurs on systems where SGX is supported. If the sysfs file is accessed on systems without SGX support, sgx_numa_nodes[] is NULL and an oops occurs. == Solution == To fix this, hide the entire nodeX/x86/ attribute group on systems without SGX support using the ->is_visible attribute group callback. Unfortunately, SGX is initialized via a device_initcall() which occurs _after_ the ->is_visible() callback. Instead of moving SGX initialization earlier, call sysfs_update_group() during SGX initialization to update the group visiblility. This update requires moving the SGX sysfs code earlier in sgx/main.c. There are no code changes other than the addition of arch_update_sysfs_visibility() and a minor whitespace fixup to arch_node_attr_is_visible() which checkpatch caught. CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-sgx@vger.kernel.org Cc: x86@kernel.org Fixes: 50468e431335 ("x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/20220104171527.5E8416A8@davehans-spike.ostc.intel.com
2022-01-07sch_cake: revise Diffserv docsKevin Bracey
Documentation incorrectly stated that CS1 is equivalent to LE for diffserv8. But when LE was added to the table, CS1 was pushed into tin 1, leaving only LE in tin 0. Also "TOS1" no longer exists, as that is the same codepoint as LE. Make other tweaks properly distinguishing codepoints from classes and putting current Diffserve codepoints ahead of legacy ones. Signed-off-by: Kevin Bracey <kevin@bracey.fi> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220106215637.3132391-1-kevin@bracey.fi Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07scripts: sphinx-pre-install: Fix ctex support on DebianMauro Carvalho Chehab
The name of the package with ctexhook.sty is different on Debian/Ubuntu. Reported-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Tested-by: Akira Yokosawa <akiyks@gmail.com> Link: https://lore.kernel.org/r/63882425609a2820fac78f5e94620abeb7ed5f6f.1641429634.git.mchehab@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07docs: discourage use of list tablesJonathan Corbet
Our documentation encourages the use of list-table formats, but that advice runs counter to the objective of keeping the plain-text documentation as useful and readable as possible. Turn that advice around the other way so that people don't keep adding these tables. Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07docs: 5.Posting.rst: describe Fixes: and Link: tagsThorsten Leemhuis
Explain Fixes: and Link: tags in Documentation/process/5.Posting.rst, which are missing in this file for unknown reasons and only described in Documentation/process/submitting-patches.rst. Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info> CC: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://lore.kernel.org/r/c4a5f5e25fa84b26fd383bba6eafde4ab57c9de7.1641314856.git.linux@leemhuis.info Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07Documentation: kgdb: Replace deprecated remotebaudChristian Löhle
Using set remotebaud to set the baud rate was deprecated in gdb-7.7 and completely removed from the command parser in gdb-7.8 (released in 2014). Adopt set serial baud instead. Signed-off-by: Christian Loehle <cloehle@hyperstone.com> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/4050689967ed46baaa3bfadda53a0e73@hyperstone.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07docs: automarkup.py: Fix invalid HTML link output and broken URI fragmentsJames Clark
Since commit d18b01789ae5 ("docs: Add automatic cross-reference for documentation pages"), references that were already explicitly defined with "ref:" and referred to other pages with a path have been doubled. This is reported as the following error by Firefox: Start tag "a" seen but an element of the same type was already open. End tag "a" violates nesting rules. As well as the invalid HTML, this also obscures the URI fragment links to subsections because the second link overrides the first. For example on the page admin-guide/hw-vuln/mds.html the last link should be to the "Default Mitigations" subsection using a # URI fragment: admin-guide/hw-vuln/l1tf.html#default-mitigations But it is obsured by a second link to the whole page: admin-guide/hw-vuln/l1tf.html The full HTML with the double <a> tags looks like this: <a class="reference internal" href="l1tf.html#default-mitigations"> <span class="std std-ref"> <a class="reference internal" href="l1tf.html"> <span class="doc">L1TF - L1 Terminal Fault</span> </a> </span> </a> After this commit, there is only a single link: <a class="reference internal" href="l1tf.html#default-mitigations"> <span class="std std-ref">Documentation/admin-guide/hw-vuln//l1tf.rst</span> </a> Now that the second link is removed, the browser correctly jumps to the default-mitigations subsection when clicking the link. The fix is to check that nodes in the document to be modified are not already references. A reference is counted as any text that is a descendant of a reference type node. Only plain text should be converted to new references, otherwise the doubling occurs. Testing ======= * Test that the build stdout is the same (ignoring ordering), and that no new warnings are printed. * Diff all .html files and check that the only modifications occur to the bad double links. * The auto linking of bare references to pages without "ref:" is still working. Fixes: d18b01789ae5 ("docs: Add automatic cross-reference for documentation pages") Reviewed-by: Nícolas F. R. A. Prado <n@nfraprado.net> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20220105143640.330602-2-james.clark@arm.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07netrom: fix api breakage in nr_setsockopt()Dan Carpenter
This needs to copy an unsigned int from user space instead of a long to avoid breaking user space with an API change. I have updated all the integer overflow checks from ULONG to UINT as well. This is a slight API change but I do not expect it to affect anything in real life. Fixes: 3087a6f36ee0 ("netrom: fix copying in user data in nr_setsockopt") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07ax25: uninitialized variable in ax25_setsockopt()Dan Carpenter
The "opt" variable is unsigned long but we only copy 4 bytes from the user so the lower 4 bytes are uninitialized. I have changed the integer overflow checks from ULONG to UINT as well. This is a slight API change but I don't expect it to break anything. Fixes: a7b75c5a8c41 ("net: pass a sockptr_t into ->setsockopt") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07Merge branch 'octeontx2-ptp-bugs'David S. Miller
Subbaraya Sundeep says: ==================== octeontx2: Fix PTP bugs This patchset addresses two problems found when using ptp. Patch 1 - Increases the refcount of ptp device before use which was missing and it lead to refcount increment after use bug when module is loaded and unloaded couple of times. Patch 2 - PTP resources allocated by VF are not being freed during VF teardown. This patch fixes that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07octeontx2-nicvf: Free VF PTP resources.Rakesh Babu Saladi
When a VF is removed respective PTP resources are not being freed currently. This patch fixes it. Fixes: 43510ef4ddad ("octeontx2-nicvf: Add PTP hardware clock support to NIX VF") Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07octeontx2-af: Increment ptp refcount before useSubbaraya Sundeep
Before using the ptp pci device by AF driver increment the reference count of it. Fixes: a8b90c9d26d6 ("octeontx2-af: Add PTP device id for CN10K and 95O silcons") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07Merge branch 'mptcp-fixes'David S. Miller
Mat Martineau says: ==================== mptcp: Fixes for buffer reclaim and option writing Here are three fixes dealing with a syzkaller crash MPTCP triggers in the memory manager in 5.16-rc8, and some option writing problems. Patches 1 and 2 fix some corner cases in MPTCP option writing. Patch 3 addresses a crash that syzkaller found a way to trigger in the mm subsystem by passing an invalid value to __sk_mem_reduce_allocated(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07mptcp: Check reclaim amount before reducing allocationMat Martineau
syzbot found a page counter underflow that was triggered by MPTCP's reclaim code: page_counter underflow: -4294964789 nr_pages=4294967295 WARNING: CPU: 2 PID: 3785 at mm/page_counter.c:56 page_counter_cancel+0xcf/0xe0 mm/page_counter.c:56 Modules linked in: CPU: 2 PID: 3785 Comm: kworker/2:6 Not tainted 5.16.0-rc1-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:page_counter_cancel+0xcf/0xe0 mm/page_counter.c:56 Code: c7 04 24 00 00 00 00 45 31 f6 eb 97 e8 2a 2b b5 ff 4c 89 ea 48 89 ee 48 c7 c7 00 9e b8 89 c6 05 a0 c1 ba 0b 01 e8 95 e4 4b 07 <0f> 0b eb a8 4c 89 e7 e8 25 5a fb ff eb c7 0f 1f 00 41 56 41 55 49 RSP: 0018:ffffc90002d4f918 EFLAGS: 00010082 RAX: 0000000000000000 RBX: ffff88806a494120 RCX: 0000000000000000 RDX: ffff8880688c41c0 RSI: ffffffff815e8f28 RDI: fffff520005a9f15 RBP: ffffffff000009cb R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815e2cfe R11: 0000000000000000 R12: ffff88806a494120 R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88802cc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2de21000 CR3: 000000005ad59000 CR4: 0000000000150ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> page_counter_uncharge+0x2e/0x60 mm/page_counter.c:160 drain_stock+0xc1/0x180 mm/memcontrol.c:2219 refill_stock+0x139/0x2f0 mm/memcontrol.c:2271 __sk_mem_reduce_allocated+0x24d/0x550 net/core/sock.c:2945 __mptcp_rmem_reclaim net/mptcp/protocol.c:167 [inline] __mptcp_mem_reclaim_partial+0x124/0x410 net/mptcp/protocol.c:975 mptcp_mem_reclaim_partial net/mptcp/protocol.c:982 [inline] mptcp_alloc_tx_skb net/mptcp/protocol.c:1212 [inline] mptcp_sendmsg_frag+0x18c6/0x2190 net/mptcp/protocol.c:1279 __mptcp_push_pending+0x232/0x720 net/mptcp/protocol.c:1545 mptcp_release_cb+0xfe/0x200 net/mptcp/protocol.c:2975 release_sock+0xb4/0x1b0 net/core/sock.c:3306 mptcp_worker+0x51e/0xc10 net/mptcp/protocol.c:2443 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> __mptcp_mem_reclaim_partial() could call __mptcp_rmem_reclaim() with a negative value, which passed that negative value to __sk_mem_reduce_allocated() and triggered the splat above. Check for a reclaim amount that is positive and large enough for __mptcp_rmem_reclaim() to actually adjust rmem_fwd_alloc (much like the sk_mem_reclaim_partial() code the function is based on). v2: Use '>' instead of '>=', since SK_MEM_QUANTUM - 1 would get right-shifted into nothing by __mptcp_rmem_reclaim. Fixes: 6511882cdd82 ("mptcp: allocate fwd memory separately on the rx and tx path") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/252 Reported-and-tested-by: syzbot+bc9e2d2dbcb347dd215a@syzkaller.appspotmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07mptcp: fix a DSS option writing errorGeliang Tang
'ptr += 1;' was omitted in the original code. If the DSS is the last option -- which is what we have most of the time -- that's not an issue. But it is if we need to send something else after like a RM_ADDR or an MP_PRIO. Fixes: 1bff1e43a30e ("mptcp: optimize out option generation") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07mptcp: fix opt size when sending DSS + MP_FAILMatthieu Baerts
When these two options had to be sent -- which is not common -- the DSS size was not being taken into account in the remaining size. Additionally in this situation, the reported size was only the one of the MP_FAIL which can cause issue if at the end, we need to write more in the TCP options than previously said. Here we use a dedicated variable for MP_FAIL size to keep the WARN_ON_ONCE() just after. Fixes: c25aeb4e0953 ("mptcp: MP_FAIL suboption sending") Acked-and-tested-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07Merge branch 'mptcp-next'David S. Miller
Mat Martineau says: ==================== mptcp: New features and cleanup These patches have been tested in the MPTCP tree for a longer than usual time (thanks to holiday schedules), and are ready for the net-next branch. Changes include feature updates, small fixes, refactoring, and some selftest changes. Patch 1 fixes an OUTQ ioctl issue with TCP fallback sockets. Patches 2, 3, and 6 add support of the MPTCP fastclose option (quick shutdown of the full MPTCP connection, similar to TCP RST in regular TCP), and a related self test. Patch 4 cleans up some accept and poll code that is no longer needed after the fastclose changes. Patch 5 add userspace disconnect using AF_UNSPEC, which is used when testing fastclose and makes the MPTCP socket's handling of AF_UNSPEC in connect() more TCP-like. Patches 7-11 refactor subflow creation to make better use of multiple local endpoints and to better handle individual connection failures when creating multiple subflows. Includes self test updates. Patch 12 cleans up the way subflows are added to the MPTCP connection list, eliminating the need for calls throughout the MPTCP code that had to check the intermediate "join list" for entries to shift over to the main "connection list". Patch 13 refactors the MPTCP release_cb flags to use separate storage for values only accessed with the socket lock held (no atomic ops needed), and for values that need atomic operations. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07mptcp: avoid atomic bit manipulation when possiblePaolo Abeni
Currently the msk->flags bitmask carries both state for the mptcp_release_cb() - mostly touched under the mptcp data lock - and others state info touched even outside such lock scope. As a consequence, msk->flags is always manipulated with atomic operations. This change splits such bitmask in two separate fields, so that we use plain bit operations when touching the cb-related info. The MPTCP_PUSH_PENDING bit needs additional care, as it is the only CB related field currently accessed either under the mptcp data lock or the mptcp socket lock. Let's add another mask just for such bit's sake. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07mptcp: cleanup MPJ subflow list handlingPaolo Abeni
We can simplify the join list handling leveraging the mptcp_release_cb(): if we can acquire the msk socket lock at mptcp_finish_join time, move the new subflow directly into the conn_list, otherwise place it on join_list and let the release_cb process such list. Since pending MPJ connection are now always processed in a timely way, we can avoid flushing the join list every time we have to process all the current subflows. Additionally we can now use the mptcp data lock to protect the join_list, removing the additional spin lock. Finally, the MPJ handshake is now always finalized under the msk socket lock, we can drop the additional synchronization between mptcp_finish_join() and mptcp_close(). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07selftests: mptcp: add tests for subflow creation failurePaolo Abeni
Verify that, when multiple endpoints are available, subflows creation proceed even when the first additional subflow creation fails - due to packet drop on the relevant link Co-developed-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07mptcp: do not block subflows creation on errorsPaolo Abeni
If the MPTCP configuration allows for multiple subflows creation, and the first additional subflows never reach the fully established status - e.g. due to packets drop or reset - the in kernel path manager do not move to the next subflow. This patch introduces a new PM helper to cope with MPJ subflow creation failure and delay and hook it where appropriate. Such helper triggers additional subflow creation, as needed and updates the PM subflow counter, if the current one is closing. Additionally start all the needed additional subflows as soon as the MPTCP socket is fully established, so we don't have to cope with slow MPJ handshake blocking the next subflow creation. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07mptcp: keep track of local endpoint still available for each mskPaolo Abeni
Include into the path manager status a bitmap tracking the list of local endpoints still available - not yet used - for the relevant mptcp socket. Keep such map updated at endpoint creation/deletion time, so that we can easily skip already used endpoint at local address selection time. The endpoint used by the initial subflow is lazyly accounted at subflow creation time: the usage bitmap is be up2date before endpoint selection and we avoid such unneeded task in some relevant scenarios - e.g. busy servers accepting incoming subflows but not creating any additional ones nor annuncing additional addresses. Overall this allows for fair local endpoints usage in case of subflow failure. As a side effect, this patch also enforces that each endpoint is used at most once for each mptcp connection. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07mptcp: clean-up MPJ option writingPaolo Abeni
Check for all MPJ variant at once, this reduces the number of conditionals traversed on average and will simplify the next patch. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07mptcp: fix per socket endpoint accountingPaolo Abeni
Since full-mesh endpoint support, the reception of a single ADD_ADDR option can cause multiple subflows creation. When such option is accepted we increment 'add_addr_accepted' by one. When we received a paired RM_ADDR option, we deleted all the relevant subflows, decrementing 'add_addr_accepted' by one for each of them. We have a similar issue for 'local_addr_used' Fix them moving the pm endpoint accounting outside the subflow traversal. Fixes: 1a0d6136c5f0 ("mptcp: local addresses fullmesh") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>