linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-06-20	perf vendor events: Add knightslanding counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-22-irogers@google.com
2024-06-20	perf vendor events: Update jaketown metrics add event counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. The TMA 4.8 information was updated in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-21-irogers@google.com
2024-06-20	perf vendor events: Update ivytown metrics add event counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. The TMA 4.8 information was updated in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-20-irogers@google.com
2024-06-20	perf vendor events: Update ivybridge metrics add event counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. The TMA 4.8 information was updated in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-19-irogers@google.com
2024-06-20	perf vendor events: Add/update icelakex events/metrics	Ian Rogers
	Update events from v1.24 to v1.26. Add TMA metrics v4.8. Bring in the event updates v1.26: https://github.com/intel/perfmon/commit/c607c739e05f2569f95998cc98e1283f042b4fd1 v1.25: https://github.com/intel/perfmon/commit/42d996769069921ec06f6fbb600b0c663b9ec5a9 The TMA 4.8 information was added in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 Adds the event SW_PREFETCH_ACCESS.ANY. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-18-irogers@google.com
2024-06-20	perf vendor events: Add/update icelake events/metrics	Ian Rogers
	Update events from v1.21 to v1.22. Add TMA metrics v4.8. Bring in the event updates v1.22: https://github.com/intel/perfmon/commit/e5640646e96d59e3c1c1e0d0100a475220ff1dfe The TMA 4.8 information was added in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 Adds the event SW_PREFETCH_ACCESS.ANY. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-17-irogers@google.com
2024-06-20	perf vendor events: Update haswellx metrics add event counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. The TMA 4.8 information was updated in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-16-irogers@google.com
2024-06-20	perf vendor events: Add haswell counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-15-irogers@google.com
2024-06-20	perf vendor events: Update graniterapids events and add counter information	Ian Rogers
	Update events from v1.01 to v1.02. Bring in the event updates v1.02: https://github.com/intel/perfmon/commit/0ff9f681bd07d0e84026c52f4941d21b1cd4c171 Add counter information. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ There are over 1000 new events. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-14-irogers@google.com
2024-06-20	perf vendor events: Update/add grandridge events/metrics	Ian Rogers
	Update events from v1.02 to v1.03. Add TMA metrics v4.8. Bring in the event updates v1.03: https://github.com/intel/perfmon/commit/5ec7a252d0f6ec461f80cc397c9ac25abcd9184f The TMA 4.8 information was added in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 New events are: FP_INST_RETIRED.128B_DP, FP_INST_RETIRED.128B_SP, FP_INST_RETIRED.256B_DP, FP_INST_RETIRED.32B_SP, FP_INST_RETIRED.64B_DP, OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM, OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD, OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM, OCR.STREAMING_WR.ANY_RESPONSE. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-13-irogers@google.com
2024-06-20	perf vendor events: Add goldmontplus counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-12-irogers@google.com
2024-06-20	perf vendor events: Add goldmont counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-11-irogers@google.com
2024-06-20	perf vendor events: Add/update emeraldrapids events/metrics	Ian Rogers
	Update events from v1.06 to v1.09. Add TMA metrics v4.8. Bring in the event updates v1.09: https://github.com/intel/perfmon/commit/3fd5892bb4aece9c1e5c17630570d0462838e85d v1.08: https://github.com/intel/perfmon/commit/54525c4508f4a1ce4a8b854aa808a4ee2fb5930b The TMA 4.8 information was added in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 New events are: EXE_ACTIVITY.2_3_PORTS_UTIL, ICACHE_DATA.STALL_PERIODS, L2_TRANS.L2_WB, MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024, OFFCORE_REQUESTS.DEMAND_CODE_RD, OFFCORE_REQUESTS.DEMAND_RFO, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE_RD, OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD, RS.EMPTY_RESOURCE, SW_PREFETCH_ACCESS.ANY, UNC_IIO_BANDWIDTH_OUT.PART[0-7]_FREERUN, UOPS_ISSUED.CYCLES. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-10-irogers@google.com
2024-06-20	perf vendor events: Update elkhartlake events	Ian Rogers
	Update events from v1.04 to v1.05. Bring in event updates from: https://github.com/intel/perfmon/commit/fb91e1851ca40a5b443e2c3cd79bc7fc34c8237e The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-9-irogers@google.com
2024-06-20	perf vendor events: Update cascadelakex events/metrics	Ian Rogers
	Update events from v1.21 to v1.22. Bring in the event updates v1.22 https://github.com/intel/perfmon/commit/013877729c4ed96427932ca48722bc3bfd2a0075 The TMA 4.8 information was updated in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 New events are: SW_PREFETCH_ACCESS.ANY Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-8-irogers@google.com
2024-06-20	perf vendor events: Update broadwellx metrics add event counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. The TMA 4.8 information was updated in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-7-irogers@google.com
2024-06-20	perf vendor events: Update broadwellde metrics add event counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. The TMA 4.8 information was updated in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-6-irogers@google.com
2024-06-20	perf vendor events: Update broadwell metrics add event counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. The TMA 4.8 information was updated in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-5-irogers@google.com
2024-06-20	perf vendor events: Add bonnell counter information	Ian Rogers
	Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1 and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-4-irogers@google.com
2024-06-20	perf vendor events: Update alderlaken events/metrics	Ian Rogers
	Update events from v1.24 to v1.27. Update e-core TMA metrics to v3.6. Bring in the event updates v1.27: https://github.com/intel/perfmon/commit/ea4f309a04c50ca77a00da2db130fd7cf06db978 v1.26: https://github.com/intel/perfmon/commit/0052e68d24d9873d5ff22363677794fa3eb05313 The e-core TMA 3.6 information was updated in: https://github.com/intel/perfmon/commit/d9c2faa70bafe03129dc10f9fe414ef03a95acd9 New events are: MEM_UOPS_RETIRED.LOCK_LOADS, SERIALIZATION.C01_MS_SCB, UOPS_ISSUED.ANY. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-3-irogers@google.com
2024-06-20	perf vendor events: Update alderlake events/metrics	Ian Rogers
	Update events from v1.24 to v1.27. Update p-core TMA metrics from v4.7 to v4.8, and the e-core TMA metrics to v3.6. Bring in the event updates v1.27: https://github.com/intel/perfmon/commit/ea4f309a04c50ca77a00da2db130fd7cf06db978 v1.26: https://github.com/intel/perfmon/commit/0052e68d24d9873d5ff22363677794fa3eb05313 The p-core TMA 4.8 information was updated in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 And e-core in: https://github.com/intel/perfmon/commit/d9c2faa70bafe03129dc10f9fe414ef03a95acd9 New events are: EXE_ACTIVITY.2_3_PORTS_UTIL, ICACHE_DATA.STALL_PERIODS, L2_TRANS.L2_WB, MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024, MEM_UOPS_RETIRED.LOCK_LOADS, OFFCORE_REQUESTS.DEMAND_CODE_RD, OFFCORE_REQUESTS.DEMAND_RFO, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE_RD, OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD, RS.EMPTY_RESOURCE, SERIALIZATION.C01_MS_SCB, SW_PREFETCH_ACCESS.ANY, UOPS_ISSUED.ANY, UOPS_ISSUED.CYCLES Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-2-irogers@google.com
2024-06-20	perf doc: Add AMD IBS usage document	Ravi Bangoria
	Add a perf man page document that describes how to exploit AMD IBS with Linux perf. Brief intro about IBS and simple one-liner examples will help naive users to get started. This is not meant to be an exhaustive IBS guide. User should refer latest AMD64 Architecture Programmer's Manual for detailed description of IBS. Usage: $ man perf-amd-ibs Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: ananth.narayan@amd.com Cc: sandipan.das@amd.com Cc: santosh.shukla@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620054104.815-1-ravi.bangoria@amd.com
2024-06-20	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c 1e7962114c10 ("bnxt_en: Restore PTP tx_avail count in case of skb_pad() error") 165f87691a89 ("bnxt_en: add timestamping statistics support") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-20	Merge tag 'net-6.10-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless, bpf and netfilter. Happy summer solstice! The line count is a bit inflated by a selftest and update to a driver's FW interface header, in reality this is slightly below average for us. We are expecting one driver fix from Intel, but there are no big known issues. Current release - regressions: - ipv6: bring NLM_DONE out to a separate recv() again Current release - new code bugs: - wifi: cfg80211: wext: set ssids=NULL for passive scans via old wext API Previous releases - regressions: - wifi: mac80211: fix monitor channel setting with chanctx emulation (probably most awaited of the fixes in this PR, tracked by Thorsten) - usb: ax88179_178a: bring back reset on init, if PHY is disconnected - bpf: fix UML x86_64 compile failure with BPF - bpf: avoid splat in pskb_pull_reason(), sanity check added can be hit with malicious BPF - eth: mvpp2: use slab_build_skb() for packets in slab, driver was missed during API refactoring - wifi: iwlwifi: add missing unlock of mvm mutex Previous releases - always broken: - ipv6: add a number of missing null-checks for in6_dev_get(), in case IPv6 disabling races with the datapath - bpf: fix reg_set_min_max corruption of fake_reg - sched: act_ct: add netns as part of the key of tcf_ct_flow_table" * tag 'net-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits) net: usb: rtl8150 fix unintiatilzed variables in rtl8150_get_link_ksettings selftests: virtio_net: add forgotten config options bnxt_en: Restore PTP tx_avail count in case of skb_pad() error bnxt_en: Set TSO max segs on devices with limits bnxt_en: Update firmware interface to 1.10.3.44 net: stmmac: Assign configured channel value to EXTTS event net: do not leave a dangling sk pointer, when socket creation fails net/tcp_ao: Don't leak ao_info on error-path ice: Fix VSI list rule with ICE_SW_LKUP_LAST type ipv6: bring NLM_DONE out to a separate recv() again selftests: add selftest for the SRv6 End.DX6 behavior with netfilter selftests: add selftest for the SRv6 End.DX4 behavior with netfilter netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core seg6: fix parameter passing when calling NF_HOOK() in End.DX4 and End.DX6 behaviors netfilter: ipset: Fix suspicious rcu_dereference_protected() selftests: openvswitch: Set value to nla flags. octeontx2-pf: Fix linking objects into multiple modules octeontx2-pf: Add error handling to VLAN unoffload handling virtio_net: fixing XDP for fully checksummed packets handling virtio_net: checksum offloading handling fix ...
2024-06-20	kselftest: devices: Add of-fullname-regex property	Nícolas F. R. A. Prado
	Introduce a new 'of-fullname-regex' property that takes a regular expression and matches against the OF_FULLNAME property. It allows matching controllers that don't have a unique DT address across sibling controllers, and thus dt-mmio can't be used. One particular example of where this is needed is on MT8195 which has multiple USB controllers described by two level deep nodes and using the ranges property: ssusb2: usb@112a1000 { reg = <0 0x112a1000 0 0x2dff>, <0 0x112a3e00 0 0x0100>; ranges = <0 0 0 0x112a0000 0 0x3f00>; xhci2: usb@0 { Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://lore.kernel.org/r/20240613-kselftest-discoverable-probe-mt8195-kci-v1-2-7b396a9b032d@collabora.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-20	kselftest: devices: Allow specifying boards directory through parameter	Nícolas F. R. A. Prado
	Add support for a --boards-dir parameter through which the directory in which the board files will be searched for can be specified. The 'boards' subdirectory is still used as default when the parameter is not specified. This allows more easily running the test with board files supplied by an external repository like https://github.com/kernelci/platform-test-parameters. Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://lore.kernel.org/r/20240613-kselftest-discoverable-probe-mt8195-kci-v1-1-7b396a9b032d@collabora.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-20	KVM: selftests: arm64: Test writes to CTR_EL0	Sebastian Ott
	Test that CTR_EL0 is modifiable from userspace, that changes are visible to guests, and that they are preserved across a vCPU reset. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-11-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20	tools/perf: Handle perftool-testsuite_probe testcases fail when kernel ↵	Athira Rajeev
	debuginfo is not present Running "perftool-testsuite_probe" fails as below: ./perf test -v "perftool-testsuite_probe" 83: perftool-testsuite_probe : FAILED There are three fails: 1. Regexp not found: "\sprobe:inode_permission(?:_\d+)?\s+$on inode_permission(?:[:\+][0-9A-Fa-f]+)?@.+$" -- [ FAIL ] -- perf_probe :: test_adding_kernel :: listing added probe :: perf probe -l (output regexp parsing) 2. Regexp not found: "probe:vfs_mknod" Regexp not found: "probe:vfs_create" Regexp not found: "probe:vfs_rmdir" Regexp not found: "probe:vfs_link" Regexp not found: "probe:vfs_write" -- [ FAIL ] -- perf_probe :: test_adding_kernel :: wildcard adding support (command exitcode + output regexp parsing) 3. Regexp not found: "Failed to find" Regexp not found: "somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64" Regexp not found: "in this function\|at this address" Line did not match any pattern: "The /boot/vmlinux file has no debug information." Line did not match any pattern: "Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo package." These three tests depends on kernel debug info. 1. Fail 1 expects file name along with probe which needs debuginfo 2. Fail 2 : perf probe -nf --max-probes=512 -a 'vfs_ $params' Debuginfo-analysis is not supported. Error: Failed to add events. 3. Fail 3 : perf probe 'vfs_read somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64' Debuginfo-analysis is not supported. Error: Failed to add events. There is already helper function skip_if_no_debuginfo in lib/probe_vfs_getname.sh which does perf probe and returns "2" if debug info is not present. Use the skip_if_no_debuginfo function and skip only the three tests which needs debuginfo based on the result. With the patch: 83: perftool-testsuite_probe: --- start --- test child forked, pid 3927 -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: -a -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: --add -- [ PASS ] -- perf_probe :: test_adding_kernel :: listing added probe :: perf list Regexp not found: "\s*probe:inode_permission(?:_\d+)?\s+$on inode_permission(?:[:\+][0-9A-Fa-f]+)?@.+$" -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped -- [ PASS ] -- perf_probe :: test_adding_kernel :: using added probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: deleting added probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: listing removed probe (should NOT be listed) -- [ PASS ] -- perf_probe :: test_adding_kernel :: dry run :: adding probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: first probe adding -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (without force) -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (with force) -- [ PASS ] -- perf_probe :: test_adding_kernel :: using doubled probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: removing multiple probes Regexp not found: "probe:vfs_mknod" Regexp not found: "probe:vfs_create" Regexp not found: "probe:vfs_rmdir" Regexp not found: "probe:vfs_link" Regexp not found: "probe:vfs_write" -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped Regexp not found: "Failed to find" Regexp not found: "somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64" Regexp not found: "in this function\|at this address" Line did not match any pattern: "The /boot/vmlinux file has no debug information." Line did not match any pattern: "Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo package." -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped -- [ PASS ] -- perf_probe :: test_adding_kernel :: function with retval :: add -- [ PASS ] -- perf_probe :: test_adding_kernel :: function with retval :: record -- [ PASS ] -- perf_probe :: test_adding_kernel :: function argument probing :: script ## [ PASS ] ## perf_probe :: test_adding_kernel SUMMARY ---- end(0) ---- 83: perftool-testsuite_probe : Ok Only the three specific tests are skipped and remaining ran successfully. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240617122121.7484-1-atrajeev@linux.vnet.ibm.com
2024-06-20	cpupower: Change the var type of the 'monitor' subcommand display mode	Roman Storozhenko
	There is a type 'enum operation_mode_e' contains the display modes of the 'monitor' subcommand. This type isn't used though, instead the variable 'mode' is of a simple 'int' type. Change 'mode' variable type from 'int' to 'enum operation_mode_e' in order to improve compiler type checking. Built and tested this with different monitor cmdline params. Everything works as expected, that is nothing changed and no regressions encountered. Signed-off-by: Roman Storozhenko <romeusmeister@gmail.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-06-20	cpupower: Remove absent 'v' parameter from monitor man page	Roman Storozhenko
	Remove not supported '-v' parameter from the cpupower's 'monitor' command description. There is a '-v' parameter described in cpupower's 'monitor' command man page. It isn't supported at the moment, and perhaps has never been supported. When I run the monitor with this parameter I get the following: $ sudo LD_LIBRARY_PATH=lib64/ bin/cpupower monitor -v monitor: invalid option -- 'v' invalid or unknown argument $ sudo LD_LIBRARY_PATH=lib64/ bin/cpupower monitor -V monitor: invalid option -- 'V' invalid or unknown argument Signed-off-by: Roman Storozhenko <romeusmeister@gmail.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-06-20	selftests: virtio_net: add forgotten config options	Jiri Pirko
	One may use tools/testing/selftests/drivers/net/virtio_net/config for example for vng build command like this one: $ vng -v -b -f tools/testing/selftests/drivers/net/virtio_net/config In that case, the needed kernel config options are not turned on. Add the missed kernel config options. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20240617072614.75fe79e7@kernel.org/ Reported-by: Matthieu Baerts <matttbe@kernel.org> Closes: https://lore.kernel.org/netdev/1a63f209-b1d4-4809-bc30-295a5cafa296@kernel.org/ Fixes: ccfaed04db5e ("selftests: virtio_net: add initial tests") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240619061748.1869404-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19	binfmt_elf: Honor PT_LOAD alignment for static PIE	Kees Cook
	The p_align values in PT_LOAD were ignored for static PIE executables (i.e. ET_DYN without PT_INTERP). This is because there is no way to request a non-fixed mmap region with a specific alignment. ET_DYN with PT_INTERP uses a separate base address (ELF_ET_DYN_BASE) and binfmt_elf performs the ASLR itself, which means it can also apply alignment. For the mmap region, the address selection happens deep within the vm_mmap() implementation (when the requested address is 0). The earlier attempt to implement this: commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") commit 925346c129da ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders") did not take into account the different base address origins, and were eventually reverted: aeb7923733d1 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"") In order to get the correct alignment from an mmap base, binfmt_elf must perform a 0-address load first, then tear down the mapping and perform alignment on the resulting address. Since this is slightly more overhead, only do this when it is needed (i.e. the alignment is not the default ELF alignment). This does, however, have the benefit of being able to use MAP_FIXED_NOREPLACE, to avoid potential collisions. With this fixed, enable the static PIE self tests again. Reported-by: H.J. Lu <hjl.tools@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215275 Link: https://lore.kernel.org/r/20240508173149.677910-3-keescook@chromium.org Signed-off-by: Kees Cook <kees@kernel.org>
2024-06-19	selftests/exec: Build both static and non-static load_address tests	Kees Cook
	After commit 4d1cd3b2c5c1 ("tools/testing/selftests/exec: fix link error"), the load address alignment tests tried to build statically. This was silently ignored in some cases. However, after attempting to further fix the build by switching to "-static-pie", the test started failing. This appears to be due to non-PT_INTERP ET_DYN execs ("static PIE") not doing alignment correctly, which remains unfixed[1]. See commit aeb7923733d1 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"") for more details. Provide rules to build both static and non-static PIE binaries, improve debug reporting, and perform several test steps instead of a single all-or-nothing test. However, do not actually enable static-pie tests; alignment specification is only supported for ET_DYN with PT_INTERP ("regular PIE"). Link: https://bugzilla.kernel.org/show_bug.cgi?id=215275 [1] Link: https://lore.kernel.org/r/20240508173149.677910-1-keescook@chromium.org Signed-off-by: Kees Cook <kees@kernel.org>
2024-06-19	selftest/cgroup: Update test_cpuset_prs.sh to match changes	Waiman Long
	Unlike the list of isolated CPUs, it is not easy to programamatically determine what sched domains are being created by the scheduler just by examinng the data in various kernfs filesystems. The easiest way to get this information is by enabling /sys/kernel/debug/sched/verbose file to make those information displayed in the console. This is also what the test_cpuset_prs.sh script is doing when the -v flag is given. It is rather hard to fetch the data from the console and compare it to the expected result. An easier way is to dump the expected sched-domain information out to the console so that they can be visually compared with the actual sched domain data. However, this have to be done manually by visual inspection and so will only be done once in a while. Moreover the preceding cpuset commits also change the cpuset behavior requiring corresponding chanages in some test cases as well as new test cases to test the newly added functionality. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-19	selftest/cgroup: Fix test_cpuset_prs.sh problems reported by test robot	Waiman Long
	The test robot reported two different problems when running the test_cpuset_prs.sh test. # ./test_cpuset_prs.sh: line 106: echo: write error: Input/output error # : # Effective cpus changed to 0-1,4-7 after test 4! The write error is caused by writing to /dev/console. It looks like some systems may not have /dev/console configured or in a writeable state. Fix this by checking the existence of /dev/console before attempting to write it. After the completion of each test run, the script will check if the cpuset state is reset back to the original state. That usually takes a while to happen. The test script inserts some artificial delay to make sure that the reset has completed. The current setting is about 80ms. That may not be enough in some cases especially if the test system is slow. Double it to 160ms to minimize the chance of this type of failure. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202406141712.dbbaa8fd-oliver.sang@intel.com Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-19	selftests: add selftest for the SRv6 End.DX6 behavior with netfilter	Jianguo Wu
	this selftest is designed for evaluating the SRv6 End.DX6 behavior used with netfilter(rpfilter), in this example, for implementing IPv6 L3 VPN use cases. Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-19	selftests: add selftest for the SRv6 End.DX4 behavior with netfilter	Jianguo Wu
	this selftest is designed for evaluating the SRv6 End.DX4 behavior used with netfilter(rpfilter), in this example, for implementing IPv4 L3 VPN use cases. Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-19	selftests: openvswitch: Set value to nla flags.	Adrian Moreno
	Netlink flags, although they don't have payload at the netlink level, are represented as having "True" as value in pyroute2. Without it, trying to add a flow with a flag-type action (e.g: pop_vlan) fails with the following traceback: Traceback (most recent call last): File "[...]/ovs-dpctl.py", line 2498, in <module> sys.exit(main(sys.argv)) ^^^^^^^^^^^^^^ File "[...]/ovs-dpctl.py", line 2487, in main ovsflow.add_flow(rep["dpifindex"], flow) File "[...]/ovs-dpctl.py", line 2136, in add_flow reply = self.nlm_request( ^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/nlsocket.py", line 822, in nlm_request return tuple(self._genlm_request(argv, kwarg)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/generic/__init__.py", line 126, in nlm_request return tuple(super().nlm_request(argv, **kwarg)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/nlsocket.py", line 1124, in nlm_request self.put(msg, msg_type, msg_flags, msg_seq=msg_seq) File "[...]/pyroute2/netlink/nlsocket.py", line 389, in put self.sendto_gate(msg, addr) File "[...]/pyroute2/netlink/nlsocket.py", line 1056, in sendto_gate msg.encode() File "[...]/pyroute2/netlink/__init__.py", line 1245, in encode offset = self.encode_nlas(offset) ^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/__init__.py", line 1560, in encode_nlas nla_instance.setvalue(cell[1]) File "[...]/pyroute2/netlink/__init__.py", line 1265, in setvalue nlv.setvalue(nla_tuple[1]) ~~~~~~~~~^^^ IndexError: list index out of range Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-18	Merge tag 'linux_kselftest-fixes-6.10-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: - filesystems: warn_unused_result warnings - seccomp: format-zero-length warnings - fchmodat2: clang build warnings due to-static-libasan - openat2: clang build warnings due to static-libasan, LOCAL_HDRS * tag 'linux_kselftest-fixes-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/fchmodat2: fix clang build failure due to -static-libasan selftests/openat2: fix clang build failures: -static-libasan, LOCAL_HDRS selftests: seccomp: fix format-zero-length warnings selftests: filesystems: fix warn_unused_result build warnings
2024-06-18	selftests: openvswitch: Use bash as interpreter	Simon Horman
	openvswitch.sh makes use of substitutions of the form ${ns:0:1}, to obtain the first character of $ns. Empirically, this is works with bash but not dash. When run with dash these evaluate to an empty string and printing an error to stdout. # dash -c 'ns=client; echo "${ns:0:1}"' 2>error # cat error dash: 1: Bad substitution # bash -c 'ns=client; echo "${ns:0:1}"' 2>error c # cat error This leads to tests that neither pass nor fail. F.e. TEST: arp_ping [START] adding sandbox 'test_arp_ping' Adding DP/Bridge IF: sbx:test_arp_ping dp:arpping {, , } create namespaces ./openvswitch.sh: 282: eval: Bad substitution TEST: ct_connect_v4 [START] adding sandbox 'test_ct_connect_v4' Adding DP/Bridge IF: sbx:test_ct_connect_v4 dp:ct4 {, , } ./openvswitch.sh: 322: eval: Bad substitution create namespaces Resolve this by making openvswitch.sh a bash script. Fixes: 918423fda910 ("selftests: openvswitch: add an initial flow programming case") Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240617-ovs-selftest-bash-v1-1-7ae6ccd3617b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18	sched_ext: Add selftests	David Vernet
	Add basic selftests. Signed-off-by: David Vernet <dvernet@meta.com> Acked-by: Tejun Heo <tj@kernel.org>
2024-06-18	sched_ext: Documentation: scheduler: Document extensible scheduler class	Tejun Heo
	Add Documentation/scheduler/sched-ext.rst which gives a high-level overview and pointers to the examples. v6: - Add paragraph explaining debug dump. v5: - Updated to reflect /sys/kernel interface change. Kconfig options added. v4: - README improved, reformatted in markdown and renamed to README.md. v3: - Added tools/sched_ext/README. - Dropped _example prefix from scheduler names. v2: - Apply minor edits suggested by Bagas. Caveats section dropped as all of them are addressed. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com> Acked-by: Josh Don <joshdon@google.com> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Barret Rhoden <brho@google.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com>
2024-06-18	sched_ext: Add vtime-ordered priority queue to dispatch_q's	Tejun Heo
	Currently, a dsq is always a FIFO. A task which is dispatched earlier gets consumed or executed earlier. While this is sufficient when dsq's are used for simple staging areas for tasks which are ready to execute, it'd make dsq's a lot more useful if they can implement custom ordering. This patch adds a vtime-ordered priority queue to dsq's. When the BPF scheduler dispatches a task with the new scx_bpf_dispatch_vtime() helper, it can specify the vtime tha the task should be inserted at and the task is inserted into the priority queue in the dsq which is ordered according to time_before64() comparison of the vtime values. A DSQ can either be a FIFO or priority queue and automatically switches between the two depending on whether scx_bpf_dispatch() or scx_bpf_dispatch_vtime() is used. Using the wrong variant while the DSQ already has the other type queued is not allowed and triggers an ops error. Built-in DSQs must always be FIFOs. This makes it very easy for the BPF schedulers to implement proper vtime based scheduling within each dsq very easy and efficient at a negligible cost in terms of code complexity and overhead. scx_simple and scx_example_flatcg are updated to default to weighted vtime scheduling (the latter within each cgroup). FIFO scheduling can be selected with -f option. v4: - As allowing mixing priority queue and FIFO on the same DSQ sometimes led to unexpected starvations, DSQs now error out if both modes are used at the same time and the built-in DSQs are no longer allowed to be priority queues. - Explicit type struct scx_dsq_node added to contain fields needed to be linked on DSQs. This will be used to implement stateful iterator. - Tasks are now always linked on dsq->list whether the DSQ is in FIFO or PRIQ mode. This confines PRIQ related complexities to the enqueue and dequeue paths. Other paths only need to look at dsq->list. This will also ease implementing BPF iterator. - Print p->scx.dsq_flags in debug dump. v3: - SCX_TASK_DSQ_ON_PRIQ flag is moved from p->scx.flags into its own p->scx.dsq_flags. The flag is protected with the dsq lock unlike other flags in p->scx.flags. This led to flag corruption in some cases. - Add comments explaining the interaction between using consumption of p->scx.slice to determine vtime progress and yielding. v2: - p->scx.dsq_vtime was not initialized on load or across cgroup migrations leading to some tasks being stalled for extended period of time depending on how saturated the machine is. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com>
2024-06-18	sched_ext: Implement core-sched support	Tejun Heo
	The core-sched support is composed of the following parts: - task_struct->scx.core_sched_at is added. This is a timestamp which can be used to order tasks. Depending on whether the BPF scheduler implements custom ordering, it tracks either global FIFO ordering of all tasks or local-DSQ ordering within the dispatched tasks on a CPU. - prio_less() is updated to call scx_prio_less() when comparing SCX tasks. scx_prio_less() calls ops.core_sched_before() if available or uses the core_sched_at timestamp. For global FIFO ordering, the BPF scheduler doesn't need to do anything. Otherwise, it should implement ops.core_sched_before() which reflects the ordering. - When core-sched is enabled, balance_scx() balances all SMT siblings so that they all have tasks dispatched if necessary before pick_task_scx() is called. pick_task_scx() picks between the current task and the first dispatched task on the local DSQ based on availability and the core_sched_at timestamps. Note that FIFO ordering is expected among the already dispatched tasks whether running or on the local DSQ, so this path always compares core_sched_at instead of calling into ops.core_sched_before(). qmap_core_sched_before() is added to scx_qmap. It scales the distances from the heads of the queues to compare the tasks across different priority queues and seems to behave as expected. v3: Fixed build error when !CONFIG_SCHED_SMT reported by Andrea Righi. v2: Sched core added the const qualifiers to prio_less task arguments. Explicitly drop them for ops.core_sched_before() task arguments. BPF enforces access control through the verifier, so the qualifier isn't actually operative and only gets in the way when interacting with various helpers. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com> Reviewed-by: Josh Don <joshdon@google.com> Cc: Andrea Righi <andrea.righi@canonical.com>
2024-06-18	sched_ext: Implement sched_ext_ops.cpu_online/offline()	Tejun Heo
	Add ops.cpu_online/offline() which are invoked when CPUs come online and offline respectively. As the enqueue path already automatically bypasses tasks to the local dsq on a deactivated CPU, BPF schedulers are guaranteed to see tasks only on CPUs which are between online() and offline(). If the BPF scheduler doesn't implement ops.cpu_online/offline(), the scheduler is automatically exited with SCX_ECODE_RESTART \| SCX_ECODE_RSN_HOTPLUG. Userspace can implement CPU hotpplug support trivially by simply reinitializing and reloading the scheduler. scx_qmap is updated to print out online CPUs on hotplug events. Other schedulers are updated to restart based on ecode. v3: - The previous implementation added @reason to sched_class.rq_on/offline() to distinguish between CPU hotplug events and topology updates. This was buggy and fragile as the methods are skipped if the current state equals the target state. Instead, add scx_rq_[de]activate() which are directly called from sched_cpu_de/activate(). This also allows ops.cpu_on/offline() to sleep which can be useful. - ops.dispatch() could be called on a CPU that the BPF scheduler was told to be offline. The dispatch patch is updated to bypass in such cases. v2: - To accommodate lock ordering change between scx_cgroup_rwsem and cpus_read_lock(), CPU hotplug operations are put into its own SCX_OPI block and enabled eariler during scx_ope_enable() so that cpus_read_lock() can be dropped before acquiring scx_cgroup_rwsem. - Auto exit with ECODE added. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com> Acked-by: Josh Don <joshdon@google.com> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Barret Rhoden <brho@google.com>
2024-06-18	sched_ext: Implement sched_ext_ops.cpu_acquire/release()	David Vernet
	Scheduler classes are strictly ordered and when a higher priority class has tasks to run, the lower priority ones lose access to the CPU. Being able to monitor and act on these events are necessary for use cases includling strict core-scheduling and latency management. This patch adds two operations ops.cpu_acquire() and .cpu_release(). The former is invoked when a CPU becomes available to the BPF scheduler and the opposite for the latter. This patch also implements scx_bpf_reenqueue_local() which can be called from .cpu_release() to trigger requeueing of all tasks in the local dsq of the CPU so that the tasks can be reassigned to other available CPUs. scx_pair is updated to use .cpu_acquire/release() along with %SCX_KICK_WAIT to make the pair scheduling guarantee strict even when a CPU is preempted by a higher priority scheduler class. scx_qmap is updated to use .cpu_acquire/release() to empty the local dsq of a preempted CPU. A similar approach can be adopted by BPF schedulers that want to have a tight control over latency. v4: Use the new SCX_KICK_IDLE to wake up a CPU after re-enqueueing. v3: Drop the const qualifier from scx_cpu_release_args.task. BPF enforces access control through the verifier, so the qualifier isn't actually operative and only gets in the way when interacting with various helpers. v2: Add p->scx.kf_mask annotation to allow calling scx_bpf_reenqueue_local() from ops.cpu_release() nested inside ops.init() and other sleepable operations. Signed-off-by: David Vernet <dvernet@meta.com> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Josh Don <joshdon@google.com> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Barret Rhoden <brho@google.com>
2024-06-18	sched_ext: Implement tickless support	Tejun Heo
	Allow BPF schedulers to indicate tickless operation by setting p->scx.slice to SCX_SLICE_INF. A CPU whose current task has infinte slice goes into tickless operation. scx_central is updated to use tickless operations for all tasks and instead use a BPF timer to expire slices. This also uses the SCX_ENQ_PREEMPT and task state tracking added by the previous patches. Currently, there is no way to pin the timer on the central CPU, so it may end up on one of the worker CPUs; however, outside of that, the worker CPUs can go tickless both while running sched_ext tasks and idling. With schbench running, scx_central shows: root@test ~# grep ^LOC /proc/interrupts; sleep 10; grep ^LOC /proc/interrupts LOC: 142024 656 664 449 Local timer interrupts LOC: 161663 663 665 449 Local timer interrupts Without it: root@test ~ [SIGINT]# grep ^LOC /proc/interrupts; sleep 10; grep ^LOC /proc/interrupts LOC: 188778 3142 3793 3993 Local timer interrupts LOC: 198993 5314 6323 6438 Local timer interrupts While scx_central itself is too barebone to be useful as a production scheduler, a more featureful central scheduler can be built using the same approach. Google's experience shows that such an approach can have significant benefits for certain applications such as VM hosting. v4: Allow operation even if BPF_F_TIMER_CPU_PIN is not available. v3: Pin the central scheduler's timer on the central_cpu using BPF_F_TIMER_CPU_PIN. v2: Convert to BPF inline iterators. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com> Acked-by: Josh Don <joshdon@google.com> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Barret Rhoden <brho@google.com>
2024-06-18	sched_ext: Make watchdog handle ops.dispatch() looping stall	Tejun Heo
	The dispatch path retries if the local DSQ is still empty after ops.dispatch() either dispatched or consumed a task. This is both out of necessity and for convenience. It has to retry because the dispatch path might lose the tasks to dequeue while the rq lock is released while trying to migrate tasks across CPUs, and the retry mechanism makes ops.dispatch() implementation easier as it only needs to make some forward progress each iteration. However, this makes it possible for ops.dispatch() to stall CPUs by repeatedly dispatching ineligible tasks. If all CPUs are stalled that way, the watchdog or sysrq handler can't run and the system can't be saved. Let's address the issue by breaking out of the dispatch loop after 32 iterations. It is unlikely but not impossible for ops.dispatch() to legitimately go over the iteration limit. We want to come back to the dispatch path in such cases as not doing so risks stalling the CPU by idling with runnable tasks pending. As the previous task is still current in balance_scx(), resched_curr() doesn't do anything - it will just get cleared. Let's instead use scx_kick_bpf() which will trigger reschedule after switching to the next task which will likely be the idle task. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com>
2024-06-18	sched_ext: Add a central scheduler which makes all scheduling decisions on ↵	Tejun Heo
	one CPU This patch adds a new example scheduler, scx_central, which demonstrates central scheduling where one CPU is responsible for making all scheduling decisions in the system using scx_bpf_kick_cpu(). The central CPU makes scheduling decisions for all CPUs in the system, queues tasks on the appropriate local dsq's and preempts the worker CPUs. The worker CPUs in turn preempt the central CPU when it needs tasks to run. Currently, every CPU depends on its own tick to expire the current task. A follow-up patch implementing tickless support for sched_ext will allow the worker CPUs to go full tickless so that they can run completely undisturbed. v3: - Kumar fixed a bug where the dispatch path could overflow the dispatch buffer if too many are dispatched to the fallback DSQ. - Use the new SCX_KICK_IDLE to wake up non-central CPUs. - Dropped '-p' option. v2: - Use RESIZABLE_ARRAY() instead of fixed MAX_CPUS and use SCX_BUG[_ON]() to simplify error handling. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com> Acked-by: Josh Don <joshdon@google.com> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Barret Rhoden <brho@google.com> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Cc: Julia Lawall <julia.lawall@inria.fr>
2024-06-18	sched_ext: Implement scx_bpf_kick_cpu() and task preemption support	Tejun Heo
	It's often useful to wake up and/or trigger reschedule on other CPUs. This patch adds scx_bpf_kick_cpu() kfunc helper that BPF scheduler can call to kick the target CPU into the scheduling path. As a sched_ext task relinquishes its CPU only after its slice is depleted, this patch also adds SCX_KICK_PREEMPT and SCX_ENQ_PREEMPT which clears the slice of the target CPU's current task to guarantee that sched_ext's scheduling path runs on the CPU. If SCX_KICK_IDLE is specified, the target CPU is kicked iff the CPU is idle to guarantee that the target CPU will go through at least one full sched_ext scheduling cycle after the kicking. This can be used to wake up idle CPUs without incurring unnecessary overhead if it isn't currently idle. As a demonstration of how backward compatibility can be supported using BPF CO-RE, tools/sched_ext/include/scx/compat.bpf.h is added. It provides __COMPAT_scx_bpf_kick_cpu_IDLE() which uses SCX_KICK_IDLE if available or becomes a regular kicking otherwise. This allows schedulers to use the new SCX_KICK_IDLE while maintaining support for older kernels. The plan is to temporarily use compat helpers to ease API updates and drop them after a few kernel releases. v5: - SCX_KICK_IDLE added. Note that this also adds a compat mechanism for schedulers so that they can support kernels without SCX_KICK_IDLE. This is useful as a demonstration of how new feature flags can be added in a backward compatible way. - kick_cpus_irq_workfn() reimplemented so that it touches the pending cpumasks only as necessary to reduce kicking overhead on machines with a lot of CPUs. - tools/sched_ext/include/scx/compat.bpf.h added. v4: - Move example scheduler to its own patch. v3: - Make scx_example_central switch all tasks by default. - Convert to BPF inline iterators. v2: - Julia Lawall reported that scx_example_central can overflow the dispatch buffer and malfunction. As scheduling for other CPUs can't be handled by the automatic retry mechanism, fix by implementing an explicit overflow and retry handling. - Updated to use generic BPF cpumask helpers. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com> Acked-by: Josh Don <joshdon@google.com> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Barret Rhoden <brho@google.com>