summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-12-11ntp, rtc: Move rtc_set_ntp_time() to ntp codeThomas Gleixner
rtc_set_ntp_time() is not really RTC functionality as the code is just a user of RTC. Move it into the NTP code which allows further cleanups. Requested-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220542.166871172@linutronix.de
2020-12-11ntp: Make the RTC synchronization more reliableThomas Gleixner
Miroslav reported that the periodic RTC synchronization in the NTP code fails more often than not to hit the specified update window. The reason is that the code uses delayed_work to schedule the update which needs to be in thread context as the underlying RTC might be connected via a slow bus, e.g. I2C. In the update function it verifies whether the current time is correct vs. the requirements of the underlying RTC. But delayed_work is using the timer wheel for scheduling which is inaccurate by design. Depending on the distance to the expiry the wheel gets less granular to allow batching and to avoid the cascading of the original timer wheel. See 500462a9de65 ("timers: Switch to a non-cascading wheel") and the code for further details. The code already deals with this by splitting the 660 seconds period into a long 659 seconds timer and then retrying with a smaller delta. But looking at the actual granularities of the timer wheel (which depend on the HZ configuration) the 659 seconds timer ends up in an outer wheel level and is affected by a worst case granularity of: HZ Granularity 1000 32s 250 16s 100 40s So the initial timer can be already off by max 12.5% which is not a big issue as the period of the sync is defined as ~11 minutes. The fine grained second attempt schedules to the desired update point with a timer expiring less than a second from now. Depending on the actual delta and the HZ setting even the second attempt can end up in outer wheel levels which have a large enough granularity to make the correctness check fail. As this is a fundamental property of the timer wheel there is no way to make this more accurate short of iterating in one jiffies steps towards the update point. Switch it to an hrtimer instead which schedules the actual update work. The hrtimer will expire precisely (max 1 jiffie delay when high resolution timers are not available). The actual scheduling delay of the work is the same as before. The update is triggered from do_adjtimex() which is a bit racy but not much more racy than it was before: if (ntp_synced()) queue_delayed_work(system_power_efficient_wq, &sync_work, 0); which is racy when the work is currently executed and has not managed to reschedule itself. This becomes now: if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer)) queue_work(system_power_efficient_wq, &sync_work, 0); which is racy when the hrtimer has expired and the work is currently executed and has not yet managed to rearm the hrtimer. Not a big problem as it just schedules work for nothing. The new implementation has a safe guard in place to catch the case where the hrtimer is queued on entry to the work function and avoids an extra update attempt of the RTC that way. Reported-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Miroslav Lichvar <mlichvar@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220542.062910520@linutronix.de
2020-12-11rtc: core: Make the sync offset default more realisticThomas Gleixner
The offset which is used to steer the start of an RTC synchronization update via rtc_set_ntp_time() is huge. The math behind this is: tsched twrite(t2.tv_sec - 1) t2 (seconds increment) twrite - tsched is the transport time for the write to hit the device. t2 - twrite depends on the chip and is for most chips one second. The rtc_set_ntp_time() calculation of tsched is: tsched = t2 - 1sec - (t2 - twrite) The default for the sync offset is 500ms which means that twrite - tsched is 500ms assumed that t2 - twrite is one second. This is 0.5 seconds off for RTCs which are directly accessible by IO writes and probably for the majority of i2C/SPI based RTC off by an order of magnitude. Set it to 5ms which should bring it closer to reality. The default can be adjusted by drivers (rtc_cmos does so) and could be adjusted further by a calibration method which is an orthogonal problem. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220541.960333166@linutronix.de
2020-12-11rtc: cmos: Make rtc_cmos sync offset correctThomas Gleixner
The offset for rtc_cmos must be -500ms to work correctly with the current implementation of rtc_set_ntp_time() due to the following: tsched twrite(t2.tv_sec - 1) t2 (seconds increment) twrite - tsched is the transport time for the write to hit the device, which is negligible for this chip because it's accessed directly. t2 - twrite = 500ms according to the datasheet. But rtc_set_ntp_time() calculation of tsched is: tsched = t2 - 1sec - (t2 - twrite) The default for the sync offset is 500ms which means that the write happens at t2 - 1.5 seconds which is obviously off by a second for this device. Make the offset -500ms so it works correct. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220541.830517160@linutronix.de
2020-12-11rtc: mc146818: Reduce spinlock section in mc146818_set_time()Thomas Gleixner
No need to hold the lock and disable interrupts for doing math. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220541.709243630@linutronix.de
2020-12-11rtc: mc146818: Prevent reading garbageThomas Gleixner
The MC146818 driver is prone to read garbage from the RTC. There are several issues all related to the update cycle of the MC146818. The chip increments seconds obviously once per second and indicates that by a bit in a register. The bit goes high 244us before the actual update starts. During the update the readout of the time values is undefined. The code just checks whether the update in progress bit (UIP) is set before reading the clock. If it's set it waits arbitrary 20ms before retrying, which is ample because the maximum update time is ~2ms. But this check does not guarantee that the UIP bit goes high and the actual update happens during the readout. So the following can happen 0.997 UIP = False -> Interrupt/NMI/preemption 0.998 UIP -> True 0.999 Readout <- Undefined To prevent this rework the code so it checks UIP before and after the readout and if set after the readout try again. But that's not enough to cover the following: 0.997 UIP = False Readout seconds -> NMI (or vCPU scheduled out) 0.998 UIP -> True update completes UIP -> False 1.000 Readout minutes,.... UIP check succeeds That can make the readout wrong up to 59 seconds. To prevent this, read the seconds value before the first UIP check, validate it after checking UIP and after reading out the rest. It's amazing that the original i386 code had this actually correct and the generic implementation of the MC146818 driver got it wrong in 2002 and it stayed that way until today. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220541.594826678@linutronix.de
2020-12-11sched/fair: Trivial correction of the newidle_balance() commentBarry Song
idle_balance() has been renamed to newidle_balance(). To differentiate with nohz_idle_balance, it seems refining the comment will be helpful for the readers of the code. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20201202220641.22752-1-song.bao.hua@hisilicon.com
2020-12-11sched/fair: Clear SMT siblings after determining the core is not idleMel Gorman
The clearing of SMT siblings from the SIS mask before checking for an idle core is a small but unnecessary cost. Defer the clearing of the siblings until the scan moves to the next potential target. The cost of this was not measured as it is borderline noise but it should be self-evident. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20201130144020.GS3371@techsingularity.net
2020-12-11sched: Fix kernel-doc markupMauro Carvalho Chehab
Kernel-doc requires that a kernel-doc markup to be immediately below the function prototype, as otherwise it will rename it. So, move sys_sched_yield() markup to the right place. Also fix the cpu_util() markup: Kernel-doc markups should use this format: identifier - description Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/50cd6f460aeb872ebe518a8e9cfffda2df8bdb0a.1606823973.git.mchehab+huawei@kernel.org
2020-12-11x86: Print ratio freq_max/freq_base used in frequency invariance calculationsGiovanni Gherdovich
The value freq_max/freq_base is a fundamental component of frequency invariance calculations. It may come from a variety of sources such as MSRs or ACPI data, tracking it down when troubleshooting a system could be non-trivial. It is worth saving it in the kernel logs. # dmesg | grep 'Estimated ratio of average max' [ 14.024036] smpboot: Estimated ratio of average max frequency by base frequency (times 1024): 1289 Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20201112182614.10700-4-ggherdovich@suse.cz
2020-12-11x86, sched: Use midpoint of max_boost and max_P for frequency invariance on ↵Giovanni Gherdovich
AMD EPYC Frequency invariant accounting calculations need the ratio freq_curr/freq_max, but freq_max is unknown as it depends on dynamic power allocation between cores: AMD EPYC CPUs implement "Core Performance Boost". Three candidates are considered to estimate this value: - maximum non-boost frequency - maximum boost frequency - the mid point between the above two Experimental data on an AMD EPYC Zen2 machine slightly favors the third option, which is applied with this patch. The analysis uses the ondemand cpufreq governor as baseline, and compares it with schedutil in a number of configurations. Using the freq_max value described above offers a moderate advantage in performance and efficiency: sugov-max (freq_max=max_boost) performs the worst on tbench: less throughput and reduced efficiency than the other invariant-schedutil options (see "Data Overview" below). Consider that tbench is generally a problematic case as no schedutil version currently is better than ondemand. sugov-P0 (freq_max=max_P) is the worst on dbench, while the other sugov's can surpass ondemand with less filesystem latency and slightly increased efficiency. 1. DATA OVERVIEW 2. DETAILED PERFORMANCE TABLES 3. POWER CONSUMPTION TABLE 1. DATA OVERVIEW ================ sugov-noinv : non-invariant schedutil governor sugov-max : invariant schedutil, freq_max=max_boost sugov-mid : invariant schedutil, freq_max=midpoint sugov-P0 : invariant schedutil, freq_max=max_P perfgov : performance governor driver : acpi_cpufreq machine : AMD EPYC 7742 (Zen2, aka "Rome"), dual socket, 128 cores / 256 threads, SATA SSD storage, 250G of memory, XFS filesystem Benchmarks are described in the next section. Tilde (~) means the value is the same as baseline. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ondemand perfgov sugov-noinv sugov-max sugov-mid sugov-P0 better if - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - PERFORMANCE RATIOS tbench 1.00 1.44 0.90 0.87 0.93 0.93 higher dbench 1.00 0.91 0.95 0.94 0.94 1.06 lower kernbench 1.00 0.93 ~ ~ ~ 0.97 lower gitsource 1.00 0.66 0.97 0.96 ~ 0.95 lower - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - PERFORMANCE-PER-WATT RATIOS tbench 1.00 1.16 0.84 0.84 0.88 0.85 higher dbench 1.00 1.03 1.02 1.02 1.02 0.93 higher kernbench 1.00 1.05 ~ ~ ~ ~ higher gitsource 1.00 1.46 1.04 1.04 ~ 1.05 higher 2. DETAILED PERFORMANCE TABLES ============================== Benchmark : tbench4 (i.e. dbench4 over the network, actually loopback) Varying parameter : number of clients Unit : MB/sec (higher is better) 5.9.0-ondemand (BASELINE) 5.9.0-perfgov 5.9.0-sugov-noinv - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Hmean 1 427.19 +- 0.16% ( ) 778.35 +- 0.10% ( 82.20%) 346.92 +- 0.14% ( -18.79%) Hmean 2 853.82 +- 0.09% ( ) 1536.23 +- 0.03% ( 79.93%) 694.36 +- 0.05% ( -18.68%) Hmean 4 1657.54 +- 0.12% ( ) 2938.18 +- 0.12% ( 77.26%) 1362.81 +- 0.11% ( -17.78%) Hmean 8 3301.87 +- 0.06% ( ) 5679.10 +- 0.04% ( 72.00%) 2693.35 +- 0.04% ( -18.43%) Hmean 16 6139.65 +- 0.05% ( ) 9498.81 +- 0.04% ( 54.71%) 4889.97 +- 0.17% ( -20.35%) Hmean 32 11170.28 +- 0.09% ( ) 17393.25 +- 0.08% ( 55.71%) 9104.55 +- 0.09% ( -18.49%) Hmean 64 19322.97 +- 0.17% ( ) 31573.91 +- 0.08% ( 63.40%) 18552.52 +- 0.40% ( -3.99%) Hmean 128 30383.71 +- 0.11% ( ) 37416.91 +- 0.15% ( 23.15%) 25938.70 +- 0.41% ( -14.63%) Hmean 256 31143.96 +- 0.41% ( ) 30908.76 +- 0.88% ( -0.76%) 29754.32 +- 0.24% ( -4.46%) Hmean 512 30858.49 +- 0.26% ( ) 38524.60 +- 1.19% ( 24.84%) 42080.39 +- 0.56% ( 36.37%) Hmean 1024 39187.37 +- 0.19% ( ) 36213.86 +- 0.26% ( -7.59%) 39555.98 +- 0.12% ( 0.94%) 5.9.0-sugov-max 5.9.0-sugov-mid 5.9.0-sugov-P0 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Hmean 1 352.59 +- 1.03% ( -17.46%) 352.08 +- 0.75% ( -17.58%) 352.31 +- 1.48% ( -17.53%) Hmean 2 697.32 +- 0.08% ( -18.33%) 700.16 +- 0.20% ( -18.00%) 696.79 +- 0.06% ( -18.39%) Hmean 4 1369.88 +- 0.04% ( -17.35%) 1369.72 +- 0.07% ( -17.36%) 1365.91 +- 0.05% ( -17.59%) Hmean 8 2696.79 +- 0.04% ( -18.33%) 2711.06 +- 0.04% ( -17.89%) 2715.10 +- 0.61% ( -17.77%) Hmean 16 4725.03 +- 0.03% ( -23.04%) 4875.65 +- 0.02% ( -20.59%) 4953.05 +- 0.28% ( -19.33%) Hmean 32 9231.65 +- 0.10% ( -17.36%) 8704.89 +- 0.27% ( -22.07%) 10562.02 +- 0.36% ( -5.45%) Hmean 64 15364.27 +- 0.19% ( -20.49%) 17786.64 +- 0.15% ( -7.95%) 19665.40 +- 0.22% ( 1.77%) Hmean 128 42100.58 +- 0.13% ( 38.56%) 34946.28 +- 0.13% ( 15.02%) 38635.79 +- 0.06% ( 27.16%) Hmean 256 30660.23 +- 1.08% ( -1.55%) 32307.67 +- 0.54% ( 3.74%) 31153.27 +- 0.12% ( 0.03%) Hmean 512 24604.32 +- 0.14% ( -20.27%) 40408.50 +- 1.10% ( 30.95%) 38800.29 +- 1.23% ( 25.74%) Hmean 1024 35535.47 +- 0.28% ( -9.32%) 41070.38 +- 2.56% ( 4.81%) 31308.29 +- 2.52% ( -20.11%) Benchmark : dbench (filesystem stressor) Varying parameter : number of clients Unit : seconds (lower is better) NOTE-1: This dbench version measures the average latency of a set of filesystem operations, as we found the traditional dbench metric (throughput) to be misleading. NOTE-2: Due to high variability, we partition the original dataset and apply statistical bootrapping (a resampling method). Accuracy is reported in the form of 95% confidence intervals. 5.9.0-ondemand (BASELINE) 5.9.0-perfgov 5.9.0-sugov-noinv - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - SubAmean 1 98.79 +- 0.92 ( ) 83.36 +- 0.82 ( 15.62%) 84.82 +- 0.92 ( 14.14%) SubAmean 2 116.00 +- 0.89 ( ) 102.12 +- 0.77 ( 11.96%) 109.63 +- 0.89 ( 5.49%) SubAmean 4 149.90 +- 1.03 ( ) 132.12 +- 0.91 ( 11.86%) 143.90 +- 1.15 ( 4.00%) SubAmean 8 182.41 +- 1.13 ( ) 159.86 +- 0.93 ( 12.36%) 165.82 +- 1.03 ( 9.10%) SubAmean 16 237.83 +- 1.23 ( ) 219.46 +- 1.14 ( 7.72%) 229.28 +- 1.19 ( 3.59%) SubAmean 32 334.34 +- 1.49 ( ) 309.94 +- 1.42 ( 7.30%) 321.19 +- 1.36 ( 3.93%) SubAmean 64 576.61 +- 2.16 ( ) 540.75 +- 2.00 ( 6.22%) 551.27 +- 1.99 ( 4.39%) SubAmean 128 1350.07 +- 4.14 ( ) 1205.47 +- 3.20 ( 10.71%) 1280.26 +- 3.75 ( 5.17%) SubAmean 256 3444.42 +- 7.97 ( ) 3698.00 +- 27.43 ( -7.36%) 3494.14 +- 7.81 ( -1.44%) SubAmean 2048 39457.89 +- 29.01 ( ) 34105.33 +- 41.85 ( 13.57%) 39688.52 +- 36.26 ( -0.58%) 5.9.0-sugov-max 5.9.0-sugov-mid 5.9.0-sugov-P0 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - SubAmean 1 85.68 +- 1.04 ( 13.27%) 84.16 +- 0.84 ( 14.81%) 83.99 +- 0.90 ( 14.99%) SubAmean 2 108.42 +- 0.95 ( 6.54%) 109.91 +- 1.39 ( 5.24%) 112.06 +- 0.91 ( 3.39%) SubAmean 4 136.90 +- 1.04 ( 8.67%) 137.59 +- 0.93 ( 8.21%) 136.55 +- 0.95 ( 8.91%) SubAmean 8 163.15 +- 0.96 ( 10.56%) 166.07 +- 1.02 ( 8.96%) 165.81 +- 0.99 ( 9.10%) SubAmean 16 224.86 +- 1.12 ( 5.45%) 223.83 +- 1.06 ( 5.89%) 230.66 +- 1.19 ( 3.01%) SubAmean 32 320.51 +- 1.38 ( 4.13%) 322.85 +- 1.49 ( 3.44%) 321.96 +- 1.46 ( 3.70%) SubAmean 64 553.25 +- 1.93 ( 4.05%) 554.19 +- 2.08 ( 3.89%) 562.26 +- 2.22 ( 2.49%) SubAmean 128 1264.35 +- 3.72 ( 6.35%) 1256.99 +- 3.46 ( 6.89%) 2018.97 +- 18.79 ( -49.55%) SubAmean 256 3466.25 +- 8.25 ( -0.63%) 3450.58 +- 8.44 ( -0.18%) 5032.12 +- 38.74 ( -46.09%) SubAmean 2048 39133.10 +- 45.71 ( 0.82%) 39905.95 +- 34.33 ( -1.14%) 53811.86 +-193.04 ( -36.38%) Benchmark : kernbench (kernel compilation) Varying parameter : number of jobs Unit : seconds (lower is better) 5.9.0-ondemand (BASELINE) 5.9.0-perfgov 5.9.0-sugov-noinv - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Amean 2 471.71 +- 26.61% ( ) 409.88 +- 16.99% ( 13.11%) 430.63 +- 0.18% ( 8.71%) Amean 4 211.87 +- 0.58% ( ) 194.03 +- 0.74% ( 8.42%) 215.33 +- 0.64% ( -1.63%) Amean 8 109.79 +- 1.27% ( ) 101.43 +- 1.53% ( 7.61%) 111.05 +- 1.95% ( -1.15%) Amean 16 59.50 +- 1.28% ( ) 55.61 +- 1.35% ( 6.55%) 59.65 +- 1.78% ( -0.24%) Amean 32 34.94 +- 1.22% ( ) 32.36 +- 1.95% ( 7.41%) 35.44 +- 0.63% ( -1.43%) Amean 64 22.58 +- 0.38% ( ) 20.97 +- 1.28% ( 7.11%) 22.41 +- 1.73% ( 0.74%) Amean 128 17.72 +- 0.44% ( ) 16.68 +- 0.32% ( 5.88%) 17.65 +- 0.96% ( 0.37%) Amean 256 16.44 +- 0.53% ( ) 15.76 +- 0.32% ( 4.18%) 16.76 +- 0.60% ( -1.93%) Amean 512 16.54 +- 0.21% ( ) 15.62 +- 0.41% ( 5.53%) 16.84 +- 0.85% ( -1.83%) 5.9.0-sugov-max 5.9.0-sugov-mid 5.9.0-sugov-P0 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Amean 2 421.30 +- 0.24% ( 10.69%) 419.26 +- 0.15% ( 11.12%) 414.38 +- 0.33% ( 12.15%) Amean 4 217.81 +- 5.53% ( -2.80%) 211.63 +- 0.99% ( 0.12%) 208.43 +- 0.47% ( 1.63%) Amean 8 108.80 +- 0.43% ( 0.90%) 108.48 +- 1.44% ( 1.19%) 108.59 +- 3.08% ( 1.09%) Amean 16 58.84 +- 0.74% ( 1.12%) 58.37 +- 0.94% ( 1.91%) 57.78 +- 0.78% ( 2.90%) Amean 32 34.04 +- 2.00% ( 2.59%) 34.28 +- 1.18% ( 1.91%) 33.98 +- 2.21% ( 2.75%) Amean 64 22.22 +- 1.69% ( 1.60%) 22.27 +- 1.60% ( 1.38%) 22.25 +- 1.41% ( 1.47%) Amean 128 17.55 +- 0.24% ( 0.97%) 17.53 +- 0.94% ( 1.04%) 17.49 +- 0.43% ( 1.30%) Amean 256 16.51 +- 0.46% ( -0.40%) 16.48 +- 0.48% ( -0.19%) 16.44 +- 1.21% ( 0.00%) Amean 512 16.50 +- 0.35% ( 0.19%) 16.35 +- 0.42% ( 1.14%) 16.37 +- 0.33% ( 0.99%) Benchmark : gitsource (time to run the git unit test suite) Varying parameter : none Unit : seconds (lower is better) 5.9.0-ondemand (BASELINE) 5.9.0-perfgov 5.9.0-sugov-noinv - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Amean 1035.76 +- 0.30% ( ) 688.21 +- 0.04% ( 33.56%) 1003.85 +- 0.14% ( 3.08%) 5.9.0-sugov-max 5.9.0-sugov-mid 5.9.0-sugov-P0 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Amean 995.82 +- 0.08% ( 3.86%) 1011.98 +- 0.03% ( 2.30%) 986.87 +- 0.19% ( 4.72%) 3. POWER CONSUMPTION TABLE ========================== Average power consumption (watts). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ondemand perfgov sugov-noinv sugov-max sugov-mid sugov-P0 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - tbench4 227.25 281.83 244.17 236.76 241.50 247.99 dbench4 151.97 161.87 157.08 158.10 158.06 153.73 kernbench 162.78 167.22 162.90 164.19 164.65 164.72 gitsource 133.65 139.00 133.04 134.43 134.18 134.32 Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20201112182614.10700-3-ggherdovich@suse.cz
2020-12-11x86, sched: Calculate frequency invariance for AMD systemsNathan Fontenot
This is the first pass in creating the ability to calculate the frequency invariance on AMD systems. This approach uses the CPPC highest performance and nominal performance values that range from 0 - 255 instead of a high and base frquency. This is because we do not have the ability on AMD to get a highest frequency value. On AMD systems the highest performance and nominal performance vaues do correspond to the highest and base frequencies for the system so using them should produce an appropriate ratio but some tweaking is likely necessary. Due to CPPC being initialized later in boot than when the frequency invariant calculation is currently made, I had to create a callback from the CPPC init code to do the calculation after we have CPPC data. Special thanks to "kernel test robot <lkp@intel.com>" for reporting that compilation of drivers/acpi/cppc_acpi.c is conditional to CONFIG_ACPI_CPPC_LIB, not just CONFIG_ACPI. [ ggherdovich@suse.cz: made safe under CPU hotplug, edited changelog. ] Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20201112182614.10700-2-ggherdovich@suse.cz
2020-12-11ALSA: hda/realtek - Add supported for more Lenovo ALC285 Headset ButtonKailang Yang
Add supported for more Lenovo ALC285 Headset Button. Signed-off-by: Kailang Yang <kailang@realtek.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/bb1f1da1526d460885aa4257be81eb94@realtek.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-12-11ALSA: hda/ca0132 - Remove now unnecessary DSP setup functions.Connor McAdams
Now that the DSP's audio configuration is understood, remove previous hacky methods of trying to properly configure it. Signed-off-by: Connor McAdams <conmanx360@gmail.com> Link: https://lore.kernel.org/r/20201210160658.461739-6-conmanx360@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-12-11ALSA: hda/ca0132 - Ensure DSP is properly setup post-firmware download.Connor McAdams
Make sure that the DSP has no DMA channels allocated once the firmware is downloaded, and that the default audio streams in use by the DSP are setup in the correct order. Signed-off-by: Connor McAdams <conmanx360@gmail.com> Link: https://lore.kernel.org/r/20201210160658.461739-5-conmanx360@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-12-11ALSA: hda/ca0132 - Add 8051 exram helper functions.Connor McAdams
Add functions for both reading and writing to the 8051's exram. Also, add a little bit of documentation on how the addresses are segmented. Signed-off-by: Connor McAdams <conmanx360@gmail.com> Link: https://lore.kernel.org/r/20201210160658.461739-4-conmanx360@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-12-11ALSA: hda/ca0132 - Add stream port remapping function.Connor McAdams
Add function for remapping a ChipIO stream's ports. Also include some documentation as to how this works. Signed-off-by: Connor McAdams <conmanx360@gmail.com> Link: https://lore.kernel.org/r/20201210160658.461739-3-conmanx360@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-12-11ALSA: hda/ca0132 - Reset codec upon initialization.Connor McAdams
Reset the codec upon initialization to clear out anything that may have been setup on a previous boot into Windows, or in case of an improper shutdown. Signed-off-by: Connor McAdams <conmanx360@gmail.com> Link: https://lore.kernel.org/r/20201210160658.461739-2-conmanx360@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-12-11Merge tag 'extcon-next-for-5.11' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-next Chanwoo writes: Update for extcon-next v5.11 1. Add new TI TUSB320 USB-C extcon driver - The extcon-usbc-tusb320.c driver for the TI TUSB320 USB Type-C device support the USB Type C connector detection. 2. Rewrite binding document in yaml for extcon-fsa9480.c and add new compatible name of TI TSU6111 device. 3. Fix moalias string of extcon-max77693.c to fix the automated module loading when this driver is compiled as a module. * tag 'extcon-next-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon: extcon: max77693: Fix modalias string extcon: fsa9480: Support TI TSU6111 variant extcon: fsa9480: Rewrite bindings in YAML and extend dt-bindings: extcon: add binding for TUSB320 extcon: Add driver for TI TUSB320
2020-12-11extcon: max77693: Fix modalias stringMarek Szyprowski
The platform device driver name is "max77693-muic", so advertise it properly in the modalias string. This fixes automated module loading when this driver is compiled as a module. Fixes: db1b9037424b ("extcon: MAX77693: Add extcon-max77693 driver to support Maxim MAX77693 MUIC device") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-12-11extcon: fsa9480: Support TI TSU6111 variantLinus Walleij
The Texas Instruments TSU6111 is compatible to the FSA880/FSA9480. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-12-11extcon: fsa9480: Rewrite bindings in YAML and extendLinus Walleij
This rewrites the FSA9480 DT bindings using YAML and extends them with the compatible TI TSU6111. I chose to name the file fcs,fsa880 since this is the first switch, later versions are improvements. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-12-11dt-bindings: extcon: add binding for TUSB320Michael Auchter
Add a device tree binding for the TI TUSB320. Signed-off-by: Michael Auchter <michael.auchter@ni.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-12-11extcon: Add driver for TI TUSB320Michael Auchter
This patch adds an extcon driver for the TI TUSB320 USB Type-C device. This can be used to detect whether the port is configured as a downstream or upstream facing port. Signed-off-by: Michael Auchter <michael.auchter@ni.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-12-10selftests/bpf: Silence ima_setup.sh when not running in verbose mode.KP Singh
Currently, ima_setup.sh spews outputs from commands like mkfs and dd on the terminal without taking into account the verbosity level of the test framework. Update test_progs to set the environment variable SELFTESTS_VERBOSE=1 when a verbose output is requested. This environment variable is then used by ima_setup.sh (and can be used by other similar scripts) to obey the verbosity level of the test harness without needing to re-implement command line options for verbosity. In "silent" mode, the script saves the output to a temporary file, the contents of which are echoed back to stderr when the script encounters an error. Fixes: 34b82d3ac105 ("bpf: Add a selftest for bpf_ima_inode_hash") Reported-by: Andrii Nakryiko <andrii@kernel.org> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201211010711.3716917-1-kpsingh@kernel.org
2020-12-10selftests/bpf: Drop the need for LLVM's llcAndrew Delgadillo
LLC is meant for compiler development and debugging. Consequently, it exposes many low level options about its backend. To avoid future bugs introduced by using the raw LLC tool, use clang directly so that all appropriate options are passed to the back end. Additionally, simplify the Makefile by removing the CLANG_NATIVE_BPF_BUILD_RULE as it is not being use, stop passing dwarfris attr since elfutils/libdw now supports the bpf backend (which should work with any recent pahole), and stop passing alu32 since -mcpu=v3 implies alu32. Signed-off-by: Andrew Delgadillo <adelg@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201211004344.3355074-1-adelg@google.com
2020-12-10selftests/bpf: fix bpf_testmod.ko recompilation logicAndrii Nakryiko
bpf_testmod.ko build rule declared dependency on VMLINUX_BTF, but the variable itself was initialized after the rule was declared, which often caused bpf_testmod.ko to not be re-compiled. Fix by moving VMLINUX_BTF determination sooner. Also enforce bpf_testmod.ko recompilation when we detect that vmlinux image changed by removing bpf_testmod/bpf_testmod.ko. This is necessary to generate correct module's split BTF. Without it, Kbuild's module build logic might determine that nothing changed on the kernel side and thus bpf_testmod.ko shouldn't be rebuilt, so won't re-generate module BTF, which often leads to module's BTF with wrong string offsets against vmlinux BTF. Removing .ko file forces Kbuild to re-build the module. Reported-by: Alexei Starovoitov <ast@kernel.org> Fixes: 9f7fa225894c ("selftests/bpf: Add bpf_testmod kernel module for testing") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20201211015946.4062098-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-12-10Merge tag 'drm-fixes-2020-12-11' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Last week of fixes, just amdgpu and i915 collections. We had a i915 regression reported by HJ Lu reported this morning, and this contains a fix for that he has tested. There are a fair few other fixes, but they are spread across the two drivers, and all fairly self contained. amdgpu: - Fan fix for CI asics - Fix a warning in possible_crtcs - Build fix for when debugfs is disabled - Display overflow fix - Display watermark fixes for Renoir - SDMA 5.2 fix - Stolen vga memory regression fix - Power profile fixes - Fix a regression from removal of GEM and PRIME callbacks amdkfd: - Fix a memory leak in dmabuf import i915: - rc7 regression fix for modesetting - vdsc/dp slice fixes - gen9 mocs entries fix - preemption timeout fix - unsigned compare against 0 fix - selftest fix - submission error propogatig fix - request flow suspend fix" * tag 'drm-fixes-2020-12-11' of git://anongit.freedesktop.org/drm/drm: drm/i915/display: Go softly softly on initial modeset failure drm/amd/pm: typo fix (CUSTOM -> COMPUTE) drm/amdgpu: Initialise drm_gem_object_funcs for imported BOs drm/amdgpu: fix size calculation with stolen vga memory drm/amd/pm: update smu10.h WORKLOAD_PPLIB setting for raven drm/amdkfd: Fix leak in dmabuf import drm/amdgpu: fix sdma instance fw version and feature version init drm/amd/display: Add wm table for Renoir drm/amd/display: Prevent bandwidth overflow drm/amdgpu: fix debugfs creation/removal, again drm/amdgpu/disply: set num_crtc earlier drm/amdgpu/powerplay: parse fan table for CI asics drm/i915/gt: Declare gen9 has 64 mocs entries! drm/i915/display/dp: Compute the correct slice count for VDSC on DP drm/i915: fix size_t greater or equal to zero comparison drm/i915/gt: Cancel the preemption timeout on responding to it drm/i915/gt: Ignore repeated attempts to suspend request flow across reset drm/i915/gem: Propagate error from cancelled submit due to context closure drm/i915/gem: Check the correct variable in selftest
2020-12-10RISC-V: Define get_cycles64() regardless of M-modePalmer Dabbelt
The timer driver uses get_cycles64() unconditionally to obtain the current time. A recent refactoring lost the common definition for some configs, which is now the only one we need. Fixes: d5be89a8d118 ("RISC-V: Resurrect the MMIO timer implementation for M-mode systems") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-12-10Merge tag 'ktest-v5.10-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest Pull ktest fix from Steven Rostedt: "Fix issues with grub2bls in ktest.pl ktest.pl did not know about grub2bls that was introduced in Fedora 30, and now it does" * tag 'ktest-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest: ktest.pl: Fix incorrect reboot for grub2bls
2020-12-10Merge tag 'powerpc-5.10-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "One commit to implement copy_from_kernel_nofault_allowed(), otherwise copy_from_kernel_nofault() can trigger warnings when accessing bad addresses in some configurations. Thanks to Christophe Leroy and Qian Cai" * tag 'powerpc-5.10-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Fix KUAP warning by providing copy_from_kernel_nofault_allowed()
2020-12-10Merge tag 'fixes-v5.10a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull namespaced fscaps fix from James Morris: "Fix namespaced fscaps when !CONFIG_SECURITY (Serge Hallyn)" * tag 'fixes-v5.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: [SECURITY] fix namespaced fscaps when !CONFIG_SECURITY
2020-12-11drm/i915/display: Go softly softly on initial modeset failureChris Wilson
Reduce the module/device probe error into a mere debug to hide issues where the initial modeset is failing (after lies told by hw probe) and the system hangs with a livelock in cleaning up the failed commit. Reported-by: H.J. Lu <hjl.tools@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210619 Fixes: b3bf99daaee9 ("drm/i915/display: Defer initial modeset until after GGTT is initialised") Fixes: ccc9e67ab26f ("drm/i915/display: Defer initial modeset until after GGTT is initialised") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Dave Airlie <airlied@redhat.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201210230741.17140-1-chris@chris-wilson.co.uk
2020-12-11Merge tag 'drm-intel-fixes-2020-12-09' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes Fixes for VDSC/DP, selftests, shmem_utils, preemption, submission, and gt reset: - Check the correct variable in selftest (Dan) - Propagate error from canceled submit due to context closure (Chris) - Ignore repeated attempts to suspend request flow across reset (Chris) - Cancel the preemption timeout on responding to it (Chris) - Fix unsigned compared against 0 (Colin) - Compute the correct slice count for VDSC on DP (Manasi) - Declar gen9 has 64 mocs entries (Chris) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201209235010.GA10554@intel.com
2020-12-11Merge tag 'amd-drm-fixes-5.10-2020-12-09' of ↵Dave Airlie
git://people.freedesktop.org/~agd5f/linux into drm-fixes amd-drm-fixes-5.10-2020-12-09: amdgpu: - Fan fix for CI asics - Fix a warning in possible_crtcs - Build fix for when debugfs is disabled - Display overflow fix - Display watermark fixes for Renoir - SDMA 5.2 fix - Stolen vga memory regression fix - Power profile fixes - Fix a regression from removal of GEM and PRIME callbacks amdkfd: - Fix a memory leak in dmabuf import Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201210034848.18108-1-alexander.deucher@amd.com
2020-12-10Merge tag 'nfs-for-5.10-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "Here are a handful more bugfixes for 5.10. Unfortunately, we found some problems with the new READ_PLUS operation that aren't easy to fix. We've decided to disable this codepath through a Kconfig option for now, but a series of patches going into 5.11 will clean up the code and fix the issues at the same time. This seemed like the best way to go about it. Summary: - Fix array overflow when flexfiles mirroring is enabled - Fix rpcrdma_inline_fixup() crash with new LISTXATTRS - Fix 5 second delay when doing inter-server copy - Disable READ_PLUS by default" * tag 'nfs-for-5.10-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Disable READ_PLUS by default NFSv4.2: Fix 5 seconds delay when doing inter server copy NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabled
2020-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) IPsec compat fixes, from Dmitry Safonov. 2) Fix memory leak in xfrm_user_policy(). Fix from Yu Kuai. 3) Fix polling in xsk sockets by using sk_poll_wait() instead of datagram_poll() which keys off of sk_wmem_alloc and such which xsk sockets do not update. From Xuan Zhuo. 4) Missing init of rekey_data in cfgh80211, from Sara Sharon. 5) Fix destroy of timer before init, from Davide Caratti. 6) Missing CRYPTO_CRC32 selects in ethernet driver Kconfigs, from Arnd Bergmann. 7) Missing error return in rtm_to_fib_config() switch case, from Zhang Changzhong. 8) Fix some src/dest address handling in vrf and add a testcase. From Stephen Suryaputra. 9) Fix multicast handling in Seville switches driven by mscc-ocelot driver. From Vladimir Oltean. 10) Fix proto value passed to skb delivery demux in udp, from Xin Long. 11) HW pkt counters not reported correctly in enetc driver, from Claudiu Manoil. 12) Fix deadlock in bridge, from Joseph Huang. 13) Missing of_node_pur() in dpaa2 driver, fromn Christophe JAILLET. 14) Fix pid fetching in bpftool when there are a lot of results, from Andrii Nakryiko. 15) Fix long timeouts in nft_dynset, from Pablo Neira Ayuso. 16) Various stymmac fixes, from Fugang Duan. 17) Fix null deref in tipc, from Cengiz Can. 18) When mss is biog, coose more resonable rcvq_space in tcp, fromn Eric Dumazet. 19) Revert a geneve change that likely isnt necessary, from Jakub Kicinski. 20) Avoid premature rx buffer reuse in various Intel driversm from Björn Töpel. 21) retain EcT bits during TIS reflection in tcp, from Wei Wang. 22) Fix Tso deferral wrt. cwnd limiting in tcp, from Neal Cardwell. 23) MPLS_OPT_LSE_LABEL attribute is 342 ot 8 bits, from Guillaume Nault 24) Fix propagation of 32-bit signed bounds in bpf verifier and add test cases, from Alexei Starovoitov. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits) selftests: fix poll error in udpgro.sh selftests/bpf: Fix "dubious pointer arithmetic" test selftests/bpf: Fix array access with signed variable test selftests/bpf: Add test for signed 32-bit bound check bug bpf: Fix propagation of 32-bit signed bounds from 64-bit bounds. MAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower net/mlx4_en: Handle TX error CQE net/mlx4_en: Avoid scheduling restart task if it is already running tcp: fix cwnd-limited bug for TSO deferral where we send nothing net: flow_offload: Fix memory leak for indirect flow block tcp: Retain ECT bits for tos reflection ethtool: fix stack overflow in ethnl_parse_bitset() e1000e: fix S0ix flow to allow S0i3.2 subset entry ice: avoid premature Rx buffer reuse ixgbe: avoid premature Rx buffer reuse i40e: avoid premature Rx buffer reuse igb: avoid transmit queue timeout in xdp path igb: use xdp_do_flush igb: skb add metasize for xdp ...
2020-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2020-12-10 The following pull-request contains BPF updates for your *net* tree. We've added 21 non-merge commits during the last 12 day(s) which contain a total of 21 files changed, 163 insertions(+), 88 deletions(-). The main changes are: 1) Fix propagation of 32-bit signed bounds from 64-bit bounds, from Alexei. 2) Fix ring_buffer__poll() return value, from Andrii. 3) Fix race in lwt_bpf, from Cong. 4) Fix test_offload, from Toke. 5) Various xsk fixes. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Cong Wang, Hulk Robot, Jakub Kicinski, Jean-Philippe Brucker, John Fastabend, Magnus Karlsson, Maxim Mikityanskiy, Yonghong Song ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-10samples/bpf: Fix possible hang in xdpsock with multiple threadsMagnus Karlsson
Fix a possible hang in xdpsock that can occur when using multiple threads. In this case, one or more of the threads might get stuck in the while-loop in tx_only after the user has signaled the main thread to stop execution. In this case, no more Tx packets will be sent, so a thread might get stuck in the aforementioned while-loop. Fix this by introducing a test inside the while-loop to check if the benchmark has been terminated. If so, return from the function. Fixes: cd9e72b6f210 ("samples/bpf: xdpsock: Add option to specify batch size") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201210163407.22066-1-magnus.karlsson@gmail.com
2020-12-10x86/ioapic: Cleanup the timer_works() irqflags messThomas Gleixner
Mark tripped over the creative irqflags handling in the IO-APIC timer delivery check which ends up doing: local_irq_save(flags); local_irq_enable(); local_irq_restore(flags); which triggered a new consistency check he's working on required for replacing the POPF based restore with a conditional STI. That code is a historical mess and none of this is needed. Make it straightforward use local_irq_disable()/enable() as that's all what is required. It is invoked from interrupt enabled code nowadays. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/87k0tpju47.fsf@nanos.tec.linutronix.de
2020-12-10x86/apic/vector: Fix ordering in vector assignmentThomas Gleixner
Prarit reported that depending on the affinity setting the ' irq $N: Affinity broken due to vector space exhaustion.' message is showing up in dmesg, but the vector space on the CPUs in the affinity mask is definitely not exhausted. Shung-Hsi provided traces and analysis which pinpoints the problem: The ordering of trying to assign an interrupt vector in assign_irq_vector_any_locked() is simply wrong if the interrupt data has a valid node assigned. It does: 1) Try the intersection of affinity mask and node mask 2) Try the node mask 3) Try the full affinity mask 4) Try the full online mask Obviously #2 and #3 are in the wrong order as the requested affinity mask has to take precedence. In the observed cases #1 failed because the affinity mask did not contain CPUs from node 0. That made it allocate a vector from node 0, thereby breaking affinity and emitting the misleading message. Revert the order of #2 and #3 so the full affinity mask without the node intersection is tried before actually affinity is broken. If no node is assigned then only the full affinity mask and if that fails the full online mask is tried. Fixes: d6ffc6ac83b1 ("x86/vector: Respect affinity mask in irq descriptor") Reported-by: Prarit Bhargava <prarit@redhat.com> Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87ft4djtyp.fsf@nanos.tec.linutronix.de
2020-12-10Merge branch 'add-ppp_generic-ioctls-to-bridge-channels'David S. Miller
Tom Parkin says: ==================== add ppp_generic ioctl(s) to bridge channels Following on from my previous RFC[1], this series adds two ioctl calls to the ppp code to implement "channel bridging". When two ppp channels are bridged, frames presented to ppp_input() on one channel are passed to the other channel's ->start_xmit function for transmission. The primary use-case for this functionality is in an L2TP Access Concentrator where PPP frames are typically presented in a PPPoE session (e.g. from a home broadband user) and are forwarded to the ISP network in a PPPoL2TP session. The two new ioctls, PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN form a symmetric pair. Userspace code testing and illustrating use of the ioctl calls is available in the go-l2tp[2] and l2tp-ktest[3] repositories. [1]. Previous RFC series: https://lore.kernel.org/netdev/20201106181647.16358-1-tparkin@katalix.com/ [2]. go-l2tp: a Go library for building L2TP applications on Linux systems. Support for the PPPIOCBRIDGECHAN ioctl is on a branch: https://github.com/katalix/go-l2tp/tree/tp_002_pppoe_2 [3]. l2tp-ktest: a test suite for the Linux Kernel L2TP subsystem. Support for the PPPIOCBRIDGECHAN ioctl is on a branch: https://github.com/katalix/l2tp-ktest/tree/tp_ac_pppoe_tests_2 Changelog: v4: * Fix NULL-pointer access in PPPIOCBRIDGECHAN in the case that the ID of the channel to be bridged wasn't found. * Add comment in ppp_unbridge_channels to better document the unbridge process. v3: * Use rcu_dereference_protected for accessing struct channel 'bridge' field during updates with lock 'upl' held. * Avoid race in ppp_unbridge_channels by ensuring that each channel in the bridge points to it's peer before decrementing refcounts. v2: * Add missing __rcu annotation to struct channel 'bridge' field in order to squash a sparse warning from a C=1 build * Integrate review comments from gnault@redhat.com * Have ppp_unbridge_channels return -EINVAL if the channel isn't part of a bridge: this better aligns with the return code from ppp_disconnect_channel. * Improve docs update by including information on ioctl arguments and error return codes. ==================== Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-10docs: update ppp_generic.rst to document new ioctlsTom Parkin
Add documentation of the newly-added PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-10ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctlsTom Parkin
This new ioctl pair allows two ppp channels to be bridged together: frames arriving in one channel are transmitted in the other channel and vice versa. The practical use for this is primarily to support the L2TP Access Concentrator use-case. The end-user session is presented as a ppp channel (typically PPPoE, although it could be e.g. PPPoA, or even PPP over a serial link) and is switched into a PPPoL2TP session for transmission to the LNS. At the LNS the PPP session is terminated in the ISP's network. When a PPP channel is bridged to another it takes a reference on the other's struct ppp_file. This reference is dropped when the channels are unbridged, which can occur either explicitly on userspace calling the PPPIOCUNBRIDGECHAN ioctl, or implicitly when either channel in the bridge is unregistered. In order to implement the channel bridge, struct channel is extended with a new field, 'bridge', which points to the other struct channel making up the bridge. This pointer is RCU protected to avoid adding another lock to the data path. To guard against concurrent writes to the pointer, the existing struct channel lock 'upl' coverage is extended rather than adding a new lock. The 'upl' lock is used to protect the existing unit pointer. Since the bridge effectively replaces the unit (they're mutually exclusive for a channel) it makes coding easier to use the same lock to cover them both. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-10NFS: Disable READ_PLUS by defaultAnna Schumaker
We've been seeing failures with xfstests generic/091 and generic/263 when using READ_PLUS. I've made some progress on these issues, and the tests fail later on but still don't pass. Let's disable READ_PLUS by default until we can work out what is going on. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-12-10NFSv4.2: Fix 5 seconds delay when doing inter server copyDai Ngo
Since commit b4868b44c5628 ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5 seconds delay regardless of the size of the copy. The delay is from nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential fails because the seqid in both nfs4_state and nfs4_stateid are 0. Fix __nfs42_ssc_open to delay setting of NFS_OPEN_STATE in nfs4_state, until after the call to update_open_stateid, to indicate this is the 1st open. This fix is part of a 2 patches, the other patch is the fix in the source server to return the stateid for COPY_NOTIFY request with seqid 1 instead of 0. Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-12-10NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operationChuck Lever
By switching to an XFS-backed export, I am able to reproduce the ibcomp worker crash on my client with xfstests generic/013. For the failing LISTXATTRS operation, xdr_inline_pages() is called with page_len=12 and buflen=128. - When ->send_request() is called, rpcrdma_marshal_req() does not set up a Reply chunk because buflen is smaller than the inline threshold. Thus rpcrdma_convert_iovs() does not get invoked at all and the transport's XDRBUF_SPARSE_PAGES logic is not invoked on the receive buffer. - During reply processing, rpcrdma_inline_fixup() tries to copy received data into rq_rcv_buf->pages because page_len is positive. But there are no receive pages because rpcrdma_marshal_req() never allocated them. The result is that the ibcomp worker faults and dies. Sometimes that causes a visible crash, and sometimes it results in a transport hang without other symptoms. RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and should eventually be fixed or replaced. However, my preference is that upper-layer operations should explicitly allocate their receive buffers (using GFP_KERNEL) when possible, rather than relying on XDRBUF_SPARSE_PAGES. Reported-by: Olga kornievskaia <kolga@netapp.com> Suggested-by: Olga kornievskaia <kolga@netapp.com> Fixes: c10a75145feb ("NFSv4.2: add the extended attribute proc functions.") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga kornievskaia <kolga@netapp.com> Reviewed-by: Frank van der Linden <fllinden@amazon.com> Tested-by: Olga kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-12-10rtnetlink: RCU-annotate both dimensions of rtnl_msg_handlersJakub Kicinski
We use rcu_assign_pointer to assign both the table and the entries, but the entries are not marked as __rcu. This generates sparse warnings. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-10Revert "macb: support the two tx descriptors on at91rm9200"Willy Tarreau
This reverts commit 0a4e9ce17ba77847e5a9f87eed3c0ba46e3f82eb. The code was developed and tested on an MSC313E SoC, which seems to be half-way between the AT91RM9200 and the AT91SAM9260 in that it supports both the 2-descriptors mode and a Tx ring. It turns out that after the code was merged I could notice that the controller would sometimes lock up, and only when dealing with sustained bidirectional transfers, in which case it would report a Tx overrun condition right after having reported being ready, and will stop sending even after the status is cleared (a down/up cycle fixes it though). After adding lots of traces I couldn't spot a sequence pattern allowing to predict that this situation would happen. The chip comes with no documentation and other bits are often reported with no conclusive pattern either. It is possible that my change is wrong just like it is possible that the controller on the chip is bogus or at least unpredictable based on existing docs from other chips. I do not have an RM9200 at hand to test at the moment and a few tests run on a more recent 9G20 indicate that this code path cannot be used there to test the code on a 3rd platform. Since the MSC313E works fine in the single-descriptor mode, and that people using the old RM9200 very likely favor stability over performance, better revert this patch until we can test it on the original platform this part of the driver was written for. Note that the reverted patch was actually tested on MSC313E. Cc: Nicolas Ferre <nicolas.ferre@microchip.com> Cc: Claudiu Beznea <claudiu.beznea@microchip.com> Cc: Daniel Palmer <daniel@0x0f.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/netdev/20201206092041.GA10646@1wt.eu/ Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-10net: qualcomm: rmnet: Update rmnet device MTU based on real deviceSubash Abhinov Kasiviswanathan
Packets sent by rmnet to the real device have variable MAP header lengths based on the data format configured. This patch adds checks to ensure that the real device MTU is sufficient to transmit the MAP packet comprising of the MAP header and the IP packet. This check is enforced when rmnet devices are created and updated and during MTU updates of both the rmnet and real device. Additionally, rmnet devices now have a default MTU configured which accounts for the real device MTU and the headroom based on the data format. Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Tested-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>