summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-04-03x86/perf/amd: Resolve NMI latency issues for active PMCsLendacky, Thomas
On AMD processors, the detection of an overflowed PMC counter in the NMI handler relies on the current value of the PMC. So, for example, to check for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed). When the perf NMI handler executes it does not know in advance which PMC counters have overflowed. As such, the NMI handler will process all active PMC counters that have overflowed. NMI latency in newer AMD processors can result in multiple overflowed PMC counters being processed in one NMI and then a subsequent NMI, that does not appear to be a back-to-back NMI, not finding any PMC counters that have overflowed. This may appear to be an unhandled NMI resulting in either a panic or a series of messages, depending on how the kernel was configured. To mitigate this issue, add an AMD handle_irq callback function, amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq() function and upon return perform some additional processing that will indicate if the NMI has been handled or would have been handled had an earlier NMI not handled the overflowed PMC. Using a per-CPU variable, a minimum value of the number of active PMCs or 2 will be set whenever a PMC is active. This is used to indicate the possible number of NMIs that can still occur. The value of 2 is used for when an NMI does not arrive at the LAPIC in time to be collapsed into an already pending NMI. Each time the function is called without having handled an overflowed counter, the per-CPU value is checked. If the value is non-zero, it is decremented and the NMI indicates that it handled the NMI. If the value is zero, then the NMI indicates that it did not handle the NMI. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03x86/perf/amd: Resolve race condition when disabling PMCLendacky, Thomas
On AMD processors, the detection of an overflowed counter in the NMI handler relies on the current value of the counter. So, for example, to check for overflow on a 48 bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed). There is currently a race condition present when disabling and then updating the PMC. Increased NMI latency in newer AMD processors makes this race condition more pronounced. If the counter value has overflowed, it is possible to update the PMC value before the NMI handler can run. The updated PMC value is not an overflowed value, so when the perf NMI handler does run, it will not find an overflowed counter. This may appear as an unknown NMI resulting in either a panic or a series of messages, depending on how the kernel is configured. To eliminate this race condition, the PMC value must be checked after disabling the counter. Add an AMD function, amd_pmu_disable_all(), that will wait for the NMI handler to reset any active and overflowed counter after calling x86_pmu_disable_all(). Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03perf/x86/intel: Initialize TFA MSRPeter Zijlstra
Stephane reported that the TFA MSR is not initialized by the kernel, but the TFA bit could set by firmware or as a leftover from a kexec, which makes the state inconsistent. Reported-by: Stephane Eranian <eranian@google.com> Tested-by: Nelson DSouza <nelson.dsouza@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: tonyj@suse.com Link: https://lkml.kernel.org/r/20190321123849.GN6521@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBSStephane Eranian
When an event is programmed with attr.wakeup_events=N (N>0), it means the caller is interested in getting a user level notification after N samples have been recorded in the kernel sampling buffer. With precise events on Intel processors, the kernel uses PEBS. The kernel tries minimize sampling overhead by verifying if the event configuration is compatible with multi-entry PEBS mode. If so, the kernel is notified only when the buffer has reached its threshold. Other PEBS operates in single-entry mode, the kenrel is notified for each PEBS sample. The problem is that the current implementation look at frequency mode and event sample_type but ignores the wakeup_events field. Thus, it may not be possible to receive a notification after each precise event. This patch fixes this problem by disabling multi-entry PEBS if wakeup_events is non-zero. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: https://lkml.kernel.org/r/20190306195048.189514-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03sched/fair: Do not re-read ->h_load_next during hierarchical load calculationMel Gorman
A NULL pointer dereference bug was reported on a distribution kernel but the same issue should be present on mainline kernel. It occured on s390 but should not be arch-specific. A partial oops looks like: Unable to handle kernel pointer dereference in virtual kernel address space ... Call Trace: ... try_to_wake_up+0xfc/0x450 vhost_poll_wakeup+0x3a/0x50 [vhost] __wake_up_common+0xbc/0x178 __wake_up_common_lock+0x9e/0x160 __wake_up_sync_key+0x4e/0x60 sock_def_readable+0x5e/0x98 The bug hits any time between 1 hour to 3 days. The dereference occurs in update_cfs_rq_h_load when accumulating h_load. The problem is that cfq_rq->h_load_next is not protected by any locking and can be updated by parallel calls to task_h_load. Depending on the compiler, code may be generated that re-reads cfq_rq->h_load_next after the check for NULL and then oops when reading se->avg.load_avg. The dissassembly showed that it was possible to reread h_load_next after the check for NULL. While this does not appear to be an issue for later compilers, it's still an accident if the correct code is generated. Full locking in this path would have high overhead so this patch uses READ_ONCE to read h_load_next only once and check for NULL before dereferencing. It was confirmed that there were no further oops after 10 days of testing. As Peter pointed out, it is also necessary to use WRITE_ONCE() to avoid any potential problems with store tearing. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Fixes: 685207963be9 ("sched: Move h_load calculation to task_h_load()") Link: https://lkml.kernel.org/r/20190319123610.nsivgf3mjbjjesxb@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03mfd: sun6i-prcm: Allow to compile with COMPILE_TESTMaxime Ripard
Since this driver only has a dependency on ARCH_SUNXI just because it doesn't make any sense to run it on something else, we can definitely enable it through COMPILE_TEST as well to get some build coverage. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-04-02Merge tag 'pidfd-fixes-v5.1-rc3' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux Pull pidfd fix from Christian Brauner: "This should be an uncontroversial fix for pidfd_send_signal() by Jann to better align it's behavior with other signal sending functions: In one of the early versions of the patchset it was suggested to not unconditionally error out when a signal with SI_USER is sent to a non-current task (cf. [1]). Instead, pidfd_send_signal() currently silently changes this to a regular kill signal. While this is technically fine, the semantics are weird since the kernel just silently converts a user's request behind their back and also no other signal sending function allows to do this. It gets more hairy when we introduce sending signals to a specific thread soon. So let's align pidfd_send_signal() with all the other signal sending functions and error out when SI_USER signals are sent to a non-current task" * tag 'pidfd-fixes-v5.1-rc3' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux: signal: don't silently convert SI_USER signals to non-current pidfd
2019-04-03ASoC: Intel: cht_bsw_max98090_ti: Enable codec clock once and keep it enabledHans de Goede
Users have been seeing sound stability issues with max98090 codecs since: commit 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL") At first that commit broke sound for Chromebook Swanky and Clapper models, the problem was that the machine-driver has been controlling the wrong clock on those models since support for them was added. This was hidden by clk-pmc-atom.c keeping the actual clk on unconditionally. With the machine-driver controlling the proper clock, sound works again but we are seeing bug reports describing it as: low volume, "sounds like played at 10x speed" and instable. When these issues are hit the following message is seen in dmesg: "max98090 i2c-193C9890:00: PLL unlocked". Attempts have been made to fix this by inserting a delay between enabling the clk and enabling and checking the pll, but this has not helped. It seems that at least on boards which use pmc_plt_clk_0 as clock, if we ever disable the clk, the pll looses its lock and after that we get various issues. This commit fixes this by enabling the clock once at probe time on these boards. In essence this restores the old behavior of clk-pmc-atom.c always keeping the clk on on these boards. Fixes: 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL") Reported-by: Mogens Jensen <mogens-jensen@protonmail.com> Reported-by: Dean Wallace <duffydack73@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-04-02Merge tag 'hwmon-for-v5.1-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: "Couple of minor hwmon fixes" * tag 'hwmon-for-v5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: dt-bindings: hwmon: (adc128d818) Specify ti,mode property size hwmon: (ntc_thermistor) Fix temperature type reporting hwmon: (occ) Fix power sensor indexing hwmon: (w83773g) Select REGMAP_I2C to fix build error
2019-04-02Update Nicolas Pitre's email addressNicolas Pitre
The @linaro version won't be valid much longer. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-03ASoC: wm_adsp: Check for buffer in trigger stopCharles Keepax
Trigger stop can be called in situations where trigger start failed and as such it can't be assumed the buffer is already attached to the compressed stream or a NULL pointer may be dereferenced. Fixes: 639e5eb3c7d6 ("ASoC: wm_adsp: Correct handling of compressed streams that restart") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-04-02rtc: da9063: set uie_unsupported when relevantAlexandre Belloni
The DA9063AD doesn't support alarms on any seconds and its granularity is the minute. Set uie_unsupported in that case. Reported-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Steve Twiss <stwiss.opensource@diasemi.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2019-04-02drm/amdgpu: remove unnecessary rlc reset function on gfx9Le Ma
The rlc reset function is not necessary during gfx9 initialization/resume phase. And this function would even cause rlc fw loading failed on some gfx9 ASIC. Remove this function safely with verification well on Vega/Raven platform. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-04-02Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Fixes 2019-04-01 This series contains two fixes for XDP in the i40e driver. Björn provides both fixes, first moving a function out of the header and into the main.c file. Second fixes a regression introduced in an earlier patch that removed umem from the VSI. This caused an issue because the setup code would try to enable AF_XDP zero copy unconditionally, as long as there was a umem placed in the netdev receive structure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-02ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev typeSheena Mira-ato
The device type for ip6 tunnels is set to ARPHRD_TUNNEL6. However, the ip4ip6_err function is expecting the device type of the tunnel to be ARPHRD_TUNNEL. Since the device types do not match, the function exits and the ICMP error packet is not sent to the originating host. Note that the device type for IPv4 tunnels is set to ARPHRD_TUNNEL. Fix is to expect a tunnel device type of ARPHRD_TUNNEL6 instead. Now the tunnel device type matches and the ICMP error packet is sent to the originating host. Signed-off-by: Sheena Mira-ato <sheena.mira-ato@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-02blk-mq: add trace block plug and unplug for multiple queuesYufen Yu
For now, we just trace plug for single queue device or drivers provide .commit_rqs, and have not trace plug for multiple queues device. But, unplug events will be recorded when call blk_mq_flush_plug_list(). Then, trace events will be asymmetrical, just have unplug and without plug. This patch add trace plug and unplug for multiple queues device in blk_mq_make_request(). After that, we can accurately trace plug and unplug for multiple queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-02block: use blk_free_flush_queue() to free hctx->fq in blk_mq_init_hctxShenghui Wang
kfree() can leak the hctx->fq->flush_rq field. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Shenghui Wang <shhuiw@foxmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-02ALSA: uapi: #include <time.h> in asound.hDaniel Mentz
The uapi header asound.h defines types based on struct timespec. We need to #include <time.h> to get access to the definition of this struct. Previously, we encountered the following error message when building applications with a clang/bionic toolchain: kernel-headers/sound/asound.h:350:19: error: field has incomplete type 'struct timespec' struct timespec trigger_tstamp; ^ The absence of the time.h #include statement does not cause build errors with glibc, because its version of stdlib.h indirectly includes time.h. Signed-off-by: Daniel Mentz <danielmentz@google.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-04-02ALSA: hda/realtek: Enable headset MIC of Acer TravelMate B114-21 with ALC233Jian-Hong Pan
The Acer TravelMate B114-21 laptop cannot detect and record sound from headset MIC. This patch adds the ALC233_FIXUP_ACER_HEADSET_MIC HDA verb quirk chained with ALC233_FIXUP_ASUS_MIC_NO_PRESENCE pin quirk to fix this issue. [ fixed the missing brace and reordered the entry -- tiwai ] Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Reviewed-by: Kailang Yang <kailang@realtek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-04-02ASoC: dapm: set power_check callback for widgets that shouldnt be always onRanjani Sridharan
Currently, buffers, schedulers, src's, encoders, decoders and effect type dapm widgets remain always on as their power_check method is not set. Setting this callback allows these widgets in the audio path to be powered managed properly. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-04-02ASoC: dpcm: skip missing substream while applying symmetryJerome Brunet
If for any reason, the backend does not have the requested substream (like capture on a playback only backend), the BE will be skipped in dpcm_be_dai_startup(). However, dpcm_apply_symmetry() does not skip those BE and will dereference the be_substream (NULL) pointer anyway. Like in dpcm_be_dai_startup(), just skip those BE. Fixes: 906c7d690c3b ("ASoC: dpcm: Apply symmetry for DPCM") Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-04-02mfd: sc27xx: Use SoC compatible string for PMIC devicesBaolin Wang
We should use SoC compatible string in stead of wildcard string for PMIC child devices. Fixes: 0419a75b1808 (arm64: dts: sprd: Remove wildcard compatible string) Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-04-02mfd: twl-core: Disable IRQ while suspendedAndreas Kemnade
Since commit 6e2bd956936 ("i2c: omap: Use noirq system sleep pm ops to idle device for suspend") on gta04 we have handle_twl4030_pih() called in situations where pm_runtime_get() in i2c-omap.c returns -EACCES. [ 86.474365] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. [ 86.485473] printk: Suspending console(s) (use no_console_suspend to debug) [ 86.555572] Disabling non-boot CPUs ... [ 86.555664] Successfully put all powerdomains to target state [ 86.563720] twl: Read failed (mod 1, reg 0x01 count 1) [ 86.563751] twl4030: I2C error -13 reading PIH ISR [ 86.563812] twl: Read failed (mod 1, reg 0x01 count 1) [ 86.563812] twl4030: I2C error -13 reading PIH ISR [ 86.563873] twl: Read failed (mod 1, reg 0x01 count 1) [ 86.563903] twl4030: I2C error -13 reading PIH ISR This happens when we wakeup via something behing twl4030 (powerbutton or rtc alarm). This goes on for minutes until the system is finally resumed. Disable the irq on suspend and enable it on resume to avoid having i2c access problems when the irq registers are checked. Fixes: 6e2bd956936 ("i2c: omap: Use noirq system sleep pm ops to idle device for suspend") Signed-off-by: Andreas Kemnade <andreas@kemnade.info> Tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-04-02drm/mediatek: Fix an error code in mtk_hdmi_dt_parse_pdata()Dan Carpenter
We don't want to overwrite "ret", it already holds the correct error code. The "regmap" variable might be a valid pointer as this point. Fixes: 8f83f26891e1 ("drm/mediatek: Add HDMI support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: CK Hu <ck.hu@mediatek.com>
2019-04-01dccp: Fix memleak in __feat_register_spYueHaibing
If dccp_feat_push_change fails, we forget free the mem which is alloced by kmemdup in dccp_feat_clone_sp_val. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: e8ef967a54f4 ("dccp: Registration routines for changing feature values") Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01sctp: initialize _pad of sockaddr_in before copying to user memoryXin Long
Syzbot report a kernel-infoleak: BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 Call Trace: _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 copy_to_user include/linux/uaccess.h:174 [inline] sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline] sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562 ... Uninit was stored to memory at: sctp_transport_init net/sctp/transport.c:61 [inline] sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115 sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637 sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline] sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361 ... Bytes 8-15 of 16 are uninitialized It was caused by that th _pad field (the 8-15 bytes) of a v4 addr (saved in struct sockaddr_in) wasn't initialized, but directly copied to user memory in sctp_getsockopt_peer_addrs(). So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as sctp_v6_addr_to_user() does. Reported-by: syzbot+86b5c7c236a22616a72f@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Tested-by: Alexander Potapenko <glider@google.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01Merge branch 'nfp-flower-fix-matching-and-pushing-vlan-CFI-bit'David S. Miller
Jakub Kicinski says: ==================== nfp: flower: fix matching and pushing vlan CFI bit This patch clears up some confusion around the meaning of bit 12 for FW messages related to VLAN and flower offload. Pieter says: It fixes issues with matching, pushing and popping vlan tags. We replace the vlan CFI bit with a vlan present bit that indicates the presence of a vlan tag. We also no longer set the CFI when pushing vlan tags. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01nfp: flower: remove vlan CFI bit from push vlan actionPieter Jansen van Vuuren
We no longer set CFI when pushing vlan tags, therefore we remove the CFI bit from push vlan. Fixes: 1a1e586f54bf ("nfp: add basic action capabilities to flower offloads") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01nfp: flower: replace CFI with vlan presentPieter Jansen van Vuuren
Replace vlan CFI bit with a vlan present bit that indicates the presence of a vlan tag. Previously the driver incorrectly assumed that an vlan id of 0 is not matchable, therefore we indicate vlan presence with a vlan present bit. Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01kcm: switch order of device registration to fix a crashJiri Slaby
When kcm is loaded while many processes try to create a KCM socket, a crash occurs: BUG: unable to handle kernel NULL pointer dereference at 000000000000000e IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240 PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0 Oops: 0002 [#1] SMP KASAN PTI CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased) RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240 RSP: 0018:ffff88000d487a00 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719 ... CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0 Call Trace: kcm_create+0x600/0xbf0 [kcm] __sock_create+0x324/0x750 net/socket.c:1272 ... This is due to race between sock_create and unfinished register_pernet_device. kcm_create tries to do "net_generic(net, kcm_net_id)". but kcm_net_id is not initialized yet. So switch the order of the two to close the race. This can be reproduced with mutiple processes doing socket(PF_KCM, ...) and one process doing module removal. Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01Merge branch 'net-sched-fix-stats-accounting-for-child-NOLOCK-qdiscs'David S. Miller
Paolo Abeni says: ==================== net: sched: fix stats accounting for child NOLOCK qdiscs Currently, stats accounting for NOLOCK qdisc enslaved to classful (lock) qdiscs is buggy. Per CPU values are ignored in most places, as a result, stats dump in the above scenario always report 0 length backlog and parent backlog len is not updated correctly on NOLOCK qdisc removal. The first patch address stats dumping, and the second one child qdisc removal. I'm targeting the net tree as this is a bugfix, but it could be moved to net-next due to the relatively large diffstat. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01net: sched: introduce and use qdisc tree flush/purge helpersPaolo Abeni
The same code to flush qdisc tree and purge the qdisc queue is duplicated in many places and in most cases it does not respect NOLOCK qdisc: the global backlog len is used and the per CPU values are ignored. This change addresses the above, factoring-out the relevant code and using the helpers introduced by the previous patch to fetch the correct backlog len. Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01net: sched: introduce and use qstats read helpersPaolo Abeni
Classful qdiscs can't access directly the child qdiscs backlog length: if such qdisc is NOLOCK, per CPU values should be accounted instead. Most qdiscs no not respect the above. As a result, qstats fetching for most classful qdisc is currently incorrect: if the child qdisc is NOLOCK, it always reports 0 len backlog. This change introduces a pair of helpers to safely fetch both backlog and qlen and use them in stats class dumping functions, fixing the above issue and cleaning a bit the code. DRR needs also to access the child qdisc queue length, so it needs custom handling. Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01cpufreq/intel_pstate: Load only on Intel hardwareBorislav Petkov
This driver is Intel-only so loading on anything which is not Intel is pointless. Prevent it from doing so. While at it, correct the "not supported" print statement to say CPU "model" which is what that test does. Fixes: 076b862c7e44 (cpufreq: intel_pstate: Add reasons for failure and debug messages) Suggested-by: Erwan Velu <e.velu@criteo.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-01net/sched: fix ->get helper of the matchall clsNicolas Dichtel
It returned always NULL, thus it was never possible to get the filter. Example: $ ip link add foo type dummy $ ip link add bar type dummy $ tc qdisc add dev foo clsact $ tc filter add dev foo protocol all pref 1 ingress handle 1234 \ matchall action mirred ingress mirror dev bar Before the patch: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall Error: Specified filter handle not found. We have an error talking to the kernel After: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2 not_in_hw action order 1: mirred (Ingress Mirror to device bar) pipe index 1 ref 1 bind 1 CC: Yotam Gigi <yotamg@mellanox.com> CC: Jiri Pirko <jiri@mellanox.com> Fixes: fd62d9f5c575 ("net/sched: matchall: Fix configuration race") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01signal: don't silently convert SI_USER signals to non-current pidfdJann Horn
The current sys_pidfd_send_signal() silently turns signals with explicit SI_USER context that are sent to non-current tasks into signals with kernel-generated siginfo. This is unlike do_rt_sigqueueinfo(), which returns -EPERM in this case. If a user actually wants to send a signal with kernel-provided siginfo, they can do that with pidfd_send_signal(pidfd, sig, NULL, 0); so allowing this case is unnecessary. Instead of silently replacing the siginfo, just bail out with an error; this is consistent with other interfaces and avoids special-casing behavior based on security checks. Fixes: 3eb39f47934f ("signal: add pidfd_send_signal() syscall") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-04-01dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errorsIlya Dryomov
Some devices don't use blk_integrity but still want stable pages because they do their own checksumming. Examples include rbd and iSCSI when data digests are negotiated. Stacking DM (and thus LVM) on top of these devices results in sporadic checksum errors. Set BDI_CAP_STABLE_WRITES if any underlying device has it set. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-01dm: revert 8f50e358153d ("dm: limit the max bio size as BIO_MAX_PAGES * ↵Mikulas Patocka
PAGE_SIZE") The limit was already incorporated to dm-crypt with commit 4e870e948fba ("dm crypt: fix error with too large bios"), so we don't need to apply it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is wrong anyway because the variable ti->max_io_len it is supposed to be in the units of 512-byte sectors not in bytes. Reduction of the limit to 1048576 sectors could even cause data corruption in rare cases - suppose that we have a dm-striped device with stripe size 768MiB. The target will call dm_set_target_max_io_len with the value 1572864. The buggy code would reduce it to 1048576. Now, the dm-core will errorneously split the bios on 1048576-sector boundary insetad of 1572864-sector boundary and pass these stripe-crossing bios to the striped target. Cc: stable@vger.kernel.org # v4.16+ Fixes: 8f50e358153d ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-01dm init: fix const confusion for dm_allowed_targets arrayAndi Kleen
A non const pointer to const cannot be marked initconst. Mark the array actually const. Fixes: 6bbc923dfcf5 dm: add support to directly boot to a mapped device Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-01dm integrity: make dm_integrity_init and dm_integrity_exit staticYueHaibing
Fix sparse warnings: drivers/md/dm-integrity.c:3619:12: warning: symbol 'dm_integrity_init' was not declared. Should it be static? drivers/md/dm-integrity.c:3638:6: warning: symbol 'dm_integrity_exit' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-01dm integrity: change memcmp to strncmp in dm_integrity_ctrMikulas Patocka
If the string opt_string is small, the function memcmp can access bytes that are beyond the terminating nul character. In theory, it could cause segfault, if opt_string were located just below some unmapped memory. Change from memcmp to strncmp so that we don't read bytes beyond the end of the string. Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-01cifs: a smb2_validate_and_copy_iov failure does not mean the handle is invalid.Ronnie Sahlberg
It only means that we do not have a valid cached value for the file_all_info structure. CC: Stable <stable@vger.kernel.org> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-04-01SMB3: Allow persistent handle timeout to be configurable on mountSteve French
Reconnecting after server or network failure can be improved (to maintain availability and protect data integrity) by allowing the client to choose the default persistent (or resilient) handle timeout in some use cases. Today we default to 0 which lets the server pick the default timeout (usually 120 seconds) but this can be problematic for some workloads. Add the new mount parameter to cifs.ko for SMB3 mounts "handletimeout" which enables the user to override the default handle timeout for persistent (mount option "persistenthandles") or resilient handles (mount option "resilienthandles"). Maximum allowed is 16 minutes (960000 ms). Units for the timeout are expressed in milliseconds. See section 2.2.14.2.12 and 2.2.31.3 of the MS-SMB2 protocol specification for more information. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org>
2019-04-01smb3: Fix enumerating snapshots to AzureSteve French
Some servers (see MS-SMB2 protocol specification section 3.3.5.15.1) expect that the FSCTL enumerate snapshots is done twice, with the first query having EXACTLY the minimum size response buffer requested (16 bytes) which refreshes the snapshot list (otherwise that and subsequent queries get an empty list returned). So had to add code to set the maximum response size differently for the first snapshot query (which gets the size needed for the second query which contains the actual list of snapshots). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> # 4.19+
2019-04-01cifs: fix kref underflow in close_shroot()Ronnie Sahlberg
Fix a bug where we used to not initialize the cached fid structure at all in open_shroot() if the open was successful but we did not get a lease. This would leave the structure uninitialized and later when we close the handle we would in close_shroot() try to kref_put() an uninitialized refcount. Fix this by always initializing this structure if the open was successful but only do the extra get() if we got a lease. This extra get() is only used to hold the structure until we get a lease break from the server at which point we will kref_put() it during lease processing. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
2019-04-01i40e: add tracking of AF_XDP ZC state for each queue pairBjörn Töpel
In commit f3fef2b6e1cc ("i40e: Remove umem from VSI") a regression was introduced; When the VSI was reset, the setup code would try to enable AF_XDP ZC unconditionally (as long as there was a umem placed in the netdev._rx struct). Here, we add a bitmap to the VSI that tracks if a certain queue pair has been "zero-copy enabled" via the ndo_bpf. The bitmap is used in i40e_xsk_umem, and enables zero-copy if and only if XDP is enabled, the corresponding qid in the bitmap is set and the umem is non-NULL. Fixes: f3fef2b6e1cc ("i40e: Remove umem from VSI") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-01i40e: move i40e_xsk_umem functionBjörn Töpel
The i40e_xsk_umem function was explicitly inlined in i40e.h. There is no reason for that, so move it to i40e_main.c instead. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-01vrf: check accept_source_route on the original netdeviceStephen Suryaputra
Configuration check to accept source route IP options should be made on the incoming netdevice when the skb->dev is an l3mdev master. The route lookup for the source route next hop also needs the incoming netdev. v2->v3: - Simplify by passing the original netdevice down the stack (per David Ahern). Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01MAINTAINERS: net: update Solarflare maintainersBert Kenward
Cc: Martin Habets <mhabets@solarflare.com> Signed-off-by: Bert Kenward <bkenward@solarflare.com> Acked-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01drm/i915: Always backoff after a drm_modeset_lock() deadlockChris Wilson
If drm_modeset_lock() reports a deadlock it sets the ctx->contexted field and insists that the caller calls drm_modeset_backoff() or else it generates a WARN on cleanup. <4> [1601.870376] WARNING: CPU: 3 PID: 8445 at drivers/gpu/drm/drm_modeset_lock.c:228 drm_modeset_drop_locks+0x35/0x40 <4> [1601.870395] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal i915 coretemp crct10dif_pclmul <6> [1601.870403] Console: switching <4> [1601.870403] snd_hda_intel <4> [1601.870406] to colour frame buffer device 320x90 <4> [1601.870406] crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel e1000e snd_hda_core cdc_ether ptp usbnet mii pps_core snd_pcm i2c_i801 mei_me mei prime_numbers <4> [1601.870422] CPU: 3 PID: 8445 Comm: cat Tainted: G U 5.0.0-rc7-CI-CI_DRM_5650+ #1 <4> [1601.870424] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.2402.AD3.1810170014 10/17/2018 <4> [1601.870427] RIP: 0010:drm_modeset_drop_locks+0x35/0x40 <4> [1601.870430] Code: 29 48 8b 43 60 48 8d 6b 60 48 39 c5 74 19 48 8b 43 60 48 8d b8 70 ff ff ff e8 87 ff ff ff 48 8b 43 60 48 39 c5 75 e7 5b 5d c3 <0f> 0b eb d3 0f 1f 80 00 00 00 00 41 56 41 55 41 54 55 53 48 8b 6f <4> [1601.870432] RSP: 0018:ffffc90000d67ce8 EFLAGS: 00010282 <4> [1601.870435] RAX: 00000000ffffffdd RBX: ffffc90000d67d00 RCX: 5dbbe23d00000000 <4> [1601.870437] RDX: 0000000000000000 RSI: 0000000093e6194a RDI: ffffc90000d67d00 <4> [1601.870439] RBP: ffff88849e62e678 R08: 0000000003b7329a R09: 0000000000000001 <4> [1601.870441] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888492100410 <4> [1601.870442] R13: ffff88849ea50958 R14: ffff8884a67eb028 R15: ffff8884a67eb028 <4> [1601.870445] FS: 00007fa7a27745c0(0000) GS:ffff8884aff80000(0000) knlGS:0000000000000000 <4> [1601.870447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [1601.870449] CR2: 000055af07e66000 CR3: 00000004a8cc2006 CR4: 0000000000760ee0 <4> [1601.870451] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4> [1601.870453] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 <4> [1601.870454] PKRU: 55555554 <4> [1601.870456] Call Trace: <4> [1601.870505] i915_dsc_fec_support_show+0x91/0x190 [i915] <4> [1601.870522] seq_read+0xdb/0x3c0 <4> [1601.870531] full_proxy_read+0x51/0x80 <4> [1601.870538] __vfs_read+0x31/0x190 <4> [1601.870546] ? __se_sys_newfstat+0x3c/0x60 <4> [1601.870552] vfs_read+0x9e/0x150 <4> [1601.870557] ksys_read+0x50/0xc0 <4> [1601.870564] do_syscall_64+0x55/0x190 <4> [1601.870569] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4> [1601.870572] RIP: 0033:0x7fa7a226d081 <4> [1601.870574] Code: fe ff ff 48 8d 3d 67 9c 0a 00 48 83 ec 08 e8 a6 4c 02 00 66 0f 1f 44 00 00 48 8d 05 81 08 2e 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53 <4> [1601.870576] RSP: 002b:00007ffcc05140c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 <4> [1601.870579] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fa7a226d081 <4> [1601.870581] RDX: 0000000000020000 RSI: 000055af07e63000 RDI: 0000000000000007 <4> [1601.870583] RBP: 0000000000020000 R08: 000000000000007b R09: 0000000000000000 <4> [1601.870585] R10: 000055af07e60010 R11: 0000000000000246 R12: 000055af07e63000 <4> [1601.870587] R13: 0000000000000007 R14: 000055af07e634bf R15: 0000000000020000 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109745 Fixes: e845f099f1c6 ("drm/i915/dsc: Add Per connector debugfs node for DSC support/enable") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Manasi Navare <manasi.d.navare@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190329165152.29259-1-chris@chris-wilson.co.uk (cherry picked from commit ee6df5694a9a2e30566ae05e9c145a0f6d5e087f) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>