linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-10-27	drm/amd/pm: fix the wrong fan speed in fan1_input	Kenneth Feng
	fix the wrong fan speed in fan1_input when the fan control mode is manual. the fan speed value is not correct when we set manual mode to fan1_enalbe - 1. since the fan speed in the metrics table always reflects the real fan speed,we can fetch the fan speed for both auto and manual mode. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-27	drm/amdgpu/swsmu: drop smu i2c bus on navi1x	Alex Deucher
	Stop registering the SMU i2c bus on navi1x. This leads to instability issues when userspace processes mess with the bus and also seems to cause display stability issues in some cases. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1314 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1341 Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2020-10-27	intel_idle: Fix max_cstate for processor models without C-state tables	Chen Yu
	Currently intel_idle driver gets the c-state information from ACPI _CST if the processor model is not recognized by it. However the c-state in _CST starts with index 1 which is different from the index in intel_idle driver's internal c-state table. While intel_idle_max_cstate_reached() was previously introduced to deal with intel_idle driver's internal c-state table, re-using this function directly on _CST is incorrect. Fix this by subtracting 1 from the index when checking max_cstate in the _CST case. For example, append intel_idle.max_cstate=1 in boot command line, Before the patch: grep . /sys/devices/system/cpu/cpu0/cpuidle/state/name POLL After the patch: grep . /sys/devices/system/cpu/cpu0/cpuidle/state/name /sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL /sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI Fixes: 18734958e9bf ("intel_idle: Use ACPI _CST for processor models without C-state tables") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Cc: 5.6+ <stable@vger.kernel.org> # 5.6+ Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-27	cpufreq: intel_pstate: Avoid missing HWP max updates in passive mode	Rafael J. Wysocki
	If the cpufreq policy max limit is changed when intel_pstate operates in the passive mode with HWP enabled and the "powersave" governor is used on top of it, the HWP max limit is not updated as appropriate. Namely, in the "powersave" governor case, the target P-state is always equal to the policy min limit, so if the latter does not change, intel_cpufreq_adjust_hwp() is not invoked to update the HWP Request MSR due to the "target_pstate != old_pstate" check in intel_cpufreq_update_pstate(), so the HWP max limit is not updated as a result. Also, if the CPUFREQ_NEED_UPDATE_LIMITS flag is not set for the driver and the target frequency does not change along with the policy max limit, the "target_freq == policy->cur" check in __cpufreq_driver_target() prevents the driver's ->target() callback from being invoked at all, so the HWP max limit is not updated. To prevent that occurring, set the CPUFREQ_NEED_UPDATE_LIMITS flag in the intel_cpufreq driver structure if HWP is enabled and modify intel_cpufreq_update_pstate() to do the "target_pstate != old_pstate" check only in the non-HWP case and let intel_cpufreq_adjust_hwp() always run in the HWP case (it will update HWP Request only if the cached value of the register is different from the new one including the limits, so if neither the target P-state value nor the max limit changes, the register write will still be avoided). Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled") Reported-by: Zhang Rui <rui.zhang@intel.com> Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 1c534352f47f cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS ... Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Zhang Rui <rui.zhang@intel.com>
2020-10-27	cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS driver flag	Rafael J. Wysocki
	Generally, a cpufreq driver may need to update some internal upper and lower frequency boundaries on policy max and min changes, respectively, but currently this does not work if the target frequency does not change along with the policy limit. Namely, if the target frequency does not change along with the policy min or max, the "target_freq == policy->cur" check in __cpufreq_driver_target() prevents driver callbacks from being invoked and they do not even have a chance to update the corresponding internal boundary. This particularly affects the "powersave" and "performance" governors that always set the target frequency to one of the policy limits and it never changes when the other limit is updated. To allow cpufreq the drivers needing to update internal frequency boundaries on policy limits changes to avoid this issue, introduce a new driver flag, CPUFREQ_NEED_UPDATE_LIMITS, that (when set) will neutralize the check mentioned above. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-10-27	cpufreq: Avoid configuring old governors as default with intel_pstate	Rafael J. Wysocki
	Commit 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by default without HWP") was meant to cause intel_pstate to be used in the passive mode with the schedutil governor on top of it, but it missed the case in which either "ondemand" or "conservative" was selected as the default governor in the existing kernel config, in which case the previous old governor configuration would be used, causing the default legacy governor to be used on top of intel_pstate instead of schedutil. Address this by preventing "ondemand" and "conservative" from being configured as the default cpufreq governor in the case when schedutil is the default choice for the default governor setting. [Note that the default cpufreq governor can still be set via the kernel command line if need be and that choice is not limited, so if anyone really wants to use one of the legacy governors by default, it can be achieved this way.] Fixes: 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by default without HWP") Reported-by: Julia Lawall <julia.lawall@inria.fr> Cc: 5.8+ <stable@vger.kernel.org> # 5.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-10-27	cpufreq: e_powersaver: remove unreachable break	Zhang Qilong
	A 'break' following a 'return' statement is pointless, so remove it. Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-27	Merge tag 'devicetree-fixes-for-5.10-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - More binding additionalProperties/unevaluatedProperties additions - More yamllint fixes on additions in the merge window - CrOS embedded controller schema updates to fix warnings - LEDs schema update adding ID_RGB - A reserved-memory fix for regions starting at address 0x0 * tag 'devicetree-fixes-for-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: Another round of adding missing 'additionalProperties/unevalutatedProperties' dt-bindings: Explicitly allow additional properties in board/SoC schemas dt-bindings: More whitespace clean-ups in schema files mfd: google,cros-ec: add missing properties dt-bindings: input: convert cros-ec-keyb to json-schema dt-bindings: i2c: convert i2c-cros-ec-tunnel to json-schema of: Fix reserved-memory overlap detection dt-bindings: mailbox: mtk-gce: fix incorrect mbox-cells value dt-bindings: leds: Update devicetree documents for ID_RGB
2020-10-27	drm/vc4: drv: Add error handding for bind	Hoegeun Kwon
	There is a problem that if vc4_drm bind fails, a memory leak occurs on the drm_property_create side. Add error handding for drm_mode_config. Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20201027041442.30352-2-hoegeun.kwon@samsung.com
2020-10-27	interconnect: qcom: use icc_sync state for sm8[12]50	Dmitry Baryshkov
	In addition to the rest of Qcom interconnect drivers use icc_sync_state for SM8150/SM8250 interconnect drivers to notify the interconnect framework when all consumers are probed and there is no need to keep the bandwidth set to maximum anymore. Also move the BCM initialization before creating the nodes to set the max bandwidth in hardware for the initialization/probing stage. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Fixes: 7d3b0b0d8184 ("interconnect: qcom: Use icc_sync_state") Link: https://lore.kernel.org/r/20201027133418.976687-1-dmitry.baryshkov@linaro.org Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
2020-10-27	staging: fieldbus: anybuss: jump to correct label in an error path	Jing Xiangfeng
	In current code, controller_probe() misses to call ida_simple_remove() in an error path. Jump to correct label to fix it. Fixes: 17614978ed34 ("staging: fieldbus: anybus-s: support the Arcx anybus controller") Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com> Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201012132404.113031-1-jingxiangfeng@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-27	staging: wfx: fix test on return value of gpiod_get_value()	Jérôme Pouiller
	The commit 8522d62e6bca ("staging: wfx: gpiod_get_value() can return an error") has changed the way the driver test the value returned by gpiod_get_value(). The new code was wrong. Fixes: 8522d62e6bca ("staging: wfx: gpiod_get_value() can return an error") Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20201019160604.1609180-2-Jerome.Pouiller@silabs.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-27	staging: wfx: fix use of uninitialized pointer	Jérôme Pouiller
	With -Wuninitialized, the compiler complains: drivers/staging/wfx/data_tx.c:34:19: warning: variable 'band' is uninitialized when used here [-Wuninitialized] if (rate->idx >= band->n_bitrates) { ^~~~ Reported-by: kernel test robot <lkp@intel.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Fixes: 868fd970e187 ("staging: wfx: improve robustness of wfx_get_hw_rate()") Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://lore.kernel.org/r/20201019160604.1609180-1-Jerome.Pouiller@silabs.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-27	staging: mmal-vchiq: Fix memory leak for vchiq_instance	Seung-Woo Kim
	The vchiq_instance is allocated with vchiq_initialise() but never freed properly. Fix memory leak for the vchiq_instance. Reported-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com> Link: https://lore.kernel.org/r/1603706150-10806-1-git-send-email-sw0312.kim@samsung.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-27	staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice	Ian Abbott
	The "cb_pcidas" driver supports asynchronous commands on the analog output (AO) subdevice for those boards that have an AO FIFO. The code (in `cb_pcidas_ao_check_chanlist()` and `cb_pcidas_ao_cmd()`) to validate and set up the command supports output to a single channel or to two channels simultaneously (the boards have two AO channels). However, the code in `cb_pcidas_auto_attach()` that initializes the subdevices neglects to initialize the AO subdevice's `len_chanlist` member, leaving it set to 0, but the Comedi core will "correct" it to 1 if the driver neglected to set it. This limits commands to use a single channel (either channel 0 or 1), but the limit should be two channels. Set the AO subdevice's `len_chanlist` member to be the same value as the `n_chan` member, which will be 2. Cc: <stable@vger.kernel.org> Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Link: https://lore.kernel.org/r/20201021122142.81628-1-abbotti@mev.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-27	staging: octeon: Drop on uncorrectable alignment or FCS error	Alexander Sverdlin
	Currently in case of alignment or FCS error if the packet cannot be corrected it's still not dropped. Report the error properly and drop the packet while making the code around a little bit more readable. Fixes: 80ff0fd3ab64 ("Staging: Add octeon-ethernet driver files.") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201016145630.41852-1-alexander.sverdlin@nokia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-27	staging: octeon: repair "fixed-link" support	Alexander Sverdlin
	The PHYs must be registered once in device probe function, not in device open callback because it's only possible to register them once. Fixes: a25e278020bf ("staging: octeon: support fixed-link phys") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201016101858.11374-1-alexander.sverdlin@nokia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-27	drm: kernel-doc: add description for a new function parameter	Mauro Carvalho Chehab
	As reported by "make htmldocs": ./drivers/gpu/drm/drm_prime.c:808: warning: Function parameter or member 'dev' not described in 'drm_prime_pages_to_sg' Add a description for the new parameter. Fixes: 707d561f77b5 ("drm: allow limiting the scatter list size.") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/9366f48e6e9c3ec2f31a3e68452a2b23a1089fce.1603791716.git.mchehab+huawei@kernel.org
2020-10-27	drm/dp: fix a kernel-doc issue at drm_edid.c	Mauro Carvalho Chehab
	The name of the argument is different, causing those warnings: ./drivers/gpu/drm/drm_edid.c:3754: warning: Function parameter or member 'video_code' not described in 'drm_display_mode_from_cea_vic' ./drivers/gpu/drm/drm_edid.c:3754: warning: Excess function parameter 'vic' description in 'drm_display_mode_from_cea_vic' Fixes: 7af655bce275 ("drm/dp: Add drm_dp_downstream_mode()") Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/7f4d6c3ff6df63ebd006eb90a5108006c23e2168.1603791716.git.mchehab+huawei@kernel.org
2020-10-27	drm/dp: fix kernel-doc warnings at drm_dp_helper.c	Mauro Carvalho Chehab
	As warned by kernel-doc: ./drivers/gpu/drm/drm_dp_helper.c:385: warning: Function parameter or member 'type' not described in 'drm_dp_downstream_is_type' ./drivers/gpu/drm/drm_dp_helper.c:886: warning: Function parameter or member 'dev' not described in 'drm_dp_downstream_mode' Some function parameters weren't documented. Fixes: 38784f6f8805 ("drm/dp: Add helpers to identify downstream facing port types") Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/03c9c8ba3f492aca76e2b4836803219cd9c971cf.1603791716.git.mchehab+huawei@kernel.org
2020-10-27	drm: kernel-doc: document drm_dp_set_subconnector_property() params	Mauro Carvalho Chehab
	Changeset e5b92773287c ("drm: report dp downstream port type as a subconnector property") added a new function to the kAPI, but didn't add any documentation for the parameters for drm_dp_set_subconnector_property(). Fixes: e5b92773287c ("drm: report dp downstream port type as a subconnector property") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/0870be85a77bea4ba5cf1715010834289a4e10b1.1603791716.git.mchehab+huawei@kernel.org
2020-10-27	usb: raw-gadget: fix memory leak in gadget_setup	Zqiang
	When fetch 'event' from event queue, after copy its address space content to user space, the 'event' the memory space pointed to by the 'event' pointer need be freed. BUG: memory leak unreferenced object 0xffff888110622660 (size 32): comm "softirq", pid 0, jiffies 4294941981 (age 12.480s) hex dump (first 32 bytes): 02 00 00 00 08 00 00 00 80 06 00 01 00 00 40 00 ..............@. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000efd29abd>] kmalloc include/linux/slab.h:554 [inline] [<00000000efd29abd>] raw_event_queue_add drivers/usb/gadget/legacy/raw_gadget.c:66 [inline] [<00000000efd29abd>] raw_queue_event drivers/usb/gadget/legacy/raw_gadget.c:225 [inline] [<00000000efd29abd>] gadget_setup+0xf6/0x220 drivers/usb/gadget/legacy/raw_gadget.c:343 [<00000000952c4a46>] dummy_timer+0xb9f/0x14c0 drivers/usb/gadget/udc/dummy_hcd.c:1899 [<0000000074ac2c54>] call_timer_fn+0x38/0x200 kernel/time/timer.c:1415 [<00000000560a3a79>] expire_timers kernel/time/timer.c:1460 [inline] [<00000000560a3a79>] __run_timers.part.0+0x319/0x400 kernel/time/timer.c:1757 [<000000009d9503d0>] __run_timers kernel/time/timer.c:1738 [inline] [<000000009d9503d0>] run_timer_softirq+0x3d/0x80 kernel/time/timer.c:1770 [<000000009df27c89>] __do_softirq+0xcc/0x2c2 kernel/softirq.c:298 [<000000007a3f1a47>] asm_call_irq_on_stack+0xf/0x20 [<000000004a62cc2e>] __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] [<000000004a62cc2e>] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] [<000000004a62cc2e>] do_softirq_own_stack+0x32/0x40 arch/x86/kernel/irq_64.c:77 [<00000000b0086800>] invoke_softirq kernel/softirq.c:393 [inline] [<00000000b0086800>] __irq_exit_rcu kernel/softirq.c:423 [inline] [<00000000b0086800>] irq_exit_rcu+0x91/0xc0 kernel/softirq.c:435 [<00000000175f9523>] sysvec_apic_timer_interrupt+0x36/0x80 arch/x86/kernel/apic/apic.c:1091 [<00000000a348e847>] asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:631 [<0000000060661100>] native_safe_halt arch/x86/include/asm/irqflags.h:60 [inline] [<0000000060661100>] arch_safe_halt arch/x86/include/asm/irqflags.h:103 [inline] [<0000000060661100>] acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline] [<0000000060661100>] acpi_idle_do_entry+0xc3/0xd0 drivers/acpi/processor_idle.c:517 [<000000003f413b99>] acpi_idle_enter+0x128/0x1f0 drivers/acpi/processor_idle.c:648 [<00000000f5e5afb8>] cpuidle_enter_state+0xc9/0x650 drivers/cpuidle/cpuidle.c:237 [<00000000d50d51fc>] cpuidle_enter+0x29/0x40 drivers/cpuidle/cpuidle.c:351 [<00000000d674baed>] call_cpuidle kernel/sched/idle.c:132 [inline] [<00000000d674baed>] cpuidle_idle_call kernel/sched/idle.c:213 [inline] [<00000000d674baed>] do_idle+0x1c8/0x250 kernel/sched/idle.c:273 Reported-by: syzbot+bd38200f53df6259e6bf@syzkaller.appspotmail.com Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-10-27	usb: dwc2: Avoid leaving the error_debugfs label unused	Martin Blumenstingl
	The error_debugfs label is only used when either CONFIG_USB_DWC2_PERIPHERAL or CONFIG_USB_DWC2_DUAL_ROLE is enabled. Add the same #if to the error_debugfs label itself as the code which uses this label already has. This avoids the following compiler warning: warning: label ‘error_debugfs’ defined but not used [-Wunused-label] Fixes: e1c08cf23172ed ("usb: dwc2: Add missing cleanups when usb_add_gadget_udc() fails") Acked-by: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-10-27	usb: dwc3: ep0: Fix delay status handling	Thinh Nguyen
	If we want to send a control status on our own time (through delayed_status), make sure to handle a case where we may queue the delayed status before the host requesting for it (when XferNotReady is generated). Otherwise, the driver won't send anything because it's not EP0_STATUS_PHASE yet. To resolve this, regardless whether dwc->ep0state is EP0_STATUS_PHASE, make sure to clear the dwc->delayed_status flag if dwc3_ep0_send_delayed_status() is called. The control status can be sent when the host requests it later. Cc: <stable@vger.kernel.org> Fixes: d97c78a1908e ("usb: dwc3: gadget: END_TRANSFER before CLEAR_STALL command") Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-10-27	drm/imx: tve remove extraneous type qualifier	Arnd Bergmann
	clang warns about functions returning a 'const int' result: drivers/gpu/drm/imx/imx-tve.c:487:8: warning: type qualifiers ignored on function return type [-Wignored-qualifiers] Remove the extraneous 'const' qualifier here. I would guess that the function was intended to be marked __attribute__((const)) instead, but that would also be wrong since it call other functions without that attribute. Fixes: fcbc51e54d2a ("staging: drm/imx: Add support for Television Encoder (TVEv2)") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2020-10-27	drm/imx: parallel-display: reduce scope of edid_len	Philipp Zabel
	The edid_len variable is never used again. Use a local variable instead of storing it in the device structure. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2020-10-27	drm/imx: parallel-display: remove unused function enc_to_imxpd()	Philipp Zabel
	Remove leftover container_of helper, it has been replaced by bridge_to_imxpd(). Fixes: fe141cedc433 ("drm/imx: pd: Use bus format/flags provided by the bridge when available") Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2020-10-27	drm/imx: parallel-display: fix edid memory leak	Marco Felsch
	The edid memory is only freed if the component.unbind() is called. This is okay if the parallel-display was bound but if the bind() fails we leak the memory. Signed-off-by: Marco Felsch <m.felsch@pengutronix.de> [p.zabel@pengutronix.de: rebased, dropped now empty unbind()] Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2020-10-27	drm/imx: imx-ldb: reduce scope of edid_len	Philipp Zabel
	The edid_len variable is never used again. Use a local variable instead of storing it in the device structure. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2020-10-27	nvmet: fix a NULL pointer dereference when tracing the flush command	Chaitanya Kulkarni
	When target side trace in turned on and flush command is issued from the host it results in the following Oops. [ 856.789724] BUG: kernel NULL pointer dereference, address: 0000000000000068 [ 856.790686] #PF: supervisor read access in kernel mode [ 856.791262] #PF: error_code(0x0000) - not-present page [ 856.791863] PGD 6d7110067 P4D 6d7110067 PUD 66f0ad067 PMD 0 [ 856.792527] Oops: 0000 [#1] SMP NOPTI [ 856.792950] CPU: 15 PID: 7034 Comm: nvme Tainted: G OE 5.9.0nvme-5.9+ #71 [ 856.793790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214 [ 856.794956] RIP: 0010:trace_event_raw_event_nvmet_req_init+0x13e/0x170 [nvmet] [ 856.795734] Code: 41 5c 41 5d c3 31 d2 31 f6 e8 4e 9b b8 e0 e9 0e ff ff ff 49 8b 55 00 48 8b 38 8b 0 [ 856.797740] RSP: 0018:ffffc90001be3a60 EFLAGS: 00010246 [ 856.798375] RAX: 0000000000000000 RBX: ffff8887e7d2c01c RCX: 0000000000000000 [ 856.799234] RDX: 0000000000000020 RSI: 0000000057e70ea2 RDI: ffff8887e7d2c034 [ 856.800088] RBP: ffff88869f710578 R08: ffff888807500d40 R09: 00000000fffffffe [ 856.800951] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff8887e7d2c034 [ 856.801807] R13: ffff88869f7105c8 R14: 0000000000000040 R15: ffff88869f710440 [ 856.802667] FS: 00007f6a22bd8780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000 [ 856.803635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 856.804367] CR2: 0000000000000068 CR3: 00000006d73e0000 CR4: 00000000003506e0 [ 856.805283] Call Trace: [ 856.805613] nvmet_req_init+0x27c/0x480 [nvmet] [ 856.806200] nvme_loop_queue_rq+0xcb/0x1d0 [nvme_loop] [ 856.806862] blk_mq_dispatch_rq_list+0x123/0x7b0 [ 856.807459] ? kvm_sched_clock_read+0x14/0x30 [ 856.808025] __blk_mq_sched_dispatch_requests+0xc7/0x170 [ 856.808708] blk_mq_sched_dispatch_requests+0x30/0x60 [ 856.809372] __blk_mq_run_hw_queue+0x70/0x100 [ 856.809935] __blk_mq_delay_run_hw_queue+0x156/0x170 [ 856.810574] blk_mq_run_hw_queue+0x86/0xe0 [ 856.811104] blk_mq_sched_insert_request+0xef/0x160 [ 856.811733] blk_execute_rq+0x69/0xc0 [ 856.812212] ? blk_mq_rq_ctx_init+0xd0/0x230 [ 856.812784] nvme_execute_passthru_rq+0x57/0x130 [nvme_core] [ 856.813461] nvme_submit_user_cmd+0xeb/0x300 [nvme_core] [ 856.814099] nvme_user_cmd.isra.82+0x11e/0x1a0 [nvme_core] [ 856.814752] blkdev_ioctl+0x1dc/0x2c0 [ 856.815197] block_ioctl+0x3f/0x50 [ 856.815606] __x64_sys_ioctl+0x84/0xc0 [ 856.816074] do_syscall_64+0x33/0x40 [ 856.816533] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 856.817168] RIP: 0033:0x7f6a222ed107 [ 856.817617] Code: 44 00 00 48 8b 05 81 cd 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 8 [ 856.819901] RSP: 002b:00007ffca848f058 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 856.820846] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6a222ed107 [ 856.821726] RDX: 00007ffca848f060 RSI: 00000000c0484e43 RDI: 0000000000000003 [ 856.822603] RBP: 0000000000000003 R08: 000000000000003f R09: 0000000000000005 [ 856.823478] R10: 00007ffca848ece0 R11: 0000000000000202 R12: 00007ffca84912d3 [ 856.824359] R13: 00007ffca848f4d0 R14: 0000000000000002 R15: 000000000067e900 [ 856.825236] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corel Move the nvmet_req_init() tracepoint after we parse the command in nvmet_req_init() so that we can get rid of the duplicate nvmet_find_namespace() call. Rename __assign_disk_name() -> __assign_req_name(). Now that we call tracepoint after parsing the command simplify the newly added __assign_req_name() which fixes this bug. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-27	nvme-fc: remove nvme_fc_terminate_io()	James Smart
	__nvme_fc_terminate_io() is now called by only 1 place, in reset_work. Consoldate and move the functionality of terminate_io into reset_work. In reset_work, rather than calling the create_association directly, schedule the connect work element to do its thing. After scheduling, flush the connect work element to continue with semantic of not returning until connect has been attempted at least once. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-27	nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery	James Smart
	nvme_fc_error_recovery() special cases handling when in CONNECTING state and calls __nvme_fc_terminate_io(). __nvme_fc_terminate_io() itself special cases CONNECTING state and calls the routine to abort outstanding ios. Simplify the sequence by putting the call to abort outstanding I/Os directly in nvme_fc_error_recovery. Move the location of __nvme_fc_abort_outstanding_ios(), and nvme_fc_terminate_exchange() which is called by it, to avoid adding function prototypes for nvme_fc_error_recovery(). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-27	nvme-fc: remove err_work work item	James Smart
	err_work was created to handle errors (mainly I/O timeouts) while in CONNECTING state. The flag for err_work_active is also unneeded. Remove err_work_active and err_work. The actions to abort I/Os are moved inline to nvme_error_recovery(). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-27	nvme-fc: track error_recovery while connecting	James Smart
	Whenever there are errors during CONNECTING, the driver recovers by aborting all outstanding ios and counts on the io completion to fail them and thus the connection/association they are on. However, the connection failure depends on a failure state from the core routines. Not all commands that are issued by the core routine are guaranteed to cause a failure of the core routine. They may be treated as a failure status and the status is then ignored. As such, whenever the transport enters error_recovery while CONNECTING, it will set a new flag indicating an association failed. The create_association routine which creates and initializes the controller, will monitor the state of the flag as well as the core routine error status and ensure the association fails if there was an error. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-27	nvme-rdma: handle unexpected nvme completion data length	zhenwei pi
	Receiving a zero length message leads to the following warnings because the CQE is processed twice: refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 RIP: 0010:refcount_warn_saturate+0xd9/0xe0 Call Trace: <IRQ> nvme_rdma_recv_done+0xf3/0x280 [nvme_rdma] __ib_process_cq+0x76/0x150 [ib_core] ... Sanity check the received data length, to avoids this. Thanks to Chao Leng & Sagi for suggestions. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-27	nvme: ignore zone validate errors on subsequent scans	Keith Busch
	Revalidating nvme zoned namespaces requires IO commands, and there are controller states that prevent IO. For example, a sanitize in progress is required to fail all IO, but we don't want to remove a namespace we've previously added just because the controller is in such a state. Suppress the error in this case. Reported-by: Michael Nguyen <michael.nguyen@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-27	usb: gadget: fsl: fix null pointer checking	Ran Wang
	fsl_ep_fifo_status() should return error if _ep->desc is null. Fixes: 75eaa498c99e (“usb: gadget: Correct NULL pointer checking in fsl gadget”) Reviewed-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Ran Wang <ran.wang_1@nxp.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-10-27	usb: gadget: goku_udc: fix potential crashes in probe	Evgeny Novikov
	goku_probe() goes to error label "err" and invokes goku_remove() in case of failures of pci_enable_device(), pci_resource_start() and ioremap(). goku_remove() gets a device from pci_get_drvdata(pdev) and works with it without any checks, in particular it dereferences a corresponding pointer. But goku_probe() did not set this device yet. So, one can expect various crashes. The patch moves setting the device just after allocation of memory for it. Found by Linux Driver Verification project (linuxtesting.org). Reported-by: Pavel Andrianov <andrianov@ispras.ru> Signed-off-by: Evgeny Novikov <novikov@ispras.ru> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-10-27	opp: Reduce the size of critical section in _opp_table_kref_release()	Viresh Kumar
	There is a lot of stuff here which can be done outside of the big opp_table_lock, do that. This helps avoiding few circular dependency lockdeps around debugfs and interconnects. Reported-by: Rob Clark <robdclark@gmail.com> Reported-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-10-27	usb: dwc3: pci: add support for the Intel Alder Lake-S	Heikki Krogerus
	This patch adds the necessary PCI ID for Intel Alder Lake-S devices. Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-10-27	opp: Fix early exit from dev_pm_opp_register_set_opp_helper()	Viresh Kumar
	We returned earlier by mistake even when there were no failures. Fix it. Fixes: dd461cd9183f ("opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.com>
2020-10-27	opp: Don't always remove static OPPs in _of_add_opp_table_v1()	Viresh Kumar
	The patch missed returning 0 early in case of success and hence the static OPPs got removed by mistake. Fix it. Fixes: 90d46d71cce2 ("opp: Handle multiple calls for same OPP table in _of_add_opp_table_v1()") Reported-by: Aisheng Dong <aisheng.dong@nxp.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Dong Aisheng <aisheng.dong@nxp.com>
2020-10-26	net: hns3: Clear the CMDQ registers before unmapping BAR region	Zenghui Yu
	When unbinding the hns3 driver with the HNS3 VF, I got the following kernel panic: [ 265.709989] Unable to handle kernel paging request at virtual address ffff800054627000 [ 265.717928] Mem abort info: [ 265.720740] ESR = 0x96000047 [ 265.723810] EC = 0x25: DABT (current EL), IL = 32 bits [ 265.729126] SET = 0, FnV = 0 [ 265.732195] EA = 0, S1PTW = 0 [ 265.735351] Data abort info: [ 265.738227] ISV = 0, ISS = 0x00000047 [ 265.742071] CM = 0, WnR = 1 [ 265.745055] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000009b54000 [ 265.751753] [ffff800054627000] pgd=0000202ffffff003, p4d=0000202ffffff003, pud=00002020020eb003, pmd=00000020a0dfc003, pte=0000000000000000 [ 265.764314] Internal error: Oops: 96000047 [#1] SMP [ 265.830357] CPU: 61 PID: 20319 Comm: bash Not tainted 5.9.0+ #206 [ 265.836423] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019 [ 265.843873] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--) [ 265.843890] pc : hclgevf_cmd_uninit+0xbc/0x300 [ 265.861988] lr : hclgevf_cmd_uninit+0xb0/0x300 [ 265.861992] sp : ffff80004c983b50 [ 265.881411] pmr_save: 000000e0 [ 265.884453] x29: ffff80004c983b50 x28: ffff20280bbce500 [ 265.889744] x27: 0000000000000000 x26: 0000000000000000 [ 265.895034] x25: ffff800011a1f000 x24: ffff800011a1fe90 [ 265.900325] x23: ffff0020ce9b00d8 x22: ffff0020ce9b0150 [ 265.905616] x21: ffff800010d70e90 x20: ffff800010d70e90 [ 265.910906] x19: ffff0020ce9b0080 x18: 0000000000000004 [ 265.916198] x17: 0000000000000000 x16: ffff800011ae32e8 [ 265.916201] x15: 0000000000000028 x14: 0000000000000002 [ 265.916204] x13: ffff800011ae32e8 x12: 0000000000012ad8 [ 265.946619] x11: ffff80004c983b50 x10: 0000000000000000 [ 265.951911] x9 : ffff8000115d0888 x8 : 0000000000000000 [ 265.951914] x7 : ffff800011890b20 x6 : c0000000ffff7fff [ 265.951917] x5 : ffff80004c983930 x4 : 0000000000000001 [ 265.951919] x3 : ffffa027eec1b000 x2 : 2b78ccbbff369100 [ 265.964487] x1 : 0000000000000000 x0 : ffff800054627000 [ 265.964491] Call trace: [ 265.964494] hclgevf_cmd_uninit+0xbc/0x300 [ 265.964496] hclgevf_uninit_ae_dev+0x9c/0xe8 [ 265.964501] hnae3_unregister_ae_dev+0xb0/0x130 [ 265.964516] hns3_remove+0x34/0x88 [hns3] [ 266.009683] pci_device_remove+0x48/0xf0 [ 266.009692] device_release_driver_internal+0x114/0x1e8 [ 266.030058] device_driver_detach+0x28/0x38 [ 266.034224] unbind_store+0xd4/0x108 [ 266.037784] drv_attr_store+0x40/0x58 [ 266.041435] sysfs_kf_write+0x54/0x80 [ 266.045081] kernfs_fop_write+0x12c/0x250 [ 266.049076] vfs_write+0xc4/0x248 [ 266.052378] ksys_write+0x74/0xf8 [ 266.055677] __arm64_sys_write+0x24/0x30 [ 266.059584] el0_svc_common.constprop.3+0x84/0x270 [ 266.064354] do_el0_svc+0x34/0xa0 [ 266.067658] el0_svc+0x38/0x40 [ 266.070700] el0_sync_handler+0x8c/0xb0 [ 266.074519] el0_sync+0x140/0x180 It looks like the BAR memory region had already been unmapped before we start clearing CMDQ registers in it, which is pretty bad and the kernel happily kills itself because of a Current EL Data Abort (on arm64). Moving the CMDQ uninitialization a bit early fixes the issue for me. Fixes: 862d969a3a4d ("net: hns3: do VF's pci re-initialization while PF doing FLR") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20201023051550.793-1-yuzenghui@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-26	bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.	Vasundhara Volam
	In the AER or firmware reset flow, if we are in fatal error state or if pci_channel_offline() is true, we don't send any commands to the firmware because the commands will likely not reach the firmware and most commands don't matter much because the firmware is likely to be reset imminently. However, the HWRM_FUNC_RESET command is different and we should always attempt to send it. In the AER flow for example, the .slot_reset() call will trigger this fw command and we need to try to send it to effect the proper reset. Fixes: b340dc680ed4 ("bnxt_en: Avoid sending firmware messages when AER error is detected.") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-26	bnxt_en: Check abort error state in bnxt_open_nic().	Michael Chan
	bnxt_open_nic() is called during configuration changes that require the NIC to be closed and then opened. This call is protected by rtnl_lock. Firmware reset can be happening at the same time. Only critical portions of the entire firmware reset sequence are protected by the rtnl_lock. It is possible that bnxt_open_nic() can be called when the firmware reset sequence is aborting. In that case, bnxt_open_nic() needs to check if the ABORT_ERR flag is set and abort if it is. The configuration change that resulted in the bnxt_open_nic() call will fail but the NIC will be brought to a consistent IF_DOWN state. Without this patch, if bnxt_open_nic() were to continue in this error state, it may crash like this: [ 1648.659736] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1648.659768] IP: [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.659796] PGD 101e1b3067 PUD 101e1b2067 PMD 0 [ 1648.659813] Oops: 0000 [#1] SMP [ 1648.659825] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dell_smbios dell_wmi_descriptor dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper vfat cryptd fat pcspkr ipmi_ssif sg k10temp i2c_piix4 wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci megaraid_sas crct10dif_pclmul crct10dif_common [ 1648.660063] tg3 libata crc32c_intel bnxt_en(OE) drm_panel_orientation_quirks devlink ptp pps_core dm_mirror dm_region_hash dm_log dm_mod fuse [ 1648.660105] CPU: 13 PID: 3867 Comm: ethtool Kdump: loaded Tainted: G OE ------------ 3.10.0-1152.el7.x86_64 #1 [ 1648.660911] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.2.14 01/28/2020 [ 1648.661662] task: ffff94e64cbc9080 ti: ffff94f55df1c000 task.ti: ffff94f55df1c000 [ 1648.662409] RIP: 0010:[<ffffffffc01e9b3a>] [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.663171] RSP: 0018:ffff94f55df1fba8 EFLAGS: 00010202 [ 1648.663927] RAX: 0000000000000000 RBX: ffff94e6827e0000 RCX: 0000000000000000 [ 1648.664684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94e6827e08c0 [ 1648.665433] RBP: ffff94f55df1fc20 R08: 00000000000001ff R09: 0000000000000008 [ 1648.666184] R10: 0000000000000d53 R11: ffff94f55df1f7ce R12: ffff94e6827e08c0 [ 1648.666940] R13: ffff94e6827e08c0 R14: ffff94e6827e08c0 R15: ffffffffb9115e40 [ 1648.667695] FS: 00007f8aadba5740(0000) GS:ffff94f57eb40000(0000) knlGS:0000000000000000 [ 1648.668447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1648.669202] CR2: 0000000000000000 CR3: 0000001022772000 CR4: 0000000000340fe0 [ 1648.669966] Call Trace: [ 1648.670730] [<ffffffffc01f1d5d>] ? bnxt_need_reserve_rings+0x9d/0x170 [bnxt_en] [ 1648.671496] [<ffffffffc01fa7ea>] __bnxt_open_nic+0x8a/0x9a0 [bnxt_en] [ 1648.672263] [<ffffffffc01f7479>] ? bnxt_close_nic+0x59/0x1b0 [bnxt_en] [ 1648.673031] [<ffffffffc01fb11b>] bnxt_open_nic+0x1b/0x50 [bnxt_en] [ 1648.673793] [<ffffffffc020037c>] bnxt_set_ringparam+0x6c/0xa0 [bnxt_en] [ 1648.674550] [<ffffffffb8a5f564>] dev_ethtool+0x1334/0x21a0 [ 1648.675306] [<ffffffffb8a719ff>] dev_ioctl+0x1ef/0x5f0 [ 1648.676061] [<ffffffffb8a324bd>] sock_do_ioctl+0x4d/0x60 [ 1648.676810] [<ffffffffb8a326bb>] sock_ioctl+0x1eb/0x2d0 [ 1648.677548] [<ffffffffb8663230>] do_vfs_ioctl+0x3a0/0x5b0 [ 1648.678282] [<ffffffffb8b8e678>] ? __do_page_fault+0x238/0x500 [ 1648.679016] [<ffffffffb86634e1>] SyS_ioctl+0xa1/0xc0 [ 1648.679745] [<ffffffffb8b93f92>] system_call_fastpath+0x25/0x2a [ 1648.680461] Code: 9e 60 01 00 00 0f 1f 40 00 45 8b 8e 48 01 00 00 31 c9 45 85 c9 0f 8e 73 01 00 00 66 0f 1f 44 00 00 49 8b 86 a8 00 00 00 48 63 d1 <48> 8b 14 d0 48 85 d2 0f 84 46 01 00 00 41 8b 86 44 01 00 00 c7 [ 1648.681986] RIP [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.682724] RSP <ffff94f55df1fba8> [ 1648.683451] CR2: 0000000000000000 Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.") Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-26	bnxt_en: Re-write PCI BARs after PCI fatal error.	Vasundhara Volam
	When a PCIe fatal error occurs, the internal latched BAR addresses in the chip get reset even though the BAR register values in config space are retained. pci_restore_state() will not rewrite the BAR addresses if the BAR address values are valid, causing the chip's internal BAR addresses to stay invalid. So we need to zero the BAR registers during PCIe fatal error to force pci_restore_state() to restore the BAR addresses. These write cycles to the BAR registers will cause the proper BAR addresses to latch internally. Fixes: 6316ea6db93d ("bnxt_en: Enable AER support.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-26	bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.	Vasundhara Volam
	As part of the commit b148bb238c02 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."), cancel_delayed_work_sync() is called only for VFs to fix a possible crash by cancelling any pending delayed work items. It was assumed by mistake that the flush_workqueue() call on the PF would flush delayed work items as well. As flush_workqueue() does not cancel the delayed workqueue, extend the fix for PFs. This fix will avoid the system crash, if there are any pending delayed work items in fw_reset_task() during driver's .remove() call. Unify the workqueue cleanup logic for both PF and VF by calling cancel_work_sync() and cancel_delayed_work_sync() directly in bnxt_remove_one(). Fixes: b148bb238c02 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-26	bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().	Vasundhara Volam
	A recent patch has moved the workqueue cleanup logic before calling unregister_netdev() in bnxt_remove_one(). This caused a regression because the workqueue can be restarted if the device is still open. Workqueue cleanup must be done after unregister_netdev(). The workqueue will not restart itself after the device is closed. Call bnxt_cancel_sp_work() after unregister_netdev() and call bnxt_dl_fw_reporters_destroy() after that. This fixes the regession and the original NULL ptr dereference issue. Fixes: b16939b59cc0 ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-26	mlxsw: core: Fix use-after-free in mlxsw_emad_trans_finish()	Amit Cohen
	Each EMAD transaction stores the skb used to issue the EMAD request ('trans->tx_skb') so that the request could be retried in case of a timeout. The skb can be freed when a corresponding response is received or as part of the retry logic (e.g., failed retransmit, exceeded maximum number of retries). The two tasks (i.e., response processing and retransmits) are synchronized by the atomic 'trans->active' field which ensures that responses to inactive transactions are ignored. In case of a failed retransmit the transaction is finished and all of its resources are freed. However, the current code does not mark it as inactive. Syzkaller was able to hit a race condition in which a concurrent response is processed while the transaction's resources are being freed, resulting in a use-after-free [1]. Fix the issue by making sure to mark the transaction as inactive after a failed retransmit and free its resources only if a concurrent task did not already do that. [1] BUG: KASAN: use-after-free in consume_skb+0x30/0x370 net/core/skbuff.c:833 Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004 CPU: 0 PID: 1004 Comm: syz-executor.0 Not tainted 5.8.0-rc7+ #68 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xf6/0x16e lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 check_memory_region_inline mm/kasan/generic.c:186 [inline] check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192 instrument_atomic_read include/linux/instrumented.h:56 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] refcount_read include/linux/refcount.h:147 [inline] skb_unref include/linux/skbuff.h:1044 [inline] consume_skb+0x30/0x370 net/core/skbuff.c:833 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592 mlxsw_emad_process_response drivers/net/ethernet/mellanox/mlxsw/core.c:651 [inline] mlxsw_emad_rx_listener_func+0x5c9/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:672 mlxsw_core_skb_receive+0x4df/0x770 drivers/net/ethernet/mellanox/mlxsw/core.c:2063 mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline] mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651 tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550 __do_softirq+0x223/0x964 kernel/softirq.c:292 asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711 Allocated by task 1006: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:494 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467 slab_post_alloc_hook mm/slab.h:586 [inline] slab_alloc_node mm/slub.c:2824 [inline] slab_alloc mm/slub.c:2832 [inline] kmem_cache_alloc+0xcd/0x2e0 mm/slub.c:2837 __build_skb+0x21/0x60 net/core/skbuff.c:311 __netdev_alloc_skb+0x1e2/0x360 net/core/skbuff.c:464 netdev_alloc_skb include/linux/skbuff.h:2810 [inline] mlxsw_emad_alloc drivers/net/ethernet/mellanox/mlxsw/core.c:756 [inline] mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:787 [inline] mlxsw_core_reg_access_emad+0x1ab/0x1420 drivers/net/ethernet/mellanox/mlxsw/core.c:1817 mlxsw_reg_trans_query+0x39/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:1831 mlxsw_sp_sb_pm_occ_clear drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:260 [inline] mlxsw_sp_sb_occ_max_clear+0xbff/0x10a0 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:1365 mlxsw_devlink_sb_occ_max_clear+0x76/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1037 devlink_nl_cmd_sb_occ_max_clear_doit+0x1ec/0x280 net/core/devlink.c:1765 genl_family_rcv_msg_doit net/netlink/genetlink.c:669 [inline] genl_family_rcv_msg net/netlink/genetlink.c:714 [inline] genl_rcv_msg+0x617/0x980 net/netlink/genetlink.c:731 netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2470 genl_rcv+0x24/0x40 net/netlink/genetlink.c:742 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0x150/0x190 net/socket.c:671 ____sys_sendmsg+0x6d8/0x840 net/socket.c:2359 ___sys_sendmsg+0xff/0x170 net/socket.c:2413 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2446 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 73: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] kasan_set_free_info mm/kasan/common.c:316 [inline] __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455 slab_free_hook mm/slub.c:1474 [inline] slab_free_freelist_hook mm/slub.c:1507 [inline] slab_free mm/slub.c:3072 [inline] kmem_cache_free+0xbe/0x380 mm/slub.c:3088 kfree_skbmem net/core/skbuff.c:622 [inline] kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:616 __kfree_skb net/core/skbuff.c:679 [inline] consume_skb net/core/skbuff.c:837 [inline] consume_skb+0xe1/0x370 net/core/skbuff.c:831 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592 mlxsw_emad_transmit_retry.isra.0+0x9d/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:613 mlxsw_emad_trans_timeout_work+0x43/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:625 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415 kthread+0x355/0x470 kernel/kthread.c:291 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293 The buggy address belongs to the object at ffff88804f5703c0 which belongs to the cache skbuff_head_cache of size 224 The buggy address is located 212 bytes inside of 224-byte region [ffff88804f5703c0, ffff88804f5704a0) The buggy address belongs to the page: page:ffffea00013d5c00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x100000000000200(slab) raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806c625400 raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc Fixes: caf7297e7ab5f ("mlxsw: core: Introduce support for asynchronous EMAD register access") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-26	mlxsw: core: Fix memory leak on module removal	Ido Schimmel
	Free the devlink instance during the teardown sequence in the non-reload case to avoid the following memory leak. unreferenced object 0xffff888232895000 (size 2048): comm "modprobe", pid 1073, jiffies 4295568857 (age 164.871s) hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........"....... 10 50 89 32 82 88 ff ff 10 50 89 32 82 88 ff ff .P.2.....P.2.... backtrace: [<00000000c704e9a6>] __kmalloc+0x13a/0x2a0 [<00000000ee30129d>] devlink_alloc+0xff/0x760 [<0000000092ab3e5d>] 0xffffffffa042e5b0 [<000000004f3f8a31>] 0xffffffffa042f6ad [<0000000092800b4b>] 0xffffffffa0491df3 [<00000000c4843903>] local_pci_probe+0xcb/0x170 [<000000006993ded7>] pci_device_probe+0x2c2/0x4e0 [<00000000a8e0de75>] really_probe+0x2c5/0xf90 [<00000000d42ba75d>] driver_probe_device+0x1eb/0x340 [<00000000bcc95e05>] device_driver_attach+0x294/0x300 [<000000000e2bc177>] __driver_attach+0x167/0x2f0 [<000000007d44cd6e>] bus_for_each_dev+0x148/0x1f0 [<000000003cd5a91e>] driver_attach+0x45/0x60 [<000000000041ce51>] bus_add_driver+0x3b8/0x720 [<00000000f5215476>] driver_register+0x230/0x4e0 [<00000000d79356f5>] __pci_register_driver+0x190/0x200 Fixes: a22712a96291 ("mlxsw: core: Fix devlink unregister flow") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Vadim Pasternak <vadimp@nvidia.com> Tested-by: Oleksandr Shamray <oleksandrs@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>