Age | Commit message (Collapse) | Author |
|
The previous change "hv_netvsc: Switch the data path at the right time
during hibernation" adds the call of netvsc_vf_changed() upon
NETDEV_CHANGE, so it's necessary to avoid the duplicate call and message
when the VF is brought UP or DOWN.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When netvsc_resume() is called, the mlx5 VF NIC has not been resumed yet,
so in the future the host might sliently fail the call netvsc_vf_changed()
-> netvsc_switch_datapath() there, even if the call works now.
Call netvsc_vf_changed() in the NETDEV_CHANGE event handler: at that time
the mlx5 VF NIC has been resumed.
Fixes: 19162fd4063a ("hv_netvsc: Fix hibernation for mlx5 VF driver")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently there is concurrent reset and enqueue operation for the
same lockless qdisc when there is no lock to synchronize the
q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
qdisc_deactivate() called by dev_deactivate_queue(), which may cause
out-of-bounds access for priv->ring[] in hns3 driver if user has
requested a smaller queue num when __dev_xmit_skb() still enqueue a
skb with a larger queue_mapping after the corresponding qdisc is
reset, and call hns3_nic_net_xmit() with that skb later.
Reused the existing synchronize_net() in dev_deactivate_many() to
make sure skb with larger queue_mapping enqueued to old qdisc(which
is saved in dev_queue->qdisc_sleeping) will always be reset when
dev_reset_queue() is called.
Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Documentation/devicetree/bindings/net/dsa/dsa.txt says that the phy-mode
property should be specified on port nodes. However, the microchip
drivers read it from the switch node.
Let the driver use the per-port property and fall back to the old
location with a warning.
Fix in-tree users.
Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
Link: https://lore.kernel.org/netdev/20200617082235.GA1523@laureti-dev/
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mptcp_pm_nl_get_local_id may be called in interrupt context, so we need to
use GFP_ATOMIC flag to allocate memory to avoid sleeping in atomic context.
[ 280.209809] BUG: sleeping function called from invalid context at mm/slab.h:498
[ 280.209812] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1680, name: kworker/1:3
[ 280.209814] INFO: lockdep is turned off.
[ 280.209816] CPU: 1 PID: 1680 Comm: kworker/1:3 Tainted: G W 5.9.0-rc3-mptcp+ #146
[ 280.209818] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 280.209820] Workqueue: events mptcp_worker
[ 280.209822] Call Trace:
[ 280.209824] <IRQ>
[ 280.209826] dump_stack+0x77/0xa0
[ 280.209829] ___might_sleep.cold+0xa6/0xb6
[ 280.209832] kmem_cache_alloc_trace+0x1d1/0x290
[ 280.209835] mptcp_pm_nl_get_local_id+0x23c/0x410
[ 280.209840] subflow_init_req+0x1e9/0x2ea
[ 280.209843] ? inet_reqsk_alloc+0x1c/0x120
[ 280.209845] ? kmem_cache_alloc+0x264/0x290
[ 280.209849] tcp_conn_request+0x303/0xae0
[ 280.209854] ? printk+0x53/0x6a
[ 280.209857] ? tcp_rcv_state_process+0x28f/0x1374
[ 280.209859] tcp_rcv_state_process+0x28f/0x1374
[ 280.209864] ? tcp_v4_do_rcv+0xb3/0x1f0
[ 280.209866] tcp_v4_do_rcv+0xb3/0x1f0
[ 280.209869] tcp_v4_rcv+0xed6/0xfa0
[ 280.209873] ip_protocol_deliver_rcu+0x28/0x270
[ 280.209875] ip_local_deliver_finish+0x89/0x120
[ 280.209877] ip_local_deliver+0x180/0x220
[ 280.209881] ip_rcv+0x166/0x210
[ 280.209885] __netif_receive_skb_one_core+0x82/0x90
[ 280.209888] process_backlog+0xd6/0x230
[ 280.209891] net_rx_action+0x13a/0x410
[ 280.209895] __do_softirq+0xcf/0x468
[ 280.209899] asm_call_on_stack+0x12/0x20
[ 280.209901] </IRQ>
[ 280.209903] ? ip_finish_output2+0x240/0x9a0
[ 280.209906] do_softirq_own_stack+0x4d/0x60
[ 280.209908] do_softirq.part.0+0x2b/0x60
[ 280.209911] __local_bh_enable_ip+0x9a/0xa0
[ 280.209913] ip_finish_output2+0x264/0x9a0
[ 280.209916] ? rcu_read_lock_held+0x4d/0x60
[ 280.209920] ? ip_output+0x7a/0x250
[ 280.209922] ip_output+0x7a/0x250
[ 280.209925] ? __ip_finish_output+0x330/0x330
[ 280.209928] __ip_queue_xmit+0x1dc/0x5a0
[ 280.209931] __tcp_transmit_skb+0xa0f/0xc70
[ 280.209937] tcp_connect+0xb03/0xff0
[ 280.209939] ? lockdep_hardirqs_on_prepare+0xe7/0x190
[ 280.209942] ? ktime_get_with_offset+0x125/0x150
[ 280.209944] ? trace_hardirqs_on+0x1c/0xe0
[ 280.209948] tcp_v4_connect+0x449/0x550
[ 280.209953] __inet_stream_connect+0xbb/0x320
[ 280.209955] ? mark_held_locks+0x49/0x70
[ 280.209958] ? lockdep_hardirqs_on_prepare+0xe7/0x190
[ 280.209960] ? __local_bh_enable_ip+0x6b/0xa0
[ 280.209963] inet_stream_connect+0x32/0x50
[ 280.209966] __mptcp_subflow_connect+0x1fd/0x242
[ 280.209972] mptcp_pm_create_subflow_or_signal_addr+0x2db/0x600
[ 280.209975] mptcp_worker+0x543/0x7a0
[ 280.209980] process_one_work+0x26d/0x5b0
[ 280.209984] ? process_one_work+0x5b0/0x5b0
[ 280.209987] worker_thread+0x48/0x3d0
[ 280.209990] ? process_one_work+0x5b0/0x5b0
[ 280.209993] kthread+0x117/0x150
[ 280.209996] ? kthread_park+0x80/0x80
[ 280.209998] ret_from_fork+0x22/0x30
Fixes: 01cacb00b35cb ("mptcp: add netlink-based PM")
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Geliang Tang says:
====================
mptcp: fix subflow's local_id/remote_id issues
v2:
- add Fixes tags;
- simply with 'return addresses_equal';
- use 'reversed Xmas tree' way.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch set the init remote_id to zero, otherwise it will be a random
number.
Then it added the missing subflow's remote_id setting code both in
__mptcp_subflow_connect and in subflow_ulp_clone.
Fixes: 01cacb00b35cb ("mptcp: add netlink-based PM")
Fixes: ec3edaa7ca6ce ("mptcp: Add handling of outgoing MP_JOIN requests")
Fixes: f296234c98a8f ("mptcp: Add handling of incoming MP_JOIN requests")
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In mptcp_pm_nl_get_local_id, skc_local is the same as msk_local, so it
always return 0. Thus every subflow's local_id is 0. It's incorrect.
This patch fixed this issue.
Also, we need to ignore the zero address here, like 0.0.0.0 in IPv4. When
we use the zero address as a local address, it means that we can use any
one of the local addresses. The zero address is not a new address, we don't
need to add it to PM, so this patch added a new function address_zero to
check whether an address is the zero address, if it is, we ignore this
address.
Fixes: 01cacb00b35cb ("mptcp: add netlink-based PM")
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I confirmed that the problem fixed by commit 2a63866c8b51a3f7 ("tipc: fix
shutdown() of connectionless socket") also applies to stream socket.
----------
#include <sys/socket.h>
#include <unistd.h>
#include <sys/wait.h>
int main(int argc, char *argv[])
{
int fds[2] = { -1, -1 };
socketpair(PF_TIPC, SOCK_STREAM /* or SOCK_DGRAM */, 0, fds);
if (fork() == 0)
_exit(read(fds[0], NULL, 1));
shutdown(fds[0], SHUT_RDWR); /* This must make read() return. */
wait(NULL); /* To be woken up by _exit(). */
return 0;
}
----------
Since shutdown(SHUT_RDWR) should affect all processes sharing that socket,
unconditionally setting sk->sk_shutdown to SHUTDOWN_MASK will be the right
behavior.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix up the documentation of the struct powercap_control_type members
to match the code.
Also fixup stray whitespace.
Signed-off-by: Amit Kucheria <amitk@kernel.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Fix kernel-doc warning in <linux/device.h>:
../include/linux/device.h:613: warning: Function parameter or member 'em_pd' not described in 'device'
Fixes: 1bc138c62295 ("PM / EM: add support for other devices than CPUs in Energy Model")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add intel_rapl support for the AlderLake platform.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add intel_rapl support for the RocketLake platform.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add intel_rapl support for the TigerLake desktop platform.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This reverts commit 14775b04964264189caa4a0862eac05dab8c0502 as there
were still some parsing problems with it, and the follow-on patch for
it.
Let's revisit it later, just drop it for now.
Cc: <jbaron@akamai.com>
Cc: Jim Cromie <jim.cromie@gmail.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 14775b049642 ("dyndbg: accept query terms like file=bar and module=foo")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit 42f07816ac0cc797928119cc039c414ae2b95d34 as it
still causes problems. It will be resolved later, let's revert it so we
can also revert the original patch this was supposed to be helping with.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: 42f07816ac0c ("dyndbg: fix problem parsing format="foo bar"")
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
On non-EFI systems, it wasn't possible to test the platform firmware
loader because it will have never set "checked_fw" during __init.
Instead, allow the test code to override this check. Additionally split
the declarations into a private symbol namespace so there is greater
enforcement of the symbol visibility.
Fixes: 548193cba2a7 ("test_firmware: add support for firmware_request_platform")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20200909225354.3118328-1-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Evgeniy does not have the time nor capacity to maintain the
connector subsystem any longer, so just move it under networking
as that is effectively what has been happening lately.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull NVMe fixes from Christoph.
"nvme fixes for 5.9
- cancel async events before freeing them (David Milburn)
- revert a broken race fix (James Smart)
- fix command processing during resets (Sagi Grimberg)"
* tag 'nvme-5.9-2020-09-10' of git://git.infradead.org/nvme:
nvme-fabrics: allow to queue requests for live queues
nvme-tcp: cancel async events before freeing event struct
nvme-rdma: cancel async events before freeing event struct
nvme-fc: cancel async events before freeing event struct
nvme: Revert: Fix controller creation races with teardown flow
|
|
For some inexplicable reason I decided to call flush_scheduled_work()
instead of cancel_delayed_work_sync(). The problem with that is that
flush_scheduled_work() waits for *all* queued scheduled work to be
completed instead of just the work itself.
This can cause a deadlock if a CEC driver also schedules work that
takes the same lock. See the comments for flush_scheduled_work() in
linux/workqueue.h.
This is exactly what has been observed a few times.
This patch simply replaces flush_scheduled_work() by
cancel_delayed_work_sync().
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: <stable@vger.kernel.org> # for v5.8 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|
|
Better guess. Secondary CSC registers are from 0xF0000.
Signed-off-by: Martin Cerveny <m.cerveny@computer.org>
Reviewed-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200906162140.5584-3-m.cerveny@computer.org
|
|
"Allwinner V3s" has secondary video layer (VI).
Decoded video is displayed in wrong colors until
secondary CSC registers are programmed correctly.
Fixes: 883029390550 ("drm/sun4i: Add DE2 CSC library")
Signed-off-by: Martin Cerveny <m.cerveny@computer.org>
Reviewed-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200906162140.5584-2-m.cerveny@computer.org
|
|
Every iteration of for_each_available_child_of_node() decrements
the reference count of the previous node, however when control is
transferred from the middle of the loop, as in the case of a return
or break or goto, there is no decrement thus ultimately resulting in
a memory leak.
Fix a potential memory leak in clk-impd1.c by inserting
of_node_put() before a return statement.
Issue found with Coccinelle.
Signed-off-by: Sumera Priyadarsini <sylphrenadin@gmail.com>
Link: https://lore.kernel.org/r/20200829175704.GA10998@Kaladin
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
The DVP driver depends both on the RESET_SIMPLE driver but also on the
reset framework itself. Let's make sure we have it enabled.
Fixes: 1bc95972715a ("clk: bcm: Add BCM2711 DVP driver")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20200903082636.3844629-1-maxime@cerno.tech
Acked-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a regression in padata"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
padata: fix possible padata_works_lock deadlock
|
|
In sas_notify_lldd_dev_found(), if we can't allocate the necessary
resources, then it seems like the wrong thing to mark the device as found
and to increment the reference count. None of the callers ever drop the
reference in that situation.
[mkp: tweaked commit desc based on feedback from John]
Link: https://lore.kernel.org/r/20200905125836.GF183976@mwanda
Fixes: 735f7d2fedf5 ("[SCSI] libsas: fix domain_device leak")
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When timestamping a packet there's a delay between the start of the
packet and the point where the hardware actually captures the
timestamp. This difference needs to be considered if we want accurate
timestamps.
This was done on the RX side, but not on the TX side.
Fixes: 2c344ae24501 ("igc: Add support for TX timestamping")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The previous timestamping latency numbers were obtained by
interpolating the i210 numbers with the i225 crystal clock value. That
calculation was wrong.
Use the correct values from real measurements.
Fixes: 81b055205e8b ("igc: Add support for RX timestamping")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The for loop in i40e_set_vsi_promisc() reports errors via dev_err() but
does not propagate the error up the call chain. Instead it continues the
loop and potentially overwrites the reported error value.
This results in the error being recorded in the log buffer, but the
caller might never know anything went the wrong way.
To avoid this situation i40e_set_vsi_promisc() needs to temporarily store
the error after reporting it. This is still not optimal as multiple
different errors may occur, so store the first error and hope that's
the main issue.
Fixes: 37d318d7805f (i40e: Remove scheduling while atomic possibility)
Reported-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c: In function ‘i40e_set_vsi_promisc’:
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1176:14: error: ‘aq_ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
i40e_status aq_ret;
In case the code inside the if statement and the for loop does not get
executed aq_ret will be uninitialized when the variable gets returned at
the end of the function.
Avoid this by changing num_vlans from int to u16, so aq_ret always gets
set. Making this change in additional places as num_vlans should never
be negative.
Fixes: 37d318d7805f ("i40e: Remove scheduling while atomic possibility")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Igor Russkikh says:
====================
net: qed disable aRFS in NPAR and 100G
This patchset fixes some recent issues found by customers.
v3:
resending on Dmitry's behalf
v2:
correct hash in Fixes tag
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the assert during VF driver installation when the personality is iWARP
Fixes: 1fe614d10f45 ("qed: Relax VF firmware requirements")
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In some configurations ARFS cannot be used, so disable it if device
is not capable.
Fixes: e4917d46a653 ("qede: Add aRFS support")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In CMT and NPAR the PF is unknown when the GFS block processes the
packet. Therefore cannot use searcher as it has a per PF database,
and thus ARFS must be disabled.
Fixes: d51e4af5c209 ("qed: aRFS infrastructure support")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The string was incorrectly defined before from least to most specific,
swap the compatible strings accordingly.
Fixes: ff73917d38a6 ("ARM64: dts: Add QSPI Device Tree node for NS2")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
|
|
The string was incorrectly defined before from least to most
specific, swap the compatible strings accordingly.
Fixes: 1c8f40650723 ("ARM: dts: BCM5301X: convert to iProc QSPI")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
|
|
The string was incorrectly defined before from least to most
specific, swap the compatible strings accordingly.
Fixes: 329f98c1974e ("ARM: dts: NSP: Add QSPI nodes to NSPI and bcm958625k DTSes")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
|
|
The string was incorrectly defined before from least to most specific,
swap the compatible strings accordingly.
Fixes: b9099ec754b5 ("ARM: dts: Add Broadcom Hurricane 2 DTS include file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
|
|
The binding is currently incorrectly defining the compatible strings
from least specifice to most specific instead of the converse. Re-order
them from most specific (left) to least specific (right) and fix the
examples as well.
Fixes: 5fc78f4c842a ("spi: Broadcom BRCMSTB, NSP, NS2 SoC bindings")
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.9
First set of fixes for v5.9, small but important.
brcmfmac
* fix a throughput regression on bcm4329
mt76
* fix a regression with stations reconnecting on mt7616
* properly free tx skbs, it was working by accident before
mwifiex
* fix a regression with 256 bit encryption keys
wlcore
* revert AES CMAC support as it caused a regression
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jason A. Donenfeld says:
====================
wireguard fixes for 5.9-rc5
Yesterday, Eric reported a race condition found by syzbot. This series
contains two commits, one that fixes the direct issue, and another that
addresses the more general issue, as a defense in depth.
1) The basic problem syzbot unearthed was that one particular mutation
of handshake->entry was not protected by the handshake mutex like the
other cases, so this patch basically just reorders a line to make
sure the mutex is actually taken at the right point. Most of the work
here went into making sure the race was fully understood and making a
reproducer (which syzbot was unable to do itself, due to the rarity
of the race).
2) Eric's initial suggestion for fixing this was taking a spinlock
around the hash table replace function where the null ptr deref was
happening. This doesn't address the main problem in the most precise
possible way like (1) does, but it is a good suggestion for
defense-in-depth, in case related issues come up in the future, and
basically costs nothing from a performance perspective. I thought it
aided in implementing a good general rule: all mutators of that hash
table take the table lock. So that's part of this series as a
companion.
Both of these contain Fixes: tags and are good candidates for stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric's suggested fix for the previous commit's mentioned race condition
was to simply take the table->lock in wg_index_hashtable_replace(). The
table->lock of the hash table is supposed to protect the bucket heads,
not the entires, but actually, since all the mutator functions are
already taking it, it makes sense to take it too for the test to
hlist_unhashed, as a defense in depth measure, so that it no longer
races with deletions, regardless of what other locks are protecting
individual entries. This is sensible from a performance perspective
because, as Eric pointed out, the case of being unhashed is already the
unlikely case, so this won't add common contention. And comparing
instructions, this basically doesn't make much of a difference other
than pushing and popping %r13, used by the new `bool ret`. More
generally, I like the idea of locking consistency across table mutator
functions, and this might let me rest slightly easier at night.
Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric reported that syzkaller found a race of this variety:
CPU 1 CPU 2
-------------------------------------------|---------------------------------------
wg_index_hashtable_replace(old, ...) |
if (hlist_unhashed(&old->index_hash)) |
| wg_index_hashtable_remove(old)
| hlist_del_init_rcu(&old->index_hash)
| old->index_hash.pprev = NULL
hlist_replace_rcu(&old->index_hash, ...) |
*old->index_hash.pprev |
Syzbot wasn't actually able to reproduce this more than once or create a
reproducer, because the race window between checking "hlist_unhashed" and
calling "hlist_replace_rcu" is just so small. Adding an mdelay(5) or
similar there helps make this demonstrable using this simple script:
#!/bin/bash
set -ex
trap 'kill $pid1; kill $pid2; ip link del wg0; ip link del wg1' EXIT
ip link add wg0 type wireguard
ip link add wg1 type wireguard
wg set wg0 private-key <(wg genkey) listen-port 9999
wg set wg1 private-key <(wg genkey) peer $(wg show wg0 public-key) endpoint 127.0.0.1:9999 persistent-keepalive 1
wg set wg0 peer $(wg show wg1 public-key)
ip link set wg0 up
yes link set wg1 up | ip -force -batch - &
pid1=$!
yes link set wg1 down | ip -force -batch - &
pid2=$!
wait
The fundumental underlying problem is that we permit calls to wg_index_
hashtable_remove(handshake.entry) without requiring the caller to take
the handshake mutex that is intended to protect members of handshake
during mutations. This is consistently the case with calls to wg_index_
hashtable_insert(handshake.entry) and wg_index_hashtable_replace(
handshake.entry), but it's missing from a pertinent callsite of wg_
index_hashtable_remove(handshake.entry). So, this patch makes sure that
mutex is taken.
The original code was a little bit funky though, in the form of:
remove(handshake.entry)
lock(), memzero(handshake.some_members), unlock()
remove(handshake.entry)
The original intention of that double removal pattern outside the lock
appears to be some attempt to prevent insertions that might happen while
locks are dropped during expensive crypto operations, but actually, all
callers of wg_index_hashtable_insert(handshake.entry) take the write
lock and then explicitly check handshake.state, as they should, which
the aforementioned memzero clears, which means an insertion should
already be impossible. And regardless, the original intention was
necessarily racy, since it wasn't guaranteed that something else would
run after the unlock() instead of after the remove(). So, from a
soundness perspective, it seems positive to remove what looks like a
hack at best.
The crash from both syzbot and from the script above is as follows:
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 PID: 7395 Comm: kworker/0:3 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: wg-kex-wg1 wg_packet_handshake_receive_worker
RIP: 0010:hlist_replace_rcu include/linux/rculist.h:505 [inline]
RIP: 0010:wg_index_hashtable_replace+0x176/0x330 drivers/net/wireguard/peerlookup.c:174
Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 44 01 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 10 48 89 c6 48 c1 ee 03 <80> 3c 0e 00 0f 85 06 01 00 00 48 85 d2 4c 89 28 74 47 e8 a3 4f b5
RSP: 0018:ffffc90006a97bf8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888050ffc4f8 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88808e04e010
RBP: ffff88808e04e000 R08: 0000000000000001 R09: ffff8880543d0000
R10: ffffed100a87a000 R11: 000000000000016e R12: ffff8880543d0000
R13: ffff88808e04e008 R14: ffff888050ffc508 R15: ffff888050ffc500
FS: 0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000f5505db0 CR3: 0000000097cf7000 CR4: 00000000001526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
wg_noise_handshake_begin_session+0x752/0xc9a drivers/net/wireguard/noise.c:820
wg_receive_handshake_packet drivers/net/wireguard/receive.c:183 [inline]
wg_packet_handshake_receive_worker+0x33b/0x730 drivers/net/wireguard/receive.c:220
process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
clean follow coccicheck warning:
net//hsr/hsr_netlink.c:94:8-42: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
net//hsr/hsr_netlink.c:87:30-57: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
net//hsr/hsr_netlink.c:79:29-53: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull NFS client bugfixes from Trond Myklebust:
- Fix an NFS/RDMA resource leak
- Fix the error handling during delegation recall
- NFSv4.0 needs to return the delegation on a zero-stateid SETATTR
- Stop printk reading past end of string
* tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: stop printk reading past end of string
NFS: Zero-stateid SETATTR should first return delegation
NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall
xprtrdma: Release in-flight MRs on disconnect
|
|
Eric Dumazet says:
====================
net: skb_put_padto() fixes
sysbot reported a bug in qrtr leading to use-after-free.
First patch fixes the issue.
Second patch addes __must_check attribute to avoid similar
issues in the future.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb_put_padto() and __skb_put_padto() callers
must check return values or risk use-after-free.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If skb_put_padto() returns an error, skb has been freed.
Better not touch it anymore, as reported by syzbot [1]
Note to qrtr maintainers : this suggests qrtr_sendmsg()
should adjust sock_alloc_send_skb() second parameter
to account for the potential added alignment to avoid
reallocation.
[1]
BUG: KASAN: use-after-free in __skb_insert include/linux/skbuff.h:1907 [inline]
BUG: KASAN: use-after-free in __skb_queue_before include/linux/skbuff.h:2016 [inline]
BUG: KASAN: use-after-free in __skb_queue_tail include/linux/skbuff.h:2049 [inline]
BUG: KASAN: use-after-free in skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146
Write of size 8 at addr ffff88804d8ab3c0 by task syz-executor.4/4316
CPU: 1 PID: 4316 Comm: syz-executor.4 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d6/0x29e lib/dump_stack.c:118
print_address_description+0x66/0x620 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report+0x132/0x1d0 mm/kasan/report.c:530
__skb_insert include/linux/skbuff.h:1907 [inline]
__skb_queue_before include/linux/skbuff.h:2016 [inline]
__skb_queue_tail include/linux/skbuff.h:2049 [inline]
skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146
qrtr_tun_send+0x1a/0x40 net/qrtr/tun.c:23
qrtr_node_enqueue+0x44f/0xc00 net/qrtr/qrtr.c:364
qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861
qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
sock_write_iter+0x317/0x470 net/socket.c:998
call_write_iter include/linux/fs.h:1882 [inline]
new_sync_write fs/read_write.c:503 [inline]
vfs_write+0xa96/0xd10 fs/read_write.c:578
ksys_write+0x11b/0x220 fs/read_write.c:631
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45d5b9
Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f84b5b81c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000038b40 RCX: 000000000045d5b9
RDX: 0000000000000055 RSI: 0000000020001240 RDI: 0000000000000003
RBP: 00007f84b5b81ca0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000f
R13: 00007ffcbbf86daf R14: 00007f84b5b829c0 R15: 000000000118cf4c
Allocated by task 4316:
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
slab_post_alloc_hook+0x3e/0x290 mm/slab.h:518
slab_alloc mm/slab.c:3312 [inline]
kmem_cache_alloc+0x1c1/0x2d0 mm/slab.c:3482
skb_clone+0x1b2/0x370 net/core/skbuff.c:1449
qrtr_bcast_enqueue+0x6d/0x140 net/qrtr/qrtr.c:857
qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
sock_write_iter+0x317/0x470 net/socket.c:998
call_write_iter include/linux/fs.h:1882 [inline]
new_sync_write fs/read_write.c:503 [inline]
vfs_write+0xa96/0xd10 fs/read_write.c:578
ksys_write+0x11b/0x220 fs/read_write.c:631
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 4316:
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track+0x3d/0x70 mm/kasan/common.c:56
kasan_set_free_info+0x17/0x30 mm/kasan/generic.c:355
__kasan_slab_free+0xdd/0x110 mm/kasan/common.c:422
__cache_free mm/slab.c:3418 [inline]
kmem_cache_free+0x82/0xf0 mm/slab.c:3693
__skb_pad+0x3f5/0x5a0 net/core/skbuff.c:1823
__skb_put_padto include/linux/skbuff.h:3233 [inline]
skb_put_padto include/linux/skbuff.h:3252 [inline]
qrtr_node_enqueue+0x62f/0xc00 net/qrtr/qrtr.c:360
qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861
qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
sock_write_iter+0x317/0x470 net/socket.c:998
call_write_iter include/linux/fs.h:1882 [inline]
new_sync_write fs/read_write.c:503 [inline]
vfs_write+0xa96/0xd10 fs/read_write.c:578
ksys_write+0x11b/0x220 fs/read_write.c:631
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff88804d8ab3c0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 0 bytes inside of
224-byte region [ffff88804d8ab3c0, ffff88804d8ab4a0)
The buggy address belongs to the page:
page:00000000ea8cccfb refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804d8abb40 pfn:0x4d8ab
flags: 0xfffe0000000200(slab)
raw: 00fffe0000000200 ffffea0002237ec8 ffffea00029b3388 ffff88821bb66800
raw: ffff88804d8abb40 ffff88804d8ab000 000000010000000b 0000000000000000
page dumped because: kasan: bad access detected
Fixes: ce57785bf91b ("net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Carl Huang <cjhuang@codeaurora.org>
Cc: Wen Gong <wgong@codeaurora.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently we allocate rx buffers in a single contiguous buffers for
headers (iser and iscsi) and data trailer. This means that most likely the
data starting offset is aligned to 76 bytes (size of both headers).
This worked fine for years, but at some point this broke, resulting in
data corruptions in isert when a command comes with immediate data and the
underlying backend device assumes 512 bytes buffer alignment.
We assume a hard-requirement for all direct I/O buffers to be 512 bytes
aligned. To fix this, we should avoid passing unaligned buffers for I/O.
Instead, we allocate our recv buffers with some extra space such that we
can have the data portion align to 512 byte boundary. This also means that
we cannot reference headers or data using structure but rather
accessors (as they may move based on alignment). Also, get rid of the
wrong __packed annotation from iser_rx_desc as this has only harmful
effects (not aligned to anything).
This affects the rx descriptors for iscsi login and data plane.
Fixes: 3d75ca0adef4 ("block: introduce multi-page bvec helpers")
Link: https://lore.kernel.org/r/20200904195039.31687-1-sagi@grimberg.me
Reported-by: Stephen Rust <srust@blockbridge.com>
Tested-by: Doug Dumitru <doug@dumitru.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The device .release function was not being set during the device
initialization. This was leading to the below warning, in error cases when
put_srv was called before device_add was called.
Warning:
Device '(null)' does not have a release() function, it is broken and must
be fixed. See Documentation/kobject.txt.
So, set the device .release function during device initialization in the
__alloc_srv() function.
Fixes: baa5b28b7a47 ("RDMA/rtrs-srv: Replace device_register with device_initialize and device_add")
Link: https://lore.kernel.org/r/20200907102216.104041-1-haris.iqbal@cloud.ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|