Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fix from Dmitry Torokhov:
"A fix for st1232 driver to properly report coordinates for 2nd and
subsequent fingers when more than one is on the surface"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: st1232 - fix reporting multitouch coordinates
|
|
nbd requires socket families to support the shutdown method so the nbd
recv workqueue can be woken up from its sock_recvmsg call. If the socket
does not support the callout we will leave recv works running or get hangs
later when the device or module is removed.
This adds a check during socket connection/reconnection to make sure the
socket being passed in supports the needed callout.
Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This driver is using regulator_get_optional() to handle all the supplies
that it handles, and only ever enables and disables all supplies en masse
without ever doing any other configuration of the device to handle missing
power. These are clear signs that the API is being misused - it should only
be used for supplies that may be physically absent from the system and in
these cases the hardware usually needs different configuration if the
supply is missing. Instead use normal regualtor_get(), if the supply is
not described in DT then the framework will substitute a dummy regulator in
so no special handling is needed by the consumer driver.
In the case of the PHY regulator the handling in the driver is a hack to
deal with integrated PHYs; the supplies are only optional in the sense
that that there's some confusion in the code about where they're bound to.
From a code point of view they function exactly as normal supplies so can
be treated as such. It'd probably be better to model this by instantiating
a PHY object for integrated PHYs.
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We hit the following warning in production
print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
Workqueue: knbd-recv recv_work [nbd]
RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
Call Trace:
blk_mq_free_request+0xb7/0xf0
blk_mq_complete_request+0x62/0xf0
recv_work+0x29/0xa1 [nbd]
process_one_work+0x1f5/0x3f0
worker_thread+0x2d/0x3d0
? rescuer_thread+0x340/0x340
kthread+0x111/0x130
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x1f/0x30
---[ end trace b079c3c67f98bb7c ]---
This was preceded by us timing out everything and shutting down the
sockets for the device. The problem is we had a request in the queue at
the same time, so we completed the request twice. This can actually
happen in a lot of cases, we fail to get a ref on our config, we only
have one connection and just error out the command, etc.
Fix this by checking cmd->status in nbd_read_stat. We only change this
under the cmd->lock, so we are safe to check this here and see if we've
already error'ed this command out, which would indicate that we've
completed it as well.
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We already do this for the most part, except in timeout and clear_req.
For the timeout case we take the lock after we grab a ref on the config,
but that isn't really necessary because we're safe to touch the cmd at
this point, so just move the order around.
For the clear_req cause this is initiated by the user, so again is safe.
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules fixes from Jessica Yu:
- Revert __ksymtab_$namespace.$symbol naming scheme back to
__ksymtab_$symbol, as it was causing issues with depmod.
Instead, have modpost extract a symbol's namespace from __kstrtabns
and __ksymtab_strings.
- Fix `make nsdeps` for out of tree kernel builds (make O=...) caused
by unescaped '/'.
Use a different sed delimiter to avoid this problem.
* tag 'modules-for-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
scripts/nsdeps: use alternative sed delimiter
symbol namespaces: revert to previous __ksymtab name scheme
modpost: make updating the symbol namespace explicit
modpost: delegate updating namespaces to separate function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A slightly larger set of fixes have accrued in the last two weeks.
Mostly a collection of the usual smaller fixes:
- Marvell Armada: USB phy setup issues on Turris Mox
- Broadcom: GPIO/pinmux DT mapping corrections for Stingray, MMC bus
width fix for RPi Zero W, GPIO LED removal for RPI CM3. Also some
maintainer updates.
- OMAP: Fixlets for display config, interrupt settings for wifi, some
clock/PM pieces. Also IOMMU regression fix and a ti-sysc
no-watchdog regression fix.
- i.MX: A few fixes around PM/settings, some devicetree fixlets and
catching up with config option changes in DRM
- Rockchip: RockRro64 misc DT fixups, Hugsun X99 USB-C, Kevin display
panel settings
... and some smaller fixes for Davinci (backlight, McBSP DMA),
Allwinner (phy regulators, PMU removal on A64, etc)"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157
MAINTAINERS: Update the Spreadtrum SoC maintainer
MAINTAINERS: Remove Gregory and Brian for ARCH_BRCMSTB
ARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue
bus: ti-sysc: Fix watchdog quirk handling
ARM: OMAP2+: Add pdata for OMAP3 ISP IOMMU
ARM: OMAP2+: Plug in device_enable/idle ops for IOMMUs
ARM: davinci_all_defconfig: enable GPIO backlight
ARM: davinci: dm365: Fix McBSP dma_slave_map entry
ARM: dts: bcm2835-rpi-zero-w: Fix bus-width of sdhci
ARM: imx_v6_v7_defconfig: Enable CONFIG_DRM_MSM
arm64: dts: imx8mn: Use correct clock for usdhc's ipg clk
arm64: dts: imx8mm: Use correct clock for usdhc's ipg clk
arm64: dts: imx8mq: Use correct clock for usdhc's ipg clk
ARM: dts: imx7s: Correct GPT's ipg clock source
ARM: dts: vf610-zii-scu4-aib: Specify 'i2c-mux-idle-disconnect'
ARM: dts: imx6q-logicpd: Re-Enable SNVS power key
arm64: dts: lx2160a: Correct CPU core idle state name
mailmap: Add Simon Arlott (replacement for expired email address)
arm64: dts: rockchip: Fix override mode for rk3399-kevin panel
...
|
|
Pull KVM fixes from Paolo Bonzini:
"Bugfixes for ARM, PPC and x86, plus selftest improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: Don't leak L1 MMIO regions to L2
KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_update
kvm: clear kvmclock MSR on reset
KVM: x86: fix bugon.cocci warnings
KVM: VMX: Remove specialized handling of unexpected exit-reasons
selftests: kvm: fix sync_regs_test with newer gccs
selftests: kvm: vmx_dirty_log_test: skip the test when VMX is not supported
selftests: kvm: consolidate VMX support checks
selftests: kvm: vmx_set_nested_state_test: don't check for VMX support twice
KVM: Don't shrink/grow vCPU halt_poll_ns if host side polling is disabled
selftests: kvm: synchronize .gitignore to Makefile
kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID
KVM: arm64: pmu: Reset sample period on overflow handling
KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel event
arm64: KVM: Handle PMCR_EL0.LC as RES1 on pure AArch64 systems
KVM: arm64: pmu: Fix cycle counter truncation
KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
|
|
Pull drm fixes from Dave Airlie:
"Quiet week this week, which I suspect means some people just didn't
get around to sending me fixes pulls in time. This has 2 komeda and a
bunch of amdgpu fixes in it:
komeda:
- typo fixes
- flushing pipes fix
amdgpu:
- Fix suspend/resume issue related to multi-media engines
- Fix memory leak in user ptr code related to hmm conversion
- Fix possible VM faults when allocating page table memory
- Fix error handling in bo list ioctl"
* tag 'drm-fixes-2019-10-25' of git://anongit.freedesktop.org/drm/drm:
drm/komeda: Fix typos in komeda_splitter_validate
drm/komeda: Don't flush inactive pipes
drm/amdgpu/vce: fix allocation size in enc ring test
drm/amdgpu: fix error handling in amdgpu_bo_list_create
drm/amdgpu: fix potential VM faults
drm/amdgpu: user pages array memory leak fix
drm/amdgpu/vcn: fix allocation size in enc ring test
drm/amdgpu/uvd7: fix allocation size in enc ring test (v2)
drm/amdgpu/uvd6: fix allocation size in enc ring test (v2)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC host fixes:
- mxs: Fix flags passed to dmaengine_prep_slave_sg
- cqhci: Add a missing memory barrier
- sdhci-omap: Fix tuning procedure for temperatures < -20C"
* tag 'mmc-v5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: mxs: fix flags passed to dmaengine_prep_slave_sg
mmc: cqhci: Commit descriptors before setting the doorbell
mmc: sdhci-omap: Fix Tuning procedure for temperatures < -20C
|
|
We currently assume that submissions from the sqthread are successful,
and if IO polling is enabled, we use that value for knowing how many
completions to look for. But if we overflowed the CQ ring or some
requests simply got errored and already completed, they won't be
available for polling.
For the case of IO polling and SQTHREAD usage, look at the pending
poll list. If it ever hits empty then we know that we don't have
anymore pollable requests inflight. For that case, simply reset
the inflight count to zero.
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We currently use the ring values directly, but that can lead to issues
if the application is malicious and changes these values on our behalf.
Created in-kernel cached versions of them, and just overwrite the user
side when we update them. This is similar to how we treat the sq/cq
ring tail/head updates.
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Relax qspi pins slew-rate to minimize peak currents.
Fixes: 844030057339 ("ARM: dts: stm32: add flash nor support on stm32mp157c eval board")
Link: https://lore.kernel.org/r/20191025130122.11407-1-alexandre.torgue@st.com
Signed-off-by: Patrice Chotard <patrice.chotard@st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
io_ring_submit() finalises with
1. io_commit_sqring(), which releases sqes to the userspace
2. Then calls to io_queue_link_head(), accessing released head's sqe
Reorder them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_sq_thread() processes sqes by 8 without considering links. As a
result, links will be randomely subdivided.
The easiest way to fix it is to call io_get_sqring() inside
io_submit_sqes() as do io_ring_submit().
Downsides:
1. This removes optimisation of not grabbing mm_struct for fixed files
2. It submitting all sqes in one go, without finer-grained sheduling
with cq processing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is a bug, where failed linked requests are returned not with
specified @user_data, but with garbage from a kernel stack.
The reason is that io_fail_links() uses req->user_data, which is
uninitialised when called from io_queue_sqe() on fail path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Support for the kernel as Xen 32-bit PV guest will soon be removed.
Issue a warning when booted as such.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull the second lot of irqchip updates for 5.4 from Marc Zyngier:
- Sifive PLIC: force driver to skip non-relevant contexts
- GICv4: Don't send VMOVP commands to ITSs that don't have
this vPE mapped
|
|
This reorganization will allow us to call kvm_arch_destroy_vm in the
event that kvm_create_vm fails after calling kvm_arch_init_vm.
Suggested-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Recent cleanup in the way EEH support is added to a device causes a
kernel oops when the cxl driver probes a device and creates virtual
devices discovered on the FPGA:
BUG: Kernel NULL pointer dereference at 0x000000a0
Faulting instruction address: 0xc000000000048070
Oops: Kernel access of bad area, sig: 7 [#1]
...
NIP eeh_add_device_late.part.9+0x50/0x1e0
LR eeh_add_device_late.part.9+0x3c/0x1e0
Call Trace:
_dev_info+0x5c/0x6c (unreliable)
pnv_pcibios_bus_add_device+0x60/0xb0
pcibios_bus_add_device+0x40/0x60
pci_bus_add_device+0x30/0x100
pci_bus_add_devices+0x64/0xd0
cxl_pci_vphb_add+0xe0/0x130 [cxl]
cxl_probe+0x504/0x5b0 [cxl]
local_pci_probe+0x6c/0x110
work_for_cpu_fn+0x38/0x60
The root cause is that those cxl virtual devices don't have a
representation in the device tree and therefore no associated pci_dn
structure. In eeh_add_device_late(), pdn is NULL, so edev is NULL and
we oops.
We never had explicit support for EEH for those virtual devices.
Instead, EEH events are reported to the (real) pci device and handled
by the cxl driver. Which can then forward to the virtual devices and
handle dependencies. The fact that we try adding EEH support for the
virtual devices is new and a side-effect of the recent cleanup.
This patch fixes it by skipping adding EEH support on powernv for
devices which don't have a pci_dn structure.
The cxl driver doesn't create virtual devices on pseries so this patch
doesn't fix it there intentionally.
Fixes: b905f8cdca77 ("powerpc/eeh: EEH for pSeries hot plug")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016162833.22509-1-fbarrat@linux.ibm.com
|
|
Modify plic_init() to skip .dts interrupt contexts other
than supervisor external interrupt.
The .dts entry for plic may specify multiple interrupt contexts.
For example, it may assign two entries IRQ_M_EXT and IRQ_S_EXT,
in that order, to the same interrupt controller. This patch
modifies plic_init() to skip the IRQ_M_EXT context since
IRQ_S_EXT is currently the only supported context.
If IRQ_M_EXT is not skipped, plic_init() will report "handler
already present for context" when it comes across the IRQ_S_EXT
context in the next iteration of its loop.
Without this patch, .dts would have to be edited to replace the
value of IRQ_M_EXT with -1 for it to be skipped.
Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # arch/riscv
Link: https://lkml.kernel.org/r/1571933503-21504-1-git-send-email-alan.mikhak@sifive.com
|
|
The _PPC change notifications from the platform firmware are per-CPU,
so acpi_processor_ppc_init() needs to add a frequency QoS request
for each CPU covered by a cpufreq policy to take all of them into
account.
Even though ACPI thermal control of CPUs sets frequency limits
per processor package, it also needs a frequency QoS request for each
CPU in a cpufreq policy in case some of them are taken offline and
the frequency limit needs to be set through the remaining online
ones (this is slightly excessive, because all CPUs covered by one
cpufreq policy will set the same frequency limit through their QoS
requests, but it is not incorrect).
Modify the code in accordance with the above observations.
Fixes: d15ce412737a ("ACPI: cpufreq: Switch to QoS requests instead of cpufreq notifier")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
git://people.freedesktop.org/~agd5f/linux into drm-fixes
drm-fixes-5.4-2019-10-23:
amdgpu:
- Fix suspend/resume issue related to multi-media engines
- Fix memory leak in user ptr code related to hmm conversion
- Fix possible VM faults when allocating page table memory
- Fix error handling in bo list ioctl
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024031809.3155-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Two fixes for komeda, one for typos and one to prevent an hardware issue
when flushing inactive pipes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023112643.evpp6f23mpjwdsn4@gilmour
|
|
There's a deadlock that is possible and can easily be seen with
a test where multiple readers open/read/close of the same file
and a disruption occurs causing reconnect. The deadlock is due
a reader thread inside cifs_strict_readv calling down_read and
obtaining lock_sem, and then after reconnect inside
cifs_reopen_file calling down_read a second time. If in
between the two down_read calls, a down_write comes from
another process, deadlock occurs.
CPU0 CPU1
---- ----
cifs_strict_readv()
down_read(&cifsi->lock_sem);
_cifsFileInfo_put
OR
cifs_new_fileinfo
down_write(&cifsi->lock_sem);
cifs_reopen_file()
down_read(&cifsi->lock_sem);
Fix the above by changing all down_write(lock_sem) calls to
down_write_trylock(lock_sem)/msleep() loop, which in turn
makes the second down_read call benign since it will never
block behind the writer while holding lock_sem.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Suggested-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed--by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
Currently the code assumes that if a file info entry belongs
to lists of open file handles of an inode and a tcon then
it has non-zero reference. The recent changes broke that
assumption when putting the last reference of the file info.
There may be a situation when a file is being deleted but
nothing prevents another thread to reference it again
and start using it. This happens because we do not hold
the inode list lock while checking the number of references
of the file info structure. Fix this by doing the proper
locking when doing the check.
Fixes: 487317c99477d ("cifs: add spinlock for the openFileList to cifsInodeInfo")
Fixes: cb248819d209d ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic")
Cc: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When the client hits reconnect it iterates over the mid
pending queue marking entries for retry and moving them
to a temporary list to issue callbacks later without holding
GlobalMid_Lock. In the same time there is no guarantee that
mids can't be removed from the temporary list or even
freed completely by another thread. It may cause a temporary
list corruption:
[ 430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
[ 430.464668] ------------[ cut here ]------------
[ 430.466569] kernel BUG at lib/list_debug.c:51!
[ 430.468476] invalid opcode: 0000 [#1] SMP PTI
[ 430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
[ 430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
...
[ 430.510426] Call Trace:
[ 430.511500] cifs_reconnect+0x25e/0x610 [cifs]
[ 430.513350] cifs_readv_from_socket+0x220/0x250 [cifs]
[ 430.515464] cifs_read_from_socket+0x4a/0x70 [cifs]
[ 430.517452] ? try_to_wake_up+0x212/0x650
[ 430.519122] ? cifs_small_buf_get+0x16/0x30 [cifs]
[ 430.521086] ? allocate_buffers+0x66/0x120 [cifs]
[ 430.523019] cifs_demultiplex_thread+0xdc/0xc30 [cifs]
[ 430.525116] kthread+0xfb/0x130
[ 430.526421] ? cifs_handle_standard+0x190/0x190 [cifs]
[ 430.528514] ? kthread_park+0x90/0x90
[ 430.530019] ret_from_fork+0x35/0x40
Fix this by obtaining extra references for mids being retried
and marking them as MID_DELETED which indicates that such a mid
has been dequeued from the pending list.
Also move mid cleanup logic from DeleteMidQEntry to
_cifs_mid_q_entry_release which is called when the last reference
to a particular mid is put. This allows to avoid any use-after-free
of response buffers.
The patch needs to be backported to stable kernels. A stable tag
is not mentioned below because the patch doesn't apply cleanly
to any actively maintained stable kernel.
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-and-tested-by: David Wysochanski <dwysocha@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Variable dif in function sd_setup_read_write_cmnd() is the return value of
function scsi_host_dif_capable() which returns dif capability of disks. If
define it as bool, even for the disks which support DIF3, the function
still return dif=1, which causes IO error. So define variable dif as
unsigned int instead of bool.
Fixes: e249e42d277e ("scsi: sd: Clean up sd_setup_read_write_cmnd()")
Link: https://lore.kernel.org/r/1571725628-132736-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Use the pointer 'p' after having tested that pointer instead of before.
Fixes: 5cadafb236df ("target/cxgbit: Fix endianness annotations")
Cc: Varun Prakash <varun@chelsio.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191023202150.22173-1-bvanassche@acm.org
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When rdmacm module is not loaded, and when netlink message is received to
get char device info, it results into a deadlock due to recursive locking
of rdma_nl_mutex with the below call sequence.
[..]
rdma_nl_rcv()
mutex_lock()
[..]
rdma_nl_rcv_msg()
ib_get_client_nl_info()
request_module()
iw_cm_init()
rdma_nl_register()
mutex_lock(); <- Deadlock, acquiring mutex again
Due to above call sequence, following call trace and deadlock is observed.
kernel: __mutex_lock+0x35e/0x860
kernel: ? __mutex_lock+0x129/0x860
kernel: ? rdma_nl_register+0x1a/0x90 [ib_core]
kernel: rdma_nl_register+0x1a/0x90 [ib_core]
kernel: ? 0xffffffffc029b000
kernel: iw_cm_init+0x34/0x1000 [iw_cm]
kernel: do_one_initcall+0x67/0x2d4
kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0
kernel: do_init_module+0x5a/0x223
kernel: load_module+0x1998/0x1e10
kernel: ? __symbol_put+0x60/0x60
kernel: __do_sys_finit_module+0x94/0xe0
kernel: do_syscall_64+0x5a/0x270
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
process stack trace:
[<0>] __request_module+0x1c9/0x460
[<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core]
[<0>] nldev_get_chardev+0x1ac/0x320 [ib_core]
[<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core]
[<0>] rdma_nl_rcv+0xcd/0x120 [ib_core]
[<0>] netlink_unicast+0x179/0x220
[<0>] netlink_sendmsg+0x2f6/0x3f0
[<0>] sock_sendmsg+0x30/0x40
[<0>] ___sys_sendmsg+0x27a/0x290
[<0>] __sys_sendmsg+0x58/0xa0
[<0>] do_syscall_64+0x5a/0x270
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
To overcome this deadlock and to allow multiple netlink messages to
progress in parallel, following scheme is implemented.
1. Split the lock protecting the cb_table into a per-index lock, and make
it a rwlock. This lock is used to ensure no callbacks are running after
unregistration returns. Since a module will not be registered once it
is already running callbacks, this avoids the deadlock.
2. Use smp_store_release() to update the cb_table during registration so
that no lock is required. This avoids lockdep problems with thinking
all the rwsems are the same lock class.
Fixes: 0e2d00eb6fd45 ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload")
Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree fixes from Rob Herring:
"A couple more DT fixes for 5.4: fix a ref count, memory leak, and
Risc-V cpu schema warnings"
* tag 'devicetree-fixes-for-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: reserved_mem: add missing of_node_put() for proper ref-counting
of: unittest: fix memory leak in unittest_data_add
dt-bindings: riscv: Fix CPU schema errors
|
|
Taehee Yoo says:
====================
net: fix nested device bugs
This patchset fixes several bugs that are related to nesting
device infrastructure.
Current nesting infrastructure code doesn't limit the depth level of
devices. nested devices could be handled recursively. at that moment,
it needs huge memory and stack overflow could occur.
Below devices type have same bug.
VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, and VXLAN.
But I couldn't test all interface types so there could be more device
types, which have similar problems.
Maybe qmi_wwan.c code could have same problem.
So, I would appreciate if someone test qmi_wwan.c and other modules.
Test commands:
ip link add dummy0 type dummy
ip link add vlan1 link dummy0 type vlan id 1
for i in {2..100}
do
let A=$i-1
ip link add name vlan$i link vlan$A type vlan id $i
done
ip link del dummy0
1st patch actually fixes the root cause.
It adds new common variables {upper/lower}_level that represent
depth level. upper_level variable is depth of upper devices.
lower_level variable is depth of lower devices.
[U][L] [U][L]
vlan1 1 5 vlan4 1 4
vlan2 2 4 vlan5 2 3
vlan3 3 3 |
| |
+------------+
|
vlan6 4 2
dummy0 5 1
After this patch, the nesting infrastructure code uses this variable to
check the depth level.
2nd patch fixes Qdisc lockdep related problem.
Before this patch, devices use static lockdep map.
So, if devices that are same types are nested, lockdep will warn about
recursive situation.
These patches make these devices use dynamic lockdep key instead of
static lock or subclass.
3rd patch fixes unexpected IFF_BONDING bit unset.
When nested bonding interface scenario, bonding interface could lost it's
IFF_BONDING flag. This should not happen.
This patch adds a condition before unsetting IFF_BONDING.
4th patch fixes nested locking problem in bonding interface
Bonding interface has own lock and this uses static lock.
Bonding interface could be nested and it uses same lockdep key.
So that unexisting lockdep warning occurs.
5th patch fixes nested locking problem in team interface
Team interface has own lock and this uses static lock.
Team interface could be nested and it uses same lockdep key.
So that unexisting lockdep warning occurs.
6th patch fixes a refcnt leak in the macsec module.
When the macsec module is unloaded, refcnt leaks occur.
But actually, that holding refcnt is unnecessary.
So this patch just removes these code.
7th patch adds ignore flag to an adjacent structure.
In order to exchange an adjacent node safely, ignore flag is needed.
8th patch makes vxlan add an adjacent link to limit depth level.
Vxlan interface could set it's lower interface and these lower interfaces
are handled recursively.
So, if the depth of lower interfaces is too deep, stack overflow could
happen.
9th patch removes unnecessary variables and callback.
After 1st patch, subclass callback and variables are unnecessary.
This patch just removes these variables and callback.
10th patch fix refcnt leaks in the virt_wifi module
Like every nested interface, the upper interface should be deleted
before the lower interface is deleted.
In order to fix this, the notifier routine is added in this patch.
v4 -> v5 :
- Update log messages
- Move variables position, 1st patch
- Fix iterator routine, 1st patch
- Add generic lockdep key code, which replaces 2, 4, 5, 6, 7 patches.
- Log message update, 10th patch
- Fix wrong error value in error path of __init routine, 10th patch
- hold module refcnt when interface is created, 10th patch
v3 -> v4 :
- Add new 12th patch to fix refcnt leaks in the virt_wifi module
- Fix wrong usage netdev_upper_dev_link() in the vxlan.c
- Preserve reverse christmas tree variable ordering in the vxlan.c
- Add missing static keyword in the dev.c
- Expose netdev_adjacent_change_{prepare/commit/abort} instead of
netdev_adjacent_dev_{enable/disable}
v2 -> v3 :
- Modify nesting infrastructure code to use iterator instead of recursive.
v1 -> v2 :
- Make the 3rd patch do not add a new priv_flag.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
virt_wifi_newlink() calls netdev_upper_dev_link() and it internally
holds reference count of lower interface.
Current code does not release a reference count of the lower interface
when the lower interface is being deleted.
So, reference count leaks occur.
Test commands:
ip link add dummy0 type dummy
ip link add vw1 link dummy0 type virt_wifi
ip link del dummy0
Splat looks like:
[ 133.787526][ T788] WARNING: CPU: 1 PID: 788 at net/core/dev.c:8274 rollback_registered_many+0x835/0xc80
[ 133.788355][ T788] Modules linked in: virt_wifi cfg80211 dummy team af_packet sch_fq_codel ip_tables x_tables unix
[ 133.789377][ T788] CPU: 1 PID: 788 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 133.790069][ T788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 133.791167][ T788] RIP: 0010:rollback_registered_many+0x835/0xc80
[ 133.791906][ T788] Code: 00 4d 85 ff 0f 84 b5 fd ff ff ba c0 0c 00 00 48 89 de 4c 89 ff e8 9b 58 04 00 48 89 df e8 30
[ 133.794317][ T788] RSP: 0018:ffff88805ba3f338 EFLAGS: 00010202
[ 133.795080][ T788] RAX: ffff88805e57e801 RBX: ffff88805ba34000 RCX: ffffffffa9294723
[ 133.796045][ T788] RDX: 1ffff1100b746816 RSI: 0000000000000008 RDI: ffffffffabcc4240
[ 133.797006][ T788] RBP: ffff88805ba3f4c0 R08: fffffbfff5798849 R09: fffffbfff5798849
[ 133.797993][ T788] R10: 0000000000000001 R11: fffffbfff5798848 R12: dffffc0000000000
[ 133.802514][ T788] R13: ffff88805ba3f440 R14: ffff88805ba3f400 R15: ffff88805ed622c0
[ 133.803237][ T788] FS: 00007f2e9608c0c0(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
[ 133.804002][ T788] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 133.804664][ T788] CR2: 00007f2e95610603 CR3: 000000005f68c004 CR4: 00000000000606e0
[ 133.805363][ T788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 133.806073][ T788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 133.806787][ T788] Call Trace:
[ 133.807069][ T788] ? generic_xdp_install+0x310/0x310
[ 133.807612][ T788] ? lock_acquire+0x164/0x3b0
[ 133.808077][ T788] ? is_bpf_text_address+0x5/0xf0
[ 133.808640][ T788] ? deref_stack_reg+0x9c/0xd0
[ 133.809138][ T788] ? __nla_validate_parse+0x98/0x1ab0
[ 133.809944][ T788] unregister_netdevice_many.part.122+0x13/0x1b0
[ 133.810599][ T788] rtnl_delete_link+0xbc/0x100
[ 133.811073][ T788] ? rtnl_af_register+0xc0/0xc0
[ 133.811672][ T788] rtnl_dellink+0x30e/0x8a0
[ 133.812205][ T788] ? is_bpf_text_address+0x5/0xf0
[ ... ]
[ 144.110530][ T788] unregister_netdevice: waiting for dummy0 to become free. Usage count = 1
This patch adds notifier routine to delete upper interface before deleting
lower interface.
Fixes: c7cdba31ed8b ("mac80211-next: rtnetlink wifi simulation device")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes variables and callback these are related to the nested
device structure.
devices that can be nested have their own nest_level variable that
represents the depth of nested devices.
In the previous patch, new {lower/upper}_level variables are added and
they replace old private nest_level variable.
So, this patch removes all 'nest_level' variables.
In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added
to get lockdep subclass value, which is actually lower nested depth value.
But now, they use the dynamic lockdep key to avoid lockdep warning instead
of the subclass.
So, this patch removes ->ndo_get_lock_subclass() callback.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current vxlan code doesn't limit the number of nested devices.
Nested devices would be handled recursively and this routine needs
huge stack memory. So, unlimited nested devices could make
stack overflow.
In order to fix this issue, this patch adds adjacent links.
The adjacent link APIs internally check the depth level.
Test commands:
ip link add dummy0 type dummy
ip link add vxlan0 type vxlan id 0 group 239.1.1.1 dev dummy0 \
dstport 4789
for i in {1..100}
do
let A=$i-1
ip link add vxlan$i type vxlan id $i group 239.1.1.1 \
dev vxlan$A dstport 4789
done
ip link del dummy0
The top upper link is vxlan100 and the lowest link is vxlan0.
When vxlan0 is deleting, the upper devices will be deleted recursively.
It needs huge stack memory so it makes stack overflow.
Splat looks like:
[ 229.628477] =============================================================================
[ 229.629785] BUG page->ptl (Not tainted): Padding overwritten. 0x0000000026abf214-0x0000000091f6abb2
[ 229.629785] -----------------------------------------------------------------------------
[ 229.629785]
[ 229.655439] ==================================================================
[ 229.629785] INFO: Slab 0x00000000ff7cfda8 objects=19 used=19 fp=0x00000000fe33776c flags=0x200000000010200
[ 229.655688] BUG: KASAN: stack-out-of-bounds in unmap_single_vma+0x25a/0x2e0
[ 229.655688] Read of size 8 at addr ffff888113076928 by task vlan-network-in/2334
[ 229.655688]
[ 229.629785] Padding 0000000026abf214: 00 80 14 0d 81 88 ff ff 68 91 81 14 81 88 ff ff ........h.......
[ 229.629785] Padding 0000000001e24790: 38 91 81 14 81 88 ff ff 68 91 81 14 81 88 ff ff 8.......h.......
[ 229.629785] Padding 00000000b39397c8: 33 30 62 a7 ff ff ff ff ff eb 60 22 10 f1 ff 1f 30b.......`"....
[ 229.629785] Padding 00000000bc98f53a: 80 60 07 13 81 88 ff ff 00 80 14 0d 81 88 ff ff .`..............
[ 229.629785] Padding 000000002aa8123d: 68 91 81 14 81 88 ff ff f7 21 17 a7 ff ff ff ff h........!......
[ 229.629785] Padding 000000001c8c2369: 08 81 14 0d 81 88 ff ff 03 02 00 00 00 00 00 00 ................
[ 229.629785] Padding 000000004e290c5d: 21 90 a2 21 10 ed ff ff 00 00 00 00 00 fc ff df !..!............
[ 229.629785] Padding 000000000e25d731: 18 60 07 13 81 88 ff ff c0 8b 13 05 81 88 ff ff .`..............
[ 229.629785] Padding 000000007adc7ab3: b3 8a b5 41 00 00 00 00 ...A....
[ 229.629785] FIX page->ptl: Restoring 0x0000000026abf214-0x0000000091f6abb2=0x5a
[ ... ]
Fixes: acaf4e70997f ("net: vxlan: when lower dev unregisters remove vxlan dev as well")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to link an adjacent node, netdev_upper_dev_link() is used
and in order to unlink an adjacent node, netdev_upper_dev_unlink() is used.
unlink operation does not fail, but link operation can fail.
In order to exchange adjacent nodes, we should unlink an old adjacent
node first. then, link a new adjacent node.
If link operation is failed, we should link an old adjacent node again.
But this link operation can fail too.
It eventually breaks the adjacent link relationship.
This patch adds an ignore flag into the netdev_adjacent structure.
If this flag is set, netdev_upper_dev_link() ignores an old adjacent
node for a moment.
This patch also adds new functions for other modules.
netdev_adjacent_change_prepare()
netdev_adjacent_change_commit()
netdev_adjacent_change_abort()
netdev_adjacent_change_prepare() inserts new device into adjacent list
but new device is not allowed to use immediately.
If netdev_adjacent_change_prepare() fails, it internally rollbacks
adjacent list so that we don't need any other action.
netdev_adjacent_change_commit() deletes old device in the adjacent list
and allows new device to use.
netdev_adjacent_change_abort() rollbacks adjacent list.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a macsec interface is created, it increases a refcnt to a lower
device(real device). when macsec interface is deleted, the refcnt is
decreased in macsec_free_netdev(), which is ->priv_destructor() of
macsec interface.
The problem scenario is this.
When nested macsec interfaces are exiting, the exit routine of the
macsec module makes refcnt leaks.
Test commands:
ip link add dummy0 type dummy
ip link add macsec0 link dummy0 type macsec
ip link add macsec1 link macsec0 type macsec
modprobe -rv macsec
[ 208.629433] unregister_netdevice: waiting for macsec0 to become free. Usage count = 1
Steps of exit routine of macsec module are below.
1. Calls ->dellink() in __rtnl_link_unregister().
2. Checks refcnt and wait refcnt to be 0 if refcnt is not 0 in
netdev_run_todo().
3. Calls ->priv_destruvtor() in netdev_run_todo().
Step2 checks refcnt, but step3 decreases refcnt.
So, step2 waits forever.
This patch makes the macsec module do not hold a refcnt of the lower
device because it already holds a refcnt of the lower device with
netdev_upper_dev_link().
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
team interface could be nested and it's lock variable could be nested too.
But this lock uses static lockdep key and there is no nested locking
handling code such as mutex_lock_nested() and so on.
so the Lockdep would warn about the circular locking scenario that
couldn't happen.
In order to fix, this patch makes the team module to use dynamic lock key
instead of static key.
Test commands:
ip link add team0 type team
ip link add team1 type team
ip link set team0 master team1
ip link set team0 nomaster
ip link set team1 master team0
ip link set team1 nomaster
Splat that looks like:
[ 40.364352] WARNING: possible recursive locking detected
[ 40.364964] 5.4.0-rc3+ #96 Not tainted
[ 40.365405] --------------------------------------------
[ 40.365973] ip/750 is trying to acquire lock:
[ 40.366542] ffff888060b34c40 (&team->lock){+.+.}, at: team_set_mac_address+0x151/0x290 [team]
[ 40.367689]
but task is already holding lock:
[ 40.368729] ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
[ 40.370280]
other info that might help us debug this:
[ 40.371159] Possible unsafe locking scenario:
[ 40.371942] CPU0
[ 40.372338] ----
[ 40.372673] lock(&team->lock);
[ 40.373115] lock(&team->lock);
[ 40.373549]
*** DEADLOCK ***
[ 40.374432] May be due to missing lock nesting notation
[ 40.375338] 2 locks held by ip/750:
[ 40.375851] #0: ffffffffabcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 40.376927] #1: ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
[ 40.377989]
stack backtrace:
[ 40.378650] CPU: 0 PID: 750 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 40.379368] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 40.380574] Call Trace:
[ 40.381208] dump_stack+0x7c/0xbb
[ 40.381959] __lock_acquire+0x269d/0x3de0
[ 40.382817] ? register_lock_class+0x14d0/0x14d0
[ 40.383784] ? check_chain_key+0x236/0x5d0
[ 40.384518] lock_acquire+0x164/0x3b0
[ 40.385074] ? team_set_mac_address+0x151/0x290 [team]
[ 40.385805] __mutex_lock+0x14d/0x14c0
[ 40.386371] ? team_set_mac_address+0x151/0x290 [team]
[ 40.387038] ? team_set_mac_address+0x151/0x290 [team]
[ 40.387632] ? mutex_lock_io_nested+0x1380/0x1380
[ 40.388245] ? team_del_slave+0x60/0x60 [team]
[ 40.388752] ? rcu_read_lock_sched_held+0x90/0xc0
[ 40.389304] ? rcu_read_lock_bh_held+0xa0/0xa0
[ 40.389819] ? lock_acquire+0x164/0x3b0
[ 40.390285] ? lockdep_rtnl_is_held+0x16/0x20
[ 40.390797] ? team_port_get_rtnl+0x90/0xe0 [team]
[ 40.391353] ? __module_text_address+0x13/0x140
[ 40.391886] ? team_set_mac_address+0x151/0x290 [team]
[ 40.392547] team_set_mac_address+0x151/0x290 [team]
[ 40.393111] dev_set_mac_address+0x1f0/0x3f0
[ ... ]
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All bonding device has same lockdep key and subclass is initialized with
nest_level.
But actual nest_level value can be changed when a lower device is attached.
And at this moment, the subclass should be updated but it seems to be
unsafe.
So this patch makes bonding use dynamic lockdep key instead of the
subclass.
Test commands:
ip link add bond0 type bond
for i in {1..5}
do
let A=$i-1
ip link add bond$i type bond
ip link set bond$i master bond$A
done
ip link set bond5 master bond0
Splat looks like:
[ 307.992912] WARNING: possible recursive locking detected
[ 307.993656] 5.4.0-rc3+ #96 Tainted: G W
[ 307.994367] --------------------------------------------
[ 307.995092] ip/761 is trying to acquire lock:
[ 307.995710] ffff8880513aac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 307.997045]
but task is already holding lock:
[ 307.997923] ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 307.999215]
other info that might help us debug this:
[ 308.000251] Possible unsafe locking scenario:
[ 308.001137] CPU0
[ 308.001533] ----
[ 308.001915] lock(&(&bond->stats_lock)->rlock#2/2);
[ 308.002609] lock(&(&bond->stats_lock)->rlock#2/2);
[ 308.003302]
*** DEADLOCK ***
[ 308.004310] May be due to missing lock nesting notation
[ 308.005319] 3 locks held by ip/761:
[ 308.005830] #0: ffffffff9fcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 308.006894] #1: ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 308.008243] #2: ffffffff9f9219c0 (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding]
[ 308.009422]
stack backtrace:
[ 308.010124] CPU: 0 PID: 761 Comm: ip Tainted: G W 5.4.0-rc3+ #96
[ 308.011097] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 308.012179] Call Trace:
[ 308.012601] dump_stack+0x7c/0xbb
[ 308.013089] __lock_acquire+0x269d/0x3de0
[ 308.013669] ? register_lock_class+0x14d0/0x14d0
[ 308.014318] lock_acquire+0x164/0x3b0
[ 308.014858] ? bond_get_stats+0xb8/0x500 [bonding]
[ 308.015520] _raw_spin_lock_nested+0x2e/0x60
[ 308.016129] ? bond_get_stats+0xb8/0x500 [bonding]
[ 308.017215] bond_get_stats+0xb8/0x500 [bonding]
[ 308.018454] ? bond_arp_rcv+0xf10/0xf10 [bonding]
[ 308.019710] ? rcu_read_lock_held+0x90/0xa0
[ 308.020605] ? rcu_read_lock_sched_held+0xc0/0xc0
[ 308.021286] ? bond_get_stats+0x9f/0x500 [bonding]
[ 308.021953] dev_get_stats+0x1ec/0x270
[ 308.022508] bond_get_stats+0x1d1/0x500 [bonding]
Fixes: d3fff6c443fe ("net: add netdev_lockdep_set_classes() helper")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The IFF_BONDING means bonding master or bonding slave device.
->ndo_add_slave() sets IFF_BONDING flag and ->ndo_del_slave() unsets
IFF_BONDING flag.
bond0<--bond1
Both bond0 and bond1 are bonding device and these should keep having
IFF_BONDING flag until they are removed.
But bond1 would lose IFF_BONDING at ->ndo_del_slave() because that routine
do not check whether the slave device is the bonding type or not.
This patch adds the interface type check routine before removing
IFF_BONDING flag.
Test commands:
ip link add bond0 type bond
ip link add bond1 type bond
ip link set bond1 master bond0
ip link set bond1 nomaster
ip link del bond1 type bond
ip link add bond1 type bond
Splat looks like:
[ 226.665555] proc_dir_entry 'bonding/bond1' already registered
[ 226.666440] WARNING: CPU: 0 PID: 737 at fs/proc/generic.c:361 proc_register+0x2a9/0x3e0
[ 226.667571] Modules linked in: bonding af_packet sch_fq_codel ip_tables x_tables unix
[ 226.668662] CPU: 0 PID: 737 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 226.669508] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 226.670652] RIP: 0010:proc_register+0x2a9/0x3e0
[ 226.671612] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 39 01 00 00 48 8b 04 24 48 89 ea 48 c7 c7 a0 0b 14 9f 48 8b b0 e
0 00 00 00 e8 07 e7 88 ff <0f> 0b 48 c7 c7 40 2d a5 9f e8 59 d6 23 01 48 8b 4c 24 10 48 b8 00
[ 226.675007] RSP: 0018:ffff888050e17078 EFLAGS: 00010282
[ 226.675761] RAX: dffffc0000000008 RBX: ffff88805fdd0f10 RCX: ffffffff9dd344e2
[ 226.676757] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88806c9f6b8c
[ 226.677751] RBP: ffff8880507160f3 R08: ffffed100d940019 R09: ffffed100d940019
[ 226.678761] R10: 0000000000000001 R11: ffffed100d940018 R12: ffff888050716008
[ 226.679757] R13: ffff8880507160f2 R14: dffffc0000000000 R15: ffffed100a0e2c1e
[ 226.680758] FS: 00007fdc217cc0c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[ 226.681886] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 226.682719] CR2: 00007f49313424d0 CR3: 0000000050e46001 CR4: 00000000000606f0
[ 226.683727] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 226.684725] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 226.685681] Call Trace:
[ 226.687089] proc_create_seq_private+0xb3/0xf0
[ 226.687778] bond_create_proc_entry+0x1b3/0x3f0 [bonding]
[ 226.691458] bond_netdev_event+0x433/0x970 [bonding]
[ 226.692139] ? __module_text_address+0x13/0x140
[ 226.692779] notifier_call_chain+0x90/0x160
[ 226.693401] register_netdevice+0x9b3/0xd80
[ 226.694010] ? alloc_netdev_mqs+0x854/0xc10
[ 226.694629] ? netdev_change_features+0xa0/0xa0
[ 226.695278] ? rtnl_create_link+0x2ed/0xad0
[ 226.695849] bond_newlink+0x2a/0x60 [bonding]
[ 226.696422] __rtnl_newlink+0xb9f/0x11b0
[ 226.696968] ? rtnl_link_unregister+0x220/0x220
[ ... ]
Fixes: 0b680e753724 ("[PATCH] bonding: Add priv_flag to avoid event mishandling")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some interface types could be nested.
(VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
These interface types should set lockdep class because, without lockdep
class key, lockdep always warn about unexisting circular locking.
In the current code, these interfaces have their own lockdep class keys and
these manage itself. So that there are so many duplicate code around the
/driver/net and /net/.
This patch adds new generic lockdep keys and some helper functions for it.
This patch does below changes.
a) Add lockdep class keys in struct net_device
- qdisc_running, xmit, addr_list, qdisc_busylock
- these keys are used as dynamic lockdep key.
b) When net_device is being allocated, lockdep keys are registered.
- alloc_netdev_mqs()
c) When net_device is being free'd llockdep keys are unregistered.
- free_netdev()
d) Add generic lockdep key helper function
- netdev_register_lockdep_key()
- netdev_unregister_lockdep_key()
- netdev_update_lockdep_key()
e) Remove unnecessary generic lockdep macro and functions
f) Remove unnecessary lockdep code of each interfaces.
After this patch, each interface modules don't need to maintain
their lockdep keys.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current code doesn't limit the number of nested devices.
Nested devices would be handled recursively and this needs huge stack
memory. So, unlimited nested devices could make stack overflow.
This patch adds upper_level and lower_level, they are common variables
and represent maximum lower/upper depth.
When upper/lower device is attached or dettached,
{lower/upper}_level are updated. and if maximum depth is bigger than 8,
attach routine fails and returns -EMLINK.
In addition, this patch converts recursive routine of
netdev_walk_all_{lower/upper} to iterator routine.
Test commands:
ip link add dummy0 type dummy
ip link add link dummy0 name vlan1 type vlan id 1
ip link set vlan1 up
for i in {2..55}
do
let A=$i-1
ip link add vlan$i link vlan$A type vlan id $i
done
ip link del dummy0
Splat looks like:
[ 155.513226][ T908] BUG: KASAN: use-after-free in __unwind_start+0x71/0x850
[ 155.514162][ T908] Write of size 88 at addr ffff8880608a6cc0 by task ip/908
[ 155.515048][ T908]
[ 155.515333][ T908] CPU: 0 PID: 908 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 155.516147][ T908] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 155.517233][ T908] Call Trace:
[ 155.517627][ T908]
[ 155.517918][ T908] Allocated by task 0:
[ 155.518412][ T908] (stack is not available)
[ 155.518955][ T908]
[ 155.519228][ T908] Freed by task 0:
[ 155.519885][ T908] (stack is not available)
[ 155.520452][ T908]
[ 155.520729][ T908] The buggy address belongs to the object at ffff8880608a6ac0
[ 155.520729][ T908] which belongs to the cache names_cache of size 4096
[ 155.522387][ T908] The buggy address is located 512 bytes inside of
[ 155.522387][ T908] 4096-byte region [ffff8880608a6ac0, ffff8880608a7ac0)
[ 155.523920][ T908] The buggy address belongs to the page:
[ 155.524552][ T908] page:ffffea0001822800 refcount:1 mapcount:0 mapping:ffff88806c657cc0 index:0x0 compound_mapcount:0
[ 155.525836][ T908] flags: 0x100000000010200(slab|head)
[ 155.526445][ T908] raw: 0100000000010200 ffffea0001813808 ffffea0001a26c08 ffff88806c657cc0
[ 155.527424][ T908] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 155.528429][ T908] page dumped because: kasan: bad access detected
[ 155.529158][ T908]
[ 155.529410][ T908] Memory state around the buggy address:
[ 155.530060][ T908] ffff8880608a6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 155.530971][ T908] ffff8880608a6c00: fb fb fb fb fb f1 f1 f1 f1 00 f2 f2 f2 f3 f3 f3
[ 155.531889][ T908] >ffff8880608a6c80: f3 fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 155.532806][ T908] ^
[ 155.533509][ T908] ffff8880608a6d00: fb fb fb fb fb fb fb fb fb f1 f1 f1 f1 00 00 00
[ 155.534436][ T908] ffff8880608a6d80: f2 f3 f3 f3 f3 fb fb fb 00 00 00 00 00 00 00 00
[ ... ]
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If copy_net_ns() failed after net_alloc(), net->key_domain is leaked.
Fix this, by freeing key_domain in error path.
syzbot report:
BUG: memory leak
unreferenced object 0xffff8881175007e0 (size 32):
comm "syz-executor902", pid 7069, jiffies 4294944350 (age 28.400s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000a83ed741>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
[<00000000a83ed741>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<00000000a83ed741>] slab_alloc mm/slab.c:3326 [inline]
[<00000000a83ed741>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
[<0000000059fc92b9>] kmalloc include/linux/slab.h:547 [inline]
[<0000000059fc92b9>] kzalloc include/linux/slab.h:742 [inline]
[<0000000059fc92b9>] net_alloc net/core/net_namespace.c:398 [inline]
[<0000000059fc92b9>] copy_net_ns+0xb2/0x220 net/core/net_namespace.c:445
[<00000000a9d74bbc>] create_new_namespaces+0x141/0x2a0 kernel/nsproxy.c:103
[<000000008047d645>] unshare_nsproxy_namespaces+0x7f/0x100 kernel/nsproxy.c:202
[<000000005993ea6e>] ksys_unshare+0x236/0x490 kernel/fork.c:2674
[<0000000019417e75>] __do_sys_unshare kernel/fork.c:2742 [inline]
[<0000000019417e75>] __se_sys_unshare kernel/fork.c:2740 [inline]
[<0000000019417e75>] __x64_sys_unshare+0x16/0x20 kernel/fork.c:2740
[<00000000f4c5f2c8>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:296
[<0000000038550184>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
syzbot also reported other leak in copy_net_ns -> setup_net.
This problem is already fixed by cf47a0b882a4e5f6b34c7949d7b293e9287f1972.
Fixes: 9b242610514f ("keys: Network namespace domain tag")
Reported-and-tested-by: syzbot+3b3296d032353c33184b@syzkaller.appspotmail.com
Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix locking issue in the error code path of a function that belongs to
the sysfs interface exposed by the ACPI NFIT handling code (Dan
Carpenter)"
* tag 'acpi-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: NFIT: Fix unlock on error in scrub_show()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix problems related to frequency limits management in cpufreq
that were introduced during the 5.3 cycle (when PM QoS had started to
be used for that), fix a few issues in the OPP (operating performance
points) library code and fix up the recently added haltpoll cpuidle
driver.
The cpufreq changes are somewhat bigger that I would like them to be
at this stage of the cycle, but the problems fixed by them include
crashes on boot and shutdown in some cases (among other things) and in
my view it is better to address the root of the issue right away.
Specifics:
- Using device PM QoS of CPU devices for managing frequency limits in
cpufreq does not work, so introduce frequency QoS (based on the
original low-level PM QoS) for this purpose, switch cpufreq and
related code over to using it and fix a race involving deferred
updates of frequency limits on top of that (Rafael Wysocki, Sudeep
Holla).
- Avoid calling regulator_enable()/disable() from the OPP framework
to avoid side-effects on boot-enabled regulators that may change
their initial voltage due to performing initial voltage balancing
without all restrictions from the consumers (Marek Szyprowski).
- Avoid a kref management issue in the OPP library code and drop an
incorrectly added lockdep_assert_held() from it (Viresh Kumar).
- Make the recently added haltpoll cpuidle driver take the 'idle='
override into account as appropriate (Zhenzhong Duan)"
* tag 'pm-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
opp: Reinitialize the list_kref before adding the static OPPs again
cpufreq: Cancel policy update work scheduled before freeing
cpuidle: haltpoll: Take 'idle=' override into account
opp: core: Revert "add regulators enable and disable"
PM: QoS: Drop frequency QoS types from device PM QoS
cpufreq: Use per-policy frequency QoS
PM: QoS: Introduce frequency QoS
opp: of: drop incorrect lockdep_assert_held()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
"Fix a memory leak introduced in -rc1"
* tag 'gfs2-v5.4-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix memory leak when gfs2meta's fs_context is freed
|
|
Remove the following warning:
drivers/i2c/busses/i2c-stm32f7.c:315:
warning: cannot understand function prototype:
'struct stm32f7_i2c_spec i2c_specs[] =
Replace a comment starting with /** by simply /* to avoid having
it interpreted as a kernel-doc comment.
Fixes: aeb068c57214 ("i2c: i2c-stm32f7: add driver")
Signed-off-by: Alain Volmat <alain.volmat@st.com>
Reviewed-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
When in slave mode, an arbitration loss (ARLO) may be detected before the
slave had a chance to detect the stop condition (STOPF in ISR).
This is seen when two master + slave adapters switch their roles. It
provokes the i2c bus to be stuck, busy as SCL line is stretched.
- the I2C_SLAVE_STOP event is never generated due to STOPF flag is set but
don't generate an irq (race with ARLO irq, STOPIE is masked). STOPF flag
remains set until next master xfer (e.g. when STOPIE irq get unmasked).
In this case, completion is generated too early: immediately upon new
transfer request (then it doesn't send all data).
- Some data get stuck in TXDR register. As a consequence, the controller
stretches the SCL line: the bus gets busy until a future master transfer
triggers the bus busy / recovery mechanism (this can take time... and
may never happen at all)
So choice is to let the STOPF being detected by the slave isr handler,
to properly handle this stop condition. E.g. don't mask IRQs in error
handler, when the slave is running.
Fixes: 60d609f30de2 ("i2c: i2c-stm32f7: Add slave support")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Reviewed-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
The slave-interface documentation [1] states "the bus driver should
transmit the first byte" upon I2C_SLAVE_READ_REQUESTED slave event:
- 'val': backend returns first byte to be sent
The driver currently ignores the 1st byte to send on this event.
[1] https://www.kernel.org/doc/Documentation/i2c/slave-interface
Fixes: 60d609f30de2 ("i2c: i2c-stm32f7: Add slave support")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Reviewed-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
Since commit abf4923e97c3 ("i2c: mediatek: disable zero-length transfers
for mt8183"), there is a NULL pointer dereference for all the SoCs
that don't have any quirk. mtk_i2c_functionality is not checking that
the quirks pointer is not NULL before starting to use it.
This commit add a call to i2c_check_quirks which will check whether
the quirks pointer is set, and if so will check if the IP has the
NO_ZERO_LEN quirk.
Fixes: abf4923e97c3 ("i2c: mediatek: disable zero-length transfers for mt8183")
Signed-off-by: Fabien Parent <fparent@baylibre.com>
Reviewed-by: Cengiz Can <cengiz@kernel.wtf>
Reviewed-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Ulrich Hecht <uli@fpond.eu>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|