Age | Commit message (Collapse) | Author |
|
Currently when a switchdev port joins a bridge, we replay all FDB
entries pointing towards that port or towards the bridge.
However, this is insufficient in certain situations:
(a) DSA, through its assisted_learning_on_cpu_port logic, snoops
dynamically learned FDB entries on foreign interfaces.
These are FDB entries that are pointing neither towards the newly
joined switchdev port, nor towards the bridge. So these addresses
would be missed when joining a bridge where a foreign interface has
already learned some addresses, and they would also linger on if the
DSA port leaves the bridge before the foreign interface forgets them.
None of this happens if we replay the entire FDB when the port joins.
(b) There is a desire to treat local FDB entries on a port (i.e. the
port's termination MAC address) identically to FDB entries pointing
towards the bridge itself. More details on the reason behind this in
the next patch. The point is that this cannot be done given the
current structure of br_fdb_replay() in this situation:
ip link set swp0 master br0 # br0 inherits its MAC address from swp0
ip link set swp1 master br0
What is desirable is that when swp1 joins the bridge, br_fdb_replay()
also notifies swp1 of br0's MAC address, but this won't in fact
happen because the MAC address of br0 does not have fdb->dst == NULL
(it doesn't point towards the bridge), but it has fdb->dst == swp0.
So our current logic makes it impossible for that address to be
replayed. But if we dump the entire FDB instead of just the entries
with fdb->dst == swp1 and fdb->dst == NULL, then the inherited MAC
address of br0 will be replayed too, which is what we need.
A natural question arises: say there is an FDB entry to be replayed,
like a MAC address dynamically learned on a foreign interface that
belongs to a bridge where no switchdev port has joined yet. If 10
switchdev ports belonging to the same driver join this bridge, one by
one, won't every port get notified 10 times of the foreign FDB entry,
amounting to a total of 100 notifications for this FDB entry in the
switchdev driver?
Well, yes, but this is where the "void *ctx" argument for br_fdb_replay
is useful: every port of the switchdev driver is notified whenever any
other port requests an FDB replay, but because the replay was initiated
by a different port, its context is different from the initiating port's
context, so it ignores those replays.
So the foreign FDB entry will be installed only 10 times, once per port.
This is done so that the following 4 code paths are always well balanced:
(a) addition of foreign FDB entry is replayed when port joins bridge
(b) deletion of foreign FDB entry is replayed when port leaves bridge
(c) addition of foreign FDB entry is notified to all ports currently in bridge
(c) deletion of foreign FDB entry is notified to all ports currently in bridge
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Michael Chan says:
====================
bnxt_en: PTP enhancements
This series adds two PTP enhancements. This first one is to register
the PHC during probe time and keep it registered whether it is in
ifup or ifdown state. It will get unregistered and possibly
reregistered if the firmware PTP capability changes after firmware
reset. The second one is to add the 1PPS (one pulse per second)
feature to support input/output of the 1PPS signal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
FW can report to driver via ASYNC event if it encountered an
invalid signal on any TSIO PIN. Driver will log this event
for the user to take corrective action.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Arvind Susarla <arvind.susarla@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Once the PPS pins are configured, the FW can report
PPS values using ASYNC event. This patch adds the
ASYNC event handler and subsequent reporting of the
events to kernel.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Application will send ioctls to set/clear PPS pin functions
based on user input. This patch implements the driver
callbacks that will configure the TSIO pins using firmware
commands. After firmware reset, the TSIO pins will be reconfigured
again.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
1PPS (One Pulse Per Second) is a signal generated either
by the NIC PHC or an external timing source.
Integrating the support to configure and use 1PPS using
the TSIO pins along with PTP timestamps will add Grand
Master capability to the 5750X family chipsets.
This patch initializes the driver data structures and
registers the 1PPS with kernel, based on the TSIO pins'
capability in the hardware. This will create a /dev/ppsX
device which applications can use to receive PPS events.
Later patches will define functions to configure and use
the pins.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During error recovery or hot firmware upgrade, the chip may be under
reset and the PHC register read cycles may cause completion timeouts.
Check that the chip is not under reset condition before proceeding
to read the PHC by checking the flag BNXT_STATE_IN_FW_RESET. We also
need to take the ptp_lock before we set this flag to prevent race
conditions.
We need this logic because the PHC now will stay registered after
bnxt_close().
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It was pointed out by Richard Cochran that registering the PHC during
probe is better than during ifup, so move bnxt_ptp_init() back to
bnxt_init_one(). In order to work correctly after firmware reset which
may result in PTP config. changes, we modify bnxt_ptp_init() to return
if the PHC has been registered earlier. If PTP is no longer supported
by the new firmware, we will unregister the PHC and clean up.
This partially reverts:
d7859afb6880 ("bnxt_en: Move bnxt_ptp_init() to bnxt_open()")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2 and reiserfs fixes from Jan Kara:
"A fix for the ext2 conversion to kmap_local() and two reiserfs
hardening fixes"
* tag 'fixes_for_v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: check directory items on read from disk
fs/ext2: Avoid page_address on pages returned by ext2_get_page
reiserfs: add check for root_inode in reiserfs_fill_super
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"A set of bug-fixes and new hardware ids.
Highlights:
- amd-pmc fixes
- think-lmi fixes
- various new hardware-ids"
* tag 'platform-drivers-x86-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: gigabyte-wmi: add support for B550 Aorus Elite V2
platform/x86: intel-hid: add Alder Lake ACPI device ID
platform/x86: think-lmi: Fix possible mem-leaks on tlmi_analyze() error-exit
platform/x86: think-lmi: Split kobject_init() and kobject_add() calls
platform/x86: think-lmi: Move pending_reboot_attr to the attributes sysfs dir
platform/x86: amd-pmc: Fix undefined reference to __udivdi3
platform/x86: amd-pmc: Fix missing unlock on error in amd_pmc_send_cmd()
platform/x86: wireless-hotkey: remove hardcoded "hp" from the error message
platform/x86: amd-pmc: Use return code on suspend
platform/x86: amd-pmc: Add new acpi id for future PMC controllers
platform/x86: amd-pmc: Add support for ACPI ID AMDI0006
platform/x86: amd-pmc: Add support for logging s0ix counters
platform/x86: amd-pmc: Add support for logging SMU metrics
platform/x86: amd-pmc: call dump registers only once
platform/x86: amd-pmc: Fix SMU firmware reporting mechanism
platform/x86: amd-pmc: Fix command completion code
platform/x86: think-lmi: Add pending_reboot support
|
|
This file was given GPL-2.0 license. But LGPL-2.1 makes more sense
as it needs to be used by libraries outside of the kernel source tree.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
unix_gc() assumes that candidate sockets can never gain an external
reference (i.e. be installed into an fd) while the unix_gc_lock is
held. Except for MSG_PEEK this is guaranteed by modifying inflight
count under the unix_gc_lock.
MSG_PEEK does not touch any variable protected by unix_gc_lock (file
count is not), yet it needs to be serialized with garbage collection.
Do this by locking/unlocking unix_gc_lock:
1) increment file count
2) lock/unlock barrier to make sure incremented file count is visible
to garbage collection
3) install file into fd
This is a lock barrier (unlike smp_mb()) that ensures that garbage
collection is run completely before or completely after the barrier.
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When removing a writeable device in __btrfs_free_extra_devids, the rw
device count should be decremented.
This error was caught by Syzbot which reported a warning in
close_fs_devices:
WARNING: CPU: 1 PID: 9355 at fs/btrfs/volumes.c:1168 close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
Modules linked in:
CPU: 0 PID: 9355 Comm: syz-executor552 Not tainted 5.13.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
RSP: 0018:ffffc9000333f2f0 EFLAGS: 00010293
RAX: ffffffff8365f5c3 RBX: 0000000000000001 RCX: ffff888029afd4c0
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff88802846f508 R08: ffffffff8365f525 R09: ffffed100337d128
R10: ffffed100337d128 R11: 0000000000000000 R12: dffffc0000000000
R13: ffff888019be8868 R14: 1ffff1100337d10d R15: 1ffff1100337d10a
FS: 00007f6f53828700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000047c410 CR3: 00000000302a6000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
btrfs_close_devices+0xc9/0x450 fs/btrfs/volumes.c:1180
open_ctree+0x8e1/0x3968 fs/btrfs/disk-io.c:3693
btrfs_fill_super fs/btrfs/super.c:1382 [inline]
btrfs_mount_root+0xac5/0xc60 fs/btrfs/super.c:1749
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x86/0x270 fs/super.c:1498
fc_mount fs/namespace.c:993 [inline]
vfs_kern_mount+0xc9/0x160 fs/namespace.c:1023
btrfs_mount+0x3d3/0xb50 fs/btrfs/super.c:1809
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x86/0x270 fs/super.c:1498
do_new_mount fs/namespace.c:2905 [inline]
path_mount+0x196f/0x2be0 fs/namespace.c:3235
do_mount fs/namespace.c:3248 [inline]
__do_sys_mount fs/namespace.c:3456 [inline]
__se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
Because fs_devices->rw_devices was not 0 after
closing all devices. Here is the call trace that was observed:
btrfs_mount_root():
btrfs_scan_one_device():
device_list_add(); <---------------- device added
btrfs_open_devices():
open_fs_devices():
btrfs_open_one_device(); <-------- writable device opened,
rw device count ++
btrfs_fill_super():
open_ctree():
btrfs_free_extra_devids():
__btrfs_free_extra_devids(); <--- writable device removed,
rw device count not decremented
fail_tree_roots:
btrfs_close_devices():
close_fs_devices(); <------- rw device count off by 1
As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
mount if we don't have replace item with target device"), rw_devices
was decremented on removing a writable device in
__btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
was not set for the device. However, this check does not need to be
reinstated as it is now redundant and incorrect.
In __btrfs_free_extra_devids, we skip removing the device if it is the
target for replacement. This is done by checking whether device->devid
== BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
and so it's redundant to test for that bit.
Additionally, following commit 82372bc816d7 ("Btrfs: make
the logic of source device removing more clear"), rw_devices is
incremented whenever a writeable device is added to the alloc
list (including the target device in btrfs_dev_replace_finishing), so
all removals of writable devices from the alloc list should also be
accompanied by a decrement to rw_devices.
Reported-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
CC: stable@vger.kernel.org # 5.10+
Tested-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
eviction
When checking if we need to log the new name of a renamed inode, we are
checking if the inode and its parent inode have been logged before, and if
not we don't log the new name. The check however is buggy, as it directly
compares the logged_trans field of the inodes versus the ID of the current
transaction. The problem is that logged_trans is a transient field, only
stored in memory and never persisted in the inode item, so if an inode
was logged before, evicted and reloaded, its logged_trans field is set to
a value of 0, meaning the check will return false and the new name of the
renamed inode is not logged. If the old parent directory was previously
fsynced and we deleted the logged directory entries corresponding to the
old name, we end up with a log that when replayed will delete the renamed
inode.
The following example triggers the problem:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/A
$ mkdir /mnt/B
$ echo -n "hello world" > /mnt/A/foo
$ sync
# Add some new file to A and fsync directory A.
$ touch /mnt/A/bar
$ xfs_io -c "fsync" /mnt/A
# Now trigger inode eviction. We are only interested in triggering
# eviction for the inode of directory A.
$ echo 2 > /proc/sys/vm/drop_caches
# Move foo from directory A to directory B.
# This deletes the directory entries for foo in A from the log, and
# does not add the new name for foo in directory B to the log, because
# logged_trans of A is 0, which is less than the current transaction ID.
$ mv /mnt/A/foo /mnt/B/foo
# Now make an fsync to anything except A, B or any file inside them,
# like for example create a file at the root directory and fsync this
# new file. This syncs the log that contains all the changes done by
# previous rename operation.
$ touch /mnt/baz
$ xfs_io -c "fsync" /mnt/baz
<power fail>
# Mount the filesystem and replay the log.
$ mount /dev/sdc /mnt
# Check the filesystem content.
$ ls -1R /mnt
/mnt/:
A
B
baz
/mnt/A:
bar
/mnt/B:
$
# File foo is gone, it's neither in A/ nor in B/.
Fix this by using the inode_logged() helper at btrfs_log_new_name(), which
safely checks if an inode was logged before in the current transaction.
A test case for fstests will follow soon.
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In compression write endio sequence, the range which the compressed_bio
writes is marked as uptodate if the last bio of the compressed (sub)bios
is completed successfully. There could be previous bio which may
have failed which is recorded in cb->errors.
Set the writeback range as uptodate only if cb->errors is zero, as opposed
to checking only the last bio's status.
Backporting notes: in all versions up to 4.4 the last argument is always
replaced by "!cb->errors".
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The current assumption that methods to read PCH FIVR attributes will
return integer, is not correct. There is no good way to return integer
as negative numbers are also valid.
These read methods return a package of integers. The first integer returns
status, which is 0 on success and any other value for failure. When the
returned status is zero, then the second integer returns the actual value.
This change fixes this issue by replacing acpi_evaluate_integer() with
acpi_evaluate_object() and use acpi_extract_package() to extract results.
Fixes: 2ce6324eadb01 ("ACPI: DPTF: Add PCH FIVR participant driver")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The commit 0ec4e55e9f57 ("ACPI: resources: Add checks for ACPI IRQ
override") introduces regression on some platforms, at least it makes
the UART can't get correct irq setting on two different platforms,
and it makes the kernel can't bootup on these two platforms.
This reverts commit 0ec4e55e9f571f08970ed115ec0addc691eda613.
Regression-discuss: https://bugzilla.kernel.org/show_bug.cgi?id=213031
Reported-by: PGNd <pgnet.dev@gmail.com>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
For pure poll requests, it doesn't remove the second poll wait entry
when it's done, neither after vfs_poll() or in the poll completion
handler. We should remove the second poll wait entry.
And we use io_poll_remove_double() rather than io_poll_remove_waitqs()
since the latter has some redundant logic.
Fixes: 88e41cf928a6 ("io_uring: add multishot mode for IORING_OP_POLL_ADD")
Cc: stable@vger.kernel.org # 5.13+
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210728030322.12307-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
completion path. Normally we expect this to happen inline as part
of submission, but apparently SCSI has a weird corner case where it
can happen as part of normal completions.
This should be solved by having the -EAGAIN bubble back up the stack
as part of submission, but previous attempts at this failed and we're
not just quite there yet. Instead we currently use REQ_F_REISSUE to
handle this case.
For now, catch it in io_rw_should_reissue() and prevent a reissue
from a bogus path.
Cc: stable@vger.kernel.org
Reported-by: Fabian Ebner <f.ebner@proxmox.com>
Tested-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Replace pci_enable_device() with pcim_enable_device(),
pci_disable_device() and pci_release_regions() will be
called in release automatically.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Joakim Zhang says:
====================
net: fec: add support for i.MX8MQ and i.MX8QM
This patch set adds supports for i.MX8MQ and i.MX8QM, both of them extend new features.
ChangeLogs:
V1->V2:
* rebase on schema binding, and update dts compatible string.
* use generic ethernet controller property for MAC internal RGMII clock delay
rx-internal-delay-ps and tx-internal-delay-ps
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add "fsl,imx8qm-fec" compatible string for FEC to support new feature
(RGMII delayed clock).
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add "fsl,imx8mq-fec" compatible string for FEC to support new feature
(IEEE 802.3az EEE standard).
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
i.MX8QM ENET IP version support timing specification that MAC
integrate clock delay in RGMII mode, the delayed TXC/RXC as an
alternative option to work well with various PHYs.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The i.MX8MQ ENET version support IEEE802.3az eee mode, add
eee mode tx lpi enable to support ethtool interface.
usage:
1. set sleep and wake timer to 5ms:
ethtool --set-eee eth0 eee on tx-lpi on tx-timer 5000
2. check the eee mode:
~# ethtool --show-eee eth0
EEE Settings for eth0:
EEE status: enabled - active
Tx LPI: 5000 (us)
Supported EEE link modes: 100baseT/Full
1000baseT/Full
Advertised EEE link modes: 100baseT/Full
1000baseT/Full
Link partner advertised EEE link modes: 100baseT/Full
Note: For realtime case and IEEE1588 ptp case, it should disable
EEE mode.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ENET of imx8mq and imx8qm are basically the same as imx6sx,
but they have new features support based on imx6sx, like:
- imx8mq: supports IEEE 802.3az EEE standard.
- imx8qm: supports RGMII mode delayed clock.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add RGMII internal clock delay for FEC controller.
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add more compatible items for i.MX8/8M platforms.
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recently we added a new option, SKBMOD_F_ECN, to tc-skbmod(8). Add a
control-plane selftest for it.
Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Introduce SKBMOD_F_ECN
option".
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, when doing rate limiting using the tc-police(8) action, the
easiest way is to simply drop the packets which exceed or conform the
configured bandwidth limit. Add a new option to tc-skbmod(8), so that
users may use the ECN [1] extension to explicitly inform the receiver
about the congestion instead of dropping packets "on the floor".
The 2 least significant bits of the Traffic Class field in IPv4 and IPv6
headers are used to represent different ECN states [2]:
0b00: "Non ECN-Capable Transport", Non-ECT
0b10: "ECN Capable Transport", ECT(0)
0b01: "ECN Capable Transport", ECT(1)
0b11: "Congestion Encountered", CE
As an example:
$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
matchall action skbmod ecn
Doing the above marks all ECT(0) and ECT(1) packets as CE. It does NOT
affect Non-ECT or non-IP packets. In the tc-police scenario mentioned
above, users may pipe a tc-police action and a tc-skbmod "ecn" action
together to achieve ECN-based rate limiting.
For TCP connections, upon receiving a CE packet, the receiver will respond
with an ECE packet, asking the sender to reduce their congestion window.
However ECN also works with other L4 protocols e.g. DCCP and SCTP [2], and
our implementation does not touch or care about L4 headers.
The updated tc-skbmod SYNOPSIS looks like the following:
tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...
Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command. Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IPv{4,6} packets.
It is also worth mentioning that, in theory, the same effect could be
achieved by piping a "police" action and a "bpf" action using the
bpf_skb_ecn_set_ce() helper, but this requires eBPF programming from the
user, thus impractical.
Depends on patch "net/sched: act_skbmod: Skip non-Ethernet packets".
[1] https://datatracker.ietf.org/doc/html/rfc3168
[2] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If nfp_tunnel_add_ipv6_off() fails, it should return error code
in nfp_fl_ct_add_offload().
Fixes: 5a2b93041646 ("nfp: flower-ct: compile match sections of flow_payload")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
using same source and destination ip/port for flow hash calculation
within the two directions.
Signed-off-by: zhang kai <zhangkaiheb@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reported as working here:
https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/1#issuecomment-879398883
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20210726153630.65213-1-linux@weissschuh.net
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Alder Lake has a new ACPI ID for Intel HID event filter device.
Signed-off-by: Ping Bao <ping.a.bao@intel.com>
Link: https://lore.kernel.org/r/20210721225615.20575-1-ping.a.bao@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
The `input_mt_get_slot_by_key` function may return a negative value
if an error occurs (e.g. running out of slots). If this occurs we
should really avoid reporting any data for the slot.
Signed-off-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Commit 670e90924bfe ("HID: wacom: support named keys on older devices")
added support for sending named events from the soft buttons on the
24HDT and 27QHDT. In the process, however, it inadvertantly disabled the
touchscreen of the 24HDT and 27QHDT by default. The
`wacom_set_shared_values` function would normally enable touch by default
but because it checks the state of the non-shared `has_mute_touch_switch`
flag and `wacom_setup_touch_input_capabilities` sets the state of the
/shared/ version, touch ends up being disabled by default.
This patch sets the non-shared flag, letting `wacom_set_shared_values`
take care of copying the value over to the shared version and setting
the default touch state to "on".
Fixes: 670e90924bfe ("HID: wacom: support named keys on older devices")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
There is a spelling mistake in the Kconfig text. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
The Keychron K1 wireless keyboard has a set of Apple-like function keys
and an Fn key that works like on an Apple bluetooth keyboard. It
identifies as an Apple Alu RevB ANSI keyboard (05ac:024f) over USB and
BT. Use hid-apple for it so the Fn key and function keys work correctly.
Signed-off-by: Haochen Tong <i@hexchain.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
There is a missing space in "relyingon".
Add it.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Leon Romanovsky says:
====================
Remove duplicated devlink registration check
Changelog:
v1:
* Added two new patches that remove registration field from mlx5 and ti drivers.
v0: https://lore.kernel.org/lkml/ed7bbb1e4c51dd58e6035a058e93d16f883b09ce.1627215829.git.leonro@nvidia.com
--------------------------------------------------------------------
Both registered flag and devlink pointer are set at the same time
and indicate the same thing - devlink/devlink_port are ready. Instead
of checking ->registered use devlink pointer as an indication.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Both registered flag and devlink pointer are set at the same time
and indicate the same thing - devlink/devlink_port are ready. Instead
of checking ->registered use devlink pointer as an indication.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Devlink is an integral part of mlx5 driver and all flows ensure that
devlink_*_register() will success. That makes the ->registered check
an obsolete.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit that introduced devlink support released devlink resources in
wrong order, that made an unwind flow to be asymmetrical. In addition,
the am65-cpsw-nuss used internal to devlink core field - registered.
In order to fix the unwind flow and remove such access to the
registered field, rewrite the code to call devlink_port_unregister only
on registered ports.
Fixes: 58356eb31d60 ("net: ti: am65-cpsw-nuss: Add devlink support")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a use after free memory corruption during module exit:
- nfcsim_exit()
- nfcsim_device_free(dev0)
- nfc_digital_unregister_device()
This iterates over command queue and frees all commands,
- dev->up = false
- nfcsim_link_shutdown()
- nfcsim_link_recv_wake()
This wakes the sleeping thread nfcsim_link_recv_skb().
- nfcsim_link_recv_skb()
Wake from wait_event_interruptible_timeout(),
call directly the deb->cb callback even though (dev->up == false),
- digital_send_cmd_complete()
Dereference of "struct digital_cmd" cmd which was freed earlier by
nfc_digital_unregister_device().
This causes memory corruption shortly after (with unrelated stack
trace):
nfc nfc0: NFC: nfcsim_recv_wq: Device is down
llcp: nfc_llcp_recv: err -19
nfc nfc1: NFC: nfcsim_recv_wq: Device is down
BUG: unable to handle page fault for address: ffffffffffffffed
Call Trace:
fsnotify+0x54b/0x5c0
__fsnotify_parent+0x1fe/0x300
? vfs_write+0x27c/0x390
vfs_write+0x27c/0x390
ksys_write+0x63/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
KASAN report:
BUG: KASAN: use-after-free in digital_send_cmd_complete+0x16/0x50
Write of size 8 at addr ffff88800a05f720 by task kworker/0:2/71
Workqueue: events nfcsim_recv_wq [nfcsim]
Call Trace:
dump_stack_lvl+0x45/0x59
print_address_description.constprop.0+0x21/0x140
? digital_send_cmd_complete+0x16/0x50
? digital_send_cmd_complete+0x16/0x50
kasan_report.cold+0x7f/0x11b
? digital_send_cmd_complete+0x16/0x50
? digital_dep_link_down+0x60/0x60
digital_send_cmd_complete+0x16/0x50
nfcsim_recv_wq+0x38f/0x3d5 [nfcsim]
? nfcsim_in_send_cmd+0x4a/0x4a [nfcsim]
? lock_is_held_type+0x98/0x110
? finish_wait+0x110/0x110
? rcu_read_lock_sched_held+0x9c/0xd0
? rcu_read_lock_bh_held+0xb0/0xb0
? lockdep_hardirqs_on_prepare+0x12e/0x1f0
This flow of calling digital_send_cmd_complete() callback on driver exit
is specific to nfcsim which implements reading and sending work queues.
Since the NFC digital device was unregistered, the callback should not
be called.
Fixes: 204bddcb508f ("NFC: nfcsim: Make use of the Digital layer")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace pci_enable_device() with pcim_enable_device(),
pci_disable_device() and pci_release_regions() will be
called in release automatically.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As Ben Hutchings noticed, this check should have been inverted: the call
returns true in case of success.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 0c5dc070ff3d ("sctp: validate from_addr_param return")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the function s3fwrn5_fw_download(), the 'ret' is not assigned,
so the correct value should be given in dev_err function.
Fixes: a0302ff5906a ("nfc: s3fwrn5: remove unnecessary label")
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-07-27
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
blkdev_get_no_open acquires a reference to the block_device through
the block device inode and then tries to acquire a device model
reference to the gendisk. But at this point the disk migh already
be freed (although the race is free). Fix this by only freeing the
gendisk from the whole device bdevs ->free_inode callback as well.
Fixes: 22ae8ce8b892 ("block: simplify bdev/disk lookup in blkdev_get")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210722075402.983367-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
iocg_wake_fn() open-codes wait_queue_entry removal and wakeup because it
wants the wq_entry to be always removed whether it ended up waking the
task or not. finish_wait() tests whether wq_entry needs removal without
grabbing the wait_queue lock and expects the waker to use
list_del_init_careful() after all waking operations are complete, which
iocg_wake_fn() didn't do. The operation order was wrong and the regular
list_del_init() was used.
The result is that if a waiter wakes up racing the waker, it can free pop
the wq_entry off stack before the waker is still looking at it, which can
lead to a backtrace like the following.
[7312084.588951] general protection fault, probably for non-canonical address 0x586bf4005b2b88: 0000 [#1] SMP
...
[7312084.647079] RIP: 0010:queued_spin_lock_slowpath+0x171/0x1b0
...
[7312084.858314] Call Trace:
[7312084.863548] _raw_spin_lock_irqsave+0x22/0x30
[7312084.872605] try_to_wake_up+0x4c/0x4f0
[7312084.880444] iocg_wake_fn+0x71/0x80
[7312084.887763] __wake_up_common+0x71/0x140
[7312084.895951] iocg_kick_waitq+0xe8/0x2b0
[7312084.903964] ioc_rqos_throttle+0x275/0x650
[7312084.922423] __rq_qos_throttle+0x20/0x30
[7312084.930608] blk_mq_make_request+0x120/0x650
[7312084.939490] generic_make_request+0xca/0x310
[7312084.957600] submit_bio+0x173/0x200
[7312084.981806] swap_readpage+0x15c/0x240
[7312084.989646] read_swap_cache_async+0x58/0x60
[7312084.998527] swap_cluster_readahead+0x201/0x320
[7312085.023432] swapin_readahead+0x2df/0x450
[7312085.040672] do_swap_page+0x52f/0x820
[7312085.058259] handle_mm_fault+0xa16/0x1420
[7312085.066620] do_page_fault+0x2c6/0x5c0
[7312085.074459] page_fault+0x2f/0x40
Fix it by switching to list_del_init_careful() and putting it at the end.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rik van Riel <riel@surriel.com>
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|