Age | Commit message (Collapse) | Author |
|
Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with
v4.12-rc6. This was because commit 70e7af244 made it possible for
calc_reclaim_items_nr() to return a negative number. It's not really a
bug in that commit, it just didn't go far enough down the stack to find
all the possible 64->32 bit overflows.
This switches calc_reclaim_items_nr() to return a u64 and changes everyone
that uses the results of that math to u64 as well.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Fixes: 70e7af2 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The commit "btrfs: scrub: inline helper scrub_setup_wr_ctx" inlined a
helper but wrongly sets up the target device. Incidentally there's a
local variable with the same name as a parameter in the previous
function, so this got caught during runtime as crash in test btrfs/027.
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
ranges
[BUG]
For the following case, btrfs can underflow qgroup reserved space
at an error path:
(Page size 4K, function name without "btrfs_" prefix)
Task A | Task B
----------------------------------------------------------------------
Buffered_write [0, 2K) |
|- check_data_free_space() |
| |- qgroup_reserve_data() |
| Range aligned to page |
| range [0, 4K) <<< |
| 4K bytes reserved <<< |
|- copy pages to page cache |
| Buffered_write [2K, 4K)
| |- check_data_free_space()
| | |- qgroup_reserved_data()
| | Range alinged to page
| | range [0, 4K)
| | Already reserved by A <<<
| | 0 bytes reserved <<<
| |- delalloc_reserve_metadata()
| | And it *FAILED* (Maybe EQUOTA)
| |- free_reserved_data_space()
|- qgroup_free_data()
Range aligned to page range
[0, 4K)
Freeing 4K
(Special thanks to Chandan for the detailed report and analyse)
[CAUSE]
Above Task B is freeing reserved data range [0, 4K) which is actually
reserved by Task A.
And at writeback time, page dirty by Task A will go through writeback
routine, which will free 4K reserved data space at file extent insert
time, causing the qgroup underflow.
[FIX]
For btrfs_qgroup_free_data(), add @reserved parameter to only free
data ranges reserved by previous btrfs_qgroup_reserve_data().
So in above case, Task B will try to free 0 byte, so no underflow.
Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Introduce a new parameter, struct extent_changeset for
btrfs_qgroup_reserved_data() and its callers.
Such extent_changeset was used in btrfs_qgroup_reserve_data() to record
which range it reserved in current reserve, so it can free it in error
paths.
The reason we need to export it to callers is, at buffered write error
path, without knowing what exactly which range we reserved in current
allocation, we can free space which is not reserved by us.
This will lead to qgroup reserved space underflow.
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
and quotas being enabled
[BUG]
Under the following case, we can underflow qgroup reserved space.
Task A | Task B
---------------------------------------------------------------
Quota disabled |
Buffered write |
|- btrfs_check_data_free_space() |
| *NO* qgroup space is reserved |
| since quota is *DISABLED* |
|- All pages are copied to page |
cache |
| Enable quota
| Quota scan finished
|
| Sync_fs
| |- run_delalloc_range
| |- Write pages
| |- btrfs_finish_ordered_io
| |- insert_reserved_file_extent
| |- btrfs_qgroup_release_data()
| Since no qgroup space is
reserved in Task A, we
underflow qgroup reserved
space
This can be detected by fstest btrfs/104.
[CAUSE]
In insert_reserved_file_extent() we tell qgroup to release the @ram_bytes
size of qgroup reserved_space in all cases.
And btrfs_qgroup_release_data() will check if quotas are enabled.
However in the above case, the buffered write happens before quota is
enabled, so we don't have the reserved space for that range.
[FIX]
In insert_reserved_file_extent(), we tell qgroup to release the acctual
byte number it released.
In the above case, since we don't have the reserved space, we tell
qgroups to release 0 byte, so the problem can be fixed.
And thanks to the @reserved parameter introduced by the qgroup rework,
and previous patch to return released bytes, the fix can be as small as
10 lines.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ changelog updates ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_qgroup_release/free_data() only returns 0 or a negative error
number (ENOMEM is the only possible error).
This is normally good enough, but sometimes we need the exact byte
count it freed/released.
Change it to return actually released/freed bytenr number instead of 0
for success.
And slightly modify related extent_changeset structure, since in btrfs
one no-hole data extent won't be larger than 128M, so "unsigned int"
is large enough for the use case.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Quite a lot of qgroup corruption happens due to wrong time of calling
btrfs_qgroup_prepare_account_extents().
Since the safest time is to call it just before
btrfs_qgroup_account_extents(), there is no need to separate these 2
functions.
Merging them will make code cleaner and less bug prone.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ changelog and comment adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Modify btrfs_qgroup_account_extent() to exit quicker for non-fs extents.
The quick exit condition is:
1) The extent belongs to a non-fs tree
Only fs-tree extents can affect qgroup numbers and is the only case
where extent can be shared between different trees.
Although strictly speaking extent in data-reloc or tree-reloc tree
can be shared, data/tree-reloc root won't appear in the result of
btrfs_find_all_roots(), so we can ignore such case.
So we can check the first root in old_roots/new_roots ulist.
- if we find the 1st root is a not a fs/subvol root, then we can skip
the extent
- if we find the 1st root is a fs/subvol root, then we must continue
calculation
OR
2) both 'nr_old_roots' and 'nr_new_roots' are 0
This means either such extent got allocated then freed in current
transaction or it's a new reloc tree extent, whose nr_new_roots is 0.
Either way it won't affect qgroup accounting and can be skipped
safely.
Such quick exit can make trace output more quite and less confusing:
(example with fs uuid and time stamp removed)
Before:
------
add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
btrfs_qgroup_account_extent: bytenr=29556736 num_bytes=16384 nr_old_roots=0 nr_new_roots=1
------
Extent tree block will trigger btrfs_qgroup_account_extent() trace point
while no qgroup number is changed, as extent tree won't affect qgroup
accounting.
After:
------
add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
------
Now such unrelated extent won't trigger btrfs_qgroup_account_extent()
trace point, making the trace less noisy.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ changelog and comment adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The total_bytes_pinned counter is completely broken when accounting
delayed refs:
- If two drops for the same extent are merged, we will decrement
total_bytes_pinned twice but only increment it once.
- If an add is merged into a drop or vice versa, we will decrement the
total_bytes_pinned counter but never increment it.
- If multiple references to an extent are dropped, we will account it
multiple times, potentially vastly over-estimating the number of bytes
that will be freed by a commit and doing unnecessary work when we're
close to ENOSPC.
The last issue is relatively minor, but the first two make the
total_bytes_pinned counter leak or underflow very often. These
accounting issues were introduced in b150a4f10d87 ("Btrfs: use a percpu
to keep track of possibly pinned bytes"), but they were papered over by
zeroing out the counter on every commit until d288db5dc011 ("Btrfs: fix
race of using total_bytes_pinned").
We need to make sure that an extent is accounted as pinned exactly once
if and only if we will drop references to it when when the transaction
is committed. Ideally we would only add to total_bytes_pinned when the
*last* reference is dropped, but this information isn't readily
available for data extents. Again, this over-estimation can lead to
extra commits when we're close to ENOSPC, but it's not as bad as before.
The fix implemented here is to increment total_bytes_pinned when the
total refmod count for an extent goes negative and decrement it if the
refmod count goes back to non-negative or after we've run all of the
delayed refs for that extent.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We need this to decide when to account pinned bytes.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently, we only increment total_bytes_pinned in
btrfs_free_tree_block() when dropping the last reference on the block.
However, when the delayed ref is run later, we will decrement
total_bytes_pinned regardless of whether it was the last reference or
not. This causes the counter to underflow when the reference we dropped
was not the last reference. Fix it by incrementing the counter
unconditionally, which is what btrfs_free_extent() does. This makes
total_bytes_pinned an overestimate when references to shared extents are
dropped, but in the worst case this will just make us try to commit the
transaction to try to free up space and find we didn't free enough.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The extents marked in pin_down_extent() will be unpinned later in
unpin_extent_range(), which decrements total_bytes_pinned.
pin_down_extent() must increment the counter to avoid underflowing it.
Also adjust btrfs_free_tree_block() to avoid accounting for the same
extent twice.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The value of flags is one of DATA/METADATA/SYSTEM, they must exist at
when add_pinned_bytes is called.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ added changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There are a few places where we pass in a negative num_bytes, so make it
signed for clarity. Also move it up in the file since later patches will
need it there.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Bump the maximum API supported by these device families to 33.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
The XATTR_ITEM is a type of a directory item so we use the common
validator helper. Unlike other dir items, it can have data. The way the
name len validation is currently implemented does not reflect that. We'd
have to adjust by the data_len when comparing the read and item limits.
However, this will not work for multi-item xattr dir items.
Example from tree dump of generic/337:
item 7 key (257 XATTR_ITEM 751495445) itemoff 15667 itemsize 147
location key (0 UNKNOWN.0 0) type XATTR
transid 8 data_len 3 name_len 11
name: user.foobar
data 123
location key (0 UNKNOWN.0 0) type XATTR
transid 8 data_len 6 name_len 13
name: user.WvG1c1Td
data qwerty
location key (0 UNKNOWN.0 0) type XATTR
transid 8 data_len 5 name_len 19
name: user.J3__T_Km3dVsW_
data hello
At the point of btrfs_is_name_len_valid call we don't have access to the
data_len value of the 2nd and 3rd sub-item. So simple btrfs_dir_data_len(leaf,
di) would always return 3, although we'd need to get 6 and 5 respectively to
get the claculations right. (read_end + name_len + data_len vs item_end)
We'd have to also pass data_len externally, which is not point of the
name validation. The last check is supposed to test if there's at least
one dir item space after the one we're processing. I don't think this is
particularly useful, validation of the next item would catch that too.
So the check is removed and we don't weaken the validation. Now tests
btrfs/048, btrfs/053, generic/273 and generic/337 pass.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The newest devices need a longer time to reset because of
their more complex hardware. Wait 5ms after device reset.
Consolidate all the places that reset the device in the
PCIe transport to avoid future bugs.
While at it, unify the flow to use set_bit instead of full
write as requested by the hardware designers.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
iwl_pcie_apm_init can fail so make sure that the caller
takes the status into account.
Also, ensure that the error that iwl_pcie_apm_init can emit
will appear in the kernel log by default.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
When a station that's not associated sends a data frame (e.g. an NDP)
hostapd will respond with a disassoc frame, telling it that it's not
associated. The station might also not be authenticated, in which case
there will not be a station entry for it, and as a result we need to
accept such frames without a station.
Fixes: 3ee0f0e23e4f ("iwlwifi: mvm: fix DQA AP mode station assumption")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
The API has changed - update the code.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
When we get a non-STA frame to transmit in client mode, we try to use
the IWL_MVM_DQA_BSS_CLIENT_QUEUE queue (queue #4). However, at this
point, the queue might not be allocated at all, causing warnings. The
scenario on which this happened was a race condition between mac80211
and our queue allocation work:
* mac80211 sends auth
* we stop mac80211 queues to allocate a hw queue
* authentication is aborted
* we allocate HW queue and start mac80211 queues
* mac80211 removes station
* mac80211 hands us the auth frame from the pending queue
At this point, since mac80211 has already removed the station, we try
to transmit the frame through this special non-station case on queue
4 anyway.
In order to really use it properly, we'd have to again go through the
hw queue allocation work, and attach it to a station, etc. In this
case that isn't possible (there's no station anymore), but if this
special case were needed, then we'd have to do it this way.
However, the special case is documented to exist for TDLS, but can't
trigger there because the TDLS setup frames etc. are normal to-DS
frames going to the peer through the AP. Testing also confirms that
this code path isn't triggered in TDLS.
Therefore, remove the code path to avoid using an unused queue. The
erroneous frame described above will still be transmitted on the AUX
queue, but arguably that's a mac80211 problem, which will eventually
be fixed by moving everything there to TXQs.
Fixes: e3118ad74d7e ("iwlwifi: mvm: support tdls in dqa mode")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
When we get large sends on non-QoS association, we had a
bug that mangled the SNAP header. Fix that.
Fixes: a6d5e32f247c ("iwlwifi: mvm: send large SKBs to the transport")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
When going into suspend, the HW configuration for MSI-X will
likely be lost. As a consequence, after waking up, all IRQ
causes will be mapped to interrupt 0, and as a consequence we
don't notice the interrupt because in most cases this is an
interrupt for a queue, and getting it doesn't read the other
cause registers.
Fixes: 2e5d4a8f61dc ("iwlwifi: pcie: Add new configuration to enable MSIX")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
Getting the TID of a packet before we know it is a QoS data
packet isn't a good idea. Delay the TID retrieval until
we know the packet is a QoS data packet.
Fixes: bb81bb68f472 ("iwlwifi: mvm: add Tx A-MSDU inside A-MPDU")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
When in non-DQA mode, mac80211 actually gets a pretty much perfect
idea (in vif->hw_queue/cab_queue) of which queues we're using. But
in DQA mode, this isn't true - nonetheless, we were adding all the
queues, even the ones stations are using, to the queue allocation
bitmap.
Fix this, we should only add the queues we really are using in DQA
mode:
* IWL_MVM_OFFCHANNEL_QUEUE, as we use this in both modes
* mvm->aux_queue, as we use this in both modes - mac80211
never really knows about it but we use it as a cookie
internally, so can't reuse it
* possibly the GCAST queue (cab_queue)
* all the "queues" we told mac80211 about we were using on each
interface (vif->hw_queue), these are entirely virtual in this
mode
Also add back the failure now when we can't allocate any more of
these - now virtual - queues; this was skipped in DQA mode and
would lead to having multiple ACs or even interfaces use the same
queue number in mac80211 (10, since that's the limit), which would
stop far too many queues if stopped.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
There may be a difference between the mac80211 vif->cab_queue and
mvmvif->cab_queue, particularly with TVQM. Make the code map this
earlier, instead of first returning the mac80211 one again from
iwl_mvm_get_ctrl_vif_queue().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
In the driver, we track which hardware queue is associated with
which mac80211 "hw_queue", in order to be able to stop and wake
it. When moving these bitmaps out of the queue_info structures,
the type of the bitmap was erroneously changed from u32 to u8,
presumably in order to save memory.
Turns out that u32 isn't needed, because the highest queue we
can ever tell mac80211 is always < 16, but a u16 definitely is
needed, queues >=8 do happen.
While at it, throw a BUILD_BUG_ON() into the place where we set
the limit (mvm->first_agg_queue) and a warning when it actually
gets put into the bitmap.
The consequence of this bug is that full HW queues associated
with such a too-high mac80211 number never stop higher layer
queues when full, and thus would simply drop all packets that
couldn't be enqueued to the hardware queue.
Fixes: 34e10860ae8d ("iwlwifi: mvm: remove references to queue_info in new TX path")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
The code was intended to enable IP header checksumming on AMSDUs, but
failed to really do so because the A-MSDU bit was set after all the
checksumming bits, and thus checking for A-MSDU could never be true.
Fix this by setting the A-MSDU bit before the offload bits.
Fixes: 5e6a98dc4863 ("iwlwifi: mvm: enable TCP/UDP checksum support for 9000 family")
Reported-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
We have tracing for both pre-ICT and ICT interrupts, including all
the data read there. Extend the tracing to MSI-X interrupts.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
Martin KaFai Lau says:
====================
bpf: Add syscall lookup support for fd array and htab
This patchset adds BPF_MAP_LOOKUP_ELEM syscall support for
BPF_MAP_TYPE_PROG_ARRAY,
BPF_MAP_TYPE_ARRAY_OF_MAPS and
BPF_MAP_TYPE_HASH_OF_MAPS
====================
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Checks are added to the existing sockex3 and test_map_in_map test.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch allows userspace to do BPF_MAP_LOOKUP_ELEM on
BPF_MAP_TYPE_PROG_ARRAY,
BPF_MAP_TYPE_ARRAY_OF_MAPS and
BPF_MAP_TYPE_HASH_OF_MAPS.
The lookup returns a prog-id or map-id to the userspace.
The userspace can then use the BPF_PROG_GET_FD_BY_ID
or BPF_MAP_GET_FD_BY_ID to get a fd.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case a VLAN device is enslaved to a bridge we shouldn't create a
router interface (RIF) for it when it's configured with an IP address.
This is already handled by the driver for other types of netdevs, such
as physical ports and LAG devices.
If this IP address is then removed and the interface is subsequently
unlinked from the bridge, a NULL pointer dereference can happen, as the
original 802.1d FID was replaced with an rFID which was then deleted.
To reproduce:
$ ip link set dev enp3s0np9 up
$ ip link add name enp3s0np9.111 link enp3s0np9 type vlan id 111
$ ip link set dev enp3s0np9.111 up
$ ip link add name br0 type bridge
$ ip link set dev br0 up
$ ip link set enp3s0np9.111 master br0
$ ip address add dev enp3s0np9.111 192.168.0.1/24
$ ip address del dev enp3s0np9.111 192.168.0.1/24
$ ip link set dev enp3s0np9.111 nomaster
Fixes: 99724c18fc66 ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Petr Machata <petrm@mellanox.com>
Tested-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When qdisc fail to init, qdisc_create would invoke the destroy callback
to cleanup. But there is no check if the callback exists really. So it
would cause the panic if there is no real destroy callback like the qdisc
codel, fq, and so on.
Take codel as an example following:
When a malicious user constructs one invalid netlink msg, it would cause
codel_init->codel_change->nla_parse_nested failed.
Then kernel would invoke the destroy callback directly but qdisc codel
doesn't define one. It causes one panic as a result.
Now add one the check for destroy to avoid the possible panic.
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We don't hold any tx lock when trying to disable TX during reset, this
would lead a use after free since ndo_start_xmit() tries to access
the virtqueue which has already been freed. Fix this by using
netif_tx_disable() before freeing the vqs, this could make sure no tx
after vq freeing.
Reported-by: Jean-Philippe Menil <jpmenil@gmail.com>
Tested-by: Jean-Philippe Menil <jpmenil@gmail.com>
Fixes commit f600b6905015 ("virtio_net: Add XDP support")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Robert McCabe <robert.mccabe@rockwellcollins.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Version 3.70a of the Designware has additional DMA registers so
add those to the ethtool DMA Register dump.
Offset 9 - Receive Interrupt Watchdog Timer Register
Offset 10 - AXI Bus Mode Register
Offset 11 - AHB or AXI Status Register
Offset 22 - HW Feature Register
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that compat_vfp_get() uses the regset API to copy the FPSCR
value out to userspace, compat_vfp_set() looks inconsistent. In
particular, compat_vfp_set() will fail if called with kbuf != NULL
&& ubuf == NULL (which is valid usage according to the regset API).
This patch fixes compat_vfp_set() to use user_regset_copyin(),
similarly to compat_vfp_get().
This also squashes a sparse warning triggered by the cast that
drops __user when calling get_user().
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
compat_vfp_set() checks for userspace trying to write an excessive
amount of data to the regset. However this check is conspicuous
for its absence from every other _set() in the arm64 ptrace
implementation. In fact, the core ptrace_regset() already clamps
userspace's iov_len to the regset size before the individual regset
.{get,set}() methods get called.
This patch removes the redundant check.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
If get_user() fails when reading the new FPSCR value from userspace
in compat_vfp_get(), then garbage* will be written to the task's
FPSR and FPCR registers.
This patch prevents this by checking the return from get_user()
first.
[*] Actually, zero, due to the behaviour of get_user() on error, but
that's still not what userspace expects.
Fixes: 478fcb2cdb23 ("arm64: Debugging support")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2017-06-27 (Innova IPsec offload support)
This patchset adds support for Innova IPSec network interface card.
About Innova device:
--------------------
Innova is a network card with a ConnectX chip and an FPGA chip as a
bump-on-the-wire.
Internal
+----------+ Link +-----------------+
| +--------------+ FPGA | +------+
| ConnectX | | Shell +--+ QSFP |
| +--------------+ +-------+ | | Port |
+----------+ I2C | | SBU | | +------+
| +-------+ |
+--+----------+---+
| |
+--+--+ +---+---+
| DDR | | Flash |
+-----+ +-------+
The FPGA synthesized logic is loaded from dedicated flash storage and has
access to its own dedicated DDR RAM.
The ConnectX chip firmware programs the FPGA by accessing its configuration
space over either the slow internal I2C link or the high-speed internal link.
The FPGA logic is divided into a "Shell" and a "Sandbox Unit" (SBU).
mlx5_core driver (with CONFIG_MLX5_FPGA) handles all shell functionality,
while other components may handle the various SBU functionalities.
The driver opens high-speed reliable communication channels with the shell and
the SBU over the internal link.
These channels may be used for high-bandwidth configuration or for SBU-specific
out-of-band data paths.
About Innova IPSec device:
--------------------------
Innova IPSec is a network card that allows offloading IPSec cryptography operations
from the host CPU to the NIC. It is an Innova card with an IPSec SBU.
The hardware keeps the database of IPSec Security Associations (SADB) in the FPGA's
DDR memory.
Internal
+----------+ Link +-----------------+
| +--------------+ FPGA | +------+
| ConnectX | | Shell +--+ QSFP |
| +--------------+ +-------+ | | Port |
+----------+ Internal I2C | | IPSec | | +------+
| | SBU | |
| +-------+ |
+--+----------+---+
| |
+--+--+ +---+---+
| DDR | | |
| | | Flash |
|SADB | | |
+-----+ +-------+
Modes and ciphers:
Currently the following modes and ciphers are supported:
IPv4 and IPv6
ESP tunnel and transport modes
AES 128 and 256 bit encryption, with GCM authentication (RFC4106)
IV is generated using seqiv, in sync with Linux's geniv.
More modes and ciphers may be added later.
Notes:
In the future similar functionality will be included in a single-chip NIC.
About the driver:
-----------------
Patches 1-4 prepare some existing driver code for the new feature:
* Add support for reserved GIDs in the hardware GID table
* Allow multiple modules to enable hardware RoCE support independently
Patches 5-6 define structs and helper functions for QP work-queues.
Patches 7-11 add various FPGA-related features required for Innova.
IPSec.
Patch 12 adds abstraction layer for Mellanox IPSec-offload capable devices.
atches 13-16 add IPSec offload support to the mlx5 netdevice.
This driver services the new IPSec offload API introduced in commit
d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
Configuration Path:
If Innova IPSec device is detected, the mlx5e netdevice gets the new
NETIF_F_HW_ESP feature and the xdo callbacks, indicating ESP offload
capabilities, and also the matching TX checksum and GSO features.
The driver configures offloaded Security Associations (SAs) by sending
an ADD_SA or DEL_SA message to the IPSec SBU, which updates the SADB in DDR.
These messages and their responses are sent over a high-speed channel.
Counters for ethtool are retrieved by the driver from the SBU.
Data path:
On receive path, the SBU decrypts ESP packets which match the offloaded SADB,
but keeps them encapsulated.
The SBU injects metadata (Mellanox owned ethertype) indicating that crypto-offload
has taken place, the SA with which it was done, and the authentication result.
The ConnectX chip performs RX checksum offload on the packet, and RSS using the
ESP SPI value. The driver detects the special ethertype, and attaches a struct
secpath to the RX SKB, including flags to indicate that crypto offload took place,
the authentication result, and which xfrm_state was used for decryption, in the
olen and ovec members. The RX SKB may have useful CHECKSUM_COMPLETE. A separate
patchset will add support for that in the xfrm stack.
On transmit path, the stack encapsulates the packet but does not encrypt it, and
indicates in the SKB's secpath that crypto offload is to be performed and the SA
to use to do so.
The driver avoids performing crypto-offload for ESP fragments, and packets with
IP options, as the SBU cannot currently do that. For eligible packets, the driver
prepends a special ethertype with metadata instructing the hardware to perform crypto offload.
The stack builds regular (non-GSO) SKBs so that they contain a placeholder for the ESP trailer.
The driver trims it off, because the SBU automatically appends the trailer for offloaded packets.
The ConnectX chip performs TX checksum offload on inner UDP or TCP packets,
and GSO for TCP packets (duplicating the prepended metadata).
The segmented packets then undergo encryption in the SBU before going on the wire.
Performance:
We measure single stream of TCP on Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz
Using AES-NI with ESP GSO we get constant 4.1 Gbps.
Using crypto offload we get constant 18 Gbps.
Note that these numbers require CHECKSUM_COMPLETE support in XFRM, which we submit separately.
- Ilan Tayari
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ivan Khoronzhuk says:
====================
net: fix sw timestamping for non PTP packets
This series contains several corrections connected with timestamping
for cpsw and netcp drivers based on same cpts module.
Based on net/next
====================
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is cpts function to check if packet can be timstamped with cpts.
Seems that ptp_classify_raw cover all cases listed with "case".
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The cpts can timestmap only ptp packets at this moment, so driver
cannot mark every packet as though it's going to be timestamped,
only because h/w timestamping for given skb is enabled with
SKBTX_HW_TSTAMP. It doesn't allow to use sw timestamping, as result
outgoing packet is not timestamped at all if it's not PTP and h/w
timestamping is enabled. So, fix it by setting SKBTX_IN_PROGRESS
only for PTP packets.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move sw timestamp function close to channel submit function.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using netdev_<level>(netdev, "%s: ...", netdev->name) duplicates the
name in the output. Remove those uses.
Miscellanea:
o Use the netif_<level> convenience macros at the same time
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trivial fix to spelling mistake in mlx4_dbg debug message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trivial fix to spelling mistake in netif_info message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the PHY used is internal, simply set phy-mode as internal.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the PHY used is internal, simply set phy-mode as internal.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the PHY used is internal, simply set phy-mode as internal.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|