linux.git - Linus' kernel tree

Age	Commit message	Author
2017-06-29	btrfs: fix integer overflow in calc_reclaim_items_nr	Chris Mason
	Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with v4.12-rc6. This was because commit 70e7af244 made it possible for calc_reclaim_items_nr() to return a negative number. It's not really a bug in that commit, it just didn't go far enough down the stack to find all the possible 64->32 bit overflows. This switches calc_reclaim_items_nr() to return a u64 and changes everyone that uses the results of that math to u64 as well. Reported-by: Dave Jones <davej@codemonkey.org.uk> Fixes: 70e7af2 ("Btrfs: fix delalloc accounting leak caused by u32 overflow") Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
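The 64->32 bit overflow described above can be illustrated with a small model. This is only a sketch in Python, not the btrfs code: `as_int32` and `as_u64` are hypothetical helpers that mimic storing a value in a C `int` versus a `u64`, and the item count is made up.

```python
import ctypes


def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed C int."""
    return ctypes.c_int32(value & 0xFFFFFFFF).value


def as_u64(value):
    """Keep the low 64 bits of value, like an unsigned 64-bit variable."""
    return value & 0xFFFFFFFFFFFFFFFF


# Hypothetical item count that no longer fits in a signed 32-bit int.
nr_items = 3_000_000_000

print(as_int32(nr_items))  # negative: this is what a WARN_ON(nr < 0) would catch
print(as_u64(nr_items))    # unchanged when the whole chain uses u64
```

Widening only one function in the call chain is not enough: any caller that still stores the result in an `int` reintroduces the truncation, which is why the commit changes every user of the result.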
2017-06-29	btrfs: scrub: fix target device intialization while setting up scrub context	David Sterba
	The commit "btrfs: scrub: inline helper scrub_setup_wr_ctx" inlined a helper but wrongly sets up the target device. Incidentally there's a local variable with the same name as a parameter in the previous function, so this got caught during runtime as crash in test btrfs/027. Reported-by: Chris Mason <clm@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges	Qu Wenruo
	[BUG]
	For the following case, btrfs can underflow qgroup reserved space at an error path:
	(Page size 4K, function name without "btrfs_" prefix)

	         Task A                  |             Task B
	----------------------------------------------------------------------
	Buffered_write [0, 2K)           |
	|- check_data_free_space()       |
	|  |- qgroup_reserve_data()      |
	|     Range aligned to page      |
	|     range [0, 4K)          <<< |
	|     4K bytes reserved      <<< |
	|- copy pages to page cache      |
	                                 | Buffered_write [2K, 4K)
	                                 | |- check_data_free_space()
	                                 | |  |- qgroup_reserve_data()
	                                 | |     Range aligned to page
	                                 | |     range [0, 4K)
	                                 | |     Already reserved by A <<<
	                                 | |     0 bytes reserved      <<<
	                                 | |- delalloc_reserve_metadata()
	                                 | |  And it FAILED (maybe EDQUOT)
	                                 | |- free_reserved_data_space()
	                                 |    |- qgroup_free_data()
	                                 |       Range aligned to page range [0, 4K)
	                                 |       Freeing 4K

	(Special thanks to Chandan for the detailed report and analysis)

	[CAUSE]
	Task B above frees the reserved data range [0, 4K), which was actually reserved by Task A. At writeback time, the page dirtied by Task A goes through the writeback routine, which frees 4K of reserved data space at file extent insert time, causing the qgroup underflow.

	[FIX]
	For btrfs_qgroup_free_data(), add a @reserved parameter to only free data ranges reserved by the previous btrfs_qgroup_reserve_data(). In the above case, Task B will then try to free 0 bytes, so there is no underflow.

	Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
	Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
	Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
	Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
	Signed-off-by: David Sterba <dsterba@suse.com>
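The two-task scenario in this commit can be replayed with a toy model. The sketch below is an assumption-laden illustration in Python, not the btrfs implementation: `QgroupReserveModel`, its methods, and `replay` are hypothetical names, reservations are tracked per 4K page, and the set returned by `reserve()` stands in for the extent changeset.

```python
PAGE = 4096


class QgroupReserveModel:
    """Toy model of page-aligned qgroup data reservations (hypothetical)."""

    def __init__(self):
        self.reserved_pages = set()
        self.reserved_bytes = 0

    @staticmethod
    def _pages(start, length):
        return range(start // PAGE, (start + length - 1) // PAGE + 1)

    def reserve(self, start, length):
        """Reserve the aligned range; return only the pages newly reserved."""
        newly = {p for p in self._pages(start, length)
                 if p not in self.reserved_pages}
        self.reserved_pages |= newly
        self.reserved_bytes += len(newly) * PAGE
        return newly

    def free_aligned(self, start, length):
        """Old behaviour: free every reserved page in the aligned range."""
        self.free_changeset(set(self._pages(start, length)))

    def free_changeset(self, changeset):
        """Fixed behaviour: free only the pages named in the changeset."""
        pages = changeset & self.reserved_pages
        self.reserved_pages -= pages
        self.reserved_bytes -= len(pages) * PAGE

    def writeback_release(self, nbytes):
        """Writeback of Task A's page releases what Task A reserved."""
        self.reserved_bytes -= nbytes


def replay(fixed):
    model = QgroupReserveModel()
    model.reserve(0, 2048)                 # Task A: reserves page [0, 4K)
    changeset_b = model.reserve(2048, 2048)  # Task B: nothing new reserved
    if fixed:
        model.free_changeset(changeset_b)  # frees 0 bytes
    else:
        model.free_aligned(2048, 2048)     # frees Task A's 4K
    model.writeback_release(PAGE)
    return model.reserved_bytes


print(replay(fixed=False))  # negative: reserved space underflows
print(replay(fixed=True))   # balanced
```

The point of the model is that the error path needs to know what the current call reserved, not what the aligned range covers.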
2017-06-29	btrfs: qgroup: Introduce extent changeset for qgroup reserve functions	Qu Wenruo
	Introduce a new parameter, struct extent_changeset, for btrfs_qgroup_reserve_data() and its callers. Such an extent_changeset was already used in btrfs_qgroup_reserve_data() to record which range it reserved in the current reserve, so it can free it in error paths. The reason we need to export it to callers is that, at the buffered write error path, without knowing exactly which range we reserved in the current allocation, we can free space which was not reserved by us. This will lead to qgroup reserved space underflow. Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quotas being enabled	Qu Wenruo
	[BUG]
	Under the following case, we can underflow qgroup reserved space.

	         Task A                  |             Task B
	---------------------------------------------------------------
	Quota disabled                   |
	Buffered write                   |
	|- btrfs_check_data_free_space() |
	|  NO qgroup space is reserved   |
	|  since quota is DISABLED       |
	|- All pages are copied to page  |
	   cache                         |
	                                 | Enable quota
	                                 | Quota scan finished
	                                 |
	                                 | Sync_fs
	                                 | |- run_delalloc_range
	                                 | |- Write pages
	                                 | |- btrfs_finish_ordered_io
	                                 |    |- insert_reserved_file_extent
	                                 |       |- btrfs_qgroup_release_data()
	                                 |          Since no qgroup space is
	                                 |          reserved in Task A, we
	                                 |          underflow qgroup reserved
	                                 |          space

	This can be detected by fstest btrfs/104.

	[CAUSE]
	In insert_reserved_file_extent() we tell qgroup to release the @ram_bytes size of qgroup reserved_space in all cases. And btrfs_qgroup_release_data() will check if quotas are enabled. However, in the above case the buffered write happens before quota is enabled, so we don't have the reserved space for that range.

	[FIX]
	In insert_reserved_file_extent(), we tell qgroup to release the actual number of bytes it released. In the above case, since we don't have the reserved space, we tell qgroups to release 0 bytes, so the problem is fixed. Thanks to the @reserved parameter introduced by the qgroup rework, and the previous patch to return released bytes, the fix can be as small as 10 lines.

	Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
	[ changelog updates ]
	Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	btrfs: qgroup: Return actually freed bytes for qgroup release or free data	Qu Wenruo
	btrfs_qgroup_release/free_data() only returns 0 or a negative error number (ENOMEM is the only possible error). This is normally good enough, but sometimes we need the exact byte count it freed/released. Change it to return actually released/freed bytenr number instead of 0 for success. And slightly modify related extent_changeset structure, since in btrfs one no-hole data extent won't be larger than 128M, so "unsigned int" is large enough for the use case. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function	Qu Wenruo
	Quite a lot of qgroup corruption happens due to wrong time of calling btrfs_qgroup_prepare_account_extents(). Since the safest time is to call it just before btrfs_qgroup_account_extents(), there is no need to separate these 2 functions. Merging them will make code cleaner and less bug prone. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [ changelog and comment adjustments ] Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	btrfs: qgroup: Add quick exit for non-fs extents	Qu Wenruo
	Modify btrfs_qgroup_account_extent() to exit quicker for non-fs extents.

	The quick exit condition is:
	1) The extent belongs to a non-fs tree
	   Only fs-tree extents can affect qgroup numbers, and this is the only case where an extent can be shared between different trees.
	   Although strictly speaking an extent in the data-reloc or tree-reloc tree can be shared, the data/tree-reloc root won't appear in the result of btrfs_find_all_roots(), so we can ignore such a case.
	   So we can check the first root in the old_roots/new_roots ulist.
	   - if we find the 1st root is not a fs/subvol root, then we can skip the extent
	   - if we find the 1st root is a fs/subvol root, then we must continue calculation

	OR

	2) both 'nr_old_roots' and 'nr_new_roots' are 0
	   This means either such an extent got allocated then freed in the current transaction, or it's a new reloc tree extent, whose nr_new_roots is 0.
	   Either way it won't affect qgroup accounting and can be skipped safely.

	Such a quick exit can make trace output quieter and less confusing:
	(example with fs uuid and time stamp removed)

	Before:
	------
	add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
	btrfs_qgroup_account_extent: bytenr=29556736 num_bytes=16384 nr_old_roots=0 nr_new_roots=1
	------
	An extent tree block will trigger the btrfs_qgroup_account_extent() trace point while no qgroup number is changed, as the extent tree won't affect qgroup accounting.

	After:
	------
	add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
	------
	Now such an unrelated extent won't trigger the btrfs_qgroup_account_extent() trace point, making the trace less noisy.

	Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
	[ changelog and comment adjustments ]
	Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	Btrfs: rework delayed ref total_bytes_pinned accounting	Omar Sandoval
	The total_bytes_pinned counter is completely broken when accounting delayed refs:
	- If two drops for the same extent are merged, we will decrement total_bytes_pinned twice but only increment it once.
	- If an add is merged into a drop or vice versa, we will decrement the total_bytes_pinned counter but never increment it.
	- If multiple references to an extent are dropped, we will account it multiple times, potentially vastly over-estimating the number of bytes that will be freed by a commit and doing unnecessary work when we're close to ENOSPC.

	The last issue is relatively minor, but the first two make the total_bytes_pinned counter leak or underflow very often. These accounting issues were introduced in b150a4f10d87 ("Btrfs: use a percpu to keep track of possibly pinned bytes"), but they were papered over by zeroing out the counter on every commit until d288db5dc011 ("Btrfs: fix race of using total_bytes_pinned").

	We need to make sure that an extent is accounted as pinned exactly once if and only if we will drop references to it when the transaction is committed. Ideally we would only add to total_bytes_pinned when the last reference is dropped, but this information isn't readily available for data extents. Again, this over-estimation can lead to extra commits when we're close to ENOSPC, but it's not as bad as before.

	The fix implemented here is to increment total_bytes_pinned when the total refmod count for an extent goes negative and decrement it if the refmod count goes back to non-negative or after we've run all of the delayed refs for that extent.

	Signed-off-by: Omar Sandoval <osandov@fb.com>
	Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
	Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
	Signed-off-by: David Sterba <dsterba@suse.com>
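The accounting rule stated at the end of this commit message can be sketched as a small state machine. The Python below is a simplified model under stated assumptions, not the kernel's delayed-ref code: `PinnedModel` and its methods are hypothetical, each extent has one head keyed by bytenr, and `delta` is -1 for a drop and +1 for an add.

```python
class PinnedModel:
    """Toy model: pin bytes only on sign changes of the total ref mod."""

    def __init__(self):
        self.total_bytes_pinned = 0
        self.heads = {}  # bytenr -> [num_bytes, total_ref_mod]

    def add_delayed_ref(self, bytenr, num_bytes, delta):
        """Queue a ref change and return (old, new) total ref mods."""
        head = self.heads.setdefault(bytenr, [num_bytes, 0])
        old = head[1]
        new = old + delta
        head[1] = new
        if old >= 0 and new < 0:
            # Total went negative: the extent may be freed by the commit.
            self.total_bytes_pinned += num_bytes
        elif old < 0 and new >= 0:
            # Total recovered: undo the earlier accounting.
            self.total_bytes_pinned -= num_bytes
        return old, new

    def run_delayed_refs(self, bytenr):
        """All refs for the extent have run: drop its remaining accounting."""
        num_bytes, total = self.heads.pop(bytenr)
        if total < 0:
            self.total_bytes_pinned -= num_bytes


model = PinnedModel()
model.add_delayed_ref(4096, 16384, -1)  # first drop: counted once
model.add_delayed_ref(4096, 16384, -1)  # merged second drop: not counted again
print(model.total_bytes_pinned)
model.run_delayed_refs(4096)
print(model.total_bytes_pinned)
```

Because the counter changes only when the total crosses zero, merged drops and drop/add pairs each leave it balanced, which is the property the first two bullets say was missing.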
2017-06-29	Btrfs: return old and new total ref mods when adding delayed refs	Omar Sandoval
	We need this to decide when to account pinned bytes. Signed-off-by: Omar Sandoval <osandov@fb.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	Btrfs: always account pinned bytes when dropping a tree block ref	Omar Sandoval
	Currently, we only increment total_bytes_pinned in btrfs_free_tree_block() when dropping the last reference on the block. However, when the delayed ref is run later, we will decrement total_bytes_pinned regardless of whether it was the last reference or not. This causes the counter to underflow when the reference we dropped was not the last reference. Fix it by incrementing the counter unconditionally, which is what btrfs_free_extent() does. This makes total_bytes_pinned an overestimate when references to shared extents are dropped, but in the worst case this will just make us try to commit the transaction to try to free up space and find we didn't free enough. Signed-off-by: Omar Sandoval <osandov@fb.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	Btrfs: update total_bytes_pinned when pinning down extents	Omar Sandoval
	The extents marked in pin_down_extent() will be unpinned later in unpin_extent_range(), which decrements total_bytes_pinned. pin_down_extent() must increment the counter to avoid underflowing it. Also adjust btrfs_free_tree_block() to avoid accounting for the same extent twice. Signed-off-by: Omar Sandoval <osandov@fb.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT()	Omar Sandoval
	The value of flags is one of DATA/METADATA/SYSTEM; these must exist when add_pinned_bytes() is called. Signed-off-by: Omar Sandoval <osandov@fb.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: David Sterba <dsterba@suse.com> [ added changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64	Omar Sandoval
	There are a few places where we pass in a negative num_bytes, so make it signed for clarity. Also move it up in the file since later patches will need it there. Signed-off-by: Omar Sandoval <osandov@fb.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	iwlwifi: bump MAX API for 8000/9000/A000 to 33	Luca Coelho
	Bump the maximum API supported by these device families to 33. Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	btrfs: fix validation of XATTR_ITEM dir items	David Sterba
	The XATTR_ITEM is a type of a directory item so we use the common validator helper. Unlike other dir items, it can have data. The way the name len validation is currently implemented does not reflect that. We'd have to adjust by the data_len when comparing the read and item limits.

	However, this will not work for multi-item xattr dir items.

	Example from tree dump of generic/337:

	item 7 key (257 XATTR_ITEM 751495445) itemoff 15667 itemsize 147
	    location key (0 UNKNOWN.0 0) type XATTR
	    transid 8 data_len 3 name_len 11
	    name: user.foobar
	    data 123
	    location key (0 UNKNOWN.0 0) type XATTR
	    transid 8 data_len 6 name_len 13
	    name: user.WvG1c1Td
	    data qwerty
	    location key (0 UNKNOWN.0 0) type XATTR
	    transid 8 data_len 5 name_len 19
	    name: user.J3__T_Km3dVsW_
	    data hello

	At the point of the btrfs_is_name_len_valid call we don't have access to the data_len value of the 2nd and 3rd sub-item. So a simple btrfs_dir_data_len(leaf, di) would always return 3, although we'd need to get 6 and 5 respectively to get the calculations right (read_end + name_len + data_len vs item_end).

	We'd have to also pass data_len externally, which is not the point of the name validation. The last check is supposed to test if there's at least one dir item space after the one we're processing. I don't think this is particularly useful, validation of the next item would catch that too. So the check is removed and we don't weaken the validation. Now tests btrfs/048, btrfs/053, generic/273 and generic/337 pass.

	Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29	iwlwifi: pcie: wait longer after device reset	Emmanuel Grumbach
	The newest devices need a longer time to reset because of their more complex hardware. Wait 5ms after device reset. Consolidate all the places that reset the device in the PCIe transport to avoid future bugs. While at it, unify the flow to use set_bit instead of full write as requested by the hardware designers. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	iwlwifi: pcie: propagate iwl_pcie_apm_init's status	Emmanuel Grumbach
	iwl_pcie_apm_init can fail so make sure that the caller takes the status into account. Also, ensure that the error that iwl_pcie_apm_init can emit will appear in the kernel log by default. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	iwlwifi: mvm: quietly accept non-sta disassoc frames	Johannes Berg
	When a station that's not associated sends a data frame (e.g. an NDP) hostapd will respond with a disassoc frame, telling it that it's not associated. The station might also not be authenticated, in which case there will not be a station entry for it, and as a result we need to accept such frames without a station. Fixes: 3ee0f0e23e4f ("iwlwifi: mvm: fix DQA AP mode station assumption") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	iwlwifi: mvm: update rx statistics cmd api	Liad Kaufman
	The API has changed - update the code. Signed-off-by: Liad Kaufman <liad.kaufman@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	iwlwifi: mvm: remove DQA non-STA client mode special case	Johannes Berg
	When we get a non-STA frame to transmit in client mode, we try to use the IWL_MVM_DQA_BSS_CLIENT_QUEUE queue (queue #4). However, at this point, the queue might not be allocated at all, causing warnings.

	The scenario on which this happened was a race condition between mac80211 and our queue allocation work:
	 * mac80211 sends auth
	 * we stop mac80211 queues to allocate a hw queue
	 * authentication is aborted
	 * we allocate HW queue and start mac80211 queues
	 * mac80211 removes station
	 * mac80211 hands us the auth frame from the pending queue

	At this point, since mac80211 has already removed the station, we try to transmit the frame through this special non-station case on queue 4 anyway. In order to really use it properly, we'd have to again go through the hw queue allocation work, and attach it to a station, etc. In this case that isn't possible (there's no station anymore), but if this special case were needed, then we'd have to do it this way.

	However, the special case is documented to exist for TDLS, but can't trigger there because the TDLS setup frames etc. are normal to-DS frames going to the peer through the AP. Testing also confirms that this code path isn't triggered in TDLS.

	Therefore, remove the code path to avoid using an unused queue. The erroneous frame described above will still be transmitted on the AUX queue, but arguably that's a mac80211 problem, which will eventually be fixed by moving everything there to TXQs.

	Fixes: e3118ad74d7e ("iwlwifi: mvm: support tdls in dqa mode")
	Signed-off-by: Johannes Berg <johannes.berg@intel.com>
	Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	iwlwifi: mvm: don't mess the SNAP header in TSO for non-QoS packets	Emmanuel Grumbach
	When we get large sends on non-QoS association, we had a bug that mangled the SNAP header. Fix that. Fixes: a6d5e32f247c ("iwlwifi: mvm: send large SKBs to the transport") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	iwlwifi: pcie: reconfigure MSI-X HW on resume	Johannes Berg
	When going into suspend, the HW configuration for MSI-X will likely be lost. As a consequence, after waking up, all IRQ causes will be mapped to interrupt 0, and as a consequence we don't notice the interrupt because in most cases this is an interrupt for a queue, and getting it doesn't read the other cause registers. Fixes: 2e5d4a8f61dc ("iwlwifi: pcie: Add new configuration to enable MSIX") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	iwlwifi: mvm: don't send fetch the TID from a non-QoS packet in TSO	Emmanuel Grumbach
	Getting the TID of a packet before we know it is a QoS data packet isn't a good idea. Delay the TID retrieval until we know the packet is a QoS data packet. Fixes: bb81bb68f472 ("iwlwifi: mvm: add Tx A-MSDU inside A-MPDU") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	iwlwifi: mvm: fix mac80211's hw_queue in DQA mode	Johannes Berg
	When in non-DQA mode, mac80211 actually gets a pretty much perfect idea (in vif->hw_queue/cab_queue) of which queues we're using. But in DQA mode, this isn't true - nonetheless, we were adding all the queues, even the ones stations are using, to the queue allocation bitmap.

	Fix this, we should only add the queues we really are using in DQA mode:
	 * IWL_MVM_OFFCHANNEL_QUEUE, as we use this in both modes
	 * mvm->aux_queue, as we use this in both modes - mac80211 never really knows about it but we use it as a cookie internally, so can't reuse it
	 * possibly the GCAST queue (cab_queue)
	 * all the "queues" we told mac80211 about we were using on each interface (vif->hw_queue), these are entirely virtual in this mode

	Also add back the failure now when we can't allocate any more of these - now virtual - queues; this was skipped in DQA mode and would lead to having multiple ACs or even interfaces use the same queue number in mac80211 (10, since that's the limit), which would stop far too many queues if stopped.

	Signed-off-by: Johannes Berg <johannes.berg@intel.com>
	Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	iwlwifi: mvm: map cab_queue to real one earlier	Johannes Berg
	There may be a difference between the mac80211 vif->cab_queue and mvmvif->cab_queue, particularly with TVQM. Make the code map this earlier, instead of first returning the mac80211 one again from iwl_mvm_get_ctrl_vif_queue(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	iwlwifi: mvm: fix mac80211 queue tracking	Johannes Berg
	In the driver, we track which hardware queue is associated with which mac80211 "hw_queue", in order to be able to stop and wake it. When moving these bitmaps out of the queue_info structures, the type of the bitmap was erroneously changed from u32 to u8, presumably in order to save memory. Turns out that u32 isn't needed, because the highest queue we can ever tell mac80211 is always < 16, but a u16 definitely is needed, queues >=8 do happen. While at it, throw a BUILD_BUG_ON() into the place where we set the limit (mvm->first_agg_queue) and a warning when it actually gets put into the bitmap. The consequence of this bug is that full HW queues associated with such a too-high mac80211 number never stop higher layer queues when full, and thus would simply drop all packets that couldn't be enqueued to the hardware queue. Fixes: 34e10860ae8d ("iwlwifi: mvm: remove references to queue_info in new TX path") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
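The effect of the bitmap width can be shown with plain bit arithmetic. This is a Python sketch rather than driver code: `set_queue_bit` is a hypothetical helper that mimics storing into an unsigned C variable of the given width, and queue number 9 is just an example of a queue >= 8.

```python
def set_queue_bit(bitmap, queue, width):
    """Set bit `queue`, then truncate to `width` bits like a C store would."""
    return (bitmap | (1 << queue)) & ((1 << width) - 1)


# With a u8 bitmap, the bit for mac80211 queue 9 is silently lost,
# so that queue can never be stopped or woken through the bitmap.
print(bin(set_queue_bit(0, 9, width=8)))

# With a u16 bitmap, the same bit survives.
print(bin(set_queue_bit(0, 9, width=16)))
```

Since the commit states the highest queue number handed to mac80211 is always below 16, 16 bits is the smallest standard width that holds every valid bit.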
2017-06-29	iwlwifi: mvm: properly enable IP header checksumming	Johannes Berg
	The code was intended to enable IP header checksumming on AMSDUs, but failed to really do so because the A-MSDU bit was set after all the checksumming bits, and thus checking for A-MSDU could never be true. Fix this by setting the A-MSDU bit before the offload bits. Fixes: 5e6a98dc4863 ("iwlwifi: mvm: enable TCP/UDP checksum support for 9000 family") Reported-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	iwlwifi: pcie: add MSI-X interrupt tracing	Johannes Berg
	We have tracing for both pre-ICT and ICT interrupts, including all the data read there. Extend the tracing to MSI-X interrupts. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-06-29	Merge branch 'bpf-Add-syscall-lookup-support-for-fd-array-and-htab'	David S. Miller
	Martin KaFai Lau says: ==================== bpf: Add syscall lookup support for fd array and htab This patchset adds BPF_MAP_LOOKUP_ELEM syscall support for BPF_MAP_TYPE_PROG_ARRAY, BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS ==================== Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	bpf: Add test for syscall on fd array/htab lookup	Martin KaFai Lau
	Checks are added to the existing sockex3 and test_map_in_map test. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	bpf: Add syscall lookup support for fd array and htab	Martin KaFai Lau
	This patch allows userspace to do BPF_MAP_LOOKUP_ELEM on BPF_MAP_TYPE_PROG_ARRAY, BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS. The lookup returns a prog-id or map-id to the userspace. The userspace can then use the BPF_PROG_GET_FD_BY_ID or BPF_MAP_GET_FD_BY_ID to get a fd. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	mlxsw: spectrum_router: Fix NULL pointer dereference	Ido Schimmel
	In case a VLAN device is enslaved to a bridge we shouldn't create a router interface (RIF) for it when it's configured with an IP address. This is already handled by the driver for other types of netdevs, such as physical ports and LAG devices.

	If this IP address is then removed and the interface is subsequently unlinked from the bridge, a NULL pointer dereference can happen, as the original 802.1d FID was replaced with an rFID which was then deleted.

	To reproduce:
	$ ip link set dev enp3s0np9 up
	$ ip link add name enp3s0np9.111 link enp3s0np9 type vlan id 111
	$ ip link set dev enp3s0np9.111 up
	$ ip link add name br0 type bridge
	$ ip link set dev br0 up
	$ ip link set enp3s0np9.111 master br0
	$ ip address add dev enp3s0np9.111 192.168.0.1/24
	$ ip address del dev enp3s0np9.111 192.168.0.1/24
	$ ip link set dev enp3s0np9.111 nomaster

	Fixes: 99724c18fc66 ("mlxsw: spectrum: Introduce support for router interfaces")
	Signed-off-by: Ido Schimmel <idosch@mellanox.com>
	Reported-by: Petr Machata <petrm@mellanox.com>
	Tested-by: Petr Machata <petrm@mellanox.com>
	Reviewed-by: Petr Machata <petrm@mellanox.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	net: sched: Fix one possible panic when no destroy callback	Gao Feng
	When a qdisc fails to init, qdisc_create() invokes the destroy callback to clean up, but it does not check whether the callback actually exists. This causes a panic for qdiscs that have no destroy callback, such as codel and fq. Take codel as an example: when a malicious user constructs an invalid netlink msg, codel_init->codel_change->nla_parse_nested fails. The kernel then invokes the destroy callback directly, but qdisc codel doesn't define one, which results in a panic. Add a check for the destroy callback to avoid the possible panic. Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
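The missing check can be modelled as an optional callback. The sketch below is Python, not the net/sched code: `QdiscOps` and this `qdisc_create` are hypothetical stand-ins, a `None` destroy callback plays the role of a NULL function pointer, and Python's `TypeError` on calling `None` stands in for the kernel panic.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QdiscOps:
    """Hypothetical ops table; destroy is optional, as for codel or fq."""
    name: str
    init: Callable[[], None]
    destroy: Optional[Callable[[], None]] = None


def qdisc_create(ops, guard_destroy=True):
    """Run init; on failure, clean up via destroy only if it is safe to call."""
    try:
        ops.init()
    except ValueError as err:
        # guard_destroy=False reproduces the unguarded call from before the fix.
        if ops.destroy is not None or not guard_destroy:
            ops.destroy()
        return f"error: {err}"
    return "created"


def failing_init():
    # Stands in for nla_parse_nested() rejecting a malformed message.
    raise ValueError("invalid netlink attributes")


ops = QdiscOps(name="codel-like", init=failing_init)
print(qdisc_create(ops))  # error is reported, no crash
```

Calling `qdisc_create(ops, guard_destroy=False)` raises `TypeError`, which is the model's analogue of jumping through a NULL pointer.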
2017-06-29	virtio-net: serialize tx routine during reset	Jason Wang
	We don't hold any tx lock when trying to disable TX during reset, which can lead to a use after free since ndo_start_xmit() tries to access the virtqueue which has already been freed. Fix this by using netif_tx_disable() before freeing the vqs, which makes sure there is no tx after the vqs are freed. Reported-by: Jean-Philippe Menil <jpmenil@gmail.com> Tested-by: Jean-Philippe Menil <jpmenil@gmail.com> Fixes: f600b6905015 ("virtio_net: Add XDP support") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Robert McCabe <robert.mccabe@rockwellcollins.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	net: stmmac: Add additional registers for dwmac1000_dma ethtool	Thor Thayer
	Version 3.70a of the Designware has additional DMA registers so add those to the ethtool DMA Register dump.
	Offset 9  - Receive Interrupt Watchdog Timer Register
	Offset 10 - AXI Bus Mode Register
	Offset 11 - AHB or AXI Status Register
	Offset 22 - HW Feature Register
	Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
	Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set()	Dave Martin
	Now that compat_vfp_get() uses the regset API to copy the FPSCR value out to userspace, compat_vfp_set() looks inconsistent. In particular, compat_vfp_set() will fail if called with kbuf != NULL && ubuf == NULL (which is valid usage according to the regset API). This patch fixes compat_vfp_set() to use user_regset_copyin(), similarly to compat_vfp_get(). This also squashes a sparse warning triggered by the cast that drops __user when calling get_user(). Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-29	arm64: ptrace: Remove redundant overrun check from compat_vfp_set()	Dave Martin
	compat_vfp_set() checks for userspace trying to write an excessive amount of data to the regset. However this check is conspicuous for its absence from every other _set() in the arm64 ptrace implementation. In fact, the core ptrace_regset() already clamps userspace's iov_len to the regset size before the individual regset .{get,set}() methods get called. This patch removes the redundant check. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-29	arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails	Dave Martin
	If get_user() fails when reading the new FPSCR value from userspace in compat_vfp_set(), then garbage* will be written to the task's FPSR and FPCR registers. This patch prevents this by checking the return from get_user() first. [*] Actually, zero, due to the behaviour of get_user() on error, but that's still not what userspace expects. Fixes: 478fcb2cdb23 ("arm64: Debugging support") Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-29	Merge tag 'mlx5-updates-2017-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux	David S. Miller
	Saeed Mahameed says:

	====================
	mlx5-updates-2017-06-27 (Innova IPsec offload support)

	This patchset adds support for Innova IPSec network interface card.

	About Innova device:
	--------------------

	Innova is a network card with a ConnectX chip and an FPGA chip as a bump-on-the-wire.

	               Internal
	+----------+     Link     +-----------------+
	|          +--------------+      FPGA       |  +------+
	| ConnectX |              |      Shell      +--+ QSFP |
	|          +--------------+     +-------+   |  | Port |
	+----------+     I2C      |     |  SBU  |   |  +------+
	                          |     +-------+   |
	                          +--+----------+---+
	                             |          |
	                          +--+--+   +---+---+
	                          | DDR |   | Flash |
	                          +-----+   +-------+

	The FPGA synthesized logic is loaded from dedicated flash storage and has access to its own dedicated DDR RAM. The ConnectX chip firmware programs the FPGA by accessing its configuration space over either the slow internal I2C link or the high-speed internal link.

	The FPGA logic is divided into a "Shell" and a "Sandbox Unit" (SBU). mlx5_core driver (with CONFIG_MLX5_FPGA) handles all shell functionality, while other components may handle the various SBU functionalities.

	The driver opens high-speed reliable communication channels with the shell and the SBU over the internal link. These channels may be used for high-bandwidth configuration or for SBU-specific out-of-band data paths.

	About Innova IPSec device:
	--------------------------

	Innova IPSec is a network card that allows offloading IPSec cryptography operations from the host CPU to the NIC. It is an Innova card with an IPSec SBU. The hardware keeps the database of IPSec Security Associations (SADB) in the FPGA's DDR memory.
	               Internal
	+----------+     Link     +-----------------+
	|          +--------------+      FPGA       |  +------+
	| ConnectX |              |      Shell      +--+ QSFP |
	|          +--------------+     +-------+   |  | Port |
	+----------+ Internal I2C |     | IPSec |   |  +------+
	                          |     |  SBU  |   |
	                          |     +-------+   |
	                          +--+----------+---+
	                             |          |
	                          +--+--+   +---+---+
	                          | DDR |   |       |
	                          |     |   | Flash |
	                          |SADB |   |       |
	                          +-----+   +-------+

	Modes and ciphers:
	Currently the following modes and ciphers are supported:
	    IPv4 and IPv6
	    ESP tunnel and transport modes
	    AES 128 and 256 bit encryption, with GCM authentication (RFC4106)
	    IV is generated using seqiv, in sync with Linux's geniv.
	More modes and ciphers may be added later.

	Notes:
	In the future similar functionality will be included in a single-chip NIC.

	About the driver:
	-----------------

	Patches 1-4 prepare some existing driver code for the new feature:
	 * Add support for reserved GIDs in the hardware GID table
	 * Allow multiple modules to enable hardware RoCE support independently
	Patches 5-6 define structs and helper functions for QP work-queues.
	Patches 7-11 add various FPGA-related features required for Innova IPSec.
	Patch 12 adds abstraction layer for Mellanox IPSec-offload capable devices.
	Patches 13-16 add IPSec offload support to the mlx5 netdevice.

	This driver services the new IPSec offload API introduced in commit d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")

	Configuration Path:
	If Innova IPSec device is detected, the mlx5e netdevice gets the new NETIF_F_HW_ESP feature and the xdo callbacks, indicating ESP offload capabilities, and also the matching TX checksum and GSO features. The driver configures offloaded Security Associations (SAs) by sending an ADD_SA or DEL_SA message to the IPSec SBU, which updates the SADB in DDR. These messages and their responses are sent over a high-speed channel. Counters for ethtool are retrieved by the driver from the SBU.

	Data path:
	On receive path, the SBU decrypts ESP packets which match the offloaded SADB, but keeps them encapsulated.
The SBU injects metadata (Mellanox owned ethertype) indicating that crypto-offload has taken place, the SA with which it was done, and the authentication result. The ConnectX chip performs RX checksum offload on the packet, and RSS using the ESP SPI value. The driver detects the special ethertype, and attaches a struct secpath to the RX SKB, including flags to indicate that crypto offload took place, the authentication result, and which xfrm_state was used for decryption, in the olen and ovec members. The RX SKB may have useful CHECKSUM_COMPLETE. A separate patchset will add support for that in the xfrm stack. On transmit path, the stack encapsulates the packet but does not encrypt it, and indicates in the SKB's secpath that crypto offload is to be performed and the SA to use to do so. The driver avoids performing crypto-offload for ESP fragments, and packets with IP options, as the SBU cannot currently do that. For eligible packets, the driver prepends a special ethertype with metadata instructing the hardware to perform crypto offload. The stack builds regular (non-GSO) SKBs so that they contain a placeholder for the ESP trailer. The driver trims it off, because the SBU automatically appends the trailer for offloaded packets. The ConnectX chip performs TX checksum offload on inner UDP or TCP packets, and GSO for TCP packets (duplicating the prepended metadata). The segmented packets then undergo encryption in the SBU before going on the wire. Performance: We measure single stream of TCP on Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz Using AES-NI with ESP GSO we get constant 4.1 Gbps. Using crypto offload we get constant 18 Gbps. Note that these numbers require CHECKSUM_COMPLETE support in XFRM, which we submit separately. - Ilan Tayari ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
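The transmit-path rule described above (no crypto offload for ESP fragments or for packets with IP options) can be sketched as a small user-space model. This is only an illustration under stated assumptions: `TxPacket`, its fields, and `can_offload` are hypothetical names, not the mlx5 driver's code. The one protocol fact relied on is that an IPv4 header length (IHL) greater than 5 words means IP options are present.

```python
from dataclasses import dataclass


@dataclass
class TxPacket:
    """Hypothetical stand-in for the fields of an SKB the check looks at."""
    ihl: int                 # IPv4 header length in 32-bit words; 5 means no options
    is_fragment: bool        # True if this is an ESP fragment
    offload_requested: bool  # True if the stack marked the packet for crypto offload


def can_offload(pkt: TxPacket) -> bool:
    """Return True if the modelled SBU could encrypt this packet."""
    if not pkt.offload_requested:
        return False
    if pkt.is_fragment:
        return False  # the SBU cannot currently handle ESP fragments
    if pkt.ihl > 5:
        return False  # IP options present; the SBU cannot currently handle them
    return True
```

In this model, a packet that fails the check would simply be sent without the offload metadata; how the real driver handles such packets is not covered here.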
2017-06-29	Merge branch 'net-fix-sw-timestamping'	David S. Miller
	Ivan Khoronzhuk says: ==================== net: fix sw timestamping for non PTP packets This series contains several corrections connected with timestamping for the cpsw and netcp drivers, which are based on the same cpts module. Based on net/next ==================== Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	net: ethernet: ti: netcp_ethss: use cpts to check if packet needs timestamping	Ivan Khoronzhuk
	There is a cpts function to check whether a packet can be timestamped with cpts. It seems that ptp_classify_raw covers all cases listed with "case". Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	net: ethernet: ti: cpsw: fix sw timestamping for non PTP packets	Ivan Khoronzhuk
	The cpts can timestamp only PTP packets at this moment, so the driver cannot mark every packet as though it is going to be timestamped only because h/w timestamping for the given skb is enabled with SKBTX_HW_TSTAMP. Doing so prevents sw timestamping from being used; as a result, an outgoing packet is not timestamped at all if it is not PTP and h/w timestamping is enabled. So, fix it by setting SKBTX_IN_PROGRESS only for PTP packets. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
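The interaction between the flags can be illustrated with a minimal user-space model. It is a sketch, not the cpsw driver code: the flag values, the `Packet` class, and the `ptp_only` switch are assumptions made for illustration. It also assumes the stack takes a software timestamp only when SKBTX_SW_TSTAMP is set and SKBTX_IN_PROGRESS is not.

```python
from dataclasses import dataclass

# Illustrative bit values only; the kernel defines its own.
SKBTX_HW_TSTAMP = 1 << 0
SKBTX_SW_TSTAMP = 1 << 1
SKBTX_IN_PROGRESS = 1 << 2


@dataclass
class Packet:
    """Hypothetical stand-in for an outgoing skb."""
    is_ptp: bool
    tx_flags: int
    sw_timestamped: bool = False


def sw_tx_timestamp(pkt: Packet) -> None:
    # Assumed stack rule: skip the software timestamp if a hardware one is pending.
    if (pkt.tx_flags & SKBTX_SW_TSTAMP) and not (pkt.tx_flags & SKBTX_IN_PROGRESS):
        pkt.sw_timestamped = True


def xmit(pkt: Packet, ptp_only: bool = True) -> Packet:
    """Model the transmit path; ptp_only=False reproduces the old behaviour."""
    if (pkt.tx_flags & SKBTX_HW_TSTAMP) and (pkt.is_ptp or not ptp_only):
        pkt.tx_flags |= SKBTX_IN_PROGRESS
    sw_tx_timestamp(pkt)
    return pkt
```

With `ptp_only=False`, a non-PTP packet that requests both timestamp types ends up with neither, which is the failure the commit message describes. With `ptp_only=True`, the same packet receives a software timestamp.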
2017-06-29	net: ethernet: ti: cpsw: move skb timestamp to packet_submit	Ivan Khoronzhuk
	Move sw timestamp function close to channel submit function. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	cavium: thunder: Remove duplicate "netdev->name" logging output	Joe Perches
	Using netdev_<level>(netdev, "%s: ...", netdev->name) duplicates the name in the output. Remove those uses. Miscellanea: o Use the netif_<level> convenience macros at the same time Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
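The duplication can be shown with a simplified model of the logging helper. This is a sketch under assumptions: the `netdev_info` function below is a hypothetical stand-in that only prepends the device name, whereas the real kernel macros emit additional prefix fields.

```python
def netdev_info(dev_name: str, fmt: str, *args) -> str:
    """Hypothetical model: prefix the formatted message with the device name."""
    return f"{dev_name}: " + (fmt % args)


# Passing the name again in the format string repeats it in the output.
before = netdev_info("eth0", "%s: Link is Up", "eth0")
# Dropping the explicit name leaves a single prefix.
after = netdev_info("eth0", "Link is Up")
```

Here `before` is `"eth0: eth0: Link is Up"` and `after` is `"eth0: Link is Up"`.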
2017-06-29	net/mlx4: fix spelling mistake: "enforcment" -> "enforcement"	Colin Ian King
	Trivial fix to spelling mistake in mlx4_dbg debug message Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	net: atl1c: fix spelling mistake: "droppted" -> "dropped"	Colin Ian King
	Trivial fix to spelling mistake in netif_info message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	arm: sun8i: orangepi-2: use internal phy-mode	LABBE Corentin
	Since the PHY used is internal, simply set phy-mode as internal. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	arm: sun8i: nanopi-neo: use internal phy-mode	LABBE Corentin
	Since the PHY used is internal, simply set phy-mode as internal. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29	arm: sun8i: orangepi-one: use internal phy-mode	LABBE Corentin
	Since the PHY used is internal, simply set phy-mode as internal. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>