summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-03-02iwlwifi: mvm: add trigger for firmware dump upon missed beaconsEmmanuel Grumbach
Missing beacons is a good indication that something is going wrong in the firmware. Add a trigger to be able to collect data when we start missing beacons with a configurable threshold. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2015-03-02iwlwifi: mvm: add the cause of the firmware dump in the dumpEmmanuel Grumbach
Now that the firmware dump can be triggered by events in the code and not only the user or an firmware ASSERT, we need a way to know why the firmware dump was triggered. Add a section in the dump file for that. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2015-03-02iwlwifi: mvm: add framework for triggers for fw dumpEmmanuel Grumbach
Most of the time, the issues we want to debug with the firmware dump mechanism are transient. It is then very hard to stop the recording on time and get meaningful data. In order to solve this, I add here an infrastucture of triggers. The user will supply a list of triggers that will start / stop the recording. We have two types of triggers: start and stop. Start triggers can start a specific configuration. The stop triggers will be able to kick the collection of the data with the currently running configuration. These triggers are given to the driver by the .ucode file - just like the configuration. In the next patches, I'll add triggers in the code. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2015-03-02iwlwifi: mvm: use only 40 ms for fragmented scanDavid Spinadel
20 ms fragments are no longer required by system. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2015-03-02iwlwifi: mvm: allow to force the Rx chains from debugfsEmmanuel Grumbach
This is useful to debug weird antenna problems. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2015-03-02iwlwifi: add new TLV capability flag for BT PLCREmmanuel Grumbach
Packet Level Co-Running is a BT Coex feature which is supported on certain devices only, hence the need for a TLV flag for it. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2015-03-02iwlwifi: mvm: don't iterate interfaces to disconnect in net-detectLuciano Coelho
We shouldn't call iwl_mvm_d3_disconnect_iter() on the running interfaces when we are woken up due to net-detect, because it doesn't make sense. Additionally, this seems to set the IEEE80211_SDATA_DISCONNECT_RESUME flag that will cause a disconnection on the next resume (if a normal WoWLAN is used). To solve this, skip the iteration loop when net-detect is set. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Reported-by: Samuel Tan <samueltan@chromium.org> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2015-03-02Merge branch 'dropcount'David S. Miller
Eyal Birger says: ==================== net: move skb->dropcount to skb->cb[] Commit 977750076d98 ("af_packet: add interframe drop cmsg (v6)") unionized skb->mark and skb->dropcount in order to allow recording of the socket drop count while maintaining struct sk_buff size. skb->dropcount was introduced since there was no available room in skb->cb[] in packet sockets. However, its introduction led to the inability to export skb->mark to userspace. It was considered to alias skb->priority instead of skb->mark. However, that would lead to the inabilty to export skb->priority to userspace if desired. Such change may also lead to hard-to-find issues as skb->priority is assumed to be alias free, and, as noted by Shmulik Ladkani, is not 'naturally orthogonal' with other skb fields. This patch series follows the suggestions made by Eric Dumazet moving the dropcount metric to skb->cb[], eliminating this problem at the expense of 4 bytes less in skb->cb[] for protocol families using it. The patch series include compactization of bluetooth and packet use of skb->cb[] as well as the infrastructure for placing dropcount in skb->cb[]. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: move skb->dropcount to skb->cb[]Eyal Birger
Commit 977750076d98 ("af_packet: add interframe drop cmsg (v6)") unionized skb->mark and skb->dropcount in order to allow recording of the socket drop count while maintaining struct sk_buff size. skb->dropcount was introduced since there was no available room in skb->cb[] in packet sockets. However, its introduction led to the inability to export skb->mark, or any other aliased field to userspace if so desired. Moving the dropcount metric to skb->cb[] eliminates this problem at the expense of 4 bytes less in skb->cb[] for protocol families using it. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: add common accessor for setting dropcount on packetsEyal Birger
As part of an effort to move skb->dropcount to skb->cb[], use a common function in order to set dropcount in struct sk_buff. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: use common macro for assering skb->cb[] available size in protocol familiesEyal Birger
As part of an effort to move skb->dropcount to skb->cb[] use a common macro in protocol families using skb->cb[] for ancillary data to validate available room in skb->cb[]. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: packet: use sockaddr_ll fields as storage for skb original length in ↵Eyal Birger
recvmsg path As part of an effort to move skb->dropcount to skb->cb[], 4 bytes of additional room are needed in skb->cb[] in packet sockets. Store the skb original length in the first two fields of sockaddr_ll (sll_family and sll_protocol) as they can be derived from the skb when needed. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: rxrpc: change call to sock_recv_ts_and_drops() on rxrpc recvmsg to ↵Eyal Birger
sock_recv_timestamp() Commit 3b885787ea4112 ("net: Generalize socket rx gap / receive queue overflow cmsg") allowed receiving packet dropcount information as a socket level option. RXRPC sockets recvmsg function was changed to support this by calling sock_recv_ts_and_drops() instead of sock_recv_timestamp(). However, protocol families wishing to receive dropcount should call sock_queue_rcv_skb() or set the dropcount specifically (as done in packet_rcv()). This was not done for rxrpc and thus this feature never worked on these sockets. Formalizing this by not calling sock_recv_ts_and_drops() in rxrpc as part of an effort to move skb->dropcount into skb->cb[] Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: bluetooth: compact struct bt_skb_cb by converting boolean fields to bit ↵Eyal Birger
fields Convert boolean fields incoming and req_start to bit fields and move force_active in order save space in bt_skb_cb in an effort to use a portion of skb->cb[] for storing skb->dropcount. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: bluetooth: compact struct bt_skb_cb by inlining struct hci_req_ctrlEyal Birger
struct hci_req_ctrl is never used outside of struct bt_skb_cb; Inlining it frees 8 bytes on a 64 bit system in skb->cb[] allowing the addition of more ancillary data. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02pppoe: Use workqueue to die properly when a PADT is receivedSimon Farnsworth
When a PADT frame is received, the socket may not be in a good state to close down the PPP interface. The current implementation handles this by simply blocking all further PPP traffic, and hoping that the lack of traffic will trigger the user to investigate. Use schedule_work to get to a process context from which we clear down the PPP interface, in a fashion analogous to hangup on a TTY-based PPP interface. This causes pppd to disconnect immediately, and allows tools to take immediate corrective action. Note that pppd's rp_pppoe.so plugin has code in it to disable the session when it disconnects; however, as a consequence of this patch, the session is already disabled before rp_pppoe.so is asked to disable the session. The result is a harmless error message: Failed to disconnect PPPoE socket: 114 Operation already in progress This message is safe to ignore, as long as the error is 114 Operation already in progress; in that specific case, it means that the PPPoE session has already been disabled before pppd tried to disable it. Signed-off-by: Simon Farnsworth <simon@farnz.org.uk> Tested-by: Dan Williams <dcbw@redhat.com> Tested-by: Christoph Schulz <develop@kristov.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01bnx2: disable toggling of rxvlan if necessaryIvan Vecera
The bnx2 driver uses .ndo_fix_features to force enable of Rx VLAN tag stripping when the card cannot disable it. The driver should remove NETIF_F_HW_VLAN_CTAG_RX flag from hw_features instead so it is fixed for the ethtool. Cc: Sony Chacko <sony.chacko@qlogic.com> Cc: Dept-HSGLinuxNICDev@qlogic.com Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01NFSv4: Don't call put_rpccred() under the rcu_read_lock()Trond Myklebust
put_rpccred() can sleep. Fixes: 8f649c3762547 ("NFSv4: Fix the locking in nfs_inode_reclaim_delegation()") Cc: stable@vger.kernel.org # 2.6.35+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-01NFS: Don't require a filehandle to refresh the inode in nfs_prime_dcache()Trond Myklebust
If the server does not return a valid set of attributes that we can use to either create a file or refresh the inode, then there is no value in calling nfs_prime_dcache(). However if we're just refreshing the inode using the attributes that the server returned, then it shouldn't matter whether or not we have a filehandle, as long as we check the fsid+fileid combination. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-01NFSv3: Use the readdir fileid as the mounted-on-fileidTrond Myklebust
When we call readdirplus, set the fileid normally returned by readdir as the mounted-on-fileid, since that is commonly the case if there is a mountpoint. To ensure that we get it right, we only set the flag if the readdir fileid differs from the one returned in the readdirplus attributes. This again means that we can avoid the issues described in commit 2ef47eb1aee17 ("NFS: Fix use of nfs_attr_use_mounted_on_fileid()"), which only fixed NFSv4. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-01NFS: Don't invalidate a submounted dentry in nfs_prime_dcache()Trond Myklebust
If we're traversing a directory which contains a submounted filesystem, or one that has a referral, the NFS server that is processing the READDIR request will often return information for the underlying (mounted-on) directory. It may, or may not, also return filehandle information. If this happens, and the lookup in nfs_prime_dcache() returns the dentry for the submounted directory, the filehandle comparison will fail, and we call d_invalidate(). Post-commit 8ed936b5671bf ("vfs: Lazily remove mounts on unlinked files and directories."), this means the entire subtree is unmounted. The following minimal patch addresses this problem by punting on the invalidation if there is a submount. Kudos to Neil Brown <neilb@suse.de> for having tracked down this issue (see link). Reported-by: Nix <nix@esperi.org.uk> Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-01NFSv4: Set a barrier in the update_changeattr() helperTrond Myklebust
Ensure that we don't regress the changes that were made to the directory. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01NFS: Fix nfs_post_op_update_inode() to set an attribute barrierTrond Myklebust
nfs_post_op_update_inode() is called after a self-induced attribute update. Ensure that it also sets the barrier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01NFS: Remove size hack in nfs_inode_attrs_need_update()Trond Myklebust
Prior to this patch, we used to always OK attribute updates that extended the file size on the assumption that we might be performing writeback. Now that we have attribute barriers to protect the writeback related updates, we should remove this hack, as it can cause truncate() operations to apparently be reverted if/when a readahead or getattr RPC call races with our on-the-wire SETATTR. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01NFSv4: Add attribute update barriers to delegreturn and pNFS layoutcommitTrond Myklebust
Ensure that other operations that race with delegreturn and layoutcommit cannot revert the attribute updates that were made on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01NFS: Add attribute update barriers to NFS writebacksTrond Myklebust
Ensure that other operations that race with our write RPC calls cannot revert the file size updates that were made on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01NFS: Set an attribute barrier on all updatesTrond Myklebust
Ensure that we update the attribute barrier even if there were no invalidations, provided that this value is newer than the old one. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01NFS: Add attribute update barriers to nfs_setattr_update_inode()Trond Myklebust
Ensure that other operations which raced with our setattr RPC call cannot revert the file attribute changes that were made on the server. To do so, we artificially bump the attribute generation counter on the inode so that all calls to nfs_fattr_init() that precede ours will be dropped. The motivation for the patch came from Chuck Lever's reports of readaheads racing with truncate operations and causing the file size to be reverted. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01NFS: Add a helper to set attribute barriersTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01NFS: Ensure that buffered writes wait for O_DIRECT writes to completeTrond Myklebust
The O_DIRECT code will grab the inode->i_mutex and flush out buffered writes, before scheduling a read or a write. However there is no equivalent in the buffered write code to wait for O_DIRECT to complete. Fixes a reported issue in xfstests generic/133, when first performing an O_DIRECT write followed by a buffered write. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01net: macb: Properly add DMACFG bit definitionsArun Chandran
Add *_SIZE macros for the bits ENDIA_DESC and ENDIA_PKT Signed-off-by: Arun Chandran <achandran@mvista.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01net: macb: Add on the fly CPU endianness detectionArun Chandran
Program management descriptor's access mode according to the dynamically detected CPU endianness. Signed-off-by: Arun Chandran <achandran@mvista.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Tested-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01Driver: Vmxnet3: Copy TCP header to mapped frame for IPv6 packetsShrikrishna Khare
Allows for packet parsing to be done by the fast path. This performance optimization already exists for IPv4. Add similar logic for IPv6. Signed-off-by: Amitabha Banerjee <banerjeea@vmware.com> Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01mei: make device disabled on stop unconditionallyAlexander Usyskin
Set the internal device state to to disabled after hardware reset in stop flow. This will cover cases when driver was not brought to disabled state because of an error and in stop flow we wish not to retry the reset. Cc: <stable@vger.kernel.org> #3.10+ Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01staging: comedi: adv_pci1710: fix AI INSN_READ for non-zero channelIan Abbott
Reading of analog input channels by the `INSN_READ` comedi instruction is broken for all except channel 0. `pci171x_ai_insn_read()` calls `pci171x_ai_read_sample()` with the wrong value for the third parameter. It is supposed to be the current index in a channel list (which is always of length 1 in this case, so the index should be 0), but instead it is passing the actual channel number. `pci171x_ai_read_sample()` checks the channel number encoded in the raw sample value read from the hardware matches the channel number stored in the specified index of the previously set up channel list and returns `-ENODATA` if it doesn't match. Since the index should always be 0 in this case, the match will fail unless the channel number is also 0. Fix it by passing 0 as the channel index. Note that when the bug first appeared, it was `pci171x_ai_dropout()` that was called with the wrong parameter value. `pci171x_ai_dropout()` got replaced with `pci171x_ai_read_sample()` in commit 7fd2dae2500d ("staging: comedi: adv_pci1710: introduce pci171x_ai_read_sample()"). Fixes: 16c7eb6047bb ("staging: comedi: adv_pci1710: always enable PCI171x_PARANOIDCHECK code") Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Cc: stable <stable@vger.kernel.org> # 3.16+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01android: binder: fix binder mmap failuresAndrey Ryabinin
binder_update_page_range() initializes only addr and size fields in 'struct vm_struct tmp_area;' and passes it to map_vm_area(). Before 71394fe50146 ("mm: vmalloc: add flag preventing guard hole allocation") this was because map_vm_area() didn't use any other fields in vm_struct except addr and size. Now get_vm_area_size() (used in map_vm_area()) reads vm_struct's flags to determine whether vm area has guard hole or not. binder_update_page_range() don't initialize flags field, so this causes following binder mmap failures: -----------[ cut here ]------------ WARNING: CPU: 0 PID: 1971 at mm/vmalloc.c:130 vmap_page_range_noflush+0x119/0x144() CPU: 0 PID: 1971 Comm: healthd Not tainted 4.0.0-rc1-00399-g7da3fdc-dirty #157 Hardware name: ARM-Versatile Express [<c001246d>] (unwind_backtrace) from [<c000f7f9>] (show_stack+0x11/0x14) [<c000f7f9>] (show_stack) from [<c049a221>] (dump_stack+0x59/0x7c) [<c049a221>] (dump_stack) from [<c001cf21>] (warn_slowpath_common+0x55/0x84) [<c001cf21>] (warn_slowpath_common) from [<c001cfe3>] (warn_slowpath_null+0x17/0x1c) [<c001cfe3>] (warn_slowpath_null) from [<c00c66c5>] (vmap_page_range_noflush+0x119/0x144) [<c00c66c5>] (vmap_page_range_noflush) from [<c00c716b>] (map_vm_area+0x27/0x48) [<c00c716b>] (map_vm_area) from [<c038ddaf>] (binder_update_page_range+0x12f/0x27c) [<c038ddaf>] (binder_update_page_range) from [<c038e857>] (binder_mmap+0xbf/0x1ac) [<c038e857>] (binder_mmap) from [<c00c2dc7>] (mmap_region+0x2eb/0x4d4) [<c00c2dc7>] (mmap_region) from [<c00c3197>] (do_mmap_pgoff+0x1e7/0x250) [<c00c3197>] (do_mmap_pgoff) from [<c00b35b5>] (vm_mmap_pgoff+0x45/0x60) [<c00b35b5>] (vm_mmap_pgoff) from [<c00c1f39>] (SyS_mmap_pgoff+0x5d/0x80) [<c00c1f39>] (SyS_mmap_pgoff) from [<c000ce81>] (ret_fast_syscall+0x1/0x5c) ---[ end trace 48c2c4b9a1349e54 ]--- binder: 1982: binder_alloc_buf failed to map page at f0e00000 in kernel binder: binder_mmap: 1982 b6bde000-b6cdc000 alloc small buf failed -12 Use map_kernel_range_noflush() instead of map_vm_area() as this is better API for binder's purposes and it allows to get rid of 'vm_struct tmp_area' at all. Fixes: 71394fe50146 ("mm: vmalloc: add flag preventing guard hole allocation") Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Reported-by: Amit Pundir <amit.pundir@linaro.org> Tested-by: Amit Pundir <amit.pundir@linaro.org> Acked-by: David Rientjes <rientjes@google.com> Tested-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A CR4-shadow 32-bit init fix, plus two typo fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too x86/platform/intel-mid: Fix trivial printk message typo in intel_mid_arch_setup() x86/cpu/intel: Fix trivial typo in intel_tlb_table[]
2015-03-01Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Three clockevents/clocksource driver fixes" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: pxa: Fix section mismatch clocksource: mtk: Fix race conditions in probe code clockevents: asm9260: Fix compilation error with sparc/sparc64 allyesconfig
2015-03-01Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Two kprobes fixes and a handful of tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Make sparc64 arch point to sparc perf symbols: Define EM_AARCH64 for older OSes perf top: Fix SIGBUS on sparc64 perf tools: Fix probing for PERF_FLAG_FD_CLOEXEC flag perf tools: Fix pthread_attr_setaffinity_np build error perf tools: Define _GNU_SOURCE on pthread_attr_setaffinity_np feature check perf bench: Fix order of arguments to memcpy_alloc_mem kprobes/x86: Check for invalid ftrace location in __recover_probed_insn() kprobes/x86: Use 5-byte NOP when the code might be modified by ftrace
2015-03-01Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "An rtmutex deadlock path fixlet" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Set state back to running on error
2015-03-01Merge branch 'ebpf_support_for_cls_bpf'David S. Miller
Daniel Borkmann says: ==================== eBPF support for cls_bpf This is the non-RFC version of my patchset posted before netdev01 [1] conference. It contains a couple of eBPF cleanups and preparation patches to get eBPF support into cls_bpf. The last patch adds the actual support. I'll post the iproute2 parts after the kernel bits are merged, an initial preview link to the code is mentioned in the last patch. Patch 4 and 5 were originally one patch, but I've split them into two parts upon request as patch 4 only is also needed for Alexei's tracing patches that go via tip tree. Tested with tc and all in-kernel available BPF test suites. I have configured and built LLVM with --enable-experimental-targets=BPF but as Alexei put it, the plan is to get rid of the experimental status in future [2]. Thanks a lot! v1 -> v2: - Removed arch patches from this series - x86 is already queued in tip tree, under x86/mm - arm64 just reposted directly to arm folks - Rest is unchanged [1] http://thread.gmane.org/gmane.linux.network/350191 [2] http://article.gmane.org/gmane.linux.kernel/1874969 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01cls_bpf: add initial eBPF support for programmable classifiersDaniel Borkmann
This work extends the "classic" BPF programmable tc classifier by extending its scope also to native eBPF code! This allows for user space to implement own custom, 'safe' C like classifiers (or whatever other frontend language LLVM et al may provide in future), that can then be compiled with the LLVM eBPF backend to an eBPF elf file. The result of this can be loaded into the kernel via iproute2's tc. In the kernel, they can be JITed on major archs and thus run in native performance. Simple, minimal toy example to demonstrate the workflow: #include <linux/ip.h> #include <linux/if_ether.h> #include <linux/bpf.h> #include "tc_bpf_api.h" __section("classify") int cls_main(struct sk_buff *skb) { return (0x800 << 16) | load_byte(skb, ETH_HLEN + __builtin_offsetof(struct iphdr, tos)); } char __license[] __section("license") = "GPL"; The classifier can then be compiled into eBPF opcodes and loaded via tc, for example: clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o tc filter add dev em1 parent 1: bpf cls.o [...] As it has been demonstrated, the scope can even reach up to a fully fledged flow dissector (similarly as in samples/bpf/sockex2_kern.c). For tc, maps are allowed to be used, but from kernel context only, in other words, eBPF code can keep state across filter invocations. In future, we perhaps may reattach from a different application to those maps e.g., to read out collected statistics/state. Similarly as in socket filters, we may extend functionality for eBPF classifiers over time depending on the use cases. For that purpose, cls_bpf programs are using BPF_PROG_TYPE_SCHED_CLS program type, so we can allow additional functions/accessors (e.g. an ABI compatible offset translation to skb fields/metadata). For an initial cls_bpf support, we allow the same set of helper functions as eBPF socket filters, but we could diverge at some point in time w/o problem. I was wondering whether cls_bpf and act_bpf could share C programs, I can imagine that at some point, we introduce i) further common handlers for both (or even beyond their scope), and/or if truly needed ii) some restricted function space for each of them. Both can be abstracted easily through struct bpf_verifier_ops in future. The context of cls_bpf versus act_bpf is slightly different though: a cls_bpf program will return a specific classid whereas act_bpf a drop/non-drop return code, latter may also in future mangle skbs. That said, we can surely have a "classify" and "action" section in a single object file, or considered mentioned constraint add a possibility of a shared section. The workflow for getting native eBPF running from tc [1] is as follows: for f_bpf, I've added a slightly modified ELF parser code from Alexei's kernel sample, which reads out the LLVM compiled object, sets up maps (and dynamically fixes up map fds) if any, and loads the eBPF instructions all centrally through the bpf syscall. The resulting fd from the loaded program itself is being passed down to cls_bpf, which looks up struct bpf_prog from the fd store, and holds reference, so that it stays available also after tc program lifetime. On tc filter destruction, it will then drop its reference. Moreover, I've also added the optional possibility to annotate an eBPF filter with a name (e.g. path to object file, or something else if preferred) so that when tc dumps currently installed filters, some more context can be given to an admin for a given instance (as opposed to just the file descriptor number). Last but not least, bpf_prog_get() and bpf_prog_put() needed to be exported, so that eBPF can be used from cls_bpf built as a module. Thanks to 60a3b2253c41 ("net: bpf: make eBPF interpreter images read-only") I think this is of no concern since anything wanting to alter eBPF opcode after verification stage would crash the kernel. [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01ebpf: move read-only fields to bpf_prog and shrink bpf_prog_auxDaniel Borkmann
is_gpl_compatible and prog_type should be moved directly into bpf_prog as they stay immutable during bpf_prog's lifetime, are core attributes and they can be locked as read-only later on via bpf_prog_select_runtime(). With a bit of rearranging, this also allows us to shrink bpf_prog_aux to exactly 1 cacheline. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01ebpf: add sched_cls_type and map it to sk_filter's verifier opsDaniel Borkmann
As discussed recently and at netconf/netdev01, we want to prevent making bpf_verifier_ops registration available for modules, but have them at a controlled place inside the kernel instead. The reason for this is, that out-of-tree modules can go crazy and define and register any verfifier ops they want, doing all sorts of crap, even bypassing available GPLed eBPF helper functions. We don't want to offer such a shiny playground, of course, but keep strict control to ourselves inside the core kernel. This also encourages us to design eBPF user helpers carefully and generically, so they can be shared among various subsystems using eBPF. For the eBPF traffic classifier (cls_bpf), it's a good start to share the same helper facilities as we currently do in eBPF for socket filters. That way, we have BPF_PROG_TYPE_SCHED_CLS look like it's own type, thus one day if there's a good reason to diverge the set of helper functions from the set available to socket filters, we keep ABI compatibility. In future, we could place all bpf_prog_type_list at a central place, perhaps. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01ebpf: remove CONFIG_BPF_SYSCALL ifdefs in socket filter codeDaniel Borkmann
This gets rid of CONFIG_BPF_SYSCALL ifdefs in the socket filter code, now that the BPF internal header can deal with it. While going over it, I also changed eBPF related functions to a sk_filter prefix to be more consistent with the rest of the file. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01ebpf: make internal bpf API independent of CONFIG_BPF_SYSCALL ifdefsDaniel Borkmann
Socket filter code and other subsystems with upcoming eBPF support should not need to deal with the fact that we have CONFIG_BPF_SYSCALL defined or not. Having the bpf syscall as a config option is a nice thing and I'd expect it to stay that way for expert users (I presume one day the default setting of it might change, though), but code making use of it should not care if it's actually enabled or not. Instead, hide this via header files and let the rest deal with it. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01ebpf: export BPF_PSEUDO_MAP_FD to uapiDaniel Borkmann
We need to export BPF_PSEUDO_MAP_FD to user space, as it's used in the ELF BPF loader where instructions are being loaded that need map fixups. An initial stage loads all maps into the kernel, and later on replaces related instructions in the eBPF blob with BPF_PSEUDO_MAP_FD as source register and the actual fd as immediate value. The kernel verifier recognizes this keyword and replaces the map fd with a real pointer internally. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01ebpf: constify various function pointer structsDaniel Borkmann
We can move bpf_map_ops and bpf_verifier_ops and other structs into ro section, bpf_map_type_list and bpf_prog_type_list into read mostly. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01ebpf: remove kernel test stubsDaniel Borkmann
Now that we have BPF_PROG_TYPE_SOCKET_FILTER up and running, we can remove the test stubs which were added to get the verifier suite up. We can just let the test cases probe under socket filter type instead. In the fill/spill test case, we cannot (yet) access fields from the context (skb), but we may adapt that test case in future. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01Merge branch 'bcmgenet_systemport_stats'David S. Miller
Florian Fainelli says: ==================== net: bcmgenet and systemport statistics fixes This two patches fix a similar problem in the GENET and SYSTEMPORT drivers for software maintained statistics used to track DMA mapping and SKB re-allocation failures. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>