Age | Commit message | Author
2018-01-25 | Merge branch 'bpf-more-sock_ops-callbacks' | Alexei Starovoitov
Lawrence Brakmo says:
====================
This patchset adds support for:
- direct R or R/W access to many tcp_sock fields
- passing up to 4 arguments to sock_ops BPF functions
- tcp_sock field bpf_sock_ops_cb_flags for controlling callbacks
- optionally calling sock_ops BPF program when RTO fires
- optionally calling sock_ops BPF program when packet is retransmitted
- optionally calling sock_ops BPF program when TCP state changes
- access to tclass and sk_txhash
- new selftest
v2: Fixed commit message 0/11. The commit is to "bpf-next" but the patch below used "bpf" and Patchwork didn't work correctly.
v3: Cleaned RTO callback as per Yuchung's comment. Added BPF enum for TCP states as per Alexei's comment.
v4: Fixed compile warnings related to detecting changes between TCP internal states and the BPF defined states.
v5: Fixed comment issues in some selftest files. Fixed access issue with u64 fields in bpf_sock_ops struct.
v6: Made fixes based on comments from Eric Dumazet: The field bpf_sock_ops_cb_flags was added in a hole on 64bit kernels. Field bpf_sock_ops_cb_flags is now set through a helper function which returns an error when a BPF program tries to set bits for callbacks that are not supported in the current kernel. Added a comment indicating that when adding fields to bpf_sock_ops_kern they should be added before the field named "temp" if they need to be cleared before calling the BPF function.
v7: Enforced that fields "op" and "replylong[1] .. replylong[3]" are not writable, based on comments from Eric Dumazet and Alexei Starovoitov. Filled 32 bit hole in bpf_sock_ops struct with sk_txhash based on comments from Daniel Borkmann. Removed unused functions (tcp_call_bpf_1arg, tcp_call_bpf_4arg) based on comments from Daniel Borkmann.
v8: Add commit message 00/12. Add Acked-by as appropriate.
v9: Moved the bug fix to the front of the patchset. Changed RETRANS_CB so it is always called (before it was only called if the retransmit succeeded). It is now called with an extra argument, the return value of tcp_transmit_skb (0 => success). Based on comments from Yuchung Cheng. Added support for reading 2 new fields, sacked_out and lost_out, based on comments from Yuchung Cheng.
v10: Moved the callback flags from include/uapi/linux/tcp.h to include/uapi/linux/bpf.h. Cleaned up the test in selftest. Added a timeout so it always completes, even if the client is not communicating with the server. Made it faster by removing the sleeps. Made sure it works even when called back-to-back 20 times.
Consists of the following patches:
[PATCH bpf-next v10 01/12] bpf: Only reply field should be writeable
[PATCH bpf-next v10 02/12] bpf: Make SOCK_OPS_GET_TCP size
[PATCH bpf-next v10 03/12] bpf: Make SOCK_OPS_GET_TCP struct
[PATCH bpf-next v10 04/12] bpf: Add write access to tcp_sock and sock
[PATCH bpf-next v10 05/12] bpf: Support passing args to sock_ops bpf
[PATCH bpf-next v10 06/12] bpf: Adds field bpf_sock_ops_cb_flags to
[PATCH bpf-next v10 07/12] bpf: Add sock_ops RTO callback
[PATCH bpf-next v10 08/12] bpf: Add support for reading sk_state and
[PATCH bpf-next v10 09/12] bpf: Add sock_ops R/W access to tclass
[PATCH bpf-next v10 10/12] bpf: Add BPF_SOCK_OPS_RETRANS_CB
[PATCH bpf-next v10 11/12] bpf: Add BPF_SOCK_OPS_STATE_CB
[PATCH bpf-next v10 12/12] bpf: add selftest for tcpbpf
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 | bpf: add selftest for tcpbpf | Lawrence Brakmo
Added a selftest for tcpbpf (sock_ops) that checks that the appropriate callbacks occurred, that it can access tcp_sock fields, and that their values are correct. Run with command: ./test_tcpbpf_user Adding the flag "-d" will show why it did not pass. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 | bpf: Add BPF_SOCK_OPS_STATE_CB | Lawrence Brakmo
Adds support for calling sock_ops BPF program when there is a TCP state change. Two arguments are used; one for the old state and another for the new state. There is a new enum in include/uapi/linux/bpf.h that exports the TCP states, prepending BPF_ to the current TCP state names. If it is ever necessary to change the internal TCP state values (other than adding more to the end), then it will become necessary to convert from the internal TCP state value to the BPF value before calling the BPF sock_ops function. There is a set of compile checks added in tcp.c to detect if the internal and BPF values differ so we can make the necessary fixes. New op: BPF_SOCK_OPS_STATE_CB. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
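The compile checks described in this commit can be illustrated outside the kernel. The following is only a minimal sketch: the two enums (`demo_tcp_state`, `demo_bpf_tcp_state`) and the function `demo_to_bpf_state` are hypothetical stand-ins, the list of states is abbreviated, and C11 `static_assert` is used here in place of whatever build-time check the kernel actually uses.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel-internal TCP state values. */
enum demo_tcp_state {
	DEMO_TCP_ESTABLISHED = 1,
	DEMO_TCP_SYN_SENT,
	DEMO_TCP_SYN_RECV,
	DEMO_TCP_CLOSE = 7,
};

/* Hypothetical stand-in for the states exported to BPF programs. */
enum demo_bpf_tcp_state {
	DEMO_BPF_TCP_ESTABLISHED = 1,
	DEMO_BPF_TCP_SYN_SENT,
	DEMO_BPF_TCP_SYN_RECV,
	DEMO_BPF_TCP_CLOSE = 7,
};

/* The build fails as soon as the two sets of values drift apart. */
static_assert((int)DEMO_BPF_TCP_ESTABLISHED == (int)DEMO_TCP_ESTABLISHED,
	      "ESTABLISHED differs between internal and BPF states");
static_assert((int)DEMO_BPF_TCP_SYN_SENT == (int)DEMO_TCP_SYN_SENT,
	      "SYN_SENT differs between internal and BPF states");
static_assert((int)DEMO_BPF_TCP_SYN_RECV == (int)DEMO_TCP_SYN_RECV,
	      "SYN_RECV differs between internal and BPF states");
static_assert((int)DEMO_BPF_TCP_CLOSE == (int)DEMO_TCP_CLOSE,
	      "CLOSE differs between internal and BPF states");

/* While the checks above hold, no conversion table is needed. */
int demo_to_bpf_state(enum demo_tcp_state state)
{
	return (int)state;
}
```

If one of the assertions ever fired, `demo_to_bpf_state` would be the place to add an explicit mapping, which is the conversion step the commit message anticipates.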
2018-01-25 | bpf: Add BPF_SOCK_OPS_RETRANS_CB | Lawrence Brakmo
Adds support for calling sock_ops BPF program when there is a retransmission. Three arguments are used; one for the sequence number, another for the number of segments retransmitted, and the last one for the return value of tcp_transmit_skb (0 => success). Does not include syn-ack retransmissions. New op: BPF_SOCK_OPS_RETRANS_CB. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 | bpf: Add sock_ops R/W access to tclass | Lawrence Brakmo
Adds direct write access to sk_txhash and access to tclass for ipv6 flows through getsockopt and setsockopt. Sample usage for tclass: bpf_getsockopt(skops, SOL_IPV6, IPV6_TCLASS, &v, sizeof(v)) where skops is a pointer to the ctx (struct bpf_sock_ops). Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 | bpf: Add support for reading sk_state and more | Lawrence Brakmo
Add support for reading many more tcp_sock fields:
  state (same as sk->sk_state)
  rtt_min (same as sk->rtt_min.s[0].v, the current rtt_min)
  snd_ssthresh
  rcv_nxt
  snd_nxt
  snd_una
  mss_cache
  ecn_flags
  rate_delivered
  rate_interval_us
  packets_out
  retrans_out
  total_retrans
  segs_in
  data_segs_in
  segs_out
  data_segs_out
  lost_out
  sacked_out
  sk_txhash
  bytes_received (__u64)
  bytes_acked (__u64)
Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 | bpf: Add sock_ops RTO callback | Lawrence Brakmo
Adds an optional call to sock_ops BPF program based on whether the BPF_SOCK_OPS_RTO_CB_FLAG is set in bpf_sock_ops_cb_flags. The BPF program is passed 2 arguments: icsk_retransmits and whether the RTO has expired. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 | bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock | Lawrence Brakmo
Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary use is to determine if there should be calls to sock_ops bpf program at various points in the TCP code. The field is initialized to zero, disabling the calls. A sock_ops BPF program can set it, per connection and as necessary, when the connection is established. It also adds support for reading and writing the field within a sock_ops BPF program. Reading is done by accessing the field directly. However, writing is done through the helper function bpf_sock_ops_cb_flags_set, in order to return an error if a BPF program is trying to set a callback that is not supported in the current kernel (i.e. running an older kernel). The helper function returns 0 if it was able to set all of the bits set in the argument, a positive number containing the bits that could not be set, or -EINVAL if the socket is not a full TCP socket.
Examples of where one could call the bpf program:
1) When RTO fires
2) When a packet is retransmitted
3) When the connection terminates
4) When a packet is sent
5) When a packet is received
Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
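The return convention of the helper described in this commit can be modelled in plain user-space C. This is a sketch under stated assumptions, not the kernel implementation: `struct demo_sock`, the `DEMO_CB_*` bit values and `demo_cb_flags_set` are hypothetical names, and the choice to still store the supported bits when some requested bits are unsupported is an assumption of this model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback bits; the real names and values are defined by the kernel. */
#define DEMO_CB_RTO     (1 << 0)
#define DEMO_CB_RETRANS (1 << 1)
#define DEMO_CB_STATE   (1 << 2)
#define DEMO_CB_ALL     (DEMO_CB_RTO | DEMO_CB_RETRANS | DEMO_CB_STATE)

/* Hypothetical, reduced model of a socket. */
struct demo_sock {
	int is_full_tcp;       /* nonzero for a full TCP socket */
	unsigned int cb_flags; /* models bpf_sock_ops_cb_flags, initially 0 */
};

/*
 * Returns 0 if every requested bit is supported, the unsupported bits
 * (a positive value) otherwise, or -EINVAL for a non-full socket.
 */
int demo_cb_flags_set(struct demo_sock *sk, int requested)
{
	assert(sk != NULL);

	if (!sk->is_full_tcp)
		return -EINVAL;

	/* Assumption of this sketch: supported bits are stored regardless. */
	sk->cb_flags = (unsigned int)requested & DEMO_CB_ALL;

	return requested & ~DEMO_CB_ALL;
}
```

A caller written against a newer kernel can inspect the positive return value to learn which callbacks an older kernel does not provide, and fall back accordingly.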
2018-01-25 | bpf: Support passing args to sock_ops bpf function | Lawrence Brakmo
Adds support for passing up to 4 arguments to sock_ops bpf functions. It reuses the reply union, so the bpf_sock_ops structures are not increased in size. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
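The idea of reusing the reply union for arguments can be sketched as follows. `struct demo_sock_ops`, the member name `args` and `demo_fill_args` are assumptions made for illustration; the real structure has many more fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced model of the sock_ops context. */
struct demo_sock_ops {
	uint32_t op;
	union {
		uint32_t args[4];      /* inputs handed to the BPF program */
		uint32_t reply;        /* value written back by the program */
		uint32_t replylong[4];
	};
};

/* The arguments share storage with the reply, so the struct does not grow. */
static_assert(sizeof(struct demo_sock_ops) == 5 * sizeof(uint32_t),
	      "arguments must not increase the struct size");

/* Store up to 4 arguments; unused slots are cleared to zero. */
void demo_fill_args(struct demo_sock_ops *ctx, uint32_t op,
		    const uint32_t *args, unsigned int nargs)
{
	unsigned int i;

	assert(nargs <= 4);
	ctx->op = op;
	for (i = 0; i < 4; i++)
		ctx->args[i] = (i < nargs) ? args[i] : 0;
}
```

Because `args[0]` and `reply` alias the same word, a program that wants to return a value must read its arguments before writing the reply.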
2018-01-25 | bpf: Add write access to tcp_sock and sock fields | Lawrence Brakmo
This patch adds a macro, SOCK_OPS_SET_FIELD, for writing to struct tcp_sock or struct sock fields. This required adding a new field "temp" to struct bpf_sock_ops_kern for temporary storage that is used by sock_ops_convert_ctx_access. It is used to store and recover the contents of a register, so the register can be used to store the address of the sk. Since we cannot overwrite the dst_reg because it contains the pointer to ctx, nor the src_reg since it contains the value we want to store, we need an extra register to contain the address of the sk. Also adds the macro SOCK_OPS_GET_OR_SET_FIELD that calls one of the GET or SET macros depending on the value of the TYPE field. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 | bpf: Make SOCK_OPS_GET_TCP struct independent | Lawrence Brakmo
Changed SOCK_OPS_GET_TCP to SOCK_OPS_GET_FIELD and added 2 arguments so now it can also work with struct sock fields. The first argument is the name of the field in the bpf_sock_ops struct, the 2nd argument is the name of the field in the OBJ struct. Previous: SOCK_OPS_GET_TCP(FIELD_NAME) New: SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ) Where OBJ is either "struct tcp_sock" or "struct sock" (without quotation). BPF_FIELD is the name of the field in the bpf_sock_ops struct and OBJ_FIELD is the name of the field in the OBJ struct. Although the field names are currently the same, the kernel struct names could change in the future and this change makes it easier to support that. Note that adding access to tcp_sock fields in sock_ops programs does not preclude the tcp_sock fields from being removed as long as we are willing to do one of the following: 1) Return a fixed value (e.g. 0 or 0xffffffff), or 2) Make the verifier fail if that field is accessed (i.e. program fails to load) so the user will know that field is no longer supported. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 | bpf: Make SOCK_OPS_GET_TCP size independent | Lawrence Brakmo
Make the SOCK_OPS_GET_TCP helper macro size independent (before, it only worked with 4-byte fields). Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 | bpf: Only reply field should be writeable | Lawrence Brakmo
Currently, a sock_ops BPF program can write the op field and all the reply fields (reply and replylong). This is a bug. The op field should not have been writeable and there is currently no way to use replylong field for indices >= 1. This patch enforces that only the reply field (which equals replylong[0]) is writeable. Fixes: 40304b2a1567 ("bpf: BPF support for sock_ops") Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 | drm/nouveau: Move irq setup/teardown to pci ctor/dtor | Lyude Paul
For a while we've been having issues with seemingly random interrupts coming from nvidia cards when resuming them. Originally the fix for this was thought to be just re-arming the MSI interrupt registers right after re-allocating our IRQs, however it seems a lot of what we do is both wrong and not even necessary. This was made apparent by what appeared to be a regression in the mainline kernel that started introducing suspend/resume issues for nouveau: a0c9259dc4e1 (irq/matrix: Spread interrupts on allocation) After this commit was introduced, we started getting interrupts from the GPU before we actually re-allocated our own IRQ (see references below) and assigned the IRQ handler. Investigation showed that the problem was not with the commit, but with the fact that nouveau even frees/allocates its IRQs before and after suspend/resume. For starters: drivers in the linux kernel haven't had to handle freeing/re-allocating their IRQs during suspend/resume cycles for quite a while now. Nouveau seems to be one of the few drivers left that still does this, despite the fact there's no reason we actually need to since disabling interrupts from the device side should be enough, as the kernel is already smart enough to know to disable host-side interrupts for us before going into suspend. Since we were tearing down our IRQs by hand however, that means there was a short period during resume where interrupts could be received before we re-allocated our IRQ which would lead to us getting an unhandled IRQ. Since we never handle said IRQ and re-arm the interrupt registers, this would cause us to miss all of the interrupts from the GPU and cause our init process to start timing out on anything requiring interrupts. So, since this whole setup/teardown every suspend/resume cycle is useless anyway, move irq setup/teardown into the pci subdev's ctor/dtor functions instead so they're only called at driver load and driver unload.
This should fix most of the issues with pending interrupts on resume, along with getting suspend/resume for nouveau to work again. As well, this probably means we can also just remove the msi rearm call inside nvkm_pci_init(). But since our main focus here is to fix suspend/resume before 4.15, we'll save that for a later patch. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Galbraith <efault@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-01-25 | qed: code indent should use tabs where possible | Rohit Visavalia
Issue found by checkpatch. Signed-off-by: Rohit Visavalia <rohit.visavalia@softnautics.com> Acked-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | be2net: networking block comments don't use an empty /* line | Rohit Visavalia
Resolved Warning: networking block comments don't use an empty /* line, use /* Comment... Issue found by checkpatch. Signed-off-by: Rohit Visavalia <rohit.visavalia@softnautics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next | David S. Miller
Johan Hedberg says:
====================
pull request: bluetooth-next 2018-01-25
Here's one last bluetooth-next pull request for the 4.16 kernel:
- Improved support for Intel controllers
- New set_parity method to serdev (agreed with maintainers to be taken through bluetooth-next)
- Fix error path in hci_bcm (missing call to serdev close)
- New ID for BCM4343A0 UART controller
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | cxgb4: fix possible deadlock | Ganesh Goudar
t4_wr_mbox_meat_timeout() can be called from both softirq context and process context, hence protect the mbox with spin_lock_bh() instead of simple spin_lock(). Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | net: don't call update_pmtu unconditionally | Nicolas Dichtel
Some dst_ops (e.g. md_dst_ops) do not set this handler. This may result in: "BUG: unable to handle kernel NULL pointer dereference at (null)" Let's add a helper to check if update_pmtu is available before calling it. Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") CC: Roman Kapl <code@rkapl.cz> CC: Xin Long <lucien.xin@gmail.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
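The pattern behind this fix, checking an optional handler before calling it, can be shown with a small user-space model. All names below (`struct demo_dst`, `struct demo_dst_ops`, `demo_store_pmtu`, `demo_dst_update_pmtu`) are hypothetical; this is a sketch of the guard, not the helper added by the patch.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dst;

/* Hypothetical ops table; update_pmtu is optional and may be NULL. */
struct demo_dst_ops {
	void (*update_pmtu)(struct demo_dst *dst, unsigned int mtu);
};

/* Hypothetical, reduced destination entry. */
struct demo_dst {
	const struct demo_dst_ops *ops;
	unsigned int pmtu;
};

/* Example handler for ops tables that do support PMTU updates. */
void demo_store_pmtu(struct demo_dst *dst, unsigned int mtu)
{
	assert(dst != NULL);
	dst->pmtu = mtu;
}

/* Call the handler only when the ops table provides one. */
void demo_dst_update_pmtu(struct demo_dst *dst, unsigned int mtu)
{
	if (dst && dst->ops && dst->ops->update_pmtu)
		dst->ops->update_pmtu(dst, mtu);
}
```

Routing every call through one guarded helper keeps the NULL check in a single place instead of repeating it at each call site.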
2018-01-25 | net/ipv6: Do not allow route add with a device that is down | David Ahern
IPv6 allows routes to be installed when the device is not up (admin up). Worse, it does not mark it as LINKDOWN. IPv4 does not allow it and really there is no reason for IPv6 to allow it, so check the flags and deny if device is admin down. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | Merge branch 'net-smc-more-socket-closing-improvements' | David S. Miller
Ursula Braun says: ==================== net/smc: more socket closing improvements these patches improve the smc behavior for abnormal socket closing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | net/smc: check for healthy link group resp. connections | Ursula Braun
If a problem for at least one connection of a link group is detected, the whole link group and all its connections are terminated. This patch adds a check for healthy link group when trying to reserve a work request, and checks for healthy connections before starting a tx worker. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | net/smc: wake up wr_reg_wait when terminating a link group | Ursula Braun
If a new connection with a new rmb is added to a link group, its memory region is registered. If a link group is terminated, a pending registration requires a wake up. And consolidate setting of tx_flag peer_conn_abort in smc_lgr_terminate(). Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | net/smc: do not reuse a linkgroup with setup problems | Ursula Braun
Once a linkgroup is created successfully, it stays alive for a certain time to service more connections potentially created. If one of the initialization steps for a new linkgroup fails, the linkgroup should not be reused by other connections following. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | net/smc: terminate link group for ib_post_send problems | Ursula Braun
If ib_post_send() fails, terminate all connections of this link group. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | net/smc: handle state SMC_PEERFINCLOSEWAIT correctly | Ursula Braun
A state transition from closing state SMC_PEERFINCLOSEWAIT to closing state SMC_APPFINCLOSEWAIT is not allowed. Once a closing indication from the peer has been received, the socket reaches state SMC_CLOSED. And receiving a peer_conn_abort just changes the state of the socket into one of the states SMC_PROCESSABORT or SMC_CLOSED; sending a peer_conn_abort occurs in smc_close_active() for state SMC_PROCESSABORT only. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | net/smc: cancel tx worker in case of socket aborts | Ursula Braun
If an SMC socket is aborted, the tx worker should be cancelled. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | Merge branch 'sfc-support-PTP-on-8000-and-X2000-series-NICs' | David S. Miller
Edward Cree says: ==================== sfc: support PTP on 8000 and X2000 series NICs Starting from the 8000-series (Medford 1), SFC NICs can timestamp TX packets sent through an ordinary DMA queue, rather than a special control-plane operation as in the 7000-series. Patches 2-8 implement support for this. The X2000-series (Medford 2) changes the format of timestamps, from seconds + (2^27)ths to seconds + quarter nanoseconds, as well as changing the shift of the frequency adjustment for increased precision. Patches 9-12 implement support for these changes. Patch #1 is an unrelated fix for NAPI budget handling, needed in order for TX completion changes in the later patches to apply cleanly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
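The two sub-second timestamp formats mentioned in this cover letter can be compared with a short conversion sketch. The function names are hypothetical and truncation (rather than rounding) is an assumption of this example; the driver may treat the low bits differently.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Older format: fraction of a second in units of 1/2^27 s, converted to ns. */
uint32_t demo_frac27_to_ns(uint32_t frac)
{
	assert(frac < (1u << 27));
	/* Widen to 64 bits so the multiplication cannot overflow. */
	return (uint32_t)(((uint64_t)frac * DEMO_NSEC_PER_SEC) >> 27);
}

/* Newer format: fraction of a second in quarter nanoseconds, converted to ns. */
uint32_t demo_quarter_ns_to_ns(uint32_t quarter_ns)
{
	/* Four quarter nanoseconds make one nanosecond; the remainder is dropped. */
	return quarter_ns >> 2;
}
```

One unit of the older format is about 7.45 ns (10^9 / 2^27), while one unit of the newer format is 0.25 ns, which is where the finer resolution of the newer format comes from.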
2018-01-25 | sfc: support Medford2 frequency adjustment format | Laurence Evans
Support increased precision frequency adjustment format (FP44) used by Medford2 adapters. Signed-off-by: Laurence Evans <levans@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | sfc: support second + quarter ns time format for receive datapath | Edward Cree
The time_format that we stash in the PTP data structure is never referenced, so we can remove it. Instead, store the information needed to interpret sync event timestamps. Also rolls in a couple of other related minor PTP fixes. Based on patches by Bert Kenward <bkenward@solarflare.com> and Laurence Evans <levans@solarflare.com>. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | sfc: support separate PTP and general timestamping | Laurence Evans
Support MC_CMD_PTP_OUT_GET_TIMESTAMP_CORRECTIONS_V2. Extract general timestamp corrections in addition to PTP corrections. Apply receive timestamp corrections for general datapath receive timestamping, and correspondingly for transmit. Signed-off-by: Laurence Evans <levans@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | sfc: simplify RX datapath timestamping | Laurence Evans
Use timestamp conversion function with correction to avoid duplicate correction handling. Signed-off-by: Laurence Evans <levans@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | sfc: only advertise TX timestamping if we have the license for it | Martin Habets
We check the license for TX hardware timestamping capability. The PTP probe will have enabled PTP sync events from the adapter. If later, at TX queue init, it turns out we do not have the license, we don't need the sync events either. Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | sfc: on 8000 series use TX queues for TX timestamps | Edward Cree
For this we create and use one or more new TX queues on the PTP channel, and enable sync events for it. Based on a patch by Martin Habets <mhabets@solarflare.com>. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | sfc: MAC TX timestamp handling on the 8000 series | Martin Habets
TX timestamps on 8000 series are supplied from the MAC. This timestamp is only 48 bits long. The high order bits from the last time sync event are used for the top 16 bits. Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
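Extending a 48-bit timestamp with the high-order bits of the last sync event can be sketched as below. The function name, the mask macros and the flat 64-bit layout are assumptions for illustration, and the sketch deliberately ignores the case where the low 48 bits wrap between the sync event and the transmit, which real code would have to handle.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LOW48_MASK  0x0000FFFFFFFFFFFFULL
#define DEMO_HIGH16_MASK 0xFFFF000000000000ULL

/*
 * Combine the top 16 bits of the last sync timestamp with a 48-bit
 * MAC timestamp. Assumes no wrap of the low 48 bits in between.
 */
uint64_t demo_extend_ts48(uint64_t last_sync, uint64_t ts48)
{
	assert((ts48 & ~DEMO_LOW48_MASK) == 0);
	return (last_sync & DEMO_HIGH16_MASK) | ts48;
}
```

For example, a sync value of 0x1234AAAAAAAAAAAA combined with a MAC timestamp of 0x1 yields 0x1234000000000001.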
2018-01-25 | sfc: only enable TX timestamping if the adapter is licensed for it | Martin Habets
If we try to enable the feature and do not have the license for it, the MCPU will refuse and fail our TX queue init. Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | sfc: use main datapath for HW timestamps if available | Martin Habets
We can now transmit SKBs in 2 ways: 1. Via the MC (for the 7XXX series and earlier), using efx_ptp_xmit_skb_mc(). 2. Via the TX queues on the dedicated PTP channel (8XXX series and later), using efx_ptp_xmit_skb_queue(). The PTP worker thread uses the method set up at probe time. It never checked the return code from the old efx_ptp_xmit_skb(), so it now returns void. We increment the TX dropped counter of the device if the transmit fails. As a result of the probe per channel the remove gets called multiple times. Clean up efx->ptp_data properly to avoid the 2nd call blowing up. Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | sfc: add function to determine which TX timestamping method to use | Martin Habets
Use MC capability MC_CMD_GET_CAPABILITIES_V2_OUT_TX_MAC_TIMESTAMPING to detect whether the NIC supports timestamping packets sent out the main datapath. Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | sfc: handle TX timestamps in the normal data path | Martin Habets
Before this work, TX timestamping is done by sending each SKB to the MC. On the 8000 series (Medford1) we have high speed timestamping via the MAC, which means we can use normal TX queues for this without a significant drop in bandwidth. On the X2000 series (Medford2) support for transmitting via the MC is removed, so the new way must be used. This patch enables timestamping on a TX queue, if requested. It also enhances TX event handling to process the extra completion events, and puts the time in the SKB. Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | sfc: remove tx and MCDI handling from NAPI budget consideration | Bert Kenward
The NAPI budget is only for RX processing work, not other work such as TX or MCDI completion handling. Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git | Kalle Valo
ath.git patches for 4.16. Major changes:
wil6210
* add PCI device id for Talyn
* support flashless device
ath9k
* improve RSSI/signal accuracy on AR9003 series
2018-01-25 | rtlwifi: btcoex: Fix some static warnings from Sparse | Ping-Ke Shih
Add 'static' or declaration to resolve the warnings, and remove two unused functions halbtc_set_macreg() and halbtc_get_macreg() exposed when they were made static. Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-01-25 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds
Pull KVM fixes from Radim Krčmář: "Fix races and a potential use after free in the s390 cmma migration code" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: add proper locking for CMMA migration bitmap
2018-01-25 | Merge tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux | Linus Torvalds
Pull btrfs fix from David Sterba: "It's been reported recently that readdir can list stale entries under some conditions. Fix it." * tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix stale entries in readdir
2018-01-25 | net: Move net:netns_ids destruction out of rtnl_lock() and document locking scheme | Kirill Tkhai
Currently, we unhash a dying net from netns_ids lists under rtnl_lock(). This is a leftover from the time when net::netns_ids was introduced. There was no net::nsid_lock then, and rtnl_lock() was mostly needed to order modifications of the nsid idr of alive nets, i.e. for:
	for_each_net(tmp) {
		...
		id = __peernet2id(tmp, net);
		idr_remove(&tmp->netns_ids, id);
		...
	}
Since we now have net::nsid_lock, the modifications are protected by this local lock, and we can introduce a better scheme for netns_ids destruction. Let's look at the functions peernet2id_alloc() and get_net_ns_by_id(). Previous commits taught these functions to work well with a dying net acquired from rtnl-unlocked lists. They are the only functions that can hash a net to netns_ids or obtain one from there. And, as is easy to check, the other functions operating on netns_ids work with ids, not with net pointers. So we do not need rtnl_lock() to synchronize cleanup_net() with any of them. Another property used in the patch is that a net is unhashed from net_namespace_list in only one place and by only one process. So we avoid an excess rcu_read_lock() or rtnl_lock() when we are iterating over the list in unhash_nsid(). All of the above makes it possible to hold rtnl_lock() only for the net->list deletion, and to avoid it completely for netns_ids unhashing and destruction. As these two actions may take a long time (e.g., memory allocation to send an skb), the patch should improve scalability and significantly decrease the time for which rtnl_lock() is held in cleanup_net(). Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | net: tcp: close sock if net namespace is exiting | Dan Streetman
When a tcp socket is closed, if it detects that its net namespace is exiting, close immediately and do not wait for FIN sequence. For normal sockets, a reference is taken to their net namespace, so it will never exit while the socket is open. However, kernel sockets do not take a reference to their net namespace, so it may begin exiting while the kernel socket is still open. In this case if the kernel socket is a tcp socket, it will stay open trying to complete its close sequence. The sock's dst(s) hold a reference to their interface, which are all transferred to the namespace's loopback interface when the real interfaces are taken down. When the namespace tries to take down its loopback interface, it hangs waiting for all references to the loopback interface to release, which results in messages like: unregister_netdevice: waiting for lo to become free. Usage count = 1 These messages continue until the socket finally times out and closes. Since the net namespace cleanup holds the net_mutex while calling its registered pernet callbacks, any new net namespace initialization is blocked until the current net namespace finishes exiting. After this change, the tcp socket notices the exiting net namespace, and closes immediately, releasing its dst(s) and their reference to the loopback interface, which lets the net namespace continue exiting. Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811 Signed-off-by: Dan Streetman <ddstreet@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | Bluetooth: btintel: Create common function for firmware download | Tedd Ho-Jeong An
The firmware download flow for RAM SKU is the same for both USB and UART, and this patch creates a common function for both drivers. Signed-off-by: Tedd Ho-Jeong An <tedd.an@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-01-25 | wcn36xx: release DMA memory in case of error | Ramon Fried
wcn36xx_dxe_init() doesn't check the return value of wcn36xx_dxe_init_descs(). Release the resources in case an error occurred. Signed-off-by: Ramon Fried <rfried@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-01-25 | ath9k: Display calibration data piers in debugfs | Wojciech Dubowik
Display per frequency calibration data in dump_modal debugfs entry including reference power, voltage, tx temperature and noise floor.
Example of chain 0 of OEM card (dump from modal_eeprom):
Chain 0
Freq  ref  volt  temp  nf_Cal  nf_Pow  rx_temp
5180  -30  0     137   0       0       0
5320  -24  0     137   0       0       0
5500  -15  0     137   0       0       0
5620  -10  0     137   0       0       0
5700  -15  0     137   0       0       0
5745  -16  0     135   0       0       0
5785  -19  0     136   0       0       0
5825  -22  0     136   0       0       0
Example of a card with calibrated noise floor:
Chain 0
Freq  ref  volt  temp  nf_Cal  nf_Pow  rx_temp
4890  -49  0     128   -107    -97     124
5100  -23  0     128   -101    -96     124
5180  -18  0     128   -101    -96     124
5300  -12  0     128   -102    -97     124
5500  -9   0     128   -101    -97     125
5640  -17  0     128   -101    -98     124
5785  -25  0     128   -101    -98     124
5940  -33  0     128   -106    -99     124
Signed-off-by: Wojciech Dubowik <Wojciech.Dubowik@neratec.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-01-25 | ath9k: Use calibrated noise floor value when available | Wojciech Dubowik
The AR9003 series allows the noise floor to be calibrated for different frequency bins. Once this is done, it is possible to get more accurate RSSI/signal values over the whole frequency band at a given temperature. The RSSI/signal accuracy reported by calibrated RF cards improves from 6 dB to as good as 2 dB. This could be interesting for applications which require good signal accuracy, such as roaming or mesh protocols. Signed-off-by: Wojciech Dubowik <Wojciech.Dubowik@neratec.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>