summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2018-01-25bpf: Add BPF_SOCK_OPS_RETRANS_CBLawrence Brakmo
Adds support for calling sock_ops BPF program when there is a retransmission. Three arguments are used; one for the sequence number, another for the number of segments retransmitted, and the last one for the return value of tcp_transmit_skb (0 => success). Does not include syn-ack retransmissions. New op: BPF_SOCK_OPS_RETRANS_CB. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25bpf: Add support for reading sk_state and moreLawrence Brakmo
Add support for reading many more tcp_sock fields state, same as sk->sk_state rtt_min same as sk->rtt_min.s[0].v (current rtt_min) snd_ssthresh rcv_nxt snd_nxt snd_una mss_cache ecn_flags rate_delivered rate_interval_us packets_out retrans_out total_retrans segs_in data_segs_in segs_out data_segs_out lost_out sacked_out sk_txhash bytes_received (__u64) bytes_acked (__u64) Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25bpf: Add sock_ops RTO callbackLawrence Brakmo
Adds an optional call to sock_ops BPF program based on whether the BPF_SOCK_OPS_RTO_CB_FLAG is set in bpf_sock_ops_flags. The BPF program is passed 2 arguments: icsk_retransmits and whether the RTO has expired. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25bpf: Adds field bpf_sock_ops_cb_flags to tcp_sockLawrence Brakmo
Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary use is to determine if there should be calls to sock_ops bpf program at various points in the TCP code. The field is initialized to zero, disabling the calls. A sock_ops BPF program can set it, per connection and as necessary, when the connection is established. It also adds support for reading and writting the field within a sock_ops BPF program. Reading is done by accessing the field directly. However, writing is done through the helper function bpf_sock_ops_cb_flags_set, in order to return an error if a BPF program is trying to set a callback that is not supported in the current kernel (i.e. running an older kernel). The helper function returns 0 if it was able to set all of the bits set in the argument, a positive number containing the bits that could not be set, or -EINVAL if the socket is not a full TCP socket. Examples of where one could call the bpf program: 1) When RTO fires 2) When a packet is retransmitted 3) When the connection terminates 4) When a packet is sent 5) When a packet is received Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25bpf: Support passing args to sock_ops bpf functionLawrence Brakmo
Adds support for passing up to 4 arguments to sock_ops bpf functions. It reusues the reply union, so the bpf_sock_ops structures are not increased in size. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25bpf: Add write access to tcp_sock and sock fieldsLawrence Brakmo
This patch adds a macro, SOCK_OPS_SET_FIELD, for writing to struct tcp_sock or struct sock fields. This required adding a new field "temp" to struct bpf_sock_ops_kern for temporary storage that is used by sock_ops_convert_ctx_access. It is used to store and recover the contents of a register, so the register can be used to store the address of the sk. Since we cannot overwrite the dst_reg because it contains the pointer to ctx, nor the src_reg since it contains the value we want to store, we need an extra register to contain the address of the sk. Also adds the macro SOCK_OPS_GET_OR_SET_FIELD that calls one of the GET or SET macros depending on the value of the TYPE field. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25fs/buffer.c: fold init_buffer() into init_page_buffers()Eric Biggers
Since commit e76004093db1 ("fs/buffer.c: remove unnecessary init operation after allocating buffer_head"), there are no callers of init_buffer() outside of init_page_buffers(). So just fold it into init_page_buffers(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-25fs: fold __inode_permission() into inode_permission()Eric Biggers
Since commit 9c630ebefeee ("ovl: simplify permission checking"), overlayfs doesn't call __inode_permission() anymore, which leaves no users other than inode_permission(). So just fold it back into inode_permission(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-25fs: add RWF_APPENDJürg Billeter
This is the per-I/O equivalent of O_APPEND to support atomic append operations on any open file. If a file is opened with O_APPEND, pwrite() ignores the offset and always appends data to the end of the file. RWF_APPEND enables atomic append and pwrite() with offset on a single file descriptor. Signed-off-by: Jürg Billeter <j@bitron.ch> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-25f2fs: support inode creation timeChao Yu
This patch adds creation time field in inode layout to support showing kstat.btime in ->statx. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2018-01-25 Here's one last bluetooth-next pull request for the 4.16 kernel: - Improved support for Intel controllers - New set_parity method to serdev (agreed with maintainers to be taken through bluetooth-next) - Fix error path in hci_bcm (missing call to serdev close) - New ID for BCM4343A0 UART controller Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net: don't call update_pmtu unconditionallyNicolas Dichtel
Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to: "BUG: unable to handle kernel NULL pointer dereference at (null)" Let's add a helper to check if update_pmtu is available before calling it. Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") CC: Roman Kapl <code@rkapl.cz> CC: Xin Long <lucien.xin@gmail.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net: tcp: close sock if net namespace is exitingDan Streetman
When a tcp socket is closed, if it detects that its net namespace is exiting, close immediately and do not wait for FIN sequence. For normal sockets, a reference is taken to their net namespace, so it will never exit while the socket is open. However, kernel sockets do not take a reference to their net namespace, so it may begin exiting while the kernel socket is still open. In this case if the kernel socket is a tcp socket, it will stay open trying to complete its close sequence. The sock's dst(s) hold a reference to their interface, which are all transferred to the namespace's loopback interface when the real interfaces are taken down. When the namespace tries to take down its loopback interface, it hangs waiting for all references to the loopback interface to release, which results in messages like: unregister_netdevice: waiting for lo to become free. Usage count = 1 These messages continue until the socket finally times out and closes. Since the net namespace cleanup holds the net_mutex while calling its registered pernet callbacks, any new net namespace initialization is blocked until the current net namespace finishes exiting. After this change, the tcp socket notices the exiting net namespace, and closes immediately, releasing its dst(s) and their reference to the loopback interface, which lets the net namespace continue exiting. Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811 Signed-off-by: Dan Streetman <ddstreet@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25PCI: Make of_irq_parse_pci() staticRob Herring
Now that the DT PCI code is merged into drivers/pci, of_irq_parse_pci() can be static. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Frank Rowand <frowand.list@gmail.com>
2018-01-26crypto: sha3-generic - export init/update/final routinesArd Biesheuvel
To allow accelerated implementations to fall back to the generic routines, e.g., in contexts where a SIMD based implementation is not allowed to run, expose the generic SHA3 init/update/final routines to other modules. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-26crypto: sha3-generic - simplify codeArd Biesheuvel
In preparation of exposing the generic SHA3 implementation to other versions as a fallback, simplify the code, and remove an inconsistency in the output handling (endian swabbing rsizw words of state before writing the output does not make sense) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-25device property: Define type of PROPERTY_ENRTY_*() macrosAndy Shevchenko
Some of the drivers may use the macro at runtime flow, like struct property_entry p[10]; ... p[index++] = PROPERTY_ENTRY_U8("u8 property", u8_data); In that case and absence of the data type compiler fails the build: drivers/char/ipmi/ipmi_dmi.c:79:29: error: Expected ; at end of statement drivers/char/ipmi/ipmi_dmi.c:79:29: error: got { Acked-by: Corey Minyard <cminyard@mvista.com> Cc: Corey Minyard <minyard@acm.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-24Merge branch 'rebased-net-ioctl' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Avoid negative netdev refcount in error flow of xfrm state add, from Aviad Yehezkel. 2) Fix tcpdump decoding of IPSEC decap'd frames by filling in the ethernet header protocol field in xfrm{4,6}_mode_tunnel_input(). From Yossi Kuperman. 3) Fix a syzbot triggered skb_under_panic in pppoe having to do with failing to allocate an appropriate amount of headroom. From Guillaume Nault. 4) Fix memory leak in vmxnet3 driver, from Neil Horman. 5) Cure out-of-bounds packet memory access in em_nbyte EMATCH module, from Wolfgang Bumiller. 6) Restrict what kinds of sockets can be bound to the KCM multiplexer and also disallow when another layer has attached to the socket and made use of sk_user_data. From Tom Herbert. 7) Fix use before init of IOTLB in vhost code, from Jason Wang. 8) Correct STACR register write bit definition in IBM emac driver, from Ivan Mikhaylov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net/ibm/emac: wrong bit is used for STA control register write net/ibm/emac: add 8192 rx/tx fifo size vhost: do not try to access device IOTLB when not initialized vhost: use mutex_lock_nested() in vhost_dev_lock_vqs() i40e: flower: check if TC offload is enabled on a netdev qed: Free reserved MR tid qed: Remove reserveration of dpi for kernel kcm: Check if sk_user_data already set in kcm_attach kcm: Only allow TCP sockets to be attached to a KCM mux net: sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr net: sched: em_nbyte: don't add the data offset twice mlxsw: spectrum_router: Don't log an error on missing neighbor vmxnet3: repair memory leak ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL pppoe: take ->needed_headroom of lower device into account on xmit xfrm: fix boolean assignment in xfrm_get_type_offload xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version xfrm: fix error flow in case of add state fails xfrm: Add SA to hardware at the end of xfrm_state_construct()
2018-01-24kill kernel_sock_ioctl()Al Viro
no users since 2014 Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24dev_ioctl(): move copyin/copyout to callersAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24lift handling of SIOCIW... out of dev_ioctl()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24ip_rt_ioctl(): take copyin to callerAl Viro
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24devinet_ioctl(): take copyin/copyout to callerAl Viro
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24net: separate SIOCGIFCONF handling from dev_ioctl()Al Viro
Only two of dev_ioctl() callers may pass SIOCGIFCONF to it. Separating that codepath from the rest of dev_ioctl() allows both to simplify dev_ioctl() itself (all other cases work with struct ifreq *) *and* seriously simplify the compat side of that beast: all it takes is passing to inet_gifconf() an extra argument - the size of individual records (sizeof(struct ifreq) or sizeof(struct compat_ifreq)). With dev_ifconf() called directly from sock_do_ioctl()/compat_dev_ifconf() that's easy to arrange. As the result, compat side of SIOCGIFCONF doesn't need any allocations, copy_in_user() back and forth, etc. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24net: erspan: fix use-after-freeWilliam Tu
When building the erspan header for either v1 or v2, the eth_hdr() does not point to the right inner packet's eth_hdr, causing kasan report use-after-free and slab-out-of-bouds read. The patch fixes the following syzkaller issues: [1] BUG: KASAN: slab-out-of-bounds in erspan_xmit+0x22d4/0x2430 net/ipv4/ip_gre.c:735 [2] BUG: KASAN: slab-out-of-bounds in erspan_build_header+0x3bf/0x3d0 net/ipv4/ip_gre.c:698 [3] BUG: KASAN: use-after-free in erspan_xmit+0x22d4/0x2430 net/ipv4/ip_gre.c:735 [4] BUG: KASAN: use-after-free in erspan_build_header+0x3bf/0x3d0 net/ipv4/ip_gre.c:698 [2] CPU: 0 PID: 3654 Comm: syzkaller377964 Not tainted 4.15.0-rc9+ #185 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:440 erspan_build_header+0x3bf/0x3d0 net/ipv4/ip_gre.c:698 erspan_xmit+0x3b8/0x13b0 net/ipv4/ip_gre.c:740 __netdev_start_xmit include/linux/netdevice.h:4042 [inline] netdev_start_xmit include/linux/netdevice.h:4051 [inline] packet_direct_xmit+0x315/0x6b0 net/packet/af_packet.c:266 packet_snd net/packet/af_packet.c:2943 [inline] packet_sendmsg+0x3aed/0x60b0 net/packet/af_packet.c:2968 sock_sendmsg_nosec net/socket.c:638 [inline] sock_sendmsg+0xca/0x110 net/socket.c:648 SYSC_sendto+0x361/0x5c0 net/socket.c:1729 SyS_sendto+0x40/0x50 net/socket.c:1697 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:129 RIP: 0023:0xf7fcfc79 RSP: 002b:00000000ffc6976c EFLAGS: 00000286 ORIG_RAX: 0000000000000171 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000020011000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020008000 RBP: 000000000000001c R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Fixes: f551c91de262 ("net: erspan: introduce erspan v2 for ip_gre") Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Reported-by: syzbot+9723f2d288e49b492cf0@syzkaller.appspotmail.com Reported-by: syzbot+f0ddeb2b032a8e1d9098@syzkaller.appspotmail.com Reported-by: syzbot+f14b3703cd8d7670203f@syzkaller.appspotmail.com Reported-by: syzbot+eefa384efad8d7997f20@syzkaller.appspotmail.com Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net: sched: remove tc_cls_common_offload_init_deprecated()Jakub Kicinski
All users are now converted to tc_cls_common_offload_init(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24cls_bpf: remove gen_flags from bpf_offloadJakub Kicinski
cls_bpf now guarantees that only device-bound programs are allowed with skip_sw. The drivers no longer pay attention to flags on filter load, therefore the bpf_offload member can be removed. If flags are needed again they should probably be added to struct tc_cls_common_offload instead. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net: sched: prepare for reimplementation of tc_cls_common_offload_init()Jakub Kicinski
Rename the tc_cls_common_offload_init() helper function to tc_cls_common_offload_init_deprecated() and add a new implementation which also takes flags argument. We will only set extack if flags indicate that offload is forced (skip_sw) otherwise driver errors should be ignored, as they don't influence the overall filter installation. Note that we need the tc_skip_hw() helper for new version, therefore it is added later in the file. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net: sched: propagate extack to cls->destroy callbacksJakub Kicinski
Propagate extack to cls->destroy callbacks when called from non-error paths. On error paths pass NULL to avoid overwriting the failure message. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net: sched: fix TCF_LAYER_LINK case in tcf_get_base_ptrWolfgang Bumiller
TCF_LAYER_LINK and TCF_LAYER_NETWORK returned the same pointer as skb->data points to the network header. Use skb_mac_header instead. Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24Merge tag 'trace-v4.15-rc9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "With the new ORC unwinder, ftrace stack tracing became disfunctional. One was that ORC didn't know how to handle the ftrace callbacks in general (which Josh fixed). The other was that ORC would just bail if it hit a dynamically allocated trampoline. Which means all ftrace stack tracing that happens from the function tracer would produce no results (that includes killing the max stack size tracer). I added a check to the ORC unwinder to see if the trampoline belonged to ftrace, and if it did, use the orc entry of the static trampoline that was used to create the dynamic one (it would be identical). Finally, I noticed that the skip values of the stack tracing were out of whack. I went through and fixed them up" * tag 'trace-v4.15-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Update stack trace skipping for ORC unwinder ftrace, orc, x86: Handle ftrace dynamically allocated trampolines x86/ftrace: Fix ORC unwinding from ftrace handlers
2018-01-24Revert "module: Add retpoline tag to VERMAGIC"Greg Kroah-Hartman
This reverts commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12. Turns out distros do not want to make retpoline as part of their "ABI", so this patch should not have been merged. Sorry Andi, this was my fault, I suggested it when your original patch was the "correct" way of doing this instead. Reported-by: Jiri Kosina <jikos@kernel.org> Fixes: 6cfb521ac0d5 ("module: Add retpoline tag to VERMAGIC") Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: rusty@rustcorp.com.au Cc: arjan.van.de.ven@intel.com Cc: jeyu@kernel.org Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-24ASoC: soc-pcm: rename .pmdown_time to .use_pmdown_time for ComponentKuninori Morimoto
commit fbb16563c6c2 ("ASoC: snd_soc_component_driver has pmdown_time") added new .pmdown_time which is for inverted version of current .ignore_pmdown_time But it is confusable name. Let's rename it to .use_pmdown_time Reported-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-01-24vfs: factor out helpers d_instantiate_anon() and d_alloc_anon()Miklos Szeredi
Those helpers are going to be used by overlayfs to implement NFS export decode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24sched/core: Fix cpu.max vs. cpuhotplug deadlockPeter Zijlstra
Tejun reported the following cpu-hotplug lock (percpu-rwsem) read recursion: tg_set_cfs_bandwidth() get_online_cpus() cpus_read_lock() cfs_bandwidth_usage_inc() static_key_slow_inc() cpus_read_lock() Reported-by: Tejun Heo <tj@kernel.org> Tested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180122215328.GP3397@worktop Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-23ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABELBen Hutchings
Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl setting") removed the initialisation of ipv6_pinfo::autoflowlabel and added a second flag to indicate whether this field or the net namespace default should be used. The getsockopt() handling for this case was not updated, so it currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is not explicitly enabled. Fix it to return the effective value, whether that has been set at the socket or net namespace level. Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23net/sched: act_csum: don't use spinlock in the fast pathDavide Caratti
use RCU instead of spin_{,unlock}_bh() to protect concurrent read/write on act_csum configuration, to reduce the effects of contention in the data path when multiple readers are present. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24ocxl: Add a kernel API for other opencapi driversFrederic Barrat
Some of the functions done by the generic driver should also be needed by other opencapi drivers: attaching a context to an adapter, translation fault handling, AFU interrupt allocation... So to avoid code duplication, the driver provides a kernel API that other drivers can use, similar to calling a in-kernel library. It is still a bit theoretical, for lack of real hardware, and will likely need adjustements down the road. But we used the cxlflash driver as a guinea pig. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24ocxl: Add AFU interrupt supportFrederic Barrat
Add user APIs through ioctl to allocate, free, and be notified of an AFU interrupt. For opencapi, an AFU can trigger an interrupt on the host by sending a specific command targeting a 64-bit object handle. On POWER9, this is implemented by mapping a special page in the address space of a process and a write to that page will trigger an interrupt. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24ocxl: Driver code for 'generic' opencapi devicesFrederic Barrat
Add an ocxl driver to handle generic opencapi devices. Of course, it's not meant to be the only opencapi driver, any device is free to implement its own. But if a host application only needs basic services like attaching to an opencapi adapter, have translation faults handled or allocate AFU interrupts, it should suffice. The AFU config space must follow the opencapi specification and use the expected vendor/device ID to be seen by the generic driver. The driver exposes the device AFUs as a char device in /dev/ocxl/ Note that the driver currently doesn't handle memory attached to the opencapi device. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24powerpc/powernv: Capture actag information for the deviceFrederic Barrat
In the opencapi protocol, host memory contexts are referenced by a 'actag'. During setup, a driver must tell the device how many actags it can used, and what values are acceptable. On POWER9, the NPU can handle 64 actags per link, so they must be shared between all the PCI functions of the link. To get a global picture of how many actags are used by each AFU of every function, we capture some data at the end of PCI enumeration, so that actags can be shared fairly if needed. This is not powernv specific per say, but rather a consequence of the opencapi configuration specification being quite general. The number of available actags on POWER9 makes it more likely to be hit. This is somewhat mitigated by the fact that existing AFUs are coded by requesting a reasonable count of actags and existing devices carry only one AFU. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-23ftrace, orc, x86: Handle ftrace dynamically allocated trampolinesSteven Rostedt (VMware)
The function tracer can create a dynamically allocated trampoline that is called by the function mcount or fentry hook that is used to call the function callback that is registered. The problem is that the orc undwinder will bail if it encounters one of these trampolines. This breaks the stack trace of function callbacks, which include the stack tracer and setting the stack trace for individual functions. Since these dynamic trampolines are basically copies of the static ftrace trampolines defined in ftrace_*.S, we do not need to create new orc entries for the dynamic trampolines. Finding the return address on the stack will be identical as the functions that were copied to create the dynamic trampolines. When encountering a ftrace dynamic trampoline, we can just use the orc entry of the ftrace static function that was copied for that trampoline. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23PCI: Add pci_enable_atomic_ops_to_root()Jay Cornwall
The Atomic Operations feature (PCIe r4.0, sec 6.15) allows atomic transctions to be requested by, routed through and completed by PCIe components. Routing and completion do not require software support. Component support for each is detectable via the DEVCAP2 register. A Requester may use AtomicOps only if its PCI_EXP_DEVCTL2_ATOMIC_REQ is set. This should be set only if the Completer and all intermediate routing elements support AtomicOps. A concrete example is the AMD Fiji-class GPU (which is capable of making AtomicOp requests), below a PLX 8747 switch (advertising AtomicOp routing) with a Haswell host bridge (advertising AtomicOp completion support). Add pci_enable_atomic_ops_to_root() for per-device control over AtomicOp requests. This checks to be sure the Root Port supports completion of the desired AtomicOp sizes and the path to the Root Port supports routing the AtomicOps. Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> [bhelgaas: changelog, comments, whitespace] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-01-23Merge tag 'nfs-rdma-for-4.16-1' of ↵Trond Myklebust
git://git.linux-nfs.org/projects/anna/linux-nfs NFS-over-RDMA client updates for Linux 4.16 New features: - xprtrdma tracepoints Bugfixes and cleanups: - Fix memory leak if rpcrdma_buffer_create() fails - Fix allocating extra rpcrdma_reps for the backchannel - Remove various unused and redundant variables and lock cycles - Fix IPv6 support in xprt_rdma_set_port() - Fix memory leak by calling buf_free for callback replies - Fix "bytes registered" accounting - Fix kernel-doc comments - SUNRPC tracepoint cleanups for consistent information - Optimizations for __rpc_execute()
2018-01-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in 'net'. The esp{4,6}_offload.c conflicts were overlapping changes. The 'out' label is removed so we just return ERR_PTR(-EINVAL) directly. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23SUNRPC: Fix null rpc_clnt dereference in rpc_task_queued tracepointBenjamin Coddington
Backchannel tasks will not have a reference to the rpc_clnt. Return -1 for cl_clid in that case. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trondmy@gmail.com>
2018-01-23mm/memory_failure: Remove unused trapno from memory_failureEric W. Biederman
Today 4 architectures set ARCH_SUPPORTS_MEMORY_FAILURE (arm64, parisc, powerpc, and x86), while 4 other architectures set __ARCH_SI_TRAPNO (alpha, metag, sparc, and tile). These two sets of architectures do not interesect so remove the trapno paramater to remove confusion. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-23IB/srp: Add RDMA/CM supportBart Van Assche
Since the SRP_LOGIN_REQ defined in the SRP standard is larger than what fits in the RDMA/CM login request private data, introduce a new login request format for the RDMA/CM. Note: since srp_daemon and ibsrpdm rely on the subnet manager and since there is no equivalent of the IB subnet manager in non-IB networks, login has to be performed manually for non-IB networks. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Doug Ledford <dledford@redhat.com>