summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2018-02-21seccomp, ptrace: switch get_metadata types to arch independentTycho Andersen
Commit 26500475ac1b ("ptrace, seccomp: add support for retrieving seccomp metadata") introduced `struct seccomp_metadata`, which contained unsigned longs that should be arch independent. The type of the flags member was chosen to match the corresponding argument to seccomp(), and so we need something at least as big as unsigned long. My understanding is that __u64 should fit the bill, so let's switch both types to that. While this is userspace facing, it was only introduced in 4.16-rc2, and so should be safe assuming it goes in before then. Reported-by: "Dmitry V. Levin" <ldv@altlinux.org> Signed-off-by: Tycho Andersen <tycho@tycho.ws> CC: Kees Cook <keescook@chromium.org> CC: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Dmitry V. Levin" <ldv@altlinux.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-28ptrace, seccomp: add support for retrieving seccomp metadataTycho Andersen
With the new SECCOMP_FILTER_FLAG_LOG, we need to be able to extract these flags for checkpoint restore, since they describe the state of a filter. So, let's add PTRACE_SECCOMP_GET_METADATA, similar to ..._GET_FILTER, which returns the metadata of the nth filter (right now, just the flags). Hopefully this will be future proof, and new per-filter metadata can be added to this struct. Signed-off-by: Tycho Andersen <tycho@docker.com> CC: Kees Cook <keescook@chromium.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-26Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Glexiner: - unbreak the irq trigger type check for legacy platforms - a handful fixes for ARM GIC v3/4 interrupt controllers - a few trivial fixes all over the place * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/matrix: Make - vs ?: Precedence explicit irqchip/imgpdc: Use resource_size function on resource object irqchip/qcom: Fix u32 comparison with value less than zero irqchip/exiu: Fix return value check in exiu_init() irqchip/gic-v3-its: Remove artificial dependency on PCI irqchip/gic-v4: Add forward definition of struct irq_domain_ops irqchip/gic-v3: pr_err() strings should end with newlines irqchip/s3c24xx: pr_err() strings should end with newlines irqchip/gic-v3: Fix ppi-partitions lookup irqchip/gic-v4: Clear IRQ_DISABLE_UNLAZY again if mapping fails genirq: Track whether the trigger type has been set
2017-11-26Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - topology enumeration fixes - KASAN fix - two entry fixes (not yet the big series related to KASLR) - remove obsolete code - instruction decoder fix - better /dev/mem sanity checks, hopefully working better this time - pkeys fixes - two ACPI fixes - 5-level paging related fixes - UMIP fixes that should make application visible faults more debuggable - boot fix for weird virtualization environment * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/decoder: Add new TEST instruction pattern x86/PCI: Remove unused HyperTransport interrupt support x86/umip: Fix insn_get_code_seg_params()'s return value x86/boot/KASLR: Remove unused variable x86/entry/64: Add missing irqflags tracing to native_load_gs_index() x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing x86/pkeys/selftests: Fix protection keys write() warning x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey' x86/mpx/selftests: Fix up weird arrays x86/pkeys: Update documentation about availability x86/umip: Print a warning into the syslog if UMIP-protected instructions are used x86/smpboot: Fix __max_logical_packages estimate x86/topology: Avoid wasting 128k for package id array perf/x86/intel/uncore: Cache logical pkg id in uncore driver x86/acpi: Reduce code duplication in mp_override_legacy_irq() x86/acpi: Handle SCI interrupts above legacy space gracefully x86/boot: Fix boot failure when SMP MP-table is based at 0 x86/mm: Limit mmap() of /dev/mem to valid physical addresses x86/selftests: Add test for mapping placement for 5-level paging ...
2017-11-26Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes: a documentation fix, a Sparse warning fix and a debugging fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/debug: Fix task state recording/printout sched/deadline: Don't use dubious signed bitfields sched/deadline: Fix the description of runtime accounting in the documentation
2017-11-26Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: "A handful of objtool fixes, most of them related to making the UAPI header-syncing warnings easier to read and easier to act upon" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/headers: Sync objtool UAPI header objtool: Fix cross-build objtool: Move kernel headers/code sync check to a script objtool: Move synced files to their original relative locations objtool: Make unreachable annotation inline asms explicitly volatile objtool: Add a comment for the unreachable annotation macros
2017-11-25Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: - The final conversion of timer wheel timers to timer_setup(). A few manual conversions and a large coccinelle assisted sweep and the removal of the old initialization mechanisms and the related code. - Remove the now unused VSYSCALL update code - Fix permissions of /proc/timer_list. I still need to get rid of that file completely - Rename a misnomed clocksource function and remove a stale declaration * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) m68k/macboing: Fix missed timer callback assignment treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts timer: Remove redundant __setup_timer*() macros timer: Pass function down to initialization routines timer: Remove unused data arguments from macros timer: Switch callback prototype to take struct timer_list * argument timer: Pass timer_list pointer to callbacks unconditionally Coccinelle: Remove setup_timer.cocci timer: Remove setup_*timer() interface timer: Remove init_timer() interface treewide: setup_timer() -> timer_setup() (2 field) treewide: setup_timer() -> timer_setup() treewide: init_timer() -> setup_timer() treewide: Switch DEFINE_TIMER callbacks to struct timer_list * s390: cmm: Convert timers to use timer_setup() lightnvm: Convert timers to use timer_setup() drivers/net: cris: Convert timers to use timer_setup() drm/vc4: Convert timers to use timer_setup() block/laptop_mode: Convert timers to use timer_setup() net/atm/mpc: Avoid open-coded assignment of timer callback function ...
2017-11-24Merge tag 'kvm-4.15-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Radim Krčmář: "Trimmed second batch of KVM changes for Linux 4.15: - GICv4 Support for KVM/ARM - re-introduce support for CPUs without virtual NMI (cc stable) and allow testing of KVM without virtual NMI on available CPUs - fix long-standing performance issues with assigned devices on AMD (cc stable)" * tag 'kvm-4.15-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits) kvm: vmx: Allow disabling virtual NMI support kvm: vmx: Reinstate support for CPUs without virtual NMI KVM: SVM: obey guest PAT KVM: arm/arm64: Don't queue VLPIs on INV/INVALL KVM: arm/arm64: Fix GICv4 ITS initialization issues KVM: arm/arm64: GICv4: Theory of operations KVM: arm/arm64: GICv4: Enable VLPI support KVM: arm/arm64: GICv4: Prevent userspace from changing doorbell affinity KVM: arm/arm64: GICv4: Prevent a VM using GICv4 from being saved KVM: arm/arm64: GICv4: Enable virtual cpuif if VLPIs can be delivered KVM: arm/arm64: GICv4: Hook vPE scheduling into vgic flush/sync KVM: arm/arm64: GICv4: Use the doorbell interrupt as an unblocking source KVM: arm/arm64: GICv4: Add doorbell interrupt handling KVM: arm/arm64: GICv4: Use pending_last as a scheduling hint KVM: arm/arm64: GICv4: Handle INVALL applied to a vPE KVM: arm/arm64: GICv4: Propagate property updates to VLPIs KVM: arm/arm64: GICv4: Handle MOVALL applied to a vPE KVM: arm/arm64: GICv4: Handle CLEAR applied to a VLPI KVM: arm/arm64: GICv4: Propagate affinity changes to the physical ITS KVM: arm/arm64: GICv4: Unmap VLPI when freeing an LPI ...
2017-11-24Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "This series is predominantly bug-fixes, with a few small improvements that have been outstanding over the last release cycle. As usual, the associated bug-fixes have CC' tags for stable. Also, things have been particularly quiet wrt new developments the last months, with most folks continuing to focus on stability atop 4.x stable kernels for their respective production configurations. Also at this point, the stable trees have been synced up with mainline. This will continue to be a priority, as production users tend to run exclusively atop stable kernels, a few releases behind mainline. The highlights include: - Fix PR PREEMPT_AND_ABORT null pointer dereference regression in v4.11+ (tangwenji) - Fix OOPs during removing TCMU device (Xiubo Li + Zhang Zhuoyu) - Add netlink command reply supported option for each device (Kenjiro Nakayama) - cxgbit: Abort the TCP connection in case of data out timeout (Varun Prakash) - Fix PR/ALUA file path truncation (David Disseldorp) - Fix double se_cmd completion during ->cmd_time_out (Mike Christie) - Fix QUEUE_FULL + SCSI task attribute handling in 4.1+ (Bryant Ly + nab) - Fix quiese during transport_write_pending_qf endless loop (nab) - Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK in 3.14+ (Don White + nab)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (35 commits) tcmu: Add a missing unlock on an error path tcmu: Fix some memory corruption iscsi-target: Fix non-immediate TMR reference leak iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK target: Fix quiese during transport_write_pending_qf endless loop target: Fix caw_sem leak in transport_generic_request_failure target: Fix QUEUE_FULL + SCSI task attribute handling iSCSI-target: Use common error handling code in iscsi_decode_text_input() target/iscsi: Detect conn_cmd_list corruption early target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd() target/iscsi: Modify iscsit_do_crypto_hash_buf() prototype target/iscsi: Fix endianness in an error message target/iscsi: Use min() in iscsit_dump_data_payload() instead of open-coding it target/iscsi: Define OFFLOAD_BUF_SIZE once target: Inline transport_put_cmd() target: Suppress gcc 7 fallthrough warnings target: Move a declaration of a global variable into a header file tcmu: fix double se_cmd completion target: return SAM_STAT_TASK_SET_FULL for TCM_OUT_OF_RESOURCES ...
2017-11-24sched/debug: Fix task state recording/printoutThomas Gleixner
The recent conversion of the task state recording to use task_state_index() broke the sched_switch tracepoint task state output. task_state_index() returns surprisingly an index (0-7) which is then printed with __print_flags() applying bitmasks. Not really working and resulting in weird states like 'prev_state=t' instead of 'prev_state=I'. Use TASK_REPORT_MAX instead of TASK_STATE_MAX to report preemption. Build a bitmask from the return value of task_state_index() and store it in entry->prev_state, which makes __print_flags() work as expected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@vger.kernel.org Fixes: efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1711221304180.1751@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix PCI IDs of 9000 series iwlwifi devices, from Luca Coelho. 2) bpf offload bug fixes from Jakub Kicinski. 3) Fix bpf verifier to NOP out code which is dead at run time because due to branch pruning the verifier will not explore such instructions. From Alexei Starovoitov. 4) Fix crash when deleting secondary chains in packet scheduler classifier. From Roman Kapl. 5) Fix buffer management bugs in smc, from Ursula Braun. 6) Fix regression in anycast route handling, from David Ahern. 7) Fix link settings regression in r8169, from Tobias Jakobi. 8) Add back enough UFO support so that live migration still works, from Willem de Bruijn. 9) Linearize enough packet data for the full extent to which the ipvlan code will inspect the packet headers, from Gao Feng. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) ipvlan: Fix insufficient skb linear check for ipv6 icmp ipvlan: Fix insufficient skb linear check for arp geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6 net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY net: accept UFO datagrams from tuntap and packet net: realtek: r8169: implement set_link_ksettings() net: ipv6: Fixup device for anycast routes during copy net/smc: Fix preinitialization of buf_desc in __smc_buf_create() net/smc: use sk_rcvbuf as start for rmb creation ipv6: Do not consider linkdown nexthops during multipath net: sched: fix crash when deleting secondary chains net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE bpf: fix branch pruning logic bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO bpf: remove explicit handling of 0 for arg2 in bpf_probe_read bpf: introduce ARG_PTR_TO_MEM_OR_NULL i40evf: Use smp_rmb rather than read_barrier_depends fm10k: Use smp_rmb rather than read_barrier_depends igb: Use smp_rmb rather than read_barrier_depends ...
2017-11-23Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two basic fixes: one for the sparse problem with the blacklist flags and another for a hang forever in bnx2i" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: Use 'blist_flags_t' for scsi_devinfo flags scsi: bnx2fc: Fix hung task messages when a cleanup response is not received during abort
2017-11-23Merge tag 'sound-fix-4.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "All commits found here are small fixes for regression or stable: - PCM timestamp behavior fix that could be seen as a regression - Remove spurious WARN_ON() from ALSA timer 32bit compat ioctl - HD-audio HDMI/DP channel mapping fix for 32bit archs - Fix the previous fix for HD-audio initialization code - More hardening USB-audio against malicious USB descriptors - HD-audio quirks/fixes (Realtek codec, AMD controller) - Missing help text for the recent Intel SST kconfig change" * tag 'sound-fix-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda: Add Raven PCI ID ALSA: hda/realtek - Fix ALC700 family no sound issue ALSA: hda - Fix yet remaining issue with vmaster 0dB initialization ALSA: usb-audio: Add sanity checks in v2 clock parsers ALSA: usb-audio: Fix potential zero-division at parsing FU ALSA: usb-audio: Fix potential out-of-bound access at parsing SU ALSA: usb-audio: Add sanity checks to FE parser ALSA: timer: Remove kernel warning at compat ioctl error paths ALSA: pcm: update tstamp only if audio_tstamp changed ALSA: hda/realtek: Add headset mic support for Intel NUC Skull Canyon ALSA: hda: Fix too short HDMI/DP chmap reporting ALSA: usb-audio: uac1: Invalidate ctl on interrupt ALSA: hda/realtek - Fix ALC275 no sound issue ASoC: Intel: Add help text for SND_SOC_INTEL_SST_TOPLEVEL
2017-11-23Merge tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull more drm updates from Dave Airlie: "Fixes/cleanups for rc1, non-desktop flags for VR - remove the MSM dt-bindings file Rob managed to push in the previous pull. - add a property/edid quirk to denote HMD devices, I had these hanging around for a few weeks and Keith had done some work on them, they are fairly self contained and small, and only affect people using HTC Vive VR headsets so far. - amdgpu, tegra, tilcdc, fsl fixes - some imx-drm cleanups I missed, these seemed pretty small, and no reason to hold off. I have one TTM regression fix (fixes bochs-vga in qemu) sitting locally awaiting review I'll probably send that in a separate pull request tomorrow" * tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux: (33 commits) dt-bindings: remove file that was added accidentally drm/edid: quirk HTC vive headset as non-desktop. [v2] drm/fb: add support for not enabling fbcon on non-desktop displays [v2] drm: add connector info/property for non-desktop displays [v2] drm/amdgpu: fix rmmod KCQ disable failed error drm/amdgpu: fix kernel hang when starting VNC server drm/amdgpu: don't skip attributes when powerplay is enabled drm/amd/pp: fix typecast error in powerplay. drm/tilcdc: Remove obsolete "ti,tilcdc,slave" dts binding support drm/tegra: sor: Reimplement pad clock Revert "drm/radeon: dont switch vt on suspend" drm/amd/amdgpu: fix over-bound accessing in amdgpu_cs_wait_any_fence drm/amd/powerplay: fix unfreeze level smc message for smu7 drm/amdgpu:fix memleak drm/amdgpu:fix memleak in takedown drm/amd/pp: fix dpm randomly failed on Vega10 drm/amdgpu: set f_mapping on exported DMA-bufs drm/amdgpu: Properly allocate VM invalidate eng v2 drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume() drm/fsl-dcu: avoid disabling pixel clock twice on suspend ...
2017-11-24Merge tag 'keys-next-20171123' of ↵James Morris
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next-keys Merge keys subsystem changes from David Howells, for v4.15.
2017-11-23x86/PCI: Remove unused HyperTransport interrupt supportBjorn Helgaas
There are no in-tree callers of ht_create_irq(), the driver interface for HyperTransport interrupts, left. Remove the unused entry point and all the supporting code. See 8b955b0dddb3 ("[PATCH] Initial generic hypertransport interrupt support"). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-pci@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
2017-11-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2017-11-23 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Several BPF offloading fixes, from Jakub. Among others: - Limit offload to cls_bpf and XDP program types only. - Move device validation into the driver and don't make any assumptions about the device in the classifier due to shared blocks semantics. - Don't pass offloaded XDP program into the driver when it should be run in native XDP instead. Offloaded ones are not JITed for the host in such cases. - Don't destroy device offload state when moved to another namespace. - Revert dumping offload info into user space for now, since ifindex alone is not sufficient. This will be redone properly for bpf-next tree. 2) Fix test_verifier to avoid using bpf_probe_write_user() helper in test cases, since it's dumping a warning into kernel log which may confuse users when only running tests. Switch to use bpf_trace_printk() instead, from Yonghong. 3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics before it becomes uabi, from Gianluca. More specifically: - Add a type ARG_PTR_TO_MEM_OR_NULL that is used only by bpf_csum_diff(), where the argument is either a valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO then enforces a valid pointer in case of non-0 size or a valid pointer or NULL in case of size 0. Given that, the semantics for ARG_PTR_TO_MEM in combination with ARG_CONST_SIZE_OR_ZERO are now such that in case of size 0, the pointer must always be valid and cannot be NULL. This fix in semantics allows for bpf_probe_read() to drop the recently added size == 0 check in the helper that would become part of uabi otherwise once released. At the same time we can then fix bpf_probe_read_str() and bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO instead of ARG_CONST_SIZE in order to fix recently reported issues by Arnaldo et al, where LLVM optimizes two boundary checks into a single one for unknown variables where the verifier looses track of the variable bounds and thus rejects valid programs otherwise. 4) A fix for the verifier for the case when it detects comparison of two constants where the branch is guaranteed to not be taken at runtime. Verifier will rightfully prune the exploration of such paths, but we still pass the program to JITs, where they would complain about using reserved fields, etc. Track such dead instructions and sanitize them with mov r0,r0. Rejection is not possible since LLVM may generate them for valid C code and doesn't do as much data flow analysis as verifier. For bpf-next we might implement removal of such dead code and adjust branches instead. Fix from Alexei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24net: accept UFO datagrams from tuntap and packetWillem de Bruijn
Tuntap and similar devices can inject GSO packets. Accept type VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively. Processes are expected to use feature negotiation such as TUNSETOFFLOAD to detect supported offload types and refrain from injecting other packets. This process breaks down with live migration: guest kernels do not renegotiate flags, so destination hosts need to expose all features that the source host does. Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677. This patch introduces nearly(*) no new code to simplify verification. It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP insertion and software UFO segmentation. It does not reinstate protocol stack support, hardware offload (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception of VIRTIO_NET_HDR_GSO_UDP packets in tuntap. To support SKB_GSO_UDP reappearing in the stack, also reinstate logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD by squashing in commit 939912216fa8 ("net: skb_needs_check() removes CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1 ("net: avoid skb_warn_bad_offload false positives on UFO"). (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id, ipv6_proxy_select_ident is changed to return a __be32 and this is assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted at the end of the enum to minimize code churn. Tested Booted a v4.13 guest kernel with QEMU. On a host kernel before this patch `ethtool -k eth0` shows UFO disabled. After the patch, it is enabled, same as on a v4.13 host kernel. A UFO packet sent from the guest appears on the tap device: host: nc -l -p -u 8000 & tcpdump -n -i tap0 guest: dd if=/dev/zero of=payload.txt bs=1 count=2000 nc -u 192.16.1.1 8000 < payload.txt Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds, packets arriving fragmented: ./with_tap_pair.sh ./tap_send_ufo tap0 tap1 (from https://github.com/wdebruij/kerneltools/tree/master/tests) Changes v1 -> v2 - simplified set_offload change (review comment) - documented test procedure Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com> Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.") Reported-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-23Merge tag 'for-linus-timers-conversion-final-v4.15-rc1' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux into timers/urgent Pull the last batch of manual timer conversions from Kees Cook: - final batch of "non trivial" timer conversions (multi-tree dependencies, things Coccinelle couldn't handle, etc). - treewide conversions via Coccinelle, in 4 steps: - DEFINE_TIMER() functions converted to struct timer_list * argument - init_timer() -> setup_timer() - setup_timer() -> timer_setup() - setup_timer() -> timer_setup() (with a single embedded structure) - deprecated timer API removals (init_timer(), setup_*timer()) - finalization of new API (remove global casts)
2017-11-23bpf: fix branch pruning logicAlexei Starovoitov
when the verifier detects that register contains a runtime constant and it's compared with another constant it will prune exploration of the branch that is guaranteed not to be taken at runtime. This is all correct, but malicious program may be constructed in such a way that it always has a constant comparison and the other branch is never taken under any conditions. In this case such path through the program will not be explored by the verifier. It won't be taken at run-time either, but since all instructions are JITed the malicious program may cause JITs to complain about using reserved fields, etc. To fix the issue we have to track the instructions explored by the verifier and sanitize instructions that are dead at run time with NOPs. We cannot reject such dead code, since llvm generates it for valid C code, since it doesn't do as much data flow analysis as the verifier does. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-22Merge tag 'for-linus-20171120' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD updates from Richard Weinberger: "General changes: - Unconfuse get_unmapped_area and point/unpoint driver methods - New partition parser: sharpslpart - Kill GENERIC_IO - Various fixes NAND changes: - Add a flag to mark NANDs that require 3 address cycles to encode a page address - Set a default ECC/free layout when NAND_ECC_NONE is requested - Fix a bug in panic_nand_write() - Another batch of cleanups for the denali driver - Fix PM support in the atmel driver - Remove support for platform data in the omap driver - Fix subpage write in the omap driver - Fix irq handling in the mtk driver - Change link order of mtk_ecc and mtk_nand drivers to speed up boot time - Change log level of ECC error messages in the mxc driver - Patch the pxa3xx driver to support Armada 8k platforms - Add BAM DMA support to the qcom driver - Convert gpio-nand to the GPIO desc API - Fix ECC handling in the mt29f driver SPI-NOR changes: - Introduce system power management support - New mechanism to select the proper .quad_enable() hook by JEDEC ID, when needed, instead of only by manufacturer ID - Add support to new memory parts from Gigadevice, Winbond, Macronix and Everspin - Maintainance for Cadence, Intel, Mediatek and STM32 drivers" * tag 'for-linus-20171120' of git://git.infradead.org/linux-mtd: (85 commits) mtd: Avoid probe failures when mtd->dbg.dfs_dir is invalid mtd: sharpslpart: Add sharpslpart partition parser mtd: Add sanity checks in mtd_write/read_oob() mtd: remove the get_unmapped_area method mtd: implement mtd_get_unmapped_area() using the point method mtd: chips/map_rom.c: implement point and unpoint methods mtd: chips/map_ram.c: implement point and unpoint methods mtd: mtdram: properly handle the phys argument in the point method mtd: mtdswap: fix spelling mistake: 'TRESHOLD' -> 'THRESHOLD' mtd: slram: use memremap() instead of ioremap() kconfig: kill off GENERIC_IO option mtd: Fix C++ comment in include/linux/mtd/mtd.h mtd: constify mtd_partition mtd: plat-ram: Replace manual resource management by devm mtd: nand: Fix writing mtdoops to nand flash. mtd: intel-spi: Add Intel Lewisburg PCH SPI super SKU PCI ID mtd: nand: mtk: fix infinite ECC decode IRQ issue mtd: spi-nor: Add support for mr25h128 mtd: nand: mtk: change the compile sequence of mtk_nand.o and mtk_ecc.o mtd: spi-nor: enable 4B opcodes for mx66l51235l ...
2017-11-23dt-bindings: remove file that was added accidentallyRob Clark
I think this snuck in when I applied the patch for f97decac5f4c (didn't apply cleanly, required some manual applying + git-add). It is unused and shouldn't be here. My bad. Fixes: f97decac5f4c "drm/msm: Support multiple ringbuffers" Signed-off-by: Rob Clark <robdclark@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-23drm: add connector info/property for non-desktop displays [v2]Dave Airlie
This adds the infrastructure needed to quirk displays using edid and to mark them a non-desktop. A non-desktop display is one which shouldn't normally be included as a part of a desktop environment. This is meant to cover head mounted devices like HTC Vive. v2: Change description from non-standard to non-desktop, add docs Reviewed-by: Keith Packard <keithp@keithp.com> Signed-off-by: Dave Airlie <airlied@redhat.com> fixup docs
2017-11-22bpf: introduce ARG_PTR_TO_MEM_OR_NULLGianluca Borello
With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, an helper argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO and the verifier can prove the value of this next argument is 0. However, most helpers are just interested in handling <!NULL, 0>, so forcing them to deal with <NULL, 0> makes the implementation of those helpers more complicated for no apparent benefits, requiring them to explicitly handle those corner cases with checks that bpf programs could start relying upon, preventing the possibility of removing them later. Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case. Currently, the only helper that needs this is bpf_csum_diff_proto(), so change arg1 and arg3 to this new type as well. Also add a new battery of tests that explicitly test the !ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the various <NULL, 0> variations are focused on bpf_csum_diff, so cover also other helpers. Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-22ALSA: hda - Fix yet remaining issue with vmaster 0dB initializationTakashi Iwai
The previous fix for addressing the breakage in vmaster slave initialization, commit a91d66129fb9 ("ALSA: hda - Fix incorrect TLV callback check introduced during set_fs() removal"), introduced a new helper to process over each slave kctl. However, this helper passes only the original kctl, not the virtual slave kctl. As a result, HD-audio driver (which is the only user so far) couldn't initialize the slave correctly because it's trying to update the value directly with the original kctl, not with the mapped kctl. This patch fixes the situation again by passing both the mapped slaved and original slave kctls to the function. Luckily there is a single caller as of now, so changing the call signature is no big matter. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=197959 Fixes: a91d66129fb9 ("ALSA: hda - Fix incorrect TLV callback check introduced during set_fs() removal") Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-11-21treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE castsKees Cook
With all callbacks converted, and the timer callback prototype switched over, the TIMER_FUNC_TYPE cast is no longer needed, so remove it. Conversion was done with the following scripts: perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \ $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u) perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \ $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u) The now unused macros are also dropped from include/linux/timer.h. Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Remove redundant __setup_timer*() macrosKees Cook
With __init_timer*() now matching __setup_timer*(), remove the redundant internal interface, clean up the resulting definitions and add more documentation. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Shaohua Li <shli@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Pass function down to initialization routinesKees Cook
In preparation for removing more macros, pass the function down to the initialization routines instead of doing it in macros. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Remove unused data arguments from macrosKees Cook
With the .data field removed, the ignored data arguments in timer macros can be removed. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shaohua Li <shli@fb.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Switch callback prototype to take struct timer_list * argumentKees Cook
Since all callbacks have been converted, we can switch the core prototype to "struct timer_list *" now too. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Pass timer_list pointer to callbacks unconditionallyKees Cook
Now that all timer callbacks are already taking their struct timer_list pointer as the callback argument, just do this unconditionally and remove the .data field. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Remove setup_*timer() interfaceKees Cook
With all callers converted to timer_setup(), the old setup_*timer() interface can be removed. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Remove init_timer() interfaceKees Cook
All users of init_timer() have been updated. Remove the ancient interface. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21block/laptop_mode: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Jens Axboe <axboe@kernel.dk> Cc: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: linux-block@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix a reference to a module parameter which was lost during the GREv6 receive path rewrite, from Alexey Kodanev. 2) Fix deref before NULL check in ipheth, from Gustavo A. R. Silva. 3) RCU read lock imbalance in tun_build_skb(), from Xin Long. 4) Some stragglers from the mac80211 folks: a) Timer conversions from Kees Cook b) Fix some sequencing issue when cfg80211 is built statically, from Johannes Berg c) Memory leak in mac80211_hwsim, from Ben Hutchings. 5) Add new qmi_wwan device ID, from Sebastian Sjoholm. 6) Fix use after free in tipc, from Jon Maloy. 7) Missing kdoc in nfp driver, from Jakub Kicinski. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: nfp: flower: add missing kdoc tipc: fix access of released memory net: qmi_wwan: add Quectel BG96 2c7c:0296 mlxsw: spectrum: Do not try to create non-existing ports during unsplit mac80211: properly free requested-but-not-started TX agg sessions mac80211_hwsim: Fix memory leak in hwsim_new_radio_nl() cfg80211: initialize regulatory keys/database later mac80211: aggregation: Convert timers to use timer_setup() nl80211: don't expose wdev->ssid for most interfaces mac80211: Convert timers to use timer_setup() net: vxge: Fix some indentation issues net: ena: fix race condition between device reset and link up setup r8169: use same RTL8111EVL green settings as in vendor driver r8169: fix RTL8111EVL EEE and green settings tun: fix rcu_read_lock imbalance in tun_build_skb tcp: when scheduling TLP, time of RTO should account for current ACK usbnet: ipheth: fix potential null pointer dereference in ipheth_carrier_set gre6: use log_ecn_error module parameter in ip6_tnl_rcv()
2017-11-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - print the warning about dropped messages on consoles on a separate line. It makes it more legible. - one typo fix and small code clean up. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: added new line symbol after warning about dropped messages printk: fix typo in printk_safe.c printk: simplify no_printk()
2017-11-21sched/deadline: Don't use dubious signed bitfieldsDan Carpenter
It doesn't cause a run-time bug, but these bitfields should be unsigned. When it's signed ->dl_throttled is set to either 0 or -1, instead of 0 and 1 as expected. The sched.h file is included into tons of places so Sparse generates a flood of warnings like this: ./include/linux/sched.h:477:54: error: dubious one-bit signed bitfield Reported-by: Matthew Wilcox <willy@infradead.org> Reported-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Luca Abeni <luca.abeni@santannapisa.it> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Cc: luca abeni <luca.abeni@santannapisa.it> Link: http://lkml.kernel.org/r/20171013070121.dzcncojuj2f4utij@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-20Merge tag 'fbdev-v4.15' of git://github.com/bzolnier/linuxLinus Torvalds
Pull fbdev updates from Bartlomiej Zolnierkiewicz: "There is nothing really major here (though removal of the dead igafb driver stands out in diffstat). Summary: - convert timers to use timer_setup() (Kees Cook, Thierry Reding) - fix panels support on iMX boards in mxsfb driver (Stefan Agner) - fix timeout on EDID read in udlfb driver (Ladislav Michl) - add missing modes to fix out of bounds access in controlfb driver (Geert Uytterhoeven) - update initialisation paths in sa1100fb driver to be more robust (Russell King) - fix error handling path of ->probe method in au1200fb driver (Christophe JAILLET) - fix handling of cases when either panel or crt is defined in sm501fb driver (Sudip Mukherjee, Colin Ian King) - add ability to the Goldfish FB driver to be recognized by OS via DT (Aleksandar Markovic) - structures constifications (Bhumika Goyal) - misc fixes (Allen Pais, Gustavo A. R. Silva, Dan Carpenter) - misc cleanups (Colin Ian King, Himanshu Jha, Markus Elfring) - remove dead igafb driver" * tag 'fbdev-v4.15' of git://github.com/bzolnier/linux: (42 commits) OMAPFB: prevent buffer underflow in omapfb_parse_vram_param() video: fbdev: sm501fb: fix potential null pointer dereference on fbi fbcon: Initialize ops->info early video: fbdev: Convert timers to use timer_setup() video: fbdev: pxa3xx_gcu: Convert timers to use timer_setup() fbdev: controlfb: Add missing modes to fix out of bounds access video: fbdev: sis_main: mark expected switch fall-throughs video: fbdev: cirrusfb: mark expected switch fall-throughs video: fbdev: aty: radeon_pm: mark expected switch fall-throughs video: fbdev: sm501fb: mark expected switch fall-through in sm501fb_blank_crt video: fbdev: intelfb: remove redundant variables video/fbdev/dnfb: Use common error handling code in dnfb_probe() sm501fb: suspend and resume fb if it exists sm501fb: unregister framebuffer only if registered sm501fb: deallocate colormap only if allocated video: goldfishfb: Add support for device tree bindings Documentation: Add device tree binding for Goldfish FB driver video: udlfb: Fix read EDID timeout video: fbdev: remove dead igafb driver video: fbdev: mxsfb: fix pixelclock polarity ...
2017-11-21bpf: make bpf_prog_offload_verifier_prep() static inlineJakub Kicinski
Header implementation of bpf_prog_offload_verifier_prep() which is used if CONFIG_NET=n should be a static inline. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-21bpf: revert report offload info to user spaceJakub Kicinski
This reverts commit bd601b6ada11 ("bpf: report offload info to user space"). The ifindex by itself is not sufficient, we should provide information on which network namespace this ifindex belongs to. After considering some options we concluded that it's best to just remove this API for now, and rework it in -next. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-21bpf: turn bpf_prog_get_type() into a wrapperJakub Kicinski
bpf_prog_get_type() is identical to bpf_prog_get_type_dev(), with false passed as attach_drv. Instead of keeping it as an exported symbol turn it into static inline wrapper. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-21bpf: offload: move offload device validation out to the driversJakub Kicinski
With TC shared block changes we can't depend on correct netdev pointer being available in cls_bpf. Move the device validation to the driver. Core will only make sure that offloaded programs are always attached in the driver (or in HW by the driver). We trust that drivers which implement offload callbacks will perform necessary checks. Moving the checks to the driver is generally a useful thing, in practice the check should be against a switchdev instance, not a netdev, given that most ASICs will probably allow using the same program on many ports. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-21bpf: offload: rename the ifindex fieldJakub Kicinski
bpf_target_prog seems long and clunky, rename it to prog_ifindex. We don't want to call this field just ifindex, because maps may need a similar field in the future and bpf_attr members for programs and maps are unnamed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-19Merge tag 'ntb-4.15' of git://github.com/jonmason/ntbLinus Torvalds
Pull ntb updates from Jon Mason: "Support for the switchtec ntb and related changes. Also, a couple of bug fixes" [ The timing isn't great. I had asked people to send me pull requests before my family vacation, and this code has not even been in linux-next as far as I can tell. But Logan Gunthorpe pleaded for its inclusion because the Switchtec driver has apparently been around for a while, just never in linux-next - Linus ] * tag 'ntb-4.15' of git://github.com/jonmason/ntb: ntb: intel: remove b2b memory window workaround for Skylake NTB NTB: make idt_89hpes_cfg const NTB: switchtec_ntb: Update switchtec documentation with notes for NTB NTB: switchtec_ntb: Add memory window support NTB: switchtec_ntb: Implement scratchpad registers NTB: switchtec_ntb: Implement doorbell registers NTB: switchtec_ntb: Add link management NTB: switchtec_ntb: Add skeleton NTB driver NTB: switchtec_ntb: Initialize hardware for doorbells and messages NTB: switchtec_ntb: Initialize hardware for memory windows NTB: switchtec_ntb: Introduce initial NTB driver NTB: Add check and comment for link up to mw_count() and mw_get_align() NTB: Ensure ntb_mw_get_align() is only called when the link is up NTB: switchtec: Add link event notifier callback NTB: switchtec: Add NTB hardware register definitions NTB: switchtec: Export class symbol for use in upper layer driver NTB: switchtec: Move structure definitions into a common header ntb: update maintainer list for Intel NTB driver
2017-11-19tcp: when scheduling TLP, time of RTO should account for current ACKNeal Cardwell
Fix the TLP scheduling logic so that when scheduling a TLP probe, we ensure that the estimated time at which an RTO would fire accounts for the fact that ACKs indicating forward progress should push back RTO times. After the following fix: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") we had an unintentional behavior change in the following kind of scenario: suppose the RTT variance has been very low recently. Then suppose we send out a flight of N packets and our RTT is 100ms: t=0: send a flight of N packets t=100ms: receive an ACK for N-1 packets The response before df92c8394e6e that was: -> schedule a TLP for now + RTO_interval The response after df92c8394e6e is: -> schedule a TLP for t=0 + RTO_interval Since RTO_interval = srtt + RTT_variance, this means that we have scheduled a TLP timer at a point in the future that only accounts for RTT_variance. If the RTT_variance term is small, this means that the timer fires soon. Before df92c8394e6e this would not happen, because in that code, when we receive an ACK for a prefix of flight, we did: 1) Near the top of tcp_ack(), switch from TLP timer to RTO at write_queue_head->paket_tx_time + RTO_interval: if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) tcp_rearm_rto(sk); 2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval: if (flag & FLAG_ACKED) { tcp_rearm_rto(sk); 3) In tcp_ack() after tcp_fastretrans_alert() switch from RTO to TLP at now + RTO_interval: if (icsk->icsk_pending == ICSK_TIME_RETRANS) tcp_schedule_loss_probe(sk); In df92c8394e6e we removed that 3-phase dance, and instead directly set the TLP timer once: we set the TLP timer in cases like this to write_queue_head->packet_tx_time + RTO_interval. So if the RTT variance is small, then this means that this is setting the TLP timer to fire quite soon. This means if the ACK for the tail of the flight takes longer than an RTT to arrive (often due to delayed ACKs), then the TLP timer fires too quickly. Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-18NTB: switchtec_ntb: Add skeleton NTB driverLogan Gunthorpe
Add a skeleton NTB driver which will be filled out in subsequent patches. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-11-18NTB: switchtec_ntb: Introduce initial NTB driverLogan Gunthorpe
Seeing the Switchtec NTB hardware shares the same endpoint as the management endpoint we utilize the class_interface API to register an NTB driver for every Switchtec device in the system that has the NTB class code. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-11-18NTB: Add check and comment for link up to mw_count() and mw_get_align()Logan Gunthorpe
Adds a comment and a check to ntb_mw_get_align() so that it always fails if the function is called before the link is up. Also adds a comment to ntb_mw_count() to note that it may return 0 if it is called before the link is up. This is to prevent accidental mis-use in clients that are testing on hardware that this doesn't matter for. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-11-18NTB: switchtec: Add link event notifier callbackLogan Gunthorpe
In order for the Switchtec NTB code to handle link change events we create a notifier callback in the switchtec code which gets called whenever an appropriate event interrupt occurs. In order to preserve userspace's ability to follow these events, we compare the event count with a stored copy from last time we checked. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-11-18NTB: switchtec: Add NTB hardware register definitionsLogan Gunthorpe
There are two additional regions: ctrl and dbmsg. The first is for generic NTB control and memory windows. The second is for doorbells and message registers. This patch also adds a number of related constants for using these registers. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>