summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-07-08net: core: move push MPLS functionality from OvS to core helperJohn Hurley
Open vSwitch provides code to push an MPLS header to a packet. In preparation for supporting this in TC, move the push code to an skb helper that can be reused. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two cases of overlapping changes, nothing fancy. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08skbuff: increase verbosity when dumping skb dataWillem de Bruijn
skb_warn_bad_offload and netdev_rx_csum_fault trigger on hard to debug issues. Dump more state and the header. Optionally dump the entire packet and linear segment. This is required to debug checksum bugs that may include bytes past skb_tail_pointer(). Both call sites call this function inside a net_ratelimit() block. Limit full packet log further to a hard limit of can_dump_full (5). Based on an earlier patch by Cong Wang, see link below. Changes v1 -> v2 - dump frag_list only on full_pkt Link: https://patchwork.ozlabs.org/patch/1000841/ Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08ipv6: elide flowlabel check if no exclusive leases existWillem de Bruijn
Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO. If not set, by default an autogenerated flowlabel is selected. Explicit flowlabels require a control operation per label plus a datapath check on every connection (every datagram if unconnected). This is particularly expensive on unconnected sockets multiplexing many flows, such as QUIC. In the common case, where no lease is exclusive, the check can be safely elided, as both lease request and check trivially succeed. Indeed, autoflowlabel does the same even with exclusive leases. Elide the check if no process has requested an exclusive lease. fl6_sock_lookup previously returns either a reference to a lease or NULL to denote failure. Modify to return a real error and update all callers. On return NULL, they can use the label and will elide the atomic_dec in fl6_sock_release. This is an optimization. Robust applications still have to revert to requesting leases if the fast path fails due to an exclusive lease. Changes RFC->v1: - use static_key_false_deferred to rate limit jump label operations - call static_key_deferred_flush to stop timers on exit - move decrement out of RCU context - defer optimization also if opt data is associated with a lease - updated all fp6_sock_lookup callers, not just udp Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge tag 'keys-namespace-20190627' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring namespacing from David Howells: "These patches help make keys and keyrings more namespace aware. Firstly some miscellaneous patches to make the process easier: - Simplify key index_key handling so that the word-sized chunks assoc_array requires don't have to be shifted about, making it easier to add more bits into the key. - Cache the hash value in the key so that we don't have to calculate on every key we examine during a search (it involves a bunch of multiplications). - Allow keying_search() to search non-recursively. Then the main patches: - Make it so that keyring names are per-user_namespace from the point of view of KEYCTL_JOIN_SESSION_KEYRING so that they're not accessible cross-user_namespace. keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEYRING_NAME for this. - Move the user and user-session keyrings to the user_namespace rather than the user_struct. This prevents them propagating directly across user_namespaces boundaries (ie. the KEY_SPEC_* flags will only pick from the current user_namespace). - Make it possible to include the target namespace in which the key shall operate in the index_key. This will allow the possibility of multiple keys with the same description, but different target domains to be held in the same keyring. keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEY_TAG for this. - Make it so that keys are implicitly invalidated by removal of a domain tag, causing them to be garbage collected. - Institute a network namespace domain tag that allows keys to be differentiated by the network namespace in which they operate. New keys that are of a type marked 'KEY_TYPE_NET_DOMAIN' are assigned the network domain in force when they are created. - Make it so that the desired network namespace can be handed down into the request_key() mechanism. This allows AFS, NFS, etc. to request keys specific to the network namespace of the superblock. This also means that the keys in the DNS record cache are thenceforth namespaced, provided network filesystems pass the appropriate network namespace down into dns_query(). For DNS, AFS and NFS are good, whilst CIFS and Ceph are not. Other cache keyrings, such as idmapper keyrings, also need to set the domain tag - for which they need access to the network namespace of the superblock" * tag 'keys-namespace-20190627' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: keys: Pass the network namespace into request_key mechanism keys: Network namespace domain tag keys: Garbage collect keys for which the domain has been removed keys: Include target namespace in match criteria keys: Move the user and user-session keyrings to the user_namespace keys: Namespace keyring names keys: Add a 'recurse' flag for keyring searches keys: Cache the hash value to avoid lots of recalculation keys: Simplify key description management
2019-07-08tcp: Reset bytes_acked and bytes_received when disconnectingChristoph Paasch
If an app is playing tricks to reuse a socket via tcp_disconnect(), bytes_acked/received needs to be reset to 0. Otherwise tcp_info will report the sum of the current and the old connection.. Cc: Eric Dumazet <edumazet@google.com> Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info") Fixes: bdd1f9edacb5 ("tcp: add tcpi_bytes_received to tcp_info") Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08bonding: fix value exported by Netlink for peer_notif_delayVincent Bernat
IFLA_BOND_PEER_NOTIF_DELAY was set to the value of downdelay instead of peer_notif_delay. After this change, the correct value is exported. Fixes: 07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications") Signed-off-by: Vincent Bernat <vincent@bernat.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08coallocate socket_wq with socket itselfAl Viro
socket->wq is assign-once, set when we are initializing both struct socket it's in and struct socket_wq it points to. As the matter of fact, the only reason for separate allocation was the ability to RCU-delay freeing of socket_wq. RCU-delaying the freeing of socket itself gets rid of that need, so we can just fold struct socket_wq into the end of struct socket and simplify the life both for sock_alloc_inode() (one allocation instead of two) and for tun/tap oddballs, where we used to embed struct socket and struct socket_wq into the same structure (now - embedding just the struct socket). Note that reference to struct socket_wq in struct sock does remain a reference - that's unchanged. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08sockfs: switch to ->free_inode()Al Viro
we do have an RCU-delayed part there already (freeing the wq), so it's not like the pipe situation; moreover, it might be worth considering coallocating wq with the rest of struct sock_alloc. ->sk_wq in struct sock would remain a pointer as it is, but the object it normally points to would be coallocated with struct socket... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge tag 'keys-request-20190626' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull request_key improvements from David Howells: "These are all request_key()-related, including a fix and some improvements: - Fix the lack of a Link permission check on a key found by request_key(), thereby enabling request_key() to link keys that don't grant this permission to the target keyring (which must still grant Write permission). Note that the key must be in the caller's keyrings already to be found. - Invalidate used request_key authentication keys rather than revoking them, so that they get cleaned up immediately rather than hanging around till the expiry time is passed. - Move the RCU locks outwards from the keyring search functions so that a request_key_rcu() can be provided. This can be called in RCU mode, so it can't sleep and can't upcall - but it can be called from LOOKUP_RCU pathwalk mode. - Cache the latest positive result of request_key*() temporarily in task_struct so that filesystems that make a lot of request_key() calls during pathwalk can take advantage of it to avoid having to redo the searching. This requires CONFIG_KEYS_REQUEST_CACHE=y. It is assumed that the key just found is likely to be used multiple times in each step in an RCU pathwalk, and is likely to be reused for the next step too. Note that the cleanup of the cache is done on TIF_NOTIFY_RESUME, just before userspace resumes, and on exit" * tag 'keys-request-20190626' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: keys: Kill off request_key_async{,_with_auxdata} keys: Cache result of request_key*() temporarily in task_struct keys: Provide request_key_rcu() keys: Move the RCU locks outwards from the keyring search functions keys: Invalidate used request_key authentication keys keys: Fix request_key() lack of Link perm check on found key
2019-07-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-07-09 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Lots of libbpf improvements: i) addition of new APIs to attach BPF programs to tracing entities such as {k,u}probes or tracepoints, ii) improve specification of BTF-defined maps by eliminating the need for data initialization for some of the members, iii) addition of a high-level API for setting up and polling perf buffers for BPF event output helpers, all from Andrii. 2) Add "prog run" subcommand to bpftool in order to test-run programs through the kernel testing infrastructure of BPF, from Quentin. 3) Improve verifier for BPF sockaddr programs to support 8-byte stores for user_ip6 and msg_src_ip6 members given clang tends to generate such stores, from Stanislav. 4) Enable the new BPF JIT zero-extension optimization for further riscv64 ALU ops, from Luke. 5) Fix a bpftool json JIT dump crash on powerpc, from Jiri. 6) Fix an AF_XDP race in generic XDP's receive path, from Ilya. 7) Various smaller fixes from Ilya, Yue and Arnd. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge tag 'keys-misc-20190619' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull misc keyring updates from David Howells: "These are some miscellaneous keyrings fixes and improvements: - Fix a bunch of warnings from sparse, including missing RCU bits and kdoc-function argument mismatches - Implement a keyctl to allow a key to be moved from one keyring to another, with the option of prohibiting key replacement in the destination keyring. - Grant Link permission to possessors of request_key_auth tokens so that upcall servicing daemons can more easily arrange things such that only the necessary auth key is passed to the actual service program, and not all the auth keys a daemon might possesss. - Improvement in lookup_user_key(). - Implement a keyctl to allow keyrings subsystem capabilities to be queried. The keyutils next branch has commits to make available, document and test the move-key and capabilities code: https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log They're currently on the 'next' branch" * tag 'keys-misc-20190619' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: keys: Add capability-checking keyctl function keys: Reuse keyring_index_key::desc_len in lookup_user_key() keys: Grant Link permission to possessers of request_key auth keys keys: Add a keyctl to move a key between keyrings keys: Hoist locking out of __key_link_begin() keys: Break bits out of key_unlink() keys: Change keyring_serialise_link_sem to a mutex keys: sparse: Fix kdoc mismatches keys: sparse: Fix incorrect RCU accesses keys: sparse: Fix key_fs[ug]id_changed()
2019-07-08Merge tag 'selinux-pr-20190702' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: "Like the audit pull request this is a little early due to some upcoming vacation plans and uncertain network access while I'm away. Also like the audit PR, the list of patches here is pretty minor, the highlights include: - Explicitly use __le variables to make sure "sparse" can verify proper byte endian handling. - Remove some BUG_ON()s that are no longer needed. - Allow zero-byte writes to the "keycreate" procfs attribute without requiring key:create to make it easier for userspace to reset the keycreate label. - Consistently log the "invalid_context" field as an untrusted string in the AUDIT_SELINUX_ERR audit records" * tag 'selinux-pr-20190702' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: format all invalid context as untrusted selinux: fix empty write to keycreate file selinux: remove some no-op BUG_ONs selinux: provide __le variables explicitly
2019-07-08Merge tag 'audit-pr-20190702' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "This pull request is a bit early, but with some vacation time coming up I wanted to send this out now just in case the remote Internet Gods decide not to smile on me once the merge window opens. The patchset for v5.3 is pretty minor this time, the highlights include: - When the audit daemon is sent a signal, ensure we deliver information about the sender even when syscall auditing is not enabled/supported. - Add the ability to filter audit records based on network address family. - Tighten the audit field filtering restrictions on string based fields. - Cleanup the audit field filtering verification code. - Remove a few BUG() calls from the audit code" * tag 'audit-pr-20190702' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: remove the BUG() calls in the audit rule comparison functions audit: enforce op for string fields audit: add saddr_fam filter field audit: re-structure audit field valid checks audit: deliver signal_info regarless of syscall
2019-07-08Merge tag 'tpmdd-next-20190625' of git://git.infradead.org/users/jjs/linux-tpmddLinus Torvalds
Pull tpm updates from Jarkko Sakkinen: "This contains two critical bug fixes and support for obtaining TPM events triggered by ExitBootServices(). For the latter I have to give a quite verbose explanation not least because I had to revisit all the details myself to remember what was going on in Matthew's patches. The preboot software stack maintains an event log that gets entries every time something gets hashed to any of the PCR registers. What gets hashed could be a component to be run or perhaps log of some actions taken just to give couple of coarse examples. In general, anything relevant for the boot process that the preboot software does gets hashed and a log entry with a specific event type [1]. The main application for this is remote attestation and the reason why it is useful is nicely put in the very first section of [1]: "Attestation is used to provide information about the platform’s state to a challenger. However, PCR contents are difficult to interpret; therefore, attestation is typically more useful when the PCR contents are accompanied by a measurement log. While not trusted on their own, the measurement log contains a richer set of information than do the PCR contents. The PCR contents are used to provide the validation of the measurement log." Because EFI_TCG2_PROTOCOL.GetEventLog() is not available after calling ExitBootServices(), Linux EFI stub copies the event log to a custom configuration table. Unfortunately, ExitBootServices() also generates events and obviously these events do not get copied to that table. Luckily firmware does this for us by providing a configuration table identified by EFI_TCG2_FINAL_EVENTS_TABLE_GUID. This essentially contains necessary changes to provide the full event log for the use the user space that is concatenated from these two partial event logs [2]" [1] https://trustedcomputinggroup.org/resource/pc-client-specific-platform-firmware-profile-specification/ [2] The final concatenation is done in drivers/char/tpm/eventlog/efi.c * tag 'tpmdd-next-20190625' of git://git.infradead.org/users/jjs/linux-tpmdd: tpm: Don't duplicate events from the final event log in the TCG2 log Abstract out support for locating an EFI config table tpm: Fix TPM 1.2 Shutdown sequence to prevent future TPM operations efi: Attempt to get the TCG2 event log in the boot stub tpm: Append the final event log to the TPM event log tpm: Reserve the TPM final events table tpm: Abstract crypto agile event size calculations tpm: Actually fail on TPM errors during "get random"
2019-07-08Merge branch 'x86-topology-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 topology updates from Ingo Molnar: "Implement multi-die topology support on Intel CPUs and expose the die topology to user-space tooling, by Len Brown, Kan Liang and Zhang Rui. These changes should have no effect on the kernel's existing understanding of topologies, i.e. there should be no behavioral impact on cache, NUMA, scheduler, perf and other topologies and overall system performance" * 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/rapl: Cosmetic rename internal variables in response to multi-die/pkg support perf/x86/intel/uncore: Cosmetic renames in response to multi-die/pkg support hwmon/coretemp: Cosmetic: Rename internal variables to zones from packages thermal/x86_pkg_temp_thermal: Cosmetic: Rename internal variables to zones from packages perf/x86/intel/cstate: Support multi-die/package perf/x86/intel/rapl: Support multi-die/package perf/x86/intel/uncore: Support multi-die/package topology: Create core_cpus and die_cpus sysfs attributes topology: Create package_cpus sysfs attribute hwmon/coretemp: Support multi-die/package powercap/intel_rapl: Update RAPL domain name and debug messages thermal/x86_pkg_temp_thermal: Support multi-die/package powercap/intel_rapl: Support multi-die/package powercap/intel_rapl: Simplify rapl_find_package() x86/topology: Define topology_logical_die_id() x86/topology: Define topology_die_id() cpu/topology: Export die_id x86/topology: Create topology_max_die_per_package() x86/topology: Add CPUID.1F multi-die/package support
2019-07-08Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updayes from Ingo Molnar: "Most of the commits add ACRN hypervisor guest support, plus two cleanups" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/jailhouse: Mark jailhouse_x2apic_available() as __init x86/platform/geode: Drop <linux/gpio.h> includes x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector x86: Add support for Linux guests on an ACRN hypervisor x86/Kconfig: Add new X86_HV_CALLBACK_VECTOR config symbol
2019-07-08Merge branch 'x86-paravirt-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt updates from Ingo Molnar: "A handful of paravirt patching code enhancements to make it more robust against patching failures, and related cleanups and not so related cleanups - by Thomas Gleixner and myself" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/paravirt: Rename paravirt_patch_site::instrtype to paravirt_patch_site::type x86/paravirt: Standardize 'insn_buff' variable names x86/paravirt: Match paravirt patchlet field definition ordering to initialization ordering x86/paravirt: Replace the paravirt patch asm magic x86/paravirt: Unify the 32/64 bit paravirt patching code x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call() x86/paravirt: Detect over-sized patching bugs in paravirt_patch_insns() x86/paravirt: Remove bogus extern declarations
2019-07-08Merge branch 'x86-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 AVX512 status update from Ingo Molnar: "This adds a new ABI that the main scheduler probably doesn't want to deal with but HPC job schedulers might want to use: the AVX512_elapsed_ms field in the new /proc/<pid>/arch_status task status file, which allows the user-space job scheduler to cluster such tasks, to avoid turbo frequency drops" * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/filesystems/proc.txt: Add arch_status file x86/process: Add AVX-512 usage elapsed time to /proc/pid/arch_status proc: Add /proc/<pid>/arch_status
2019-07-08Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc small cleanups: removal of superfluous code and coding style cleanups mostly" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kexec: Make variable static and config dependent x86/defconfigs: Remove useless UEVENT_HELPER_PATH x86/amd_nb: Make hygon_nb_misc_ids static x86/tsc: Move inline keyword to the beginning of function declarations x86/io_delay: Define IO_DELAY macros in C instead of Kconfig x86/io_delay: Break instead of fallthrough in switch statement
2019-07-08Merge branch 'x86-cache-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache resource control update from Ingo Molnar: "Two cleanup patches" * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Cleanup cbm_ensure_valid() x86/resctrl: Use _ASM_BX to avoid ifdeffery
2019-07-08Merge branch 'x86-build-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "Two kbuild enhancements by Masahiro Yamada" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Remove redundant 'clean-files += capflags.c' x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c
2019-07-08Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "Most of the changes relate to Peter Zijlstra's cleanup of ptregs handling, in particular the i386 part is now much simplified and standardized - no more partial ptregs stack frames via the esp/ss oddity. This simplifies ftrace, kprobes, the unwinder, ptrace, kdump and kgdb. There's also a CR4 hardening enhancements by Kees Cook, to make the generic platform functions such as native_write_cr4() less useful as ROP gadgets that disable SMEP/SMAP. Also protect the WP bit of CR0 against similar attacks. The rest is smaller cleanups/fixes" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives: Add int3_emulate_call() selftest x86/stackframe/32: Allow int3_emulate_push() x86/stackframe/32: Provide consistent pt_regs x86/stackframe, x86/ftrace: Add pt_regs frame annotations x86/stackframe, x86/kprobes: Fix frame pointer annotations x86/stackframe: Move ENCODE_FRAME_POINTER to asm/frame.h x86/entry/32: Clean up return from interrupt preemption path x86/asm: Pin sensitive CR0 bits x86/asm: Pin sensitive CR4 bits Documentation/x86: Fix path to entry_32.S x86/asm: Remove unused TASK_TI_flags from asm-offsets.c
2019-07-09xdp: fix race on generic receive pathIlya Maximets
Unlike driver mode, generic xdp receive could be triggered by different threads on different CPU cores at the same time leading to the fill and rx queue breakage. For example, this could happen while sending packets from two processes to the first interface of veth pair while the second part of it is open with AF_XDP socket. Need to take a lock for each generic receive to avoid race. Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Remove the unused per rq load array and all its infrastructure, by Dietmar Eggemann. - Add utilization clamping support by Patrick Bellasi. This is a refinement of the energy aware scheduling framework with support for boosting of interactive and capping of background workloads: to make sure critical GUI threads get maximum frequency ASAP, and to make sure background processing doesn't unnecessarily move to cpufreq governor to higher frequencies and less energy efficient CPU modes. - Add the bare minimum of tracepoints required for LISA EAS regression testing, by Qais Yousef - which allows automated testing of various power management features, including energy aware scheduling. - Restructure the former tsk_nr_cpus_allowed() facility that the -rt kernel used to modify the scheduler's CPU affinity logic such as migrate_disable() - introduce the task->cpus_ptr value instead of taking the address of &task->cpus_allowed directly - by Sebastian Andrzej Siewior. - Misc optimizations, fixes, cleanups and small enhancements - see the Git log for details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) sched/uclamp: Add uclamp support to energy_compute() sched/uclamp: Add uclamp_util_with() sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks sched/uclamp: Set default clamps for RT tasks sched/uclamp: Reset uclamp values on RESET_ON_FORK sched/uclamp: Extend sched_setattr() to support utilization clamping sched/core: Allow sched_setattr() to use the current policy sched/uclamp: Add system default clamps sched/uclamp: Enforce last task's UCLAMP_MAX sched/uclamp: Add bucket local max tracking sched/uclamp: Add CPU's clamp buckets refcounting sched/fair: Rename weighted_cpuload() to cpu_runnable_load() sched/debug: Export the newly added tracepoints sched/debug: Add sched_overutilized tracepoint sched/debug: Add new tracepoint to track PELT at se level sched/debug: Add new tracepoints to track PELT at rq level sched/debug: Add a new sched_trace_*() helper functions sched/autogroup: Make autogroup_path() always available sched/wait: Deduplicate code with do-while sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity() ...
2019-07-08Merge branch 'mp-inner-L3'David S. Miller
Stephen Suryaputra says: ==================== net: Multipath hashing on inner L3 This series extends commit 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel") to include support when the outer L3 is IPv6 and to consider the case where the inner L3 is different version from the outer L3, such as IPv6 tunneled by IPv4 GRE or vice versa. It also includes kselftest scripts to test the use cases. v2: Clarify the commit messages in the commits in this series to use the term tunneled by IPv4 GRE or by IPv6 GRE so that it's clear which one is the inner and which one is the outer (per David Miller). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08selftests: forwarding: Test multipath hashing on inner IP pkts for GRE tunnelStephen Suryaputra
Add selftest scripts for multipath hashing on inner IP pkts when there is a single GRE tunnel but there are multiple underlay routes to reach the other end of the tunnel. Four cases are covered in these scripts: - IPv4 inner, IPv4 outer - IPv6 inner, IPv4 outer - IPv4 inner, IPv6 outer - IPv6 inner, IPv6 outer Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08ipv6: Support multipath hashing on inner IP pktsStephen Suryaputra
Make the same support as commit 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel") for outer IPv6. The hashing considers both IPv4 and IPv6 pkts when they are tunneled by IPv6 GRE. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08ipv4: Multipath hashing on inner L3 needs to consider inner IPv6 pktsStephen Suryaputra
Commit 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel") supports multipath policy value of 2, Layer 3 or inner Layer 3 if present, but it only considers inner IPv4. There is a use case of IPv6 is tunneled by IPv4 GRE, thus add the ability to hash on inner IPv6 addresses. Fixes: 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel") Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: pasemi: fix an use-after-free in pasemi_mac_phy_init()Wen Yang
The phy_dn variable is still being used in of_phy_connect() after the of_node_put() call, which may result in use-after-free. Fixes: 1dd2d06c0459 ("net: Rework pasemi_mac driver to use of_mdio infrastructure") Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge branch 'ras-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "Boris is on vacation so I'm sending the RAS bits this time. The main changes were: - Various RAS/CEC improvements and fixes by Borislav Petkov: - error insertion fixes - offlining latency fix - memory leak fix - additional sanity checks - cleanups - debug output improvements - More SMCA enhancements by Yazen Ghannam: - make banks truly per-CPU which they are in the hardware - don't over-cache certain registers - make the number of MCA banks per-CPU variable The long term goal with these changes is to support future heterogenous SMCA extensions. - Misc fixes and improvements" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Do not check return value of debugfs_create functions x86/MCE: Determine MCA banks' init state properly x86/MCE: Make the number of MCA banks a per-CPU variable x86/MCE/AMD: Don't cache block addresses on SMCA systems x86/MCE: Make mce_banks a per-CPU array x86/MCE: Make struct mce_banks[] static RAS/CEC: Add copyright RAS/CEC: Add CONFIG_RAS_CEC_DEBUG and move CEC debug features there RAS/CEC: Dump the different array element sections RAS/CEC: Rename count_threshold to action_threshold RAS/CEC: Sanity-check array on every insertion RAS/CEC: Fix potential memory leak RAS/CEC: Do not set decay value on error RAS/CEC: Check count_threshold unconditionally RAS/CEC: Fix pfn insertion
2019-07-08net: axienet: fix a potential double free in axienet_probe()Wen Yang
There is a possible use-after-free issue in the axienet_probe(): 1701: np = of_parse_phandle(pdev->dev.of_node, "axistream-connected", 0); 1702: if (np) { ... 1787: of_node_put(np); ---> released here 1788: lp->eth_irq = platform_get_irq(pdev, 0); 1789: } else { ... 1801: } 1802: if (IS_ERR(lp->dma_regs)) { ... 1805: of_node_put(np); ---> double released here 1806: goto free_netdev; 1807: } We solve this problem by removing the unnecessary of_node_put(). Fixes: 28ef9ebdb64c ("net: axienet: make use of axistream-connected attribute optional") Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Cc: Anirudha Sarangi <anirudh@xilinx.com> Cc: John Linn <John.Linn@xilinx.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Robert Hancock <hancock@sedsystems.ca> Cc: netdev@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Robert Hancock <hancock@sedsystems.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08sunrpc/cache: remove the exporting of cache_seq_nextDenis Efremov
The function cache_seq_next is declared static and marked EXPORT_SYMBOL_GPL, which is at best an odd combination. Because the function is not used outside of the net/sunrpc/cache.c file it is defined in, this commit removes the EXPORT_SYMBOL_GPL() marking. Fixes: d48cf356a130 ("SUNRPC: Remove non-RCU protected lookup") Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-08Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle are: - rwsem scalability improvements, phase #2, by Waiman Long, which are rather impressive: "On a 2-socket 40-core 80-thread Skylake system with 40 reader and writer locking threads, the min/mean/max locking operations done in a 5-second testing window before the patchset were: 40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810 40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255 After the patchset, they became: 40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741 40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098" There's a lot of changes to the locking implementation that makes it similar to qrwlock, including owner handoff for more fair locking. Another microbenchmark shows how across the spectrum the improvements are: "With a locking microbenchmark running on 5.1 based kernel, the total locking rates (in kops/s) on a 2-socket Skylake system with equal numbers of readers and writers (mixed) before and after this patchset were: # of Threads Before Patch After Patch ------------ ------------ ----------- 2 2,618 4,193 4 1,202 3,726 8 802 3,622 16 729 3,359 32 319 2,826 64 102 2,744" The changes are extensive and the patch-set has been through several iterations addressing various locking workloads. There might be more regressions, but unless they are pathological I believe we want to use this new implementation as the baseline going forward. - jump-label optimizations by Daniel Bristot de Oliveira: the primary motivation was to remove IPI disturbance of isolated RT-workload CPUs, which resulted in the implementation of batched jump-label updates. Beyond the improvement of the real-time characteristics kernel, in one test this patchset improved static key update overhead from 57 msecs to just 1.4 msecs - which is a nice speedup as well. - atomic64_t cross-arch type cleanups by Mark Rutland: over the last ~10 years of atomic64_t existence the various types used by the APIs only had to be self-consistent within each architecture - which means they became wildly inconsistent across architectures. Mark puts and end to this by reworking all the atomic64 implementations to use 's64' as the base type for atomic64_t, and to ensure that this type is consistently used for parameters and return values in the API, avoiding further problems in this area. - A large set of small improvements to lockdep by Yuyang Du: type cleanups, output cleanups, function return type and othr cleanups all around the place. - A set of percpu ops cleanups and fixes by Peter Zijlstra. - Misc other changes - please see the Git log for more details" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits) locking/lockdep: increase size of counters for lockdep statistics locking/atomics: Use sed(1) instead of non-standard head(1) option locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING x86/jump_label: Make tp_vec_nr static x86/percpu: Optimize raw_cpu_xchg() x86/percpu, sched/fair: Avoid local_clock() x86/percpu, x86/irq: Relax {set,get}_irq_regs() x86/percpu: Relax smp_processor_id() x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}() locking/rwsem: Guard against making count negative locking/rwsem: Adaptive disabling of reader optimistic spinning locking/rwsem: Enable time-based spinning on reader-owned rwsem locking/rwsem: Make rwsem->owner an atomic_long_t locking/rwsem: Enable readers spinning on writer locking/rwsem: Clarify usage of owner's nonspinaable bit locking/rwsem: Wake up almost all readers in wait queue locking/rwsem: More optimal RT task handling of null owner locking/rwsem: Always release wait_lock before waking up tasks locking/rwsem: Implement lock handoff to prevent lock starvation locking/rwsem: Make rwsem_spin_on_owner() return owner state ...
2019-07-09selftests/bpf: fix test_reuseport_array on s390Ilya Leoshkevich
Fix endianness issue: passing a pointer to 64-bit fd as a 32-bit key does not work on big-endian architectures. So cast fd to 32-bits when necessary. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08net: stmmac: enable clause 45 mdio supportKweh Hock Leong
DWMAC4 is capable to support clause 45 mdio communication. This patch enable the feature on stmmac_mdio_write() and stmmac_mdio_read() by following phy_write_mmd() and phy_read_mmd() mdiobus read write implementation format. Reviewed-by: Li, Yifan <yifan2.li@intel.com> Signed-off-by: Kweh Hock Leong <hock.leong.kweh@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: openvswitch: use netif_ovs_is_port() instead of opencodeTaehee Yoo
Use netif_ovs_is_port() function instead of open code. This patch doesn't change logic. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08MAINTAINERS: Add page_pool maintainer entryJesper Dangaard Brouer
In this release cycle the number of NIC drivers using page_pool will likely reach 4 drivers. It is about time to add a maintainer entry. Add myself and Ilias. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge branch 'mvpp2-cls-ether'David S. Miller
Maxime Chevallier says: ==================== net: mvpp2: Add classification based on the ETHER flow This series adds support for classification of the ETHER flow in the mvpp2 driver. The first patch allows detecting when a user specifies a flow_type that isn't supported by the driver, while the second adds support for this flow_type by adding the mapping between the ETHER_FLOW enum value and the relevant classifier flow entries. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: mvpp2: cls: Add support for ETHER_FLOWMaxime Chevallier
Users can specify classification actions based on the 'ether' flow type. In that case, this will apply to all ethernet traffic, superseeding flows such as 'udp4' or 'tcp6'. Add support for this flow type in the PPv2 classifier, by mapping the ETHER_FLOW value to the corresponding entries in the classifier. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: mvpp2: cls: Report an error for unsupported flow typesMaxime Chevallier
Add a missing check to detect flow types that we don't support, so that user can be informed of this. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The changes in this cycle are: - RCU flavor consolidation cleanups and optmizations - Documentation updates - Miscellaneous fixes - SRCU updates - RCU-sync flavor consolidation - Torture-test updates - Linux-kernel memory-consistency-model updates, most notably the addition of plain C-language accesses" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits) tools/memory-model: Improve data-race detection tools/memory-model: Change definition of rcu-fence tools/memory-model: Expand definition of barrier tools/memory-model: Do not use "herd" to refer to "herd7" tools/memory-model: Fix comment in MP+poonceonces.litmus Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() rcu: Don't return a value from rcu_assign_pointer() rcu: Force inlining of rcu_read_lock() rcu: Fix irritating whitespace error in rcu_assign_pointer() rcu: Upgrade sync_exp_work_done() to smp_mb() rcutorture: Upper case solves the case of the vanishing NULL pointer torture: Suppress propagating trace_printk() warning rcutorture: Dump trace buffer for callback pipe drain failures torture: Add --trust-make to suppress "make clean" torture: Make --cpus override idleness calculations torture: Run kernel build in source directory torture: Add function graph-tracing cheat sheet torture: Capture qemu output rcutorture: Tweak kvm options rcutorture: Add trivial RCU implementation ...
2019-07-08selftests: txring_overwrite: fix incorrect test of mmap() return valueFrank de Brabander
If mmap() fails it returns MAP_FAILED, which is defined as ((void *) -1). The current if-statement incorrectly tests if *ring is NULL. Fixes: 358be656406d ("selftests/net: add txring_overwrite") Signed-off-by: Frank de Brabander <debrabander@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge branch 'vsock-virtio-fixes'David S. Miller
Stefano Garzarella says: ==================== vsock/virtio: several fixes in the .probe() and .remove() During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock before registering the driver", Stefan pointed out some possible issues in the .probe() and .remove() callbacks of the virtio-vsock driver. This series tries to solve these issues: - Patch 1 adds RCU critical sections to avoid use-after-free of 'the_virtio_vsock' pointer. - Patch 2 stops workers before to call vdev->config->reset(vdev) to be sure that no one is accessing the device. - Patch 3 moves the works flush at the end of the .remove() to avoid use-after-free of 'vsock' object. v3: - Patch 1: use rcu_dereference_protected() to get the_virtio_vosck value in the virtio_vsock_probe() [Jason] v2: https://patchwork.kernel.org/cover/11022343/ v1: https://patchwork.kernel.org/cover/10964733/ Before this series the guest crashes in a few second. After this series the test runs (~12h) without issues. Tested on an SMP guest (-smp 4 -monitor tcp:127.0.0.1:1234,server,nowait) with these scripts to stress the .probe()/.remove() path: - guest while true; do cat /dev/urandom | nc-vsock -l 4321 > /dev/null & cat /dev/urandom | nc-vsock -l 5321 > /dev/null & cat /dev/urandom | nc-vsock -l 6321 > /dev/null & cat /dev/urandom | nc-vsock -l 7321 > /dev/null & wait done - host while true; do cat /dev/urandom | nc-vsock 3 4321 > /dev/null & cat /dev/urandom | nc-vsock 3 5321 > /dev/null & cat /dev/urandom | nc-vsock 3 6321 > /dev/null & cat /dev/urandom | nc-vsock 3 7321 > /dev/null & sleep 2 echo "device_del v1" | nc 127.0.0.1 1234 sleep 1 echo "device_add vhost-vsock-pci,id=v1,guest-cid=3" | nc 127.0.0.1 1234 sleep 1 done ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08vsock/virtio: fix flush of works during the .remove()Stefano Garzarella
This patch moves the flush of works after vdev->config->del_vqs(vdev), because we need to be sure that no workers run before to free the 'vsock' object. Since we stopped the workers using the [tx|rx|event]_run flags, we are sure no one is accessing the device while we are calling vdev->config->reset(vdev), so we can safely move the workers' flush. Before the vdev->config->del_vqs(vdev), workers can be scheduled by VQ callbacks, so we must flush them after del_vqs(), to avoid use-after-free of 'vsock' object. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08vsock/virtio: stop workers during the .remove()Stefano Garzarella
Before to call vdev->config->reset(vdev) we need to be sure that no one is accessing the device, for this reason, we add new variables in the struct virtio_vsock to stop the workers during the .remove(). This patch also add few comments before vdev->config->reset(vdev) and vdev->config->del_vqs(vdev). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsockStefano Garzarella
Some callbacks used by the upper layers can run while we are in the .remove(). A potential use-after-free can happen, because we free the_virtio_vsock without knowing if the callbacks are over or not. To solve this issue we move the assignment of the_virtio_vsock at the end of .probe(), when we finished all the initialization, and at the beginning of .remove(), before to release resources. For the same reason, we do the same also for the vdev->priv. We use RCU to be sure that all callbacks that use the_virtio_vsock ended before freeing it. This is not required for callbacks that use vdev->priv, because after the vdev->config->del_vqs() we are sure that they are ended and will no longer be invoked. We also take the mutex during the .remove() to avoid that .probe() can run while we are resetting the device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge branch 'b53-docs'David S. Miller
Benedikt Spranger says: ==================== Document the configuration of b53 this is the third round to document the configuration of a b53 supported switch. v3..v2: - fix a typo - improve b53 configuration in DSA_TAG_PROTO_NONE showcase. - grade up from RFC to patch for mainline inclusion. v1..v2: - split out generic parts of the configuration. - target comments by Andrew Lunn and Florian Fainelli. - make changes visible to build system ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Documentation: net: dsa: b53: Describe b53 configurationBenedikt Spranger
Document the different needs of documentation for the b53 driver. Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Documentation: net: dsa: Describe DSA switch configurationBenedikt Spranger
Document DSA tagged and VLAN based switch configuration by showcases. Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>