summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2019-04-21ipv6: Simplify rt6_qualify_for_ecmpDavid Ahern
After commit c7a1ce397ada ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create"), the gateway is no longer filled in for fib6_nh structs in a prefix route. Accordingly, the RTF_ADDRCONF flag check can be dropped from the 'rt6_qualify_for_ecmp'. Further, RTF_DYNAMIC is only set in rt6_info instances, so it can be removed from the check as well. This reduces rt6_qualify_for_ecmp and the mlxsw version to just checking if the nexthop has a gateway which is the real indication of whether entries can be coalesced into a multipath route. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-21uapi/habanalabs: add missing fields in bmon paramsOded Gabbay
This patch adds missing fields of start address 0 and 1 in the bmon parameter structure that is received from the user in the debug IOCTL. Without these fields, the functionality of the bmon trace is broken, because there is no configuration of the base address of the filter of the bus monitor. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-20Merge tag 'for-linus-20190420' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A set of small fixes that should go into this series. This contains: - Removal of unused queue member (Hou) - Overflow bvec fix (Ming) - Various little io_uring tweaks (me) - kthread parking - Only call cpu_possible() for verified CPU - Drop unused 'file' argument to io_file_put() - io_uring_enter vs io_uring_register deadlock fix - CQ overflow fix - BFQ internal depth update fix (me)" * tag 'for-linus-20190420' of git://git.kernel.dk/linux-block: block: make sure that bvec length can't be overflow block: kill all_q_node in request_queue io_uring: fix CQ overflow condition io_uring: fix possible deadlock between io_uring_{enter,register} io_uring: drop io_file_put() 'file' argument bfq: update internal depth state when queue depth changes io_uring: only test SQPOLL cpu after we've verified it io_uring: park SQPOLL thread if it's percpu
2019-04-20Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: - various tooling fixes - kretprobe fixes - kprobes annotation fixes - kprobes error checking fix - fix the default events for AMD Family 17h CPUs - PEBS fix - AUX record fix - address filtering fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Avoid kretprobe recursion bug kprobes: Mark ftrace mcount handler functions nokprobe x86/kprobes: Verify stack frame on kretprobe perf/x86/amd: Add event map for AMD Family 17h perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf() perf tools: Fix map reference counting perf evlist: Fix side band thread draining perf tools: Check maps for bpf programs perf bpf: Return NULL when RB tree lookup fails in perf_env__find_bpf_prog_info() tools include uapi: Sync sound/asound.h copy perf top: Always sample time to satisfy needs of use of ordered queuing perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user) tools lib traceevent: Fix missing equality check for strcmp perf stat: Disable DIR_FORMAT feature for 'perf stat record' perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_view perf header: Fix lock/unlock imbalances when processing BPF/BTF info perf/x86: Fix incorrect PEBS_REGS perf/ring_buffer: Fix AUX record suppression perf/core: Fix the address filtering fix kprobes: Fix error check when reusing optimized probes
2019-04-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes all over the place: a console spam fix, section attributes fixes, a KASLR fix, a TLB stack-variable alignment fix, a reboot quirk, boot options related warnings fix, an LTO fix, a deadlock fix and an RDT fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority x86/cpu/bugs: Use __initconst for 'const' init data x86/mm/KASLR: Fix the size of the direct mapping section x86/Kconfig: Fix spelling mistake "effectivness" -> "effectiveness" x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info" x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T x86/mm: Prevent bogus warnings with "noexec=off" x86/build/lto: Fix truncated .bss with -fdata-sections x86/speculation: Prevent deadlock on ssb_state::lock x86/resctrl: Do not repeat rdtgroup mode initialization
2019-04-19random: move rand_initialize() earlierKees Cook
Right now rand_initialize() is run as an early_initcall(), but it only depends on timekeeping_init() (for mixing ktime_get_real() into the pools). However, the call to boot_init_stack_canary() for stack canary initialization runs earlier, which triggers a warning at boot: random: get_random_bytes called from start_kernel+0x357/0x548 with crng_init=0 Instead, this moves rand_initialize() to after timekeeping_init(), and moves canary initialization here as well. Note that this warning may still remain for machines that do not have UEFI RNG support (which initializes the RNG pools during setup_arch()), or for x86 machines without RDRAND (or booting without "random.trust=on" or CONFIG_RANDOM_TRUST_CPU=y). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-04-19tipc: introduce new socket option TIPC_SOCK_RECVQ_USEDTung Nguyen
When using TIPC_SOCK_RECVQ_DEPTH for getsockopt(), it returns the number of buffers in receive socket buffer which is not so helpful for user space applications. This commit introduces the new option TIPC_SOCK_RECVQ_USED which returns the current allocated bytes of the receive socket buffer. This helps user space applications dimension its buffer usage to avoid buffer overload issue. Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19net: socket: implement 64-bit timestampsArnd Bergmann
The 'timeval' and 'timespec' data structures used for socket timestamps are going to be redefined in user space based on 64-bit time_t in future versions of the C library to deal with the y2038 overflow problem, which breaks the ABI definition. Unlike many modern ioctl commands, SIOCGSTAMP and SIOCGSTAMPNS do not use the _IOR() macro to encode the size of the transferred data, so it remains ambiguous whether the application uses the old or new layout. The best workaround I could find is rather ugly: we redefine the command code based on the size of the respective data structure with a ternary operator. This lets it get evaluated as late as possible, hopefully after that structure is visible to the caller. We cannot use an #ifdef here, because inux/sockios.h might have been included before any libc header that could determine the size of time_t. The ioctl implementation now interprets the new command codes as always referring to the 64-bit structure on all architectures, while the old architecture specific command code still refers to the old architecture specific layout. The new command number is only used when they are actually different. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19net: rework SIOCGSTAMP ioctl handlingArnd Bergmann
The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many socket protocol handlers, and all of those end up calling the same sock_get_timestamp()/sock_get_timestampns() helper functions, which results in a lot of duplicate code. With the introduction of 64-bit time_t on 32-bit architectures, this gets worse, as we then need four different ioctl commands in each socket protocol implementation. To simplify that, let's add a new .gettstamp() operation in struct proto_ops, and move ioctl implementation into the common sock_ioctl()/compat_sock_ioctl_trans() functions that these all go through. We can reuse the sock_get_timestamp() implementation, but generalize it so it can deal with both native and compat mode, as well as timeval and timespec structures. Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/lkml/CAK8P3a038aDQQotzua_QtKGhq8O9n+rdiz2=WDCp82ys8eUT+A@mail.gmail.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19vlan: support binding link state to vlan member bridge portsMike Manning
In the case of vlan filtering on bridges, the bridge may also have the corresponding vlan devices as upper devices. Currently the link state of vlan devices is transferred from the lower device. So this is up if the bridge is in admin up state and there is at least one bridge port that is up, regardless of the vlan that the port is a member of. The link state of the vlan device may need to track only the state of the subset of ports that are also members of the corresponding vlan, rather than that of all ports. Add a flag to specify a vlan bridge binding mode, by which the link state is no longer automatically transferred from the lower device, but is instead determined by the bridge ports that are members of the vlan. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19USB: core: Fix bug caused by duplicate interface PM usage counterAlan Stern
The syzkaller fuzzer reported a bug in the USB hub driver which turned out to be caused by a negative runtime-PM usage counter. This allowed a hub to be runtime suspended at a time when the driver did not expect it. The symptom is a WARNING issued because the hub's status URB is submitted while it is already active: URB 0000000031fb463e submitted while active WARNING: CPU: 0 PID: 2917 at drivers/usb/core/urb.c:363 The negative runtime-PM usage count was caused by an unfortunate design decision made when runtime PM was first implemented for USB. At that time, USB class drivers were allowed to unbind from their interfaces without balancing the usage counter (i.e., leaving it with a positive count). The core code would take care of setting the counter back to 0 before allowing another driver to bind to the interface. Later on when runtime PM was implemented for the entire kernel, the opposite decision was made: Drivers were required to balance their runtime-PM get and put calls. In order to maintain backward compatibility, however, the USB subsystem adapted to the new implementation by keeping an independent usage counter for each interface and using it to automatically adjust the normal usage counter back to 0 whenever a driver was unbound. This approach involves duplicating information, but what is worse, it doesn't work properly in cases where a USB class driver delays decrementing the usage counter until after the driver's disconnect() routine has returned and the counter has been adjusted back to 0. Doing so would cause the usage counter to become negative. There's even a warning about this in the USB power management documentation! As it happens, this is exactly what the hub driver does. The kick_hub_wq() routine increments the runtime-PM usage counter, and the corresponding decrement is carried out by hub_event() in the context of the hub_wq work-queue thread. This work routine may sometimes run after the driver has been unbound from its interface, and when it does it causes the usage counter to go negative. It is not possible for hub_disconnect() to wait for a pending hub_event() call to finish, because hub_disconnect() is called with the device lock held and hub_event() acquires that lock. The only feasible fix is to reverse the original design decision: remove the duplicate interface-specific usage counter and require USB drivers to balance their runtime PM gets and puts. As far as I know, all existing drivers currently do this. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-and-tested-by: syzbot+7634edaea4d0b341c625@syzkaller.appspotmail.com CC: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "16 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping mm/kmemleak.c: fix unused-function warning init: initialize jump labels before command line option parsing kernel/watchdog_hld.c: hard lockup message should end with a newline kcov: improve CONFIG_ARCH_HAS_KCOV help text mm: fix inactive list balancing between NUMA nodes and cgroups mm/hotplug: treat CMA pages as unmovable proc: fixup proc-pid-vm test proc: fix map_files test on F29 mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock mm: swapoff: shmem_unuse() stop eviction without igrab() mm: swapoff: take notice of completion sooner mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES mm: swapoff: shmem_find_swap_entries() filter out other types slab: store tagged freelist for off-slab slabmgmt
2019-04-19block: make sure that bvec length can't be overflowMing Lei
bvec->bv_offset may be bigger than PAGE_SIZE sometimes, such as, when one bio is splitted in the middle of one bvec via bio_split(), and bi_iter.bi_bvec_done is used to build offset of the 1st bvec of remained bio. And the remained bio's bvec may be re-submitted to fs layer via ITER_IBVEC, such as loop and nvme-loop. So we have to make sure that every bvec's offset is less than PAGE_SIZE from bio_for_each_segment_all() because some drivers(loop, nvme-loop) passes the splitted bvec to fs layer via ITER_BVEC. This patch fixes this issue reported by Zhang Yi When running nvme/011. Cc: Christoph Hellwig <hch@lst.de> Cc: Yi Zhang <yi.zhang@redhat.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: 6dc4f100c175 ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-19block: kill all_q_node in request_queueHou Tao
all_q_node has not been used since commit 4b855ad37194 ("blk-mq: Create hctx for each present CPU"), so remove it. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - several new key mappings for HID - a host of new ACPI IDs used to identify Elan touchpads in Lenovo laptops * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: snvs_pwrkey - initialize necessary driver data before enabling IRQ HID: input: add mapping for "Toggle Display" key HID: input: add mapping for "Full Screen" key HID: input: add mapping for keyboard Brightness Up/Down/Toggle keys HID: input: add mapping for Expose/Overview key HID: input: fix mapping of aspect ratio key [media] doc-rst: switch to new names for Full Screen/Aspect keys Input: document meanings of KEY_SCREEN and KEY_ZOOM Input: elan_i2c - add hardware ID for multiple Lenovo laptops
2019-04-19coredump: fix race condition between mmget_not_zero()/get_task_mm() and core ↵Andrea Arcangeli
dumping The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Jann Horn <jannh@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jann Horn <jannh@google.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-19mm: swapoff: shmem_unuse() stop eviction without igrab()Hugh Dickins
The igrab() in shmem_unuse() looks good, but we forgot that it gives no protection against concurrent unmounting: a point made by Konstantin Khlebnikov eight years ago, and then fixed in 2.6.39 by 778dd893ae78 ("tmpfs: fix race between umount and swapoff"). The current 5.1-rc swapoff is liable to hit "VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day..." followed by GPF. Once again, give up on using igrab(); but don't go back to making such heavy-handed use of shmem_swaplist_mutex as last time: that would spoil the new design, and I expect could deadlock inside shmem_swapin_page(). Instead, shmem_unuse() just raise a "stop_eviction" count in the shmem- specific inode, and shmem_evict_inode() wait for that to go down to 0. Call it "stop_eviction" rather than "swapoff_busy" because it can be put to use for others later (huge tmpfs patches expect to use it). That simplifies shmem_unuse(), protecting it from both unlink and unmount; and in practice lets it locate all the swap in its first try. But do not rely on that: there's still a theoretical case, when shmem_writepage() might have been preempted after its get_swap_page(), before making the swap entry visible to swapoff. [hughd@google.com: remove incorrect list_del()] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904091133570.1898@eggly.anvils Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081259400.1523@eggly.anvils Fixes: b56a2d8af914 ("mm: rid swapoff of quadratic complexity") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca> Cc: Huang Ying <ying.huang@intel.com> Cc: Kelley Nielsen <kelleynnn@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Rik van Riel <riel@surriel.com> Cc: Vineeth Pillai <vpillai@digitalocean.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-19drm/ttm: fix re-init of global structuresChristian König
When a driver unloads without unloading TTM we don't correctly clear the global structures leading to errors on re-init. Next step should probably be to remove the global structures and kobjs all together, but this is tricky since we need to maintain backward compatibility. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Tested-by: Karol Herbst <kherbst@redhat.com> CC: stable@vger.kernel.org # 5.0.x Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-04-19vt: selection: allow functions to be called from inside kernelOkash Khawaja
This patch breaks set_selection() into two functions so that when called from kernel, copy_from_user() can be avoided. The two functions are called set_selection_user() and set_selection_kernel() in order to be explicit about their purposes. This also means updating any references to set_selection() and fixing for name change. It also exports set_selection_kernel() and paste_selection(). These changes are used the following patch where speakup's selection functionality calls into the above functions, thereby doing away with parallel implementation. Signed-off-by: Okash Khawaja <okash.khawaja@gmail.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Tested-by: Gregory Nowak <greg@gregn.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-19x86/kprobes: Verify stack frame on kretprobeMasami Hiramatsu
Verify the stack frame pointer on kretprobe trampoline handler, If the stack frame pointer does not match, it skips the wrong entry and tries to find correct one. This can happen if user puts the kretprobe on the function which can be used in the path of ftrace user-function call. Such functions should not be probed, so this adds a warning message that reports which function should be blacklisted. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/155094059185.6137.15527904013362842072.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19usb: typec: tcpm: Notify the tcpc to start connection-detection for SRPsHans de Goede
Some tcpc device-drivers need to explicitly be told to watch for connection events, otherwise the tcpc will not generate any TCPM_CC_EVENTs and devices being plugged into the Type-C port will not be noticed. For dual-role ports tcpm_start_drp_toggling() is used to tell the tcpc to watch for connection events. Sofar we lack a similar callback to the tcpc for single-role ports. With some tcpc-s such as the fusb302 this means no TCPM_CC_EVENTs will be generated when the port is configured as a single-role port. This commit renames start_drp_toggling to start_toggling and since the device-properties are parsed by the tcpm-core, adds a port_type parameter to the start_toggling callback so that the tcpc_dev driver knows the port-type and can act accordingly when it starts toggling. The new start_toggling callback now always gets called if defined, instead of only being called for DRP ports. To avoid this causing undesirable functional changes all existing start_drp_toggling implementations are not only renamed to start_toggling, but also get a port_type check added and return -EOPNOTSUPP when port_type is not DRP. Fixes: ea3b4d5523bc("usb: typec: fusb302: Resolve fixed power role ...") Cc: Adam Thomson <Adam.Thomson.Opensource@diasemi.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Tested-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-19rseq: Remove superfluous rseq_len from task_structMathieu Desnoyers
The rseq system call, when invoked with flags of "0" or "RSEQ_FLAG_UNREGISTER" values, expects the rseq_len parameter to be equal to sizeof(struct rseq), which is fixed-size and fixed-layout, specified in uapi linux/rseq.h. Expecting a fixed size for rseq_len is a design choice that ensures multiple libraries and application defining __rseq_abi in the same process agree on its exact size. Considering that this size is and will always be the same value, there is no point in saving this value within task_struct rseq_len. Remove this field from task_struct. No change in functionality intended. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ben Maurer <bmaurer@fb.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Lameter <cl@linux.com> Cc: Dave Watson <davejwatson@fb.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20190305194755.2602-3-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18net: mdio: rename mdio_device reset to reset_gpioDavid Bauer
This renames the GPIO reset of mdio devices from 'reset' to 'reset_gpio' to better differentiate between GPIO and reset-controller driven reset line. Signed-off-by: David Bauer <mail@david-bauer.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18net: phy: add support for reset-controllerDavid Bauer
This commit adds support for PHY reset pins handled by a reset controller. Signed-off-by: David Bauer <mail@david-bauer.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18net: skb: remove unused assertsJakub Kicinski
We are discouraging the use of BUG() these days, remove the unused ASSERT macros from skbuff.h. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18ipv6: Add rate limit mask for ICMPv6 messagesStephen Suryaputra
To make ICMPv6 closer to ICMPv4, add ratemask parameter. Since the ICMP message types use larger numeric values, a simple bitmask doesn't fit. I use large bitmap. The input and output are the in form of list of ranges. Set the default to rate limit all error messages but Packet Too Big. For Packet Too Big, use ratemask instead of hard-coded. There are functions where icmpv6_xrlim_allow() and icmpv6_global_allow() aren't called. This patch only adds them to icmpv6_echo_reply(). Rate limiting error messages is mandated by RFC 4443 but RFC 4890 says that it is also acceptable to rate limit informational messages. Thus, I removed the current hard-coded behavior of icmpv6_mask_allow() that doesn't rate limit informational messages. v2: Add dummy function proc_do_large_bitmap() if CONFIG_PROC_SYSCTL isn't defined, expand the description in ip-sysctl.txt and remove unnecessary conditional before kfree(). v3: Inline the bitmap instead of dynamically allocated. Still is a pointer to it is needed because of the way proc_do_large_bitmap work. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18device property: Add fwnode_graph_get_endpoint_by_id()Sakari Ailus
fwnode_graph_get_endpoint_by_id() is intended for obtaining local endpoints by a given local port. fwnode_graph_get_endpoint_by_id() is slightly different from its OF counterpart, of_graph_get_endpoint_by_regs(): instead of using -1 as a value to indicate that a port or an endpoint number does not matter, it uses flags to look for equal or greater endpoint. The port number is always fixed. It also returns only remote endpoints that belong to an available device, a behaviour that can be turned off with a flag. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-18crypto: cryptd - remove ability to instantiate ablkciphersEric Biggers
Remove cryptd_alloc_ablkcipher() and the ability of cryptd to create algorithms with the deprecated "ablkcipher" type. This has been unused since commit 0e145b477dea ("crypto: ablk_helper - remove ablk_helper"). Instead, cryptd_alloc_skcipher() is used. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-18crypto: ecrdsa - add EC-RDSA (GOST 34.10) algorithmVitaly Chikunov
Add Elliptic Curve Russian Digital Signature Algorithm (GOST R 34.10-2012, RFC 7091, ISO/IEC 14888-3) is one of the Russian (and since 2018 the CIS countries) cryptographic standard algorithms (called GOST algorithms). Only signature verification is supported, with intent to be used in the IMA. Summary of the changes: * crypto/Kconfig: - EC-RDSA is added into Public-key cryptography section. * crypto/Makefile: - ecrdsa objects are added. * crypto/asymmetric_keys/x509_cert_parser.c: - Recognize EC-RDSA and Streebog OIDs. * include/linux/oid_registry.h: - EC-RDSA OIDs are added to the enum. Also, a two currently not implemented curve OIDs are added for possible extension later (to not change numbering and grouping). * crypto/ecc.c: - Kenneth MacKay copyright date is updated to 2014, because vli_mmod_slow, ecc_point_add, ecc_point_mult_shamir are based on his code from micro-ecc. - Functions needed for ecrdsa are EXPORT_SYMBOL'ed. - New functions: vli_is_negative - helper to determine sign of vli; vli_from_be64 - unpack big-endian array into vli (used for a signature); vli_from_le64 - unpack little-endian array into vli (used for a public key); vli_uadd, vli_usub - add/sub u64 value to/from vli (used for increment/decrement); mul_64_64 - optimized to use __int128 where appropriate, this speeds up point multiplication (and as a consequence signature verification) by the factor of 1.5-2; vli_umult - multiply vli by a small value (speeds up point multiplication by another factor of 1.5-2, depending on vli sizes); vli_mmod_special - module reduction for some form of Pseudo-Mersenne primes (used for the curves A); vli_mmod_special2 - module reduction for another form of Pseudo-Mersenne primes (used for the curves B); vli_mmod_barrett - module reduction using pre-computed value (used for the curve C); vli_mmod_slow - more general module reduction which is much slower (used when the modulus is subgroup order); vli_mod_mult_slow - modular multiplication; ecc_point_add - add two points; ecc_point_mult_shamir - add two points multiplied by scalars in one combined multiplication (this gives speed up by another factor 2 in compare to two separate multiplications). ecc_is_pubkey_valid_partial - additional samity check is added. - Updated vli_mmod_fast with non-strict heuristic to call optimal module reduction function depending on the prime value; - All computations for the previously defined (two NIST) curves should not unaffected. * crypto/ecc.h: - Newly exported functions are documented. * crypto/ecrdsa_defs.h - Five curves are defined. * crypto/ecrdsa.c: - Signature verification is implemented. * crypto/ecrdsa_params.asn1, crypto/ecrdsa_pub_key.asn1: - Templates for BER decoder for EC-RDSA parameters and public key. Cc: linux-integrity@vger.kernel.org Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-18X.509: parse public key parameters from x509 for akcipherVitaly Chikunov
Some public key algorithms (like EC-DSA) keep in parameters field important data such as digest and curve OIDs (possibly more for different EC-DSA variants). Thus, just setting a public key (as for RSA) is not enough. Append parameters into the key stream for akcipher_set_{pub,priv}_key. Appended data is: (u32) algo OID, (u32) parameters length, parameters data. This does not affect current akcipher API nor RSA ciphers (they could ignore it). Idea of appending parameters to the key stream is by Herbert Xu. Cc: David Howells <dhowells@redhat.com> Cc: Denis Kenzior <denkenz@gmail.com> Cc: keyrings@vger.kernel.org Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Reviewed-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-18crypto: akcipher - new verify API for public key algorithmsVitaly Chikunov
Previous akcipher .verify() just `decrypts' (using RSA encrypt which is using public key) signature to uncover message hash, which was then compared in upper level public_key_verify_signature() with the expected hash value, which itself was never passed into verify(). This approach was incompatible with EC-DSA family of algorithms, because, to verify a signature EC-DSA algorithm also needs a hash value as input; then it's used (together with a signature divided into halves `r||s') to produce a witness value, which is then compared with `r' to determine if the signature is correct. Thus, for EC-DSA, nor requirements of .verify() itself, nor its output expectations in public_key_verify_signature() wasn't sufficient. Make improved .verify() call which gets hash value as input and produce complete signature check without any output besides status. Now for the top level verification only crypto_akcipher_verify() needs to be called and its return value inspected. Make sure that `digest' is in kmalloc'd memory (in place of `output`) in {public,tpm}_key_verify_signature() as insisted by Herbert Xu, and will be changed in the following commit. Cc: David Howells <dhowells@redhat.com> Cc: keyrings@vger.kernel.org Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Reviewed-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-18crypto: des_generic - Forbid 2-key in 3DES and add helpersHerbert Xu
This patch adds a requirement to the generic 3DES implementation such that 2-key 3DES (K1 == K3) is no longer allowed in FIPS mode. We will also provide helpers that may be used by drivers that implement 3DES to make the same check. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-18Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM commits from Paul E. McKenney: - An LKMM commit adding support for synchronize_srcu_expedited() - A couple of straggling RCU flavor consolidation updates - Documentation updates. - Miscellaneous fixes - SRCU updates - RCU CPU stall-warning updates - Torture-test updates Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18locking/lockdep: Avoid bogus Clang warningArnd Bergmann
When lockdep is enabled, and -Wuninitialized warnings are enabled, Clang produces a silly warning for every file we compile: In file included from kernel/sched/fair.c:23: kernel/sched/sched.h:1094:15: error: variable 'cookie' is uninitialized when used here [-Werror,-Wuninitialized] rf->cookie = lockdep_pin_lock(&rq->lock); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/lockdep.h:474:60: note: expanded from macro 'lockdep_pin_lock' #define lockdep_pin_lock(l) ({ struct pin_cookie cookie; cookie; }) ^~~~~~ kernel/sched/sched.h:1094:15: note: variable 'cookie' is declared here include/linux/lockdep.h:474:34: note: expanded from macro 'lockdep_pin_lock' #define lockdep_pin_lock(l) ({ struct pin_cookie cookie; cookie; }) ^ As the 'struct pin_cookie' structure is empty in this configuration, there is no need to initialize it for correctness, but it also does not hurt to set it to an empty structure, so do that to avoid the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: clang-built-linux@googlegroups.com Link: http://lkml.kernel.org/r/20190325125807.1437049-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-17net ipv6: Prevent neighbor add if protocol is disabled on deviceDavid Ahern
Disabling IPv6 on an interface removes existing entries but nothing prevents new entries from being manually added. To that end, add a new neigh_table operation, allow_add, that is called on RTM_NEWNEIGH to see if neighbor entries are allowed on a given device. If IPv6 is disabled on the device, allow_add returns false and passes a message back to the user via extack. $ echo 1 > /proc/sys/net/ipv6/conf/eth1/disable_ipv6 $ ip -6 neigh add fe80::4c88:bff:fe21:2704 dev eth1 lladdr de:ad:be:ef:01:01 Error: IPv6 is disabled on this device. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18IB/mlx5: Fix scatter to CQE in DCT QP creationGuy Levi
When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command since it tried to treat it as as QP create mailbox command instead of a DCT create command. The corrupted mailbox command causes userspace to malfunction as the device doesn't create the QP as expected. A new mlx5 capability is exposed to user-space which ensures that it will not enable the feature on DCT without this fix in the kernel. Fixes: 5d6ff1babe78 ("IB/mlx5: Support scatter to CQE for DC transport type") Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-17ipv6: Add fib6_type and fib6_flags to fib6_resultDavid Ahern
Add the fib6_flags and fib6_type to fib6_result. Update the lookup helpers to set them and update post fib lookup users to use the version from the result. This allows nexthop objects to have blackhole nexthop. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to fib lookupsDavid Ahern
Change fib6_lookup and fib6_table_lookup to take a fib6_result and set f6i and nh rather than returning a fib6_info. For now both always return 0. A later patch set can make these more like the IPv4 counterparts and return EINVAL, EACCESS, etc based on fib6_type. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to fib6_table_lookup tracepointDavid Ahern
Change fib6_table_lookup tracepoint to take the fib6_result and use the fib6_info and fib6_nh from it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to ip6_mtu_from_fib6 and fib6_mtuDavid Ahern
Change ip6_mtu_from_fib6 and fib6_mtu to take a fib6_result over a fib6_info. Update both to use the fib6_nh from fib6_result. Since the signature of ip6_mtu_from_fib6 is already changing, add const to daddr and saddr. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Rename fib6_multipath_select and pass fib6_resultDavid Ahern
Add 'struct fib6_result' to hold the fib entry and fib6_nh from a fib lookup as separate entries, similar to what IPv4 now has with fib_result. Rename fib6_multipath_select to fib6_select_path, pass fib6_result to it, and set f6i and nh in the result once a path selection is done. Call fib6_select_path unconditionally for path selection which means moving the sibling and oif check to fib6_select_path. To handle the two different call paths (2 only call multipath_select if flowi6_oif == 0 and the other always calls it), add a new have_oif_match that controls the sibling walk if relevant. Update callers of fib6_multipath_select accordingly and have them use the fib6_info and fib6_nh from the result. This is needed for multipath nexthop objects where a single f6i can point to multiple fib6_nh (similar to IPv4). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17net: core: introduce build_skb_aroundJesper Dangaard Brouer
The function build_skb() also have the responsibility to allocate and clear the SKB structure. Introduce a new function build_skb_around(), that moves the responsibility of allocation and clearing to the caller. This allows caller to use kmem_cache (slab/slub) bulk allocation API. Next patch use this function combined with kmem_cache_alloc_bulk. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-17cpu/speculation: Add 'mitigations=' cmdline optionJosh Poimboeuf
Keeping track of the number of mitigations for all the CPU speculation bugs has become overwhelming for many users. It's getting more and more complicated to decide which mitigations are needed for a given architecture. Complicating matters is the fact that each arch tends to have its own custom way to mitigate the same vulnerability. Most users fall into a few basic categories: a) they want all mitigations off; b) they want all reasonable mitigations on, with SMT enabled even if it's vulnerable; or c) they want all reasonable mitigations on, with SMT disabled if vulnerable. Define a set of curated, arch-independent options, each of which is an aggregation of existing options: - mitigations=off: Disable all mitigations. - mitigations=auto: [default] Enable all the default mitigations, but leave SMT enabled, even if it's vulnerable. - mitigations=auto,nosmt: Enable all the default mitigations, disabling SMT if needed by a mitigation. Currently, these options are placeholders which don't actually do anything. They will be fleshed out in upcoming patches. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86) Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Price <steven.price@arm.com> Cc: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com
2019-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflict resolution of af_smc.c from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17docs: hwmon: Add an index file and rename docs to *.rstMauro Carvalho Chehab
Now that all files were converted to ReST format, rename them and add an index. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2019-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Handle init flow failures properly in iwlwifi driver, from Shahar S Matityahu. 2) mac80211 TXQs need to be unscheduled on powersave start, from Felix Fietkau. 3) SKB memory accounting fix in A-MDSU aggregation, from Felix Fietkau. 4) Increase RCU lock hold time in mlx5 FPGA code, from Saeed Mahameed. 5) Avoid checksum complete with XDP in mlx5, also from Saeed. 6) Fix netdev feature clobbering in ibmvnic driver, from Thomas Falcon. 7) Partial sent TLS record leak fix from Jakub Kicinski. 8) Reject zero size iova range in vhost, from Jason Wang. 9) Allow pending work to complete before clcsock release from Karsten Graul. 10) Fix XDP handling max MTU in thunderx, from Matteo Croce. 11) A lot of protocols look at the sa_family field of a sockaddr before validating it's length is large enough, from Tetsuo Handa. 12) Don't write to free'd pointer in qede ptp error path, from Colin Ian King. 13) Have to recompile IP options in ipv4_link_failure because it can be invoked from ARP, from Stephen Suryaputra. 14) Doorbell handling fixes in qed from Denis Bolotin. 15) Revert net-sysfs kobject register leak fix, it causes new problems. From Wang Hai. 16) Spectre v1 fix in ATM code, from Gustavo A. R. Silva. 17) Fix put of BROPT_VLAN_STATS_PER_PORT in bridging code, from Nikolay Aleksandrov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (111 commits) socket: fix compat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW tcp: tcp_grow_window() needs to respect tcp_space() ocelot: Clean up stats update deferred work ocelot: Don't sleep in atomic context (irqs_disabled()) net: bridge: fix netlink export of vlan_stats_per_port option qed: fix spelling mistake "faspath" -> "fastpath" tipc: set sysctl_tipc_rmem and named_timeout right range tipc: fix link established but not in session net: Fix missing meta data in skb with vlan packet net: atm: Fix potential Spectre v1 vulnerabilities net/core: work around section mismatch warning for ptp_classifier net: bridge: fix per-port af_packet sockets bnx2x: fix spelling mistake "dicline" -> "decline" route: Avoid crash from dereferencing NULL rt->from MAINTAINERS: normalize Woojung Huh's email address bonding: fix event handling for stacked bonds Revert "net-sysfs: Fix memory leak in netdev_register_kobject" rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check qed: Fix the DORQ's attentions handling qed: Fix missing DORQ attentions ...
2019-04-17fscrypt: cache decrypted symlink target in ->i_linkEric Biggers
Path lookups that traverse encrypted symlink(s) are very slow because each encrypted symlink needs to be decrypted each time it's followed. This also involves dropping out of rcu-walk mode. Make encrypted symlinks faster by caching the decrypted symlink target in ->i_link. The first call to fscrypt_get_symlink() sets it. Then, the existing VFS path lookup code uses the non-NULL ->i_link to take the fast path where ->get_link() isn't called, and lookups in rcu-walk mode remain in rcu-walk mode. Also set ->i_link immediately when a new encrypted symlink is created. To safely free the symlink target after an RCU grace period has elapsed, introduce a new function fscrypt_free_inode(), and make the relevant filesystems call it just before actually freeing the inode. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-04-17random: only read from /dev/random after its pool has received 128 bitsTheodore Ts'o
Immediately after boot, we allow reads from /dev/random before its entropy pool has been fully initialized. Fix this so that we don't allow this until the blocking pool has received 128 bits. We do this by repurposing the initialized flag in the entropy pool struct, and use the initialized flag in the blocking pool to indicate whether it is safe to pull from the blocking pool. To do this, we needed to rework when we decide to push entropy from the input pool to the blocking pool, since the initialized flag for the input pool was used for this purpose. To simplify things, we no longer use the initialized flag for that purpose, nor do we use the entropy_total field any more. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-04-17fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertextEric Biggers
->lookup() in an encrypted directory begins as follows: 1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name. But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext. This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added. Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected. Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update. Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-04-17fs, fscrypt: clear DCACHE_ENCRYPTED_NAME when unaliasing directoryEric Biggers
Make __d_move() clear DCACHE_ENCRYPTED_NAME on the source dentry. This is needed for when d_splice_alias() moves a directory's encrypted alias to its decrypted alias as a result of the encryption key being added. Otherwise, the decrypted alias will incorrectly be invalidated on the next lookup, causing problems such as unmounting a mount the user just mount()ed there. Note that we don't have to support arbitrary moves of this flag because fscrypt doesn't allow dentries with DCACHE_ENCRYPTED_NAME to be the source or target of a rename(). Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Reported-by: Sarthak Kukreti <sarthakkukreti@chromium.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>