linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2019-02-01	Merge back earlier cpuidle material for v5.1.	Rafael J. Wysocki

2019-02-01	Merge back earlier PM core material for v5.1.	Rafael J. Wysocki

2019-02-01	driver core: Add device link flag DL_FLAG_AUTOPROBE_CONSUMER	Rafael J. Wysocki
	Add a new device link flag, DL_FLAG_AUTOPROBE_CONSUMER, to request the driver core to probe for a consumer driver automatically after binding a driver to the supplier device on a persistent managed device link. As unbinding the supplier driver on a managed device link causes the consumer driver to be detached from its device automatically, this flag provides a complementary mechanism which is needed to address some "composite device" use cases. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-01	driver core: Fix handling of runtime PM flags in device_link_add()	Rafael J. Wysocki
	After commit ead18c23c263 ("driver core: Introduce device links reference counting"), if there is a link between the given supplier and the given consumer already, device_link_add() will refcount it and return it unconditionally without updating its flags. It is possible, however, that the second (or any subsequent) caller of device_link_add() for the same consumer-supplier pair will pass DL_FLAG_PM_RUNTIME, possibly along with DL_FLAG_RPM_ACTIVE, in flags to it and the existing link may not behave as expected then. First, if DL_FLAG_PM_RUNTIME is not set in the existing link's flags at all, it needs to be set like during the original initialization of the link. Second, if DL_FLAG_RPM_ACTIVE is passed to device_link_add() in flags (in addition to DL_FLAG_PM_RUNTIME), the existing link should to be updated to reflect the "active" runtime PM configuration of the consumer-supplier pair and extra care must be taken here to avoid possible destructive races with runtime PM of the consumer. To that end, redefine the rpm_active field in struct device_link as a refcount, initialize it to 1 and make rpm_resume() (for the consumer) and device_link_add() increment it whenever they acquire a runtime PM reference on the supplier device. Accordingly, make rpm_suspend() (for the consumer) and pm_runtime_clean_up_links() decrement it and drop runtime PM references to the supplier device in a loop until rpm_active becones 1 again. Fixes: ead18c23c263 ("driver core: Introduce device links reference counting") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-01	dma-mapping: don't BUG when calling dma_map_resource on RAM	Christoph Hellwig
	Use WARN_ON_ONCE to print a stack trace and return a proper error code instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
2019-02-01	dma-mapping: remove the default map_resource implementation	Christoph Hellwig
	Instead provide a proper implementation in the direct mapping code, and also wire it up for arm and powerpc, leaving an error return for all the IOMMU or virtual mapping instances for which we'd have to wire up an actual implementation Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
2019-02-01	mfd: tps65218.c: Add input voltage options	Christian Hohnstaedt
	These options apply to all regulators in this chip. ti,strict-supply-voltage-supervision: Set STRICT flag in CONFIG1 ti,under-voltage-limit-microvolt: Select 2.75, 2.95, 3.25 or 3.35 V UVLO in CONFIG1 ti,under-voltage-hyst-microvolt: Select 200mV or 400mV UVLOHYS in CONFIG2 Signed-off-by: Christian Hohnstaedt <Christian.Hohnstaedt@wago.com> Tested-by: Keerthy <j-keerthy@ti.com> Reviewed-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-02-01	mfd: wm8350-core: Drop unused module infrastructure from non-modular code	Paul Gortmaker
	The Kconfig currently controlling compilation of this code is: drivers/mfd/Kconfig:config MFD_WM8350 drivers/mfd/Kconfig: bool ...meaning that it currently is not being built as a module by anyone. Lets remove the couple traces of modular infrastructure use, so that when reading the driver there is no doubt it is builtin-only. We delete the MODULE_LICENSE tag etc. since all that information is already contained at the top of the file in the comments. We replace module.h with init.h and export.h ; the latter since the file does export some symbols. Previous demodularizaion work has made wm8350_device_exit() no longer used, so it is also removed from the 8350 core code. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-02-01	mfd: wm831x-core: Drop unused module infrastructure from non-modular code	Paul Gortmaker
	The Kconfig currently controlling compilation of this code is: drivers/mfd/Kconfig:config MFD_WM831X drivers/mfd/Kconfig: bool ...meaning that it currently is not being built as a module by anyone. Lets remove the couple traces of modular infrastructure use, so that when reading the driver there is no doubt it is builtin-only. We delete the MODULE_LICENSE tag etc. since all that information is already contained at the top of the file in the comments. Previous demodularizaion work has made wm831x_device_exit() no longer used, so it is also removed from the 831x core code. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-02-01	Merge branches 'ib-mfd-iio-input-5.1', 'ib-mfd-input-watchdog-5.1' and ↵	Lee Jones
	'ib-mfd-platform-5.1' into ibs-for-mfd-merged
2019-02-01	mfd / platform: cros_ec: Move device sysfs attributes to its own driver	Enric Balletbo i Serra
	The entire way how cros debugfs attibutes are created is broken. cros_ec_sysfs should be its own driver and its attributes should be associated with the sysfs driver not the mfd driver. The patch also adds the sysfs documentation. Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-02-01	mfd / platform: cros_ec: Move debugfs attributes to its own driver	Enric Balletbo i Serra
	The entire way how cros debugfs attibutes are created is broken. cros_ec_debugfs should be its own driver and its attributes should be associated with a debugfs driver not the mfd driver. Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-02-01	mfd / platform: cros_ec: Move vbc attributes to its own driver	Enric Balletbo i Serra
	The entire way how cros sysfs attibutes are created is broken. cros_ec_vbc should be its own driver and its attributes should be associated with a vbc driver not the mfd driver. In order to retain the path, the vbc attributes are attached to the cros_class. The patch also adds the sysfs documentation. Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-02-01	mfd / platform: cros_ec: Move lightbar attributes to its own driver	Enric Balletbo i Serra
	The entire way how cros sysfs attibutes are created is broken. cros_ec_lightbar should be its own driver and its attributes should be associated with a lightbar driver not the mfd driver. In order to retain the path, the lightbar attributes are attached to the cros_class. The patch also adds the sysfs documentation. Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-02-01	mfd / platform: cros_ec: Use devm_mfd_add_devices	Enric Balletbo i Serra
	Use devm_mfd_add_devices() for adding cros-ec core MFD child devices. This reduces the need of remove callback from platform/chrome for removing the MFD child devices. Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-02-01	pipe: stop using ->can_merge	Jann Horn
	Al Viro pointed out that since there is only one pipe buffer type to which new data can be appended, it isn't necessary to have a ->can_merge field in struct pipe_buf_operations, we can just check for a magic type. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-01	splice: don't merge into linked buffers	Jann Horn
	Before this patch, it was possible for two pipes to affect each other after data had been transferred between them with tee(): ============ $ cat tee_test.c int main(void) { int pipe_a[2]; if (pipe(pipe_a)) err(1, "pipe"); int pipe_b[2]; if (pipe(pipe_b)) err(1, "pipe"); if (write(pipe_a[1], "abcd", 4) != 4) err(1, "write"); if (tee(pipe_a[0], pipe_b[1], 2, 0) != 2) err(1, "tee"); if (write(pipe_b[1], "xx", 2) != 2) err(1, "write"); char buf[5]; if (read(pipe_a[0], buf, 4) != 4) err(1, "read"); buf[4] = 0; printf("got back: '%s'\n", buf); } $ gcc -o tee_test tee_test.c $ ./tee_test got back: 'abxx' $ ============ As suggested by Al Viro, fix it by creating a separate type for non-mergeable pipe buffers, then changing the types of buffers in splice_pipe_to_pipe() and link_pipe(). Cc: <stable@vger.kernel.org> Fixes: 7c77f0b3f920 ("splice: implement pipe to pipe splicing") Fixes: 70524490ee2e ("[PATCH] splice: add support for sys_tee()") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-31	audit: remove unused actx param from audit_rule_match	Richard Guy Briggs
	The audit_rule_match() struct audit_context *actx parameter is not used by any in-tree consumers (selinux, apparmour, integrity, smack). The audit context is an internal audit structure that should only be accessed by audit accessor functions. It was part of commit 03d37d25e0f9 ("LSM/Audit: Introduce generic Audit LSM hooks") but appears to have never been used. Remove it. Please see the github issue https://github.com/linux-audit/audit-kernel/issues/107 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [PM: fixed the referenced commit title] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-01-31	bpf: run bpf programs with preemption disabled	Alexei Starovoitov
	Disabled preemption is necessary for proper access to per-cpu maps from BPF programs. But the sender side of socket filters didn't have preemption disabled: unix_dgram_sendmsg->sk_filter->sk_filter_trim_cap->bpf_prog_run_save_cb->BPF_PROG_RUN and a combination of af_packet with tun device didn't disable either: tpacket_snd->packet_direct_xmit->packet_pick_tx_queue->ndo_select_queue-> tun_select_queue->tun_ebpf_select_queue->bpf_prog_run_clear_cb->BPF_PROG_RUN Disable preemption before executing BPF programs (both classic and extended). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	ide: ensure atapi sense request aren't preempted	Jens Axboe
	There's an issue with how sense requests are handled in IDE. If ide-cd encounters an error, it queues a sense request. With how IDE request handling is done, this is the next request we need to handle. But it's impossible to guarantee this, as another request could come in between the sense being queued, and ->queue_rq() being run and handling it. If that request ALSO fails, then we attempt to doubly queue the single sense request we have. Since we only support one active request at the time, defer request processing when a sense request is queued. Fixes: 600335205b8d "ide: convert to blk-mq" Reported-by: He Zhe <zhe.he@windriver.com> Tested-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-31	cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix ↵	Oleg Nesterov
	the accounting The only user of cgroup_subsys->free() callback is pids_cgrp_subsys which needs pids_free() to uncharge the pid. However, ->free() is called from __put_task_struct()->cgroup_free() and this is too late. Even the trivial program which does for (;;) { int pid = fork(); assert(pid >= 0); if (pid) wait(NULL); else exit(0); } can run out of limits because release_task()->call_rcu(delayed_put_task_struct) implies an RCU gp after the task/pid goes away and before the final put(). Test-case: mkdir -p /tmp/CG mount -t cgroup2 none /tmp/CG echo '+pids' > /tmp/CG/cgroup.subtree_control mkdir /tmp/CG/PID echo 2 > /tmp/CG/PID/pids.max perl -e 'while ($p = fork) { wait; } $p // die "fork failed: $!\n"' & echo $! > /tmp/CG/PID/cgroup.procs Without this patch the forking process fails soon after migration. Rename cgroup_subsys->free() to cgroup_subsys->release() and move the callsite into the new helper, cgroup_release(), called by release_task() which actually frees the pid(s). Reported-by: Herton R. Krzesinski <hkrzesin@redhat.com> Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-01-31	async: Add support for queueing on specific NUMA node	Alexander Duyck
	Introduce four new variants of the async_schedule_ functions that allow scheduling on a specific NUMA node. The first two functions are async_schedule_near and async_schedule_near_domain end up mapping to async_schedule and async_schedule_domain, but provide NUMA node specific functionality. They replace the original functions which were moved to inline function definitions that call the new functions while passing NUMA_NO_NODE. The second two functions are async_schedule_dev and async_schedule_dev_domain which provide NUMA specific functionality when passing a device as the data member and that device has a NUMA node other than NUMA_NO_NODE. The main motivation behind this is to address the need to be able to schedule device specific init work on specific NUMA nodes in order to improve performance of memory initialization. I have seen a significant improvement in initialziation time for persistent memory as a result of this approach. In the case of 3TB of memory on a single node the initialization time in the worst case went from 36s down to about 26s for a 10s improvement. As such the data shows a general benefit for affinitizing the async work to the node local to the device. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31	workqueue: Provide queue_work_node to queue work near a given NUMA node	Alexander Duyck
	Provide a new function, queue_work_node, which is meant to schedule work on a "random" CPU of the requested NUMA node. The main motivation for this is to help assist asynchronous init to better improve boot times for devices that are local to a specific node. For now we just default to the first CPU that is in the intersection of the cpumask of the node and the online cpumask. The only exception is if the CPU is local to the node we will just use the current CPU. This should work for our purposes as we are currently only using this for unbound work so the CPU will be translated to a node anyway instead of being directly used. As we are only using the first CPU to represent the NUMA node for now I am limiting the scope of the function so that it can only be used with unbound workqueues. Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31	irqchip/gic-v3-its: Fix ITT_entry_size accessor	Zenghui Yu
	According to ARM IHI 0069C (ID070116), we should use GITS_TYPER's bits [7:4] as ITT_entry_size instead of [8:4]. Although this is pretty annoying, it only results in a potential over-allocation of memory, and nothing bad happens. Fixes: 3dfa576bfb45 ("irqchip/gic-v3-its: Add probing for VLPI properties") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> [maz: massaged subject and commit message] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-01-31	PM-runtime: Replace jiffies-based accounting with ktime-based accounting	Thara Gopinath
	Replace jiffies-based accounting for runtime_active_time and runtime_suspended_time with ktime-based accounting. This makes the runtime debug counters inline with genpd and other PM subsytems which use ktime-based accounting. Timekeeping is initialized before driver_init(). It's only at that time that PM-runtime can be enabled. Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> [switch from ktime to raw nsec] Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-31	bpf, cgroups: clean up kerneldoc warnings	Valdis Kletnieks
	Building with W=1 reveals some bitrot: CC kernel/bpf/cgroup.o kernel/bpf/cgroup.c:238: warning: Function parameter or member 'flags' not described in '__cgroup_bpf_attach' kernel/bpf/cgroup.c:367: warning: Function parameter or member 'unused_flags' not described in '__cgroup_bpf_detach' Add a kerneldoc line for 'flags'. Fixing the warning for 'unused_flags' is best approached by removing the unused parameter on the function call. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	bpf: fix missing prototype warnings	Valdis Kletnieks
	Compiling with W=1 generates warnings: CC kernel/bpf/core.o kernel/bpf/core.c:721:12: warning: no previous prototype for ?bpf_jit_alloc_exec_limit? [-Wmissing-prototypes] 721 \| u64 __weak bpf_jit_alloc_exec_limit(void) \| ^~~~~~~~~~~~~~~~~~~~~~~~ kernel/bpf/core.c:757:14: warning: no previous prototype for ?bpf_jit_alloc_exec? [-Wmissing-prototypes] 757 \| void __weak bpf_jit_alloc_exec(unsigned long size) \| ^~~~~~~~~~~~~~~~~~ kernel/bpf/core.c:762:13: warning: no previous prototype for ?bpf_jit_free_exec? [-Wmissing-prototypes] 762 \| void __weak bpf_jit_free_exec(void addr) \| ^~~~~~~~~~~~~~~~~ All three are weak functions that archs can override, provide proper prototypes for when a new arch provides their own. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	bpf: BPF_PROG_TYPE_CGROUP_{SKB, SOCK, SOCK_ADDR} require cgroups enabled	Stanislav Fomichev
	There is no way to exercise appropriate attach points without cgroups enabled. This lets test_verifier correctly skip tests for these prog_types if kernel was compiled without BPF cgroup support. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-30	net: stmmac: Fallback to Platform Data clock in Watchdog conversion	Jose Abreu
	If we don't have DT then stmmac_clk will not be available. Let's add a new Platform Data field so that we can specify the refclk by this mean. This way we can still use the coalesce command in PCI based setups. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	ipvlan, l3mdev: fix broken l3s mode wrt local routes	Daniel Borkmann
	While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin, I ran into the issue that while l3 mode is working fine, l3s mode does not have any connectivity to kube-apiserver and hence all pods end up in Error state as well. The ipvlan master device sits on top of a bond device and hostns traffic to kube-apiserver (also running in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573 where the latter is the address of the bond0. While in l3 mode, a curl to https://10.152.183.1:443 or to https://139.178.29.207:37573 works fine from hostns, neither of them do in case of l3s. In the latter only a curl to https://127.0.0.1:37573 appeared to work where for local addresses of bond0 I saw kernel suddenly starting to emit ARP requests to query HW address of bond0 which remained unanswered and neighbor entries in INCOMPLETE state. These ARP requests only happen while in l3s. Debugging this further, I found the issue is that l3s mode is piggy- backing on l3 master device, and in this case local routes are using l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback"). I found that reverting them back into using the net->loopback_dev fixed ipvlan l3s connectivity and got everything working for the CNI. Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the l3mdev paper in [0] the only sole reason why ipvlan l3s is relying on l3 master device is to get the l3mdev_ip_rcv() receive hook for setting the dst entry of the input route without adding its own ipvlan specific hacks into the receive path, however, any l3 domain semantics beyond just that are breaking l3s operation. Note that ipvlan also has the ability to dynamically switch its internal operation from l3 to l3s for all ports via ipvlan_set_port_mode() at runtime. In any case, l3 vs l3s soley distinguishes itself by 'de-confusing' netfilter through switching skb->dev to ipvlan slave device late in NF_INET_LOCAL_IN before handing the skb to L4. Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which, if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook without any additional l3mdev semantics on top. This should also have minimal impact since dev->priv_flags is already hot in cache. With this set, l3s mode is working fine and I also get things like masquerading pod traffic on the ipvlan master properly working. [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant") Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback") Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Mahesh Bandewar <maheshb@google.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Florian Westphal <fw@strlen.de> Cc: Martynas Pumputis <m@lambda.lt> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	audit: ignore fcaps on umount	Richard Guy Briggs
	Don't fetch fcaps when umount2 is called to avoid a process hang while it waits for the missing resource to (possibly never) re-appear. Note the comment above user_path_mountpoint_at(): * A umount is a special case for path walking. We're not actually interested * in the inode in this situation, and ESTALE errors can be a problem. We * simply want track down the dentry and vfsmount attached at the mountpoint * and avoid revalidating the last component. This can happen on ceph, cifs, 9p, lustre, fuse (gluster) or NFS. Please see the github issue tracker https://github.com/linux-audit/audit-kernel/issues/100 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [PM: merge fuzz in audit_log_fcaps()] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-01-30	spi: support inter-word delay requirement for devices	Jonas Bonn
	Some devices are slow and cannot keep up with the SPI bus and therefore require a short delay between words of the SPI transfer. The example of this that I'm looking at is a SAMA5D2 with a minimum SPI clock of 400kHz talking to an AVR-based SPI slave. The AVR cannot put bytes on the bus fast enough to keep up with the SoC's SPI controller even at the lowest bus speed. This patch introduces the ability to specify a required inter-word delay for SPI devices. It is up to the controller driver to configure itself accordingly in order to introduce the requested delay. Note that, for spi_transfer, there is already a field word_delay that provides similar functionality. This field, however, is specified in clock cycles (and worse, SPI controller cycles, not SCK cycles); that makes this value dependent on the master clock instead of the device clock for which the delay is intended to provide some relief. This patch leaves this old word_delay in place and provides a time-based word_delay_us alongside it; the new field fits in the struct padding so struct size is constant. There is only one in-kernel user of the word_delay field and presumably that driver could be reworked to use the time-based value instead. The time-based delay is limited to 8 bits as these delays are intended to be short. The SAMA5D2 that I've tested this on limits delays to a maximum of ~100us, which is already many word-transfer periods even at the minimum transfer speed supported by the controller. Signed-off-by: Jonas Bonn <jonas@norrbonn.se> CC: Mark Brown <broonie@kernel.org> CC: Rob Herring <robh+dt@kernel.org> CC: Mark Rutland <mark.rutland@arm.com> CC: linux-spi@vger.kernel.org CC: devicetree@vger.kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2019-01-30	vfs: Introduce logging functions	David Howells
	Introduce a set of logging functions through which informational messages, warnings and error messages incurred by the mount procedure can be logged and, in a future patch, passed to userspace instead by way of the filesystem configuration context file descriptor. There are four functions: (1) infof(const char fmt, ...); Logs an informational message. (2) warnf(const char fmt, ...); Logs a warning message. (3) errorf(const char fmt, ...); Logs an error message. (4) invalf(const char fmt, ...); As errof(), but returns -EINVAL so can be used on a return statement. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-30	introduce fs_context methods	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-30	fs_context flavour for submounts	Al Viro
	This is an eventual replacement for vfs_submount() uses. Unlike the "mount" and "remount" cases, the users of that thing are not in VFS - they are buried in various ->d_automount() instances and rather than converting them all at once we introduce the (thankfully small and simple) infrastructure here and deal with the prospective users in afs, nfs, etc. parts of the series. Here we just introduce a new constructor (fs_context_for_submount()) along with the corresponding enum constant to be put into fc->purpose for those. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-30	convert do_remount_sb() to fs_context	David Howells
	Replace do_remount_sb() with a function, reconfigure_super(), that's fs_context aware. The fs_context is expected to be parameterised already and have ->root pointing to the superblock to be reconfigured. A legacy wrapper is provided that is intended to be called from the fs_context ops when those appear, but for now is called directly from reconfigure_super(). This wrapper invokes the ->remount_fs() superblock op for the moment. It is intended that the remount_fs() op will be phased out. The fs_context->purpose is set to FS_CONTEXT_FOR_RECONFIGURE to indicate that the context is being used for reconfiguration. do_umount_root() is provided to consolidate remount-to-R/O for umount and emergency remount by creating a context and invoking reconfiguration. do_remount(), do_umount() and do_emergency_remount_callback() are switched to use the new process. [AV -- fold UMOUNT and EMERGENCY_REMOUNT in; fixes the umount / bug, gets rid of pointless complexity] [AV -- set ->net_ns in all cases; nfs remount will need that] [AV -- shift security_sb_remount() call into reconfigure_super(); the callers that didn't do security_sb_remount() have NULL fc->security anyway, so it's a no-op for them] Signed-off-by: David Howells <dhowells@redhat.com> Co-developed-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-30	teach vfs_get_tree() to handle subtype, switch do_new_mount() to it	Al Viro
	Roll the handling of subtypes into do_new_mount() and vfs_get_tree(). The former determines any subtype string and hangs it off the fs_context; the latter applies it. Make do_new_mount() create, parameterise and commit an fs_context and create a mount for itself rather than calling vfs_kern_mount(). [AV -- missing kstrdup()] [AV -- ... and no kstrdup() if we get to setting ->s_submount - we simply transfer it from fc, leaving NULL behind] [AV -- constify ->s_submount, while we are at it] Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-30	new helpers: vfs_create_mount(), fc_mount()	Al Viro
	Create a new helper, vfs_create_mount(), that creates a detached vfsmount object from an fs_context that has a superblock attached to it. Almost all uses will be paired with immediately preceding vfs_get_tree(); add a helper for such combination. Switch vfs_kern_mount() to use this. NOTE: mild behaviour change; passing NULL as 'device name' to something like procfs will change /proc/*/mountstats - "device none" instead on "no device". That is consistent with /proc/mounts et.al. [do'h - EXPORT_SYMBOL_GPL slipped in by mistake; removed] [AV -- remove confused comment from vfs_create_mount()] [AV -- removed the second argument] Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-30	vfs: Introduce fs_context, switch vfs_kern_mount() to it.	David Howells
	Introduce a filesystem context concept to be used during superblock creation for mount and superblock reconfiguration for remount. This is allocated at the beginning of the mount procedure and into it is placed: (1) Filesystem type. (2) Namespaces. (3) Source/Device names (there may be multiple). (4) Superblock flags (SB_*). (5) Security details. (6) Filesystem-specific data, as set by the mount options. Accessor functions are then provided to set up a context, parameterise it from monolithic mount data (the data page passed to mount(2)) and tear it down again. A legacy wrapper is provided that implements what will be the basic operations, wrapping access to filesystems that aren't yet aware of the fs_context. Finally, vfs_kern_mount() is changed to make use of the fs_context and mount_fs() is replaced by vfs_get_tree(), called from vfs_kern_mount(). [AV -- add missing kstrdup()] [AV -- put_cred() can be unconditional - fc->cred can't be NULL] [AV -- take legacy_validate() contents into legacy_parse_monolithic()] [AV -- merge KERNEL_MOUNT and USER_MOUNT] [AV -- don't unlock superblock on success return from vfs_get_tree()] [AV -- kill 'reference' argument of init_fs_context()] Signed-off-by: David Howells <dhowells@redhat.com> Co-developed-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-30	Merge tag 'soc-fsl-next-v5.1-2' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux into arm/drivers NXP/FSL SoC driver updates for v5.1 DPIO driver - Clean up the remove path in the dpio driver so that successive bind/unbind commands behave properly - Add the ability to automatically create a device link between a consumer device on the fsl-mc bus and a supplier one - Add prefetch to dpio dequeue to improve performance - Update the type of dpio APIs to align with buffer pool id register field guts driver - Prevent allocation failure by reuse the machine type data from device tree directly * tag 'soc-fsl-next-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux: soc: fsl: guts: reuse machine name from device tree soc: fsl: dpio: Change bpid type to u16 soc: fsl: dpio: Add prefetch instruction bus: fsl-mc: automatically add a device_link on fsl_mc_[portal,object]_allocate soc: fsl: dpio: add a device_link at dpaa2_io_service_register soc: fsl: dpio: store a backpointer to the device backing the dpaa2_io soc: fsl: dpio: keep a per dpio device MC portal soc: fsl: dpio: perform DPIO Reset on Probe soc: fsl: dpio: use a cpumask to identify which cpus are unused soc: fsl: dpio: cleanup the cpu array on dpaa2_io_down Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-01-30	cpufreq: Auto-register the driver as a thermal cooling device if asked	Amit Kucheria
	All cpufreq drivers do similar things to register as a cooling device. Provide a cpufreq driver flag so drivers can just ask the cpufreq core to register the cooling device on their behalf. This allows us to get rid of duplicated code in the drivers. In order to allow this, we add a struct thermal_cooling_device pointer to struct cpufreq_policy so that drivers don't need to store it in a private data structure. Suggested-by: Stephen Boyd <swboyd@chromium.org> Suggested-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Tested-by: Matthias Kaehlcke <mka@chromium.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-30	PM-runtime: Fix deadlock with ktime_get()	Vincent Guittot
	A deadlock has been seen when swicthing clocksources which use PM-runtime. The call path is: change_clocksource ... write_seqcount_begin ... timekeeping_update ... sh_cmt_clocksource_enable ... rpm_resume pm_runtime_mark_last_busy ktime_get do read_seqcount_begin while read_seqcount_retry .... write_seqcount_end Although we should be safe because we haven't yet changed the clocksource at that time, we can't do that because of seqcount protection. Use ktime_get_mono_fast_ns() instead which is lock safe for such cases. With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be monotonic across an update and as a result can goes backward. According to update_fast_timekeeper() description: "In the worst case, this can result is a slightly wrong timestamp (a few nanoseconds)". For PM-runtime autosuspend, this means only that the suspend decision may be slightly suboptimal. Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers") Reported-by: Biju Das <biju.das@bp.renesas.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-30	fs/dcache: Track & report number of negative dentries	Waiman Long
	The current dentry number tracking code doesn't distinguish between positive & negative dentries. It just reports the total number of dentries in the LRU lists. As excessive number of negative dentries can have an impact on system performance, it will be wise to track the number of positive and negative dentries separately. This patch adds tracking for the total number of negative dentries in the system LRU lists and reports it in the 5th field in the /proc/sys/fs/dentry-state file. The number, however, does not include negative dentries that are in flight but not in the LRU yet as well as those in the shrinker lists which are on the way out anyway. The number of positive dentries in the LRU lists can be roughly found by subtracting the number of negative dentries from the unused count. Matthew Wilcox had confirmed that since the introduction of the dentry_stat structure in 2.1.60, the dummy array was there, probably for future extension. They were not replacements of pre-existing fields. So no sane applications that read the value of /proc/sys/fs/dentry-state will do dummy thing if the last 2 fields of the sysctl parameter are not zero. IOW, it will be safe to use one of the dummy array entry for negative dentry count. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-30	fs: Don't need to put list_lru into its own cacheline	Waiman Long
	The list_lru structure is essentially just a pointer to a table of per-node LRU lists. Even if CONFIG_MEMCG_KMEM is defined, the list field is just used for LRU list registration and shrinker_id is set at initialization. Those fields won't need to be touched that often. So there is no point to make the list_lru structures to sit in their own cachelines. Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-30	cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM	Josh Poimboeuf
	With the following commit: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") ... the hotplug code attempted to detect when SMT was disabled by BIOS, in which case it reported SMT as permanently disabled. However, that code broke a virt hotplug scenario, where the guest is booted with only primary CPU threads, and a sibling is brought online later. The problem is that there doesn't seem to be a way to reliably distinguish between the HW "SMT disabled by BIOS" case and the virt "sibling not yet brought online" case. So the above-mentioned commit was a bit misguided, as it permanently disabled SMT for both cases, preventing future virt sibling hotplugs. Going back and reviewing the original problems which were attempted to be solved by that commit, when SMT was disabled in BIOS: 1) /sys/devices/system/cpu/smt/control showed "on" instead of "notsupported"; and 2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning. I'd propose that we instead consider #1 above to not actually be a problem. Because, at least in the virt case, it's possible that SMT wasn't disabled by BIOS and a sibling thread could be brought online later. So it makes sense to just always default the smt control to "on" to allow for that possibility (assuming cpuid indicates that the CPU supports SMT). The real problem is #2, which has a simple fix: change vmx_vm_init() to query the actual current SMT state -- i.e., whether any siblings are currently online -- instead of looking at the SMT "control" sysfs value. So fix it by: a) reverting the original "fix" and its followup fix: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") bc2d8d262cba ("cpu/hotplug: Fix SMT supported evaluation") and b) changing vmx_vm_init() to query the actual current SMT state -- instead of the sysfs control value -- to determine whether the L1TF warning is needed. This also requires the 'sched_smt_present' variable to exported, instead of 'cpu_smt_control'. Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joe Mario <jmario@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
2019-01-30	x86/hw_breakpoints, kprobes: Remove kprobes ifdeffery	Borislav Petkov
	Remove the ifdeffery in the breakpoint parsing arch_build_bp_info() by adding a within_kprobe_blacklist() stub for the !CONFIG_KPROBES case. It is returning true when kprobes are not enabled to mean that any address is within the kprobes blacklist on such kernels and thus not allow kernel breakpoints on non-kprobes kernels. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190127131237.4557-1-bp@alien8.de
2019-01-29	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller

2019-01-29	x86/speculation: Add PR_SPEC_DISABLE_NOEXEC	Waiman Long
	With the default SPEC_STORE_BYPASS_SECCOMP/SPEC_STORE_BYPASS_PRCTL mode, the TIF_SSBD bit will be inherited when a new task is fork'ed or cloned. It will also remain when a new program is execve'ed. Only certain class of applications (like Java) that can run on behalf of multiple users on a single thread will require disabling speculative store bypass for security purposes. Those applications will call prctl(2) at startup time to disable SSB. They won't rely on the fact the SSB might have been disabled. Other applications that don't need SSBD will just move on without checking if SSBD has been turned on or not. The fact that the TIF_SSBD is inherited across execve(2) boundary will cause performance of applications that don't need SSBD but their predecessors have SSBD on to be unwittingly impacted especially if they write to memory a lot. To remedy this problem, a new PR_SPEC_DISABLE_NOEXEC argument for the PR_SET_SPECULATION_CTRL option of prctl(2) is added to allow applications to specify that the SSBD feature bit on the task structure should be cleared whenever a new program is being execve'ed. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Jiri Kosina <jikos@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: KarimAllah Ahmed <karahmed@amazon.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: https://lkml.kernel.org/r/1547676096-3281-1-git-send-email-longman@redhat.com
2019-01-29	Merge branch 'devx-async' into k.o/for-next	Jason Gunthorpe
	Yishai Hadas says: Enable DEVX asynchronous query commands This series enables querying a DEVX object in an asynchronous mode. The userspace application won't block when calling the firmware and it will be able to get the response back once that it will be ready. To enable the above functionality: - DEVX asynchronous command completion FD object was introduced. - The applicable file operations were implemented to enable using it by the user application. - Query asynchronous method was added to the DEVX object, it will call the firmware asynchronously and manages the response on the given input FD. - Hot unplug support was added for the FD to work properly upon unbind/disassociate. - mlx5 core fence for asynchronous commands was implemented and used to prevent racing upon unbind/disassociate. This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * branch 'devx-async': IB/mlx5: Implement DEVX hot unplug for async command FD IB/mlx5: Implement the file ops of DEVX async command FD IB/mlx5: Introduce async DEVX obj query API IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29	sched: Remove stale PF_MUTEX_TESTER bit	Thomas Gleixner
	The RTMUTEX tester was removed long ago but the PF bit stayed around. Remove it and free up the space. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>