linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-10-11	net/mlx4_core: Fix warnings during boot on driverinit param set failures	Moshe Shemesh
	During boot, mlx4_core sets the driverinit configuration parameters and updates the devlink module on the initial values calling devlink_param_driverinit_value_set(). If devlink_param_driverinit_value_set() returns an error mlx4_core reports kernel module warning. This caused false alarm during boot in case kernel was compiled with CONFIG_NET_DEVLINK off. Fix by removing warning reported in case devlink_param_driverinit_value_set() fails. This actually makes the function mlx4_devlink_set_init_value() redundant to using directly devlink_param_driverinit_value_set() and so removed. It fixes the following kernel trace: mlx4_core 0000:00:06.0: devlink set parameter 0 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 1 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 4 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 5 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 3 value failed (err = -95) Fixes: bd1b51dc66df ("mlx4: Add mlx4 initial parameters table and register it") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11	Merge branch 'for-4.19-fixes' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Tejun writes: "cgroup fixes for v4.19-rc7 One cgroup2 threaded mode fix for v4.19-rc7. While threaded mode isn't used widely (yet) and the bug requires somewhat convoluted sequence of operations, it causes a userland visible malfunction - EINVAL on a valid attempt to enable threaded mode. This pull request contains the fix" * 'for-4.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Fix dom_cgrp propagation when enabling threaded mode
2018-10-11	tipc: eliminate possible recursive locking detected by LOCKDEP	Ying Xue
	When booting kernel with LOCKDEP option, below warning info was found: WARNING: possible recursive locking detected 4.19.0-rc7+ #14 Not tainted -------------------------------------------- swapper/0/1 is trying to acquire lock: 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850 but task is already holding lock: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&list->lock)->rlock#4); lock(&(&list->lock)->rlock#4); * DEADLOCK * May be due to missing lock nesting notation 2 locks held by swapper/0/1: #0: 00000000f7539d34 (pernet_ops_rwsem){+.+.}, at: register_pernet_subsys+0x19/0x40 net/core/net_namespace.c:1051 #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849 stack backtrace: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc7+ #14 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1af/0x295 lib/dump_stack.c:113 print_deadlock_bug kernel/locking/lockdep.c:1759 [inline] check_deadlock kernel/locking/lockdep.c:1803 [inline] validate_chain kernel/locking/lockdep.c:2399 [inline] __lock_acquire+0xf1e/0x3c60 kernel/locking/lockdep.c:3411 lock_acquire+0x1db/0x520 kernel/locking/lockdep.c:3900 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:334 [inline] tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850 tipc_link_bc_create+0xb5/0x1f0 net/tipc/link.c:526 tipc_bcast_init+0x59b/0xab0 net/tipc/bcast.c:521 tipc_init_net+0x472/0x610 net/tipc/core.c:82 ops_init+0xf7/0x520 net/core/net_namespace.c:129 __register_pernet_operations net/core/net_namespace.c:940 [inline] register_pernet_operations+0x453/0xac0 net/core/net_namespace.c:1011 register_pernet_subsys+0x28/0x40 net/core/net_namespace.c:1052 tipc_init+0x83/0x104 net/tipc/core.c:140 do_one_initcall+0x109/0x70a init/main.c:885 do_initcall_level init/main.c:953 [inline] do_initcalls init/main.c:961 [inline] do_basic_setup init/main.c:979 [inline] kernel_init_freeable+0x4bd/0x57f init/main.c:1144 kernel_init+0x13/0x180 init/main.c:1063 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413 The reason why the noise above was complained by LOCKDEP is because we nested to hold l->wakeupq.lock and l->inputq->lock in tipc_link_reset function. In fact it's unnecessary to move skb buffer from l->wakeupq queue to l->inputq queue while holding the two locks at the same time. Instead, we can move skb buffers in l->wakeupq queue to a temporary list first and then move the buffers of the temporary list to l->inputq queue, which is also safe for us. Fixes: 3f32d0be6c16 ("tipc: lock wakeup & inputq at tipc_link_reset()") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11	Merge tag 'kbuild-fixes-v4.19-2' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Masahiro writes: "Kbuild fixes for v4.19 (2nd) - Fix warnings from recordmcount.pl when building with Clang - Allow Clang to use GNU toolchains correctly - Disable CONFIG_SAMPLES for UML to avoid build error" * tag 'kbuild-fixes-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: samples: disable CONFIG_SAMPLES for UML kbuild: allow to use GCC toolchain not in Clang search path ftrace: Build with CPPFLAGS to get -Qunused-arguments
2018-10-11	Merge branch 'net-explicitly-requires-bash-when-needed'	David S. Miller
	Paolo Abeni says: ==================== net: explicitly requires bash when needed. Some test scripts require bash-only features but use the default shell. This may cause random failures if the default shell is not bash. Instead of doing a potentially complex rewrite of such scripts, these patches require the bash interpreter, where needed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11	selftests: udpgso_bench.sh explicitly requires bash	Paolo Abeni
	The udpgso_bench.sh script requires several bash-only features. This may cause random failures if the default shell is not bash. Address the above explicitly requiring bash as the script interpreter Fixes: 3a687bef148d ("selftests: udp gso benchmark") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11	selftests: rtnetlink.sh explicitly requires bash.	Paolo Abeni
	the script rtnetlink.sh requires a bash-only features (sleep with sub-second precision). This may cause random test failure if the default shell is not bash. Address the above explicitly requiring bash as the script interpreter. Fixes: 33b01b7b4f19 ("selftests: add rtnetlink test script") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11	Merge tag 'alloc-args-v4.19-rc8' of ↵	Greg Kroah-Hartman
	https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Kees writes: "Fix open-coded multiplication arguments to allocators - Fixes several new open-coded multiplications added in the 4.19 merge window." * tag 'alloc-args-v4.19-rc8' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: treewide: Replace more open-coded allocation size multiplications
2018-10-11	block: describe difference between flags IO_STAT and STATS	Konstantin Khlebnikov
	This adds reasonable comments, but they definitely needs better names. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-11	s390/vmalloc: fix VMALLOC_START calculation	Mikhail Zaslonko
	With the introduction of the module area on top of the vmalloc area, the calculation of VMALLOC_START in setup_memory_end() function hasn't been adjusted. As a result we got vmalloc area 2 Gb (MODULES_LEN) smaller than it should be and the preceding vmemmap area got extra memory instead. The patch fixes this calculation error although there were no visible negative effects. Apart from that, change 'tmp' variable to 'vmemmap' in memory_end calculation for better readability. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-11	spi/spi-pxa2xx: add PXA2xx SSP SPI Controller	Lubomir Rintel
	This is the SPI controller found on Marvel MMP2 and perhaps more platforms. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: pxa2xx: Add devicetree support	Lubomir Rintel
	The MMP2 platform, that uses device tree, has this controller. Let's add devicetree alongside platform & PCI. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: pxa2xx: Use an enum for type	Lubomir Rintel
	That seems to be the correct type. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: spi-geni-qcom: Add SPI driver support for GENI based QUP	Girish Mahadevan
	This driver supports GENI based SPI Controller in the Qualcomm SOCs. The Qualcomm Generic Interface (GENI) is a programmable module supporting a wide range of serial interfaces including SPI. This driver supports SPI operations using FIFO mode of transfer. Signed-off-by: Girish Mahadevan <girishm@codeaurora.org> Signed-off-by: Dilip Kota <dkota@codeaurora.org> Signed-off-by: Alok Chauhan <alokc@codeaurora.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: soc: qcom: GENI SE SPI controller device tree binding	Dilip Kota
	Move GENI SE SPI controller device-tree bindings from devicetree/bindings/soc/qcom/qcom,geni-se.txt to devicetree/bindings/spi/qcom,spi-geni-qcom.txt. Signed-off-by: Dilip Kota <dkota@codeaurora.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Alok Chauhan <alokc@codeaurora.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	dt-bindings: soc: qcom: Remove SPI controller maximum frequency binding	Dilip Kota
	SPI controller driver should maintain the maximum frequency of the controller instead of relying on device tree bindings. Because maximum frequency is specific property of SPI controller. Signed-off-by: Dilip Kota <dkota@codeaurora.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Alok Chauhan <alokc@codeaurora.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: rockchip: simplify spi enable logic	Emil Renner Berthing
	Let the dma/non-dma code paths handle the spi enable flag themselves. This removes some logic to determine if the flag should be turned on before or after dma and also don't leave the spi enabled if the dma path fails. Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: rockchip: directly use direction constants	Emil Renner Berthing
	The dma direction for the tx and rx dma channels never change, so just use the constants directly rather than storing them in device data. Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: rockchip: mark use_dma as bool	Emil Renner Berthing
	The driver data has a u32 field use_dma which is only ever used as a boolean, so change its type to reflect that. Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: rockchip: remove unneeded dma_caps	Emil Renner Berthing
	We no longer need the dma_caps since the dma driver already clamps the burst length to the hardware limit, so don't request and store dma_caps in device data. Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: rockchip: adjust dma watermark and burstlen	Huibin Hong
	Signal tx dma when spi fifo is less than half full, and limit tx bursts to half the fifo length. Clamp rx burst length to 1 to avoid alignment issues. Signed-off-by: Huibin Hong <huibin.hong@rock-chips.com> Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	perf vendor events intel: Fix wrong filter_band* values for uncore events	Jiri Olsa
	Michael reported that he could not stat following event: $ perf stat -e unc_p_freq_ge_1200mhz_cycles -a -- ls event syntax error: '..e_1200mhz_cycles' \___ value too big for format, maximum is 255 Run 'perf list' for a list of valid events The event is unwrapped into: uncore_pcu/event=0xb,filter_band0=1200/ where filter_band0 format says it's one byte only: # cat uncore_pcu/format/filter_band0 config1:0-7 while JSON files specifies bigger number: "Filter": "filter_band0=1200", all the filter_band* formats show 1 byte width: # cat uncore_pcu/format/filter_band1 config1:8-15 # cat uncore_pcu/format/filter_band2 config1:16-23 # cat uncore_pcu/format/filter_band3 config1:24-31 The reason of the issue is that filter_band* values are supposed to be in 100Mhz units.. it's stated in the JSON help for the events, like: filter_band3=XXX, with XXX in 100Mhz units This patch divides the filter_band* values by 100, plus there's couple of changes that actually change the number completely, like: - "Filter": "edge=1,filter_band2=4000", + "Filter": "edge=1,filter_band2=30", Reported-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20181010080339.GB15790@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-11	spi: rockchip: initialize dma_slave_config properly	Huibin Hong
	The rxconf and txconf structs are allocated on the stack, so make sure we zero them before filling out the relevant fields. Signed-off-by: Huibin Hong <huibin.hong@rock-chips.com> Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: Introduce new driver for Qualcomm QuadSPI controller	Girish Mahadevan
	New driver for Qualcomm QuadSPI(QSPI) controller that is used to communicate with slaves such as flash memory devices. The QSPI controller can operate in 2 or 4 wire mode but only supports SPI Mode 0. The controller can also operate in Single or Dual data rate modes. Signed-off-by: Girish Mahadevan <girishm@codeaurora.org> Signed-off-by: Ryan Case <ryandcase@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: Qualcomm Quad SPI(QSPI) documentation	Girish Mahadevan
	Bindings for Qualcomm Quad SPI used on SoCs such as sdm845. Signed-off-by: Girish Mahadevan <girishm@codeaurora.org> Signed-off-by: Ryan Case <ryandcase@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	dw: spi: add support for Amazon's Alpine spi controller	Talel Shenhar
	Add support for a new devicetree compatible string called 'amazon,alpine-apb-ssi', which is necessary for the Amazon Alpine spi controller. 'amazon,alpine-dw-apb-ssi' is used in the dw spi driver if specified in the devicetree. Otherwise, fall back to driver default behavior, i.e. original dw IP hw driver behavior. Signed-off-by: Talel Shenhar <talel@amazon.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	spi: dw: add compatible for Amazon's Alpine spi controller	Talel Shenhar
	This compatible adds the ability for dw spi controller driver to work with the dw spi controller found on Alpine chips. The dw spi controller has an auto-deselect of Chip-Select, in case there is no data inside the Tx FIFO. While working on platforms with Alpine chips, auto-deselect mode causes an issue for some spi devices that can't handle the Chip-Select deselect in the middle of a transaction. It is a normal behavior for a Tx FIFO to be empty in the middle of a transaction, due to busy cpu. In the Alpine chip family an option to change the default behavior was added to the original dw spi controller to prevent this issue of de-asserting Chip-Select once TX FIFO is empty. The change was to allow SW manual control of the Chip-Select. With this change, as long as the Slave Enable Register is asserted, the Chip-Select will be asserted. As a result, it is necessary to deselect the Slave Select Register once the transaction is done. This feature is enabled via a new device compatible string called 'amazon,alpine-dw-apb-ssi'. Once the driver identifies the new compatible string, it enables the hw fixup logic, by writing to a dedicated register found in the IP reserved area and will start manual deselecting the Slave Select Register when the transfer ends. Signed-off-by: Talel Shenhar <talel@amazon.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-11	Merge tag 'qcom-geni-immutable-for-mark-brown' of ↵	Mark Brown
	git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into spi-4.20 Immutable branch for QCOM Geni patches
2018-10-11	spi: bcm-qspi: switch back to reading flash using smaller chunks	Rafał Miłecki
	Fixing/optimizing bcm_qspi_bspi_read() performance introduced two changes: 1) It added a loop to read all requested data using multiple BSPI ops. 2) It bumped max size of a single BSPI block request from 256 to 512 B. The later change resulted in occasional BSPI timeouts causing a regression. For some unknown reason hardware doesn't always handle reads as expected when using 512 B chunks. In such cases it may happen that BSPI returns amount of requested bytes without the last 1-3 ones. It provides the remaining bytes later but doesn't raise an interrupt until another LR start. Switching back to 256 B reads fixes that problem and regression. Fixes: 345309fa7c0c ("spi: bcm-qspi: Fix bcm_qspi_bspi_read() performance") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
2018-10-11	spi: bcm-qspi: fix calculation of address length	Rafał Miłecki
	During implementation of the new API bcm_qspi_bspi_set_flex_mode() has been modified breaking calculation of address length. An unnecessary multiplication was added breaking flash reads. Fixes: 5f195ee7d830 ("spi: bcm-qspi: Implement the spi_mem interface") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
2018-10-11	xfrm: policy: use hlist rcu variants on insert	Florian Westphal
	bydst table/list lookups use rcu, so insertions must use rcu versions. Fixes: a7c44247f704e ("xfrm: policy: make xfrm_policy_lookup_bytype lockless") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-10-11	net/xfrm: fix out-of-bounds packet access	Alexei Starovoitov
	BUG: KASAN: slab-out-of-bounds in _decode_session6+0x1331/0x14e0 net/ipv6/xfrm6_policy.c:161 Read of size 1 at addr ffff8801d882eec7 by task syz-executor1/6667 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430 _decode_session6+0x1331/0x14e0 net/ipv6/xfrm6_policy.c:161 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2299 xfrm_decode_session include/net/xfrm.h:1232 [inline] vti6_tnl_xmit+0x3c3/0x1bc1 net/ipv6/ip6_vti.c:542 __netdev_start_xmit include/linux/netdevice.h:4313 [inline] netdev_start_xmit include/linux/netdevice.h:4322 [inline] xmit_one net/core/dev.c:3217 [inline] dev_hard_start_xmit+0x272/0xc10 net/core/dev.c:3233 __dev_queue_xmit+0x2ab2/0x3870 net/core/dev.c:3803 dev_queue_xmit+0x17/0x20 net/core/dev.c:3836 Reported-by: syzbot+acffccec848dc13fe459@syzkaller.appspotmail.com Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-10-11	sched/fair: Fix throttle_list starvation with low CFS quota	Phil Auld
	With a very low cpu.cfs_quota_us setting, such as the minimum of 1000, distribute_cfs_runtime may not empty the throttled_list before it runs out of runtime to distribute. In that case, due to the change from c06f04c7048 to put throttled entries at the head of the list, later entries on the list will starve. Essentially, the same X processes will get pulled off the list, given CPU time and then, when expired, get put back on the head of the list where distribute_cfs_runtime will give runtime to the same set of processes leaving the rest. Fix the issue by setting a bit in struct cfs_bandwidth when distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can decide to put the throttled entry on the tail or the head of the list. The bit is set/cleared by the callers of distribute_cfs_runtime while they hold cfs_bandwidth->lock. This is easy to reproduce with a handful of CPU consumers. I use 'crash' on the live system. In some cases you can simply look at the throttled list and see the later entries are not changing: crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining \| paste - - \| awk '{print $1" "$4}' \| pr -t -n3 1 ffff90b56cb2d200 -976050 2 ffff90b56cb2cc00 -484925 3 ffff90b56cb2bc00 -658814 4 ffff90b56cb2ba00 -275365 5 ffff90b166a45600 -135138 6 ffff90b56cb2da00 -282505 7 ffff90b56cb2e000 -148065 8 ffff90b56cb2fa00 -872591 9 ffff90b56cb2c000 -84687 10 ffff90b56cb2f000 -87237 11 ffff90b166a40a00 -164582 crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining \| paste - - \| awk '{print $1" "$4}' \| pr -t -n3 1 ffff90b56cb2d200 -994147 2 ffff90b56cb2cc00 -306051 3 ffff90b56cb2bc00 -961321 4 ffff90b56cb2ba00 -24490 5 ffff90b166a45600 -135138 6 ffff90b56cb2da00 -282505 7 ffff90b56cb2e000 -148065 8 ffff90b56cb2fa00 -872591 9 ffff90b56cb2c000 -84687 10 ffff90b56cb2f000 -87237 11 ffff90b166a40a00 -164582 Sometimes it is easier to see by finding a process getting starved and looking at the sched_info: crash> task ffff8eb765994500 sched_info PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest" sched_info = { pcount = 8, run_delay = 697094208, last_arrival = 240260125039, last_queued = 240260327513 }, crash> task ffff8eb765994500 sched_info PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest" sched_info = { pcount = 8, run_delay = 697094208, last_arrival = 240260125039, last_queued = 240260327513 }, Signed-off-by: Phil Auld <pauld@redhat.com> Reviewed-by: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: c06f04c70489 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop") Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csb Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-11	Merge branch 'x86-urgent-for-linus' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "x86 fixes An intel_rdt memory access fix and a VLA fix in pgd_alloc()." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Avoid VLA in pgd_alloc() x86/intel_rdt: Fix out-of-bounds memory access in CBM tests
2018-10-11	Merge branch 'sched-urgent-for-linus' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "scheduler fix: Cleanup of dead code left over from the recent sched/numa fixes." * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mm, sched/numa: Remove remaining traces of NUMA rate-limiting
2018-10-11	Merge branch 'perf-urgent-for-linus' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo, a man of few words, writes: "perf fixes: misc perf tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf record: Use unmapped IP for inline callchain cursors perf python: Use -Wno-redundant-decls to build with PYTHON=python3 perf report: Don't try to map ip to invalid map perf script python: Fix export-to-sqlite.py sample columns perf script python: Fix export-to-postgresql.py occasional failure
2018-10-11	sched/completions/Documentation: Clean up the document some more	Ingo Molnar
	Refresh the document: - Remove unnecessary liguistic complexity and improve the clarity of the text - Improve the explanations all around - Remove unnecessary and stale version info - Fix whitespace noise - Make pseudo-code match kernel style - Fix minor syntax errors in pseudo-code - Use consistent denotation - Mark multi-CPU sequences more explicitly - Unbreak line breaks - Use quotes to refer to 'struct completion' - Use 'IRQ context' and 'IRQs' consistently - Improve grammar - etc. Cc: John Garry <john.garry@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Mc Guire <hofrat@osadl.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/1539183392-239389-1-git-send-email-john.garry@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-11	sched/completions/Documentation: Fix a couple of punctuation nits	John Garry
	This patch fixes a couple of punctuation nits which can make the document more correct and readable. Also missing "()" are added to some function references for consistency. Signed-off-by: John Garry <john.garry@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/1539183392-239389-1-git-send-email-john.garry@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-11	xsk: do not call synchronize_net() under RCU read lock	Björn Töpel
	The XSKMAP update and delete functions called synchronize_net(), which can sleep. It is not allowed to sleep during an RCU read section. Instead we need to make sure that the sock sk_destruct (xsk_destruct) function is asynchronously called after an RCU grace period. Setting the SOCK_RCU_FREE flag for XDP sockets takes care of this. Fixes: fbfc504a24f5 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-10	qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface	Giacinto Cifelli
	Added support for Gemalto's Cinterion ALASxx WWAN interfaces by adding QMI_FIXED_INTF with Cinterion's VID and PID. Signed-off-by: Giacinto Cifelli <gciofono@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	tipc: queue socket protocol error messages into socket receive buffer	Parthasarathy Bhuvaragan
	In tipc_sk_filter_rcv(), when we detect protocol messages with error we call tipc_sk_conn_proto_rcv() and let it reset the connection and notify the socket by calling sk->sk_state_change(). However, tipc_sk_filter_rcv() may have been called from the function tipc_backlog_rcv(), in which case the socket lock is held and the socket already awake. This means that the sk_state_change() call is ignored and the error notification lost. Now the receive queue will remain empty and the socket sleeps forever. In this commit, we convert the protocol message into a connection abort message and enqueue it into the socket's receive queue. By this addition to the above state change we cover all conditions. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	tipc: set link tolerance correctly in broadcast link	Jon Maloy
	In the patch referred to below we added link tolerance as an additional criteria for declaring broadcast transmission "stale" and resetting the affected links. However, the 'tolerance' field of the broadcast link is never set, and remains at zero. This renders the whole commit without the intended improving effect, but luckily also with no negative effect. In this commit we add the missing initialization. Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet retransmission") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	Merge branch 'net-ipv4-fixes-for-PMTU-when-link-MTU-changes'	David S. Miller
	Sabrina Dubroca says: ==================== net: ipv4: fixes for PMTU when link MTU changes The first patch adapts the changes that commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes") did in IPv6 to IPv4: lower PMTU when the first hop's MTU drops below it, and raise PMTU when the first hop was limiting PMTU discovery and its MTU is increased. The second patch fixes bugs introduced in commit d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") that only appear once the first patch is applied. Selftests for these cases were introduced in net-next commit e44e428f59e4 ("selftests: pmtu: add basic IPv4 and IPv6 PMTU tests") v2: add cover letter, and fix a few small things in patch 1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	net: ipv4: don't let PMTU updates increase route MTU	Sabrina Dubroca
	When an MTU update with PMTU smaller than net.ipv4.route.min_pmtu is received, we must clamp its value. However, we can receive a PMTU exception with PMTU < old_mtu < ip_rt_min_pmtu, which would lead to an increase in PMTU. To fix this, take the smallest of the old MTU and ip_rt_min_pmtu. Before this patch, in case of an update, the exception's MTU would always change. Now, an exception can have only its lock flag updated, but not the MTU, so we need to add a check on locking to the following "is this exception getting updated, or close to expiring?" test. Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	net: ipv4: update fnhe_pmtu when first hop's MTU changes	Sabrina Dubroca
	Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions"), exceptions get deprecated separately from cached routes. In particular, administrative changes don't clear PMTU anymore. As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes"), the PMTU discovered before the local MTU change can become stale: - if the local MTU is now lower than the PMTU, that PMTU is now incorrect - if the local MTU was the lowest value in the path, and is increased, we might discover a higher PMTU Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those cases. If the exception was locked, the discovered PMTU was smaller than the minimal accepted PMTU. In that case, if the new local MTU is smaller than the current PMTU, let PMTU discovery figure out if locking of the exception is still needed. To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU notifier. By the time the notifier is called, dev->mtu has been changed. This patch adds the old MTU as additional information in the notifier structure, and a new call_netdevice_notifiers_u32() function. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	net/ipv6: stop leaking percpu memory in fib6 info	Mike Rapoport
	The fib6_info_alloc() function allocates percpu memory to hold per CPU pointers to rt6_info, but this memory is never freed. Fix it. Fixes: a64efe142f5e ("net/ipv6: introduce fib6_info struct and helpers") Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	Merge tag 'rxrpc-fixes-20181008' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix packet reception code Here are a set of patches that prepares for and fix problems in rxrpc's package reception code. There serious problems are: (A) There's a window between binding the socket and setting the data_ready hook in which packets can find their way into the UDP socket's receive queues. (B) The skb_recv_udp() will return an error (and clear the error state) if there was an error on the Tx side. rxrpc doesn't handle this. (C) The rxrpc data_ready handler doesn't fully drain the UDP receive queue. (D) The rxrpc data_ready handler assumes it is called in a non-reentrant state. The second patch fixes (A) - (C); the third patch renders (B) and (C) non-issues by using the recap_rcv hook instead of data_ready - and the final patch fixes (D). That last is the most complex. The preparatory patches are: (1) Fix some places that are doing things in the wrong net namespace. (2) Stop taking the rcu read lock as it's held by the IP input routine in the call chain. (3) Only end the Tx phase if we rotated the final packet out of the Tx buffer. (4) Don't assume that the call state won't change after dropping the call_state lock. (5) Only take receive window and MTU suze parameters from an ACK packet if it's the latest ACK packet. (6) Record connection-level abort information correctly. (7) Fix a trace line. And then there are three main patches - note that these are mixed in with the preparatory patches somewhat: (1) Fix the setup window (A), skb_recv_udp() error check (B) and packet drainage (C). (2) Switch to using the encap_rcv instead of data_ready to cut out the effects of the UDP read queues and get the packets delivered directly. (3) Add more locking into the various packet input paths to defend against re-entrance (D). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	rds: RDS (tcp) hangs on sendto() to unresponding address	Ka-Cheong Poon
	In rds_send_mprds_hash(), if the calculated hash value is non-zero and the MPRDS connections are not yet up, it will wait. But it should not wait if the send is non-blocking. In this case, it should just use the base c_path for sending the message. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11	Merge tag 'for-4.19/dm-fixes-4' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Mike writes: "device mapper fix for 4.19 final - Fix for earlier 4.19 final DM linear change that incorrectly checked for CONFIG_DM_ZONED rather than CONFIG_BLK_DEV_ZONED." * tag 'for-4.19/dm-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm linear: fix linear_end_io conditional definition
2018-10-11	Merge tag 'xfs-fixes-for-4.19-rc7' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Dave writes: "xfs: fixes for 4.19-rc7 Update for 4.19-rc7 to fix numerous file clone and deduplication issues." * tag 'xfs-fixes-for-4.19-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix data corruption w/ unaligned reflink ranges xfs: fix data corruption w/ unaligned dedupe ranges xfs: update ctime and remove suid before cloning files xfs: zero posteof blocks when cloning above eof xfs: refactor clonerange preparation into a separate helper