linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-08-21	dsa: remove unused net_device arg from handlers	Florian Westphal
	compile tested only, but saw no warnings/errors with allmodconfig build. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-21	of: return of_get_cpu_node from of_cpu_device_node_get if CPUs are not ↵	Sudeep Holla
	registered Instead of the callsites choosing between of_cpu_device_node_get if the CPUs are registered as of_node is populated by then and of_get_cpu_node when the CPUs are not yet registered as CPU of_nodes are not yet stashed thereby needing to parse the device tree, we can call of_get_cpu_node in case the CPUs are not yet registered. This will allow to use of_cpu_device_node_get anywhere hiding the details from the caller. Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rob Herring <robh@kernel.org>
2017-08-21	Merge branch 'master' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-08-21 1) Support RX checksum with IPsec crypto offload for esp4/esp6. From Ilan Tayari. 2) Fixup IPv6 checksums when doing IPsec crypto offload. From Yossi Kuperman. 3) Auto load the xfrom offload modules if a user installs a SA that requests IPsec offload. From Ilan Tayari. 4) Clear RX offload informations in xfrm_input to not confuse the TX path with stale offload informations. From Ilan Tayari. 5) Allow IPsec GSO for local sockets if the crypto operation will be offloaded. 6) Support setting of an output mark to the xfrm_state. This mark can be used to to do the tunnel route lookup. From Lorenzo Colitti. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-21	quota: Add lock annotations to struct members	Jan Kara
	Add annotation which lock protects which struct members to struct dquot and struct mem_dqinfo. Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-21	mfd: rk808: Add rk805 regs addr and ID	Elaine Zhang
	Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com> Signed-off-by: Joseph Chen <chenjh@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-08-21	mfd: rk808: Fix up the chip id get failed	Elaine Zhang
	the rk8xx chip id is: ((MSB << 8) \| LSB) & 0xfff0 Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com> Signed-off-by: Joseph Chen <chenjh@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-08-21	Merge branch 'for-mingo' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Removal of spin_unlock_wait() - SRCU updates - Torture-test updates - Documentation updates - Miscellaneous fixes - CPU-hotplug fixes - Miscellaneous non-RCU fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-20	net_sched: fix order of queue length updates in qdisc_replace()	Konstantin Khlebnikov
	This important to call qdisc_tree_reduce_backlog() after changing queue length. Parent qdisc should deactivate class in ->qlen_notify() called from qdisc_tree_reduce_backlog() but this happens only if qdisc->q.qlen in zero. Missed class deactivations leads to crashes/warnings at picking packets from empty qdisc and corrupting state at reactivating this class in future. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: 86a7996cc8a0 ("net_sched: introduce qdisc_replace() helper") Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-21	Merge tag 'drm-amdkfd-next-2017-08-18' of ↵	Dave Airlie
	git://people.freedesktop.org/~gabbayo/linux into drm-next This is the amdkfd pull request for 4.14 merge window. AMD has started cleaning the pipe and sending patches from their internal development to the upstream community. The plan as I understand it is to first get all the non-dGPU patches to upstream and then move to upstream dGPU support. The patches here are relevant only for Kaveri and Carrizo. The following is a summary of the changes: - Add new IOCTL to set a Scratch memory VA - Update PM4 headers for new firmware that support scratch memory - Support image tiling mode - Remove all uses of BUG_ON - Various Bug fixes and coding style fixes * tag 'drm-amdkfd-next-2017-08-18' of git://people.freedesktop.org/~gabbayo/linux: (24 commits) drm/amdkfd: Implement image tiling mode support v2 drm/amdgpu: Add kgd kfd interface get_tile_config() v2 drm/amdkfd: Adding new IOCTL for scratch memory v2 drm/amdgpu: Add kgd/kfd interface to support scratch memory v2 drm/amdgpu: Program SH_STATIC_MEM_CONFIG globally, not per-VMID drm/amd: Update MEC HQD loading code for KFD drm/amdgpu: Disable GFX PG on CZ drm/amdkfd: Update PM4 packet headers drm/amdkfd: Clamp EOP queue size correctly on Gfx8 drm/amdkfd: Add more error printing to help bringup v2 drm/amdkfd: Handle remaining BUG_ONs more gracefully v2 drm/amdkfd: Allocate gtt_sa_bitmap in long units drm/amdkfd: Fix doorbell initialization and finalization drm/amdkfd: Remove BUG_ONs for NULL pointer arguments drm/amdkfd: Remove usage of alloc(sizeof(struct... drm/amdkfd: Fix goto usage v2 drm/amdkfd: Change x==NULL/false references to !x drm/amdkfd: Consolidate and clean up log commands drm/amdkfd: Clean up KFD style errors and warnings v2 drm/amdgpu: Remove hard-coded assumptions about compute pipes ...
2017-08-21	Merge branch 'irq/for-gpio' of ↵	Linus Walleij
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into devel
2017-08-21	Merge branch 'drm-next-4.14' of git://people.freedesktop.org/~agd5f/linux ↵	Dave Airlie
	into drm-next More changes for 4.14. Highlights: - command submission overhead improvements - Huge page support for vega10 - physical mode support for mjpeg for asics that don't support UVD vm - improve ttm_mem_type_manager_func debug - misc ttm fixes, cleanups - misc gpuvm cleanups * 'drm-next-4.14' of git://people.freedesktop.org/~agd5f/linux: (26 commits) drm/ttm: use reservation_object_trylock in ttm_bo_individualize_resv v2 drm/amdgpu: fix vega10 graphic hang issue in S3 test drm/amdgpu: bump version for support of UVD MJPEG decode drm/amdgpu: add MJPEG check for UVD physical mode msg buffer drm/ttm: Fix accounting error when fail to get pages for pool drm/amd/amdgpu: expose fragment size as module parameter (v2) drm/amd/amdgpu: store fragment_size in vm_manager drm/amdgpu: rename VM invalidated to moved drm/amdgpu: separate bo_va structure drm/amdgpu: drop the extra VM huge page flag v2 drm/amdgpu: remove superflous amdgpu_bo_kmap in the VM drm/amdgpu: cleanup static CSA handling drm/amdgpu: SHADOW and VRAM_CONTIGUOUS flags shouldn't be used by userspace drm/amdgpu: save list length when fence is signaled drm/amdgpu: move vram usage tracking into the vram manager v2 drm/amdgpu: move gtt usage tracking into the gtt manager v2 drm/amdgpu: move debug print into the MM managers drm/amdgpu: fix incorrect use of the lru_lock drm/radeon: fix incorrect use of the lru_lock drm/ttm: make ttm_mem_type_manager_func debug more useful ...
2017-08-20	Merge tag 'fixes-for-4.13b' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus Jonathan writes: Second set of IIO fixes for the 4.13 cycle. Given the late stage of this series, some more involved fixes have been held back for the upcoming merge window. The hid-sensor issue has been causing problems for a long time so it is great to have that one finally fixed! No more bug reports for the userspace guys (well about that anyway). * documentation - some warning fixes due to missing colons in kernel-doc. * adis16480 - fix accel scale factor. * bmp280 - properly initialize the device for humidity readings - without this the humidity readings may be skipped and a magic value of 0x8000 returned. * hid-sensor-strigger - fix a race with user space when powering up the sensor. * ina291 - Avoid an underflow for the sleeping time as a result of supporting the fastest rates. * st-magnetometer - Fix the status register address for hte LSM303AGR, - Remove the ihl property for LSM303AGR as the sensor doesn't support active low for the dataready line. * stm32-adc - Fix use of a common clock rate. * stm32-timer - fix the quadrature mode get routine to account for the magic 0 value. set on boot. - fix the return value of write_raw, - fix the get/set down count direction as the enum value was not being converted to the relevant bit field, - add an enable attribute to actually turn it on when in encoder mode, - missing mask when reading the trigger mode.
2017-08-20	Merge tag 'iio-for-4.14b' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next Jonathan writes: Second set of IIO new device support, features and cleanup for the 4.14 cycle. New device support: * ak8974 - support the AMI306. * st_magnetometer - add support for the LIS2MDL with bindings. * rockchip-saradc - add binding for rv1108 SoC (no driver change). * srf08 - add srf02 (i2c only) and srf10 support. * stm32-timer - support for the STM32H7 to existing driver. Features: * tools - move over to the tools buildsystem rather than hand rolling. - add an install section to the build. * ak8974 - use serial number to add device randomness. - add AMI306 calibration data output. * ccs811 - triggered buffer support. * srf08 - add a device tree table as the old style i2c probing is going away, - add triggered buffer support * st32-adc - add optional st,min-sample-time-nsecs binding to allow control of sampling against analog circuitry. * stm32-timer - add output compare triggers. * ti-ads1015 - add threshold event support. * ti-ads7950 - Allow use on ACPI platforms including providing a default reference voltage as there is no way to obtain this on ACPI currently. Cleanup and fixes: * ad7606 - fix an error return code in probe. * ads1015 - fix incorrect data rate setting update when capture in progress, - fix wrong scale information for the ADS1115, - make conversions work when CONFIG_PM is not set, - make sure we don't get a stale result after a runtime resume by ensuring we wait long enough, - avoid returning a false error form the buffer setup callbacks, - add enough wait time to get the correct conversion, - remove an unnecessary config register update, - add a helper to set conversion mode reducing repeated boilerplate, - use devm_iio_triggered_buffer_setup to simplify error and remove paths, - use iio_device_claim_direct_mode instead of opencoding the same. * ak8974 - mark the INT_CLEAR register as precious to prevent debugfs access. * apds9300 - constify the i2c_device_id. * at91-sama5 adc - add missing Kconfig dependency. * bma180 accel - constify the i2c_device_id. * rockchip_saradc - explicitly request exclusive reset control as part of the reset rework on going throughout the kernel. * st_accel - fix drdy configuration for a load of accelerometers that only have the int1 line. Fix is unimportant as presumably no deviec tree actually used the non existent hardware line. * st_pressure - fix drdy configuration for LPS22HB and LPS25H by dropping int2 support as they don't have this. Fix is unimportant as presumably no device tree actually used the non existent hardware line. * stm32-dac - explicitly request exclusive reset control (part of reset being reworked). * tsl2583 - constify the i2c_device_id. * xadc - coding style fixes.
2017-08-20	Merge branch 'bugfixes'	Trond Myklebust

2017-08-20	Merge branch 'perf-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Two fixes for the perf subsystem: - Fix an inconsistency of RDPMC mm struct tagging across exec() which causes RDPMC to fault. - Correct the timestamp mechanics across IOC_DISABLE/ENABLE which causes incorrect timestamps and total time calculations" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix time on IOC_ENABLE perf/x86: Fix RDPMC vs. mm_struct tracking
2017-08-20	Merge branch 'core-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull watchdog fix from Thomas Gleixner: "A fix for the hardlockup watchdog to prevent false positives with extreme Turbo-Modes which make the perf/NMI watchdog fire faster than the hrtimer which is used to verify. Slightly larger than the minimal fix, which just would increase the hrtimer frequency, but comes with extra overhead of more watchdog timer interrupts and thread wakeups for all users. With this change we restrict the overhead to the extreme Turbo-Mode systems" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kernel/watchdog: Prevent false positives with turbo modes
2017-08-20	NFS: Remove unused parameter gfp_flags from nfs_pageio_init()	Trond Myklebust
	Now that the mirror allocation has been moved, the parameter can go. Also remove the redundant symbol export. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-20	PATCH] iio: Fix some documentation warnings	Jonathan Corbet
	The kerneldoc description for the trig_readonly field of struct iio_dev lacked a colon, leading to this doc build warning: ./include/linux/iio/iio.h:603: warning: No description found for parameter 'trig_readonly' A similar issue for iio_trigger_set_immutable() in trigger.h yielded: ./include/linux/iio/trigger.h:151: warning: No description found for parameter 'indio_dev' ./include/linux/iio/trigger.h:151: warning: No description found for parameter 'trig' Fix the formatting and silence the warnings. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-08-20	media: rc: rename RC_TYPE_* to RC_PROTO_* and RC_BIT_* to RC_PROTO_BIT_*	Sean Young
	RC_TYPE is confusing and it's just the protocol. So rename it. Suggested-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Sean Young <sean@mess.org> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-08-20	media: cec: fix remote control passthrough	Hans Verkuil
	The 'Press and Hold' operation was not correctly implemented, in particular the requirement that the repeat doesn't start until the second identical keypress arrives. The REP_DELAY value also had to be adjusted (see the comment in the code) to achieve the desired behavior. The 'enabled_protocols' field was also never set, fix that too. Since CEC is a fixed protocol the driver has to set this field. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-08-20	media: rc: simplify ir_raw_event_store_edge()	Sean Young
	Since commit 12749b198fa4 ("[media] rc: saa7134: add trailing space for timely decoding"), the workaround of inserting reset events is no longer needed. Note that the initial reset is not needed either; other rc-core drivers that don't use ir_raw_event_store_edge() never call this at all. Verified on a HVR-1150 and Raspberry Pi. Fixes: 3f5c4c73322e ("[media] rc: fix ghost keypresses with certain hw") Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-08-20	media: rc: add zx-irdec remote control driver	Shawn Guo
	It adds the remote control driver and corresponding keymap file for IRDEC block found on ZTE ZX family SoCs. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-08-20	media: rc: ir-nec-decoder: move scancode composing code into a shared function	Shawn Guo
	The NEC scancode composing and protocol type detection in ir_nec_decode() is generic enough to be a shared function. Let's create an inline function in rc-core.h, so that other remote control drivers can reuse this function to save some code. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-08-20	media: rc-core: rename input_name to device_name	Sean Young
	When an ir-spi is registered, you get this message. rc rc0: Unspecified device as /devices/platform/soc/3f215080.spi/spi_master/spi32766/spi32766.128/rc/rc0 "Unspecified device" refers to input_name, which makes no sense for IR TX only devices. So, rename to device_name. Also make driver_name const char* so that no casts are needed anywhere. Now ir-spi reports: rc rc0: IR SPI as /devices/platform/soc/3f215080.spi/spi_master/spi32766/spi32766.128/rc/rc0 Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-08-20	media: Convert to using %pOF instead of full_name	Rob Herring
	Now that we have a custom printf format specifier, convert users of full_name to use %pOF instead. This is preparation to remove storing of the full path string for each node. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Acked-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Andrzej Hajda <a.hajda@samsung.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Songjun Wu <songjun.wu@microchip.com> Cc: Kukjin Kim <kgene@kernel.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Javier Martinez Canillas <javier@osg.samsung.com> Cc: Minghsiu Tsai <minghsiu.tsai@mediatek.com> Cc: Houlong Wei <houlong.wei@mediatek.com> Cc: Andrew-CT Chen <andrew-ct.chen@mediatek.com> Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Hyun Kwon <hyun.kwon@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-mediatek@lists.infradead.org Cc: linux-renesas-soc@vger.kernel.org Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-08-20	media: cec-pin: fix irq handling	Hans Verkuil
	The free_irq() function could be called from interrupt context, which is invalid. Move this to the thread. In the interrupt handler we just request that the thread disables the irq. This is done through an atomic so we don't need to add any spinlocks. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-08-20	media: cec: rename pin events/function	Hans Verkuil
	The CEC_EVENT_PIN_LOW/HIGH defines and the cec_queue_pin_event() function did not specify that these were about CEC pin events. Since in the future there will also be HPD pin events it is wise to rename the event defines and function to CEC_EVENT_PIN_CEC_LOW/HIGH and cec_queue_pin_cec_event() now before these become part of the ABI. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-08-20	media: v4l2-ctrls.h: better document the arguments for v4l2_ctrl_fill	Mauro Carvalho Chehab
	The arguments for this function are pointers. Make it clear at its documentation. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-08-20	net/mlx5e: Add outbound PCI buffer overflow counter	Eran Ben Elisha
	Add outbound_pci_buffer_overflow to ethtool output for monitoring the number of packets that were dropped due to lack of PCIe buffers on receive path from NIC port toward the host(s). This counter is valid only in case that tx_overflow_buffer_pkt is supported in MCAM enhanced features. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-20	net/mlx5: Add RX buffer fullness counters infrastructure	Gal Pressman
	Add capability bit in PCAM register and counters to PPCNT register. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-20	net/mlx5: Add PCIe outbound stalls counters infrastructure	Gal Pressman
	Add capability bit in MCAM register and counters to MPCNT register. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-19	bpf: linux/bpf.h needs linux/numa.h	David S. Miller
	Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19	bpf: Allow selecting numa node during map creation	Martin KaFai Lau
	The current map creation API does not allow to provide the numa-node preference. The memory usually comes from where the map-creation-process is running. The performance is not ideal if the bpf_prog is known to always run in a numa node different from the map-creation-process. One of the use case is sharding on CPU to different LRU maps (i.e. an array of LRU maps). Here is the test result of map_perf_test on the INNER_LRU_HASH_PREALLOC test if we force the lru map used by CPU0 to be allocated from a remote numa node: [ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ] ># taskset -c 10 ./map_perf_test 512 8 1260000 8000000 5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec 4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec 3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec 6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec 2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec 1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec 7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec 0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec #<<< After specifying numa node: ># taskset -c 10 ./map_perf_test 512 8 1260000 8000000 5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec 3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec 1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec 6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec 2:inner_lru_hash_map_perf pre-alloc 1614630 events per sec 4:inner_lru_hash_map_perf pre-alloc 1612651 events per sec 7:inner_lru_hash_map_perf pre-alloc 1609337 events per sec 0:inner_lru_hash_map_perf pre-alloc 1619340 events per sec #<<< This patch adds one field, numa_node, to the bpf_attr. Since numa node 0 is a valid node, a new flag BPF_F_NUMA_NODE is also added. The numa_node field is honored if and only if the BPF_F_NUMA_NODE flag is set. Numa node selection is not supported for percpu map. This patch does not change all the kmalloc. F.e. 'htab = kzalloc()' is not changed since the object is small enough to stay in the cache. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19	netfilter: rt: add support to fetch path mss	Florian Westphal
	to be used in combination with tcp option set support to mimic iptables TCPMSS --clamp-mss-to-pmtu. v2: Eric Dumazet points out dst must be initialized. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-08-19	netfilter: exthdr: tcp option set support	Florian Westphal
	This allows setting 2 and 4 byte quantities in the tcp option space. Main purpose is to allow native replacement for xt_TCPMSS to work around pmtu blackholes. Writes to kind and len are now allowed at the moment, it does not seem useful to do this as it causes corruption of the tcp option space. We can always lift this restriction later if a use-case appears. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-08-19	clk: sunxi-ng: support R40 SoC	Icenowy Zheng
	Allwinner R40 SoC have a clock controller module in the style of the SoCs beyond sun6i, however, it's more rich and complex. Add support for it. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2017-08-18	net: drop unused attribute argument from sysfs queue funcs	stephen hemminger
	The show and store functions don't need/use the attribute. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18	net: constify net_ns_type_operations	stephen hemminger
	This can be const. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18	net: constify netdev_class_file	stephen hemminger
	These functions are wrapper arount class_create_file which can take a const attribute. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18	net: ethtool: Add macro to clear a link mode setting	Lendacky, Thomas
	There are currently macros to set and test an ETHTOOL_LINK_MODE_ setting, but not to clear one. Add a macro to clear an ETHTOOL_LINK_MODE_ setting. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18	xdp: adjust xdp redirect tracepoint to include return error code	Jesper Dangaard Brouer
	The return error code need to be included in the tracepoint xdp:xdp_redirect, else its not possible to distinguish successful or failed XDP_REDIRECT transmits. XDP have no queuing mechanism. Thus, it is fairly easily to overrun a NIC transmit queue. The eBPF program invoking helpers (bpf_redirect or bpf_redirect_map) to redirect a packet doesn't get any feedback whether the packet was actually transmitted. Info on failed transmits in the tracepoint xdp:xdp_redirect, is interesting as this opens for providing a feedback-loop to the receiving XDP program. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18	net: inet: diag: expose sockets cgroup classid	Levin, Alexander (Sasha Levin)
	This is useful for directly looking up a task based on class id rather than having to scan through all open file descriptors. Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18	mm, oom: fix potential data corruption when oom_reaper races with writer	Michal Hocko
	Wenwei Tao has noticed that our current assumption that the oom victim is dying and never doing any visible changes after it dies, and so the oom_reaper can tear it down, is not entirely true. __task_will_free_mem consider a task dying when SIGNAL_GROUP_EXIT is set but do_group_exit sends SIGKILL to all threads _after_ the flag is set. So there is a race window when some threads won't have fatal_signal_pending while the oom_reaper could start unmapping the address space. Moreover some paths might not check for fatal signals before each PF/g-u-p/copy_from_user. We already have a protection for oom_reaper vs. PF races by checking MMF_UNSTABLE. This has been, however, checked only for kernel threads (use_mm users) which can outlive the oom victim. A simple fix would be to extend the current check in handle_mm_fault for all tasks but that wouldn't be sufficient because the current check assumes that a kernel thread would bail out after EFAULT from get_user*/copy_from_user and never re-read the same address which would succeed because the PF path has established page tables already. This seems to be the case for the only existing use_mm user currently (virtio driver) but it is rather fragile in general. This is even more fragile in general for more complex paths such as generic_perform_write which can re-read the same address more times (e.g. iov_iter_copy_from_user_atomic to fail and then iov_iter_fault_in_readable on retry). Therefore we have to implement MMF_UNSTABLE protection in a robust way and never make a potentially corrupted content visible. That requires to hook deeper into the PF path and check for the flag _every time_ before a pte for anonymous memory is established (that means all !VM_SHARED mappings). The corruption can be triggered artificially (http://lkml.kernel.org/r/201708040646.v746kkhC024636@www262.sakura.ne.jp) but there doesn't seem to be any real life bug report. The race window should be quite tight to trigger most of the time. Link: http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org Fixes: aac453635549 ("mm, oom: introduce oom reaper") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Wenwei Tao <wenwei.tww@alibaba-inc.com> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andrea Argangeli <andrea@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18	mm: discard memblock data later	Pavel Tatashin
	There is existing use after free bug when deferred struct pages are enabled: The memblock_add() allocates memory for the memory array if more than 128 entries are needed. See comment in e820__memblock_setup(): * The bootstrap memblock region count maximum is 128 entries * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries * than that - so allow memblock resizing. This memblock memory is freed here: free_low_memory_core_early() We access the freed memblock.memory later in boot when deferred pages are initialized in this path: deferred_init_memmap() for_each_mem_pfn_range() __next_mem_pfn_range() type = &memblock.memory; One possible explanation for why this use-after-free hasn't been hit before is that the limit of INIT_MEMBLOCK_REGIONS has never been exceeded at least on systems where deferred struct pages were enabled. Tested by reducing INIT_MEMBLOCK_REGIONS down to 4 from the current 128, and verifying in qemu that this code is getting excuted and that the freed pages are sane. Link: http://lkml.kernel.org/r/1502485554-318703-2-git-send-email-pasha.tatashin@oracle.com Fixes: 7e18adb4f80b ("mm: meminit: initialise remaining struct pages in parallel with kswapd") Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18	wait: add wait_event_killable_timeout()	Luis R. Rodriguez
	These are the few pending fixes I have queued up for v4.13-final. One is a a generic regression fix for recursive loops on kmod and the other one is a trivial print out correction. During the v4.13 development we assumed that recursive kmod loops were no longer possible. Clearly that is not true. The regression fix makes use of a new killable wait. We use a killable wait to be paranoid in how signals might be sent to modprobe and only accept a proper SIGKILL. The signal will only be available to userspace to issue iff a thread has already entered a wait state, and that happens only if we've already throttled after 50 kmod threads have been hit. Note that although it may seem excessive to trigger a failure afer 5 seconds if all kmod thread remain busy, prior to the series of changes that went into v4.13 we would actually always fatally fail any request which came in if the limit was already reached. The new waiting implemented in v4.13 actually gives us more breathing room -- the wait for 5 seconds is a wait for any kmod thread to finish. We give up and fail iff no kmod thread has finished and they're all running straight for 5 consecutive seconds. If 50 kmod threads are running consecutively for 5 seconds something else must be really bad. Recursive loops with kmod are bad but they're also hard to implement properly as a selftest without currently fooling current userspace tools like kmod [1]. For instance kmod will complain when you run depmod if it finds a recursive loop with symbol dependency between modules as such this type of recursive loop cannot go upstream as the modules_install target will fail after running depmod. These tests already exist on userspace kmod upstream though (refer to the testsuite/module-playground/mod-loop-*.c files). The same is not true if request_module() is used though, or worst if aliases are used. Likewise the issue with 64-bit kernels booting 32-bit userspace without a binfmt handler built-in is also currently not detected and proactively avoided by userspace kmod tools, or kconfig for all architectures. Although we could complain in the kernel when some of these individual recursive issues creep up, proactively avoiding these situations in userspace at build time is what we should keep striving for. Lastly, since recursive loops could happen with kmod it may mean recursive loops may also be possible with other kernel usermode helpers, this should be investigated and long term if we can come up with a more sensible generic solution even better! [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20170809-kmod-for-v4.13-final [1] https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git This patch (of 3): This wait is similar to wait_event_interruptible_timeout() but only accepts SIGKILL interrupt signal. Other signals are ignored. Link: http://lkml.kernel.org/r/20170809234635.13443-2-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Jessica Yu <jeyu@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Marek <mmarek@suse.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Daniel Mentz <danielmentz@google.com> Cc: David Binderman <dcb314@hotmail.com> Cc: Matt Redfearn <matt.redfearn@imgetc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18	mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()	Johannes Weiner
	Jaegeuk and Brad report a NULL pointer crash when writeback ending tries to update the memcg stats: BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0 IP: test_clear_page_writeback+0x12e/0x2c0 [...] RIP: 0010:test_clear_page_writeback+0x12e/0x2c0 Call Trace: <IRQ> end_page_writeback+0x47/0x70 f2fs_write_end_io+0x76/0x180 [f2fs] bio_endio+0x9f/0x120 blk_update_request+0xa8/0x2f0 scsi_end_request+0x39/0x1d0 scsi_io_completion+0x211/0x690 scsi_finish_command+0xd9/0x120 scsi_softirq_done+0x127/0x150 __blk_mq_complete_request_remote+0x13/0x20 flush_smp_call_function_queue+0x56/0x110 generic_smp_call_function_single_interrupt+0x13/0x30 smp_call_function_single_interrupt+0x27/0x40 call_function_single_interrupt+0x89/0x90 RIP: 0010:native_safe_halt+0x6/0x10 (gdb) l (test_clear_page_writeback+0x12e) 0xffffffff811bae3e is in test_clear_page_writeback (./include/linux/memcontrol.h:619). 614 mod_node_page_state(page_pgdat(page), idx, val); 615 if (mem_cgroup_disabled() \|\| !page->mem_cgroup) 616 return; 617 mod_memcg_state(page->mem_cgroup, idx, val); 618 pn = page->mem_cgroup->nodeinfo[page_to_nid(page)]; 619 this_cpu_add(pn->lruvec_stat->count[idx], val); 620 } 621 622 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t pgdat, int order, 623 gfp_t gfp_mask, The issue is that writeback doesn't hold a page reference and the page might get freed after PG_writeback is cleared (and the mapping is unlocked) in test_clear_page_writeback(). The stat functions looking up the page's node or zone are safe, as those attributes are static across allocation and free cycles. But page->mem_cgroup is not, and it will get cleared if we race with truncation or migration. It appears this race window has been around for a while, but less likely to trigger when the memcg stats were updated first thing after PG_writeback is cleared. Recent changes reshuffled this code to update the global node stats before the memcg ones, though, stretching the race window out to an extent where people can reproduce the problem. Update test_clear_page_writeback() to look up and pin page->mem_cgroup before clearing PG_writeback, then not use that pointer afterward. It is a partial revert of 62cccb8c8e7a ("mm: simplify lock_page_memcg()") but leaves the pageref-holding callsites that aren't affected alone. Link: http://lkml.kernel.org/r/20170809183825.GA26387@cmpxchg.org Fixes: 62cccb8c8e7a ("mm: simplify lock_page_memcg()") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Jaegeuk Kim <jaegeuk@kernel.org> Tested-by: Jaegeuk Kim <jaegeuk@kernel.org> Reported-by: Bradley Bolen <bradleybolen@gmail.com> Tested-by: Brad Bolen <bradleybolen@gmail.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@vger.kernel.org> [4.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18	ipv4: convert dst_metrics.refcnt from atomic_t to refcount_t	Eric Dumazet
	refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18	datagram: When peeking datagrams with offset < 0 don't skip empty skbs	Matthew Dawson
	Due to commit e6afc8ace6dd5cef5e812f26c72579da8806f5ac ("udp: remove headers from UDP packets before queueing"), when udp packets are being peeked the requested extra offset is always 0 as there is no need to skip the udp header. However, when the offset is 0 and the next skb is of length 0, it is only returned once. The behaviour can be seen with the following python script: from socket import *; f=socket(AF_INET6, SOCK_DGRAM \| SOCK_NONBLOCK, 0); g=socket(AF_INET6, SOCK_DGRAM \| SOCK_NONBLOCK, 0); f.bind(('::', 0)); addr=('::1', f.getsockname()[1]); g.sendto(b'', addr) g.sendto(b'b', addr) print(f.recvfrom(10, MSG_PEEK)); print(f.recvfrom(10, MSG_PEEK)); Where the expected output should be the empty string twice. Instead, make sk_peek_offset return negative values, and pass those values to __skb_try_recv_datagram/__skb_try_recv_from_queue. If the passed offset to __skb_try_recv_from_queue is negative, the checked skb is never skipped. __skb_try_recv_from_queue will then ensure the offset is reset back to 0 if a peek is requested without an offset, unless no packets are found. Also simplify the if condition in __skb_try_recv_from_queue. If _off is greater then 0, and off is greater then or equal to skb->len, then (_off \|\| skb->len) must always be true assuming skb->len >= 0 is always true. Also remove a redundant check around a call to sk_peek_offset in af_unix.c, as it double checked if MSG_PEEK was set in the flags. V2: - Moved the negative fixup into __skb_try_recv_from_queue, and remove now redundant checks - Fix peeking in udp{,v6}_recvmsg to report the right value when the offset is 0 V3: - Marked new branch in __skb_try_recv_from_queue as unlikely. Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19	Merge tag 'reset-for-4.14' of git://git.pengutronix.de/git/pza/linux into ↵	Arnd Bergmann
	next/drivers Pull "Reset controller changes for v4.14" from Philipp Zabel: - constify zx2967 reset_ops - add a convenience API to manage an array of resets - let deassert report success and let assert report success for shared resets if the reset controller driver does not implement (de)assert. - add HSDKv1 reset driver - remove Gemini reset controller, the driver is made obsolete by a combined clock/reset driver in drivers/clk - fix the total number of reset lines in the sunxi driver - various uniphier updates and fixes: - remove sLD3 SoC support - simplify system reset register and bit definitions - add audio systems, video input subsystem, and analog amplifiers reset controls * tag 'reset-for-4.14' of git://git.pengutronix.de/git/pza/linux: reset: uniphier: add analog amplifiers reset control reset: uniphier: add video input subsystem reset control reset: uniphier: add audio systems reset control reset: sunxi: fix number of reset lines reset: uniphier: do not use per-SoC macro for system reset block reset: uniphier: remove sLD3 SoC support Revert "reset: Add a Gemini reset controller" ARC: reset: introduce HSDKv1 reset driver reset: make (de)assert report success for self-deasserting reset drivers reset: Add APIs to manage array of resets reset: zx2967: constify zx2967_reset_ops.
2017-08-18	Merge tag 'tegra-for-4.14-soc' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into next/drivers Pull "soc/tegra: Changes for v4.14-rc1" from Thierry Reding: Contains a fix for unbalanced reference counting of device tree nodes in the PMC-based generic power domains code. A second change moves the SoC device registration code from its old location in arch/arm/mach-tegra to drivers/soc/tegra so that it can be shared between 32-bit and 64-bit ARM Tegra SoCs. * tag 'tegra-for-4.14-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: soc/tegra: Register SoC device soc/tegra: Fix bad of_node_put() in powergate init