git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-07-30	tcmu: free old string on reconfig	Bryant G. Ly
	On initial tcmu_configure_device call the info->name would have already been allocated and set, so on the second call make sure to free it first. Reported-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-30	tcmu: Fix possible to/from address overflow when doing the memcpy	Xiubo Li
	For most case the sg->length equals to PAGE_SIZE, so this bug won't be triggered. Otherwise this will crash the kernel, for example when all segments' sg->length equal to 1K. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-30	Linux 4.13-rc3v4.13-rc3	Linus Torvalds

2017-07-30	Merge branch 'x86-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of x86 fixes: - prevent the kernel from using the EFI reboot method when EFI is disabled. - two patches addressing clang issues" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Disable the address-of-packed-member compiler warning x86/efi: Fix reboot_mode when EFI runtime services are disabled x86/boot: #undef memcpy() et al in string.c
2017-07-30	Merge branch 'sched-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "Two patches addressing build warnings caused by inconsistent kernel doc comments" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/wait: Clean up some documentation warnings sched/core: Fix some documentation build warnings
2017-07-30	Merge branch 'perf-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A couple of fixes for performance counters and kprobes: - a series of small patches which make the uncore performance counters on Skylake server systems work correctly - add a missing instruction slot release to the failure path of kprobes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes/x86: Release insn_slot in failure path perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs perf/x86/intel/uncore: Fix SKX CHA event extra regs perf/x86/intel/uncore: Remove invalid Skylake server CHA filter field perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask perf/x86/intel/uncore: Fix Skylake server PCU PMU event format perf/x86/intel/uncore: Fix Skylake UPI PMU event masks
2017-07-30	Merge branch 'irq-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "Fix for a regression caused by the conversion of x86 to the generic hotplug code. Instead of doing a plain single line revert, this adds a pile of comments so the semantics of the force argument are clear" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"
2017-07-30	iio: adc: stm32: fix common clock rate	Fabrice Gasnier
	ADC clock input is provided to internal prescaler (that decreases its frequency). It's then used as reference clock for conversions. - Fix common clock rate used then by stm32-adc sub-devices. Take common prescaler into account. Currently, rate is used to set "boost" mode. It may unnecessarily be set. This impacts power consumption. - Fix ADC max clock rate on STM32H7 (fADC from datasheet). Currently, prescaler may be set too low. This can result in ADC reference clock used for conversion to exceed max allowed clock frequency. Fixes: 95e339b6e85d ("iio: adc: stm32: add support for STM32H7") Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-07-30	staging: comedi: comedi_fops: do not call blocking ops when !TASK_RUNNING	Ian Abbott
	Comedi's read and write file operation handlers (`comedi_read()` and `comedi_write()`) currently call `copy_to_user()` or `copy_from_user()` whilst in the `TASK_INTERRUPTIBLE` state, which falls foul of the `might_fault()` checks when enabled. Fix it by setting the current task state back to `TASK_RUNNING` a bit earlier before calling these functions. Reported-by: Piotr Gregor <piotrgregor@rsyncme.org> Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Cc: <stable@vger.kernel.org> # 4.5+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-30	tty: pl011: fix initialization order of QDF2400 E44	Timur Tabi
	The work-around for Qualcomm Technologies QDF2400 Erratum 44 hinges on a global variable defined in the pl011 driver. The ACPI SPCR parsing code determines whether the work-around is needed, and if so, it changes the console name from "pl011" to "qdf2400_e44". The expectation is that the pl011 driver will implement the work-around when it sees the console name. The global variable qdf2400_e44_present is set when that happens. The problem is that work-around needs to be enabled when the pl011 driver probes, not when the console name is queried. However, sbsa_probe() is called before pl011_console_match(). The work-around appeared to work previously because the default console on QDF2400 platforms was always ttyAMA1. The first time sbsa_probe() is called (for ttyAMA0), qdf2400_e44_present is still false. Then pl011_console_match() is called, and it sets qdf2400_e44_present to true. All subsequent calls to sbsa_probe() enable the work-around. The solution is to move the global variable into spcr.c and let the pl011 driver query it during probe time. This works because all QDF2400 platforms require SPCR, so parse_spcr() will always be called. pl011_console_match still checks for the "qdf2400_e44" console name, but it doesn't do anything else special. Fixes: 5a0722b898f8 ("tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44") Tested-by: Jeffrey Hugo <jhugo@codeaurora.org> Signed-off-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-30	usb: musb: fix tx fifo flush handling again	Bin Liu
	commit 68fe05e2a451 ("usb: musb: fix tx fifo flush handling") drops the 1ms delay trying to solve the long disconnect time issue when application queued many tx urbs. However, the 1ms delay is needed for some use cases, for example, without the delay, reconnecting AR9271 WIFI dongle no longer works if the connection is dropped from the AP. So let's add back the 1ms delay in musb_h_tx_flush_fifo(), and solve the long disconnect time problem with a separate patch for usb_hcd_flush_endpoint(). Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Bin Liu <b-liu@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-30	usb: core: unlink urbs from the tail of the endpoint's urb_list	Bin Liu
	While unlink an urb, if the urb has been programmed in the controller, the controller driver might do some hw related actions to tear down the urb. Currently usb_hcd_flush_endpoint() passes each urb from the head of the endpoint's urb_list to the controller driver, which could make the controller driver think each urb has been programmed and take the unnecessary actions for each urb. This patch changes the behavior in usb_hcd_flush_endpoint() to pass the urbs from the tail of the list, to avoid any unnecessary actions in an controller driver. Cc: stable@vger.kernel.org # v4.4+ Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Bin Liu <b-liu@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-30	usb-storage: fix deadlock involving host lock and scsi_done	Alan Stern
	Christoph Hellwig says that since version 4.12, the kernel switched to using blk-mq by default. The old code used a softirq for handling request completions, but blk-mq can handle completions in the caller's context. This may cause a problem for usb-storage, because it invokes the ->scsi_done callback while holding the host lock, and the completion routine sometimes tries to acquire the same lock (when running the error handler, for example). The consequence is that the existing code will sometimes deadlock upon error completion of a SCSI command (with a lockdep warning). This is easy enough to fix, since usb-storage doesn't really need to hold the host lock while the callback runs. It was simpler to write it that way, but moving the call outside the locked region is pretty easy and there's no downside. That's what this patch does. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-and-tested-by: Arthur Marsh <arthur.marsh@internode.on.net> CC: Christoph Hellwig <hch@lst.de> CC: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-30	uas: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069	Alan Swanson
	Similar to commit d595259fbb7a ("usb-storage: Add ignore-residue quirk for Initio INIC-3619") for INIC-3169 in unusual_devs.h but INIC-3069 already present in unusual_uas.h. Both in same controller IC family. Issue is that MakeMKV fails during key exchange with installed bluray drive with following error: 002004:0000 Error 'Scsi error - ILLEGAL REQUEST:COPY PROTECTION KEY EXCHANGE FAILURE - KEY NOT ESTABLISHED' occurred while issuing SCSI command AD010..080002400 to device 'SG:dev_11:0' Signed-off-by: Alan Swanson <reiver@improbability.net> Acked-by: Oliver Neukum <oneukum@suse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-30	USB: hcd: Mark secondary HCD as dead if the primary one died	Rafael J. Wysocki
	Make usb_hc_died() clear the HCD_FLAG_RH_RUNNING flag for the shared HCD and set HCD_FLAG_DEAD for it, in analogy with what is done for the primary one. Among other thigs, this prevents check_root_hub_suspended() from returning -EBUSY for dead HCDs which helps to work around system suspend issues in some situations. This actually fixes occasional suspend failures on one of my test machines. Suggested-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-30	iio: adc: ina219: Avoid underflow for sleeping time	Stefan Brüns
	Proper support for the INA219 lowered the minimum sampling period from 2140us to 284us. Subtracting 200us later leads to an underflow and an almost infinite udelay later. Using a signed int for the sampling period provides sufficient range (at most 286401024us), but catches the underflow when comparing with buffer_us. Fixes: 18edac2e22f4 ("iio: adc: Fix integration time/averaging for INA219/220") Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-07-30	iio: trigger: stm32-timer: add enable attribute	Fabrice Gasnier
	In order to use encoder mode, timers needs to be enabled (e.g. CEN bit) along with peripheral clock. Add IIO_CHAN_INFO_ENABLE attribute to handle this. Also, in triggered mode, CEN bit is set automatically in hardware. Then clock must be enabled before starting triggered mode. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Acked-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-07-30	iio: trigger: stm32-timer: fix get/set down count direction	Fabrice Gasnier
	Fixes: 4adec7da0536 ("iio: stm32 trigger: Add quadrature encoder device") This fixes two issues: - stm32_set_count_direction: to set down direction - stm32_get_count_direction: to get down direction IIO core provides/expects value to be an index of iio_enum items array. This needs to be turned by these routines into TIM_CR1_DIR (e.g. BIT(4)) value. Also, report error when attempting to write direction, when in encoder mode: in this case, direction is read only (given by encoder inputs). Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Acked-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-07-30	iio: trigger: stm32-timer: fix write_raw return value	Fabrice Gasnier
	Fixes: 4adec7da0536 ("iio: stm32 trigger: Add quadrature encoder device") IIO core expects zero as return value for write_raw() callback in case of success. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Acked-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-07-30	iio: trigger: stm32-timer: fix quadrature mode get routine	Fabrice Gasnier
	Fixes: 4adec7da0536 ("iio: stm32 trigger: Add quadrature encoder device") SMS bitfiled is mode + 1. After reset, upon boot, SMS = 0. When reading from sysfs, stm32_get_quadrature_mode() returns -1 (e.g. -EPERM) which is wrong error code here. So, check SMS bitfiled matches valid encoder mode, or return -EINVAL. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Acked-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-07-30	iio: bmp280: properly initialize device for humidity reading	Andreas Klinger
	If the device is not initialized at least once it happens that the humidity reading is skipped, which means the special value 0x8000 is delivered. For omitting this case the oversampling of the humidity must be set before the oversampling of the temperature und pressure is set as written in the datasheet of the BME280. Furthermore proper error detection is added in case a skipped value is read from the device. This is done also for pressure and temperature reading. Especially it don't make sense to compensate this value and treat it as regular value. Signed-off-by: Andreas Klinger <ak@it-klinger.de> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-07-30	ACPI: APD: Fix HID for Hisilicon Hip07/08	Hanjun Guo
	ACPI HID for Hisilicon Hip07/08 should be HISI02A1/2, not HISI0A21/2, HISI02A1/2 was tested ok but was modified by the stupid typo when upstream the patches (by me), correct them to the right IDs (matching the IDs in drivers/i2c/busses/i2c-designware-platdrv.c). Fixes: 6e14cf361a0c (ACPI / APD: Add clock frequency for Hisilicon Hip07/08 I2C controller) Reported-by: Tao Tian <tiantao6@huawei.com> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-30	cpufreq: x86: Make scaling_cur_freq behave more as expected	Rafael J. Wysocki
	After commit f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute in sysfs only behaves as expected on x86 with APERF/MPERF registers available when it is read from at least twice in a row. The value returned by the first read may not be meaningful, because the computations in there use cached values from the previous iteration of aperfmperf_snapshot_khz() which may be stale. To prevent that from happening, modify arch_freq_get_on_cpu() to call aperfmperf_snapshot_khz() twice, with a short delay between these calls, if the previous invocation of aperfmperf_snapshot_khz() was too far back in the past (specifically, more that 1s ago). Also, as pointed out by Doug Smythies, aperf_delta is limited now and the multiplication of it by cpu_khz won't overflow, so simplify the s->khz computations too. Fixes: f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF" Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-29	bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len	Daniel Borkmann
	bpf_prog_size(prog->len) is not the correct length we want to dump back to user space. The code in bpf_prog_get_info_by_fd() uses this to copy prog->insnsi to user space, but bpf_prog_size(prog->len) also includes the size of struct bpf_prog itself plus program instructions and is usually used either in context of accounting or for bpf_prog_alloc() et al, thus we copy out of bounds in bpf_prog_get_info_by_fd() potentially. Use the correct bpf_prog_insn_size() instead. Fixes: 1e2709769086 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29	tcp: avoid bogus gcc-7 array-bounds warning	Arnd Bergmann
	When using CONFIG_UBSAN_SANITIZE_ALL, the TCP code produces a false-positive warning: net/ipv4/tcp_output.c: In function 'tcp_connect': net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds] tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start; ^~ net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds] tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ I have opened a gcc bug for this, but distros have already shipped compilers with this problem, and it's not clear yet whether there is a way for gcc to avoid the warning. As the problem is related to the bitfield access, this introduces a temporary variable to store the old enum value. I did not notice this warning earlier, since UBSAN is disabled when building with COMPILE_TEST, and that was always turned on in both allmodconfig and randconfig tests. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29	Merge tag 'wireless-drivers-for-davem-2017-07-28' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.13 Two fixes for for brcmfmac, the crash was reported by two people already so it's a high priority fix. brcmfmac * fix a crash in skb headroom handling in v4.13-rc1 * fix a memory leak due to a merge error in v4.6 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29	net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"	Colin Ian King
	Trivial fix to spelling mistake in printk message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29	block, bfq: consider also in_service_entity to state whether an entity is active	Paolo Valente
	Groups of BFQ queues are represented by generic entities in BFQ. When a queue belonging to a parent entity is deactivated, the parent entity may need to be deactivated too, in case the deactivated queue was the only active queue for the parent entity. This deactivation may need to be propagated upwards if the entity belongs, in its turn, to a further higher-level entity, and so on. In particular, the upward propagation of deactivation stops at the first parent entity that remains active even if one of its child entities has been deactivated. To decide whether the last non-deactivation condition holds for a parent entity, BFQ checks whether the field next_in_service is still not NULL for the parent entity, after the deactivation of one of its child entity. If it is not NULL, then there are certainly other active entities in the parent entity, and deactivations can stop. Unfortunately, this check misses a corner case: if in_service_entity is not NULL, then next_in_service may happen to be NULL, although the parent entity is evidently active. This happens if: 1) the entity pointed by in_service_entity is the only active entity in the parent entity, and 2) according to the definition of next_in_service, the in_service_entity cannot be considered as next_in_service. See the comments on the definition of next_in_service for details on this second point. Hitting the above corner case causes crashes. To address this issue, this commit: 1) Extends the above check on only next_in_service to controlling both next_in_service and in_service_entity (if any of them is not NULL, then no further deactivation is performed) 2) Improves the (important) comments on how next_in_service is defined and updated; in particular it fixes a few rather obscure paragraphs Reported-by: Eric Wheeler <bfq-sched@lists.ewheeler.net> Reported-by: Rick Yiu <rick_yiu@htc.com> Reported-by: Tom X Nguyen <tom81094@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Tested-by: Eric Wheeler <bfq-sched@lists.ewheeler.net> Tested-by: Rick Yiu <rick_yiu@htc.com> Tested-by: Laurentiu Nicola <lnicola@dend.ro> Tested-by: Tom X Nguyen <tom81094@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	block, bfq: reset in_service_entity if it becomes idle	Paolo Valente
	BFQ implements hierarchical scheduling by representing each group of queues with a generic parent entity. For each parent entity, BFQ maintains an in_service_entity pointer: if one of the child entities happens to be in service, in_service_entity points to it. The resetting of these pointers happens only on queue expirations: when the in-service queue is expired, i.e., stops to be the queue in service, BFQ resets all in_service_entity pointers along the parent-entity path from this queue to the root entity. Functions handling the scheduling of entities assume, naturally, that in-service entities are active, i.e., have pending I/O requests (or, as a special case, even if they have no pending requests, they are expected to receive a new request very soon, with the scheduler idling the storage device while waiting for such an event). Unfortunately, the above resetting scheme of the in_service_entity pointers may cause this assumption to be violated. For example, the in-service queue may happen to remain without requests because of a request merge. In this case the queue does become idle, and all related data structures are updated accordingly. But in_service_entity still points to the queue in the parent entity. This inconsistency may even propagate to higher-level parent entities, if they happen to become idle as well, as a consequence of the leaf queue becoming idle. For this queue and parent entities, scheduling functions have an undefined behaviour, and, as reported, may easily lead to kernel crashes or hangs. This commit addresses this issue by simply resetting the in_service_entity field also when it is detected to point to an entity becoming idle (regardless of why the entity becomes idle). Reported-by: Laurentiu Nicola <lnicola@dend.ro> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Tested-by: Laurentiu Nicola <lnicola@dend.ro> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	bpf: don't indicate success when copy_from_user fails	Daniel Borkmann
	err in bpf_prog_get_info_by_fd() still holds 0 at that time from prior check_uarg_tail_zero() check. Explicitly return -EFAULT instead, so user space can be notified of buggy behavior. Fixes: 1e2709769086 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29	udp6: fix socket leak on early demux	Paolo Abeni
	When an early demuxed packet reaches __udp6_lib_lookup_skb(), the sk reference is retrieved and used, but the relevant reference count is leaked and the socket destructor is never called. Beyond leaking the sk memory, if there are pending UDP packets in the receive queue, even the related accounted memory is leaked. In the long run, this will cause persistent forward allocation errors and no UDP skbs (both ipv4 and ipv6) will be able to reach the user-space. Fix this by explicitly accessing the early demux reference before the lookup, and properly decreasing the socket reference count after usage. Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and the now obsoleted comment about "socket cache". The newly added code is derived from the current ipv4 code for the similar path. v1 -> v2: fixed the __udp6_lib_rcv() return code for resubmission, as suggested by Eric Reported-by: Sam Edwards <CFSworks@gmail.com> Reported-by: Marc Haber <mh+netdev@zugschlus.de> Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29	net: thunderx: Fix BGX transmit stall due to underflow	Sunil Goutham
	For SGMII/RGMII/QSGMII interfaces when physical link goes down while traffic is high is resulting in underflow condition being set on that specific BGX's LMAC. Which assets a backpresure and VNIC stops transmitting packets. This is due to BGX being disabled in link status change callback while packet is in transit. This patch fixes this issue by not disabling BGX but instead just disables packet Rx and Tx. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29	Revert "vhost: cache used event for better performance"	Jason Wang
	This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it was reported to break vhost_net. We want to cache used event and use it to check for notification. The assumption was that guest won't move the event idx back, but this could happen in fact when 16 bit index wraps around after 64K entries. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29	Merge tag 'mlx5-fixes-2017-07-27-V2' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2017-07-27 This series contains some misc fixes to the mlx5 driver. Please pull and let me know if there's any problem. V1->V2: - removed redundant braces for -stable: 4.7 net/mlx5: Fix command bad flow on command entry allocation failure 4.9 net/mlx5: Consider tx_enabled in all modes on remap net/mlx5e: Fix outer_header_zero() check size 4.10 net/mlx5: Fix mlx5_add_flow_rules call with correct num of dests 4.11 net/mlx5: Fix mlx5_ifc_mtpps_reg_bits structure size net/mlx5e: Add field select to MTPPS register net/mlx5e: Fix broken disable 1PPS flow net/mlx5e: Change 1PPS out scheme net/mlx5e: Add missing support for PTP_CLK_REQ_PPS request net/mlx5e: Fix wrong delay calculation for overflow check scheduling net/mlx5e: Schedule overflow check work to mlx5e workqueue 4.12 net/mlx5: Fix command completion after timeout access invalid structure net/mlx5e: IPoIB, Modify add/remove underlay QPN flows I hope this is not too much, but most of the patches do apply cleanly on -stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29	team: use a larger struct for mac address	WANG Cong
	IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len, but in many places especially bonding, we use struct sockaddr to copy and set mac addr, this could lead to stack out-of-bounds access. Fix it by using a larger address storage like bonding. Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29	net: check dev->addr_len for dev_set_mac_address()	WANG Cong
	Historically, dev_ifsioc() uses struct sockaddr as mac address definition, this is why dev_set_mac_address() accepts a struct sockaddr pointer as input but now we have various types of mac addresse whose lengths are up to MAX_ADDR_LEN, longer than struct sockaddr, and saved in dev->addr_len. It is too late to fix dev_ifsioc() due to API compatibility, so just reject those larger than sizeof(struct sockaddr), otherwise we would read and use some random bytes from kernel stack. Fortunately, only a few IPv6 tunnel devices have addr_len larger than sizeof(struct sockaddr) and they don't support ndo_set_mac_addr(). But with team driver, in lb mode, they can still be enslaved to a team master and make its mac addr length as the same. Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29	blk-mq: blk_mq_requeue_work() doesn't need to save IRQ flags	Jens Axboe
	We know we're in process context, so don't bother using the IRQ safe versions of the spin lock. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	block: DAC960: shut up format-overflow warning	Arnd Bergmann
	gcc-7 points out that a large controller number would overflow the string length for the procfs name and the firmware version string: drivers/block/DAC960.c: In function 'DAC960_Probe': drivers/block/DAC960.c:6591:38: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=] drivers/block/DAC960.c: In function 'DAC960_V1_ReadControllerConfiguration': drivers/block/DAC960.c:1681:40: error: '%02d' directive writing between 2 and 3 bytes into a region of size between 2 and 5 [-Werror=format-overflow=] drivers/block/DAC960.c:1681:40: note: directive argument in the range [0, 255] drivers/block/DAC960.c:1681:3: note: 'sprintf' output between 10 and 14 bytes into a destination of size 12 Both of these seem appropriately sized, and using snprintf() instead of sprintf() improves this by ensuring that even incorrect data won't cause undefined behavior here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	block: use standard blktrace API to output cgroup info for debug notes	Shaohua Li
	Currently cfq/bfq/blk-throttle output cgroup info in trace in their own way. Now we have standard blktrace API for this, so convert them to use it. Note, this changes the behavior a little bit. cgroup info isn't output by default, we only do this with 'blk_cgroup' option enabled. cgroup info isn't output as a string by default too, we only do this with 'blk_cgname' option enabled. Also cgroup info is output in different position of the note string. I think these behavior changes aren't a big issue (actually we make trace data shorter which is good), since the blktrace note is solely for debugging. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	blktrace: add an option to allow displaying cgroup path	Shaohua Li
	By default we output cgroup id in blktrace. This adds an option to display cgroup path. Since get cgroup path is a relativly heavy operation, we don't enable it by default. with the option enabled, blktrace will output something like this: dd-1353 [007] d..2 293.015252: 8,0 /test/level D R 24 + 8 [dd] Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	block: always attach cgroup info into bio	Shaohua Li
	blkcg_bio_issue_check() already gets blkcg for a BIO. bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap operation. There is no point we don't attach the cgroup info into bio at blkcg_bio_issue_check. This also makes blktrace outputs correct cgroup info. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	blktrace: export cgroup info in trace	Shaohua Li
	Currently blktrace isn't cgroup aware. blktrace prints out task name of current context, but the task of current context isn't always in the cgroup where the BIO comes from. We can't use task name to find out IO cgroup. For example, Writeback BIOs always comes from flusher thread but the BIOs are for different blk cgroups. Request could be requeued and dispatched from completely different tasks. MD/DM are another examples. This patch tries to fix the gap. We print out cgroup fhandle info in blktrace. Userspace can use open_by_handle_at() syscall to find the cgroup by fhandle. Or userspace can use name_to_handle_at() syscall to find fhandle for a cgroup and use a BPF program to filter out blktrace for a specific cgroup. We add a new 'blk_cgroup' trace option for blk tracer. It's default off. Application which doesn't know the new option isn't affected. When it's on, we output fhandle info right after blk_io_trace with an extra bit set in event action. So from application point of view, blktrace with the option will output new actions. I didn't change blk trace event yet, since I'm not sure if changing the trace event output is an ABI issue. If not, I'll do it later. Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	cgroup: export fhandle info for a cgroup	Shaohua Li
	Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle', there are unrequired info. Sepcifically, cgroup is always a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle, we only need export the inode number and generation number just like what generic_fh_to_dentry does. And we can avoid the overhead of getting an inode too, since kernfs_node_id (ino and generation) has all the info required. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	kernfs: add exportfs operations	Shaohua Li
	Now we have the facilities to implement exportfs operations. The idea is cgroup can export the fhandle info to userspace, then userspace uses fhandle to find the cgroup name. Another example is userspace can get fhandle for a cgroup and BPF uses the fhandle to filter info for the cgroup. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	kernfs: introduce kernfs_node_id	Shaohua Li
	inode number and generation can identify a kernfs node. We are going to export the identification by exportfs operations, so put ino and generation into a separate structure. It's convenient when later patches use the identification. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	kernfs: don't set dentry->d_fsdata	Shaohua Li
	When working on adding exportfs operations in kernfs, I found it's hard to initialize dentry->d_fsdata in the exportfs operations. Looks there is no way to do it without race condition. Look at the kernfs code closely, there is no point to set dentry->d_fsdata. inode->i_private already points to kernfs_node, and we can get inode from a dentry. So this patch just delete the d_fsdata usage. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	kernfs: add an API to get kernfs node from inode number	Shaohua Li
	Add an API to get kernfs node from inode number. We will need this to implement exportfs operations. This API will be used in blktrace too later, so it should be as fast as possible. To make the API lock free, kernfs node is freed in RCU context. And we depend on kernfs_node count/ino number to filter out stale kernfs nodes. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	kernfs: implement i_generation	Shaohua Li
	Set i_generation for kernfs inode. This is required to implement exportfs operations. The generation is 32-bit, so it's possible the generation wraps up and we find stale files. To reduce the posssibility, we don't reuse inode numer immediately. When the inode number allocation wraps, we increase generation number. In this way generation/inode number consist of a 64-bit number which is unlikely duplicated. This does make the idr tree more sparse and waste some memory. Since idr manages 32-bit keys, idr uses a 6-level radix tree, each level covers 6 bits of the key. In a 100k inode kernfs, the worst case will have around 300k radix tree node. Each node is 576bytes, so the tree will use about ~150M memory. Sounds not too bad, if this really is a problem, we should find better data structure. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	kernfs: use idr instead of ida to manage inode number	Shaohua Li
	kernfs uses ida to manage inode number. The problem is we can't get kernfs_node from inode number with ida. Switching to use idr, next patch will add an API to get kernfs_node from inode number. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29	ARM: shmobile: rcar-gen2: Fix deadlock in regulator quirk	Geert Uytterhoeven
	Simon Horman reported that Koelsch and Lager hang during boot, and bisected this to commit 1c3c5eab171590f8 ("sched/core: Enable might_sleep() and smp_processor_id() checks early"). The da9063/da9210 regulator quirk for R-Car Gen2 boards uses a bus notifier, and unregisters the notifier when it is no longer needed. However, a notifier must not be unregistered from within the call chain. This bug went unnoticed, as blocking_notifier_chain_unregister() didn't take the semaphore during early boot. The aforementioned commit changed that behavior, leading to a deadlock. Fix this by removing the call to bus_unregister_notifier(), and keeping local completion state instead. Reported-by: Simon Horman <horms+renesas@verge.net.au> Fixes: 663fbb52159cca6f ("ARM: shmobile: R-Car Gen2: Add da9063/da9210 regulator quirk") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>