linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-01-29	Merge tag 'acpi-4.16-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "The majority of this is an update of the ACPICA kernel code to upstream revision 20171215 with a cosmetic change and a maintainers information update on top of it. The rest is mostly some minor fixes and cleanups in the ACPI drivers and cleanups to initialization on x86. Specifics: - Update the ACPICA kernel code to upstream revision 20171215 including: * Support for ACPI 6.0A changes in the NFIT table (Bob Moore) * Local 64-bit divide in string conversions (Bob Moore) * Fix for a regression in acpi_evaluate_object_type() (Bob Moore) * Fixes for memory leaks during package object resolution (Bob Moore) * Deployment of safe version of strncpy() (Bob Moore) * Debug and messaging updates (Bob Moore) * Support for PDTT, SDEV, TPM2 tables in iASL and tools (Bob Moore) * Null pointer dereference avoidance in Op and cleanups (Colin Ian King) * Fix for memory leak from building prefixed pathname (Erik Schmauss) * Coding style fixes, disassembler and compiler updates (Hanjun Guo, Erik Schmauss) * Additional PPTT flags from ACPI 6.2 (Jeremy Linton) * Fix for an off-by-one error in acpi_get_timer_duration() (Jung-uk Kim) * Infinite loop detection timeout and utilities cleanups (Lv Zheng) * Windows 10 version 1607 and 1703 OSI strings (Mario Limonciello) - Update ACPICA information in MAINTAINERS to reflect the current status of ACPICA maintenance and rename a local variable in one function to match the corresponding upstream code (Rafael Wysocki) - Clean up ACPI-related initialization on x86 (Andy Shevchenko) - Add support for Intel Merrifield to the ACPI GPIO code (Andy Shevchenko) - Clean up ACPI PMIC drivers (Andy Shevchenko, Arvind Yadav) - Fix the ACPI Generic Event Device (GED) driver to free IRQs on shutdown and clean up the PCI IRQ Link driver (Sinan Kaya) - Make the GHES code call into the AER driver on all errors and clean up the ACPI APEI code (Colin Ian King, Tyler Baicar) - Make the IA64 ACPI NUMA code parse all SRAT entries (Ganapatrao Kulkarni) - Add a lid switch blacklist to the ACPI button driver and make it print extra debug messages on lid events (Hans de Goede) - Add quirks for Asus GL502VSK and UX305LA to the ACPI battery driver and clean it up somewhat (Bjørn Mork, Kai-Heng Feng) - Add device link for CHT SD card dependency on I2C to the ACPI LPSS (Intel SoCs) driver and make it avoid creating platform device objects for devices without MMIO resources (Adrian Hunter, Hans de Goede) - Fix the ACPI GPE mask kernel command line parameter handling (Prarit Bhargava) - Fix the handling of (incorrectly exposed) backlight interfaces without LCD (Hans de Goede) - Fix the usage of debugfs_create_() in the ACPI EC driver (Geert Uytterhoeven)" tag 'acpi-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (62 commits) ACPI/PCI: pci_link: reduce verbosity when IRQ is enabled ACPI / LPSS: Do not instiate platform_dev for devs without MMIO resources ACPI / PMIC: Convert to use builtin_platform_driver() macro ACPI / x86: boot: Propagate error code in acpi_gsi_to_irq() ACPICA: Update version to 20171215 ACPICA: trivial style fix, no functional change ACPICA: Fix a couple memory leaks during package object resolution ACPICA: Recognize the Windows 10 version 1607 and 1703 OSI strings ACPICA: DT compiler: prevent error if optional field at the end of table is not present ACPICA: Rename a global variable, no functional change ACPICA: Create and deploy safe version of strncpy ACPICA: Cleanup the global variables and update comments ACPICA: Debugger: fix slight indentation issue ACPICA: Fix a regression in the acpi_evaluate_object_type() interface ACPICA: Update for a few debug output statements ACPICA: Debug output, no functional change ACPI: EC: Fix debugfs_create_*() usage ACPI / video: Default lcd_only to true on Win8-ready and newer machines ACPI / x86: boot: Don't setup SCI on HW-reduced platforms ACPI / x86: boot: Use INVALID_ACPI_IRQ instead of 0 for acpi_sci_override_gsi ...
2018-01-29	Merge tag 'pm-4.16-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "This includes some infrastructure changes in the PM core, mostly related to integration between runtime PM and system-wide suspend and hibernation, plus some driver changes depending on them and fixes for issues in that area which have become quite apparent recently. Also included are changes making more x86-based systems use the Low Power Sleep S0 _DSM interface by default, which turned out to be necessary to handle power button wakeups from suspend-to-idle on Surface Pro3. On the cpufreq front we have fixes and cleanups in the core, some new hardware support, driver updates and the removal of some unused code from the CPU cooling thermal driver. Apart from this, the Operating Performance Points (OPP) framework is prepared to be used with power domains in the future and there is a usual bunch of assorted fixes and cleanups. Specifics: - Define a PM driver flag allowing drivers to request that their devices be left in suspend after system-wide transitions to the working state if possible and add support for it to the PCI bus type and the ACPI PM domain (Rafael Wysocki). - Make the PM core carry out optimizations for devices with driver PM flags set in some cases and make a few drivers set those flags (Rafael Wysocki). - Fix and clean up wrapper routines allowing runtime PM device callbacks to be re-used for system-wide PM, change the generic power domains (genpd) framework to stop using those routines incorrectly and fix up a driver depending on that behavior of genpd (Rafael Wysocki, Ulf Hansson, Geert Uytterhoeven). - Fix and clean up the PM core's device wakeup framework and re-factor system-wide PM core code related to device wakeup (Rafael Wysocki, Ulf Hansson, Brian Norris). - Make more x86-based systems use the Low Power Sleep S0 _DSM interface by default (to fix power button wakeup from suspend-to-idle on Surface Pro3) and add a kernel command line switch to tell it to ignore the system sleep blacklist in the ACPI core (Rafael Wysocki). - Fix a race condition related to cpufreq governor module removal and clean up the governor management code in the cpufreq core (Rafael Wysocki). - Drop the unused generic code related to the handling of the static power energy usage model in the CPU cooling thermal driver along with the corresponding documentation (Viresh Kumar). - Add mt2712 support to the Mediatek cpufreq driver (Andrew-sh Cheng). - Add a new operating point to the imx6ul and imx6q cpufreq drivers and switch the latter to using clk_bulk_get() (Anson Huang, Dong Aisheng). - Add support for multiple regulators to the TI cpufreq driver along with a new DT binding related to that and clean up that driver somewhat (Dave Gerlach). - Fix a powernv cpufreq driver regression leading to incorrect CPU frequency reporting, fix that driver to deal with non-continguous P-states correctly and clean it up (Gautham Shenoy, Shilpasri Bhat). - Add support for frequency scaling on Armada 37xx SoCs through the generic DT cpufreq driver (Gregory CLEMENT). - Fix error code paths in the mvebu cpufreq driver (Gregory CLEMENT). - Fix a transition delay setting regression in the longhaul cpufreq driver (Viresh Kumar). - Add Skylake X (server) support to the intel_pstate cpufreq driver and clean up that driver somewhat (Srinivas Pandruvada). - Clean up the cpufreq statistics collection code (Viresh Kumar). - Drop cluster terminology and dependency on physical_package_id from the PSCI driver and drop dependency on arm_big_little from the SCPI cpufreq driver (Sudeep Holla). - Add support for system-wide suspend and resume to the RAPL power capping driver and drop a redundant semicolon from it (Zhen Han, Luis de Bethencourt). - Make SPI domain validation (in the SCSI SPI transport driver) and system-wide suspend mutually exclusive as they rely on the same underlying mechanism and cannot be carried out at the same time (Bart Van Assche). - Fix the computation of the amount of memory to preallocate in the hibernation core and clean up one function in there (Rainer Fiebig, Kyungsik Lee). - Prepare the Operating Performance Points (OPP) framework for being used with power domains and clean up one function in it (Viresh Kumar, Wei Yongjun). - Clean up the generic sysfs interface for device PM (Andy Shevchenko). - Fix several minor issues in power management frameworks and clean them up a bit (Arvind Yadav, Bjorn Andersson, Geert Uytterhoeven, Gustavo Silva, Julia Lawall, Luis de Bethencourt, Paul Gortmaker, Sergey Senozhatsky, gaurav jindal). - Make it easier to disable PM via Kconfig (Mark Brown). - Clean up the cpupower and intel_pstate_tracer utilities (Doug Smythies, Laura Abbott)" * tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits) PCI / PM: Remove spurious semicolon cpufreq: scpi: remove arm_big_little dependency drivers: psci: remove cluster terminology and dependency on physical_package_id powercap: intel_rapl: Fix trailing semicolon dmaengine: rcar-dmac: Make DMAC reinit during system resume explicit PM / runtime: Allow no callbacks in pm_runtime_force_suspend\|resume() PM / hibernate: Drop unused parameter of enough_swap PM / runtime: Check ignore_children in pm_runtime_need_not_resume() PM / runtime: Rework pm_runtime_force_suspend/resume() PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume() cpufreq: powernv: Dont assume distinct pstate values for nominal and pmin cpufreq: intel_pstate: Add Skylake servers support cpufreq: intel_pstate: Replace bxt_funcs with core_funcs platform/x86: surfacepro3: Support for wakeup from suspend-to-idle ACPI / PM: Use Low Power S0 Idle on more systems PM / wakeup: Print warn if device gets enabled as wakeup source during sleep PM / domains: Don't skip driver's ->suspend\|resume_noirq() callbacks PM / core: Propagate wakeup_path status flag in __device_suspend_late() PM / core: Re-structure code for clearing the direct_complete flag powercap: add suspend and resume mechanism for SOC power limit ...
2018-01-29	Merge branch 'net_sched-reflect-tx_queue_len-change-for-pfifo_fast'	David S. Miller
	Cong Wang says: ==================== net_sched: reflect tx_queue_len change for pfifo_fast This pathcset restores the pfifo_fast qdisc behavior of dropping packets based on latest dev->tx_queue_len. Patch 1 introduces a helper, patch 2 introduces a new Qdisc ops which is called when we modify tx_queue_len, patch 3 implements this ops for pfifo_fast. Please see each patch for details. --- v3: use skb_array_resize_multiple() v2: handle error case for ->change_tx_queue_len() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	net_sched: implement ->change_tx_queue_len() for pfifo_fast	Cong Wang
	pfifo_fast used to drop based on qdisc_dev(qdisc)->tx_queue_len, so we have to resize skb array when we change tx_queue_len. Other qdiscs which read tx_queue_len are fine because they all save it to sch->limit or somewhere else in qdisc during init. They don't have to implement this, it is nicer if they do so that users don't have to re-configure qdisc after changing tx_queue_len. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	net_sched: plug in qdisc ops change_tx_queue_len	Cong Wang
	Introduce a new qdisc ops ->change_tx_queue_len() so that each qdisc could decide how to implement this if it wants. Previously we simply read dev->tx_queue_len, after pfifo_fast switches to skb array, we need this API to resize the skb array when we change dev->tx_queue_len. To avoid handling race conditions with TX BH, we need to deactivate all TX queues before change the value and bring them back after we are done, this also makes implementation easier. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	net: introduce helper dev_change_tx_queue_len()	Cong Wang
	This patch promotes the local change_tx_queue_len() to a core helper function, dev_change_tx_queue_len(), so that rtnetlink and net-sysfs could share the code. This also prepares for the following patch. Note, the -EFAULT in the original code doesn't make sense, we should propagate the errno from notifiers. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	Merge tag 'sound-4.16-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "The major changes in the core API side in this cycle are the still on-going ASoC componentization works. Other than that, only few small changes such as 20bit PCM format support are found. Meanwhile the rest majority of changes are for ASoC drivers: - Large cleanups of some of the TI CODEC drivers - Continued work on Intel ASoC stuff for new quirks, ACPI GPIO handling, Kconfigs and lots of cleanups - Refactoring of the Freescale SSI driver, as preliminary work for the upcoming changes - Work on ST DFSDM driver, including the required IIO patches - New drivers for Allwinner A83T, Maxim MAX89373, SocioNext UiniPhier EVEA Tempo Semiconductor TSCS42xx and TI PCM816x, TAS5722 and TAS6424 devices - Removal of dead codes for SN95031 and board drivers Last but not least, a few HD-audio and USB-audio quirks are included as usual, too" * tag 'sound-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (303 commits) ALSA: hda - Reduce the suspend time consumption for ALC256 ASoC: use seq_file to dump the contents of dai_list,platform_list and codec_list ASoC: soc-core: add missing EXPORT_SYMBOL_GPL() for snd_soc_rtdcom_lookup IIO: ADC: stm32-dfsdm: remove unused variable again ASoC: bcm2835: fix hw_params error when device is in prepared state ASoC: mxs-sgtl5000: Do not print error on probe deferral ASoC: sgtl5000: Do not print error on probe deferral ASoC: Intel: remove select on non-existing SND_SOC_INTEL_COMMON ALSA: usb-audio: Support changing input on Sound Blaster E1 ASoC: Intel: remove second duplicated assignment to pointer 'res' ALSA: hda/realtek - update ALC215 depop optimize ALSA: hda/realtek - Support headset mode for ALC215/ALC285/ALC289 ALSA: pcm: Fix trailing semicolon ASoC: add Component level .read/.write ASoC: cx20442: fix regression by adding back .read/.write ASoC: uda1380: fix regression by adding back .read/.write ASoC: tlv320dac33: fix regression by adding back .read/.write ALSA: hda - Use IS_REACHABLE() for dependency on input IIO: ADC: stm32-dfsdm: fix static check warning IIO: ADC: stm32-dfsdm: code optimization ...
2018-01-29	vhost_net: stop device during reset owner	Jason Wang
	We don't stop device before reset owner, this means we could try to serve any virtqueue kick before reset dev->worker. This will result a warn since the work was pending at llist during owner resetting. Fix this by stopping device during owner reset. Reported-by: syzbot+eb17c6162478cc50632c@syzkaller.appspotmail.com Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	Merge branch 'net-Ease-to-follow-an-interface-that-moves-to-another-netns'	David S. Miller
	Nicolas Dichtel says: ==================== net: Ease to follow an interface that moves to another netns The goal of this series is to ease the user to follow an interface that moves to another netns. After this series, with a patched iproute2: $ ip netns bar foo $ ip monitor link & $ ip link set dummy0 netns foo Deleted 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether 6e:a7:82:35:96:46 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6 => new nsid: 0, new ifindex: 6 (was 5 in the previous netns) $ ip link set eth1 netns bar Deleted 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default link/ether 52:54:01:12:34:57 brd ff:ff:ff:ff:ff:ff new-nsid 1 new-ifindex 3 => new nsid: 1, new ifindex: 3 (same ifindex) $ ip netns bar (id: 1) foo (id: 0) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	dev: advertise the new ifindex when the netns iface changes	Nicolas Dichtel
	The goal is to let the user follow an interface that moves to another netns. CC: Jiri Benc <jbenc@redhat.com> CC: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	dev: always advertise the new nsid when the netns iface changes	Nicolas Dichtel
	The user should be able to follow any interface that moves to another netns. There is no reason to hide physical interfaces. CC: Jiri Benc <jbenc@redhat.com> CC: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	net: ethernet: cavium: Correct Cavium Thunderx NIC driver names accordingly ↵	Vadim Lomovtsev
	to module name It was found that ethtool provides unexisting module name while it queries the specified network device for associated driver information. Then user tries to unload that module by provided module name and fails. This happens because ethtool reads value of DRV_NAME macro, while module name is defined at the driver's Makefile. This patch is to correct Cavium CN88xx Thunder NIC driver names (DRV_NAME macro) 'thunder-nicvf' to 'nicvf' and 'thunder-nic' to 'nicpf', sync bgx and xcv driver names accordingly to their module names. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	Merge tag 'init_task-20180117' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull init_task initializer cleanups from David Howells: "It doesn't seem useful to have the init_task in a header file rather than in a normal source file. We could consolidate init_task handling instead and expand out various macros. Here's a series of patches that consolidate init_task handling: (1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and openrisc. (2) Alter the INIT_TASK_DATA linker script macro to set init_thread_union and init_stack rather than defining these in C. Insert init_task and init_thread_into into the init_stack area in the linker script as appropriate to the configuration, with different section markers so that they end up correctly ordered. We can then get merge ia64's init_task.c into the main one. We then have a bunch of single-use INIT_() macros that seem only to be macros because they used to be used per-arch. We can then expand these in place of the user and get rid of a few lines and a lot of backslashes. (3) Expand INIT_TASK() in place. (4) Expand in place various small INIT_() macros that are defined conditionally. Expand them and surround them by #if[n]def/#endif in the .c file as it takes fewer lines. (5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place. (6) Expand INIT_STRUCT_PID in place. These macros can then be discarded" * tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: Expand INIT_STRUCT_PID and remove Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove Expand various INIT_* macros and remove Expand INIT_TASK() in init/init_task.c and remove Construct init thread stack in the linker script rather than by union openrisc: Make THREAD_SIZE available to vmlinux.lds hexagon: Make THREAD_SIZE available to vmlinux.lds cris: Make THREAD_SIZE available to vmlinux.lds
2018-01-29	Merge branch 'ptr_ring-fixes'	David S. Miller
	Michael S. Tsirkin says: ==================== ptr_ring fixes This fixes a bunch of issues around ptr_ring use in net core. One of these: "tap: fix use-after-free" is also needed on net, but can't be backported cleanly. I will post a net patch separately. Lightly tested - Jason, could you pls confirm this addresses the security issue you saw with ptr_ring? Testing reports would be appreciated too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2018-01-29	tools/virtio: fix smp_mb on x86	Michael S. Tsirkin
	Offset 128 overlaps the last word of the redzone. Use 132 which is always beyond that. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	tools/virtio: copy READ/WRITE_ONCE	Michael S. Tsirkin
	This is to make ptr_ring test build again. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	tools/virtio: more stubs to fix tools build	Michael S. Tsirkin
	Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	tools/virtio: switch to __ptr_ring_empty	Michael S. Tsirkin
	We don't rely on lockless guarantees, but it seems cleaner than inverting __ptr_ring_peek. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	ptr_ring: prevent queue load/store tearing	Michael S. Tsirkin
	In theory compiler could tear queue loads or stores in two. It does not seem to be happening in practice but it seems easier to convert the cases where this would be a problem to READ/WRITE_ONCE than worry about it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	skb_array: use __ptr_ring_empty	Michael S. Tsirkin
	__skb_array_empty should use __ptr_ring_empty since that's the only legal lockless function. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	Revert "net: ptr_ring: otherwise safe empty checks can overrun array bounds"	Michael S. Tsirkin
	This reverts commit bcecb4bbf88aa03171c30652bca761cf27755a6b. If we try to allocate an extra entry as the above commit did, and when the requested size is UINT_MAX, addition overflows causing zero size to be passed to kmalloc(). kmalloc then returns ZERO_SIZE_PTR with a subsequent crash. Reported-by: syzbot+87678bcf753b44c39b67@syzkaller.appspotmail.com Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	ptr_ring: disallow lockless __ptr_ring_full	Michael S. Tsirkin
	Similar to bcecb4bbf88a ("net: ptr_ring: otherwise safe empty checks can overrun array bounds") a lockless use of __ptr_ring_full might cause an out of bounds access. We can fix this, but it's easier to just disallow lockless __ptr_ring_full for now. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	tap: fix use-after-free	Michael S. Tsirkin
	Lockless access to __ptr_ring_full is only legal if ring is never resized, otherwise it might cause use-after free errors. Simply drop the lockless test, we'll drop the packet a bit later when produce fails. Fixes: 362899b8 ("macvtap: switch to use skb array") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	ptr_ring: READ/WRITE_ONCE for __ptr_ring_empty	Michael S. Tsirkin
	Lockless __ptr_ring_empty requires that consumer head is read and written at once, atomically. Annotate accordingly to make sure compiler does it correctly. Switch locked callers to __ptr_ring_peek which does not support the lockless operation. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	ptr_ring: clean up documentation	Michael S. Tsirkin
	The only function safe to call without locks is __ptr_ring_empty. Move documentation about lockless use there to make sure people do not try to use __ptr_ring_peek outside locks. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	ptr_ring: keep consumer_head valid at all times	Michael S. Tsirkin
	The comment near __ptr_ring_peek says: * If ring is never resized, and if the pointer is merely * tested, there's no need to take the lock - see e.g. __ptr_ring_empty. but this was in fact never possible since consumer_head would sometimes point outside the ring. Refactor the code so that it's always pointing within a ring. Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	GFS2: Don't try to end a non-existent transaction in unlink	Bob Peterson
	Before this patch, if function gfs2_unlink failed to get a valid transaction (for example, not enough journal blocks) it would go to label out_end_trans which did gfs2_trans_end. But if the trans_begin failed, there's no transaction to end, and trying to do so results in: kernel BUG at fs/gfs2/trans.c:117! This patch changes the goto so that it does not try to end a non-existent transaction. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-29	ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only	Martin KaFai Lau
	If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it implicitly implies it is an ipv6only socket. However, in inet6_bind(), this addr_type checking and setting sk->sk_ipv6only to 1 are only done after sk->sk_prot->get_port(sk, snum) has been completed successfully. This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses the 'get_port()'. In particular, when binding SO_REUSEPORT UDP sockets, udp_reuseport_add_sock(sk,...) is called. udp_reuseport_add_sock() checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to sk2->sk_reuseport_cb. In this case, ipv6_only_sock(sk2) could be 1 while ipv6_only_sock(sk) is still 0 here. The end result is, reuseport_alloc(sk) is called instead of adding sk to the existing sk2->sk_reuseport_cb. It can be reproduced by binding two SO_REUSEPORT UDP sockets on an IPv6 address (!ANY and !MAPPED). Only one of the socket will receive packet. The fix is to set the implicit sk_ipv6only before calling get_port(). The original sk_ipv6only has to be saved such that it can be restored in case get_port() failed. The situation is similar to the inet_reset_saddr(sk) after get_port() has failed. Thanks to Calvin Owens <calvinowens@fb.com> who created an easy reproduction which leads to a fix. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	Merge branch 'rtnetlink-enable-IFLA_IF_NETNSID-for-RTM_DELLINK-RTM_SETINK'	David S. Miller
	Christian Brauner says: ==================== rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK Based on the previous discussion this enables passing a IFLA_IF_NETNSID property along with RTM_SETLINK and RTM_DELLINK requests. The patch for RTM_NEWLINK will be sent out in a separate patch since there are more corner-cases to think about. Changelog 2018-01-24: * Preserve old behavior and report -ENODEV when either ifindex or ifname is provided and IFLA_GROUP is set. Spotted by Wolfgang Bumiller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK	Christian Brauner
	- Backwards Compatibility: If userspace wants to determine whether RTM_DELLINK supports the IFLA_IF_NETNSID property they should first send an RTM_GETLINK request with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply does not include IFLA_IF_NETNSID userspace should assume that IFLA_IF_NETNSID is not supported on this kernel. If the reply does contain an IFLA_IF_NETNSID property userspace can send an RTM_DELLINK with a IFLA_IF_NETNSID property. If they receive EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property with RTM_DELLINK. Userpace should then fallback to other means. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK	Christian Brauner
	- Backwards Compatibility: If userspace wants to determine whether RTM_SETLINK supports the IFLA_IF_NETNSID property they should first send an RTM_GETLINK request with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply does not include IFLA_IF_NETNSID userspace should assume that IFLA_IF_NETNSID is not supported on this kernel. If the reply does contain an IFLA_IF_NETNSID property userspace can send an RTM_SETLINK with a IFLA_IF_NETNSID property. If they receive EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property with RTM_SETLINK. Userpace should then fallback to other means. To retain backwards compatibility the kernel will first check whether a IFLA_NET_NS_PID or IFLA_NET_NS_FD property has been passed. If either one is found it will be used to identify the target network namespace. This implies that users who do not care whether their running kernel supports IFLA_IF_NETNSID with RTM_SETLINK can pass both IFLA_NET_NS_{FD,PID} and IFLA_IF_NETNSID referring to the same network namespace. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	rtnetlink: enable IFLA_IF_NETNSID in do_setlink()	Christian Brauner
	RTM_{NEW,SET}LINK already allow operations on other network namespaces by identifying the target network namespace through IFLA_NET_NS_{FD,PID} properties. This is done by looking for the corresponding properties in do_setlink(). Extend do_setlink() to also look for the IFLA_IF_NETNSID property. This introduces no functional changes since all callers of do_setlink() currently block IFLA_IF_NETNSID by reporting an error before they reach do_setlink(). This introduces the helpers: static struct net rtnl_link_get_net_by_nlattr(struct net src_net, struct nlattr tb[]) static struct net rtnl_link_get_net_capable(const struct sk_buff skb, struct net src_net, struct nlattr tb[], int cap) to simplify permission checks and target network namespace retrieval for RTM_ requests that already support IFLA_NET_NS_{FD,PID} but get extended to IFLA_IF_NETNSID. To perserve backwards compatibility the helpers look for IFLA_NET_NS_{FD,PID} properties first before checking for IFLA_IF_NETNSID. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29	xfs: remove experimental tag for reflinks	Christoph Hellwig
	But reject reflink + DAX file systems for now until the code to support reflinks on DAX is actually implemented. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: port to 4.16] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29	xfs: don't screw up direct writes when freesp is fragmented	Darrick J. Wong
	xfs_bmap_btalloc is given a range of file offset blocks that must be allocated to some data/attr/cow fork. If the fork has an extent size hint associated with it, the request will be enlarged on both ends to try to satisfy the alignment hint. If free space is fragmentated, sometimes we can allocate some blocks but not enough to fulfill any of the requested range. Since bmapi_allocate always trims the new extent mapping to match the originally requested range, this results in bmapi_write returning zero and no mapping. The consequences of this vary -- buffered writes will simply re-call bmapi_write until it can satisfy at least one block from the original request. Direct IO overwrites notice nmaps == 0 and return -ENOSPC through the dio mechanism out to userspace with the weird result that writes fail even when we have enough space because the ENOSPC return overrides any partial write status. For direct CoW writes the situation was disastrous because nobody notices us returning an invalid zero-length wrong-offset mapping to iomap and the write goes off into space. Therefore, if free space is so fragmented that we managed to allocate some space but not enough to map into even a single block of the original allocation request range, we should break the alignment hint in order to guarantee at least some forward progress for the direct write. If we return a short allocation to iomap_apply it'll call back about the remaining blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: check reflink allocation mappings	Darrick J. Wong
	There's a really bad bug in xfs_reflink_allocate_cow -- if bmapi_write can return a zero error code but no mappings. This happens if there's an extent size hint (which causes allocation requests to be rounded to extsz granularity internally), but there wasn't a big enough chunk of free space to start filling at the extsz granularity and fill even one block of the range that we actually requested. In any case, if we got no mappings we can't possibly do anything useful with the contents of imap, so we must bail out with ENOSPC here. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	iomap: warn on zero-length mappings	Darrick J. Wong
	Don't let the iomap callback get away with feeding us a garbage zero length mapping -- there was a bug in xfs that resulted in those leaking out to hilarious effect. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: treat CoW fork operations as delalloc for quota accounting	Darrick J. Wong
	Since the CoW fork only exists in memory, it is incorrect to update the on-disk quota block counts when we modify the CoW fork. Unlike the data fork, even real extents in the CoW fork are only delalloc-style reservations (on-disk they're owned by the refcountbt) so they must not be tracked in the on disk quota info. Ensure the i_delayed_blks accounting reflects this too. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: only grab shared inode locks for source file during reflink	Darrick J. Wong
	Reflink and dedupe operations remap blocks from a source file into a destination file. The destination file needs exclusive locks on all levels because we're updating its block map, but the source file isn't undergoing any block map changes so we can use a shared lock. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes	Darrick J. Wong
	Refactor xfs_lock_two_inodes to take separate locking modes for each inode. Specifically, this enables us to take a SHARED lock on one inode and an EXCL lock on the other. The lock class (MMAPLOCK/ILOCK) must be the same for each inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: reflink should break pnfs leases before sharing blocks	Darrick J. Wong
	Before we share blocks between files, we need to break the pnfs leases on the layout before we start slicing and dicing the block map. The structure of this function sets us up for the lock contention reduction in the next patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: don't clobber inobt/finobt cursors when xref with rmap	Darrick J. Wong
	Even if we can't use the inobt/finobt cursors to count the number of inode btree blocks, we are never allowed to clobber the cursor of the btree being checked, so don't do this. Found by fuzzing level = ones in xfs/364. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: skip CoW writes past EOF when writeback races with truncate	Darrick J. Wong
	Every so often we blow the ASSERT(type != XFS_IO_COW) in xfs_map_blocks when running fsstress, as we do in generic/269. The cause of this is writeback racing with truncate -- writeback doesn't take the iolock, so truncate can sneak in to decrease i_size and truncate page cache while writeback is gathering buffer heads to schedule writeout. If we hit this race on a block that has a CoW mapping, we'll get a valid imap from the CoW fork but the reduced i_size trims the mapping to zero length (which makes it invalid), so we call xfs_map_blocks to try again. This doesn't do much anyway, since any mapping we get out of that will also be invalid, so we might as well skip the assert and just stop. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: preserve i_rdev when recycling a reclaimable inode	Amir Goldstein
	Commit 66f364649d870 ("xfs: remove if_rdev") moved storing of rdev value for special inodes to VFS inodes, but forgot to preserve the value of i_rdev when recycling a reclaimable xfs_inode. This was detected by xfstest overlay/017 with inodex=on mount option and xfs base fs. The test does a lookup of overlay chardev and blockdev right after drop caches. Overlayfs inodes hold a reference on underlying xfs inodes when mount option index=on is configured. If drop caches reclaim xfs inodes, before it relclaims overlayfs inodes, that can sometimes leave a reclaimable xfs inode and that test hits that case quite often. When that happens, the xfs inode cache remains broken (zere i_rdev) until the next cycle mount or drop caches. Fixes: 66f364649d870 ("xfs: remove if_rdev") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29	xfs: refactor accounting updates out of xfs_bmap_btalloc	Darrick J. Wong
	Move all the inode and quota accounting updates out of xfs_bmap_btalloc in preparation for fixing some quota accounting problems with copy on write. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-29	xfs: refactor inode verifier corruption error printing	Darrick J. Wong
	Refactor inode verifier error reporting into a non-libxfs function so that we aren't encoding the message format in libxfs. This also changes the kernel dmesg output to resemble buffer verifier errors more closely. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: make tracepoint inode number format consistent	Darrick J. Wong
	Fix all the inode number formats to be consistently (0x%llx) in all trace point definitions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: always zero di_flags2 when we free the inode	Darrick J. Wong
	Always zero the di_flags2 field when we free the inode so that we never end up with an on-disk record for an unallocated inode that also has the reflink iflag set. This is in keeping with the general principle that only files can have the reflink iflag set, even though we'll zero out di_flags2 if we ever reallocate the inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: call xfs_qm_dqattach before performing reflink operations	Darrick J. Wong
	Ensure that we've attached all the necessary dquots before performing reflink operations so that quota accounting is accurate. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29	xfs: bmap code cleanup	Shan Hai
	Remove the extent size hint and realtime inode relevant code from the xfs_bmapi_reserve_delalloc since it is not called on the inode with extent size hint set or on a realtime inode. Signed-off-by: Shan Hai <shan.hai@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29	Use list_head infra-structure for buffer's log items list	Carlos Maiolino
	Now that buffer's b_fspriv has been split, just replace the current singly linked list of xfs_log_items, by the list_head infrastructure. Also, remove the xfs_log_item argument from xfs_buf_resubmit_failed_buffers(), there is no need for this argument, once the log items can be walked through the list_head in the buffer. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor style cleanups] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>