linux.git - Linus' kernel tree

Age	Commit message	Author
2018-06-04	Merge branch 'clk-imx6ul' into clk-next	Stephen Boyd
	* clk-imx6ul: clk: imx6ul: fix periph clk2 clock mux selection
2018-06-04	Merge branches 'clk-davinci-psc-da830', 'clk-renesas', 'clk-at91-recalc', 'clk-davinci' and 'clk-meson' into clk-next	Stephen Boyd
	* clk-davinci-psc-da830:
	  clk: davinci: psc-da830: fix USB0 48MHz PHY clock registration
	* clk-renesas:
	  clk: renesas: cpg-mssr: Add support for R-Car E3
	  clk: renesas: Add r8a77990 CPG Core Clock Definitions
	  clk: renesas: rcar-gen2: Centralize quirks handling
	  clk: renesas: r8a77980: Correct parent clock of PCIEC0
	  clk: renesas: r8a7794: Fix LB clock divider
	  clk: renesas: r8a7792: Fix LB clock divider
	  clk: renesas: r8a7791/r8a7793: Fix LB clock divider
	  clk: renesas: r8a7745: Fix LB clock divider
	  clk: renesas: r8a7743: Fix LB clock divider
	  clk: renesas: cpg-mssr: Add r8a77470 support
	  clk: renesas: Add r8a77470 CPG Core Clock Definitions
	  clk: renesas: r8a77965: Add MSIOF controller clocks
	* clk-at91-recalc:
	  clk: at91: PLL recalc_rate() now using cached MUL and DIV values
	* clk-davinci:
	  clk: davinci: Fix link errors when not all SoCs are enabled
	  clk: davinci: psc: allow for dev == NULL
	  clk: davinci: da850-pll: change PLL0 to CLK_OF_DECLARE
	  clk: davinci: pll: allow dev == NULL
	  clk: davinci: psc-dm365: fix few clocks
	  clk: davinci: pll-dm646x: keep PLL2 SYSCLK1 always enabled
	  clk: davinci: psc-dm355: fix ASP0/1 clkdev lookups
	  clk: davinci: pll-dm355: fix SYSCLKn parent names
	  clk: davinci: pll-dm355: drop pll2_sysclk2
	* clk-meson:
	  clk: meson: axg: let mpll clocks round closest
	  clk: meson: mpll: add round closest support
	  clk: meson: meson8b: mark fclk_div2 gate clocks as CLK_IS_CRITICAL
	  clk: meson: use SPDX license identifiers consistently
	  clk: meson: drop CLK_SET_RATE_PARENT flag
	  clk: meson-axg: Add AO Clock and Reset controller driver
	  clk: meson: aoclk: refactor common code into dedicated file
	  clk: meson: migrate to devm_of_clk_add_hw_provider API
	  clk: meson: gxbb: add the video decoder clocks
	  clk: meson: meson8b: add support for the NAND clocks
	  dt-bindings: clock: reset: Add AXG AO Clock and Reset Bindings
	  dt-bindings: clock: axg-aoclkc: New binding for Meson-AXG SoC
	  clk: meson: gxbb: expose VDEC_1 and VDEC_HEVC clocks
	  dt-bindings: clock: meson8b: export the NAND clock
2018-06-04	Merge branch 'clk-qcom-8996-halt' into clk-next	Stephen Boyd
	* clk-qcom-8996-halt:
	  clk: qcom: gcc-msm8996: Disable halt check on UFS clocks
	  clk: msm8996-gcc: Mark halt check as no-op for USB/PCIE pipe_clk
2018-06-04	rpmsg: smd: do not use mananged resources for endpoints and channels	Srinivas Kandagatla
	All the managed resources would already be freed by the time the release function is invoked, so handling such memory in qcom_smd_edge_release() would touch memory that devres had already released. Found this issue while testing an audio use case where the DSP is started up and shut down in a loop. This patch fixes the issue by using plain kzalloc() to allocate channel->name and channel, which are then freed in qcom_smd_edge_release(). Without this patch, restarting a remoteproc would crash the system. Fixes: 53e2822e56c7 ("rpmsg: Introduce Qualcomm SMD backend") Cc: <stable@vger.kernel.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2018-06-04	Merge branch 'clk-qcom-sdm845' into clk-next	Stephen Boyd
	* clk-qcom-sdm845:
	  clk: qcom: Export clk_fabia_pll_configure()
	  clk: qcom: Add video clock controller driver for SDM845
	  dt-bindings: clock: Introduce QCOM Video clock bindings
	  clk: qcom: Add Global Clock controller (GCC) driver for SDM845
	  clk: qcom: Add DT bindings for SDM845 gcc clock controller
	  clk: qcom: Configure the RCGs to a safe source as needed
	  clk: qcom: Add support for BRANCH_HALT_SKIP flag for branch clocks
	  clk: qcom: Simplify gdsc status checking logic
	  clk: qcom: gdsc: Add support to poll CFG register to check GDSC state
	  clk: qcom: gdsc: Add support to poll for higher timeout value
	  clk: qcom: gdsc: Add support to reset AON and block reset logic
	  clk: qcom: Add support for controlling Fabia PLL
	  clk: qcom: Clear hardware clock control bit of RCG
	Also fix up the Kconfig mess where the SDM845 GCC entry mentions msm8998 in its description and the video Kconfig entry is worded slightly differently from the GCC one, so just make the video entry match the GCC one.
2018-06-04	Merge tag 'docs-4.18' of git://git.lwn.net/linux	Linus Torvalds
	Pull documentation updates from Jonathan Corbet:
	  "There's been a fair amount of work in the docs tree this time around, including:
	   - Extensive RST conversions and organizational work in the memory-management docs thanks to Mike Rapoport.
	   - An update of Documentation/features from Andrea Parri and a script to keep it updated.
	   - Various LICENSES updates from Thomas, along with a script to check SPDX tags.
	   - Work to fix dangling references to documentation files; this involved a fair number of one-liner comment changes outside of Documentation/
	  ... and the usual list of documentation improvements, typo fixes, etc"
	* tag 'docs-4.18' of git://git.lwn.net/linux: (103 commits)
	  Documentation: document hung_task_panic kernel parameter
	  docs/admin-guide/mm: add high level concepts overview
	  docs/vm: move ksm and transhuge from "user" to "internals" section.
	  docs: Use the kerneldoc comments for memalloc_no*()
	  doc: document scope NOFS, NOIO APIs
	  docs: update kernel versions and dates in tables
	  docs/vm: transhuge: split userspace bits to admin-guide/mm/transhuge
	  docs/vm: transhuge: minor updates
	  docs/vm: transhuge: change sections order
	  Documentation: arm: clean up Marvell Berlin family info
	  Documentation: gpio: driver: Fix a typo and some odd grammar
	  docs: ranoops.rst: fix location of ramoops.txt
	  scripts/documentation-file-ref-check: rewrite it in perl with auto-fix mode
	  docs: uio-howto.rst: use a code block to solve a warning
	  mm, THP, doc: Add document for thp_swpout/thp_swpout_fallback
	  w1: w1_io.c: fix a kernel-doc warning
	  Documentation/process/posting: wrap text at 80 cols
	  docs: admin-guide: add cgroup-v2 documentation
	  Revert "Documentation/features/vm: Remove arch support status file for 'pte_special'"
	  Documentation: refcount-vs-atomic: Update reference to LKMM doc.
	  ...
2018-06-04	IB/hfi1: Ensure VL index is within bounds	Kaike Wan
	Improve the safety of the code and ensure the array cannot be indexed out of bounds when picking the CPU for a given SDMA engine. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
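A minimal sketch of the kind of bounds check the commit describes, written as standalone C rather than hfi1 code. NUM_VLS, vl_to_cpu[] and pick_cpu_for_vl() are hypothetical names and the table contents are placeholders; the message does not show the actual check.

```c
/* Hypothetical table size and contents; not hfi1's real VL-to-CPU map. */
#define NUM_VLS 8
static const int vl_to_cpu[NUM_VLS] = { 0, 1, 2, 3, 4, 5, 6, 7 };

/*
 * Hypothetical helper: return the CPU for a VL, or -1 if the VL would
 * index past the end of the table. Because the index is unsigned, a
 * negative value from a caller wraps to a large number and is rejected
 * by the same comparison.
 */
int pick_cpu_for_vl(unsigned int vl)
{
	if (vl >= NUM_VLS)
		return -1;
	return vl_to_cpu[vl];
}
```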
2018-06-04	IB/hfi1: Fix user context tail allocation for DMA_RTAIL	Mike Marciniszyn
	The following code fails to allocate a buffer for the tail address that the hardware DMAs into when the user context DMA_RTAIL is set:
	        if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) {
	                rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent(
	                        &dd->pcidev->dev, PAGE_SIZE, &dma_hdrqtail,
	                        gfp_flags);
	                if (!rcd->rcvhdrtail_kvaddr)
	                        goto bail_free;
	                rcd->rcvhdrqtailaddr_dma = dma_hdrqtail;
	        }
	So rcvhdrtail_kvaddr would then be NULL, and the mmap logic fails to check for a NULL rcvhdrtail_kvaddr. The fix is to test for both the user and kernel DMA_RTAIL options during the allocation, as well as to test for a NULL rcvhdrtail_kvaddr during the mmap processing. Additionally, all downstream tests of the capmask for DMA_RTAIL have been eliminated in favor of testing rcvhdrtail_kvaddr. Cc: <stable@vger.kernel.org> # 4.9.x Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
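The corrected logic can be sketched in userspace C. The capability bits, struct ctxt and the three helpers are hypothetical, calloc() stands in for dma_zalloc_coherent(), and the -EINVAL returned from the mmap path is an assumption. The point is only that allocation checks both the user and the kernel option, and that later code tests the pointer rather than the capability mask.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical capability bits; the real HFI1 capability masks differ. */
#define CAP_DMA_RTAIL_KERNEL (1u << 0)
#define CAP_DMA_RTAIL_USER   (1u << 1)

/* Hypothetical receive context, reduced to the fields that matter here. */
struct ctxt {
	unsigned int flags;
	void *rcvhdrtail_kvaddr; /* NULL when no tail buffer was allocated */
};

/* Allocate the tail buffer if either the kernel or the user option is set. */
int ctxt_alloc_tail(struct ctxt *rcd)
{
	rcd->rcvhdrtail_kvaddr = NULL;
	if (rcd->flags & (CAP_DMA_RTAIL_KERNEL | CAP_DMA_RTAIL_USER)) {
		/* calloc() stands in for dma_zalloc_coherent(..., PAGE_SIZE, ...). */
		rcd->rcvhdrtail_kvaddr = calloc(1, 4096);
		if (!rcd->rcvhdrtail_kvaddr)
			return -ENOMEM;
	}
	return 0;
}

/* mmap path: refuse to map a tail buffer that was never allocated. */
int ctxt_mmap_tail(const struct ctxt *rcd, void **out)
{
	if (!rcd->rcvhdrtail_kvaddr)
		return -EINVAL; /* assumed error code */
	*out = rcd->rcvhdrtail_kvaddr;
	return 0;
}

/* Downstream checks test the pointer instead of the capability mask. */
bool ctxt_has_dma_rtail(const struct ctxt *rcd)
{
	return rcd->rcvhdrtail_kvaddr != NULL;
}
```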
2018-06-04	Merge branches 'clk-match-string', 'clk-ingenic', 'clk-si544-round-fix' and 'clk-bcm-stingray' into clk-next	Stephen Boyd
	* clk-match-string:
	  clk: use match_string() helper
	  clk: bcm2835: use match_string() helper
	* clk-ingenic:
	  clk: ingenic: jz4770: Add 150us delay after enabling VPU clock
	  clk: ingenic: jz4770: Enable power of AHB1 bus after ungating VPU clock
	  clk: ingenic: jz4770: Modify C1CLK clock to disable CPU clock stop on idle
	  clk: ingenic: jz4770: Change OTG from custom to standard gated clock
	  clk: ingenic: Support specifying "wait for clock stable" delay
	  clk: ingenic: Add support for clocks whose gate bit is inverted
	* clk-si544-round-fix:
	  clk-si544: Properly round requested frequency to nearest match
	* clk-bcm-stingray:
	  clk: bcm: Update and add Stingray clock entries
	  dt-bindings: clk: Update Stingray binding doc
2018-06-04	Merge branches 'clk-imx7d', 'clk-hisi-stub', 'clk-mvebu', 'clk-imx6-epit' and 'clk-debugfs-simple' into clk-next	Stephen Boyd
	* clk-imx7d:
	  clk: imx7d: reset parent for mipi csi root
	  clk: imx7d: fix mipi dphy div parent
	* clk-hisi-stub:
	  clk/driver/hisi: Consolidate the Kconfig for the CLOCK_STUB
	* clk-mvebu:
	  clk: mvebu: use correct bit for 98DX3236 NAND
	* clk-imx6-epit:
	  clk: imx6: add EPIT clock support
	* clk-debugfs-simple:
	  clk: Return void from debug_init op
	  clk: remove clk_debugfs_add_file()
	  clk: tegra: no need to check return value of debugfs_create functions
	  clk: davinci: no need to check return value of debugfs_create functions
	  clk: bcm2835: no need to check return value of debugfs_create functions
	  clk: no need to check return value of debugfs_create functions
2018-06-04	Merge branches 'clk-imx6sx', 'clk-imx7d-enet' and 'clk-aspeed-24' into clk-next	Stephen Boyd
	* clk-imx6sx:
	  clk: imx6sl: correct ocram_podf clock type
	  clk: imx6sx: disable unnecessary clocks during clock initialization
	  clk: imx6sx: add missing lvds2 clock to the clock tree
	* clk-imx7d-enet:
	  ARM: dts: imx7: correct enet ipg clock
	  clk: imx7d: correct enet clock CCGR registers
	  clk: imx7d: correct enet phy ref clock gates
	* clk-aspeed-24:
	  clk: aspeed: Add 24MHz fixed clock
2018-06-04	Merge branches 'clk-allwinner', 'clk-rockchip', 'clk-tegra', 'clk-berlin' and 'clk-qcom-mmagic' into clk-next	Stephen Boyd
	* clk-allwinner:
	  clk: sunxi-ng: r40: export a regmap to access the GMAC register
	  clk: sunxi-ng: r40: rewrite init code to a platform driver
	  clk: sunxi-ng: add support for H6 PRCM CCU
	* clk-rockchip:
	  clk: rockchip: remove deprecated gate-clk code and dt-binding
	  clk: rockchip: use match_string() helper
	* clk-tegra:
	  clk: tegra: Add quirk for getting CDEV1/2 clocks on Tegra20
	  clk: tegra20: Correct parents of CDEV1/2 clocks
	  clk: tegra20: Add DEV1/DEV2 OSC dividers
	* clk-berlin:
	  clk: berlin: switch to SPDX license identifier
	* clk-qcom-mmagic:
	  clk: qcom: mmcc-msm8996: leave all mmagic gdscs and clocks always enabled
	  clk: qcom: Register the gdscs before the clocks
	  clk: qcom: gdsc: Add support for ALWAYS_ON gdscs
2018-06-04	Merge branches 'clk-hisi-usb', 'clk-silent-bulk', 'clk-mtk-hdmi', 'clk-mtk-mali' and 'clk-imx6ul-ccosr' into clk-next	Stephen Boyd
	* clk-hisi-usb:
	  clk: hisilicon: add missing usb3 clocks for Hi3798CV200 SoC
	* clk-silent-bulk:
	  clk: bulk: silently error out on EPROBE_DEFER
	* clk-mtk-hdmi:
	  clk: mediatek: correct the clocks for MT2701 HDMI PHY module
	* clk-mtk-mali:
	  clk: mediatek: add g3dsys support for MT2701 and MT7623
	  dt-bindings: reset: mediatek: add entry for Mali-450 node to refer
	  dt-bindings: clock: mediatek: add entry for Mali-450 node to refer
	  dt-bindings: clock: mediatek: add g3dsys bindings
	* clk-imx6ul-ccosr:
	  clk: imx: Add new clo01 and clo2 controlled by CCOSR
2018-06-04	Merge branches 'clk-stm32mp1', 'clk-samsung', 'clk-uniphier-mpeg', 'clk-stratix10' and 'clk-aspeed' into clk-next	Stephen Boyd
	* clk-stm32mp1:
	  clk: stm32mp1: Fix a memory leak in 'clk_stm32_register_gate_ops()'
	  clk: stm32mp1: Add CLK_IGNORE_UNUSED to ck_sys_dbg clock
	  clk: stm32mp1: remove ck_apb_dbg clock
	  clk: stm32mp1: set stgen_k clock as critical
	  clk: stm32mp1: add missing tzc2 clock
	  clk: stm32mp1: fix SAI3 & SAI4 clocks
	  clk: stm32mp1: remove unused dfsdm_src[] const
	  clk: stm32mp1: add missing static
	* clk-samsung:
	  clk: samsung: simplify getting .drvdata
	* clk-uniphier-mpeg:
	  clk: uniphier: add LD11/LD20 stream demux system clock
	* clk-stratix10:
	  clk: socfpga: stratix10: suppress unbinding platform's clock driver
	  clk: socfpga: stratix10: use platform driver APIs
	* clk-aspeed:
	  clk:aspeed: Fix reset bits for PCI/VGA and PECI
	  clk: aspeed: Support second reset register
2018-06-04	Merge branches 'clk-qcom-rpmh', 'clk-npcm7xx', 'clk-of-parent-count' and 'clk-qcom-rcg-fix' into clk-next	Stephen Boyd
	* clk-qcom-rpmh:
	  dt-bindings: clock: Introduce QCOM RPMh clock bindings
	* clk-npcm7xx:
	  clk: npcm7xx: fix return value check in npcm7xx_clk_init()
	  clk: npcm7xx: add clock controller
	  dt-binding: clk: npcm750: Add binding for Nuvoton NPCM7XX Clock
	* clk-of-parent-count:
	  pinctrl: sunxi: Use of_clk_get_parent_count() instead of open coding
	  soc/tegra: pmc: Use of_clk_get_parent_count() instead of open coding
	  soc: rockchip: power-domain: Use of_clk_get_parent_count() instead of open coding
	  ARM: timer-sp: Use of_clk_get_parent_count() instead of open coding
	  clk: Extract OF clock helpers in <linux/of_clk.h>
	* clk-qcom-rcg-fix:
	  clk: qcom: Base rcg parent rate off plan frequency
2018-06-04	Merge branch 'clk-actions' into clk-next	Stephen Boyd
	* clk-actions:
	  clk: actions: Add S900 SoC clock support
	  clk: actions: Add pll clock support
	  clk: actions: Add composite clock support
	  clk: actions: Add fixed factor clock support
	  clk: actions: Add factor clock support
	  clk: actions: Add divider clock support
	  clk: actions: Add mux clock support
	  clk: actions: Add gate clock support
	  clk: actions: Add common clock driver support
	  dt-bindings: clock: Add Actions S900 clock bindings
2018-06-04	Merge branches 'clk-warn', 'clk-core', 'clk-spear' and 'clk-qcom-msm8998' into clk-next	Stephen Boyd
	* clk-warn:
	  clk: Print the clock name and warning cause
	* clk-core:
	  clk: Remove clk_init_cb typedef
	* clk-spear:
	  clk: spear: fix WDT clock definition on SPEAr600
	* clk-qcom-msm8998:
	  clk: qcom: Add MSM8998 Global Clock Control (GCC) driver
2018-06-04	Merge branch 'sh_eth-fix-and-clean-up-sh_eth_soft_swap'	David S. Miller
	Sergei Shtylyov says:
	====================
	sh_eth: fix & clean up sh_eth_soft_swap()
	Here's a set of 3 patches against DaveM's 'net-next.git' repo. The first one fixes an old buffer endianness issue (luckily, the ARM SoCs are smart enough to not actually care), plus a couple of clean-ups around sh_eth_soft_swap()...
	====================
	Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04	sh_eth: use DIV_ROUND_UP() in sh_eth_soft_swap()	Sergei Shtylyov
	When initializing 'maxp' in sh_eth_soft_swap(), the buffer length needs to be rounded up -- that's just asking for DIV_ROUND_UP()! Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
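For reference, DIV_ROUND_UP(n, d) computes (n + d - 1) / d, i.e. integer division rounded up, so a partial trailing word still counts as one word. A small sketch of turning a byte length into a count of 32-bit words; words_to_swap() is a hypothetical helper, and the macro is redefined here only so the example builds outside the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Same formula as the kernel's DIV_ROUND_UP(), redefined for userspace. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: number of 32-bit words needed to cover len bytes. */
size_t words_to_swap(size_t len)
{
	return DIV_ROUND_UP(len, sizeof(uint32_t));
}
```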
2018-06-04	sh_eth: uninline sh_eth_soft_swap()	Sergei Shtylyov
	sh_eth_soft_swap() is called twice by the driver, so remove the inline and move the function from the header into the driver itself, letting gcc decide whether to expand it inline or not... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04	sh_eth: make sh_eth_soft_swap() work on ARM	Sergei Shtylyov
	Browsing through the driver disassembly, I noticed that ARM gcc generated no code whatsoever for sh_eth_soft_swap() while building a little-endian kernel -- apparently ARM gcc does not #define __LITTLE_ENDIAN__, whereas SH gcc #define's it implicitly (I could only find the explicit #define __LITTLE_ENDIAN that was #include'd when building a little-endian kernel). Luckily, the Ether controller that only does big-endian DMA is found only on the early SH771x SoCs, and all ARM SoCs implement EDMR.DE and thus set 'sh_eth_cpu_data::hw_swap'. But anyway, we need to change the #ifdef inside sh_eth_soft_swap() to something that works on all architectures... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
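A userspace sketch of an endianness guard that does not depend on __LITTLE_ENDIAN__. It uses the __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__ macros that GCC and Clang predefine on every target; that choice is an assumption for illustration, since the message does not say which symbol the patch switched to, and in-kernel code would normally rely on the kernel's byte-order headers instead. soft_swap() is a hypothetical name and __builtin_bswap32() stands in for swab32().

```c
#include <stddef.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical userspace analogue of a soft swap: byte-swap every 32-bit
 * word of buf in place, but only when the host is little-endian.
 * __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__ are predefined by GCC and
 * Clang on every target, unlike __LITTLE_ENDIAN__.
 *
 * buf must be 4-byte aligned and padded to a multiple of 4 bytes, because
 * a trailing partial word is swapped as a whole word.
 */
void soft_swap(void *buf, size_t len)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	uint32_t *p = buf;
	uint32_t *maxp = p + DIV_ROUND_UP(len, sizeof(uint32_t));

	for (; p < maxp; p++)
		*p = __builtin_bswap32(*p);
#else
	/* Big-endian host: data already matches a big-endian DMA engine. */
	(void)buf;
	(void)len;
#endif
}
```

With the check expressed this way, the little-endian branch is compiled on ARM and SH alike instead of silently vanishing.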
2018-06-04	NFS: Filter cache invalidation when holding a delegation	Trond Myklebust
	If the client holds a delegation, then ensure we filter out attempts to invalidate the size, owner, group owner, or mode unless we made the change, in which case, check that NFS_INO_REVAL_FORCED is set by the caller. Always filter out attempts to invalidate the change attribute and size, since we are authoritative for those. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
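The filtering rule reads naturally as bit manipulation. In this sketch the flag names and values are hypothetical stand-ins for the NFS_INO_* bits, and owner, group owner and mode are lumped into a single INV_OTHER bit. Because the message lists size both among the attributes that may be invalidated when forced and among those that are always filtered, the sketch follows the stricter "always filter" rule for size.

```c
/* Hypothetical flag values; the real NFS_INO_* constants differ. */
#define INV_CHANGE   (1ul << 0) /* change attribute */
#define INV_SIZE     (1ul << 1) /* file size */
#define INV_OTHER    (1ul << 2) /* owner, group owner, mode, ... */
#define REVAL_FORCED (1ul << 3) /* caller insists on revalidation */

/*
 * Return the invalidation bits that should actually be applied to the
 * attribute cache. Without a delegation, everything passes through.
 */
unsigned long filter_invalidation(unsigned long flags, int have_delegation)
{
	if (have_delegation) {
		/* Other attributes are dropped unless the caller forced it. */
		if (!(flags & REVAL_FORCED))
			flags &= ~INV_OTHER;
		/* The client is authoritative for change and size: always drop. */
		flags &= ~(INV_CHANGE | INV_SIZE);
	}
	return flags;
}
```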
2018-06-04	NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes()	Trond Myklebust
	If we hold a delegation, we should not need to call nfs_check_inode_attributes() since we already know which attributes are valid, and which ones may still need revalidation. The state of the NFS_INO_REVAL_FORCED flag is therefore irrelevant. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04	NFS: Improve caching while holding a delegation	Trond Myklebust
	Make sure that the client completely ignores change attribute and size changes on the server when it holds a delegation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04	NFS: Fix attribute revalidation	Trond Myklebust
	Don't mark attributes as invalid just because they have changed. Instead, for the purposes of adjusting the attribute cache timeout, keep a separate variable that tracks whether or not a change occurred. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04	NFS: fix up nfs_setattr_update_inode	Trond Myklebust
	Always try to set the attributes, even if we don't have a valid struct nfs_fattr. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04	NFSv4: Ensure the inode is clean when we set a delegation	Trond Myklebust
	If there are attributes that are still invalid when we set a delegation, then we need to set the NFS_INO_REVAL_FORCED flag. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04	NFSv4: Ignore NFS_INO_REVAL_FORCED in nfs4_proc_access	Trond Myklebust
	If we hold a delegation, we don't need to care about whether or not the inode attributes are up to date. We know we can cache the results of this call regardless. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04	swait: strengthen language to discourage use	Linus Torvalds
	We already earlier discouraged people from using this interface in commit 88796e7e5c45 ("sched/swait: Document it clearly that the swait facilities are special and shouldn't be used"), but I just got a pull request with a new broken user. So make the comment really clear. The swait interfaces are bad, and should not be used unless you have some very strong reasons that include tons of hard performance numbers on just why you want to use them, and you show that you actually understand that they aren't at all like the normal wait/wakeup interfaces. So far, every single user has been suspect. The main user is KVM, which is completely pointless (there is only ever one waiter, which avoids the interface subtleties, but also means that having a queue instead of a pointer is counter-productive and certainly not an "optimization"). So make the comments much stronger. Not that anybody likely reads them anyway, but there's always some slight hope that it will cause somebody to think twice. I'd like to remove this interface entirely, but there is the theoretical possibility that it's actually the right thing to use in some situation, most likely some deep RT use. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-04	rbd: flush rbd_dev->watch_dwork after watch is unregistered	Dongsheng Yang
	There is a problem if we are going to unmap an rbd device while watch_dwork is about to queue delayed work for the watch:
	  unmap Thread                          watch Thread                    timer
	  do_rbd_remove
	  cancel_tasks_sync(rbd_dev)
	                                        queue_delayed_work for watch
	  destroy_workqueue(rbd_dev->task_wq)
	    drain_workqueue(wq)
	    destroy other resources in wq
	                                                                        call_timer_fn
	                                                                          __queue_work()
	Then the delayed work escapes cancel_tasks_sync() and destroy_workqueue(), and we get a use-after-free call trace:
	  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
	  PGD 0 P4D 0
	  Oops: 0000 [#1] SMP PTI
	  Modules linked in:
	  CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 4.17.0-rc6+ #13
	  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
	  RIP: 0010:__queue_work+0x6a/0x3b0
	  RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
	  RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000
	  RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000
	  RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00
	  R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970
	  R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007
	  FS: 0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000
	  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	  CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0
	  Call Trace:
	   <IRQ>
	   ? __queue_work+0x3b0/0x3b0
	   call_timer_fn+0x2d/0x130
	   run_timer_softirq+0x16e/0x430
	   ? tick_sched_timer+0x37/0x70
	   __do_softirq+0xd2/0x280
	   irq_exit+0xd5/0xe0
	   smp_apic_timer_interrupt+0x6c/0x130
	   apic_timer_interrupt+0xf/0x20
	[ Move rbd_dev->watch_dwork cancellation so that rbd_reregister_watch() either bails out early because the watch is UNREGISTERED at that point or just gets cancelled. ]
	Cc: stable@vger.kernel.org Fixes: 99d1694310df ("rbd: retry watch re-registration periodically") Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04	ceph: update description of some mount options	Chengguang Xu
	According to the code, the default value of rsize/wsize is 16 MB. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04	ceph: show ino32 if the value is different with default	Chengguang Xu
	ceph_show_options() currently has no entry for 'ino32', so show the 'ino32' mount option when its value differs from the default. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04	ceph: strengthen rsize/wsize/readdir_max_bytes validation	Chengguang Xu
	The check (intval < PAGE_SIZE) involves an implicit conversion of intval to unsigned long, so even a negative value specified for rsize/wsize/readdir_max_bytes passes the validation check. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
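The pitfall comes from C's usual arithmetic conversions: when an int is compared with an unsigned long, the int is converted to unsigned long, so -1 becomes ULONG_MAX and is never "less than" the page size. A sketch assuming 4 KiB pages; DEMO_PAGE_SIZE and both helpers are hypothetical, and size_ok_fixed() shows one way to close the hole, not necessarily the check the patch uses.

```c
#include <stdbool.h>

/* Assumption: 4 KiB pages, typed unsigned long like the kernel's PAGE_SIZE. */
#define DEMO_PAGE_SIZE 4096UL

/* Shape of the original check: intval is converted to unsigned long. */
bool size_ok_buggy(int intval)
{
	return !(intval < DEMO_PAGE_SIZE); /* -1 becomes ULONG_MAX here */
}

/* One way to close the hole: reject negatives before the comparison. */
bool size_ok_fixed(int intval)
{
	return intval >= 0 && (unsigned long)intval >= DEMO_PAGE_SIZE;
}
```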
2018-06-04	ceph: fix alignment of rasize	Chengguang Xu
	With the current logic, any rasize from 0 to 1 becomes 4096, and any rasize from 2 to 4097 becomes 8192. Make it the same as rsize and wsize. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
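The reported numbers are consistent with PAGE_SIZE - 1 being added before rounding up, i.e. ALIGN(intval + PAGE_SIZE - 1, PAGE_SIZE); that exact expression is inferred from the numbers, not quoted from the patch. The sketch compares it with a plain round-up, which is assumed to be how rsize and wsize are handled. ALIGN_UP mirrors the kernel's ALIGN() for power-of-two alignments, and 4 KiB pages are assumed.

```c
/* Assumption: 4 KiB pages. */
#define DEMO_PAGE_SIZE 4096UL
/* Round x up to a multiple of a (a must be a power of two), like ALIGN(). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Inferred old behaviour: adding PAGE_SIZE - 1 before ALIGN() rounds up twice. */
unsigned long rasize_old(unsigned long intval)
{
	return ALIGN_UP(intval + DEMO_PAGE_SIZE - 1, DEMO_PAGE_SIZE);
}

/* Plain round-up to a page boundary, assumed to match rsize/wsize. */
unsigned long rasize_new(unsigned long intval)
{
	return ALIGN_UP(intval, DEMO_PAGE_SIZE);
}
```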
2018-06-04	ceph: fix use-after-free in ceph_statfs()	Luis Henriques
	KASAN found a UAF in ceph_statfs. This was a one-off bug, but looking at the code it looks like the monmap access needs to be protected, as it can be modified while we're accessing it. Fix this by protecting the access with the monc->mutex.
	  BUG: KASAN: use-after-free in ceph_statfs+0x21d/0x2c0
	  Read of size 8 at addr ffff88006844f2e0 by task trinity-c5/304
	  CPU: 0 PID: 304 Comm: trinity-c5 Not tainted 4.17.0-rc6+ #172
	  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
	  Call Trace:
	   dump_stack+0xa5/0x11b
	   ? show_regs_print_info+0x5/0x5
	   ? kmsg_dump_rewind+0x118/0x118
	   ? ceph_statfs+0x21d/0x2c0
	   print_address_description+0x73/0x2b0
	   ? ceph_statfs+0x21d/0x2c0
	   kasan_report+0x243/0x360
	   ceph_statfs+0x21d/0x2c0
	   ? ceph_umount_begin+0x80/0x80
	   ? kmem_cache_alloc+0xdf/0x1a0
	   statfs_by_dentry+0x79/0xb0
	   vfs_statfs+0x28/0x110
	   user_statfs+0x8c/0xe0
	   ? vfs_statfs+0x110/0x110
	   ? __fdget_raw+0x10/0x10
	   __se_sys_statfs+0x5d/0xa0
	   ? user_statfs+0xe0/0xe0
	   ? mutex_unlock+0x1d/0x40
	   ? __x64_sys_statfs+0x20/0x30
	   do_syscall_64+0xee/0x290
	   ? syscall_return_slowpath+0x1c0/0x1c0
	   ? page_fault+0x1e/0x30
	   ? syscall_return_slowpath+0x13c/0x1c0
	   ? prepare_exit_to_usermode+0xdb/0x140
	   ? syscall_trace_enter+0x330/0x330
	   ? __put_user_4+0x1c/0x30
	   entry_SYSCALL_64_after_hwframe+0x44/0xa9
	  Allocated by task 130:
	   __kmalloc+0x124/0x210
	   ceph_monmap_decode+0x1c1/0x400
	   dispatch+0x113/0xd20
	   ceph_con_workfn+0xa7e/0x44e0
	   process_one_work+0x5f0/0xa30
	   worker_thread+0x184/0xa70
	   kthread+0x1a0/0x1c0
	   ret_from_fork+0x35/0x40
	  Freed by task 130:
	   kfree+0xb8/0x210
	   dispatch+0x15a/0xd20
	   ceph_con_workfn+0xa7e/0x44e0
	   process_one_work+0x5f0/0xa30
	   worker_thread+0x184/0xa70
	   kthread+0x1a0/0x1c0
	   ret_from_fork+0x35/0x40
	Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
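The fix follows a common pattern: readers dereference the shared map only while holding the lock that the updater takes when it swaps the map out. A pthread sketch of that pattern; struct mon_client, struct monmap and both functions are hypothetical stand-ins (the real code uses monc->mutex and the kernel's monmap structure), and copying the fsid is assumed to be representative of what statfs reads.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the monitor client and its monmap. */
struct monmap {
	unsigned char fsid[16];
};

struct mon_client {
	pthread_mutex_t mutex; /* protects monmap, which can be replaced */
	struct monmap *monmap;
};

/* Reader (statfs side): dereference monmap only under the mutex. */
void statfs_read_fsid(struct mon_client *monc, unsigned char out[16])
{
	pthread_mutex_lock(&monc->mutex);
	memcpy(out, monc->monmap->fsid, sizeof(monc->monmap->fsid));
	pthread_mutex_unlock(&monc->mutex);
}

/*
 * Updater (monmap decode side): swap in the new map under the same mutex.
 * The old map can be freed after unlocking because readers only reach it
 * through monc->monmap while holding the lock.
 */
void install_monmap(struct mon_client *monc, struct monmap *newmap)
{
	struct monmap *old;

	pthread_mutex_lock(&monc->mutex);
	old = monc->monmap;
	monc->monmap = newmap;
	pthread_mutex_unlock(&monc->mutex);
	free(old);
}
```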
2018-06-04	ceph: prevent i_version from going back	Yan, Zheng
	Inode info from a non-auth MDS can be stale. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04	ceph: fix wrong check for the case of updating link count	Yan, Zheng
	Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04	libceph: allocate the locator string with GFP_NOFAIL	Ilya Dryomov
	calc_target() isn't supposed to fail with anything but POOL_DNE, in which case we report that the pool doesn't exist and fail the request with -ENOENT. Doing this for -ENOMEM is at the very least confusing and also harmful -- as the preceding requests complete, a short-lived locator string allocation is likely to succeed after a wait. (We used to call ceph_object_locator_to_pg() for a pi lookup. In theory that could fail with -ENOENT, hence the "ret != -ENOENT" warning being removed.) Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04	libceph: make abort_on_full a per-osdc setting	Ilya Dryomov
	The intent behind making it a per-request setting was that it would be set for writes, but not for reads. As it is, the flag is set for all fs/ceph requests except for pool perm check stat request (technically a read). ceph_osdc_abort_on_full() skips reads since the previous commit and I don't see a use case for marking individual requests. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04	libceph: don't abort reads in ceph_osdc_abort_on_full()	Ilya Dryomov
	Don't consider reads for aborting and use ->base_oloc instead of ->target_oloc, as done in __submit_request(). Strictly speaking, we shouldn't be aborting FULL_TRY/FULL_FORCE writes either. But, there is an inconsistency in FULL_TRY/FULL_FORCE handling on the OSD side [1], so given that neither of these is used in the kernel client, leave it for when the OSD behaviour is sorted out. [1] http://tracker.ceph.com/issues/24339 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04	libceph: avoid a use-after-free during map check	Ilya Dryomov
	Sending map check after complete_request() was called is not only useless, but can lead to a use-after-free as req->r_kref decrement in __complete_request() races with map check code. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04	libceph: don't warn if req->r_abort_on_full is set	Ilya Dryomov
	The "FULL or reached pool quota" warning is there to explain paused requests. No need to emit it if pausing isn't going to occur. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04	libceph: use for_each_request() in ceph_osdc_abort_on_full()	Ilya Dryomov
	Scanning the trees just to see if there is anything to abort is unnecessary -- all that is needed here is to update the epoch barrier first, before we start aborting. Simplify and do the update inside the loop before calling abort_request() for the first time. The switch to for_each_request() also fixes a bug: homeless requests weren't even considered for aborting. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
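A sketch of the simplified loop: visit every request, including ones not mapped to any OSD, and bump the epoch barrier just before the first abort. The data structures are hypothetical arrays standing in for libceph's per-OSD rbtrees, every write is treated as targeting a full pool, and reads are skipped as in the "don't abort reads" change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request model; libceph keeps requests in per-OSD rbtrees. */
struct req {
	bool is_write;
	bool aborted;
};

struct req_list {
	struct req *reqs;
	size_t nr;
};

struct osdc_model {
	struct req_list *osds;    /* requests mapped to an OSD */
	size_t nr_osds;
	struct req_list homeless; /* requests not mapped to any OSD */
	unsigned int epoch;
	unsigned int epoch_barrier;
};

/* Abort writes in one list; update the barrier before the first abort. */
static size_t abort_list(struct osdc_model *osdc, struct req_list *l,
			 bool *victims)
{
	size_t n = 0;

	for (size_t i = 0; i < l->nr; i++) {
		struct req *r = &l->reqs[i];

		if (!r->is_write || r->aborted)
			continue; /* reads are left alone */
		if (!*victims) {
			osdc->epoch_barrier = osdc->epoch;
			*victims = true;
		}
		r->aborted = true;
		n++;
	}
	return n;
}

/* Visit every request, homeless ones included; return how many aborted. */
size_t abort_on_full(struct osdc_model *osdc)
{
	bool victims = false;
	size_t n = 0;

	for (size_t i = 0; i < osdc->nr_osds; i++)
		n += abort_list(osdc, &osdc->osds[i], &victims);
	n += abort_list(osdc, &osdc->homeless, &victims);
	return n;
}
```

Doing the barrier update lazily inside the loop removes the separate pre-scan: if nothing qualifies, the barrier is simply never touched.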
2018-06-04	libceph: defer __complete_request() to a workqueue	Ilya Dryomov
	In the common case, req->r_callback is called by handle_reply() on the ceph-msgr worker thread without any locks. If handle_reply() fails, it is called with both osd->lock and osdc->lock. In the map check case, it is called with just osdc->lock but held for write. Finally, if the request is aborted because of -ENOSPC or by ceph_osdc_abort_requests(), it is called directly on the submitter's thread, again with both locks. req->r_callback on the submitter's thread is relatively new (introduced in 4.12) and ripe for deadlocks -- e.g. writeback worker thread waiting on itself:
	  inode_wait_for_writeback+0x26/0x40
	  evict+0xb5/0x1a0
	  iput+0x1d2/0x220
	  ceph_put_wrbuffer_cap_refs+0xe0/0x2c0 [ceph]
	  writepages_finish+0x2d3/0x410 [ceph]
	  __complete_request+0x26/0x60 [libceph]
	  complete_request+0x2e/0x70 [libceph]
	  __submit_request+0x256/0x330 [libceph]
	  submit_request+0x2b/0x30 [libceph]
	  ceph_osdc_start_request+0x25/0x40 [libceph]
	  ceph_writepages_start+0xdfe/0x1320 [ceph]
	  do_writepages+0x1f/0x70
	  __writeback_single_inode+0x45/0x330
	  writeback_sb_inodes+0x26a/0x600
	  __writeback_inodes_wb+0x92/0xc0
	  wb_writeback+0x274/0x330
	  wb_workfn+0x2d5/0x3b0
	Defer __complete_request() to a workqueue in all failure cases so it's never on the same thread as ceph_osdc_start_request() and always called with no locks held.
	Link: http://tracker.ceph.com/issues/23978 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
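The deferral can be pictured as a FIFO that failure paths append to and a worker drains later. This is a single-threaded sketch: struct completion_queue stands in for a workqueue, complete_request_deferred(), run_completions() and count_completion() are hypothetical names, and in the kernel the drain step would run in a work item with no osdc/osd locks held.

```c
#include <stddef.h>

/* Hypothetical request with a completion callback. */
struct req {
	void (*callback)(struct req *r);
	int result;
	struct req *next; /* link in the deferred-completion list */
};

/* Minimal FIFO standing in for a workqueue. */
struct completion_queue {
	struct req *head;
	struct req *tail;
};

/* Failure paths record the result and enqueue; they never call back inline. */
void complete_request_deferred(struct completion_queue *q, struct req *r,
			       int err)
{
	r->result = err;
	r->next = NULL;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
}

/* Worker side: runs later, in its own context, with no submitter locks. */
size_t run_completions(struct completion_queue *q)
{
	size_t n = 0;

	while (q->head) {
		struct req *r = q->head;

		q->head = r->next;
		if (!q->head)
			q->tail = NULL;
		r->callback(r);
		n++;
	}
	return n;
}

/* Example callback for illustration: counts completions. */
int completed_count;

void count_completion(struct req *r)
{
	(void)r;
	completed_count++;
}
```

Because the submitter only enqueues, a callback that waits on writeback can no longer end up waiting on the very thread that submitted the request.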
2018-06-04	libceph: move more code into __complete_request()	Ilya Dryomov
	Move req->r_completion wake up and req->r_kref decrement into __complete_request(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04	libceph: no need to call flush_workqueue() before destruction	Ilya Dryomov
	destroy_workqueue() drains the workqueue before proceeding with destruction. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04	ceph: flush pending works before shutdown super	Yan, Zheng
	Pending work items hold inode references, which causes the "Busy inodes after unmount" warning. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04	ceph: abort osd requests on force umount	Yan, Zheng
	This avoids a forced umount waiting on page writeback:
	  io_schedule+0xd/0x30
	  wait_on_page_bit_common+0xc6/0x130
	  __filemap_fdatawait_range+0xbd/0x100
	  filemap_fdatawait_keep_errors+0x15/0x40
	  sync_inodes_sb+0x1cf/0x240
	  sync_filesystem+0x52/0x90
	  generic_shutdown_super+0x1d/0x110
	  ceph_kill_sb+0x28/0x80 [ceph]
	  deactivate_locked_super+0x35/0x60
	  cleanup_mnt+0x36/0x70
	  task_work_run+0x79/0xa0
	  exit_to_usermode_loop+0x62/0x70
	  do_syscall_64+0xdb/0xf0
	  entry_SYSCALL_64_after_hwframe+0x44/0xa9
	  0xffffffffffffffff
	Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04	libceph: introduce ceph_osdc_abort_requests()	Ilya Dryomov
	This will be used by the filesystem for "umount -f". Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04	ceph: fix st_nlink stat for directories	Luis Henriques
	Currently, calling stat on a cephfs directory returns 1 for st_nlink. The fuse client recently changed this behaviour with commit 67c7e4619188 ("client: use common interp of st_nlink for dirs"), as some applications seem to expect this value to be either 0 (if the directory is unlinked) or 2 + the number of subdirectories. This patch gives the kernel client similar behaviour. Link: https://tracker.ceph.com/issues/23873 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
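The convention the patch adopts fits in one expression. A sketch; dir_nlink() is a hypothetical helper, and how the client obtains the subdirectory count is not covered by the message.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: st_nlink for a directory under the common
 * interpretation: 0 once the directory is unlinked, otherwise 2 (its
 * entry in the parent plus its own ".") plus one ".." link per
 * subdirectory.
 */
unsigned int dir_nlink(bool unlinked, unsigned int nsubdirs)
{
	return unlinked ? 0 : 2 + nsubdirs;
}
```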