linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-11-08	net: sched: red: delay destroying child qdisc on replace	Jakub Kicinski
	Move destroying of the old child qdisc outside of the sch_tree_lock() section. This should improve the software qdisc replace but is even more important for offloads. Firstly calling offloads under a spin lock is best avoided. Secondly the destroy event of existing child would have been sent to the offload device before the replace, causing confusion. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	net: sched: refactor grafting Qdiscs with a parent	Jakub Kicinski
	The code for grafting Qdiscs when there is a parent has two needless indentation levels, and breaks the "keep the success path unindented" guideline. Refactor. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	net: sched: add an offload graft helper	Jakub Kicinski
	Qdisc graft operation of offload-capable qdiscs performs a few extra steps which are identical among all the qdiscs. Add a helper to share this code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	net: sched: set TCQ_F_OFFLOADED flag for MQ	Jakub Kicinski
	PRIO and RED mark the qdisc with TCQ_F_OFFLOADED upon successful offload, make MQ do the same. The consistency will help with consistent graft callback behaviour. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	net: sched: red: remove unnecessary red_dump_offload_stats parameter	Jakub Kicinski
	Offload dump helper does not use opt parameter, remove it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	net: sched: add an offload dump helper	Jakub Kicinski
	Qdisc dump operation of offload-capable qdiscs performs a few extra steps which are identical among all the qdiscs. Add a helper to share this code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	Merge tag 'xfs-4.20-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux	Linus Torvalds
	Pull xfs fixes from Darrick Wong: - fix incorrect dropping of error code from bmap - print buffer offsets instead of useless hashed pointers when dumping corrupt metadata - fix integer overflow in attribute verifier * tag 'xfs-4.20-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix overflow in xfs_attr3_leaf_verify xfs: print buffer offsets when dumping corrupt buffers xfs: Fix error code in 'xfs_ioc_getbmap()'
2018-11-08	Merge tag 'led-fixes-for-4.20-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED fixes from Jacek Anaszewski: "All three fixes are related to the newly added pattern trigger: - remove mutex_lock() from timer callback, which would trigger problems related to sleeping in atomic context, the removal is harmless since mutex protection turned out to be redundant in this case - fix pattern parsing to properly handle intervals with brightness == 0 - fix typos in the ABI documentation" * tag 'led-fixes-for-4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: Documentation: ABI: led-trigger-pattern: Fix typos leds: trigger: Fix sleeping function called from invalid context Fix pattern handling optimalization
2018-11-08	Merge tag 'sound-4.20-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Two small regression fixes for HD-audio: one about vga_switcheroo and runtime PM, and another about Oops on some Thinkpads" * tag 'sound-4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Fix incorrect clearance of thinkpad_acpi hooks vga_switcheroo: Fix missing gpu_bound call at audio client registration
2018-11-08	Merge branch 'net-phy-improve-and-simplify-phylib-state-machine'	David S. Miller
	Heiner Kallweit says: ==================== net: phy: improve and simplify phylib state machine This patch series is based on two axioms: - During autoneg a PHY always reports the link being down - Info in clause 22/45 registers doesn't allow to differentiate between these two states: 1. Link is physically down 2. A link partner is connected and PHY is autonegotiating In both cases "link up" and "aneg finished" bits aren't set. One consequence is that having separate states PHY_NOLINK and PHY_AN isn't needed. By using these two axioms the state machine can be significantly simplified. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	net: phy: use phy_check_link_status in more places in the state machine	Heiner Kallweit
	Use phy_check_link_status in more places in the state machine. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	net: phy: remove state PHY_AN	Heiner Kallweit
	After the recent changes in the state machine state PHY_AN isn't used any longer and can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	net: phy: add phy_check_link_status	Heiner Kallweit
	In few places in the state machine the state is set to PHY_RUNNING or PHY_NOLINK after doing a phy_read_status(). So factor this out to phy_check_link_status(). First use it in phy_start_aneg(): By setting the state to PHY_RUNNING or PHY_NOLINK directly we can remove the code to handle the case that we're using interrupts and aneg was finished already. Definition of phy_link_up and phy_link_down needs to be moved because they are called in the new function. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	net: phy: remove useless check in state machine case PHY_RESUMING	Heiner Kallweit
	If aneg isn't finished yet then the PHY reports the link as down. There's no benefit in setting the state to PHY_AN because the next state machine run would set the status to PHY_NOLINK anyway (except in the meantime aneg has been finished and link is up). Therefore we can set the state to PHY_RUNNING or PHY_NOLINK directly. In addition change the do_carrier parameter in phy_link_down() to true. If carrier was marked as up before (what should never be the case because PHY was in state PHY_HALTED before) then we should mark it as down now. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	net: phy: remove useless check in state machine case PHY_NOLINK	Heiner Kallweit
	If aneg is enabled and the PHY reports the link as up then definitely aneg finished successfully. Therefore this check is useless and can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	of, numa: Validate some distance map rules	John Garry
	Currently the NUMA distance map parsing does not validate the distance table for the distance-matrix rules 1-2 in [1]. However the arch NUMA code may enforce some of these rules, but not all. Such is the case for the arm64 port, which does not enforce the rule that the distance between separates nodes cannot equal LOCAL_DISTANCE. The patch adds the following rules validation: - distance of node to self equals LOCAL_DISTANCE - distance of separate nodes > LOCAL_DISTANCE This change avoids a yet-unresolved crash reported in [2]. A note on dealing with symmetrical distances between nodes: Validating symmetrical distances between nodes is difficult. If it were mandated in the bindings that every distance must be recorded in the table, then it would be easy. However, it isn't. In addition to this, it is also possible to record [b, a] distance only (and not [a, b]). So, when processing the table for [b, a], we cannot assert that current distance of [a, b] != [b, a] as invalid, as [a, b] distance may not be present in the table and current distance would be default at REMOTE_DISTANCE. As such, we maintain the policy that we overwrite distance [a, b] = [b, a] for b > a. This policy is different to kernel ACPI SLIT validation, which allows non-symmetrical distances (ACPI spec SLIT rules allow it). However, the distance debug message is dropped as it may be misleading (for a distance which is later overwritten). Some final notes on semantics: - It is implied that it is the responsibility of the arch NUMA code to reset the NUMA distance map for an error in distance map parsing. - It is the responsibility of the FW NUMA topology parsing (whether OF or ACPI) to enforce NUMA distance rules, and not arch NUMA code. [1] Documents/devicetree/bindings/numa.txt [2] https://www.spinics.net/lists/arm-kernel/msg683304.html Cc: stable@vger.kernel.org # 4.7 Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Rob Herring <robh@kernel.org>
2018-11-08	of/device: Really only set bus DMA mask when appropriate	Robin Murphy
	of_dma_configure() was supposed to be following the same logic as acpi_dma_configure() and only setting bus_dma_mask if some range was specified by the firmware. However, it seems that subtlety got lost in the process of fitting it into the differently-shaped control flow, and as a result the force_dma==true case ends up always setting the bus mask to the 32-bit default, which is not what anyone wants. Make sure we only touch it if the DT actually said so. Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate") Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reported-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Tested-by: John Stultz <john.stultz@linaro.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Robert Richter <robert.richter@cavium.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Rob Herring <robh@kernel.org>
2018-11-08	clk: meson: axg: mark fdiv2 and fdiv3 as critical	Jerome Brunet
	Similar to gxbb and gxl platforms, axg SCPI Cortex-M co-processor uses the fdiv2 and fdiv3 to, among other things, provide the cpu clock. Until clock hand-off mechanism makes its way to CCF and the generic SCPI claims platform specific clocks, these clocks must be marked as critical to make sure they are never disabled when needed by the co-processor. Fixes: 05f814402d61 ("clk: meson: add fdiv clock gates") Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Acked-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-11-08	clk: meson-gxbb: set fclk_div3 as CLK_IS_CRITICAL	Christian Hewitt
	On the Khadas VIM2 (GXM) and LePotato (GXL) board there are problems with reboot; e.g. a ~60 second delay between issuing reboot and the board power cycling (and in some OS configurations reboot will fail and require manual power cycling). Similar to 'commit c987ac6f1f088663b6dad39281071aeb31d450a8 ("clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL")' the SCPI Cortex-M4 Co-Processor seems to depend on FCLK_DIV3 being operational. Until commit 05f814402d6174369b3b29832cbb5eb5ed287059 ("clk: meson: add fdiv clock gates"), this clock was modeled and left on by the bootloader. We don't have precise documentation about the SCPI Co-Processor and its clock requirement so we are learning things the hard way. Marking this clock as critical solves the problem but it should not be viewed as final solution. Ideally, the SCPI driver should claim these clocks. We also depends on some clock hand-off mechanism making its way to CCF, to make sure the clock stays on between its registration and the SCPI driver probe. Fixes: 05f814402d61 ("clk: meson: add fdiv clock gates") Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-11-08	arm64: memblock: don't permit memblock resizing until linear mapping is up	Ard Biesheuvel
	Bhupesh reports that having numerous memblock reservations at early boot may result in the following crash: Unable to handle kernel paging request at virtual address ffff80003ffe0000 ... Call trace: __memcpy+0x110/0x180 memblock_add_range+0x134/0x2e8 memblock_reserve+0x70/0xb8 memblock_alloc_base_nid+0x6c/0x88 __memblock_alloc_base+0x3c/0x4c memblock_alloc_base+0x28/0x4c memblock_alloc+0x2c/0x38 early_pgtable_alloc+0x20/0xb0 paging_init+0x28/0x7f8 This is caused by the fact that we permit memblock resizing before the linear mapping is up, and so the memblock_reserved() array is moved into memory that is not mapped yet. So let's ensure that this crash can no longer occur, by deferring to call to memblock_allow_resize() to after the linear mapping has been created. Reported-by: Bhupesh Sharma <bhsharma@redhat.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-11-08	arm64: mm: define NET_IP_ALIGN to 0	Ard Biesheuvel
	On arm64, there is no need to add 2 bytes of padding to the start of each network buffer just to make the IP header appear 32-bit aligned. Since this might actually adversely affect DMA performance some platforms, let's override NET_IP_ALIGN to 0 to get rid of this padding. Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-11-08	libceph: assume argonaut on the server side	Ilya Dryomov
	No one is running pre-argonaut. In addition one of the argonaut features (NOSRCADDR) has been required since day one (and a half, 2.6.34 vs 2.6.35) of the kernel client. Allow for the possibility of reusing these feature bits later. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2018-11-08	ceph: quota: fix null pointer dereference in quota check	Luis Henriques
	This patch fixes a possible null pointer dereference in check_quota_exceeded, detected by the static checker smatch, with the following warning: fs/ceph/quota.c:240 check_quota_exceeded() error: we previously assumed 'realm' could be null (see line 188) Fixes: b7a2921765cf ("ceph: quota: support for ceph.quota.max_files") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-11-08	ceph: add destination file data sync before doing any remote copy	Luis Henriques
	If we try to copy into a file that was just written, any data that is remote copied will be overwritten by our buffered writes once they are flushed. When this happens, the call to invalidate_inode_pages2_range will also return a -EBUSY error. This patch fixes this by also sync'ing the destination file before starting any copy. Fixes: 503f82a9932d ("ceph: support copy_file_range file operation") Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-11-08	sata_rcar: convert to SPDX identifiers	Kuninori Morimoto
	This patch updates license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08	ubd: fix missing initialization of io_req	Anton Ivanov
	The SYNC path doesn't initialize io_req->error, which can cause random errors. Before the conversion to blk-mq, we always completed requests with BLK_STS_OK status, but now we actually look at the error field and this issue becomes apparent. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> [axboe: fixed up commit message to explain what is actually going on] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08	arch/alpha, termios: implement BOTHER, IBSHIFT and termios2	H. Peter Anvin (Intel)
	Alpha has had c_ispeed and c_ospeed, but still set speeds in c_cflags using arbitrary flags. Because BOTHER is not defined, the general Linux code doesn't allow setting arbitrary baud rates, and because CBAUDEX == 0, we can have an array overrun of the baud_rate[] table in drivers/tty/tty_baudrate.c if (c_cflags & CBAUD) == 037. Resolve both problems by #defining BOTHER to 037 on Alpha. However, userspace still needs to know if setting BOTHER is actually safe given legacy kernels (does anyone actually care about that on Alpha anymore?), so enable the TCGETS2/TCSETS*2 ioctls on Alpha, even though they use the same structure. Define struct termios2 just for compatibility; it is the exact same structure as struct termios. In a future patchset, this will be cleaned up so the uapi headers are usable from libc. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: <linux-alpha@vger.kernel.org> Cc: <linux-serial@vger.kernel.org> Cc: Johan Hovold <johan@kernel.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-08	Merge tag 'compiler-attributes-for-linus-v4.20-rc2' of ↵	Linus Torvalds
	https://github.com/ojeda/linux Pull compiler attribute fixlets from Miguel Ojeda: "Small improvements to Compiler Attributes: - Define asm_volatile_goto for non-gcc compilers (Nick Desaulniers) - Improve the explanation of compiler_attributes.h" * tag 'compiler-attributes-for-linus-v4.20-rc2' of https://github.com/ojeda/linux: Compiler Attributes: improve explanation of header include/linux/compiler*.h: define asm_volatile_goto
2018-11-08	Merge tag 'mtd/fixes-for-4.20-rc2' of git://git.infradead.org/linux-mtd	Linus Torvalds
	Pull MTD fixes from Boris Brezillon: "MTD changes: - Kill a VLA in sa1100 SPI NOR changes: - Make sure ->addr_width is restored when SFDP parsing fails - Propate errors happening in cqspi_direct_read_execute() NAND changes: - Fix kernel-doc mismatch - Fix nanddev_neraseblocks() to return the correct value - Avoid selection of BCH_CONST_PARAMS when some users require dynamic BCH settings" * tag 'mtd/fixes-for-4.20-rc2' of git://git.infradead.org/linux-mtd: mtd: nand: Fix nanddev_pos_next_page() kernel-doc header mtd: sa1100: avoid VLA in sa1100_setup_mtd mtd: spi-nor: Reset nor->addr_width when SFDP parsing failed mtd: spi-nor: cadence-quadspi: Return error code in cqspi_direct_read_execute() mtd: nand: Fix nanddev_neraseblocks() mtd: nand: drop kernel-doc notation for a deleted function parameter mtd: docg3: don't set conflicting BCH_CONST_PARAMS option
2018-11-08	termios, tty/tty_baudrate.c: fix buffer overrun	H. Peter Anvin
	On architectures with CBAUDEX == 0 (Alpha and PowerPC), the code in tty_baudrate.c does not do any limit checking on the tty_baudrate[] array, and in fact a buffer overrun is possible on both architectures. Add a limit check to prevent that situation. This will be followed by a much bigger cleanup/simplification patch. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Requested-by: Cc: Johan Hovold <johan@kernel.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-08	vt: fix broken display when running aptitude	Mikulas Patocka
	If you run aptitude on framebuffer console, the display is corrupted. The corruption is caused by the commit d8ae7242. The patch adds "offset" to "start" when calling scr_memsetw, but it forgets to do the same addition on a subsequent call to do_update_region. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: d8ae72427187 ("vt: preserve unicode values corresponding to screen characters") Reviewed-by: Nicolas Pitre <nico@linaro.org> Cc: stable@vger.kernel.org # 4.19 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-08	Compiler Attributes: improve explanation of header	Miguel Ojeda
	Explain better what "optional" attributes are, and avoid calling them so to avoid confusion. Simply retain "Optional" as a word to look for in the comments. Moreover, add a couple sentences to explain a bit more the intention and the documentation links. Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2018-11-07	Merge branch '1GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-11-07 This series contains updates to almost all of the Intel wired LAN drivers. Lance Roy replaces a spin lock with lockdep_assert_held() for igbvf driver in move toward trying to remove spin_is_locked(). Colin Ian King fixes a potential null pointer dereference by adding a check in ixgbe. Also fixed the igc driver by properly assigning the return error code of a function call, so that we can properly check it. Shannon Nelson updates the ixgbe driver to not block IPsec offload when in VEPA mode, in VEB mode, IPsec offload is still blocked because the device drops packets into a black hole. Jake adds support for software timestamping for packets sent over ixgbevf. Also modifies i40e, iavf, igb, igc, and ixgbe to delay calling skb_tx_timestamp() to the latest point possible, which is just prior to notifying the hardware of the new Tx packet. Todd adds the new WoL filter flag so that we properly report that we do not support this new feature. YueHaibing from Huawei fixes the igc driver by cleaning up variables that are not "really" used. Dan Carpenter cleans up igc whitespace issues. Miroslav Lichvar fixes e1000e for potential underflow issue in the timecounter, so modify the driver to use timecounter_cyc2time() to allow non-monotonic SYSTIM readings. Sasha provides additional igc cleanups based on community feedback. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	mount: Prevent MNT_DETACH from disconnecting locked mounts	Eric W. Biederman
	Timothy Baldwin <timbaldwin@fastmail.co.uk> wrote: > As per mount_namespaces(7) unprivileged users should not be able to look under mount points: > > Mounts that come as a single unit from more privileged mount are locked > together and may not be separated in a less privileged mount namespace. > > However they can: > > 1. Create a mount namespace. > 2. In the mount namespace open a file descriptor to the parent of a mount point. > 3. Destroy the mount namespace. > 4. Use the file descriptor to look under the mount point. > > I have reproduced this with Linux 4.16.18 and Linux 4.18-rc8. > > The setup: > > $ sudo sysctl kernel.unprivileged_userns_clone=1 > kernel.unprivileged_userns_clone = 1 > $ mkdir -p A/B/Secret > $ sudo mount -t tmpfs hide A/B > > > "Secret" is indeed hidden as expected: > > $ ls -lR A > A: > total 0 > drwxrwxrwt 2 root root 40 Feb 12 21:08 B > > A/B: > total 0 > > > The attack revealing "Secret": > > $ unshare -Umr sh -c "exec unshare -m ls -lR /proc/self/fd/4/ 4<A" > /proc/self/fd/4/: > total 0 > drwxr-xr-x 3 root root 60 Feb 12 21:08 B > > /proc/self/fd/4/B: > total 0 > drwxr-xr-x 2 root root 40 Feb 12 21:08 Secret > > /proc/self/fd/4/B/Secret: > total 0 I tracked this down to put_mnt_ns running passing UMOUNT_SYNC and disconnecting all of the mounts in a mount namespace. Fix this by factoring drop_mounts out of drop_collected_mounts and passing 0 instead of UMOUNT_SYNC. There are two possible behavior differences that result from this. - No longer setting UMOUNT_SYNC will no longer set MNT_SYNC_UMOUNT on the vfsmounts being unmounted. This effects the lazy rcu walk by kicking the walk out of rcu mode and forcing it to be a non-lazy walk. - No longer disconnecting locked mounts will keep some mounts around longer as they stay because the are locked to other mounts. There are only two users of drop_collected mounts: audit_tree.c and put_mnt_ns. In audit_tree.c the mounts are private and there are no rcu lazy walks only calls to iterate_mounts. So the changes should have no effect except for a small timing effect as the connected mounts are disconnected. In put_mnt_ns there may be references from process outside the mount namespace to the mounts. So the mounts remaining connected will be the bug fix that is needed. That rcu walks are allowed to continue appears not to be a problem especially as the rcu walk change was about an implementation detail not about semantics. Cc: stable@vger.kernel.org Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users") Reported-by: Timothy Baldwin <timbaldwin@fastmail.co.uk> Tested-by: Timothy Baldwin <timbaldwin@fastmail.co.uk> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-11-07	Merge branch 'nfp-add-and-use-tunnel-netdev-helpers'	David S. Miller
	John Hurley says: ==================== nfp: add and use tunnel netdev helpers A recent patch introduced the function netif_is_vxlan() to verify the tunnel type of a given netdev as vxlan. Add a similar function to detect geneve netdevs and make use of this function in the NFP driver. Also make use of the vxlan helper where applicable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	nfp: flower: include geneve as supported offload tunnel type	John Hurley
	Offload of geneve decap rules is supported in NFP. Include geneve in the check for supported types. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	nfp: flower: use geneve and vxlan helpers	John Hurley
	Make use of the recently added VXLAN and geneve helper functions to determine the type of the netdev from its rtnl_link_ops. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	net: add netif_is_geneve()	John Hurley
	Add a helper function to determine if the type of a netdev is geneve based on its rtnl_link_ops. This allows drivers that may wish to offload tunnels to check the underlying type of the device. A recent patch added a similar helper to vxlan.h Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	s390/perf: Change CPUM_CF return code in event init function	Thomas Richter
	The function perf_init_event() creates a new event and assignes it to a PMU. This a done in a loop over all existing PMUs. For each listed PMU the event init function is called and if this function does return any other error than -ENOENT, the loop is terminated the creation of the event fails. If the event is invalid, return -ENOENT to try other PMUs. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-11-07	sfc: add missing NVRAM partition types for EF10	Edward Cree
	Expose the MUM/SUC Firmware, UEFI Expansion ROM and MC Status partitions of the NIC's NVRAM as MTDs if found on the NIC. The first two are needed in order to properly update them when performing firmware updates; the MC Status partition is used to determine whether a signed firmware image was accepted or rejected by a Secure NIC. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	posix-cpu-timers: Remove useless call to check_dl_overrun()	Juri Lelli
	check_dl_overrun() is used to send a SIGXCPU to users that asked to be informed when a SCHED_DEADLINE runtime overruns occur. The function is called by check_thread_timers() already, so the call in check_process_timers() is redundant/wrong (even though harmless). Remove it. Fixes: 34be39305a77 ("sched/deadline: Implement "runtime overrun signal" support") Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: linux-rt-users@vger.kernel.org Cc: mtk.manpages@gmail.com Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Claudio Scordino <claudio@evidence.eu.com> Link: https://lkml.kernel.org/r/20181107111032.32291-1-juri.lelli@redhat.com
2018-11-07	Merge branch 'vlan-prepare-for-removal-of-VLAN_TAG_PRESENT'	David S. Miller
	Michał Mirosław says: ==================== net/vlan: prepare for removal of VLAN_TAG_PRESENT This is a preparatory patchset before removing the use of VLAN_TAG_PRESENT bit in skb->vlan_tci as indication of VLAN offload. This set includes only cleanups that allow abstracting of code testing VLAN tag presence in drivers and networking code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	net/vlan: remove unused #define HAVE_VLAN_GET_TAG	Michał Mirosław
	Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	net/vlan: include the shift in skb_vlan_tag_get_prio()	Michał Mirosław
	Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	net/vlan: introduce __vlan_hwaccel_copy_tag() helper	Michał Mirosław
	Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	net/vlan: introduce __vlan_hwaccel_clear_tag() helper	Michał Mirosław
	Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	qlcnic: remove assumption that vlan_tci != 0	Michał Mirosław
	VLAN.TCI == 0 is perfectly valid (802.1p), so allow it to be accelerated. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	ibmvnic: fix accelerated VLAN handling	Michał Mirosław
	Don't request tag insertion when it isn't present in outgoing skb. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	inet: minor optimization for backlog setting in listen(2)	Yafang Shao
	Set the backlog earlier in inet_dccp_listen() and inet_listen(), then we can avoid the redundant setting. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08	mount: Don't allow copying MNT_UNBINDABLE\|MNT_LOCKED mounts	Eric W. Biederman
	Jonathan Calmels from NVIDIA reported that he's able to bypass the mount visibility security check in place in the Linux kernel by using a combination of the unbindable property along with the private mount propagation option to allow a unprivileged user to see a path which was purposefully hidden by the root user. Reproducer: # Hide a path to all users using a tmpfs root@castiana:~# mount -t tmpfs tmpfs /sys/devices/ root@castiana:~# # As an unprivileged user, unshare user namespace and mount namespace stgraber@castiana:~$ unshare -U -m -r # Confirm the path is still not accessible root@castiana:~# ls /sys/devices/ # Make /sys recursively unbindable and private root@castiana:~# mount --make-runbindable /sys root@castiana:~# mount --make-private /sys # Recursively bind-mount the rest of /sys over to /mnnt root@castiana:~# mount --rbind /sys/ /mnt # Access our hidden /sys/device as an unprivileged user root@castiana:~# ls /mnt/devices/ breakpoint cpu cstate_core cstate_pkg i915 intel_pt isa kprobe LNXSYSTM:00 msr pci0000:00 platform pnp0 power software system tracepoint uncore_arb uncore_cbox_0 uncore_cbox_1 uprobe virtual Solve this by teaching copy_tree to fail if a mount turns out to be both unbindable and locked. Cc: stable@vger.kernel.org Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users") Reported-by: Jonathan Calmels <jcalmels@nvidia.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>