summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-02-10genirq: Avoid summation loops for /proc/statThomas Gleixner
Waiman reported that on large systems with a large amount of interrupts the readout of /proc/stat takes a long time to sum up the interrupt statistics. In principle this is not a problem. but for unknown reasons some enterprise quality software reads /proc/stat with a high frequency. The reason for this is that interrupt statistics are accounted per cpu. So the /proc/stat logic has to sum up the interrupt stats for each interrupt. This can be largely avoided for interrupts which are not marked as 'PER_CPU' interrupts by simply adding a per interrupt summation counter which is incremented along with the per interrupt per cpu counter. The PER_CPU interrupts need to avoid that and use only per cpu accounting because they share the interrupt number and the interrupt descriptor and concurrent updates would conflict or require unwanted synchronization. Reported-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: linux-fsdevel@vger.kernel.org Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de 8<------------- v2: Undo the unintentional layout change of struct irq_desc. include/linux/irqdesc.h | 1 + kernel/irq/chip.c | 12 ++++++++++-- kernel/irq/internals.h | 8 +++++++- kernel/irq/irqdesc.c | 7 ++++++- 4 files changed, 24 insertions(+), 4 deletions(-)
2019-02-10Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "irqchip driver fixes: most of them are race fixes for ARM GIC (General Interrupt Controller) variants, but also a fix for the ARM MMP (Marvell PXA168 et al) irqchip affecting OLPC keyboards" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3-its: Fix ITT_entry_size accessor irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable irqchip/gic-v3-its: Gracefully fail on LPI exhaustion irqchip/gic-v3-its: Plug allocation race for devices sharing a DevID irqchip/gic-v4: Fix occasional VLPI drop
2019-02-09Merge tag 'for-linus-20190209' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request from Christoph, fixing namespace locking when dealing with the effects log, and a rapid add/remove issue (Keith) - blktrace tweak, ensuring requests with -1 sectors are shown (Jan) - link power management quirk for a Smasung SSD (Hans) - m68k nfblock dynamic major number fix (Chengguang) - series fixing blk-iolatency inflight counter issue (Liu) - ensure that we clear ->private when setting up the aio kiocb (Mike) - __find_get_block_slow() rate limit print (Tetsuo) * tag 'for-linus-20190209' of git://git.kernel.dk/linux-block: blk-mq: remove duplicated definition of blk_mq_freeze_queue Blk-iolatency: warn on negative inflight IO counter blk-iolatency: fix IO hang due to negative inflight counter blktrace: Show requests without sector fs: ratelimit __find_get_block_slow() failure message. m68k: set proper major_num when specifying module param major_num libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD nvme-pci: fix rapid add remove sequence nvme: lock NS list changes while handling command effects aio: initialize kiocb private in case any filesystems expect it.
2019-02-09net: phy: Add support for asking the PHY its abilitiesAndrew Lunn
Add support for runtime determination of what the PHY supports, by adding a new function to the phy driver. The get_features call should set the phydev->supported member with the features the PHY supports. It is only called if phydrv->features is NULL. This requires minor changes to pause. The PHY driver should not set pause abilities, except for when it has odd cause capabilities, e.g. pause cannot be disabled. With this change, phydev->supported already contains the drivers abilities, including pause. So rather than considering phydrv->features, look at the phydev->supported, and enable pause if neither of the pause bits are already set. Signed-off-by: Andrew Lunn <andrew@lunn.ch> [hkallweit1@gmail.com: fixed small checkpatch complaint in one comment] Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08qed: Add API for SmartAN query.Sudarsana Reddy Kalluru
The patch adds driver interface to read the SmartAN capability from management firmware. Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull signal fixes from Eric Biederman: "This contains four small fixes for signal handling. A missing range check, a regression fix, prioritizing signals we have already started a signal group exit for, and better detection of synchronous signals. The confused decision of which signals to handle failed spectacularly when a timer was pointed at SIGBUS and the stack overflowed. Resulting in an unkillable process in an infinite loop instead of a SIGSEGV and core dump" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Better detection of synchronous signals signal: Always notice exiting tasks signal: Always attempt to allocate siginfo for SIGSTOP signal: Make siginmask safe when passed a signal of 0
2019-02-08lib: objagg: add root count to statsJiri Pirko
Count number of roots and add it to stats. It is handy for the library user to have this stats available as it can act upon it without counting roots itself. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08lib: objagg: implement optimization hints assembly and use hints for object ↵Jiri Pirko
creation Implement simple greedy algo to find more optimized root-delta tree for a given objagg instance. This "hints" can be used by a driver to: 1) check if the hints are better (driver's choice) than the original objagg tree. Driver does comparison of objagg stats and hints stats. 2) use the hints to create a new objagg instance which will construct the root-delta tree according to the passed hints. Currently, only a simple greedy algorithm is implemented. Basically it finds the roots according to the maximal possible user count including deltas. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
An ipvlan bug fix in 'net' conflicted with the abstraction away of the IPV6 specific support in 'net-next'. Similarly, a bug fix for mlx5 in 'net' conflicted with the flow action conversion in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "This pull request is dedicated to the upcoming snowpocalypse parts 2 and 3 in the Pacific Northwest: 1) Drop profiles are broken because some drivers use dev_kfree_skb* instead of dev_consume_skb*, from Yang Wei. 2) Fix IWLWIFI kconfig deps, from Luca Coelho. 3) Fix percpu maps updating in bpftool, from Paolo Abeni. 4) Missing station release in batman-adv, from Felix Fietkau. 5) Fix some networking compat ioctl bugs, from Johannes Berg. 6) ucc_geth must reset the BQL queue state when stopping the device, from Mathias Thore. 7) Several XDP bug fixes in virtio_net from Toshiaki Makita. 8) TSO packets must be sent always on queue 0 in stmmac, from Jose Abreu. 9) Fix socket refcounting bug in RDS, from Eric Dumazet. 10) Handle sparse cpu allocations in bpf selftests, from Martynas Pumputis. 11) Make sure mgmt frames have enough tailroom in mac80211, from Felix Feitkau. 12) Use safe list walking in sctp_sendmsg() asoc list traversal, from Greg Kroah-Hartman. 13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL ccid, from Eric Dumazet. 14) Need to reload WoL password into bcmsysport device after deep sleeps, from Florian Fainelli. 15) Remove filter from mask before freeing in cls_flower, from Petr Machata. 16) Missing release and use after free in error paths of s390 qeth code, from Julian Wiedmann. 17) Fix lockdep false positive in dsa code, from Marc Zyngier. 18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn. 19) Fix EQ firmware assert in qed driver, from Manish Chopra. 20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) net: dsa: b53: Fix for failure when irq is not defined in dt sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach() geneve: should not call rt6_lookup() when ipv6 was disabled net: Don't default Cavium PTP driver to 'y' net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net/mlx5e: Don't overwrite pedit action when multiple pedit used net/mlx5e: Update hw flows when encap source mac changed qed*: Advance drivers version to 8.37.0.20 qed: Change verbosity for coalescing message. qede: Fix system crash on configuring channels. qed: Consider TX tcs while deriving the max num_queues for PF. ...
2019-02-08ieee80211: fix for_each_element_extid()Johannes Berg
The data/datalen argument names cannot be used as those are also the struct element names, fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08mac80211: indicate support for multiple BSSIDSara Sharon
Set multi-bssid support flags according to driver support. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08mac80211: support multi-bssidSara Sharon
Add support for multi-bssid. This includes: - Parsing multi-bssid element - Overriding DTIM values - Taking into account in various places the inner BSSID instead of transmitter BSSID - Save aside some multi-bssid properties needed by drivers Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08cfg80211: add and use strongly typed element iteration macrosJohannes Berg
Rather than always iterating elements from frames with pure u8 pointers, add a type "struct element" that encapsulates the id/datalen/data format of them. Then, add the element iteration macros * for_each_element * for_each_element_id * for_each_element_extid which take, as their first 'argument', such a structure and iterate through a given u8 array interpreting it as elements. While at it and since we'll need it, also add * for_each_subelement * for_each_subelement_id * for_each_subelement_extid which instead of taking data/length just take an outer element and use its data/datalen. Also add for_each_element_completed() to determine if any of the loops above completed, i.e. it was able to parse all of the elements successfully and no data remained. Use for_each_element_id() in cfg80211_find_ie_match() as the first user of this. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08mmc: block: handle complete_work on separate workqueueZachary Hays
The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set. This generates a rescuer thread for that queue that will trigger when the CPU is under heavy load and collect the uncompleted work. In the case of mmc, this creates the possibility of a deadlock when there are multiple partitions on the device as other blk-mq work is also run on the same queue. For example: - worker 0 claims the mmc host to work on partition 1 - worker 1 attempts to claim the host for partition 2 but has to wait for worker 0 to finish - worker 0 schedules complete_work to release the host - rescuer thread is triggered after time-out and collects the dangling work - rescuer thread attempts to complete the work in order starting with claim host - the task to release host is now blocked by a task to claim it and will never be called The above results in multiple hung tasks that lead to failures to mount partitions. Handling complete_work on a separate workqueue avoids this by keeping the work completion tasks separate from the other blk-mq work. This allows the host to be released without getting blocked by other tasks attempting to claim the host. Signed-off-by: Zachary Hays <zhays@lexmark.com> Fixes: 81196976ed94 ("mmc: block: Add blk-mq support") Cc: <stable@vger.kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-02-07net: phy: let genphy_c45_read_link manage the devices to checkHeiner Kallweit
Let genphy_c45_read_link manage the devices to check, this removes overhead from callers. Add C22EXT to the list of excluded devices because it doesn't implement the status register. According to the 802.3 clause 45 spec registers 29.0 - 29.4 are reserved. At the moment we have very few clause 45 PHY drivers, so we are lacking experience whether other drivers will have to exclude further devices, or may need to check PHY XS. If we should figure out that list of devices to check needs to be configurable, I think best will be to add a device list member to struct phy_driver. v2: - adjusted commit message - exclude also device C22EXT from link checking Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net: fixed-phy: Add fixed_phy_register_with_gpiod() APIMoritz Fischer
Add fixed_phy_register_with_gpiod() API. It lets users create a fixed_phy instance that uses a GPIO descriptor which was obtained externally e.g. through platform data. This enables platform devices (non-DT based) to use GPIOs for link status. Signed-off-by: Moritz Fischer <mdf@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07Merge tag 'irqchip-5.0-3' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip updates from Marc Zyngier: - Another GICv3 ITS fix for devices sharing the same DevID - Don't return invalid data on exhaustion of the GICv3 LPI pool - Fix a GICv3 field decoding bug leading to memory over-allocation - Init GICv4 at boot time instead of lazy init - Fix interrupt masking on PJ4
2019-02-07blktrace: Show requests without sectorJan Kara
Currently, blktrace will not show requests that don't have any data as rq->__sector is initialized to -1 which is out of device range and thus discarded by act_log_check(). This is most notably the case for cache flush requests sent to the device. Fix the problem by making blk_rq_trace_sector() return 0 for requests without initialized sector. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fix from Jiri Kosina: "A fix for a bug in hid-debug that can lock up the kernel in infinite loop (CVE-2019-3819), from Vladis Dronov" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: debug: fix the ring buffer implementation
2019-02-06net: Introduce ndo_get_port_parent_id()Florian Fainelli
In preparation for getting rid of switchdev_ops, create a dedicated NDO operation for getting the port's parent identifier. There are essentially two classes of drivers that need to implement getting the port's parent ID which are VF/PF drivers with a built-in switch, and pure switchdev drivers such as mlxsw, ocelot, dsa etc. We introduce a helper function: dev_get_port_parent_id() which supports recursion into the lower devices to obtain the first port's parent ID. Convert the bridge, core and ipv4 multicast routing code to check for such ndo_get_port_parent_id() and call the helper function when valid before falling back to switchdev_port_attr_get(). This will allow us to convert all relevant drivers in one go instead of having to implement both switchdev_port_attr_get() and ndo_get_port_parent_id() operations, then get rid of switchdev_port_attr_get(). Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06ethtool: add ethtool_rx_flow_spec to flow_rule structure translatorPablo Neira Ayuso
This patch adds a function to translate the ethtool_rx_flow_spec structure to the flow_rule representation. This allows us to reuse code from the driver side given that both flower and ethtool_rx_flow interfaces use the same representation. This patch also includes support for the flow type flags FLOW_EXT, FLOW_MAC_EXT and FLOW_RSS. The ethtool_rx_flow_spec_input wrapper structure is used to convey the rss_context field, that is away from the ethtool_rx_flow_spec structure, and the ethtool_rx_flow_spec structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net: phy: provide full set of accessor functions to MMD registersNikita Yushchenko
This adds full set of locked and unlocked accessor functions to read and write PHY MMD registers and/or bitfields. Set of functions exactly matches what is already available for PHY legacy registers. Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06Merge tag 'wireless-drivers-next-for-davem-2019-02-06' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 5.1 First set of patches for 5.1. Lots of new features in various drivers but nothing really special standing out. Major changes: brcmfmac * DMI nvram filename quirk for PoV TAB-P1006W-232 tablet rsi * support for hardware scan offload iwlwifi * support for Target Wakeup Time (TWT) -- a feature that allows the AP to specify when individual stations can access the medium * support for mac80211 AMSDU handling * some new PCI IDs * relicense the pcie submodule to dual GPL/BSD * reworked the TOF/CSI (channel estimation matrix) implementation * Some product name updates in the human-readable strings mt76 * energy detect regulatory compliance fixes * preparation for MT7603 support * channel switch announcement support mwifiex * support for sd8977 chipset qtnfmac * support for 4addr mode * convert to SPDX license identifiers ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06regulator: core: Only support passing enable GPIO descriptorsLinus Walleij
Now that we changed all providers to pass descriptors into the core for enable GPIOs instead of a global GPIO number, delete the support for passing GPIO numbers in, and we get a cleanup and size reduction in the core, and from a GPIO point of view we use the modern, cleaner interface. Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-02-06regulator: fixed/gpio: Pull inversion/OD into gpiolibLinus Walleij
This pushes the handling of inversion semantics and open drain settings to the GPIO descriptor and gpiolib. All affected board files are also augmented. This is especially nice since we don't have to have any confusing flags passed around to the left and right littering the fixed and GPIO regulator drivers and the regulator core. It is all just very straight-forward: the core asks the GPIO line to be asserted or deasserted and gpiolib deals with the rest depending on how the platform is configured: if the line is active low, it deals with that, if the line is open drain, it deals with that too. Cc: Alexander Shiyan <shc_work@mail.ru> # i.MX boards user Cc: Haojian Zhuang <haojian.zhuang@gmail.com> # MMP2 maintainer Cc: Aaro Koskinen <aaro.koskinen@iki.fi> # OMAP1 maintainer Cc: Tony Lindgren <tony@atomide.com> # OMAP1,2,3 maintainer Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> # EM-X270 maintainer Cc: Robert Jarzmik <robert.jarzmik@free.fr> # EZX maintainer Cc: Philipp Zabel <philipp.zabel@gmail.com> # Magician maintainer Cc: Petr Cvek <petr.cvek@tul.cz> # Magician Cc: Robert Jarzmik <robert.jarzmik@free.fr> # PXA Cc: Paul Parsons <lost.distance@yahoo.com> # hx4700 Cc: Daniel Mack <zonque@gmail.com> # Raumfeld maintainer Cc: Marc Zyngier <marc.zyngier@arm.com> # Zeus maintainer Cc: Geert Uytterhoeven <geert+renesas@glider.be> # SuperH pinctrl/GPIO maintainer Cc: Russell King <rmk+kernel@armlinux.org.uk> # SA1100 Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> #OMAP1 Amstrad Delta Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-02-06regulator: gpio: Convert to use descriptorsLinus Walleij
This converts the GPIO regulator driver to use decriptors only. We have to let go of the array gpio handling: the fetched descriptors are handled individually anyway, and the array retrieveal function does not make it possible to retrieve each GPIO descriptor with unique flags. Instead get them one by one. We request the "enable" GPIO separately as before, and make sure that this line is requested as nonexclusive since enable lines can be shared and the regulator core expects this. Most users of the GPIO regulator are using device tree. There are two boards in the kernel using the gpio regulator from a non-devicetree path: PXA hx4700 and magician. Make sure to switch these over to use descriptors as well. Cc: Philipp Zabel <p.zabel@pengutronix.de> # Magician Cc: Petr Cvek <petr.cvek@tul.cz> # Magician Cc: Robert Jarzmik <robert.jarzmik@free.fr> # PXA Cc: Paul Parsons <lost.distance@yahoo.com> # hx4700 Cc: Kevin Hilman <khilman@baylibre.com> # Meson Cc: Neil Armstrong <narmstrong@baylibre.com> # Meson Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-02-05vfio-mdev: Switch to use new generic UUID APIAndy Shevchenko
There are new types and helpers that are supposed to be used in new code. As a preparation to get rid of legacy types and API functions do the conversion here. Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-05mtd: rawnand: remove ->legacy.erase and single_erase()Masahiro Yamada
Now that the last user of this hook, denali.c, stopped using it, we can remove the erase hook from nand_legacy. I squashed single_erase() because only the difference between single_erase() and nand_erase_op() is the number of bit shifts. The status/ret conversion in nand_erase_nand() is unneeded since commit eb94555e9e97 ("mtd: nand: use usual return values for the ->erase() hook"). Cleaned it up now. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Boris Brezillon <bbrezillon@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-02-05mtd: rawnand: Simplify the lockingBoris Brezillon
nand_get_device() was complex for apparently no good reason. Let's replace this locking scheme with 2 mutexes: one attached to the controller and another one attached to the chip. Every time the core calls nand_get_device(), it will first lock the chip and if the chip is not suspended, will then lock the controller. nand_release_device() will release both lock in the reverse order. nand_get_device() can sleep, just like the previous implementation, which means you should never call that from an atomic context. We also get rid of - the chip->state field, since all it was used for was flagging the chip as suspended. We replace it by a field called chip->suspended and directly set it from nand_suspend/resume() - the controller->wq and controller->active fields which are no longer needed since the new controller->lock (now a mutex) guarantees that all operations are serialized at the controller level - panic_nand_get_device() which would anyway be a no-op. Talking about panic write, I keep thinking the rawnand implementation is unsafe because there's not negotiation with the controller to know when it's actually done with it's previous operation. I don't intend to fix that here, but that's probably something we should look at, or maybe we should consider dropping the ->_panic_write() implementation Last important change to mention: we now return -EBUSY when someone tries to access a device that as been suspended, and propagate this error to the upper layer. Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-02-05irqdesc: Add domain handler for NMIsJulien Thierry
NMI handling code should be executed between calls to nmi_enter and nmi_exit. Add a separate domain handler to properly setup NMI context when handling an interrupt requested as NMI. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-05genirq: Provide NMI handlersJulien Thierry
Provide flow handlers that are NMI safe for interrupts and percpu_devid interrupts. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-05genirq: Provide NMI management for percpu_devid interruptsJulien Thierry
Add support for percpu_devid interrupts treated as NMIs. Percpu_devid NMIs need to be setup/torn down on each CPU they target. The same restrictions as for global NMIs still apply for percpu_devid NMIs. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-05genirq: Provide basic NMI management for interrupt linesJulien Thierry
Add functionality to allocate interrupt lines that will deliver IRQs as Non-Maskable Interrupts. These allocations are only successful if the irqchip provides the necessary support and allows NMI delivery for the interrupt line. Interrupt lines allocated for NMI delivery must be enabled/disabled through enable_nmi/disable_nmi_nosync to keep their state consistent. To treat a PERCPU IRQ as NMI, the interrupt must not be shared nor threaded, the irqchip directly managing the IRQ must be the root irqchip and the irqchip cannot be behind a slow bus. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-04net: phy: fixed-phy: Drop GPIO from fixed_phy_add()Linus Walleij
All users of the fixed_phy_add() pass -1 as GPIO number to the fixed phy driver, and all users of fixed_phy_register() pass -1 as GPIO number as well, except for the device tree MDIO bus. Any new users should create a proper device and pass the GPIO as a descriptor associated with the device so delete the GPIO argument from the calls and drop the code looking requesting a GPIO in fixed_phy_add(). In fixed phy_register(), investigate the "fixed-link" node and pick the GPIO descriptor from "link-gpios" if this property exists. Move the corresponding code out of of_mdio.c as the fixed phy code anyways requires OF to be in use. Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04netfilter: ipv6: avoid indirect calls for IPV6=y caseFlorian Westphal
indirect calls are only needed if ipv6 is a module. Add helpers to abstract the v6ops indirections and use them instead. fragment, reroute and route_input are kept as indirect calls. The first two are not not used in hot path and route_input is only used by bridge netfilter. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04netfilter: nat: remove module dependency on ipv6 coreFlorian Westphal
nf_nat_ipv6 calls two ipv6 core functions, so add those to v6ops to avoid the module dependency. This is a prerequisite for merging ipv4 and ipv6 nat implementations. Add wrappers to avoid the indirection if ipv6 is builtin. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-03netdevice.h: Add __cold to netdev_<level> logging functionsJoe Perches
Add __cold to the netdev_<level> logging functions similar to the use of __cold in the generic printk function. Using __cold moves all the netdev_<level> logging functions out-of-line possibly improving code locality and runtime performance. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03net: Fix ip_mc_{dec,inc}_group allocation contextFlorian Fainelli
After 4effd28c1245 ("bridge: join all-snoopers multicast address"), I started seeing the following sleep in atomic warnings: [ 26.763893] BUG: sleeping function called from invalid context at mm/slab.h:421 [ 26.771425] in_atomic(): 1, irqs_disabled(): 0, pid: 1658, name: sh [ 26.777855] INFO: lockdep is turned off. [ 26.781916] CPU: 0 PID: 1658 Comm: sh Not tainted 5.0.0-rc4 #20 [ 26.787943] Hardware name: BCM97278SV (DT) [ 26.792118] Call trace: [ 26.794645] dump_backtrace+0x0/0x170 [ 26.798391] show_stack+0x24/0x30 [ 26.801787] dump_stack+0xa4/0xe4 [ 26.805182] ___might_sleep+0x208/0x218 [ 26.809102] __might_sleep+0x78/0x88 [ 26.812762] kmem_cache_alloc_trace+0x64/0x28c [ 26.817301] igmp_group_dropped+0x150/0x230 [ 26.821573] ip_mc_dec_group+0x1b0/0x1f8 [ 26.825585] br_ip4_multicast_leave_snoopers.isra.11+0x174/0x190 [ 26.831704] br_multicast_toggle+0x78/0xcc [ 26.835887] store_bridge_parm+0xc4/0xfc [ 26.839894] multicast_snooping_store+0x3c/0x4c [ 26.844517] dev_attr_store+0x44/0x5c [ 26.848262] sysfs_kf_write+0x50/0x68 [ 26.852006] kernfs_fop_write+0x14c/0x1b4 [ 26.856102] __vfs_write+0x60/0x190 [ 26.859668] vfs_write+0xc8/0x168 [ 26.863059] ksys_write+0x70/0xc8 [ 26.866449] __arm64_sys_write+0x24/0x30 [ 26.870458] el0_svc_common+0xa0/0x11c [ 26.874291] el0_svc_handler+0x38/0x70 [ 26.878120] el0_svc+0x8/0xc while toggling the bridge's multicast_snooping attribute dynamically. Pass a gfp_t down to igmpv3_add_delrec(), introduce __igmp_group_dropped() and introduce __ip_mc_dec_group() to take a gfp_t argument. Similarly introduce ____ip_mc_inc_group() and __ip_mc_inc_group() to allow caller to specify gfp_t. IPv6 part of the patch appears fine. Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Add SO_TIMESTAMPING_NEWDeepa Dinamani
Add SO_TIMESTAMPING_NEW variant of socket timestamp options. This is the y2038 safe versions of the SO_TIMESTAMPING_OLD for all architectures. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: chris@zankel.net Cc: fenghua.yu@intel.com Cc: rth@twiddle.net Cc: tglx@linutronix.de Cc: ubraun@linux.ibm.com Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-s390@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Add SO_TIMESTAMP[NS]_NEWDeepa Dinamani
Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of socket timestamp options. These are the y2038 safe versions of the SO_TIMESTAMP_OLD and SO_TIMESTAMPNS_OLD for all architectures. Note that the format of scm_timestamping.ts[0] is not changed in this patch. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: jejb@parisc-linux.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: linux-alpha@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-parisc@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Use old_timeval types for socket timestampsDeepa Dinamani
As part of y2038 solution, all internal uses of struct timeval are replaced by struct __kernel_old_timeval and struct compat_timeval by struct old_timeval32. Make socket timestamps use these new types. This is mainly to be able to verify that the kernel build is y2038 safe when such non y2038 safe types are not supported anymore. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: isdn@linux-pingi.de Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A few updates for x86: - Fix an unintended sign extension issue in the fault handling code - Rename the new resource control config switch so it's less confusing - Avoid setting up EFI info in kexec when the EFI runtime is disabled. - Fix the microcode version check in the AMD microcode loader so it only loads higher version numbers and never downgrades - Set EFER.LME in the 32bit trampoline before returning to long mode to handle older AMD/KVM behaviour properly. - Add Darren and Andy as x86/platform reviewers" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Avoid confusion over the new X86_RESCTRL config x86/kexec: Don't setup EFI info if EFI runtime is not enabled x86/microcode/amd: Don't falsely trick the late loading mechanism MAINTAINERS: Add Andy and Darren as arch/x86/platform/ reviewers x86/fault: Fix sign-extend unintended sign extension x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode x86/cpu: Add Atom Tremont (Jacobsville)
2019-02-03Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull cpu hotplug fixes from Thomas Gleixner: "Two fixes for the cpu hotplug machinery: - Replace the overly clever 'SMT disabled by BIOS' detection logic as it breaks KVM scenarios and prevents speculation control updates when the Hyperthreads are brought online late after boot. - Remove a redundant invocation of the speculation control update function" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM x86/speculation: Remove redundant arch_smt_update() invocation
2019-02-03net/mlx5: Set ODP SRQ support in firmwareMoni Shoua
To avoid compatibility issue with older kernels the firmware doesn't allow SRQ to work with ODP unless kernel asks for it. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-02-03net/mlx5: Add XRC transport to ODP device capabilities layoutMoni Shoua
The device capabilities for ODP structure was missing the field for XRC transport so add it here. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-02-02Merge tag 'for-linus-20190202' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes that should go into this release. This contains: - MD pull request from Song, fixing a recovery OOM issue (Alexei) - Fix for a sync related stall (Jianchao) - Dummy callback for timeouts (Tetsuo) - IDE atapi sense ordering fix (me)" * tag 'for-linus-20190202' of git://git.kernel.dk/linux-block: ide: ensure atapi sense request aren't preempted blk-mq: fix a hung issue when fsync block: pass no-op callback to INIT_WORK(). md/raid5: fix 'out of memory' during raid cache recovery
2019-02-02x86/resctrl: Avoid confusion over the new X86_RESCTRL configJohannes Weiner
"Resource Control" is a very broad term for this CPU feature, and a term that is also associated with containers, cgroups etc. This can easily cause confusion. Make the user prompt more specific. Match the config symbol name. [ bp: In the future, the corresponding ARM arch-specific code will be under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out under the CPU_RESCTRL umbrella symbol. ] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Babu Moger <Babu.Moger@amd.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: linux-doc@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org
2019-02-01mm/hotplug: invalid PFNs from pfn_to_online_page()Qian Cai
On an arm64 ThunderX2 server, the first kmemleak scan would crash [1] with CONFIG_DEBUG_VM_PGFLAGS=y due to page_to_nid() found a pfn that is not directly mapped (MEMBLOCK_NOMAP). Hence, the page->flags is uninitialized. This is due to the commit 9f1eb38e0e11 ("mm, kmemleak: little optimization while scanning") starts to use pfn_to_online_page() instead of pfn_valid(). However, in the CONFIG_MEMORY_HOTPLUG=y case, pfn_to_online_page() does not call memblock_is_map_memory() while pfn_valid() does. Historically, the commit 68709f45385a ("arm64: only consider memblocks with NOMAP cleared for linear mapping") causes pages marked as nomap being no long reassigned to the new zone in memmap_init_zone() by calling __init_single_page(). Since the commit 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes") introduced pfn_to_online_page() and was designed to return a valid pfn only, but it is clearly broken on arm64. Therefore, let pfn_to_online_page() call pfn_valid_within(), so it can handle nomap thanks to the commit f52bb98f5ade ("arm64: mm: always enable CONFIG_HOLES_IN_ZONE"), while it will be optimized away on architectures where have no HOLES_IN_ZONE. [1] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006 Mem abort info: ESR = 0x96000005 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 Internal error: Oops: 96000005 [#1] SMP CPU: 60 PID: 1408 Comm: kmemleak Not tainted 5.0.0-rc2+ #8 pstate: 60400009 (nZCv daif +PAN -UAO) pc : page_mapping+0x24/0x144 lr : __dump_page+0x34/0x3dc sp : ffff00003a5cfd10 x29: ffff00003a5cfd10 x28: 000000000000802f x27: 0000000000000000 x26: 0000000000277d00 x25: ffff000010791f56 x24: ffff7fe000000000 x23: ffff000010772f8b x22: ffff00001125f670 x21: ffff000011311000 x20: ffff000010772f8b x19: fffffffffffffffe x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: ffff802698b19600 x13: ffff802698b1a200 x12: ffff802698b16f00 x11: ffff802698b1a400 x10: 0000000000001400 x9 : 0000000000000001 x8 : ffff00001121a000 x7 : 0000000000000000 x6 : ffff0000102c53b8 x5 : 0000000000000000 x4 : 0000000000000003 x3 : 0000000000000100 x2 : 0000000000000000 x1 : ffff000010772f8b x0 : ffffffffffffffff Process kmemleak (pid: 1408, stack limit = 0x(____ptrval____)) Call trace: page_mapping+0x24/0x144 __dump_page+0x34/0x3dc dump_page+0x28/0x4c kmemleak_scan+0x4ac/0x680 kmemleak_scan_thread+0xb4/0xdc kthread+0x12c/0x13c ret_from_fork+0x10/0x18 Code: d503201f f9400660 36000040 d1000413 (f9400661) ---[ end trace 4d4bd7f573490c8e ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x002,20000c38 Memory Limit: none ---[ end Kernel panic - not syncing: Fatal exception ]--- Link: http://lkml.kernel.org/r/20190122132916.28360-1-cai@lca.pw Fixes: 9f1eb38e0e11 ("mm, kmemleak: little optimization while scanning") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01oom, oom_reaper: do not enqueue same task twiceTetsuo Handa
Arkadiusz reported that enabling memcg's group oom killing causes strange memcg statistics where there is no task in a memcg despite the number of tasks in that memcg is not 0. It turned out that there is a bug in wake_oom_reaper() which allows enqueuing same task twice which makes impossible to decrease the number of tasks in that memcg due to a refcount leak. This bug existed since the OOM reaper became invokable from task_will_free_mem(current) path in out_of_memory() in Linux 4.7, T1@P1 |T2@P1 |T3@P1 |OOM reaper ----------+----------+----------+------------ # Processing an OOM victim in a different memcg domain. try_charge() mem_cgroup_out_of_memory() mutex_lock(&oom_lock) try_charge() mem_cgroup_out_of_memory() mutex_lock(&oom_lock) try_charge() mem_cgroup_out_of_memory() mutex_lock(&oom_lock) out_of_memory() oom_kill_process(P1) do_send_sig_info(SIGKILL, @P1) mark_oom_victim(T1@P1) wake_oom_reaper(T1@P1) # T1@P1 is enqueued. mutex_unlock(&oom_lock) out_of_memory() mark_oom_victim(T2@P1) wake_oom_reaper(T2@P1) # T2@P1 is enqueued. mutex_unlock(&oom_lock) out_of_memory() mark_oom_victim(T1@P1) wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL. mutex_unlock(&oom_lock) # Completed processing an OOM victim in a different memcg domain. spin_lock(&oom_reaper_lock) # T1P1 is dequeued. spin_unlock(&oom_reaper_lock) but memcg's group oom killing made it easier to trigger this bug by calling wake_oom_reaper() on the same task from one out_of_memory() request. Fix this bug using an approach used by commit 855b018325737f76 ("oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task"). As a side effect of this patch, this patch also avoids enqueuing multiple threads sharing memory via task_will_free_mem(current) path. Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp Fixes: af8e15cc85a25315 ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Aleksa Sarai <asarai@suse.de> Cc: Jay Kamat <jgkamat@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>