summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-10-24team: fix nested locking lockdep warningTaehee Yoo
team interface could be nested and it's lock variable could be nested too. But this lock uses static lockdep key and there is no nested locking handling code such as mutex_lock_nested() and so on. so the Lockdep would warn about the circular locking scenario that couldn't happen. In order to fix, this patch makes the team module to use dynamic lock key instead of static key. Test commands: ip link add team0 type team ip link add team1 type team ip link set team0 master team1 ip link set team0 nomaster ip link set team1 master team0 ip link set team1 nomaster Splat that looks like: [ 40.364352] WARNING: possible recursive locking detected [ 40.364964] 5.4.0-rc3+ #96 Not tainted [ 40.365405] -------------------------------------------- [ 40.365973] ip/750 is trying to acquire lock: [ 40.366542] ffff888060b34c40 (&team->lock){+.+.}, at: team_set_mac_address+0x151/0x290 [team] [ 40.367689] but task is already holding lock: [ 40.368729] ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team] [ 40.370280] other info that might help us debug this: [ 40.371159] Possible unsafe locking scenario: [ 40.371942] CPU0 [ 40.372338] ---- [ 40.372673] lock(&team->lock); [ 40.373115] lock(&team->lock); [ 40.373549] *** DEADLOCK *** [ 40.374432] May be due to missing lock nesting notation [ 40.375338] 2 locks held by ip/750: [ 40.375851] #0: ffffffffabcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0 [ 40.376927] #1: ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team] [ 40.377989] stack backtrace: [ 40.378650] CPU: 0 PID: 750 Comm: ip Not tainted 5.4.0-rc3+ #96 [ 40.379368] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 40.380574] Call Trace: [ 40.381208] dump_stack+0x7c/0xbb [ 40.381959] __lock_acquire+0x269d/0x3de0 [ 40.382817] ? register_lock_class+0x14d0/0x14d0 [ 40.383784] ? check_chain_key+0x236/0x5d0 [ 40.384518] lock_acquire+0x164/0x3b0 [ 40.385074] ? team_set_mac_address+0x151/0x290 [team] [ 40.385805] __mutex_lock+0x14d/0x14c0 [ 40.386371] ? team_set_mac_address+0x151/0x290 [team] [ 40.387038] ? team_set_mac_address+0x151/0x290 [team] [ 40.387632] ? mutex_lock_io_nested+0x1380/0x1380 [ 40.388245] ? team_del_slave+0x60/0x60 [team] [ 40.388752] ? rcu_read_lock_sched_held+0x90/0xc0 [ 40.389304] ? rcu_read_lock_bh_held+0xa0/0xa0 [ 40.389819] ? lock_acquire+0x164/0x3b0 [ 40.390285] ? lockdep_rtnl_is_held+0x16/0x20 [ 40.390797] ? team_port_get_rtnl+0x90/0xe0 [team] [ 40.391353] ? __module_text_address+0x13/0x140 [ 40.391886] ? team_set_mac_address+0x151/0x290 [team] [ 40.392547] team_set_mac_address+0x151/0x290 [team] [ 40.393111] dev_set_mac_address+0x1f0/0x3f0 [ ... ] Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-24net: core: add generic lockdep keysTaehee Yoo
Some interface types could be nested. (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..) These interface types should set lockdep class because, without lockdep class key, lockdep always warn about unexisting circular locking. In the current code, these interfaces have their own lockdep class keys and these manage itself. So that there are so many duplicate code around the /driver/net and /net/. This patch adds new generic lockdep keys and some helper functions for it. This patch does below changes. a) Add lockdep class keys in struct net_device - qdisc_running, xmit, addr_list, qdisc_busylock - these keys are used as dynamic lockdep key. b) When net_device is being allocated, lockdep keys are registered. - alloc_netdev_mqs() c) When net_device is being free'd llockdep keys are unregistered. - free_netdev() d) Add generic lockdep key helper function - netdev_register_lockdep_key() - netdev_unregister_lockdep_key() - netdev_update_lockdep_key() e) Remove unnecessary generic lockdep macro and functions f) Remove unnecessary lockdep code of each interfaces. After this patch, each interface modules don't need to maintain their lockdep keys. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-24net: core: limit nested device depthTaehee Yoo
Current code doesn't limit the number of nested devices. Nested devices would be handled recursively and this needs huge stack memory. So, unlimited nested devices could make stack overflow. This patch adds upper_level and lower_level, they are common variables and represent maximum lower/upper depth. When upper/lower device is attached or dettached, {lower/upper}_level are updated. and if maximum depth is bigger than 8, attach routine fails and returns -EMLINK. In addition, this patch converts recursive routine of netdev_walk_all_{lower/upper} to iterator routine. Test commands: ip link add dummy0 type dummy ip link add link dummy0 name vlan1 type vlan id 1 ip link set vlan1 up for i in {2..55} do let A=$i-1 ip link add vlan$i link vlan$A type vlan id $i done ip link del dummy0 Splat looks like: [ 155.513226][ T908] BUG: KASAN: use-after-free in __unwind_start+0x71/0x850 [ 155.514162][ T908] Write of size 88 at addr ffff8880608a6cc0 by task ip/908 [ 155.515048][ T908] [ 155.515333][ T908] CPU: 0 PID: 908 Comm: ip Not tainted 5.4.0-rc3+ #96 [ 155.516147][ T908] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 155.517233][ T908] Call Trace: [ 155.517627][ T908] [ 155.517918][ T908] Allocated by task 0: [ 155.518412][ T908] (stack is not available) [ 155.518955][ T908] [ 155.519228][ T908] Freed by task 0: [ 155.519885][ T908] (stack is not available) [ 155.520452][ T908] [ 155.520729][ T908] The buggy address belongs to the object at ffff8880608a6ac0 [ 155.520729][ T908] which belongs to the cache names_cache of size 4096 [ 155.522387][ T908] The buggy address is located 512 bytes inside of [ 155.522387][ T908] 4096-byte region [ffff8880608a6ac0, ffff8880608a7ac0) [ 155.523920][ T908] The buggy address belongs to the page: [ 155.524552][ T908] page:ffffea0001822800 refcount:1 mapcount:0 mapping:ffff88806c657cc0 index:0x0 compound_mapcount:0 [ 155.525836][ T908] flags: 0x100000000010200(slab|head) [ 155.526445][ T908] raw: 0100000000010200 ffffea0001813808 ffffea0001a26c08 ffff88806c657cc0 [ 155.527424][ T908] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 [ 155.528429][ T908] page dumped because: kasan: bad access detected [ 155.529158][ T908] [ 155.529410][ T908] Memory state around the buggy address: [ 155.530060][ T908] ffff8880608a6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 155.530971][ T908] ffff8880608a6c00: fb fb fb fb fb f1 f1 f1 f1 00 f2 f2 f2 f3 f3 f3 [ 155.531889][ T908] >ffff8880608a6c80: f3 fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 155.532806][ T908] ^ [ 155.533509][ T908] ffff8880608a6d00: fb fb fb fb fb fb fb fb fb f1 f1 f1 f1 00 00 00 [ 155.534436][ T908] ffff8880608a6d80: f2 f3 f3 f3 f3 fb fb fb 00 00 00 00 00 00 00 00 [ ... ] Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-24i2c: add support for filters optional propertiesEugen Hristev
i2c-digital-filter-width-ns: This optional timing property specifies the width of the spikes on the i2c lines (in ns) that can be filtered out by built-in digital filters which are embedded in some i2c controllers. i2c-analog-filter-cutoff-frequency: This optional timing property specifies the cutoff frequency of a low-pass analog filter built-in i2c controllers. This low pass filter is used to filter out high frequency noise on the i2c lines. Specified in Hz. Include these properties in the timings structure and read them as integers. Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com> Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com> Reviewed-by: Peter Rosin <peda@axentia.se> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-10-24reset: document (devm_)reset_control_get_optional variantsPhilipp Zabel
Add kerneldoc comments for the optional reset_control_get variants. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-10-24reset: improve of_xlate documentationPhilipp Zabel
Mention of_reset_simple_xlate as the default if of_xlate is not set. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-10-24reset: fix reset_control_get_exclusive kerneldoc commentPhilipp Zabel
Add missing parentheses to correctly hyperlink the reference to reset_control_get_shared(). Fixes: 0b52297f2288 ("reset: Add support for shared reset controls") Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-10-24reset: fix reset_control_lookup kerneldoc commentPhilipp Zabel
Add a missing colon to fix a documentation build warning: ./include/linux/reset-controller.h:45: warning: Function parameter or member 'con_id' not described in 'reset_control_lookup' Fixes: 6691dffab0ab ("reset: add support for non-DT systems") Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-10-24rtc: mt6397: move some common definitions into rtc.hJosef Friedl
move code to separate header-file to reuse definitions later in poweroff-driver (drivers/power/reset/mt6323-poweroff.c) Suggested-by: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: Josef Friedl <josef.friedl@speed.at> Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-10-24dma-buf: change DMA-buf locking convention v3Christian König
This patch is a stripped down version of the locking changes necessary to support dynamic DMA-buf handling. It adds a dynamic flag for both importers as well as exporters so that drivers can choose if they want the reservation object locked or unlocked during mapping of attachments. For compatibility between drivers we cache the DMA-buf mapping during attaching an importer as soon as exporter/importer disagree on the dynamic handling. Issues and solutions we considered: - We can't change all existing drivers, and existing improters have strong opinions about which locks they're holding while calling dma_buf_attachment_map/unmap. Exporters also have strong opinions about which locks they can acquire in their ->map/unmap callbacks, levaing no room for change. The solution to avoid this was to move the actual map/unmap out from this call, into the attach/detach callbacks, and cache the mapping. This works because drivers don't call attach/detach from deep within their code callchains (like deep in memory management code called from cs/execbuf ioctl), but directly from the fd2handle implementation. - The caching has some troubles on some soc drivers, which set other modes than DMA_BIDIRECTIONAL. We can't have 2 incompatible mappings, and we can't re-create the mapping at _map time due to the above locking fun. We very carefuly step around that by only caching at attach time if the dynamic mode between importer/expoert mismatches. - There's been quite some discussion on dma-buf mappings which need active cache management, which would all break down when caching, plus we don't have explicit flush operations on the attachment side. The solution to this was to shrug and keep the current discrepancy between what the dma-buf docs claim and what implementations do, with the hope that the begin/end_cpu_access hooks are good enough and that all necessary flushing to keep device mappings consistent will be done there. v2: cleanup set_name merge, improve kerneldoc v3: update commit message, kerneldoc and cleanup _debug_show() Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/336788/
2019-10-23net: phy: broadcom: add 1000Base-X support for BCM54616STao Ren
The BCM54616S PHY cannot work properly in RGMII->1000Base-X mode, mainly because genphy functions are designed for copper links, and 1000Base-X (clause 37) auto negotiation needs to be handled differently. This patch enables 1000Base-X support for BCM54616S by customizing 3 driver callbacks, and it's verified to be working on Facebook CMM BMC platform (RGMII->1000Base-KX): - probe: probe callback detects PHY's operation mode based on INTERF_SEL[1:0] pins and 1000X/100FX selection bit in SerDES 100-FX Control register. - config_aneg: calls genphy_c37_config_aneg when the PHY is running in 1000Base-X mode; otherwise, genphy_config_aneg will be called. - read_status: calls genphy_c37_read_status when the PHY is running in 1000Base-X mode; otherwise, genphy_read_status will be called. Note: BCM54616S PHY can also be configured in RGMII->100Base-FX mode, and 100Base-FX support is not available as of now. Signed-off-by: Tao Ren <taoren@fb.com> Acked-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-23net: phy: add support for clause 37 auto-negotiationHeiner Kallweit
This patch adds support for clause 37 1000Base-X auto-negotiation. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Tao Ren <taoren@fb.com> Tested-by: René van Dorst <opensource@vdorst.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-23net/flow_dissector: switch to siphashEric Dumazet
UDP IPv6 packets auto flowlabels are using a 32bit secret (static u32 hashrnd in net/core/flow_dissector.c) and apply jhash() over fields known by the receivers. Attackers can easily infer the 32bit secret and use this information to identify a device and/or user, since this 32bit secret is only set at boot time. Really, using jhash() to generate cookies sent on the wire is a serious security concern. Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be a dead end. Trying to periodically change the secret (like in sch_sfq.c) could change paths taken in the network for long lived flows. Let's switch to siphash, as we did in commit df453700e8d8 ("inet: switch IP ID generator to siphash") Using a cryptographically strong pseudo random function will solve this privacy issue and more generally remove other weak points in the stack. Packet schedulers using skb_get_hash_perturb() benefit from this change. Fixes: b56774163f99 ("ipv6: Enable auto flow labels by default") Fixes: 42240901f7c4 ("ipv6: Implement different admin modes for automatic flow labels") Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel") Fixes: cb1ce2ef387b ("ipv6: Implement automatic flow label generation on transmit") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jonathan Berger <jonathann1@walla.com> Reported-by: Amit Klein <aksecurity@gmail.com> Reported-by: Benny Pinkas <benny@pinkas.net> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-23spi: document CS setup, hold & inactive times in headerAlexandru Ardelean
This change documents the CS setup, host & inactive times. They were omitted when the fields were added, and were caught by one of the build bots. Fixes: 25093bdeb6bc ("spi: implement SW control for CS times") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Link: https://lore.kernel.org/r/20191023070046.12478-1-alexandru.ardelean@analog.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-23Remove the nr_exclusive argument from __wake_up_sync_key()David Howells
Remove the nr_exclusive argument from __wake_up_sync_key() and derived functions as everything seems to set it to 1. Note also that if it wasn't set to 1, it would clear WF_SYNC anyway. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2019-10-23compat_ioctl: reimplement SG_IO handlingArnd Bergmann
There are two code locations that implement the SG_IO ioctl: the old sg.c driver, and the generic scsi_ioctl helper that is in turn used by multiple drivers. To eradicate the old compat_ioctl conversion handler for the SG_IO command, I implement a readable pair of put_sg_io_hdr() /get_sg_io_hdr() helper functions that can be used for both compat and native mode, and then I call this from both drivers. For the iovec handling, there is already a compat_import_iovec() function that can simply be called in place of import_iovec(). To avoid having to pass the compat/native state through multiple indirections, I mark the SG_IO command itself as compatible in fs/compat_ioctl.c and use in_compat_syscall() to figure out where we are called from. As a side-effect of this, the sg.c driver now also accepts the 32-bit sg_io_hdr format in compat mode using the read/write interface, not just ioctl. This should improve compatiblity with old 32-bit binaries, but it would break if any application intentionally passes the 64-bit data structure in compat mode here. Steffen Maier helped debug an issue in an earlier version of this patch. Cc: Steffen Maier <maier@linux.ibm.com> Cc: linux-scsi@vger.kernel.org Cc: Doug Gilbert <dgilbert@interlog.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-23compat_ioctl: move tape handling into driversArnd Bergmann
MTIOCPOS and MTIOCGET are incompatible between 32-bit and 64-bit user space, and traditionally have been translated in fs/compat_ioctl.c. To get rid of that translation handler, move a corresponding implementation into each of the four drivers implementing those commands. The interesting part of that is now in a new linux/mtio.h header that wraps the existing uapi/linux/mtio.h header and provides an abstraction to let drivers handle both cases easily. Using an in_compat_syscall() check, the caller does not have to keep track of whether this was called through .unlocked_ioctl() or .compat_ioctl(). Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Kai Mäkisara" <Kai.Makisara@kolumbus.fi> Cc: linux-scsi@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-23compat: move FS_IOC_RESVSP_32 handling to fs/ioctl.cAl Viro
... and lose the ridiculous games with compat_alloc_user_space() there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-23compat_ioctl: add compat_ptr_ioctl()Arnd Bergmann
Many drivers have ioctl() handlers that are completely compatible between 32-bit and 64-bit architectures, except for the argument that is passed down from user space and may have to be passed through compat_ptr() in order to become a valid 64-bit pointer. Using ".compat_ptr = compat_ptr_ioctl" in file operations should let us simplify a lot of those drivers to avoid #ifdef checks, and convert additional drivers that don't have proper compat handling yet. On most architectures, the compat_ptr_ioctl() just passes all arguments to the corresponding ->ioctl handler. The exception is arch/s390, where compat_ptr() clears the top bit of a 32-bit pointer value, so user space pointers to the second 2GB alias the first 2GB, as is the case for native 32-bit s390 user space. The compat_ptr_ioctl() function must therefore be used only with ioctl functions that either ignore the argument or pass a pointer to a compatible data type. If any ioctl command handled by fops->unlocked_ioctl passes a plain integer instead of a pointer, or any of the passed data types is incompatible between 32-bit and 64-bit architectures, a proper handler is required instead of compat_ptr_ioctl. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- v3: add a better description v2: use compat_ptr_ioctl instead of generic_compat_ioctl_ptrarg, as suggested by Al Viro
2019-10-23Merge v5.4-rc4 into drm-nextDaniel Vetter
Thierry needs fd70c7755bf0 ("drm/bridge: tc358767: fix max_tu_symbol value") to be able to merge his dp_link patch series. Some adjacent changes conflicts, plus some clashes in i915 due to cherry-picking and git trying to be helpful and leaving both versions in. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2019-10-23phy: tegra: xusb: Add vbus override support on Tegra210Nagarjuna Kristam
Tegra XUSB device control driver needs to control vbus override during its operations, add API for the support. Signed-off-by: Nagarjuna Kristam <nkristam@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2019-10-23mtd: spi-nor: Introduce 'struct spi_nor_controller_ops'Tudor Ambarus
Move all SPI NOR controller driver specific ops in a dedicated structure. 'struct spi_nor' becomes lighter. Use size_t for lengths in 'int (*write_reg)()' and 'int (*read_reg)()'. Rename wite/read_buf to buf, the name of the functions are suggestive enough. Constify buf in int (*write_reg). Comply with these changes in the SPI NOR controller drivers. Suggested-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-10-23mtd: spi-nor: intel-spi: add support for Intel Cannon Lake SPI flashJethro Beekman
Now that SPI flash controllers without a software sequencer are supported, it's trivial to add support for CNL and its PCI ID. Values from https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/300-series-chipset-pch-datasheet-vol-2.pdf Signed-off-by: Jethro Beekman <jethro@fortanix.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
2019-10-22dynamic_debug: provide dynamic_hex_dump stubArnd Bergmann
The ionic driver started using dymamic_hex_dump(), but that is not always defined: drivers/net/ethernet/pensando/ionic/ionic_main.c:229:2: error: implicit declaration of function 'dynamic_hex_dump' [-Werror,-Wimplicit-function-declaration] Add a dummy implementation to use when CONFIG_DYNAMIC_DEBUG is disabled, printing nothing. Fixes: 938962d55229 ("ionic: Add adminq action") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22ipmi: Don't allow device module unload when in useCorey Minyard
If something has the IPMI driver open, don't allow the device module to be unloaded. Before it would unload and the user would get errors on use. This change is made on user request, and it makes it consistent with the I2C driver, which has the same behavior. It does change things a little bit with respect to kernel users. If the ACPI or IPMI watchdog (or any other kernel user) has created a user, then the device module cannot be unloaded. Before it could be unloaded, This does not affect hot-plug. If the device goes away (it's on something removable that is removed or is hot-removed via sysfs) then it still behaves as it did before. Reported-by: tony camuso <tcamuso@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Tested-by: tony camuso <tcamuso@redhat.com>
2019-10-22bpf: Fix use after free in subprog's jited symbol removalDaniel Borkmann
syzkaller managed to trigger the following crash: [...] BUG: unable to handle page fault for address: ffffc90001923030 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline] RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline] RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline] RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline] RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline] RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline] RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709 [...] Call Trace: kernel_text_address kernel/extable.c:147 [inline] __kernel_text_address+0x9a/0x110 kernel/extable.c:102 unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19 arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26 stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123 save_stack mm/kasan/common.c:69 [inline] set_track mm/kasan/common.c:77 [inline] __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518 slab_post_alloc_hook mm/slab.h:584 [inline] slab_alloc mm/slab.c:3319 [inline] kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483 getname_flags+0xba/0x640 fs/namei.c:138 getname+0x19/0x20 fs/namei.c:209 do_sys_open+0x261/0x560 fs/open.c:1091 __do_sys_open fs/open.c:1115 [inline] __se_sys_open fs/open.c:1110 [inline] __x64_sys_open+0x87/0x90 fs/open.c:1110 do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe [...] After further debugging it turns out that we walk kallsyms while in parallel we tear down a BPF program which contains subprograms that have been JITed though the program itself has not been fully exposed and is eventually bailing out with error. The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes the symbols, however, bpf_prog_free() tears down the JIT memory too early via scheduled work. Instead, it needs to properly respect RCU grace period as the kallsyms walk for BPF is under RCU. Fix it by refactoring __bpf_prog_put()'s tear down and reuse it in our error path where we defer final destruction when we have subprogs in the program. Fixes: 7d1982b4e335 ("bpf: fix panic in prog load calls cleanup") Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs") Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net
2019-10-22KVM: Add separate helper for putting borrowed reference to kvmSean Christopherson
Add a new helper, kvm_put_kvm_no_destroy(), to handle putting a borrowed reference[*] to the VM when installing a new file descriptor fails. KVM expects the refcount to remain valid in this case, as the in-progress ioctl() has an explicit reference to the VM. The primary motiviation for the helper is to document that the 'kvm' pointer is still valid after putting the borrowed reference, e.g. to document that doing mutex(&kvm->lock) immediately after putting a ref to kvm isn't broken. [*] When exposing a new object to userspace via a file descriptor, e.g. a new vcpu, KVM grabs a reference to itself (the VM) prior to making the object visible to userspace to avoid prematurely freeing the VM in the scenario where userspace immediately closes file descriptor. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Remove unneeded kvm_vcpu variable, guest_xcr0_loadedAaron Lewis
The kvm_vcpu variable, guest_xcr0_loaded, is a waste of an 'int' and a conditional branch. VMX and SVM are the only users, and both unconditionally pair kvm_load_guest_xcr0() with kvm_put_guest_xcr0() making this check unnecessary. Without this variable, the predicates in kvm_load_guest_xcr0 and kvm_put_guest_xcr0 should match. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: I7b1eb9b62969d7bbb2850f27e42f863421641b23 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22PM / OPP: Support adjusting OPP voltages at runtimeStephen Boyd
On some SoCs the Adaptive Voltage Scaling (AVS) technique is employed to optimize the operating voltage of a device. At a given frequency, the hardware monitors dynamic factors and either makes a suggestion for how much to adjust a voltage for the current frequency, or it automatically adjusts the voltage without software intervention. Add an API to the OPP library for the former case, so that AVS type devices can update the voltages for an OPP when the hardware determines the voltage should change. The assumption is that drivers like CPUfreq or devfreq will register for the OPP notifiers and adjust the voltage according to suggestions that AVS makes. This patch is derived from [1] submitted by Stephen. [1] https://lore.kernel.org/patchwork/patch/599279/ Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Roger Lu <roger.lu@mediatek.com> [s.nawrocki@samsung.com: added handling of OPP min/max voltage] Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-10-21soc: mmp: guard include of asm/cputype.h with CONFIG_ARM{,64}Olof Johansson
Since this driver is enabled for COMPILE_TEST, it avoids build error on x86 allmodconfig: In file included from /build/drivers/phy/marvell/phy-mmp3-usb.c:12: /build/include/linux/soc/mmp/cputype.h:5:10: fatal error: asm/cputype.h: No such file or directory Link: https://lore.kernel.org/r/20191022015658.14624-1-olof@lixom.net Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-21Merge tag 'mmp-drivers-for-v5.5' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/lkundrak/linux-mmp into arm/drivers ARM: Marvell MMP driver patches for v5.5 This tag includes the MMP3 USB2 PHY driver. The branch is based on mmp-soc-for-v5.5-2 because the driver depends on changes in MMP SoC support. * tag 'mmp-drivers-for-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/lkundrak/linux-mmp: MAINTAINERS: phy: add entry for USB PHY drivers on MMP SoCs phy: Add USB2 PHY driver for Marvell MMP3 SoC MAINTAINERS: mmp: add Git repository ARM: mmp: remove MMP3 USB PHY registers from regs-usb.h ARM: mmp: move cputype.h to include/linux/soc/ ARM: mmp: add SMP support ARM: mmp: add support for MMP3 SoC ARM: mmp: define MMP_CHIPID by the means of CIU_REG() ARM: mmp: DT: convert timer driver to use TIMER_OF_DECLARE ARM: mmp: map the PGU as well ARM: mmp: don't select CACHE_TAUROS2 on all ARCH_MMP ARM: l2c: add definition for FWA in PL310 aux register Link: https://lore.kernel.org/r/7cee3ddbb553ba7fe6e1420e0dbc5adb4922b317.camel@v3.sk Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-21Merge tag 'mmp-soc-for-v5.5-2' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/lkundrak/linux-mmp into arm/soc ARM: Marvell MMP SoC patches for v5.5 This tag includes initial support for the Marvell MMP3 processor. MMP3 is used in OLPC XO-4 laptops, Panasonic Toughpad FZ-A1 tablet and Dell Wyse 3020/Tx0D thin clients. * tag 'mmp-soc-for-v5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/lkundrak/linux-mmp: MAINTAINERS: mmp: add Git repository ARM: mmp: remove MMP3 USB PHY registers from regs-usb.h ARM: mmp: move cputype.h to include/linux/soc/ ARM: mmp: add SMP support ARM: mmp: add support for MMP3 SoC ARM: mmp: define MMP_CHIPID by the means of CIU_REG() ARM: mmp: DT: convert timer driver to use TIMER_OF_DECLARE ARM: mmp: map the PGU as well ARM: mmp: don't select CACHE_TAUROS2 on all ARCH_MMP ARM: l2c: add definition for FWA in PL310 aux register Link: https://lore.kernel.org/r/3a035bed90f9d8acc49b2d11d20089b546062aea.camel@v3.sk Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-21fscrypt: remove struct fscrypt_ctxEric Biggers
Now that ext4 and f2fs implement their own post-read workflow that supports both fscrypt and fsverity, the fscrypt-only workflow based around struct fscrypt_ctx is no longer used. So remove the unused code. This is based on a patch from Chandan Rajendra's "Consolidate FS read I/O callbacks code" patchset, but rebased onto the latest kernel, folded __fscrypt_decrypt_bio() into fscrypt_decrypt_bio(), cleaned up fscrypt_initialize(), and updated the commit message. Originally-from: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-10-21arm64: Retrieve stolen time as paravirtualized guestSteven Price
Enable paravirtualization features when running under a hypervisor supporting the PV_TIME_ST hypercall. For each (v)CPU, we ask the hypervisor for the location of a shared page which the hypervisor will use to report stolen time to us. We set pv_time_ops to the stolen time function which simply reads the stolen value from the shared page for a VCPU. We guarantee single-copy atomicity using READ_ONCE which means we can also read the stolen time for another VCPU than the currently running one while it is potentially being updated by the hypervisor. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21arm/arm64: Provide a wrapper for SMCCC 1.1 callsSteven Price
SMCCC 1.1 calls may use either HVC or SMC depending on the PSCI conduit. Rather than coding this in every call site, provide a macro which uses the correct instruction. The macro also handles the case where no conduit is configured/available returning a not supported error in res, along with returning the conduit used for the call. This allow us to remove some duplicated code and will be useful later when adding paravirtualized time hypervisor calls. Signed-off-by: Steven Price <steven.price@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: Allow kvm_device_ops to be constSteven Price
Currently a kvm_device_ops structure cannot be const without triggering compiler warnings. However the structure doesn't need to be written to and, by marking it const, it can be read-only in memory. Add some more const keywords to allow this. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm64: Support stolen time reporting via shared structureSteven Price
Implement the service call for configuring a shared structure between a VCPU and the hypervisor in which the hypervisor can write the time stolen from the VCPU's execution time by other tasks on the host. User space allocates memory which is placed at an IPA also chosen by user space. The hypervisor then updates the shared structure using kvm_put_guest() to ensure single copy atomicity of the 64-bit value reporting the stolen time in nanoseconds. Whenever stolen time is enabled by the guest, the stolen time counter is reset. The stolen time itself is retrieved from the sched_info structure maintained by the Linux scheduler code. We enable SCHEDSTATS when selecting KVM Kconfig to ensure this value is meaningful. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: Implement kvm_put_guest()Steven Price
kvm_put_guest() is analogous to put_user() - it writes a single value to the guest physical address. The implementation is built upon put_user() and so it has the same single copy atomic properties. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm64: Implement PV_TIME_FEATURES callSteven Price
This provides a mechanism for querying which paravirtualized time features are available in this hypervisor. Also add the header file which defines the ABI for the paravirtualized time features we're about to add. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21iomap: use a srcmap for a read-modify-write I/OGoldwyn Rodrigues
The srcmap is used to identify where the read is to be performed from. It is passed to ->iomap_begin, which can fill it in if we need to read data for partially written blocks from a different location than the write target. The srcmap is only supported for buffered writes so far. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as srcmap if not set, adjust length down to srcmap end as well] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2019-10-21iomap: renumber IOMAP_HOLE to 0Christoph Hellwig
Instead of keeping a separate unnamed state for uninitialized iomaps, renumber IOMAP_HOLE to zero so that an uninitialized iomap is treated as a hole. Suggested-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: ignore non-shared or non-data blocks in xfs_file_dirtyChristoph Hellwig
xfs_file_dirty is used to unshare reflink blocks. Rename the function to xfs_file_unshare to better document that purpose, and skip iomaps that are not shared and don't need zeroing. This will allow to simplify the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: better document the IOMAP_F_* flagsChristoph Hellwig
The documentation for IOMAP_F_* is a bit disorganized, and doesn't mention the fact that most flags are set by the file system and consumed by the iomap core, while IOMAP_F_SIZE_CHANGED is set by the core and consumed by the file system. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: move struct iomap_page out of iomap.hChristoph Hellwig
Now that all the writepage code is in the iomap code there is no need to keep this structure public. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: lift the xfs writeback code to iomapChristoph Hellwig
Take the xfs writeback code and move it to fs/iomap. A new structure with three methods is added as the abstraction from the generic writeback code to the file system. These methods are used to map blocks, submit an ioend, and cancel a page that encountered an error before it was added to an ioend. Signed-off-by: Christoph Hellwig <hch@lst.de> [darrick: rename ->submit_ioend to ->prepare_ioend to clarify what it does] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-10-21pwm: stm32: Remove clutter from ternary operatorThierry Reding
Remove usage of the ternary operator to assign values for register fields. Instead, parameterize the register and field offset macros and pass the index to them. This removes clutter and improves readability. Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2019-10-21pwm: Update comment on struct pwm_ops::applyRasmus Villemoes
Commit 71523d1812ac (pwm: Ensure pwm_apply_state() doesn't modify the state argument) updated the kernel-doc for pwm_apply_state(), but not for the ->apply callback in the pwm_ops struct. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2019-10-21PCI: Fix missing inline for pci_pr3_present()Takashi Iwai
The inline prefix was missing in the dummy function pci_pr3_present() definition. Fix it. Reported-by: kbuild test robot <lkp@intel.com> Fixes: 52525b7a3cf8 ("PCI: Add a helper to check Power Resource Requirements _PR3 existence") Link: https://lore.kernel.org/r/201910212111.qHm6OcWx%lkp@intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-10-21jbd2: Make state lock a spinlockThomas Gleixner
Bit-spinlocks are problematic on PREEMPT_RT if functions which might sleep on RT, e.g. spin_lock(), alloc/free(), are invoked inside the lock held region because bit spinlocks disable preemption even on RT. A first attempt was to replace state lock with a spinlock placed in struct buffer_head and make the locking conditional on PREEMPT_RT and DEBUG_BIT_SPINLOCKS. Jan pointed out that there is a 4 byte hole in struct journal_head where a regular spinlock fits in and he would not object to convert the state lock to a spinlock unconditionally. Aside of solving the RT problem, this also gains lockdep coverage for the journal head state lock (bit-spinlocks are not covered by lockdep as it's hard to fit a lockdep map into a single bit). The trivial change would have been to convert the jbd_*lock_bh_state() inlines, but that comes with the downside that these functions take a buffer head pointer which needs to be converted to a journal head pointer which adds another level of indirection. As almost all functions which use this lock have a journal head pointer readily available, it makes more sense to remove the lock helper inlines and write out spin_*lock() at all call sites. Fixup all locking comments as well. Suggested-by: Jan Kara <jack@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jan Kara <jack@suse.com> Cc: linux-ext4@vger.kernel.org Link: https://lore.kernel.org/r/20190809124233.13277-7-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-10-21jbd2: Move dropping of jh reference out of un/re-filing functionsJan Kara
__jbd2_journal_unfile_buffer() and __jbd2_journal_refile_buffer() drop transaction's jh reference when they remove jh from a transaction. This will be however inconvenient once we move state lock into journal_head itself as we still need to unlock it and we'd need to grab jh reference just for that. Move dropping of jh reference out of these functions into the few callers. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20190809124233.13277-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>