summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-10-01Merge branch 'udp-fix-early-demux-for-mcast-packets'David S. Miller
Paolo Abeni says: ==================== udp: fix early demux for mcast packets Currently the early demux callbacks do not perform source address validation. This is not an issue for TCP or UDP unicast, where the early demux is only allowed for connected sockets and the source address is validated for the first packet and never change. The UDP protocol currently allows early demux also for unconnected multicast sockets, and we are not currently doing any validation for them, after that the first packet lands on the socket: beyond ignoring the rp_filter - if enabled - any kind of martian sources are also allowed. This series addresses the issue allowing the early demux callback to return an error code, and performing the proper checks for unconnected UDP multicast sockets before leveraging the rx dst cache. Alternatively we could disable the early demux for unconnected mcast sockets, but that would cause relevant performance regression - around 50% - while with this series, with full rp_filter in place, we keep the regression to a more moderate level. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01udp: perform source validation for mcast early demuxPaolo Abeni
The UDP early demux can leverate the rx dst cache even for multicast unconnected sockets. In such scenario the ipv4 source address is validated only on the first packet in the given flow. After that, when we fetch the dst entry from the socket rx cache, we stop enforcing the rp_filter and we even start accepting any kind of martian addresses. Disabling the dst cache for unconnected multicast socket will cause large performace regression, nearly reducing by half the max ingress tput. Instead we factor out a route helper to completely validate an skb source address for multicast packets and we call it from the UDP early demux for mcast packets landing on unconnected sockets, after successful fetching the related cached dst entry. This still gives a measurable, but limited performance regression: rp_filter = 0 rp_filter = 1 edmux disabled: 1182 Kpps 1127 Kpps edmux before: 2238 Kpps 2238 Kpps edmux after: 2037 Kpps 2019 Kpps The above figures are on top of current net tree. Applying the net-next commit 6e617de84e87 ("net: avoid a full fib lookup when rp_filter is disabled.") the delta with rp_filter == 0 will decrease even more. Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01IPv4: early demux can return an error codePaolo Abeni
Currently no error is emitted, but this infrastructure will used by the next patch to allow source address validation for mcast sockets. Since early demux can do a route lookup and an ipv4 route lookup can return an error code this is consistent with the current ipv4 route infrastructure. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx pathXin Long
Now when updating mtu in tx path, it doesn't consider ARPHRD_ETHER tunnel device, like ip6gre_tap tunnel, for which it should also subtract ether header to get the correct mtu. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip6_gre: ip6gre_tap device should keep dstXin Long
The patch 'ip_gre: ipgre_tap device should keep dst' fixed a issue that ipgre_tap mtu couldn't be updated in tx path. The same fix is needed for ip6gre_tap as well. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01ip_gre: ipgre_tap device should keep dstXin Long
Without keeping dst, the tunnel will not update any mtu/pmtu info, since it does not have a dst on the skb. Reproducer: client(ipgre_tap1 - eth1) <-----> (eth1 - ipgre_tap1)server After reducing eth1's mtu on client, then perforamnce became 0. This patch is to netif_keep_dst in gre_tap_init, as ipgre does. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2017-09-29 This series contains updates to i40e and i40evf only. Jake provides several of the changes starting with the renaming of a variable to clarify what the value is actually calculating. Found we were misusing the __I40E_RECOVERY_PENDING bit to determine when we should actually request a new IRQ in i40e_setup_misc_vector(), which lead to a design mistake, so to resolve the issue, use a separate state bit for miscellaneous IRQ setup and fix up the design while we are at it. Cleaned up the old legacy PM support in the driver since we support the newer generic PM callbacks. Fixed a failure to hibernate issue, where on some platforms with a large number of CPUs, we would allocate many IRQ vectors which we would try to migrate to CPU0 when hibernating. Sudheer cleans up a check for unqualified module inside i40e_up_complete() because the link state information is in flux at time, so log messages are getting logged with incorrect link state information. Also provided additional log message cleanups and simplify member variable access in the printing of the link messages. Mariusz relaxes the firmware check since Fortville and Fort Park NICs can and do have different firmware versions, so only warn for older Fortville firmware. Fixed an errata with a flow director statistic that was not wrapping as expected, simply reset after reading. Mitch prevents consternation by lowering the log level to debug on a message seen regularly on VF reset or unload, which is meaningless under normal circumstances. Refactor the firmware version checking since Fortville and Fort Park devices can have different firmware versions. Alan fixes a ring to vector mapping, where the past implementation attempted to map each Tx and Rx ring to its own vector, however we use combined queues so we should be mapping the Tx/Rx rings together on one vector. Adds the ability for the VF to request a different number of queues allocated to it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30Merge tag 'mtd/fixes-for-4.14-rc3' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull mtd fixes from Boris Brezillon: - Fix partition alignment check in mtdcore.c - Fix a buffer overflow in the Atmel NAND driver * tag 'mtd/fixes-for-4.14-rc3' of git://git.infradead.org/linux-mtd: mtd: nand: atmel: fix buffer overflow in atmel_pmecc_user mtd: Fix partition alignment check on multi-erasesize devices
2017-09-30Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Eight mostly minor fixes for recently discovered issues in drivers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ILLEGAL REQUEST + ASC==27 => target failure scsi: aacraid: Add a small delay after IOP reset scsi: scsi_transport_fc: Also check for NOTPRESENT in fc_remote_port_add() scsi: scsi_transport_fc: set scsi_target_id upon rescan scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly scsi: aacraid: error: testing array offset 'bus' after use scsi: lpfc: Don't return internal MBXERR_ERROR code from probe function scsi: aacraid: Fix 2T+ drives on SmartIOC-2000
2017-09-30netlink: do not proceed if dump's start() errsJason A. Donenfeld
Drivers that use the start method for netlink dumping rely on dumpit not being called if start fails. For example, ila_xlat.c allocates memory and assigns it to cb->args[0] in its start() function. It might fail to do that and return -ENOMEM instead. However, even when returning an error, dumpit will be called, which, in the example above, quickly dereferences the memory in cb->args[0], which will OOPS the kernel. This is but one example of how this goes wrong. Since start() has always been a function with an int return type, it therefore makes sense to use it properly, rather than ignoring it. This patch thus returns early and does not call dumpit() when start() fails. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30batman-adv: Add argument names for function ptr definitionsSven Eckelmann
checkpatch started to report unnamed arguments in function pointer definitions. Add the corresponding names to these definitions to avoid this warning. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-09-30mkiss: remove redundant check on len being zeroColin Ian King
The check on len is redundant as it is always greater than 1, so just remove it and make the printk less complex. Detected by CoverityScan, CID#1226729 ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30net-ipv6: add support for sockopt(SOL_IPV6, IPV6_FREEBIND)Maciej Żenczykowski
So far we've been relying on sockopt(SOL_IP, IP_FREEBIND) being usable even on IPv6 sockets. However, it turns out it is perfectly reasonable to want to set freebind on an AF_INET6 SOCK_RAW socket - but there is no way to set any SOL_IP socket option on such a socket (they're all blindly errored out). One use case for this is to allow spoofing src ip on a raw socket via sendmsg cmsg. Tested: built, and booted # python >>> import socket >>> SOL_IP = socket.SOL_IP >>> SOL_IPV6 = socket.IPPROTO_IPV6 >>> IP_FREEBIND = 15 >>> IPV6_FREEBIND = 78 >>> s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0) >>> s.getsockopt(SOL_IP, IP_FREEBIND) 0 >>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND) 0 >>> s.setsockopt(SOL_IPV6, IPV6_FREEBIND, 1) >>> s.getsockopt(SOL_IP, IP_FREEBIND) 1 >>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND) 1 Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30net: ipv6: send NS for DAD when link operationally upMike Manning
The NS for DAD are sent on admin up as long as a valid qdisc is found. A race condition exists by which these packets will not egress the interface if the operational state of the lower device is not yet up. The solution is to delay DAD until the link is operationally up according to RFC2863. Rather than only doing this, follow the existing code checks by deferring IPv6 device initialization altogether. The fix allows DAD on devices like tunnels that are controlled by userspace control plane. The fix has no impact on regular deployments, but means that there is no IPv6 connectivity until the port has been opened in the case of port-based network access control, which should be desirable. Signed-off-by: Mike Manning <mmanning@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29Merge tag 'platform-drivers-x86-v4.14-2' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform drivers fix from Darren Hart: "Newly discovered species of fujitsu laptops break some assumptions about ACPI device pairings. fujitsu-laptop: Don't oops when FUJ02E3 is not present" * tag 'platform-drivers-x86-v4.14-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: fujitsu-laptop: Don't oops when FUJ02E3 is not presnt
2017-09-29Merge tag 'led_fixes-4.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED fixes from Jacek Anaszewski: "Four fixes for the as3645a LED flash controller and one update to MAINTAINERS" * tag 'led_fixes-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: MAINTAINERS: Add entry for MediaTek PMIC LED driver as3645a: Unregister indicator LED on device unbind as3645a: Use integer numbers for parsing LEDs dt: bindings: as3645a: Use LED number to refer to LEDs as3645a: Use ams,input-max-microamp as documented in DT bindings
2017-09-29Merge tag 'v4.14-rockchip-clkfixes-1' of ↵Stephen Boyd
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into clk-fixes Pull Rockchip clk driver fixes from Heiko Stuebner: Some smallish fixes for the rk3128 clock support including some register errors and some clocks that should be critical for safe usage. * tag 'v4.14-rockchip-clkfixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: clk: rockchip: add sclk_timer5 as critical clock on rk3128 clk: rockchip: fix up rk3128 pvtm and mipi_24m gate regs error clk: rockchip: add pclk_pmu as critical clock on rk3128
2017-09-29clk: Export clk_bulk_prepare()Bjorn Andersson
Allow clk_bulk_prepare() to be referenced by kernel modules by adding the missing EXPORT_SYMBOL_GPL(). Fixes: 266e4e9d9150 ("clk: add clk_bulk_get accessories") Reported-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2017-09-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull waitid fix from Al Viro: "Fix infoleak in waitid()" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix infoleak in waitid(2)
2017-09-29Merge branch 'for-4.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "We've collected a bunch of isolated fixes, for crashes, user-visible behaviour or missing bits from other subsystem cleanups from the past. The overall number is not small but I was not able to make it significantly smaller. Most of the patches are supposed to go to stable" * 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: log csums for all modified extents Btrfs: fix unexpected result when dio reading corrupted blocks btrfs: Report error on removing qgroup if del_qgroup_item fails Btrfs: skip checksum when reading compressed data if some IO have failed Btrfs: fix kernel oops while reading compressed data Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block Btrfs: do not backup tree roots when fsync btrfs: remove BTRFS_FS_QUOTA_DISABLING flag btrfs: propagate error to btrfs_cmp_data_prepare caller btrfs: prevent to set invalid default subvolid Btrfs: send: fix error number for unknown inode types btrfs: fix NULL pointer dereference from free_reloc_roots() btrfs: finish ordered extent cleaning if no progress is found btrfs: clear ordered flag on cleaning up ordered extents Btrfs: fix incorrect {node,sector}size endianness from BTRFS_IOC_FS_INFO Btrfs: do not reset bio->bi_ops while writing bio Btrfs: use the new helper wbc_to_write_flags
2017-09-29Merge tag 'md/4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds
Pull MD fixes from Shaohua Li: "A few fixes for MD. Mainly fix a problem introduced in 4.13, which we retry bio for some code paths but not all in some situations" * tag 'md/4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md/raid5: cap worker count dm-raid: fix a race condition in request handling md: fix a race condition for flush request handling md: separate request handling
2017-09-29i40e: refactor FW version checkingMitch Williams
The i40e driver now supports two different devices with two different firmware versions. So be smart about how we handle these. Move the FW version macros to the appropriate header file, and add a convenience macro that checks the version based on the device. Then use this macro to check whether or not the driver can use the new link info API. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e: Enable VF to negotiate number of allocated queuesAlan Brady
Currently the PF allocates a default number of queues for each VF and cannot be changed. This patch enables the VF to request a different number of queues allocated to it. This patch also adds a new virtchnl op and capability flag to facilitate this negotiation. After the PF receives a request message, it will set a requested number of queues for that VF. Then when the VF resets, its VSI will get a new number of queues allocated to it. This is a best effort request and since we only allocate a guaranteed default number, if the VF tries to ask for more than the guaranteed number, there may not be enough in HW to accommodate it unless other queues for other VFs are freed. It should also be noted decreasing the number queues allocated to a VF to below the default will NOT enable the allocation of more than 32 VFs per PF and will not free queues guaranteed to each VF by default. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40evf: fix ring to vector mappingAlan Brady
The current implementation for mapping queues to vectors is broken because it attempts to map each Tx and Rx ring to its own vector, however we use combined queues so we should actually be mapping the Tx/Rx rings together on one vector. Also in the current implementation, in the case where we have more queues than vectors, we attempt to group the queues together into 'chunks' and map each 'chunk' of queues to a vector. Chunking them together would be more ideal if, and only if, we only had RSS because of the way the hashing algorithm works but in the case of a future patch that enables VF ADq, round robin assignment is better and still works with RSS. This patch resolves both those issues and simplifies the code needed to accomplish this. Instead of treating the case where we have more queues than vectors as special, if we notice our vector index is greater than vectors, reset the vector index to zero and continue mapping. This should ensure that in both cases, whether we have enough vectors for each queue or not, the queues get appropriately mapped. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e: shutdown all IRQs and disable MSI-X when suspendedJacob Keller
On some platforms with a large number of CPUs, we will allocate many IRQ vectors. When hibernating, the system will attempt to migrate all of the vectors back to CPU0 when shutting down all the other CPUs. It is possible that we have so many vectors that it cannot re-assign them to CPU0. This is even more likely if we have many devices installed in one platform. The end result is failure to hibernate, as it is not possible to shutdown the CPUs. We can avoid this by disabling MSI-X and clearing our interrupt scheme when the device is suspended. A more ideal solution would be some method for the stack to properly handle this for all drivers, rather than on a case-by-case basis for each driver to fix itself. However, until this more ideal solution exists, we can do our part and shutdown our IRQs during suspend, which should allow systems with a large number of CPUs to safely suspend or hibernate. It may be worth investigating if we should shut down even further when we suspend as it may make the path cleaner, but this was the minimum fix for the hibernation issue mentioned here. Testing-hints: This affects systems with a large number of CPUs, and with multiple devices enabled. Without this change, those platforms are unable to hibernate at all. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e: prevent service task from running while we're suspendedJacob Keller
Although the service task does check the suspended status before running, it might already be part way through running when we go to suspend. Lets ensure that the service task is stopped and will not be restarted again until we finish resuming. This ensures that service task code does not cause strange interactions with the suspend/resume handlers. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e: don't clear suspended state until we finish resumingJacob Keller
When handling suspend and resume callbacks we want to make sure that (a) we don't suspend again if we're already suspended and (b) we don't resume again if we're already resuming. Lets make sure we test_and_set the __I40E_SUSPENDED bit in i40e_suspend which ensures that a suspend call when already suspended will exit early. Additionally, if __I40E_SUSPENDED is not set when we begin resuming, exit early as well. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e: use newer generic PM support instead of legacy PM callbacksJacob Keller
Stop using the old legacy PM support, since we now have stable support for the newer generic PM callbacks. This has several advantages. First, we no longer have to manage our own pci_save_state() and power changes, as it's preferred to have the PCI stack do this. Second, these routines get called for both hibernate and suspend to ram, so we can have the driver properly handle all the suspend/resume flows that it needs to. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e: use separate state bit for miscellaneous IRQ setupJacob Keller
We currently (mis)use the __I40E_RECOVERY_PENDING bit to determine when we should actually request a new IRQ in i40e_setup_misc_vector(). This led to a design mistake where we open-coded the re-setup of the miscellaneous vector in i40e_resume() instead of using the function provided. If we did not open-code this and instead tried to use the i40e_setup_misc_vector() function, it would lead to never reallocating the IRQ. This would lead to a second i40e_suspend() call failing to free the vector due to a NULL pointer dereference. A future patch is going to re-work how the i40e_suspend() and i40e_resume() flows work to clear all IRQ vectors, which would require us to use i40e_setup_misc_vector() directly. Since during this time the __I40E_RECOVERY_PENDING bit is set, we'll never re-allocate the vector. Rather than leaving the open-coded setup in i40e_resume() lets just fix the problem properly in i40e_setup_misc_vector(). Introduce a new state bit which indicates when the IRQ has been assigned, which will be set when i40e_setup_misc_vector is first called. This ultimately resolves the issue of re-requesting the vector, without overloading the __I40E_RECOVERY_PENDING state. This ensures that the suspend/resume cycle can use the setup function instead of open-coding the re-request during resume. Additionally, since the only callers of i40e_stop_misc_vector also want to free it, move this code directly into the function to avoid duplication. Due to the new functionality, rename it to i40e_free_misc_vector(). This lets us drop the extra calls to free and re-enable the vector during i40e_suspend() and i40e_resume(). We don't need to call i40e_setup_misc_Vector() in i40e_resume() because it gets called by the i40e_rebuild() call. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40evf: lower message levelMitch Williams
We see this message regularly on VF reset or unload (which invokes a reset). It's essentially meaningless unless it's happening constantly. To prevent consternation, lower the log level to debug so it's not seen under normal circumstance. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e: fix for flow director counters not wrapping as expectedMariusz Stachura
An errata with GLQF_PCNT causes it to not wrap as expected. This can cause an error in flow director statistics. This patch resets affected counters just after reading. Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e: relax warning message in case of version mismatchMariusz Stachura
Fortville and Fort Park devices are often on different firmware release schedules. This change relaxes the minor version warning message, so it is only displayed for older FW warning version for old firmware Fortville 3 or earlier. Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e: simplify member variable accessesSudheer Mogilappagari
This commit replaces usage of vsi->back in i40e_print_link_message() (which is actually a PF pointer) with temp variable. Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e: Fix link down message when interface is brought upSudheer Mogilappagari
i40e_print_link_message() is intended to compare new link state with current link state and print log message only if the new state is different from current state. However in current driver the new state does not get updated when link is going down because of the if condition. When an interface is brought down, vsi->state is set to I40E_VSI_DOWN in i40e_vsi_close() and later i40e_print_link_message() does not get invoked in i40e_link_event due to if condition. Hence link down message doesn't appear when link is going down. The down state is seen later during i40e_open() and old state gets printed. The actual link state doesn't get updated in i40e_close() or i40e_open() but when i40e_handle_link_event is called inside i40e_clean_adminq_subtask. This change allows i40e_print_link_message() to be called when interface is going down and keeps the state information updated. Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e: Fix unqualified module message while bringing link upSudheer Mogilappagari
In current driver, when ifconfig ethx up is done, the link state doesn't transition to UP inside i40e_open(). It changes after AQ command response is handled in i40e_handle_link_event(). When pf->hw.phy.link_info.link_info is DOWN inside i40e_open(), The state is transient and invalid. So log message gets printed based on incorrect info (i.e link_info and an_info). This commit removes check for unqualified module inside i40e_up_complete(). The existing check in i40e_handle_link_event() logs the error message based on correct link state information. Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29i40e/i40evf: rename bytes_per_int to bytes_per_usecJacob Keller
This value is not calculating bytes_per_int, which would actually just be bytes/ITR_COUNTDOWN_START, but rather it's calculating bytes/usecs. Rename the variable for clarity so that future developers understand what the value is actually calculating. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29Merge tag 'pci-v4.14-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - fix CONFIG_PCI=n build error (introduced in v4.14-rc1) (Geert Uytterhoeven) - fix a race in sysfs driver_override store/show (Nicolai Stange) * tag 'pci-v4.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Fix race condition with driver_override PCI: Add dummy pci_acs_enabled() for CONFIG_PCI=n build
2017-09-29Merge tag 'drm-fixes-for-v4.14-rc3' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Regular fixes pull, some amdkfd, amdgpu, etnaviv, sun4i, qxl, tegra fixes. I've got an outstanding pull for i915 but it wasn't on an rc2 base so I wanted to ship these out first, I might get to it before rc3 or I might not" * tag 'drm-fixes-for-v4.14-rc3' of git://people.freedesktop.org/~airlied/linux: drm/tegra: trace: Fix path to include qxl: fix framebuffer unpinning drm/sun4i: cec: Enable back CEC-pin framework drm/amdkfd: Print event limit messages only once per process drm/amdkfd: Fix kernel-queue wrapping bugs drm/amdkfd: Fix incorrect destroy_mqd parameter drm/radeon: disable hard reset in hibernate for APUs drm/amdgpu: revert tile table update for oland etnaviv: fix gem object list corruption etnaviv: fix submit error path qxl: fix primary surface handling drm/amdkfd: check for null dev to avoid a null pointer dereference
2017-09-29Merge tag 'iommu-fixes-v4.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: - A comment fix for 'struct iommu_ops' - Format string fixes for AMD IOMMU, unfortunatly I missed that during review. - Limit mediatek physical addresses to 32 bit for v7s to fix a warning triggered in io-page-table code. - Fix dma-sync in io-pgtable-arm-v7s code * tag 'iommu-fixes-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu: Fix comment for iommu_ops.map_sg iommu/amd: pr_err() strings should end with newlines iommu/mediatek: Limit the physical address in 32bit for v7s iommu/io-pgtable-arm-v7s: Need dma-sync while there is no QUIRK_NO_DMA
2017-09-29Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - SPsel register initialisation on reset as the architecture defines its state as unknown - Use READ_ONCE when dereferencing pmd_t pointers to avoid race conditions in page_vma_mapped_walk() (or fast GUP) with concurrent modifications of the page table - Avoid invoking the mm fault handling code for kernel addresses (check against TASK_SIZE) which would otherwise result in calling might_sleep() in atomic context * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: fault: Route pte translation faults via do_translation_fault arm64: mm: Use READ_ONCE when dereferencing pointer to pte table arm64: Make sure SPsel is always set
2017-09-29Merge tag 'for-linus-4.14c-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - avoid a warning when compiling with clang - consider read-only bits in xen-pciback when writing to a BAR - fix a boot crash of pv-domains * tag 'for-linus-4.14c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/mmu: Call xen_cleanhighmap() with 4MB aligned for page tables mapping xen-pciback: relax BAR sizing write value check x86/xen: clean up clang build warning
2017-09-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Mixed bugfixes. Perhaps the most interesting one is a latent bug that was finally triggered by PCID support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm/x86: Handle async PF in RCU read-side critical sections KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume KVM: VMX: use cmpxchg64 KVM: VMX: simplify and fix vmx_vcpu_pi_load KVM: VMX: avoid double list add with VT-d posted interrupts KVM: VMX: extract __pi_post_block KVM: PPC: Book3S HV: Check for updated HDSISR on P9 HDSI exception KVM: nVMX: fix HOST_CR3/HOST_CR4 cache
2017-09-29fix infoleak in waitid(2)Al Viro
kernel_waitid() can return a PID, an error or 0. rusage is filled in the first case and waitid(2) rusage should've been copied out exactly in that case, *not* whenever kernel_waitid() has not returned an error. Compat variant shares that braino; none of kernel_wait4() callers do, so the below ought to fix it. Reported-and-tested-by: Alexander Potapenko <glider@google.com> Fixes: ce72a16fa705 ("wait4(2)/waitid(2): separate copying rusage to userland") Cc: stable@vger.kernel.org # v4.13 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-29x86/asm: Use register variable to get stack pointer valueAndrey Ryabinin
Currently we use current_stack_pointer() function to get the value of the stack pointer register. Since commit: f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang") ... we have a stack register variable declared. It can be used instead of current_stack_pointer() function which allows to optimize away some excessive "mov %rsp, %<dst>" instructions: -mov %rsp,%rdx -sub %rdx,%rax -cmp $0x3fff,%rax -ja ffffffff810722fd <ist_begin_non_atomic+0x2d> +sub %rsp,%rax +cmp $0x3fff,%rax +ja ffffffff810722fa <ist_begin_non_atomic+0x2a> Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer and use it instead of the removed function. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29x86/mm: Disable branch profiling in mem_encrypt.cTom Lendacky
Some routines in mem_encrypt.c are called very early in the boot process, e.g. sme_encrypt_kernel(). When CONFIG_TRACE_BRANCH_PROFILING=y is defined the resulting branch profiling associated with the check to see if SME is active results in a kernel crash. Disable branch profiling for mem_encrypt.c by defining DISABLE_BRANCH_PROFILING before including any header files. Reported-by: kernel test robot <lkp@01.org> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170929162419.6016.53390.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29Merge tag 'perf-urgent-for-mingo-4.14-20170928' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fix syscalltbl build failure (Akemi Yagi) - Fix attr.exclude_kernel setting for default cycles:p, this time for !root with kernel.perf_event_paranoid = -1 (Arnaldo Carvalho de Melo) - Sync kernel ABI headers with tooling headers (Ingo Molnar) - Remove misleading debug messages with --call-graph option (Mengting Zhang) - Revert vmlinux symbol resolution patches for s390x (Thomas Richter) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29Merge branch 'fixes-v4.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull keys fixes from James Morris: "Notable here is a rewrite of big_key crypto by Jason Donenfeld to address some issues in the original code. From Jason's commit log: "This started out as just replacing the use of crypto/rng with get_random_bytes_wait, so that we wouldn't use bad randomness at boot time. But, upon looking further, it appears that there were even deeper underlying cryptographic problems, and that this seems to have been committed with very little crypto review. So, I rewrote the whole thing, trying to keep to the conventions introduced by the previous author, to fix these cryptographic flaws." There has been positive review of the new code by Eric Biggers and Herbert Xu, and it passes basic testing via the keyutils test suite. Eric also manually tested it. Generally speaking, we likely need to improve the amount of crypto review for kernel crypto users including keys (I'll post a note separately to ksummit-discuss)" * 'fixes-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security/keys: rewrite all of big_key crypto security/keys: properly zero out sensitive key material in big_key KEYS: use kmemdup() in request_key_auth_new() KEYS: restrict /proc/keys by credentials at open time KEYS: reset parent each time before searching key_user_tree KEYS: prevent KEYCTL_READ on negative key KEYS: prevent creating a different user's keyrings KEYS: fix writing past end of user-supplied buffer in keyring_read() KEYS: fix key refcount leak in keyctl_read_key() KEYS: fix key refcount leak in keyctl_assume_authority() KEYS: don't revoke uninstantiated key in request_key_auth_new() KEYS: fix cred refcount leak in request_key_auth_new()
2017-09-29arm64: fault: Route pte translation faults via do_translation_faultWill Deacon
We currently route pte translation faults via do_page_fault, which elides the address check against TASK_SIZE before invoking the mm fault handling code. However, this can cause issues with the path walking code in conjunction with our word-at-a-time implementation because load_unaligned_zeropad can end up faulting in kernel space if it reads across a page boundary and runs into a page fault (e.g. by attempting to read from a guard region). In the case of such a fault, load_unaligned_zeropad has registered a fixup to shift the valid data and pad with zeroes, however the abort is reported as a level 3 translation fault and we dispatch it straight to do_page_fault, despite it being a kernel address. This results in calling a sleeping function from atomic context: BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313 in_atomic(): 0, irqs_disabled(): 0, pid: 10290 Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [...] [<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144 [<ffffff8e016cd158>] __might_sleep+0x7c/0x8c [<ffffff8e016977f0>] do_page_fault+0x140/0x330 [<ffffff8e01681328>] do_mem_abort+0x54/0xb0 Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0) [...] [<ffffff8e016844fc>] el1_da+0x18/0x78 [<ffffff8e017f399c>] path_parentat+0x44/0x88 [<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8 [<ffffff8e017f5044>] filename_create+0x4c/0x128 [<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8 [<ffffff8e01684e30>] el0_svc_naked+0x24/0x28 Code: 36380080 d5384100 f9400800 9402566d (d4210000) ---[ end trace 2d01889f2bca9b9f ]--- Fix this by dispatching all translation faults to do_translation_faults, which avoids invoking the page fault logic for faults on kernel addresses. Cc: <stable@vger.kernel.org> Reported-by: Ankit Jain <ankijain@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-29arm64: mm: Use READ_ONCE when dereferencing pointer to pte tableWill Deacon
On kernels built with support for transparent huge pages, different CPUs can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk and they must take care to use READ_ONCE to avoid value tearing or caching of stale values by the compiler. Unfortunately, these functions call into our pgtable macros, which don't use READ_ONCE, and compiler caching has been observed to cause the following crash during ext4 writeback: PC is at check_pte+0x20/0x170 LR is at page_vma_mapped_walk+0x2e0/0x540 [...] Process doio (pid: 2463, stack limit = 0xffff00000f2e8000) Call trace: [<ffff000008233328>] check_pte+0x20/0x170 [<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540 [<ffff000008234adc>] page_mkclean_one+0xac/0x278 [<ffff000008234d98>] rmap_walk_file+0xf0/0x238 [<ffff000008236e74>] rmap_walk+0x64/0xa0 [<ffff0000082370c8>] page_mkclean+0x90/0xa8 [<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8 [<ffff00000832f984>] mpage_submit_page+0x34/0x98 [<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170 [<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8 [<ffff00000833530c>] ext4_writepages+0x484/0xe30 [<ffff0000081f6ab4>] do_writepages+0x44/0xe8 [<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110 [<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8 [<ffff000008324310>] ext4_sync_file+0x80/0x4b8 [<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0 [<ffff0000082332b4>] SyS_msync+0x194/0x1e8 This is because page_vma_mapped_walk loads the PMD twice before calling pte_offset_map: the first time without READ_ONCE (where it gets all zeroes due to a concurrent pmdp_invalidate) and the second time with READ_ONCE (where it sees a valid table pointer due to a concurrent pmd_populate). However, the compiler inlines everything and caches the first value in a register, which is subsequently used in pte_offset_phys which returns a junk pointer that is later dereferenced when attempting to access the relevant pte. This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure that a stale value is not used. Whilst this is a point fix for a known failure (and simple to backport), a full fix moving all of our page table accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in page_vma_mapped_walk is in the works for a future kernel release. Cc: Jon Masters <jcm@redhat.com> Cc: Timur Tabi <timur@codeaurora.org> Cc: <stable@vger.kernel.org> Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()") Tested-by: Richard Ruigrok <rruigrok@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-29RDMA/iwpm: Properly mark end of NL messagesShiraz Saleem
Commit 1a1c116f3dcf removes nlmsg_len calculation in ibnl_put_attr causing netlink messages to be rejected due to incorrect length. Add nlmsg_end after all attributes are appended to calculate the nlmsg_len. Fixes: 1a1c116f3dcf ("RDMA/netlink: Simplify the put_msg and put_attr") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>