summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-06-14CFI: Move function_nocfi() into compiler.hMark Rutland
Currently the common definition of function_nocfi() is provided by <linux/mm.h>, and architectures are expected to provide a definition in <asm/memory.h>. Due to header dependencies, this can make it hard to use function_nocfi() in low-level headers. As function_nocfi() has no dependency on any mm code, nor on any memory definitions, it doesn't need to live in <linux/mm.h> or <asm/memory.h>. Generally, it would make more sense for it to live in <linux/compiler.h>, where an architecture can override it in <asm/compiler.h>. Move the definitions accordingly. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210602153701.35957-1-mark.rutland@arm.com
2021-06-14Revert "cpufreq: CPPC: Add support for frequency invariance"Viresh Kumar
This reverts commit 4c38f2df71c8e33c0b64865992d693f5022eeaad. There are few races in the frequency invariance support for CPPC driver, namely the driver doesn't stop the kthread_work and irq_work on policy exit during suspend/resume or CPU hotplug. A proper fix won't be possible for the 5.13-rc, as it requires a lot of changes. Lets revert the patch instead for now. Fixes: 4c38f2df71c8 ("cpufreq: CPPC: Add support for frequency invariance") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-14mmc: Improve function name when aborting a tuning cmdWolfram Sang
'mmc_abort_tuning()' made me think tuning gets completely aborted. However, it sends only a STOP cmd to cancel the current tuning cmd. Tuning process may still continue after that. So, rename the function to 'mmc_send_abort_tuning()' to better reflect all this. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/20210608180620.40059-1-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-06-14mmc: core: Add support for cache ctrl for SD cardsUlf Hansson
In SD spec v6.x the SD function extension registers for performance enhancements were introduced. As a part of this an optional internal cache on the SD card, can be used to improve performance. The let the SD card use the cache, the host needs to enable it and manage flushing of the cache, so let's add support for this. Note that for an SD card supporting the cache it's mandatory for it, to also support the poweroff notification feature. According to the SD spec, if the cache has been enabled and a poweroff notification is sent to the card, that implicitly also means that the card should flush its internal cache. Therefore, dealing with cache flushing for REQ_OP_FLUSH block requests is sufficient. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Avri Altman <avri.altman@wdc.com> Link: https://lore.kernel.org/r/20210511101359.83521-1-ulf.hansson@linaro.org
2021-06-14mmc: core: Add support for Power Off Notification for SD cardsUlf Hansson
Rather than only deselecting the SD card via a CMD7, before we cut power to it at system suspend, at runtime suspend or at shutdown, let's add support for a graceful power off sequence via enabling the SD Power Off Notification feature. Note that, the Power Off Notification feature was added in the SD spec v4.x, which is several years ago. However, it's still a bit unclear how often the SD card vendors decides to implement support for it. To validate these changes a Sandisk Extreme PRO A2 64GB has been used, which seems to work nicely. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://lore.kernel.org/r/20210504161222.101536-12-ulf.hansson@linaro.org
2021-06-14mmc: core: Read performance enhancements registers for SD cardsUlf Hansson
In SD spec v6.x the SD function extension registers for performance enhancements were introduced. These registers let the SD card announce supports for various performance related features, like "self-maintenance", "cache" and "command queuing". Let's extend the parsing of SD function extension registers and store the information in the struct mmc_card. This prepares for subsequent changes to implement the complete support for new the performance enhancement features. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://lore.kernel.org/r/20210504161222.101536-11-ulf.hansson@linaro.org
2021-06-14mmc: core: Read the SD function extension registers for power managementUlf Hansson
In the SD spec v4.0 the CMD48/49 and CMD58/59 were introduced as optional commands. In the SD spec v4.1 the SD function extension registers were introduced, which requires support for CMD48/49/58/59 to be read/written from/to. Moreover, a specific function extension register were added to let the card announce support for optional features in regards to power management. The features that were added are "Power Off Notification", "Power Down Mode" and "Power Sustenance". As a first step to support this, let's read and parse the register for power management during the SD card initialization and store the information about the supported features in the struct mmc_card. In this way, we prepare for subsequent changes to implement the complete support for the new features. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Acked-by: Avri Altman <avri.altman@wdc.com> Link: https://lore.kernel.org/r/20210504161222.101536-10-ulf.hansson@linaro.org
2021-06-14mmc: core: Parse the SD SCR register for support of CMD48/49 and CMD58/59Ulf Hansson
In SD spec v4.x the support for CMD48/49 and CMD58/59 were introduced as optional features. To let the card announce whether it supports the commands, the SCR register has been extended with corresponding support bits. Let's parse and store this information for later use. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Acked-by: Avri Altman <avri.altman@wdc.com> Link: https://lore.kernel.org/r/20210504161222.101536-9-ulf.hansson@linaro.org
2021-06-14Merge tag 'for-5.14-regulator' of ↵Mark Brown
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into regulator-5.14 regulator: Changes for v5.14-rc1 This adds regulator_sync_voltage_rdev(), which is used as a dependency for new Tegra power domain code.
2021-06-12mm: relocate 'write_protect_seq' in struct mm_structFeng Tang
0day robot reported a 9.2% regression for will-it-scale mmap1 test case[1], caused by commit 57efa1fe5957 ("mm/gup: prevent gup_fast from racing with COW during fork"). Further debug shows the regression is due to that commit changes the offset of hot fields 'mmap_lock' inside structure 'mm_struct', thus some cache alignment changes. From the perf data, the contention for 'mmap_lock' is very severe and takes around 95% cpu cycles, and it is a rw_semaphore struct rw_semaphore { atomic_long_t count; /* 8 bytes */ atomic_long_t owner; /* 8 bytes */ struct optimistic_spin_queue osq; /* spinner MCS lock */ ... Before commit 57efa1fe5957 adds the 'write_protect_seq', it happens to have a very optimal cache alignment layout, as Linus explained: "and before the addition of the 'write_protect_seq' field, the mmap_sem was at offset 120 in 'struct mm_struct'. Which meant that count and owner were in two different cachelines, and then when you have contention and spend time in rwsem_down_write_slowpath(), this is probably *exactly* the kind of layout you want. Because first the rwsem_write_trylock() will do a cmpxchg on the first cacheline (for the optimistic fast-path), and then in the case of contention, rwsem_down_write_slowpath() will just access the second cacheline. Which is probably just optimal for a load that spends a lot of time contended - new waiters touch that first cacheline, and then they queue themselves up on the second cacheline." After the commit, the rw_semaphore is at offset 128, which means the 'count' and 'owner' fields are now in the same cacheline, and causes more cache bouncing. Currently there are 3 "#ifdef CONFIG_XXX" before 'mmap_lock' which will affect its offset: CONFIG_MMU CONFIG_MEMBARRIER CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES The layout above is on 64 bits system with 0day's default kernel config (similar to RHEL-8.3's config), in which all these 3 options are 'y'. And the layout can vary with different kernel configs. Relayouting a structure is usually a double-edged sword, as sometimes it can helps one case, but hurt other cases. For this case, one solution is, as the newly added 'write_protect_seq' is a 4 bytes long seqcount_t (when CONFIG_DEBUG_LOCK_ALLOC=n), placing it into an existing 4 bytes hole in 'mm_struct' will not change other fields' alignment, while restoring the regression. Link: https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/ [1] Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-12net: qualcomm: rmnet: trailer value is a checksumAlex Elder
The csum_value field in the rmnet_map_dl_csum_trailer structure is a "real" Internet checksum. It is a 16 bit value, in big endian format, which represents an inverted ones' complement sum over pairs of bytes. Make that clear by changing its type to __sum16. This makes a typecast in rmnet_map_ipv4_dl_csum_trailer() and another in rmnet_map_ipv6_dl_csum_trailer() unnecessary. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12wwan: add interface creation supportJohannes Berg
Add support to create (and destroy) interfaces via a new rtnetlink kind "wwan". The responsible driver has to use the new wwan_register_ops() to make this possible. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12rtnetlink: add IFLA_PARENT_[DEV|DEV_BUS]_NAMEJohannes Berg
In some cases, for example in the upcoming WWAN framework changes, there's no natural "parent netdev", so sometimes dummy netdevs are created or similar. IFLA_PARENT_DEV_NAME is a new attribute intended to contain a device (sysfs, struct device) name that can be used instead when creating a new netdev, if the rtnetlink family implements it. As suggested by Parav Pandit, we also introduce IFLA_PARENT_DEV_BUS_NAME attribute in order to uniquely identify a device on the system (with bus/name pair). ip-link(8) support for the generic parent device attributes will help us avoid code duplication, so no other link type will require a custom code to handle the parent name attribute. E.g. the WWAN interface creation command will looks like this: $ ip link add wwan0-1 parent-dev wwan0 type wwan channel-id 1 So, some future subsystem (or driver) FOO will have an interface creation command that looks like this: $ ip link add foo1-3 parent-dev foo1 type foo bar-id 3 baz-type Y Below is an example of dumping link info of a random device with these new attributes: $ ip --details link show wlp0s20f3 4: wlp0s20f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT group default qlen 1000 ... parent_bus pci parent_dev 0000:00:14.3 Co-developed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Co-developed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Suggested-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12rtnetlink: add alloc() method to rtnl_link_opsJohannes Berg
In order to make rtnetlink ops that can create different kinds of devices, like what we want to add to the WWAN framework, the priv_size and setup parameters aren't quite sufficient. Make this easier to manage by allowing ops to allocate their own netdev via an @alloc method that gets the tb netlink data. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12net: make get_net_ns return error if NET_NS is disabledChangbin Du
There is a panic in socket ioctl cmd SIOCGSKNS when NET_NS is not enabled. The reason is that nsfs tries to access ns->ops but the proc_ns_operations is not implemented in this case. [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010 [7.670268] pgd = 32b54000 [7.670544] [00000010] *pgd=00000000 [7.671861] Internal error: Oops: 5 [#1] SMP ARM [7.672315] Modules linked in: [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16 [7.673309] Hardware name: Generic DT based system [7.673642] PC is at nsfs_evict+0x24/0x30 [7.674486] LR is at clear_inode+0x20/0x9c The same to tun SIOCGSKNS command. To fix this problem, we make get_net_ns() return -EINVAL when NET_NS is disabled. Meanwhile move it to right place net/core/net_namespace.c. Signed-off-by: Changbin Du <changbin.du@gmail.com> Fixes: c62cce2caee5 ("net: add an ioctl to get a socket network namespace") Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12net: phy: Add 25G BASE-R interface modeSteen Hegelund
Add 25gbase-r phy interface mode Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12Merge tag 'usb-5.13-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a number of tiny USB fixes for 5.13-rc6. There are more than I would normally like, but there's been a bunch of people banging on the gadget and dwc3 and typec code recently for I think an Android release, which has resulted in a number of small fixes. It's nice to see companies send fixes upstream for this type of work, a notable change from years ago. Anyway, fixes in here are: - usb-serial device id updates - usb-serial cp210x driver fixes for broken firmware versions - typec fixes for crazy charging devices and other reported problems - dwc3 fixes for reported problems found - gadget fixes for reported problems - tiny xhci fixes - other small fixes for reported issues. - revert of a problem fix found by linux-next testing All of these have passed 0-day and linux-next testing with no reported problems (the revert for the found linux-next build problem included)" * tag 'usb-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (44 commits) Revert "usb: gadget: fsl: Re-enable driver for ARM SoCs" usb: typec: mux: Fix copy-paste mistake in typec_mux_match usb: typec: ucsi: Clear PPM capability data in ucsi_init() error path usb: gadget: fsl: Re-enable driver for ARM SoCs usb: typec: wcove: Use LE to CPU conversion when accessing msg->header USB: serial: cp210x: fix CP2102N-A01 modem control USB: serial: cp210x: fix alternate function for CP2102N QFN20 usb: misc: brcmstb-usb-pinmap: check return value after calling platform_get_resource() usb: dwc3: ep0: fix NULL pointer exception usb: gadget: eem: fix wrong eem header operation usb: typec: intel_pmc_mux: Put ACPI device using acpi_dev_put() usb: typec: intel_pmc_mux: Add missed error check for devm_ioremap_resource() usb: typec: intel_pmc_mux: Put fwnode in error case during ->probe() usb: typec: tcpm: Do not finish VDM AMS for retrying Responses usb: fix various gadget panics on 10gbps cabling usb: fix various gadgets null ptr deref on 10gbps cabling. usb: pci-quirks: disable D3cold on xhci suspend for s2idle on AMD Renoir usb: f_ncm: only first packet of aggregate needs to start timer USB: f_ncm: ncm_bitrate (speed) is unsigned MAINTAINERS: usb: add entry for isp1760 ...
2021-06-12Merge tag 'char-misc-5.13-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small misc driver fixes for 5.13-rc6 that fix some reported problems: - Tiny phy driver fixes for reported issues - rtsx regression for when the device suspended - mhi driver fix for a use-after-free All of these have been in linux-next for a few days with no reported issues" * tag 'char-misc-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: misc: rtsx: separate aspm mode into MODE_REG and MODE_CFG bus: mhi: pci-generic: Fix hibernation bus: mhi: pci_generic: Fix possible use-after-free in mhi_pci_remove() bus: mhi: pci_generic: T99W175: update channel name from AT to DUN phy: Sparx5 Eth SerDes: check return value after calling platform_get_resource() phy: ralink: phy-mt7621-pci: drop 'of_match_ptr' to fix -Wunused-const-variable phy: ti: Fix an error code in wiz_probe() phy: phy-mtk-tphy: Fix some resource leaks in mtk_phy_init() phy: cadence: Sierra: Fix error return code in cdns_sierra_phy_probe() phy: usb: Fix misuse of IS_ENABLED
2021-06-12Merge tag 'io_uring-5.13-2021-06-12' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Just an API change for the registration changes that went into this release. Better to get it sorted out now than before it's too late" * tag 'io_uring-5.13-2021-06-12' of git://git.kernel.dk/linux-block: io_uring: add feature flag for rsrc tags io_uring: change registration/upd/rsrc tagging ABI
2021-06-12Merge tag 'sched-urgent-2021-06-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes: - Fix performance regression caused by lack of intended batching of RCU callbacks by over-eager NOHZ-full code. - Fix cgroups related corruption of load_avg and load_sum metrics. - Three fixes to fix blocked load, util_sum/runnable_sum and util_est tracking bugs" * tag 'sched-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling sched/pelt: Ensure that *_sum is always synced with *_avg tick/nohz: Only check for RCU deferred wakeup on user/guest entry when needed sched/fair: Make sure to update tg contrib for blocked load sched/fair: Keep load_avg and load_sum synced
2021-06-11net: pcs: xpcs: export xpcs_do_config and xpcs_link_upVladimir Oltean
The sja1105 hardware has a quirk in that some changes require a switch reset, which loses all configuration. When the reset is initiated, everything needs to be reprogrammed, including the MACs and the PCS. This is currently done in sja1105_static_config_reload() - we manually call sja1105_adjust_port_config(), sja1105_sgmii_pcs_config() and sja1105_sgmii_pcs_force_speed() which are all internal functions. There is a desire for sja1105 to use the common xpcs driver, and that means that the equivalents of those functions, xpcs_do_config() and xpcs_link_up() respectively, will no longer be local functions. Forcing phylink to retrigger a resolve somehow, say by doing dev_close() followed by dev_open() is not really an option, because the CPU port might have a PCS as well, and there is no net device which we can close and reopen for that. Additionally, the dev_close/dev_open sequence might force a renegotiation of the copper-side link for SGMII ports connected to a PHY, and this is undesirable as well, because the switch reset is much quicker than a PHY autoneg, so we would have a lot more downtime. The only solution I see is for the sja1105 driver to keep doing what it's doing, and that means we need to export the equivalents from xpcs for sja1105_sgmii_pcs_config and sja1105_sgmii_pcs_force_speed, and call them directly in sja1105_static_config_reload(). This will be done during the conversion patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: pcs: xpcs: add support for NXP SJA1110Vladimir Oltean
The NXP SJA1110 switch integrates its own, non-Synopsys PMA, but it manages it through the register space of the XPCS itself, in a small register window inside MDIO_MMD_VEND2 from address 0x8030 to 0x806e. This coincides with where the registers for the default Synopsys PMA are, but the register definitions are of course not the same. This situation is an odd hardware quirk, but the simplest way to manage it is to drive the SJA1110's PMA from within the XPCS driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: pcs: xpcs: add support for NXP SJA1105Vladimir Oltean
The NXP SJA1105 DSA switch integrates a Synopsys SGMII XPCS on port 4. The generic code works fine, except there is an integration issue which needs to be dealt with: in this switch, the XPCS is integrated with a PMA that has the TX lane polarity inverted by default (PLUS is MINUS, MINUS is PLUS). To obtain normal non-inverted behavior, the TX lane polarity must be inverted in the PCS, via the DIGITAL_CONTROL_2 register. We introduce a pma_config() method in xpcs_compat which is called by the phylink_pcs_config() implementation. Also, the NXP SJA1105 returns all zeroes in the PHY ID registers 2 and 3. We need to hack up an ad-hoc PHY ID (OUI is zero, device ID is 1) in order for the XPCS driver to recognize it. This PHY ID is added to the public include/linux/pcs/pcs-xpcs.h for that reason (for the sja1105 driver to be able to use it in a later patch). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: pcs: xpcs: rename mdio_xpcs_args to dw_xpcsVladimir Oltean
The struct mdio_xpcs_args is reminiscent of when a similarly named struct mdio_xpcs_ops existed. Now that that is removed, we can shorten the name to dw_xpcs (dw for DesignWare). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Jake Keller says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-06-11 Extend the ice driver to support basic PTP clock functionality for E810 devices. This includes some tangential work required to setup the sideband queue and driver shared parameters as well. This series only supports E810-based devices. This is because other devices based on the E822 MAC use a different and more complex PHY. The low level device functionality is kept within ice_ptp_hw.c and is designed to be extensible for supporting E822 devices in a future series. This series also only supports very basic functionality including the ptp_clock device and timestamping. Support for configuring periodic outputs and external input timestamps will be implemented in a future series. There are a couple of potential "what? why?" bits in this series I want to point out: 1) the PTP hardware functionality is shared between multiple functions. This means that the same clock registers are shared across multiple PFs. In order to avoid contention or clashing between PFs, firmware assigns "ownership" to one PF, while other PFs are merely "associated" with the timer. Because we share the hardware resource, only the clock owner will allocate and register a PTP clock device. Other PFs determine the appropriate PTP clock index to report by using a firmware interface to read a shared parameter that is set by the owning PF. 2) the ice driver uses its own kthread instead of using do_aux_work. This is because the periodic and asynchronous tasks are necessary for all PFs, but only one PF will allocate the clock. The series is broken up into functional pieces to allow easy review. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11virtio/vsock: update trace event for SEQPACKETArseny Krasnov
Add SEQPACKET socket type to vsock trace event. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11virtio/vsock: rest of SOCK_SEQPACKET supportArseny Krasnov
Small updates to make SOCK_SEQPACKET work: 1) Send SHUTDOWN on socket close for SEQPACKET type. 2) Set SEQPACKET packet type during send. 3) Set 'VIRTIO_VSOCK_SEQ_EOR' bit in flags for last packet of message. 4) Implement data check function for SEQPACKET. 5) Check for max datagram size. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11virtio/vsock: dequeue callback for SOCK_SEQPACKETArseny Krasnov
Callback fetches RW packets from rx queue of socket until whole record is copied(if user's buffer is full, user is not woken up). This is done to not stall sender, because if we wake up user and it leaves syscall, nobody will send credit update for rest of record, and sender will wait for next enter of read syscall at receiver's side. So if user buffer is full, we just send credit update and drop data. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11virtio/vsock: defines and constants for SEQPACKETArseny Krasnov
Add set of defines and constants for SOCK_SEQPACKET support in vsock. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: rest of SEQPACKET supportArseny Krasnov
Add socket ops for SEQPACKET type and .seqpacket_allow() callback to query transports if they support SEQPACKET. Also split path for data check for STREAM and SEQPACKET branches. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: implement send logic for SEQPACKETArseny Krasnov
Update current stream enqueue function for SEQPACKET support: 1) Call transport's seqpacket enqueue callback. 2) Return value from enqueue function is whole record length or error for SOCK_SEQPACKET. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: implement SEQPACKET receive loopArseny Krasnov
Add receive loop for SEQPACKET. It looks like receive loop for STREAM, but there are differences: 1) It doesn't call notify callbacks. 2) It doesn't care about 'SO_SNDLOWAT' and 'SO_RCVLOWAT' values, because there is no sense for these values in SEQPACKET case. 3) It waits until whole record is received. 4) It processes and sets 'MSG_TRUNC' flag. So to avoid extra conditions for two types of socket inside one loop, two independent functions were created. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: phylink: introduce phylink_fwnode_phy_connect()Calvin Johnson
Define phylink_fwnode_phy_connect() to connect phy specified by a fwnode to a phylink instance. Signed-off-by: Calvin Johnson <calvin.johnson@oss.nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Grant Likely <grant.likely@arm.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: mdio: Add ACPI support code for mdioCalvin Johnson
Define acpi_mdiobus_register() to Register mii_bus and create PHYs for each ACPI child node. Signed-off-by: Calvin Johnson <calvin.johnson@oss.nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Grant Likely <grant.likely@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11ACPI: utils: Introduce acpi_get_local_address()Calvin Johnson
Introduce a wrapper around the _ADR evaluation. Signed-off-by: Calvin Johnson <calvin.johnson@oss.nxp.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Grant Likely <grant.likely@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: mdiobus: Introduce fwnode_mdiobus_register_phy()Calvin Johnson
Introduce fwnode_mdiobus_register_phy() to register PHYs on the mdiobus. From the compatible string, identify whether the PHY is c45 and based on this create a PHY device instance which is registered on the mdiobus. Along with fwnode_mdiobus_register_phy() also introduce fwnode_find_mii_timestamper() and fwnode_mdiobus_phy_device_register() since they are needed. While at it, also use the newly introduced fwnode operation in of_mdiobus_phy_device_register(). Signed-off-by: Calvin Johnson <calvin.johnson@oss.nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Grant Likely <grant.likely@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: phy: Introduce fwnode_get_phy_id()Calvin Johnson
Extract phy_id from compatible string. This will be used by fwnode_mdiobus_register_phy() to create phy device using the phy_id. Signed-off-by: Calvin Johnson <calvin.johnson@oss.nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Grant Likely <grant.likely@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: phy: Introduce phy related fwnode functionsCalvin Johnson
Define fwnode_phy_find_device() to iterate an mdiobus and find the phy device of the provided phy fwnode. Additionally define device_phy_find_device() to find phy device of provided device. Define fwnode_get_phy_node() to get phy_node using named reference. Signed-off-by: Calvin Johnson <calvin.johnson@oss.nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Grant Likely <grant.likely@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: phy: Introduce fwnode_mdio_find_device()Calvin Johnson
Define fwnode_mdio_find_device() to get a pointer to the mdio_device from fwnode passed to the function. Refactor of_mdio_find_device() to use fwnode_mdio_find_device(). Signed-off-by: Calvin Johnson <calvin.johnson@oss.nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Grant Likely <grant.likely@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: dsa: sja1105: implement TX timestamping for SJA1110Vladimir Oltean
The TX timestamping procedure for SJA1105 is a bit unconventional because the transmit procedure itself is unconventional. Control packets (and therefore PTP as well) are transmitted to a specific port in SJA1105 using "management routes" which must be written over SPI to the switch. These are one-shot rules that match by destination MAC address on traffic coming from the CPU port, and select the precise destination port for that packet. So to transmit a packet from NET_TX softirq context, we actually need to defer to a process context so that we can perform that SPI write before we send the packet. The DSA master dev_queue_xmit() runs in process context, and we poll until the switch confirms it took the TX timestamp, then we annotate the skb clone with that TX timestamp. This is why the sja1105 driver does not need an skb queue for TX timestamping. But the SJA1110 is a bit (not much!) more conventional, and you can request 2-step TX timestamping through the DSA header, as well as give the switch a cookie (timestamp ID) which it will give back to you when it has the timestamp. So now we do need a queue for keeping the skb clones until their TX timestamps become available. The interesting part is that the metadata frames from SJA1105 haven't disappeared completely. On SJA1105 they were used as follow-ups which contained RX timestamps, but on SJA1110 they are actually TX completion packets, which contain a variable (up to 32) array of timestamps. Why an array? Because: - not only is the TX timestamp on the egress port being communicated, but also the RX timestamp on the CPU port. Nice, but we don't care about that, so we ignore it. - because a packet could be multicast to multiple egress ports, each port takes its own timestamp, and the TX completion packet contains the individual timestamps on each port. This is unconventional because switches typically have a timestamping FIFO and raise an interrupt, but this one doesn't. So the tagger needs to detect and parse meta frames, and call into the main switch driver, which pairs the timestamps with the skbs in the TX timestamping queue which are waiting for one. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: dsa: add support for the SJA1110 native tagging protocolVladimir Oltean
The SJA1110 has improved a few things compared to SJA1105: - To send a control packet from the host port with SJA1105, one needed to program a one-shot "management route" over SPI. This is no longer true with SJA1110, you can actually send "in-band control extensions" in the packets sent by DSA, these are in fact DSA tags which contain the destination port and switch ID. - When receiving a control packet from the switch with SJA1105, the source port and switch ID were written in bytes 3 and 4 of the destination MAC address of the frame (which was a very poor shot at a DSA header). If the control packet also had an RX timestamp, that timestamp was sent in an actual follow-up packet, so there were reordering concerns on multi-core/multi-queue DSA masters, where the metadata frame with the RX timestamp might get processed before the actual packet to which that timestamp belonged (there is no way to pair a packet to its timestamp other than the order in which they were received). On SJA1110, this is no longer true, control packets have the source port, switch ID and timestamp all in the DSA tags. - Timestamps from the switch were partial: to get a 64-bit timestamp as required by PTP stacks, one would need to take the partial 24-bit or 32-bit timestamp from the packet, then read the current PTP time very quickly, and then patch in the high bits of the current PTP time into the captured partial timestamp, to reconstruct what the full 64-bit timestamp must have been. That is awful because packet processing is done in NAPI context, but reading the current PTP time is done over SPI and therefore needs sleepable context. But it also aggravated a few things: - Not only is there a DSA header in SJA1110, but there is a DSA trailer in fact, too. So DSA needs to be extended to support taggers which have both a header and a trailer. Very unconventional - my understanding is that the trailer exists because the timestamps couldn't be prepared in time for putting them in the header area. - Like SJA1105, not all packets sent to the CPU have the DSA tag added to them, only control packets do: * the ones which match the destination MAC filters/traps in MAC_FLTRES1 and MAC_FLTRES0 * the ones which match FDB entries which have TRAP or TAKETS bits set So we could in theory hack something up to request the switch to take timestamps for all packets that reach the CPU, and those would be DSA-tagged and contain the source port / switch ID by virtue of the fact that there needs to be a timestamp trailer provided. BUT: - The SJA1110 does not parse its own DSA tags in a way that is useful for routing in cross-chip topologies, a la Marvell. And the sja1105 driver already supports cross-chip bridging from the SJA1105 days. It does that by automatically setting up the DSA links as VLAN trunks which contain all the necessary tag_8021q RX VLANs that must be communicated between the switches that span the same bridge. So when using tag_8021q on sja1105, it is possible to have 2 switches with ports sw0p0, sw0p1, sw1p0, sw1p1, and 2 VLAN-unaware bridges br0 and br1, and br0 can take sw0p0 and sw1p0, and br1 can take sw0p1 and sw1p1, and forwarding will happen according to the expected rules of the Linux bridge. We like that, and we don't want that to go away, so as a matter of fact, the SJA1110 tagger still needs to support tag_8021q. So the sja1110 tagger is a hybrid between tag_8021q for data packets, and the native hardware support for control packets. On RX, packets have a 13-byte trailer if they contain an RX timestamp. That trailer is padded in such a way that its byte 8 (the start of the "residence time" field - not parsed by Linux because we don't care) is aligned on a 16 byte boundary. So the padding has a variable length between 0 and 15 bytes. The DSA header contains the offset of the beginning of the padding relative to the beginning of the frame (and the end of the padding is obviously the end of the packet minus 13 bytes, the length of the trailer). So we discard it. Packets which don't have a trailer contain the source port and switch ID information in the header (they are "trap-to-host" packets). Packets which have a trailer contain the source port and switch ID in the trailer. On TX, the destination port mask and switch ID is always in the trailer, so we always need to say in the header that a trailer is present. The header needs a custom EtherType and this was chosen as 0xdadc, after 0xdada which is for Marvell and 0xdadb which is for VLANs in VLAN-unaware mode on SJA1105 (and SJA1110 in fact too). Because we use tag_8021q in concert with the native tagging protocol, control packets will have 2 DSA tags. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: dsa: sja1105: make SJA1105_SKB_CB fit a full timestampVladimir Oltean
In SJA1105, RX timestamps for packets sent to the CPU are transmitted in separate follow-up packets (metadata frames). These contain partial timestamps (24 or 32 bits) which are kept in SJA1105_SKB_CB(skb)->meta_tstamp. Thankfully, SJA1110 improved that, and the RX timestamps are now transmitted in-band with the actual packet, in the timestamp trailer. The RX timestamps are now full-width 64 bits. Because we process the RX DSA tags in the rcv() method in the tagger, but we would like to preserve the DSA code structure in that we populate the skb timestamp in the port_rxtstamp() call which only happens later, the implication is that we must somehow pass the 64-bit timestamp from the rcv() method all the way to port_rxtstamp(). We can use the skb->cb for that. Rename the meta_tstamp from struct sja1105_skb_cb from "meta_tstamp" to "tstamp", and increase its size to 64 bits. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: dsa: tag_8021q: refactor RX VLAN parsing into a dedicated functionVladimir Oltean
The added value of this function is that it can deal with both the case where the VLAN header is in the skb head, as well as in the offload field. This is something I was not able to do using other functions in the network stack. Since both ocelot-8021q and sja1105 need to do the same stuff, let's make it a common service provided by tag_8021q. This is done as refactoring for the new SJA1110 tagger, which partly uses tag_8021q as well (just like SJA1105), and will be the third caller. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: dsa: tag_8021q: remove shim declarationsVladimir Oltean
All users of tag_8021q select it in Kconfig, so shim functions are not needed because it is not possible for it to be disabled and its callers enabled. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: dsa: generalize overhead for taggers that use both headers and trailersVladimir Oltean
Some really really weird switches just couldn't decide whether to use a normal or a tail tagger, so they just did both. This creates problems for DSA, because we only have the concept of an 'overhead' which can be applied to the headroom or to the tailroom of the skb (like for example during the central TX reallocation procedure), depending on the value of bool tail_tag, but not to both. We need to generalize DSA to cater for these odd switches by transforming the 'overhead / tail_tag' pair into 'needed_headroom / needed_tailroom'. The DSA master's MTU is increased to account for both. The flow dissector code is modified such that it only calls the DSA adjustment callback if the tagger has a non-zero header length. Taggers are trivially modified to declare either needed_headroom or needed_tailroom, based on the tail_tag value that they currently declare. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11blk-mq: remove blk_mq_init_sq_queueChristoph Hellwig
All users are gone now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210602065345.355274-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-11blk-mq: add the blk_mq_alloc_disk APIsChristoph Hellwig
Add a new API to allocate a gendisk including the request_queue for use with blk-mq based drivers. This is to avoid boilerplate code in drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210602065345.355274-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-11blk-mq: improve the blk_mq_init_allocated_queue interfaceChristoph Hellwig
Don't return the passed in request_queue but a normal error code, and drop the elevator_init argument in favor of just calling elevator_init_mq directly from dm-rq. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210602065345.355274-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-11blk-mq: factor out a blk_mq_alloc_sq_tag_set helperChristoph Hellwig
Factour out a helper to initialize a simple single hw queue tag_set from blk_mq_init_sq_queue. This will allow to phase out blk_mq_init_sq_queue in favor of a more symmetric and general API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210602065345.355274-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-11PM: domains: Drop/restore performance state votes for devices at runtime PMUlf Hansson
A subsystem/driver that need to manage OPPs for its device, should typically drop its vote for the OPP when the device becomes runtime suspended. In this way, the corresponding aggregation of the performance state votes that is managed in genpd for the attached PM domain, may find that the aggregated vote can be decreased. Hence, it may allow genpd to set the lower performance state for the PM domain, thus avoiding to waste energy. To accomplish this, typically a subsystem/driver would need to call dev_pm_opp_set_rate|opp() for its device from its ->runtime_suspend() callback, to drop the vote for the OPP. Accordingly, it needs another call to dev_pm_opp_set_rate|opp() to restore the vote for the OPP from its ->runtime_resume() callback. To avoid boilerplate code in subsystems/driver to deal with these things, let's instead manage this internally in genpd. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>