summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2019-08-28posix-cpu-timers: Utilize timerqueue for storageThomas Gleixner
Using a linear O(N) search for timer insertion affects execution time and D-cache footprint badly with a larger number of timers. Switch the storage to a timerqueue which is already used for hrtimers and alarmtimers. It does not affect the size of struct k_itimer as it.alarm is still larger. The extra list head for the expiry list will go away later once the expiry is moved into task work context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908272129220.1939@nanos.tec.linutronix.de
2019-08-28posix-cpu-timers: Move state tracking to struct posix_cputimersThomas Gleixner
Put it where it belongs and clean up the ifdeffery in fork completely. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190821192922.743229404@linutronix.de
2019-08-28posix-cpu-timers: Get rid of zero checksThomas Gleixner
Deactivation of the expiry cache is done by setting all clock caches to 0. That requires to have a check for zero in all places which update the expiry cache: if (cache == 0 || new < cache) cache = new; Use U64_MAX as the deactivated value, which allows to remove the zero checks when updating the cache and reduces it to the obvious check: if (new < cache) cache = new; This also removes the weird workaround in do_prlimit() which was required to convert a RLIMIT_CPU value of 0 (immediate expiry) to 1 because handing in 0 to the posix CPU timer code would have effectively disarmed it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192922.275086128@linutronix.de
2019-08-28posix-cpu-timers: Switch thread group sampling to arrayThomas Gleixner
That allows more simplifications in various places. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192921.988426956@linutronix.de
2019-08-28posix-cpu-timers: Restructure expiry arrayThomas Gleixner
Now that the abused struct task_cputime is gone, it's more natural to bundle the expiry cache and the list head of each clock into a struct and have an array of those structs. Follow the hrtimer naming convention of 'bases' and rename the expiry cache to 'nextevt' and adapt all usage sites. Generates also better code .text size shrinks by 80 bytes. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908262021140.1939@nanos.tec.linutronix.de
2019-08-28posix-cpu-timers: Remove cputime_expiresThomas Gleixner
The last users of the magic struct cputime based expiry cache are gone. Remove the leftovers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192921.790209622@linutronix.de
2019-08-28posix-cpu-timers: Remove the odd field rename definesThomas Gleixner
The last users of the odd define based renaming of struct task_cputime members are gone. Good riddance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192921.499058279@linutronix.de
2019-08-28posix-cpu-timers: Provide array based access to expiry cacheThomas Gleixner
Using struct task_cputime for the expiry cache is a pretty odd choice and comes with magic defines to rename the fields for usage in the expiry cache. struct task_cputime is basically a u64 array with 3 members, but it has distinct members. The expiry cache content is different than the content of task_cputime because expiry[PROF] = task_cputime.stime + task_cputime.utime expiry[VIRT] = task_cputime.utime expiry[SCHED] = task_cputime.sum_exec_runtime So there is no direct mapping between task_cputime and the expiry cache and the #define based remapping is just a horrible hack. Having the expiry cache array based allows further simplification of the expiry code. To avoid an all in one cleanup which is hard to review add a temporary anonymous union into struct task_cputime which allows array based access to it. That requires to reorder the members. Add a build time sanity check to validate that the members are at the same place. The union and the build time checks will be removed after conversion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192921.105793824@linutronix.de
2019-08-28posix-cpu-timers: Move expiry cache into struct posix_cputimersThomas Gleixner
The expiry cache belongs into the posix_cputimers container where the other cpu timers information is. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192921.014444012@linutronix.de
2019-08-28sched: Move struct task_cputime to types.hThomas Gleixner
For upcoming posix-timer changes to avoid include recursion hell. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190821192920.909530418@linutronix.de
2019-08-28posix-cpu-timers: Create a container structThomas Gleixner
Per task/process data of posix CPU timers is all over the place which makes the code hard to follow and requires ifdeffery. Create a container to hold all this information in one place, so data is consolidated and the ifdeffery can be confined to the posix timer header file and removed from places like fork. As a first step, move the cpu_timers list head array into the new struct and clean up the initializers and simplify fork. The remaining #ifdef in fork will be removed later. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192920.819418976@linutronix.de
2019-08-28posix-cpu-timers: Rename thread_group_cputimer() and make it staticThomas Gleixner
thread_group_cputimer() is a complete misnomer. The function does two things: - For arming process wide timers it makes sure that the atomic time storage is up to date. If no cpu timer is armed yet, then the atomic time storage is not updated by the scheduler for performance reasons. In that case a full summing up of all threads needs to be done and the update needs to be enabled. - Samples the current time into the caller supplied storage. Rename it to thread_group_start_cputime(), make it static and fixup the callsite. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192919.869350319@linutronix.de
2019-08-28posix-cpu-timers: Provide quick sample function for itimerThomas Gleixner
get_itimer() needs a sample of the current thread group cputime. It invokes thread_group_cputimer() - which is a misnomer. That function also starts eventually the group cputime accouting which is bogus because the accounting is already active when a timer is armed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192919.599658199@linutronix.de
2019-08-28mtd: spi-nor: Add a ->setup() methodTudor Ambarus
nor->params.setup() configures the SPI NOR memory. Useful for SPI NOR flashes that have peculiarities to the SPI NOR standard, e.g. different opcodes, specific address calculation, page size, etc. Right now the only user will be the S3AN chips, but other manufacturers can implement it if needed. Move spi_nor_setup() related code in order to avoid a forward declaration to spi_nor_default_setup(). Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
2019-08-28mtd: spi-nor: Add a ->convert_addr() methodBoris Brezillon
In order to separate manufacturer quirks from the core we need to get rid of all the manufacturer specific flags, like the SNOR_F_S3AN_ADDR_DEFAULT one. This can easily be replaced by a ->convert_addr() hook, which when implemented will provide the core with an easy way to convert an absolute address into something the flash understands. Right now the only user are the S3AN chips, but other manufacturers can implement it if needed. Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
2019-08-28mtd: spi-nor: Rework the SPI NOR lock/unlock logicBoris Brezillon
Add the SNOR_F_HAS_LOCK flag and set it when SPI_NOR_HAS_LOCK is set in the flash_info entry or when it's a Micron or ST flash. Move the locking hooks in a separate struct so that we have just one field to update when we change the locking implementation. Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> [tudor.ambarus@microchip.com: use ->default_init() hook, introduce spi_nor_late_init_params(), set ops in nor->params] Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
2019-08-28mtd: spi-nor: Create a ->set_4byte() methodBoris Brezillon
The procedure used to enable 4 byte addressing mode depends on the NOR device, so let's provide a hook so that manufacturer specific handling can be implemented in a sane way. Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> [tudor.ambarus@microchip.com: use nor->params.set_4byte() instead of nor->set_4byte()] Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
2019-08-28mtd: spi-nor: Move erase_map to 'struct spi_nor_flash_parameter'Tudor Ambarus
All flash parameters and settings should reside inside 'struct spi_nor_flash_parameter'. Move the SMPT parsed erase map from 'struct spi_nor' to 'struct spi_nor_flash_parameter'. Please note that there is a roll-back mechanism for the flash parameter and settings, for cases when SFDP parser fails. The SFDP parser receives a Stack allocated copy of nor->params, called sfdp_params, and uses it to retrieve the serial flash discoverable parameters. JESD216 SFDP is a standard and has a higher priority than the default initialized flash parameters, so will overwrite the sfdp_params data when needed. All SFDP code uses the local copy of nor->params, that will overwrite it in the end, if the parser succeds. Saving and restoring the nor->params.erase_map is no longer needed, since the SFDP code does not touch it. Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
2019-08-28mtd: spi-nor: Drop quad_enable() from 'struct spi-nor'Tudor Ambarus
All flash parameters and settings should reside inside 'struct spi_nor_flash_parameter'. Drop the local copy of quad_enable() and use the one from 'struct spi_nor_flash_parameter'. Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
2019-08-28mtd: spi-nor: Regroup flash parameter and settingsTudor Ambarus
The scope is to move all [FLASH-SPECIFIC] parameters and settings from 'struct spi_nor' to 'struct spi_nor_flash_parameter'. 'struct spi_nor_flash_parameter' describes the hardware capabilities and associated settings of the SPI NOR flash memory. It includes legacy flash parameters and settings that can be overwritten by the spi_nor_fixups hooks, or dynamically when parsing the JESD216 Serial Flash Discoverable Parameters (SFDP) tables. All SFDP params and settings will fit inside 'struct spi_nor_flash_parameter'. Move spi_nor_hwcaps related code to avoid forward declarations. Add a forward declaration that we can't avoid: 'struct spi_nor' will be used in 'struct spi_nor_flash_parameter'. Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
2019-08-28mtd: spi-nor: Remove unused macroTudor Ambarus
Remove leftover from nor->cmd_buf. Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
2019-08-28Merge tag 'v5.3-rc6' into spi-nor/nextTudor Ambarus
Linux 5.3-rc6 Merge back latest release candidate, to include a fix that we depend on for new development: 834de5c1aa76 ("mtd: spi-nor: Fix the disabling of write protection at init")
2019-08-28perf: Allow normal events to output AUX dataAlexander Shishkin
In some cases, ordinary (non-AUX) events can generate data for AUX events. For example, PEBS events can come out as records in the Intel PT stream instead of their usual DS records, if configured to do so. One requirement for such events is to consistently schedule together, to ensure that the data from the "AUX output" events isn't lost while their corresponding AUX event is not scheduled. We use grouping to provide this guarantee: an "AUX output" event can be added to a group where an AUX event is a group leader, and provided that the former supports writing to the latter. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: kan.liang@linux.intel.com Link: https://lkml.kernel.org/r/20190806084606.4021-2-alexander.shishkin@linux.intel.com
2019-08-28ACPI: cpufreq: Switch to QoS requests instead of cpufreq notifierViresh Kumar
The cpufreq core now takes the min/max frequency constraints via QoS requests and the CPUFREQ_ADJUST notifier shall get removed later on. Switch over to using the QoS request for maximum frequency constraint for acpi driver. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-28net/mlx5: Set ODP capabilities for DC transport to maxMichael Guralnik
In mlx5_core initialization, query max ODP capabilities for DC transport from FW and set as current capabilities. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-08-27net: stmmac: setup higher frequency clk support for EHL & TGLVoon Weifeng
EHL DW EQOS is running on a 200MHz clock. Setting up stmmac-clk, ptp clock and ptp_max_adj to 200MHz. Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27sctp: allow users to set ep ecn flag by sockoptXin Long
SCTP_ECN_SUPPORTED sockopt will be added to allow users to change ep ecn flag, and it's similar with other feature flags. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27sctp: make ecn flag per netns and endpointXin Long
This patch is to add ecn flag for both netns_sctp and sctp_endpoint, net->sctp.ecn_enable is set 1 by default, and ep->ecn_enable will be initialized with net->sctp.ecn_enable. asoc->peer.ecn_capable will be set during negotiation only when ep->ecn_enable is set on both sides. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27Add genphy_c45_config_aneg() function to phy-c45.cMarco Hartmann
Commit 34786005eca3 ("net: phy: prevent PHYs w/o Clause 22 regs from calling genphy_config_aneg") introduced a check that aborts phy_config_aneg() if the phy is a C45 phy. This causes phy_state_machine() to call phy_error() so that the phy ends up in PHY_HALTED state. Instead of returning -EOPNOTSUPP, call genphy_c45_config_aneg() (analogous to the C22 case) so that the state machine can run correctly. genphy_c45_config_aneg() closely resembles mv3310_config_aneg() in drivers/net/phy/marvell10g.c, excluding vendor specific configurations for 1000BaseT. Fixes: 22b56e827093 ("net: phy: replace genphy_10g_driver with genphy_c45_driver") Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: remove bitmap operationsVivien Didelot
The bitmap operations were introduced to simplify the switch drivers in the future, since most of them could implement the common VLAN and MDB operations (add, del, dump) with simple functions taking all target ports at once, and thus limiting the number of hardware accesses. Programming an MDB or VLAN this way in a single operation would clearly simplify the drivers a lot but would require a new get-set interface in DSA. The usage of such bitmap from the stack also raised concerned in the past, leading to the dynamic allocation of a new ds->_bitmap member in the dsa_switch structure. So let's get rid of them for now. This commit nicely wraps the ds->ops->port_{mdb,vlan}_{prepare,add} switch operations into new dsa_switch_{mdb,vlan}_{prepare,add} variants not using any bitmap argument anymore. New dsa_switch_{mdb,vlan}_match helpers have been introduced to make clear which local port of a switch must be programmed with the target object. While the targeted user port is an obvious candidate, the DSA links must also be programmed, as well as the CPU port for VLANs. While at it, also remove local variables that are only used once. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28bpf: introduce verifier internal test flagAlexei Starovoitov
Introduce BPF_F_TEST_STATE_FREQ flag to stress test parentage chain and state pruning. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-27net_sched: fix a NULL pointer deref in ipt actionCong Wang
The net pointer in struct xt_tgdtor_param is not explicitly initialized therefore is still NULL when dereferencing it. So we have to find a way to pass the correct net pointer to ipt_destroy_target(). The best way I find is just saving the net pointer inside the per netns struct tcf_idrinfo, which could make this patch smaller. Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset") Reported-and-tested-by: itugrok@yahoo.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Minor conflict in r8169, bug fix had two versions in net and net-next, take the net-next hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27Merge tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fixes: - Fix a page lock leak in nfs_pageio_resend() - Ensure O_DIRECT reports an error if the bytes read/written is 0 - Don't handle errors if the bind/connect succeeded - Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidat ed" Bugfixes: - Don't refresh attributes with mounted-on-file information - Fix return values for nfs4_file_open() and nfs_finish_open() - Fix pnfs layoutstats reporting of I/O errors - Don't use soft RPC calls for pNFS/flexfiles I/O, and don't abort for soft I/O errors when the user specifies a hard mount. - Various fixes to the error handling in sunrpc - Don't report writepage()/writepages() errors twice" * tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: remove set but not used variable 'mapping' NFSv2: Fix write regression NFSv2: Fix eof handling NFS: Fix writepage(s) error handling to not report errors twice NFS: Fix spurious EIO read errors pNFS/flexfiles: Don't time out requests on hard mounts SUNRPC: Handle connection breakages correctly in call_status() Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated" SUNRPC: Handle EADDRINUSE and ENOBUFS correctly pNFS/flexfiles: Turn off soft RPC calls SUNRPC: Don't handle errors if the bind/connect succeeded NFS: On fatal writeback errors, we need to call nfs_inode_remove_request() NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup NFS: Ensure O_DIRECT reports an error if the bytes read/written is 0 NFSv4/pnfs: Fix a page lock leak in nfs_pageio_resend() NFSv4: Fix return value in nfs_finish_open() NFSv4: Fix return values for nfs4_file_open() NFS: Don't refresh attributes with mounted-on-file information
2019-08-27Revert "driver core: Add support for linking devices during device addition"Greg Kroah-Hartman
This reverts commit 5302dd7dd0b6d04c63cdce51d1e9fda9ef0be886. Based on a lot of email and in-person discussions, this patch series is being reworked to address a number of issues that were pointed out that needed to be taken care of before it should be merged. It will be resubmitted with those changes hopefully soon. Cc: Frank Rowand <frowand.list@gmail.com> Cc: Saravana Kannan <saravanak@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-27Revert "driver core: Add edit_links() callback for drivers"Greg Kroah-Hartman
This reverts commit 134b23eec9e3a3c795a6ceb0efe2fa63e87983b2. Based on a lot of email and in-person discussions, this patch series is being reworked to address a number of issues that were pointed out that needed to be taken care of before it should be merged. It will be resubmitted with those changes hopefully soon. Cc: Frank Rowand <frowand.list@gmail.com> Cc: Saravana Kannan <saravanak@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-27Revert "driver core: Add sync_state driver/bus callback"Greg Kroah-Hartman
This reverts commit 8f8184d6bf676a8680d6f441e40317d166b46f73. Based on a lot of email and in-person discussions, this patch series is being reworked to address a number of issues that were pointed out that needed to be taken care of before it should be merged. It will be resubmitted with those changes hopefully soon. Cc: Frank Rowand <frowand.list@gmail.com> Cc: Saravana Kannan <saravanak@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-27ASoC: dapm: Expose snd_soc_dapm_new_control_unlocked properlyAmadeusz Sławiński
We use snd_soc_dapm_new_control_unlocked for topology and have local declaration, instead declare it properly in header like already declared snd_soc_dapm_new_control. Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@intel.com> Link: https://lore.kernel.org/r/20190827141712.21015-4-amadeuszx.slawinski@linux.intel.com Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-08-27Merge tag 'arc-5.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC updates from Vineet Gupta: - support for Edge Triggered IRQs in ARC IDU intc - other fixes here and there * tag 'arc-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: arc: prefer __section from compiler_attributes.h dt-bindings: IDU-intc: Add support for edge-triggered interrupts dt-bindings: IDU-intc: Clean up documentation ARCv2: IDU-intc: Add support for edge-triggered interrupts ARC: unwind: Mark expected switch fall-throughs ARC: [plat-hsdk]: allow to switch between AXI DMAC port configurations ARC: fix typo in setup_dma_ops log message ARCv2: entry: early return from exception need not clear U & DE bits
2019-08-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Use 32-bit index for tails calls in s390 bpf JIT, from Ilya Leoshkevich. 2) Fix missed EPOLLOUT events in TCP, from Eric Dumazet. Same fix for SMC from Jason Baron. 3) ipv6_mc_may_pull() should return 0 for malformed packets, not -EINVAL. From Stefano Brivio. 4) Don't forget to unpin umem xdp pages in error path of xdp_umem_reg(). From Ivan Khoronzhuk. 5) Fix sta object leak in mac80211, from Johannes Berg. 6) Fix regression by not configuring PHYLINK on CPU port of bcm_sf2 switches. From Florian Fainelli. 7) Revert DMA sync removal from r8169 which was causing regressions on some MIPS Loongson platforms. From Heiner Kallweit. 8) Use after free in flow dissector, from Jakub Sitnicki. 9) Fix NULL derefs of net devices during ICMP processing across collect_md tunnels, from Hangbin Liu. 10) proto_register() memory leaks, from Zhang Lin. 11) Set NLM_F_MULTI flag in multipart netlink messages consistently, from John Fastabend. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits) r8152: Set memory to all 0xFFs on failed reg reads openvswitch: Fix conntrack cache with timeout ipv4: mpls: fix mpls_xmit for iptunnel nexthop: Fix nexthop_num_path for blackhole nexthops net: rds: add service level support in rds-info net: route dump netlink NLM_F_MULTI flag missing s390/qeth: reject oversized SNMP requests sock: fix potential memory leak in proto_register() MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode ipv4/icmp: fix rt dst dev null pointer dereference openvswitch: Fix log message in ovs conntrack bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0 bpf: fix use after free in prog symbol exposure bpf: fix precision tracking in presence of bpf2bpf calls flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH Revert "r8169: remove not needed call to dma_sync_single_for_device" ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev net/ncsi: Fix the payload copying for the request coming from Netlink qed: Add cleanup in qed_slowpath_start() ...
2019-08-27staging: greybus: add missing includesRui Miguel Silva
Before moving greybus core out of staging and moving header files to include/linux some greybus header files were missing the necessary includes. This would trigger compilation faillures with some example errors logged bellow for with CONFIG_KERNEL_HEADER_TEST=y. So, add the necessary headers to compile clean before relocating the header files. ./include/linux/greybus/hd.h:23:50: error: unknown type name 'u16' int (*cport_disable)(struct gb_host_device *hd, u16 cport_id); ^~~ ./include/linux/greybus/greybus_protocols.h:1314:2: error: unknown type name '__u8' __u8 data[0]; ^~~~ ./include/linux/greybus/hd.h:24:52: error: unknown type name 'u16' int (*cport_connected)(struct gb_host_device *hd, u16 cport_id); ^~~ ./include/linux/greybus/hd.h:25:48: error: unknown type name 'u16' int (*cport_flush)(struct gb_host_device *hd, u16 cport_id); ^~~ ./include/linux/greybus/hd.h:26:51: error: unknown type name 'u16' int (*cport_shutdown)(struct gb_host_device *hd, u16 cport_id, ^~~ ./include/linux/greybus/hd.h:27:5: error: unknown type name 'u8' u8 phase, unsigned int timeout); ^~ ./include/linux/greybus/hd.h:28:50: error: unknown type name 'u16' int (*cport_quiesce)(struct gb_host_device *hd, u16 cport_id, ^~~ ./include/linux/greybus/hd.h:29:5: error: unknown type name 'size_t' size_t peer_space, unsigned int timeout); ^~~~~~ ./include/linux/greybus/hd.h:29:5: note: 'size_t' is defined in header '<stddef.h>'; did you forget to '#include <stddef.h>'? ./include/linux/greybus/hd.h:1:1: +#include <stddef.h> /* SPDX-License-Identifier: GPL-2.0 */ ./include/linux/greybus/hd.h:29:5: size_t peer_space, unsigned int timeout); ^~~~~~ ./include/linux/greybus/hd.h:30:48: error: unknown type name 'u16' int (*cport_clear)(struct gb_host_device *hd, u16 cport_id); ^~~ ./include/linux/greybus/hd.h:32:49: error: unknown type name 'u16' int (*message_send)(struct gb_host_device *hd, u16 dest_cport_id, ^~~ ./include/linux/greybus/hd.h:33:32: error: unknown type name 'gfp_t' struct gb_message *message, gfp_t gfp_mask); ^~~~~ ./include/linux/greybus/hd.h:35:55: error: unknown type name 'u16' int (*latency_tag_enable)(struct gb_host_device *hd, u16 cport_id); Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Gao Xiang <hsiangkao@aol.com> Signed-off-by: Rui Miguel Silva <rmfrfs@gmail.com> Signed-off-by: Rui Miguel Silva <rui.silva@linaro.org> Link: https://lore.kernel.org/r/20190827155302.25704-1-rui.silva@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-27staging: greybus: move core include files to include/linux/greybus/Greg Kroah-Hartman
With the goal of moving the core of the greybus code out of staging, the include files need to be moved to include/linux/greybus.h and include/linux/greybus/ Cc: Vaibhav Hiremath <hvaibhav.linux@gmail.com> Cc: Johan Hovold <johan@kernel.org> Cc: Vaibhav Agarwal <vaibhav.sr@gmail.com> Cc: Rui Miguel Silva <rmfrfs@gmail.com> Cc: David Lin <dtwlin@gmail.com> Cc: "Bryan O'Donoghue" <pure.logic@nexus-software.ie> Cc: greybus-dev@lists.linaro.org Cc: devel@driverdev.osuosl.org Acked-by: Mark Greer <mgreer@animalcreek.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Alex Elder <elder@kernel.org> Link: https://lore.kernel.org/r/20190825055429.18547-8-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-27erofs: fix compile warnings when moving out include/trace/events/erofs.hGao Xiang
As Stephon reported [1], many compile warnings are raised when moving out include/trace/events/erofs.h: In file included from include/trace/events/erofs.h:8, from <command-line>: include/trace/events/erofs.h:28:37: warning: 'struct dentry' declared inside parameter list will not be visible outside of this definition or declaration TP_PROTO(struct inode *dir, struct dentry *dentry, unsigned int flags), ^~~~~~ include/linux/tracepoint.h:233:34: note: in definition of macro '__DECLARE_TRACE' static inline void trace_##name(proto) \ ^~~~~ include/linux/tracepoint.h:396:24: note: in expansion of macro 'PARAMS' __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \ ^~~~~~ include/linux/tracepoint.h:532:2: note: in expansion of macro 'DECLARE_TRACE' DECLARE_TRACE(name, PARAMS(proto), PARAMS(args)) ^~~~~~~~~~~~~ include/linux/tracepoint.h:532:22: note: in expansion of macro 'PARAMS' DECLARE_TRACE(name, PARAMS(proto), PARAMS(args)) ^~~~~~ include/trace/events/erofs.h:26:1: note: in expansion of macro 'TRACE_EVENT' TRACE_EVENT(erofs_lookup, ^~~~~~~~~~~ include/trace/events/erofs.h:28:2: note: in expansion of macro 'TP_PROTO' TP_PROTO(struct inode *dir, struct dentry *dentry, unsigned int flags), ^~~~~~~~ That makes me very confused since most original EROFS tracepoint code was taken from f2fs, and finally I found commit 43c78d88036e ("kbuild: compile-test kernel headers to ensure they are self-contained") It seems these warnings are generated from KERNEL_HEADER_TEST feature and ext4/f2fs tracepoint files were in blacklist. Anyway, let's fix these issues for KERNEL_HEADER_TEST feature instead of adding to blacklist... [1] https://lore.kernel.org/lkml/20190826162432.11100665@canb.auug.org.au/ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20190826132653.100731-1-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-27block: split .sysfs_lock into two locksMing Lei
The kernfs built-in lock of 'kn->count' is held in sysfs .show/.store path. Meantime, inside block's .show/.store callback, q->sysfs_lock is required. However, when mq & iosched kobjects are removed via blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held too. This way causes AB-BA lock because the kernfs built-in lock of 'kn-count' is required inside kobject_del() too, see the lockdep warning[1]. On the other hand, it isn't necessary to acquire q->sysfs_lock for both blk_mq_unregister_dev() & elv_unregister_queue() because clearing REGISTERED flag prevents storing to 'queue/scheduler' from being happened. Also sysfs write(store) is exclusive, so no necessary to hold the lock for elv_unregister_queue() when it is called in switching elevator path. So split .sysfs_lock into two: one is still named as .sysfs_lock for covering sync .store, the other one is named as .sysfs_dir_lock for covering kobjects and related status change. sysfs itself can handle the race between add/remove kobjects and showing/storing attributes under kobjects. For switching scheduler via storing to 'queue/scheduler', we use the queue flag of QUEUE_FLAG_REGISTERED with .sysfs_lock for avoiding the race, then we can avoid to hold .sysfs_lock during removing/adding kobjects. [1] lockdep warning ====================================================== WARNING: possible circular locking dependency detected 5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted ------------------------------------------------------ rmmod/777 is trying to acquire lock: 00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72 but task is already holding lock: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&q->sysfs_lock){+.+.}: __lock_acquire+0x95f/0xa2f lock_acquire+0x1b4/0x1e8 __mutex_lock+0x14a/0xa9b blk_mq_hw_sysfs_show+0x63/0xb6 sysfs_kf_seq_show+0x11f/0x196 seq_read+0x2cd/0x5f2 vfs_read+0xc7/0x18c ksys_read+0xc4/0x13e do_syscall_64+0xa7/0x295 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (kn->count#202){++++}: check_prev_add+0x5d2/0xc45 validate_chain+0xed3/0xf94 __lock_acquire+0x95f/0xa2f lock_acquire+0x1b4/0x1e8 __kernfs_remove+0x237/0x40b kernfs_remove_by_name_ns+0x59/0x72 remove_files+0x61/0x96 sysfs_remove_group+0x81/0xa4 sysfs_remove_groups+0x3b/0x44 kobject_del+0x44/0x94 blk_mq_unregister_dev+0x83/0xdd blk_unregister_queue+0xa0/0x10b del_gendisk+0x259/0x3fa null_del_dev+0x8b/0x1c3 [null_blk] null_exit+0x5c/0x95 [null_blk] __se_sys_delete_module+0x204/0x337 do_syscall_64+0xa7/0x295 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&q->sysfs_lock); lock(kn->count#202); lock(&q->sysfs_lock); lock(kn->count#202); *** DEADLOCK *** 2 locks held by rmmod/777: #0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk] #1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b stack backtrace: CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4 Call Trace: dump_stack+0x9a/0xe6 check_noncircular+0x207/0x251 ? print_circular_bug+0x32a/0x32a ? find_usage_backwards+0x84/0xb0 check_prev_add+0x5d2/0xc45 validate_chain+0xed3/0xf94 ? check_prev_add+0xc45/0xc45 ? mark_lock+0x11b/0x804 ? check_usage_forwards+0x1ca/0x1ca __lock_acquire+0x95f/0xa2f lock_acquire+0x1b4/0x1e8 ? kernfs_remove_by_name_ns+0x59/0x72 __kernfs_remove+0x237/0x40b ? kernfs_remove_by_name_ns+0x59/0x72 ? kernfs_next_descendant_post+0x7d/0x7d ? strlen+0x10/0x23 ? strcmp+0x22/0x44 kernfs_remove_by_name_ns+0x59/0x72 remove_files+0x61/0x96 sysfs_remove_group+0x81/0xa4 sysfs_remove_groups+0x3b/0x44 kobject_del+0x44/0x94 blk_mq_unregister_dev+0x83/0xdd blk_unregister_queue+0xa0/0x10b del_gendisk+0x259/0x3fa ? disk_events_poll_msecs_store+0x12b/0x12b ? check_flags+0x1ea/0x204 ? mark_held_locks+0x1f/0x7a null_del_dev+0x8b/0x1c3 [null_blk] null_exit+0x5c/0x95 [null_blk] __se_sys_delete_module+0x204/0x337 ? free_module+0x39f/0x39f ? blkcg_maybe_throttle_current+0x8a/0x718 ? rwlock_bug+0x62/0x62 ? __blkcg_punt_bio_submit+0xd0/0xd0 ? trace_hardirqs_on_thunk+0x1a/0x20 ? mark_held_locks+0x1f/0x7a ? do_syscall_64+0x4c/0x295 do_syscall_64+0xa7/0x295 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fb696cdbe6b Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008 RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828 RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000 R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0 R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0 Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27block: add helper for checking if queue is registeredMing Lei
There are 4 users which check if queue is registered, so add one helper to check it. Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27block: Remove blk_mq_register_dev()Bart Van Assche
This function has no callers. Hence remove it. Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27netfilter: nft_dynset: support for element deletionAnder Juaristi
This patch implements the delete operation from the ruleset. It implements a new delete() function in nft_set_rhash. It is simpler to use than the already existing remove(), because it only takes the set and the key as arguments, whereas remove() expects a full nft_set_elem structure. Signed-off-by: Ander Juaristi <a@juaristi.eus> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-27writeback, memcg: Implement foreign dirty flushingTejun Heo
There's an inherent mismatch between memcg and writeback. The former trackes ownership per-page while the latter per-inode. This was a deliberate design decision because honoring per-page ownership in the writeback path is complicated, may lead to higher CPU and IO overheads and deemed unnecessary given that write-sharing an inode across different cgroups isn't a common use-case. Combined with inode majority-writer ownership switching, this works well enough in most cases but there are some pathological cases. For example, let's say there are two cgroups A and B which keep writing to different but confined parts of the same inode. B owns the inode and A's memory is limited far below B's. A's dirty ratio can rise enough to trigger balance_dirty_pages() sleeps but B's can be low enough to avoid triggering background writeback. A will be slowed down without a way to make writeback of the dirty pages happen. This patch implements foreign dirty recording and foreign mechanism so that when a memcg encounters a condition as above it can trigger flushes on bdi_writebacks which can clean its pages. Please see the comment on top of mem_cgroup_track_foreign_dirty_slowpath() for details. A reproducer follows. write-range.c:: #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <sys/types.h> static const char *usage = "write-range FILE START SIZE\n"; int main(int argc, char **argv) { int fd; unsigned long start, size, end, pos; char *endp; char buf[4096]; if (argc < 4) { fprintf(stderr, usage); return 1; } fd = open(argv[1], O_WRONLY); if (fd < 0) { perror("open"); return 1; } start = strtoul(argv[2], &endp, 0); if (*endp != '\0') { fprintf(stderr, usage); return 1; } size = strtoul(argv[3], &endp, 0); if (*endp != '\0') { fprintf(stderr, usage); return 1; } end = start + size; while (1) { for (pos = start; pos < end; ) { long bread, bwritten = 0; if (lseek(fd, pos, SEEK_SET) < 0) { perror("lseek"); return 1; } bread = read(0, buf, sizeof(buf) < end - pos ? sizeof(buf) : end - pos); if (bread < 0) { perror("read"); return 1; } if (bread == 0) return 0; while (bwritten < bread) { long this; this = write(fd, buf + bwritten, bread - bwritten); if (this < 0) { perror("write"); return 1; } bwritten += this; pos += bwritten; } } } } repro.sh:: #!/bin/bash set -e set -x sysctl -w vm.dirty_expire_centisecs=300000 sysctl -w vm.dirty_writeback_centisecs=300000 sysctl -w vm.dirtytime_expire_seconds=300000 echo 3 > /proc/sys/vm/drop_caches TEST=/sys/fs/cgroup/test A=$TEST/A B=$TEST/B mkdir -p $A $B echo "+memory +io" > $TEST/cgroup.subtree_control echo $((1<<30)) > $A/memory.high echo $((32<<30)) > $B/memory.high rm -f testfile touch testfile fallocate -l 4G testfile echo "Starting B" (echo $BASHPID > $B/cgroup.procs pv -q --rate-limit 70M < /dev/urandom | ./write-range testfile $((2<<30)) $((2<<30))) & echo "Waiting 10s to ensure B claims the testfile inode" sleep 5 sync sleep 5 sync echo "Starting A" (echo $BASHPID > $A/cgroup.procs pv < /dev/urandom | ./write-range testfile 0 $((2<<30))) v2: Added comments explaining why the specific intervals are being used. v3: Use 0 @nr when calling cgroup_writeback_by_id() to use best-effort flushing while avoding possible livelocks. v4: Use get_jiffies_64() and time_before/after64() instead of raw jiffies_64 and arthimetic comparisons as suggested by Jan. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27writeback, memcg: Implement cgroup_writeback_by_id()Tejun Heo
Implement cgroup_writeback_by_id() which initiates cgroup writeback from bdi and memcg IDs. This will be used by memcg foreign inode flushing. v2: Use wb_get_lookup() instead of wb_get_create() to avoid creating spurious wbs. v3: Interpret 0 @nr as 1.25 * nr_dirty to implement best-effort flushing while avoding possible livelocks. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27writeback: Separate out wb_get_lookup() from wb_get_create()Tejun Heo
Separate out wb_get_lookup() which doesn't try to create one if there isn't already one from wb_get_create(). This will be used by later patches. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>