Age | Commit message (Collapse) | Author |
|
TCP stack makes sure packets for a given flow are monotically
increasing, but we want to allow UDP packets to use EDT as
well, so that QUIC servers can use in-kernel pacing.
This patch adds a per-flow rb-tree on which packets might
be stored. We still try to use the linear list for the
typical cases where packets are queued with monotically
increasing skb->tstamp, since queue/dequeue packets on
a standard list is O(1).
Note that the ability to store packets in arbitrary EDT
order will allow us to implement later a per TCP socket
mechanism adding delays (with jitter eventually) and reorders,
to implement convenient network emulators.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When IPoIB receives an SM LID change event, it reacts by flushing its
path record cache and rejoining multicast groups. This is the same
behavior it performs when it receives a reregistration event. This
behavior is unnecessary as an SM may have database backup or
synchronization mechanisms which permit the SM location or LID to change
without loss of multicast membership and without impact to path records.
Both opensm and the OPA FM issue reregistration events if a new SM is
started (or restarted with a new config) or an SM event occurs which
results in loss of multicast membership records by the SM (such as
opensm failover) or the SM encounters new nodes with Active ports (such
as after joining 2 fabrics by connecting switches via ISLs). Hence this
event can be depended on as the trigger for IPoIB cache and multicast
flushing.
It appears that some drivers, such as qib, and hfi1 issue the
IB_EVENT_SM_CHANGE but other drivers such as mlx4 and mlx5 do not.
Empirical testing on Mellanox EDR using ibv_asyncwatch has confirmed
that Mellanox EDR HCAs do not generate SM change events and that opensm
does generate reregistration.
An SM LID change event is generated by the mentioned drivers to reflect
that sm_lid and/or sm_sl in the local port info has changed. The intent
of this event is to permit applications and ULPs which have a local copy
of this information (or an address handle using it) to update their
information.
The intent is that the reregistration event (caused by the SM via a bit
in Set(PortInfo)) be used to inform nodes that they need to rejoin
multicast groups, resubscribe for notices and potentially update path
records.
When an SM migrates or fails over, a SM LID change event can occur. In
response IPoIB discards path records and multicast membership and loses
connectivity until these records are restored via SA requests. In very
large fabrics, it may take minutes for the SM to be ready and for the SA
responses to be supplied. This can result in undesirable and
unnecessary IPoIB connectivity impacts. It also can result in an
unnecessary storm of SA queries from all nodes in a cluster potentially
followed by yet another storm if the SM issues the reregistration
request.
The fact the Mellanox HCAs do not even generate this event, is further
evidence that on modern IB fabrics there will be no ill side effects
from the proposed changes below to reduce the reaction by 3 kernel
components to this event. So these changes should be benign for Mellanox
IB fabrics and will benefit OPA fabrics while also making ib_core and
ULP behavor "correct" as intended by the IBTA spec and kernel RDMA event
APIs.
Address these issues by removing IB_EVENT_SM_CHANGE handling from ipoib.
IPoIB does not locally store sm_lid nor sm_sl, so it does not need to do
anything on SM LID change. IPoIB makes use of other ib_core components
to issue SA requests for it and those components correctly track SM LID
and SM LID changes.
Also in ib_core multicast handling, remove the test for
IB_EVENT_SM_CHANGE. This code is moving all multicast groups to the
error state, which will trigger rejoins. This code is used by IPoIB as
well as the connection manager and other clients of multicast groups.
This kernel module centralizes group membership status and joins since a
node can only join a given group once but multiple ULPs or applications
may want to join the same group. It makes use of the sa_query.c
component in ib_core, which correctly trackes SM LID and SL. This
component does not track SM LID nor SL itself and hence need not react
to their changes.
Similarly in the ib_core cache code remove the handling for the
IB_EVENT_SM_CHANGE. In this function. The ib_cache_update function
which is ultimately called is updating local copies of the pkey table,
gid table and lmc. It does not update nor retain sm_lid nor sm_sl. As
such it does not need to be called on an SM LID change. It technically
also does not need to be called on a reregistration. The LID_CHANGE,
PKEY_CHANGE, GID_CHANGE and port state change events (PORT_ERR,
PORT_ACTICE) should be sufficient triggers.
It is worth noting that the alternative of simply having the hfi1 and
qib drivers not generate the SM LID change event was explored. While
this would duplicate what Mellanox drivers do now, it is not the correct
behavior and removes the ability for an SM to migrate without requiring
reregistration. Since both opensm and OPA SM have mechanisms to backup
or synchronize registration information, it is desirable to let them
perform SM migrations (with LID or SL changes) without requiring
reregistration when they deem it appropriate.
Suggested-by: Todd Rimmer <todd.rimmer@intel.com>
Tested-by: Michael Brooks <michael.brooks@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Todd Rimmer <todd.rimmer@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Pull xfs updates from Darrick Wong:
"Here's a big pile of new stuff for XFS for 5.2. XFS has grown the
ability to report metadata health status to userspace after online
fsck checks the filesystem. The online metadata checking code is (I
really hope) feature complete with the addition of checks for the
global fs counters, though it'll remain EXPERIMENTAL for now.
There are also fixes for thundering herds of writeback completions and
some other deadlocks, fixes for theoretical integer overflow attacks
on space accounting, and removal of the long-defunct 'mntpt' option
which was deprecated in the mid-2000s and (it turns out) totally
broken since 2011 (and nobody complained...).
Summary:
- Fix some more buffer deadlocks when performing an unmount after a
hard shutdown.
- Fix some minor space accounting issues.
- Fix some use after free problems.
- Make the (undocumented) FITRIM behavior consistent with other
filesystems.
- Embiggen the xfs geometry ioctl's data structure.
- Introduce a new AG geometry ioctl.
- Introduce a new online health reporting infrastructure and ioctl
for userspace to query a filesystem's health status.
- Enhance online scrub and repair to update the health reports.
- Reduce thundering herd problems when writeback io completes.
- Fix some transaction reservation type errors.
- Fix integer overflow problems with delayed alloc reservation
counters.
- Fix some problems where we would exit to userspace without
unlocking.
- Fix inconsistent behavior when finishing deferred ops fails.
- Strengthen scrub to check incore data against ondisk metadata.
- Remove long-broken mntpt mount option.
- Add an online scrub function for the filesystem summary counters,
which should make online metadata scrub more or less feature
complete for now.
- Various cleanups"
* tag 'xfs-5.2-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (38 commits)
xfs: change some error-less functions to void types
xfs: add online scrub for superblock counters
xfs: don't parse the mtpt mount option
xfs: always rejoin held resources during defer roll
xfs: add missing error check in xfs_prepare_shift()
xfs: scrub should check incore counters against ondisk headers
xfs: allow scrubbers to pause background reclaim
xfs: rename the speculative block allocation reclaim toggle functions
xfs: track delayed allocation reservations across the filesystem
xfs: fix broken bhold behavior in xrep_roll_ag_trans
xfs: unlock inode when xfs_ioctl_setattr_get_trans can't get transaction
xfs: kill the xfs_dqtrx_t typedef
xfs: widen inode delalloc block counter to 64-bits
xfs: widen quota block counters to 64-bit integers
xfs: abort unaligned nowait directio early
xfs: assert that we don't enter agfl freeing with a non-permanent transaction
xfs: make tr_growdata a permanent transaction
xfs: merge adjacent io completions of the same type
xfs: remove unused m_data_workqueue
xfs: implement per-inode writeback completion queues
...
|
|
- Rewrite how clk parents can be specified to be DT/clkdev based instead
of just string based
* clk-parent-rewrite-1:
clk: Cache core in clk_fetch_parent_index() without names
clk: fixed-factor: Initialize clk_init_data on stack
clk: fixed-factor: Let clk framework find parent
clk: Allow parents to be specified via clkspec index
clk: Look for parents with clkdev based clk_lookups
clk: Allow parents to be specified without string names
clk: Add of_clk_hw_register() API for early clk drivers
driver core: Let dev_of_node() accept a NULL dev
clk: Prepare for clk registration API that uses DT nodes
clkdev: Move clk creation outside of 'clocks_mutex'
|
|
* clk-ti:
clk: Remove CLK_IS_BASIC clk flag
clk: ti: dra7: disable the RNG and TIMER12 clkctrl clocks on HS devices
clk: ti: dra7x: prevent non-existing clkctrl clocks from registering
ARM: omap2+: hwmod: drop CLK_IS_BASIC flag usage
clk: ti: export the omap2_clk_is_hw_omap call
|
|
and 'clk-spdx' into clk-next
- Support for STM32F769
- Rework AT91 sckc DT bindings
- Fix slow RC oscillator issue on sama5d3
- AT91 sam9x60 PMC support
- SiFive FU540 PRCI and PLL support
* clk-stm32f4:
clk: stm32mp1: Add ddrperfm clock
clk: stm32: Introduce clocks of STM32F769 board
* clk-tegra:
clk: tegra: divider: Mark Memory Controller clock as read-only
clk: tegra: emc: Replace BUG() with WARN_ONCE()
clk: tegra: emc: Fix EMC max-rate clamping
clk: tegra: emc: Support multiple RAM codes
clk: tegra: emc: Don't enable EMC clock manually
clk: tegra124: Remove lock-enable bit from PLLM
clk: tegra: Fix PLLM programming on Tegra124+ when PMC overrides divider
clk: tegra: Don't enable already enabled PLLs
* clk-at91:
clk: at91: Mark struct clk_range as const
clk: at91: add sam9x60 pmc driver
dt-bindings: clk: at91: add bindings for SAM9X60 pmc
clk: at91: add sam9x60 PLL driver
clk: at91: master: Add sam9x60 support
clk: at91: usb: Add sam9x60 support
clk: at91: allow configuring generated PCR layout
clk: at91: allow configuring peripheral PCR layout
clk: at91: sckc: handle different RC startup time
clk: at91: modernize sckc binding
dt-bindings: clock: at91: new sckc bindings
* clk-sifive-fu540:
clk: sifive: add a driver for the SiFive FU540 PRCI IP block
clk: analogbits: add Wide-Range PLL library
dt-bindings: clk: add documentation for the SiFive PRCI driver
* clk-spdx:
clk: sunxi-ng: Use the correct style for SPDX License Identifier
clk: sprd: Use the correct style for SPDX License Identifier
clk: renesas: Use the correct style for SPDX License Identifier
clk: qcom: Use the correct style for SPDX License Identifier
clk: davinci: Use the correct style for SPDX License Identifier
clk: actions: Use the correct style for SPDX License Identifier
|
|
and 'clk-qoriq' into clk-next
- Mark UFS clk as critical on Hi-Silicon hi3660 SoCs
- Support for Cirrus Logic Lochnagar clks
* clk-hisi:
clk: hi3660: Mark clk_gate_ufs_subsys as critical
* clk-lochnagar:
clk: lochnagar: Add support for the Cirrus Logic Lochnagar
clk: lochnagar: Add initial binding documentation
* clk-allwinner:
clk: sunxi-ng: sun5i: Export the MBUS clock
clk: sunxi-ng: a83t: Add pll-video0 as parent of csi-mclk
clk: sunxi-ng: h6: Allow video & vpu clocks to change parent rate
clk: sunxi-ng: h6: Preset hdmi-cec clock parent
clk: sunxi: Add Kconfig options
clk: sunxi-ng: f1c100s: fix USB PHY gate bit offset
clk: sunxi-ng: Allow DE clock to set parent rate
* clk-rockchip:
clk: rockchip: undo several noc and special clocks as critical on rk3288
clk: rockchip: add a COMPOSITE_DIV_OFFSET clock-type
clk: rockchip: Turn on "aclk_dmac1" for suspend on rk3288
clk: rockchip: Limit use of USB PHY clock to USB on rk3288
clk: rockchip: Fix video codec clocks on rk3288
clk: rockchip: Make rkpwm a critical clock on rk3288
clk: rockchip: fix wrong clock definitions for rk3328
* clk-qoriq:
clk: qoriq: increase array size of cmux_to_group
dt-bindings: qoriq-clock: Add ls1028a chip compatible string
clk: qoriq: Add ls1028a clock configuration
clk: qoriq: add more PLL divider clocks support
dt-bindings: qoriq-clock: add more PLL divider clocks support
|
|
'clk-zynq' into clk-next
- Various static analysis fixes/finds
- Video Engine (ECLK) support on Aspeed SoCs
- Xilinx ZynqMP Versal platform support
- Convert Xilinx ZynqMP driver to be struct oriented
* clk-sa:
clk: mvebu: fix spelling mistake "gatable" -> "gateable"
clk: ux500: add range to usleep_range
clk: tegra: Make tegra_clk_super_mux_ops static
clk: davinci: cfgchip: use PTR_ERR_OR_ZERO in da8xx_cfgchip_register_div4p5
* clk-aspeed:
clk: Aspeed: Setup video engine clocking
* clk-samsung:
clk: samsung: exynos5410: Add gate clock for ADC
clk: samsung: dt-bindings: Add ADC clock ID to Exynos5410
clk: samsung: dt-bindings: Put CLK_UART3 in order
* clk-ingenic:
clk: ingenic: jz4725b: Add UDC PHY clock
dt-bindings: clock: jz4725b-cgu: Add UDC PHY clock
* clk-zynq:
clk: zynqmp: use structs for clk query responses
clk: zynqmp: fix check for fractional clock
clk: zynqmp: do not export zynqmp_clk_register_* functions
clk: zynqmp: fix kerneldoc of __zynqmp_clock_get_parents
drivers: clk: Update clock driver to handle clock attribute
drivers: clk: zynqmp: Allow zero divisor value
|
|
'clk-basic-be' into clk-next
- Remove clk_readl() and introduce BE versions of basic clk types
* clk-doc:
clk: Drop duplicate clk_register() documentation
clk: Document and simplify clk_core_get_rate_nolock()
clk: Remove 'flags' member of struct clk_fixed_rate
clk: nxp: Drop 'flags' on fixed_rate clk macro
clk: Document __clk_mux_determine_rate()
clk: Document CLK_MUX_READ_ONLY mux flag
clk: Document deprecated things
clk: Collapse gpio clk kerneldoc
* clk-more-critical:
clk: highbank: Convert to CLK_IS_CRITICAL
* clk-meson: (21 commits)
clk: meson: axg-audio: add g12a support
clk: meson: axg-audio: don't register inputs in the onecell data
clk: meson: axg_audio: replace prefix axg by aud
dt-bindings: clk: axg-audio: add g12a support
clk: meson: meson8b: add the video decoder clock trees
clk: meson: meson8b: add the VPU clock trees
clk: meson: meson8b: add support for the GP_PLL clock on Meson8m2
clk: meson: meson8b: use a separate clock table for Meson8m2
dt-bindings: clock: meson8b: export the video decoder clocks
clk: meson-g12a: add video decoder clocks
dt-bindings: clock: meson8b: export the VPU clock
clk: meson-g12a: add PCIE PLL clocks
dt-bindings: clock: g12a-aoclk: expose CLKID_AO_CTS_OSCIN
clk: meson-pll: add reduced specific clk_ops for G12A PCIe PLL
dt-bindings: clock: meson8b: drop the "ABP" clock definition
clk: meson: g12a: add cpu clocks
dt-bindings: clk: g12a-clkc: add VDEC clock IDs
dt-bindings: clock: axg-audio: unexpose controller inputs
dt-bindings: clk: g12a-clkc: add PCIE PLL clock ID
clk: g12a-aoclk: re-export CLKID_AO_SAR_ADC_SEL clock id
...
* clk-basic-be:
clk: core: replace clk_{readl,writel} with {readl,writel}
clk: core: remove powerpc special handling
powerpc/512x: mark clocks as big endian
clk: mux: add explicit big endian support
clk: multiplier: add explicit big endian support
clk: gate: add explicit big endian support
clk: fractional-divider: add explicit big endian support
clk: divider: add explicit big endian support
|
|
'clk-imx' into clk-next
- Qualcomm QCS404 CDSP clk support
- Qualcomm QCS404 Turing clk support
- Mediatek MT8183 clock support
- Mediatek MT8516 clock support
- Milbeaut M10V clk controller support
* clk-renesas:
clk: renesas: rcar-gen3: Remove unused variable
clk: renesas: rcar-gen3: Fix cpg_sd_clock_round_rate() return value
clk: renesas: r8a77980: Fix RPC-IF module clock's parent
clk: renesas: rcar-gen3: Rename DRIF clocks
clk: renesas: rcar-gen3: Correct parent clock of Audio-DMAC
clk: renesas: rcar-gen3: Correct parent clock of SYS-DMAC
clk: renesas: rcar-gen3: Correct parent clock of HS-USB
clk: renesas: rcar-gen3: Correct parent clock of EHCI/OHCI
clk: renesas: r8a774c0: Add Z2 clock
clk: renesas: r8a77990: Add Z2 clock
clk: renesas: rcar-gen3: Support Z and Z2 clocks with high frequency parents
math64: New DIV64_U64_ROUND_CLOSEST helper
clk: renesas: rcar-gen3: Remove CLK_TYPE_GEN3_Z2
clk: renesas: rcar-gen3: Parameterise Z and Z2 clock offset
clk: renesas: rcar-gen3: Parameterise Z and Z2 clock fixed divisor
clk: renesas: r9a06g032: Add missing PCI USB clock
clk: renesas: r7s9210: Always use readl()
clk: renesas: rcar-gen3: Pass name/offset to cpg_sd_clk_register()
* clk-qcom:
clk: qcom: Skip halt checks on gcc_pcie_0_pipe_clk for 8998
clk: qcom: Add QCS404 TuringCC
clk: qcom: branch: Add AON clock ops
dt-bindings: clock: Introduce Qualcomm Turing Clock controller
clk: qcom: gcc-qcs404: Add CDSP related clocks and resets
* clk-mtk:
clk: mediatek: add clock driver for MT8516
dt-bindings: mediatek: apmixedsys: add support for MT8516
dt-bindings: mediatek: infracfg: add support for MT8516
dt-bindings: mediatek: topckgen: add support for MT8516
clk: mediatek: Allow changing PLL rate when it is off
clk: mediatek: Add MT8183 clock support
clk: mediatek: Add configurable pcw_chg_reg to mtk_pll_data
clk: mediatek: Add dt-bindings for MT8183 clocks
dt-bindings: ARM: Mediatek: Document bindings for MT8183
clk: mediatek: Add configurable pcwibits and fmin to mtk_pll_data
clk: mediatek: Add new clkmux register API
clk: mediatek: Disable tuner_en before change PLL rate
* clk-milbeaut:
clock: milbeaut: Add Milbeaut M10V clock controller
dt-bindings: clock: milbeaut: add Milbeaut clock description
* clk-imx:
clk: imx: correct pfdv2 gate_bit/vld_bit operations
clk: imx: clk-pllv3: mark expected switch fall-throughs
clk: imx8mq: Add dsi_ipg_div
clk: imx: pllv4: add fractional-N pll support
clk: imx: keep uart clock on during system boot
clk: imx: correct i.MX7D AV PLL num/denom offset
clk: imx6sll: Fix mispelling uart4_serial as serail
clk: imx: pll14xx: drop unused variable
clk: imx: rename clk-imx51-imx53.c to clk-imx5.c
clk: imx5: Fix i.MX50 ESDHC clock registers
clk: imx5: Fix i.MX50 mainbus clock registers
clk: imx: Remove unused imx_get_clk_hw_fixed
dt-bindings: clock: imx7ulp: remove SNVS clock
clk: imx7ulp: remove snvs clock
|
|
Pull iomap updates from Darrick Wong:
"Nothing particularly exciting here, just adding some callouts for gfs2
and cleaning a few things.
Summary:
- Add some extra hooks to the iomap buffered write path to enable
gfs2 journalled writes
- SPDX conversion
- Various refactoring"
* tag 'iomap-5.2-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: move iomap_read_inline_data around
iomap: Add a page_prepare callback
iomap: Fix use-after-free error in page_done callback
fs: Turn __generic_write_end into a void function
iomap: Clean up __generic_write_end calling
iomap: convert to SPDX identifier
|
|
Pull jfs updates from Dave Kleikamp:
"Several minor jfs fixes"
* tag 'jfs-5.2' of git://github.com/kleikamp/linux-shaggy:
jfs: fix bogus variable self-initialization
fs/jfs: Switch to use new generic UUID API
jfs: compare old and new mode before setting update_mode flag
jfs: remove incorrect comment in jfs_superblock
jfs: fix spelling mistake, EACCESS -> EACCES
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"This time the majority of changes are cleanups, though there's still a
number of changes of user interest.
User visible changes:
- better read time and write checks to catch errors early and before
writing data to disk (to catch potential memory corruption on data
that get checksummed)
- qgroups + metadata relocation: last speed up patch int the series
to address the slowness, there should be no overhead comparing
balance with and without qgroups
- FIEMAP ioctl does not start a transaction unnecessarily, this can
result in a speed up and less blocking due to IO
- LOGICAL_INO (v1, v2) does not start transaction unnecessarily, this
can speed up the mentioned ioctl and scrub as well
- fsync on files with many (but not too many) hardlinks is faster,
finer decision if the links should be fsynced individually or
completely
- send tries harder to find ranges to clone
- trim/discard will skip unallocated chunks that haven't been touched
since the last mount
Fixes:
- send flushes delayed allocation before start, otherwise it could
miss some changes in case of a very recent rw->ro switch of a
subvolume
- fix fallocate with qgroups that could lead to space accounting
underflow, reported as a warning
- trim/discard ioctl honours the requested range
- starting send and dedupe on a subvolume at the same time will let
only one of them succeed, this is to prevent changes that send
could miss due to dedupe; both operations are restartable
Core changes:
- more tree-checker validations, errors reported by fuzzing tools:
- device item
- inode item
- block group profiles
- tracepoints for extent buffer locking
- async cow preallocates memory to avoid errors happening too deep in
the call chain
- metadata reservations for delalloc reworked to better adapt in
many-writers/low-space scenarios
- improved space flushing logic for intense DIO vs buffered workloads
- lots of cleanups
- removed unused struct members
- redundant argument removal
- properties and xattrs
- extent buffer locking
- selftests
- use common file type conversions
- many-argument functions reduction"
* tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (227 commits)
btrfs: Use kvmalloc for allocating compressed path context
btrfs: Factor out common extent locking code in submit_compressed_extents
btrfs: Set io_tree only once in submit_compressed_extents
btrfs: Replace clear_extent_bit with unlock_extent
btrfs: Make compress_file_range take only struct async_chunk
btrfs: Remove fs_info from struct async_chunk
btrfs: Rename async_cow to async_chunk
btrfs: Preallocate chunks in cow_file_range_async
btrfs: reserve delalloc metadata differently
btrfs: track DIO bytes in flight
btrfs: merge calls of btrfs_setxattr and btrfs_setxattr_trans in btrfs_set_prop
btrfs: delete unused function btrfs_set_prop_trans
btrfs: start transaction in xattr_handler_set_prop
btrfs: drop local copy of inode i_mode
btrfs: drop old_fsflags in btrfs_ioctl_setflags
btrfs: modify local copy of btrfs_inode flags
btrfs: drop useless inode i_flags copy and restore
btrfs: start transaction in btrfs_ioctl_setflags()
btrfs: export btrfs_set_prop
btrfs: refactor btrfs_set_props to validate externally
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs stable fodder fixes from Al Viro:
- acct_on() fix for deadlock caught by overlayfs folks
- autofs RCU use-after-free SNAFU (->d_manage() can be called
locklessly, so we need to RCU-delay freeing the objects it looks at)
- (hopefully) the end of "do we need freeing this dentry RCU-delayed"
whack-a-mole.
* 'stable-fodder' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
autofs: fix use-after-free in lockless ->d_manage()
dcache: sort the freeing-without-RCU-delay mess for good.
acct_on(): don't mess with freeze protection
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs inode freeing updates from Al Viro:
"Introduction of separate method for RCU-delayed part of
->destroy_inode() (if any).
Pretty much as posted, except that destroy_inode() stashes
->free_inode into the victim (anon-unioned with ->i_fops) before
scheduling i_callback() and the last two patches (sockfs conversion
and folding struct socket_wq into struct socket) are excluded - that
pair should go through netdev once davem reopens his tree"
* 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits)
orangefs: make use of ->free_inode()
shmem: make use of ->free_inode()
hugetlb: make use of ->free_inode()
overlayfs: make use of ->free_inode()
jfs: switch to ->free_inode()
fuse: switch to ->free_inode()
ext4: make use of ->free_inode()
ecryptfs: make use of ->free_inode()
ceph: use ->free_inode()
btrfs: use ->free_inode()
afs: switch to use of ->free_inode()
dax: make use of ->free_inode()
ntfs: switch to ->free_inode()
securityfs: switch to ->free_inode()
apparmor: switch to ->free_inode()
rpcpipe: switch to ->free_inode()
bpf: switch to ->free_inode()
mqueue: switch to ->free_inode()
ufs: switch to ->free_inode()
coda: switch to ->free_inode()
...
|
|
Huazhong Tan says:
====================
cleanup & optimizations & bugfixes for HNS3 driver
This patchset contains some cleanup related to hns3_enet_ring
struct and tx bd filling process, optimizations related
to rx page reusing, barrier using and tso handling process,
bugfixes related to tunnel type handling and error handling for
desc filling.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch uses devm_kcalloc instead of kcalloc when allocating
ring->desc_cb, because devm_kcalloc not only ensure to free the
memory when the dev is deallocted, but also allocate the memory
from it's device memory node.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes some unused field in struct hns3_enet_ring,
use ring->dev for ring_to_dev macro, and use dev consistently
in hns3_fill_desc.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When page size is 64K, RX buffer is currently not reused when the
page_offset is moved to last buffer. This patch adds checking to
decide whether the buffer page can be reused when last_offset is
moved beyond last offset.
If the driver is the only user of page when page_offset is moved
to beyond last offset, then buffer can be reused and page_offset
is set to zero.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, a barrier is used when cleaning each TX BD, which may
cause performance degradation.
This patch optimizes it to use one barrier when cleaning TX BD
each round.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When desc filling fails in hns3_nic_net_xmit, it will call
hns3_clear_desc to unmap the dma mapping. But currently the
ring->next_to_use points to the desc where the desc filling
or dma mapping return error, which means the desc that
ring->next_to_use points to has not done the dma mapping,
the desc that need unmapping is before the ring->next_to_use.
This patch fixes it by calling ring_ptr_move_bw(next_to_use)
before doing unmapping operation, and set desc_cb->dma to
zero to avoid freeing it again when unloading.
Also, when filling skb head or frag fails, both need to unmap
all the way back to next_to_use_head, so remove one desc filling
error handling.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When filling len and checksum info to description, there is some
similar checking or calculation.
So this patch adds hns3_set_l2l3l4 to fill the inner(/normal)
header's len and checksum info. If it is a encapsulation skb, it
calls hns3_set_outer_l2l3l4 to handle the outer header's len and
checksum info, in order to avoid some similar checking or
calculation.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch separates the inner and outer l2l3l4 len handling in
hns3_set_l2l3l4_len, this is a preparation to combine the l2l3l4
len and checksum handling for inner and outer header.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to hardware user manual, the tunnel packet type is
available in the rx.ol_info field of struct hns3_desc. Currently
the tunnel packet type is decided by the rx.l234_info, which may
cause RX checksum handling error.
This patch fixes it by using the correct field in struct hns3_desc
to decide the tunnel packet type.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
HW requires every continuous 8 buffer data to be larger than MSS,
we simplify it by ensuring skb_headlen + the first continuous
7 frags to to be larger than GSO header len + mss, and the
remaining continuous 7 frags to be larger than MSS except the
last 7 frags.
This patch adds hns3_skb_need_linearized to handle it for TSO
case.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, using "ethtool --statistics" can show how many time RX
page have been reused, but there is no counter for RX page not
being reused.
This patch adds non_reuse_pg counter to better debug the performance
issue, because it is hard to determine when the RX page is reused
or not if there is no such counter.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
napi_schedule_irqoff is introduced to be used from hard interrupts
handlers or when irqs are already masked, see:
https://lists.openwall.net/netdev/2014/10/29/2
So this patch replaces napi_schedule with napi_schedule_irqoff.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, maybe_stop_tx ops for TSO and non-TSO case share some BD
calculation code, so this patch unifies the maybe_stop_tx by removing
the maybe_stop_tx ops. skb_is_gso() can be used to differentiate the
case between TSO and non-TSO case if there is need to handle special
case for TSO case.
This patch also add tx_copy field in "ethtool --statistics" to help
better debug the performance issue caused by calling skb_copy.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use CONFIG_MEMMAP_CACHEATTR to initialize MPU as described in the Xtensa
LSP RM document. Coalesce adjacent regions with the same cacheattr.
Update Kconfig help text.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
Implement atomic primitives using exclusive access opcodes available in
the recent xtensa cores.
Since l32ex/s32ex don't have any memory ordering guarantees don't define
__smp_mb__before_atomic/__smp_mb__after_atomic to make them use memw.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
Hauke Mehrtens says:
====================
net: dsa: lantiq: Add bridge offloading
This adds bridge offloading for the Intel / Lantiq GSWIP 2.1 switch.
Changes since:
v2:
- Added Fixes tag to patch 1
- Fixed typo
- added GSWIP_TABLE_MAC_BRIDGE_STATIC and made use of it
- used GSWIP_TABLE_MAC_BRIDGE in more places
v1:
- fix typo signle -> single
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds functions to add and remove static entries to and from the
forwarding database and dump the full forwarding database.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fast aging per port is not supported directly by the hardware, it is
only possible to configure a global aging time.
Do the fast aging by iterating over the MAC forwarding table and remove
all dynamic entries for a given port.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The VLAN aware bridge offloading is similar to the VLAN unaware
offloading, this makes it possible to offload the VLAN bridge
functionalities.
The hardware supports up to 64 VLAN bridge entries, we already use one
entry for each LAN port to prevent forwarding of packets between the
ports when the ports are not in a bridge, so in the end we have 57
possible VLANs.
The VLAN filtering is currently only active when the ports are in a
bridge, VLAN filtering for ports not in a bridge is not implemented.
It is currently not possible to change between VLAN filtering and not
filtering while the port is already in a bridge, this would make the
driver more complicated.
The VLANs are only defined on bridge entries, so we will not add
anything into the hardware when the port joins a bridge if it is doing
VLAN filtering, but only when an allowed VLAN is added.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This allows to offload bridges with DSA to the switch hardware and do
the packet forwarding in hardware.
This implements generic functions to access the switch hardware tables,
which are used to control many features of the switch.
This patch activates the MAC learning by removing the MAC address table
lock, to prevent uncontrolled forwarding of packets between all the LAN
ports, they are added into individual bridge tables entries with
individual flow ids and the switch will do the MAC learning for each
port separately before they are added to a real bridge.
Each bridge consist of an entry in the active VLAN table and the VLAN
mapping table, table entries with the same index are matching. In the
VLAN unaware mode we configure everything with VLAN ID 0, but we use
different flow IDs, the switch should handle all VLANs as normal payload
and ignore them. When the hardware looks for the port of the destination
MAC address it only takes the entries which have the same flow ID of the
ingress packet.
The bridges are configured with 64 possible entries with these
information:
Table Index, 0...63
VLAN ID, 0...4095: VLAN ID 0 is untagged
flow ID, 0..63: Same flow IDs share entries in MAC learning table
port map, one bit for each port number
tagged port map, one bit for each port number
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow the special tag in ingress only on the CPU port and not on all
ports. A packet with a special tag could circumvent the hardware
forwarding and should only be allowed on the CPU port where Linux
controls the port.
Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200)"
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace numeric inline assembly parameters with named parameters.
Drop unused parameters from the futex_atomic_cmpxchg_inatomic.
Use new temporary variable to hold target address in the fixup code in
futex_atomic_cmpxchg_inatomic. Conditionalize function bodies so that
only 'return -ENOSYS' is left in configurations without futex support.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU state handling updates from Borislav Petkov:
"This contains work started by Rik van Riel and brought to fruition by
Sebastian Andrzej Siewior with the main goal to optimize when to load
FPU registers: only when returning to userspace and not on every
context switch (while the task remains in the kernel).
In addition, this optimization makes kernel_fpu_begin() cheaper by
requiring registers saving only on the first invocation and skipping
that in following ones.
What is more, this series cleans up and streamlines many aspects of
the already complex FPU code, hopefully making it more palatable for
future improvements and simplifications.
Finally, there's a __user annotations fix from Jann Horn"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails
x86/pkeys: Add PKRU value to init_fpstate
x86/fpu: Restore regs in copy_fpstate_to_sigframe() in order to use the fastpath
x86/fpu: Add a fastpath to copy_fpstate_to_sigframe()
x86/fpu: Add a fastpath to __fpu__restore_sig()
x86/fpu: Defer FPU state load until return to userspace
x86/fpu: Merge the two code paths in __fpu__restore_sig()
x86/fpu: Restore from kernel memory on the 64-bit path too
x86/fpu: Inline copy_user_to_fpregs_zeroing()
x86/fpu: Update xstate's PKRU value on write_pkru()
x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
x86/entry: Add TIF_NEED_FPU_LOAD
x86/fpu: Eager switch PKRU state
x86/pkeys: Don't check if PKRU is zero before writing it
x86/fpu: Only write PKRU if it is different from current
x86/pkeys: Provide *pkru() helpers
x86/fpu: Use a feature number instead of mask in two more helpers
x86/fpu: Make __raw_xsave_addr() use a feature number instead of mask
x86/fpu: Add an __fpregs_load_activate() internal helper
...
|
|
As section 15 of Documentation/process/coding-style.rst clearly
describes that compiler will be able to optimize code.
Hence drop inline for get and put helpers for parent.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
device_for_each_child() stops executing callback function for remaining
child devices, if callback hits an error.
Each child mdev device is independent of each other.
While unregistering parent device, mdev core must remove all child mdev
devices.
Therefore, mdev_device_remove_cb() always returns success so that
device_for_each_child doesn't abort if one child removal hits error.
While at it, improve remove and unregister functions for below simplicity.
There isn't need to pass forced flag pointer during mdev parent
removal which invokes mdev_device_remove(). So simplify the flow.
mdev_device_remove() is called from two paths.
1. mdev_unregister_driver()
mdev_device_remove_cb()
mdev_device_remove()
2. remove_store()
mdev_device_remove()
Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
mdev_remove_sysfs_files() should follow exact mirror sequence of a
create, similar to what is followed in error unwinding path of
mdev_create_sysfs_files().
Fixes: 6a62c1dfb5c7 ("vfio/mdev: Re-order sysfs attribute creation")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Instead of masking return error to -EBUSY, return actual error
returned by the driver.
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
There is no need use 'extern' for exported functions.
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Remove unused kref from the mdev_device structure.
Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
During mdev parent registration in mdev_register_device(),
if parent device is duplicate, it releases the reference of existing
parent device.
This is incorrect. Existing parent device should not be touched.
Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest updates from Steven Rostedt:
"Minor updates to ktest.pl
- Handle meta characters in grub memu
- Use configurable reboot return code for handling ssh reboots
- Display names and iteration number on error message"
* tag 'ktest-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: introduce REBOOT_RETURN_CODE to confirm the result of REBOOT
ktest: Add support for meta characters in GRUB_MENU
ktest: Show name and iteration on errors
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-05-06
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Two AF_XDP libbpf fixes for socket teardown; first one an invalid
munmap and the other one an invalid skmap cleanup, both from Björn.
2) More graceful CONFIG_DEBUG_INFO_BTF handling when pahole is not
present in the system to generate vmlinux btf info, from Andrii.
3) Fix libbpf and thus fix perf build error with uClibc on arc
architecture, from Vineet.
4) Fix missing libbpf_util.h header install in libbpf, from William.
5) Exclude bash-completion/bpftool from .gitignore pattern, from Masahiro.
6) Fix up rlimit in test_libbpf_open kselftest test case, from Yonghong.
7) Minor misc cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2019-05-06
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Two x32 JIT fixes: one which has buggy signed comparisons in 64
bit conditional jumps and another one for 64 bit negation, both
from Wang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:
- Allow state reset of printk_once() calls.
- Prevent crashes when dereferencing invalid pointers in vsprintf().
Only the first byte is checked for simplicity.
- Make vsprintf warnings consistent and inlined.
- Treewide conversion of obsolete %pf, %pF to %ps, %pF printf
modifiers.
- Some clean up of vsprintf and test_printf code.
* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
lib/vsprintf: Make function pointer_string static
vsprintf: Limit the length of inlined error messages
vsprintf: Avoid confusion between invalid address and value
vsprintf: Prevent crash when dereferencing invalid pointers
vsprintf: Consolidate handling of unknown pointer specifiers
vsprintf: Factor out %pO handler as kobject_string()
vsprintf: Factor out %pV handler as va_format()
vsprintf: Factor out %p[iI] handler as ip_addr_string()
vsprintf: Do not check address of well-known strings
vsprintf: Consistent %pK handling for kptr_restrict == 0
vsprintf: Shuffle restricted_pointer()
printk: Tie printk_once / printk_deferred_once into .data.once for reset
treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
lib/test_printf: Switch to bitmap_zalloc()
|
|
When the refcount is 0 the device is invisible to netlink. However in the
patch below the refcount = 1 was moved to after the device_add(). This
creates a race where userspace can issue a netlink query after the
device_add() event and not see the device as visible.
Ensure that no uevent is fired before device is fully registered.
Fixes: d79af7242bb2 ("RDMA/device: Expose ib_device_try_get(()")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|