|
Now that we have chip operations for VTU accesses, mark all helpers from
global1_vtu.c as static. Only the various implementations of the
GetNext, LoadPurge and Flush operations need to be exposed.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new vtu_loadpurge operation to the chip info structure to
distinguish between the various implementations of the VTU accesses.
Now that the STU handling is abstracted behind VTU operations, kill the
obsolete MV88E6XXX_FLAG_STU flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new vtu_getnext operation to the chip info structure to
distinguish between the various implementations of the VTU accesses.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that the code writes both VTU and STU data when loading a VTU entry,
load the corresponding STU entry at the same time.
This allows us to get rid of the STU management in the
_mv88e6xxx_vtu_new helper and thus remove the separate implementations
of STU Load/Purge and STU GetNext, as well as the unused family checks.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that the code reads both VTU and STU data during a VTU GetNext
operation, fetch the STU entry data of a VTU entry at the same time.
The STU data bits are masked with the VTU data bits, so they are all
read when a VTU GetNext operation is issued.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extract the generic portion of code to issue an STU GetNext operation,
which will be used in other implementations.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The code to access the VTU Data registers currently only supports the
88E6185 family and similar chips: 2-bit membership adjacent to 2-bit port state.
Even though the 88E6352 family introduced an indirect table to program
the VLAN Spanning Tree states, the usage of the VTU Data registers
remains the same regardless of whether a VTU or an STU operation is issued.
Now that the mv88e6xxx_vtu_entry structure contains both port membership
and state data, factor the code that accesses them into global1_vtu.c.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Even though every switch model has a different way to access the VTU
Data bits, the base implementation of the VTU GetNext operation remains
the same: wait, write the first VID to iterate from, start the
operation, and read the next VID.
Move this generic implementation into global1_vtu.c and abstract the
handling of the start VID (similarly to the ATU GetNext implementation),
before introducing a new chip operation for specific chips.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
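The four-step flow described above can be sketched as follows. This is a
minimal illustration, not the driver's exact code: apart from
mv88e6xxx_g1_vtu_vid_write(), which the log mentions, every example_*
name is an assumption made for this sketch.

  /* Hypothetical sketch of the generic VTU GetNext flow:
   * wait, write the start VID, launch the operation, read the next VID.
   */
  static int example_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
  				    struct mv88e6xxx_vtu_entry *entry)
  {
  	int err;
  
  	/* 1. Wait for any pending VTU operation to complete. */
  	err = example_g1_vtu_op_wait(chip);
  	if (err)
  		return err;
  
  	/* 2. Write the VID to iterate from (the "start" VID). */
  	err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
  	if (err)
  		return err;
  
  	/* 3. Start the GetNext operation. */
  	err = example_g1_vtu_op(chip, EXAMPLE_VTU_OP_GET_NEXT);
  	if (err)
  		return err;
  
  	/* 4. Read back the next VID (and validity) found by the hardware. */
  	return example_g1_vtu_vid_read(chip, entry);
  }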
|
|
Add helpers to access the VTU VID register in the global1_vtu.c file.
At the same time, move mv88e6xxx_g1_vtu_vid_write to the beginning of
_mv88e6xxx_vtu_loadpurge, which adds no functional change but makes
future patches simpler.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add helpers to access the VTU SID register in the global1_vtu.c file.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add helpers to access the VTU FID register in the global1_vtu.c file.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the VTU flush operation to global1_vtu.c and call it from a
mv88e6xxx_vtu_setup helper, similarly to the ATU and PVT setup.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the helper functions to access the Global 1 VTU Operation register
to a new global1_vtu.c file, and get rid of the old underscore prefix
naming convention. This file will be extended with all VTU/STU related
code.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
VLAN aware Marvell chips can program 802.1Q VLAN membership as well as
802.1s per VLAN Spanning Tree state using the same 3 VTU Data registers.
Some chips such as 88E6185 use different Data register offsets for port
state and membership, and program them in a single operation.
Other chips such as 88E6352 use the same register layout but program
them in distinct operations (an indirect table is used for 802.1s).
Newer chips such as 88E6390 use the same offsets for both state and
membership in distinct operations, and thus require multiple data accesses.
To correctly abstract this, split the "data" structure member of
mv88e6xxx_vtu_entry in two "state" and "member" members, before adding
VTU support for newer chips.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
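As a rough illustration of the split described above, the entry now
carries separate per-port arrays along these lines. Field widths and the
DSA_MAX_PORTS bound are assumptions for this sketch, not a verbatim copy
of the driver's structure:

  /* Illustrative layout of mv88e6xxx_vtu_entry after the split. */
  struct mv88e6xxx_vtu_entry {
  	u16	vid;
  	u16	fid;
  	u8	sid;
  	bool	valid;
  	u8	member[DSA_MAX_PORTS];	/* 802.1Q port membership */
  	u8	state[DSA_MAX_PORTS];	/* 802.1s per-port Spanning Tree state */
  };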
|
|
Some chips don't have a VLAN Table Unit, most of them do have a 4K
table, and others such as the 88E6390 family have a 13th bit for the VID.
Add a new max_vid member to the info structure, used both to check for
the presence of a VTU and as the value to iterate from in VTU GetNext
operations.
This makes the MV88E6XXX_FLAG_VTU obsolete, thus remove it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
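A hypothetical illustration of how such a max_vid field can replace the
flag: zero means "no VTU", and a non-zero value doubles as the VID to
start a GetNext walk from (4095 for 4K tables, 8191 with a 13-bit VID).
The helper names below are assumptions for this sketch:

  static bool example_has_vtu(const struct mv88e6xxx_chip *chip)
  {
  	return chip->info->max_vid != 0;
  }
  
  static void example_vtu_walk_init(const struct mv88e6xxx_chip *chip,
  				    struct mv88e6xxx_vtu_entry *entry)
  {
  	/* GetNext wraps around, so starting from max_vid returns the
  	 * first valid entry in the table.
  	 */
  	entry->vid = chip->info->max_vid;
  	entry->valid = false;
  }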
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
- removed twl4030-madc driver
- added ASPEED PWM/fan driver
- various minor improvements and fixes in several drivers
* tag 'hwmon-for-linus-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (36 commits)
hwmon: (twl4030-madc) drop driver
hwmon: (tmp103) Use SIMPLE_DEV_PM_OPS helper macro
hwmon: (adt7475) set start bit in probe
hwmon: (ina209) Handled signed registers
hwmon: (lm87) Add OF device ID table
hwmon: (lm87) Remove unused I2C devices driver_data
drivers: hwmon: Support for ASPEED PWM/Fan tach
Documentation: dt-bindings: Document bindings for ASPEED AST2400/AST2500 PWM and Fan tach controller device driver
hwmon: (lm87) Allow channel data to be set from dts file
Documentation: dtb: lm87: Add hwmon binding documentation
hwmon: (ads7828) Accept optional parameters from device tree
hwmon: (dell-smm) Add Dell XPS 15 9560 into DMI list
hwmon: Constify str parameter of hwmon_ops->read_string
dt: Add vendor prefix for Sensirion
hwmon: (tmp421) Add OF device ID table
hwmon: (tmp103) Add OF device ID table
hwmon: (tmp102) Add OF device ID table
hwmon: (stts751) Add OF device ID table
hwmon: (ucd9200) Add OF device ID table
hwmon: (ucd9000) Add OF device ID table
...
|
|
Pull EDAC updates from Borislav Petkov:
- an EDAC driver for Cavium ThunderX RAS IP (Sergey Temerkhanov)
- removal of DRAM error reporting through PCI SERR NMI (Borislav
Petkov)
- misc small fixes (Jan Glauber, Thor Thayer)
* tag 'edac_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, ghes: Do not enable it by default
EDAC: Rename report status accessors
EDAC: Delete edac_stub.c
EDAC: Update Kconfig help text
EDAC: Remove EDAC_MM_EDAC
EDAC: Issue tracepoint only when it is defined
ACPI/extlog: Add EDAC dependency
EDAC: Move edac_op_state to edac_mc.c
EDAC: Remove edac_err_assert
EDAC: Get rid of edac_handlers
x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI
EDAC, highbank: Align Makefile directives
EDAC, thunderx: Remove unused code
EDAC, thunderx: Change LMC index calculation
EDAC, altera: Fix peripheral warnings for Cyclone5
EDAC, thunderx: Fix L2C MCI interrupt disable
EDAC, thunderx: Add Cavium ThunderX EDAC driver
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2017-04-30
Here's one last batch of Bluetooth patches in the bluetooth-next tree
targeting the 4.12 kernel.
- Remove custom ECDH implementation and use new KPP API instead
- Add protocol checks to hci_ldisc
- Add module license to HCI UART Nokia H4+ driver
- Minor fix for 32bit user space - 64 bit kernel combination
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull second round of block layer updates from Jens Axboe:
- Further fixups to the NVMe APST code, from Andy.
- Various fixes for (mostly) nvme-fc, from Christoph and James.
- NVMe scsi fixes from Jon and Christoph.
* 'for-4.12/post-merge' of git://git.kernel.dk/linux-block: (39 commits)
nvme-scsi: remove nvme_trans_security_protocol
nvme-lightnvm: add missing endianess conversion in nvme_nvm_end_io
nvme-scsi: Consider LBA format in IO splitting calculation
nvme-fc: avoid memory corruption caused by calling nvmf_free_options() twice
lpfc: Fix memory corruption of the lpfc_ncmd->list pointers
nvme: Add nvme_core.force_apst to ignore the NO_APST quirk
nvme: Display raw APST configuration via DYNAMIC_DEBUG
nvme: Fix APST comment
lpfc revison 11.2.0.12
Fix Express lane queue creation.
Update ABORT processing for NVMET.
Fix implicit logo and RSCN handling for NVMET
Add Fabric assigned WWN support.
Fix max_sgl_segments settings for NVME / NVMET
Fix crash after issuing lip reset
Fix driver load issues when MRQ=8
Remove hba lock from NVMET issue WQE.
Fix nvme initiator handling when not enabled.
Fix driver usage of 128B WQEs when WQ_CREATE is V1.
Fix driver unload/reload operation.
...
|
|
Pull block layer updates from Jens Axboe:
- Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
was initially a fork of CFQ, but subsequently changed to implement
fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
to be used on desktop type single drives, providing good fairness.
From Paolo.
- Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
using a scalable token based algorithm that throttles IO based on
live completion IO stats, similarly to blk-wbt. From Omar.
- A series from Jan, moving users to separately allocated backing
devices. This continues the work of separating backing device life
times, solving various problems with hot removal.
- A series of updates for lightnvm, mostly from Javier. Includes a
'pblk' target that exposes an open channel SSD as a physical block
device.
- A series of fixes and improvements for nbd from Josef.
- A series from Omar, removing queue sharing between devices on mostly
legacy drivers. This helps us clean up other bits, if we know that a
queue only has a single device backing. This has been overdue for
more than a decade.
- Fixes for the blk-stats, and improvements to unify the stats and user
windows. This both improves blk-wbt, and enables other users to
register a need to receive IO stats for a device. From Omar.
- blk-throttle improvements from Shaohua. This provides a scalable
framework for implementing scalable prioritization - particularly for
blk-mq, but applicable to any type of block device. The interface is
marked experimental for now.
- Bucketized IO stats for IO polling from Stephen Bates. This improves
efficiency of polled workloads in the presence of mixed block size
IO.
- A few fixes for opal, from Scott.
- A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
From a variety of folks, mostly Sagi and James Smart.
- A series from Bart, improving our exposed info and capabilities from
the blk-mq debugfs support.
- A series from Christoph, cleaning up how we handle WRITE_ZEROES.
- A series from Christoph, cleaning up the block layer handling of how
we track errors in a request. On top of being a nice cleanup, it also
shrinks the size of struct request a bit.
- Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
never used by platforms, and the latter has outlived its usefulness.
- Various little bug fixes and cleanups from a wide variety of folks.
* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
block: hide badblocks attribute by default
blk-mq: unify hctx delay_work and run_work
block: add kblock_mod_delayed_work_on()
blk-mq: unify hctx delayed_run_work and run_work
nbd: fix use after free on module unload
MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
blk-mq-sched: alloate reserved tags out of normal pool
mtip32xx: use runtime tag to initialize command header
scsi: Implement blk_mq_ops.show_rq()
blk-mq: Add blk_mq_ops.show_rq()
blk-mq: Show operation, cmd_flags and rq_flags names
blk-mq: Make blk_flags_show() callers append a newline character
blk-mq: Move the "state" debugfs attribute one level down
blk-mq: Unregister debugfs attributes earlier
blk-mq: Only unregister hctxs for which registration succeeded
blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
blk-mq: Let blk_mq_debugfs_register() look up the queue name
blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
ide-pm: always pass 0 error to __blk_end_request_all
..
|
|
When a netdev is enslaved to a VRF master, its router interface (RIF)
needs to be destroyed (if exists) and a new one created using the
corresponding virtual router (VR).
From the driver's perspective, the above is equivalent to an inetaddr
event sent for this netdev. Therefore, when a port netdev (or one of its
uppers) is enslaved to a VRF master, call the same function that
would've been called had a NETDEV_UP been sent for this netdev in the
inetaddr notification chain.
This patch also fixes a bug when a LAG netdev with an existing RIF is
enslaved to a VRF. Before this patch, each LAG port would drop the
reference on the RIF, but would re-join the same one (in the wrong VR)
soon after. With this patch, the corresponding RIF is first destroyed
and a new one is created using the correct VR.
Fixes: 7179eb5acd59 ("mlxsw: spectrum_router: Add support for VRFs")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
mlx5-updates-2017-04-30
Or says:
================
mlx5 neigh update
This series (whose code name is 'neigh update') from Hadar, enhances the
mlx5 TC IP tunnel offloads to deal with changes to tunnel destination
neighbours used in offloaded flows which involved encapsulation.
In order to keep track of the validity state of such neighbours, we register
a netevent notifier callback and act on NEIGH_UPDATE events: if a neighbour
becomes valid, offload the related flows to HW (the other way around when a
neighbour becomes invalid), and similarly when a neighbour's MAC address changes.
Since this traffic is offloaded from the host OS, the neighbour for the IP
tunnel destination can mistakenly become STALE and be deleted by the kernel
since its 'used' value wasn't changed. To address that, we proactively
update the neighbour 'used' value every DELAY_PROBE_TIME seconds, using
time stamps generated by the existing driver code for HW flow counters.
We use the DELAY_PROBE_TIME_UPDATE event to adjust the frequency of the updates.
Prior to the core of the series, there's a patch from Saeed that introduces an
extendable vport representor implementation scheme. It provides a separation
between the eswitch and the netdev related aspects of the representors.
We would like to thank Ido Schimmel and Ilya Lesokhin for their coaching and
advice through the long design and review cycles while we struggled to
understand and (hopefully correctly) implement the locking around the
different driver flows.
- Or.
=================
Misc Updates:
From Tariq:
Some small performance and trivial code optimization for mlx5 netdev driver
- Optimize poll ICOSQ completion queue
- Use prefetchw when a write is to follow
- Use u8 as ownership type in mlx5e_get_cqe()
From Eran:
- Disable LRO by default on specific setups
From Eli:
- Small cleanup for E-Switch to avoid redundant allocation
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After removing the PTP related initialization from slowpath start,
the remaining PTT entry is required only when CONFIG_RFS_ACCEL is set.
Otherwise, it leads to a warning due to it being unused.
Fixes: d179bd1699fc ("qed: Acquire/release ptt_ptp lock when enabling/disabling PTP")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Report to the RDMA driver whether DPM mode is enabled or disabled in
the HW and, if so, the number of WIDs it supports.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When calculating doorbell BAR partitioning round up the number of
CPUs to the nearest power of 2 so the size of the DPI (per user
section) configured in the hardware will be stored properly and
not truncated.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
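To make the rounding concrete, here is a small self-contained
illustration. The roundup_pow_of_two() below is a user-space stand-in
for the kernel helper of the same name, and the CPU count is just an
example value:

  #include <stdio.h>
  
  /* User-space stand-in for the kernel's roundup_pow_of_two(). */
  static unsigned long roundup_pow_of_two(unsigned long n)
  {
  	unsigned long p = 1;
  
  	while (p < n)
  		p <<= 1;
  	return p;
  }
  
  int main(void)
  {
  	unsigned int num_cpus = 12;	/* example: 12 online CPUs */
  
  	/* Reserve doorbell (DPI) space for the next power of two, so the
  	 * size programmed into the hardware is never truncated.
  	 */
  	printf("CPUs: %u -> DPIs reserved: %lu\n", num_cpus,
  	       roundup_pow_of_two(num_cpus));
  	return 0;
  }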
|
|
Add a mechanism to verify that RoCE resources are released prior to freeing
the bitmaps. If this is not the case, print which resources were not released.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the posting of the ramrod for the purpose of TID deregistration
fails, abort the deregistration operation without using the FW's
return code.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The internal RoCE SQE QP state isn't being used. Instead, we mark the
QP as being in the regular error state.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use time_before_eq() for the time comparison; it is safer, handles
timer wrapping, and is future-proof.
Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
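A small self-contained demonstration of why the signed-difference
comparison is wrap-safe. The macros mirror the definitions in
<linux/jiffies.h>; the values are made up for the example:

  #include <stdio.h>
  
  /* Compare the signed difference, so the result stays correct across
   * counter wrap as long as the two stamps are less than half the
   * counter range apart.
   */
  #define time_after_eq(a, b)	((long)((a) - (b)) >= 0)
  #define time_before_eq(a, b)	time_after_eq(b, a)
  
  int main(void)
  {
  	unsigned long timeout = (unsigned long)-10;	/* just before wrap */
  	unsigned long now = 5;				/* already wrapped */
  
  	/* Naive comparison says "not expired" (wrong); the wrap-safe
  	 * macro says "expired" (correct).
  	 */
  	printf("naive now >= timeout: %d\n", now >= timeout);
  	printf("time_before_eq(timeout, now): %d\n",
  	       time_before_eq(timeout, now));
  	return 0;
  }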
|
|
Try to carry error messages to the user via the netlink extended
ack message attribute.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Try to carry error messages to the user via the netlink extended
ack message attribute.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
reference to fix pmem crash
The x86 conversion to the generic GUP code included a small change which causes
crashes and data corruption in the pmem code - not good.
The root cause is that the /dev/pmem driver code implicitly relies on the x86
get_user_pages() implementation doing a get_page() on the page refcount, because
get_page() does a get_zone_device_page() which properly refcounts pmem's separate
page struct arrays that are not present in the regular page struct structures.
(The pmem driver does this because it can cover huge memory areas.)
But the x86 conversion to the generic GUP code changed the get_page() to
page_cache_get_speculative() which is faster but doesn't do the
get_zone_device_page() call the pmem code relies on.
One way to solve the regression would be to change the generic GUP code to use
get_page(), but that would slow things down a bit and punish other generic-GUP
using architectures for an x86-ism they did not care about. (Arguably the pmem
driver was probably not working reliably for them: but nvdimm is an Intel
feature, so non-x86 exposure is probably still limited.)
So restructure the pmem code's interface with the MM instead: get rid of the
get/put_zone_device_page() distinction, integrate put_zone_device_page() into
__put_page() and restructure the pmem completion-wait and teardown machinery:
Kirill points out that the calls to {get,put}_dev_pagemap() can be
removed from the mm fast path if we take a single get_dev_pagemap()
reference to signify that the page is alive and use the final put of the
page to drop that reference.
This does require some care to make sure that any waits for the
percpu_ref to drop to zero occur *after* devm_memremap_page_release(),
since it now maintains its own elevated reference.
This speeds up things while also making the pmem refcounting more robust going
forward.
Suggested-by: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/149339998297.24933.1129582806028305912.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This patch allows users to enable/disable internal TX and/or RX
clock delay for BCM5481x series PHYs so as to satisfy RGMII timing
specifications.
On a particular platform, whether TX and/or RX clock delay is required
depends on how the PHY is connected to the MAC IP. This requirement can be
specified through the "phy-mode" property in the platform device tree.
Signed-off-by: Abhishek Shah <abhishek.shah@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
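The mapping from "phy-mode" to the internal delays typically follows the
standard RGMII interface modes. A hedged sketch of that selection; the
function and the rx_delay/tx_delay locals are illustrative, only the
phylib PHY_INTERFACE_MODE_* values and phydev->interface are real:

  static void example_rgmii_delay_config(struct phy_device *phydev,
  					 bool *rx_delay, bool *tx_delay)
  {
  	*rx_delay = false;
  	*tx_delay = false;
  
  	switch (phydev->interface) {
  	case PHY_INTERFACE_MODE_RGMII_ID:	/* delays on both directions */
  		*rx_delay = true;
  		*tx_delay = true;
  		break;
  	case PHY_INTERFACE_MODE_RGMII_RXID:	/* RX delay only */
  		*rx_delay = true;
  		break;
  	case PHY_INTERFACE_MODE_RGMII_TXID:	/* TX delay only */
  		*tx_delay = true;
  		break;
  	default:				/* plain RGMII: no internal delays */
  		break;
  	}
  }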
|
|
Trivial fix to spelling mistakes in a printk message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bnx2x driver is not providing proper alignment on the receive buffers it
passes to build_skb(), causing skb_shared_info to be misaligned.
skb_shared_info contains an atomic, and while PPC normally supports
unaligned accesses, it does not support unaligned atomics.
Aligning the size of rx buffers will ensure that page_frag_alloc() returns
aligned addresses.
This can be reproduced on PPC by setting the network MTU to 1450 (or other
non-multiple-of-4) and then generating sufficient inbound network traffic
(one or two large "wget"s usually does it), producing the following oops:
Unable to handle kernel paging request for unaligned access at address 0xc00000ffc43af656
Faulting instruction address: 0xc00000000080ef8c
Oops: Kernel access of bad area, sig: 7 [#1]
SMP NR_CPUS=2048
NUMA
PowerNV
Modules linked in: vmx_crypto powernv_rng rng_core powernv_op_panel leds_powernv led_class nfsd ip_tables x_tables autofs4 xfs lpfc bnx2x mdio libcrc32c crc_t10dif crct10dif_generic crct10dif_common
CPU: 104 PID: 0 Comm: swapper/104 Not tainted 4.11.0-rc8-00088-g4c761da #2
task: c00000ffd4892400 task.stack: c00000ffd4920000
NIP: c00000000080ef8c LR: c00000000080eee8 CTR: c0000000001f8320
REGS: c00000ffffc33710 TRAP: 0600 Not tainted (4.11.0-rc8-00088-g4c761da)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
CR: 24082042 XER: 00000000
CFAR: c00000000080eea0 DAR: c00000ffc43af656 DSISR: 00000000 SOFTE: 1
GPR00: c000000000907f64 c00000ffffc33990 c000000000dd3b00 c00000ffcaf22100
GPR04: c00000ffcaf22e00 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000b80008 c00000ffc43af636 c00000ffc43af656 0000000000000000
GPR12: c0000000001f6f00 c00000000fe1a000 000000000000049f 000000000000c51f
GPR16: 00000000ffffef33 0000000000000000 0000000000008a43 0000000000000001
GPR20: c00000ffc58a90c0 0000000000000000 000000000000dd86 0000000000000000
GPR24: c000007fd0ed10c0 00000000ffffffff 0000000000000158 000000000000014a
GPR28: c00000ffc43af010 c00000ffc9144000 c00000ffcaf22e00 c00000ffcaf22100
NIP [c00000000080ef8c] __skb_clone+0xdc/0x140
LR [c00000000080eee8] __skb_clone+0x38/0x140
Call Trace:
[c00000ffffc33990] [c00000000080fb74] skb_clone+0x74/0x110 (unreliable)
[c00000ffffc339c0] [c000000000907f64] packet_rcv+0x144/0x510
[c00000ffffc33a40] [c000000000827b64] __netif_receive_skb_core+0x5b4/0xd80
[c00000ffffc33b00] [c00000000082b2bc] netif_receive_skb_internal+0x2c/0xc0
[c00000ffffc33b40] [c00000000082c49c] napi_gro_receive+0x11c/0x260
[c00000ffffc33b80] [d000000066483d68] bnx2x_poll+0xcf8/0x17b0 [bnx2x]
[c00000ffffc33d00] [c00000000082babc] net_rx_action+0x31c/0x480
[c00000ffffc33e10] [c0000000000d5a44] __do_softirq+0x164/0x3d0
[c00000ffffc33f00] [c0000000000d60a8] irq_exit+0x108/0x120
[c00000ffffc33f20] [c000000000015b98] __do_irq+0x98/0x200
[c00000ffffc33f90] [c000000000027f14] call_do_irq+0x14/0x24
[c00000ffd4923a90] [c000000000015d94] do_IRQ+0x94/0x110
[c00000ffd4923ae0] [c000000000008d90] hardware_interrupt_common+0x150/0x160
Signed-off-by: David S. Miller <davem@davemloft.net>
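The underlying idea is small: round the fragment size up so that the
skb_shared_info which build_skb() places at the end of the buffer stays
aligned. A hedged sketch of that sizing rule; the field names are
illustrative and this is not the driver's exact expression:

  /* Align the data portion of the RX fragment with SKB_DATA_ALIGN() so
   * page_frag_alloc() returns addresses at which the trailing
   * struct skb_shared_info (which contains an atomic) is aligned.
   */
  fp->rx_frag_size = SKB_DATA_ALIGN(NET_SKB_PAD + fp->rx_buf_size) +
  		   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));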
|
|
Presumably we never hit this return, but static checkers complain that
we need to unlock, so we may as well fix that.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
My static checker complains that we're holding a mutex on this error
path. Let's goto exit instead of returning directly.
Fixes: b0bccb69eba3 ("qed: Change locking scheme for VF channel")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
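This fix and the one above apply the same error-path pattern: route
errors through a single exit label that drops the lock. A generic
sketch, with all example_* names hypothetical:

  static int example_locked_op(struct example_dev *dev)
  {
  	int ret;
  
  	mutex_lock(&dev->lock);
  
  	ret = example_do_work(dev);
  	if (ret)
  		goto out;	/* previously: return ret; which leaked the mutex */
  
  	ret = example_more_work(dev);
  out:
  	mutex_unlock(&dev->lock);
  	return ret;
  }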
|
|
In the hip06 and hip07 SoCs, the PHY is connected to the MDIO bus. The mdio
module is probed with module_init and, as such, is not guaranteed to probe
before the HNS driver, so we need to support deferred probe.
We check for probe deferral in the MAC init, so that when the MDIO bus is
not present yet we do not init the DSAF and free all resources, only to
learn later that we need to defer the probe.
Signed-off-by: lipeng <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
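The deferral itself uses the standard driver-core mechanism. A hedged
sketch of the check in the MAC init path; the only real API here is
of_mdio_find_bus(), everything else is illustrative:

  struct mii_bus *bus = of_mdio_find_bus(mdio_node);
  
  if (!bus) {
  	/* MDIO not probed yet: undo partial setup and ask the driver
  	 * core to retry this probe later.
  	 */
  	example_free_mac_resources(mac_cb);
  	return -EPROBE_DEFER;
  }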
|
|
In the hip06 and hip07 SoCs, the interrupt lines from the
DSAF controllers are connected to the mbigen hw module.
The mbigen module is probed with module_init and, as such,
is not guaranteed to probe before the HNS driver, so we need
to support deferred probe.
Signed-off-by: lipeng <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For legacy reasons NFP FW may be compiled to DMA packets to a constant
offset into the buffer and use the space before it for metadata. This
ensures that packet data always starts at a certain offset regardless of
the amount of preceding metadata.
If rx offset is set to 0 there may still be up to 64 bytes of metadata,
but the metadata will start at the beginning of the buffer instead of at:
data_start_offset = rx_offset - meta_len
Even though we make the buffers larger to accommodate up to 64 bytes of
metadata, if there is only N bytes of metadata, we will end up with
N bytes of headroom and 64 - N bytes of tailroom. Therefore we can't
rely on that space for XDP headroom. Make sure we always allocate
full 256 bytes. This, unfortunately, means we can't fit the headroom
in a u8 any more.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
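An illustration of the layout rule described above; the helper names are
assumptions made for this sketch, not the driver's API:

  /* With a non-zero RX offset the packet data always starts at rx_offset
   * and any metadata ends there; with rx_offset == 0 the metadata starts
   * at the buffer start, so the packet data starts right after it and the
   * headroom varies with meta_len.
   */
  static unsigned int example_meta_start(unsigned int rx_offset,
  					 unsigned int meta_len)
  {
  	return rx_offset ? rx_offset - meta_len : 0;
  }
  
  static unsigned int example_data_start(unsigned int rx_offset,
  					 unsigned int meta_len)
  {
  	return rx_offset ? rx_offset : meta_len;
  }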
|
|
Right now the required Service Process ABI version is still tied
to the max ID of known commands. For the new NSP commands we are adding,
we check whether the NSP version is recent enough on a command-by-command
basis. The driver doesn't have to force the device to have the
very latest flash; anything newer than 0.8 should do.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reading TX queue indexes from the device memory on each interrupt
is expensive. It's doubly expensive with XDP running since we have
two TX rings to check there. If the software indexes indicate that
the TX queue is completely empty, however, we don't need to look at
the device completion index at all.
The queuing CPU is doing a wmb() before kicking the device TX, so
we should be safe to assume that the CPU handling the completions will
never see an old value of the software copy of the index.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
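A minimal sketch of the optimization: if the software producer and
consumer pointers agree, the ring is empty and the device completion
index does not need to be read at all. The structure and function names
below are illustrative, not the driver's exact code:

  struct example_tx_ring {
  	unsigned int wr_p;	/* software write (producer) pointer */
  	unsigned int rd_p;	/* software read (consumer) pointer */
  };
  
  static void example_tx_complete(struct example_tx_ring *ring)
  {
  	/* Nothing in flight: skip the expensive device memory read. */
  	if (ring->rd_p == ring->wr_p)
  		return;
  
  	/* ... read the hardware completion index and reclaim descriptors ... */
  }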
|
|
On the RX path we follow the "drop if allocation of replacement
buffer fails" rule. With XDP we extended that to the TX action,
so if the XDP prog returned TX but allocation of a replacement RX buffer
failed, we will drop the packet.
To improve our XDP TX performance extend the idea of rings being
always full to XDP TX rings. Pre-fill the XDP TX rings with RX
buffers, and when XDP prog returns TX action swap the RX buffer
with the next buffer from the TX ring.
XDP TX complete will no longer free the buffers but let them
sit on the TX ring and wait for swap with RX buffer, instead.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We will soon allocate RX buffers for caching on XDP TX rings.
The rx_ring parameter passed to nfp_net_rx_alloc_one() is not
actually used, so remove it.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As Or points out in commit 423b3aecf290 ("net/mlx4: Change ENOTSUPP
to EOPNOTSUPP"), ENOTSUPP is an NFS-specific error. Replace it with
EOPNOTSUPP.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Avoid hashing the tx napi struct into napi_hash[], which is used for
busy polling receive queues.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Creating a geneve link with 'udpcsum' set results in the creation of a link
for which the UDP checksum will NOT be computed on outbound packets, as can
be seen below.
11: gen0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
link/ether c2:85:27:b6:b4:15 brd ff:ff:ff:ff:ff:ff promiscuity 0
geneve id 200 remote 192.168.13.1 dstport 6081 noudpcsum
Similarly, creating a link with 'noudpcsum' set results in the creation
of a link for which the UDP checksum will be computed on outbound packets.
Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
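A hedged sketch of the intended (fixed) behaviour, not a verbatim diff:
the 'udpcsum' attribute enables checksums, so the checksum flag must be
set when the attribute is true rather than when it is false. The
attribute and flag names are the upstream ones; treat the surrounding
code as illustrative:

  if (data[IFLA_GENEVE_UDP_CSUM] &&
      nla_get_u8(data[IFLA_GENEVE_UDP_CSUM]))	/* bug: a '!' inverted this test */
  	info->key.tun_flags |= TUNNEL_CSUM;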
|
|
The message "Cannot bind port X, err=Y" creates only confusion. In metadata
based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and
no error message should be printed. But when an IPv6 tunnel was requested,
such a failure is fatal. vxlan_socket_create() does not know when the error
is harmless and when it is not.
Instead of passing such information down to vxlan_socket_create(), remove the
message completely. It's not useful. We propagate the error code up to
user space, and the port number comes from user space. There's nothing in
the message that the process creating the vxlan interface does not already know.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When IPv6 is compiled but disabled at runtime, __vxlan_sock_add returns
-EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
operation of bringing up the tunnel.
Ignore failure of IPv6 socket creation for metadata based tunnels caused by
IPv6 not being available.
Fixes: b1be00a6c39f ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
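A hedged sketch of the described behaviour, not a verbatim excerpt of the
driver: tolerate -EAFNOSUPPORT from the IPv6 socket on metadata based
tunnels, since IPv6 being disabled at runtime is not fatal there. The
'metadata' local is illustrative:

  err = __vxlan_sock_add(vxlan, true);		/* IPv6 socket */
  if (err == -EAFNOSUPPORT && metadata)
  	err = 0;				/* fall back to IPv4 only */
  if (err)
  	return err;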
|