summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2018-11-08net: 8021q: vlan_core: allow use list of vlans for real deviceIvan Khoronzhuk
It's redundancy for the drivers to hold the list of vlans when absolutely the same list exists in vlan core. In most cases it's needed only to traverse the vlan devices, their vids and sync some settings with h/w, so add API to simplify this. At least some of these drivers also can benefit: grep "for_each.*vid" -r drivers/net/ethernet/ drivers/net/ethernet/hisilicon/hns3/hns3_enet.c: drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c: drivers/net/ethernet/qlogic/qlge/qlge_main.c: drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c: drivers/net/ethernet/via/via-rhine.c: drivers/net/ethernet/via/via-velocity.c: drivers/net/ethernet/intel/igb/igb_main.c: drivers/net/ethernet/intel/ice/ice_main.c: drivers/net/ethernet/intel/e1000/e1000_main.c: drivers/net/ethernet/intel/i40e/i40e_main.c: drivers/net/ethernet/intel/e1000e/netdev.c: drivers/net/ethernet/intel/igbvf/netdev.c: drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: drivers/net/ethernet/intel/ixgb/ixgb_main.c: drivers/net/ethernet/intel/ixgbe/ixgbe_main.c: drivers/net/ethernet/amd/xgbe/xgbe-dev.c: drivers/net/ethernet/emulex/benet/be_main.c: drivers/net/ethernet/neterion/vxge/vxge-main.c: drivers/net/ethernet/adaptec/starfire.c: drivers/net/ethernet/brocade/bna/bnad.c: Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: core: dev_addr_lists: add auxiliary func to handle reference address ↵Ivan Khoronzhuk
updates In order to avoid all table update, and only remove or add new address, the auxiliary function exists, named __hw_addr_sync_dev(). It allows end driver do nothing when nothing changed and add/rm when concrete address is firstly added or lastly removed. But it doesn't include cases when an address of real device or vlan was reused by other vlans or vlan/macval devices. For handaling events when address was reused/unreused the patch adds new auxiliary routine - __hw_addr_ref_sync_dev(). It allows to do nothing when nothing was changed and do updates only for an address being added/reused/deleted/unreused. Thus, clone address changes for vlans can be mirrored in the table. The function is exclusive with __hw_addr_sync_dev(). It's responsibility of the end driver to identify address vlan device, if it needs so. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net/mlx5: Fix offsets of ifc reserved fieldsGal Pressman
Fix wrong offsets of reserved fields in ifc file. Issues found using pahole. Signed-off-by: Gal Pressman <pressmangal@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-08udp: Support for error handlers of tunnels with arbitrary destination portStefano Brivio
ICMP error handling is currently not possible for UDP tunnels not employing a receiving socket with local destination port matching the remote one, because we have no way to look them up. Add an err_handler tunnel encapsulation operation that can be exported by tunnels in order to pass the error to the protocol implementing the encapsulation. We can't easily use a lookup function as we did for VXLAN and GENEVE, as protocol error handlers, which would be in turn called by implementations of this new operation, handle the errors themselves, together with the tunnel lookup. Without a socket, we can't be sure which encapsulation error handler is the appropriate one: encapsulation handlers (the ones for FoU and GUE introduced in the next patch, e.g.) will need to check the new error codes returned by protocol handlers to figure out if errors match the given encapsulation, and, in turn, report this error back, so that we can try all of them in __udp{4,6}_lib_err_encap_no_sk() until we have a match. v2: - Name all arguments in err_handler prototypes (David Miller) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: Convert protocol error handlers from void to intStefano Brivio
We'll need this to handle ICMP errors for tunnels without a sending socket (i.e. FoU and GUE). There, we might have to look up different types of IP tunnels, registered as network protocols, before we get a match, so we want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both inet_protos and inet6_protos. These error codes will be used in the next patch. For consistency, return sensible error codes in protocol error handlers whenever handlers can't handle errors because, even if valid, they don't match a protocol or any of its states. This has no effect on existing error handling paths. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08geneve: Allow configuration of DF behaviourStefano Brivio
draft-ietf-nvo3-geneve-08 says: It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191], [RFC1981]) be used by setting the DF bit in the IP header when Geneve packets are transmitted over IPv4 (this is the default with IPv6). Now that ICMP error handling is working for GENEVE, we can comply with this recommendation. Make this configurable, though, to avoid breaking existing setups. By default, DF won't be set. It can be set or inherited from inner IPv4 packets. If it's configured to be inherited and we are encapsulating IPv6, it will be set. This only applies to non-lwt tunnels: if an external control plane is used, tunnel key will still control the DF flag. v2: - DF behaviour configuration only applies for non-lwt tunnels, apply DF setting only if (!geneve->collect_md) in geneve_xmit_skb() (Stephen Hemminger) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08vxlan: Allow configuration of DF behaviourStefano Brivio
Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its value from the IPv4 inner header. If the encapsulated protocol is IPv6 and DF is configured to be inherited, always set it. For IPv4, inheriting DF from the inner header was probably intended from the very beginning judging by the comment to vxlan_xmit(), but it wasn't actually implemented -- also because it would have done more harm than good, without handling for ICMP Fragmentation Needed messages. According to RFC 7348, "Path MTU discovery MAY be used". An expired RFC draft, draft-saum-nvo3-pmtud-over-vxlan-05, whose purpose was to describe PMTUD implementation, says that "is a MUST that Vxlan gateways [...] SHOULD set the DF-bit [...]", whatever that means. Given this background, the only sane option is probably to let the user decide, and keep the current behaviour as default. This only applies to non-lwt tunnels: if an external control plane is used, tunnel key will still control the DF flag. v2: - DF behaviour configuration only applies for non-lwt tunnels, move DF setting to if (!info) block in vxlan_xmit_one() (Stephen Hemminger) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08udp: Handle ICMP errors for tunnels with same destination port on both endpointsStefano Brivio
For both IPv4 and IPv6, if we can't match errors to a socket, try tunnels before ignoring them. Look up a socket with the original source and destination ports as found in the UDP packet inside the ICMP payload, this will work for tunnels that force the same destination port for both endpoints, i.e. VXLAN and GENEVE. Actually, lwtunnels could break this assumption if they are configured by an external control plane to have different destination ports on the endpoints: in this case, we won't be able to trace ICMP messages back to them. For IPv6 redirect messages, call ip6_redirect() directly with the output interface argument set to the interface we received the packet from (as it's the very interface we should build the exception on), otherwise the new nexthop will be rejected. There's no such need for IPv4. Tunnels can now export an encap_err_lookup() operation that indicates a match. Pass the packet to the lookup function, and if the tunnel driver reports a matching association, continue with regular ICMP error handling. v2: - Added newline between network and transport header sets in __udp{4,6}_lib_err_encap() (David Miller) - Removed redundant skb_reset_network_header(skb); in __udp4_lib_err_encap() - Removed redundant reassignment of iph in __udp4_lib_err_encap() (Sabrina Dubroca) - Edited comment to __udp{4,6}_lib_err_encap() to reflect the fact this won't work with lwtunnels configured to use asymmetric ports. By the way, it's VXLAN, not VxLAN (Jiri Benc) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: sched: add an offload graft helperJakub Kicinski
Qdisc graft operation of offload-capable qdiscs performs a few extra steps which are identical among all the qdiscs. Add a helper to share this code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: sched: add an offload dump helperJakub Kicinski
Qdisc dump operation of offload-capable qdiscs performs a few extra steps which are identical among all the qdiscs. Add a helper to share this code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: phy: remove state PHY_ANHeiner Kallweit
After the recent changes in the state machine state PHY_AN isn't used any longer and can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08mlx5: remove support for ib_get_vector_affinitySagi Grimberg
Devices that does not use managed affinity can not export a vector affinity as the consumer relies on having a static mapping it can map to upper layer affinity (e.g. sw queues). If the driver allows the user to set the device irq affinity, then the affinitization of a long term existing entites is not relevant. For example, nvme-rdma controllers queue-irq affinitization is determined at init time so if the irq affinity changes over time, we are no longer aligned. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-08cpuset: Expose cpuset.cpus.subpartitions with cgroup_debugWaiman Long
For debugging purpose, it will be useful to expose the content of the subparts_cpus as a read-only file to see if the code work correctly. However, subparts_cpus will not be used at all in most use cases. So adding a new cpuset file that clutters the cgroup directory may not be desirable. This is now being done by using the hidden "cgroup_debug" kernel command line option to expose a new "cpuset.cpus.subpartitions" file. That option was originally used by the debug controller to expose itself when configured into the kernel. This is now extended to set an internal flag used by cgroup_addrm_files(). A new CFTYPE_DEBUG flag can now be used to specify that a cgroup file should only be created when the "cgroup_debug" option is specified. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-11-08ACPICA: Update version to 20181031Bob Moore
Version 20181031. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-11-08ACPICA: iASL: adding definition and disassembly for TPM2 revision 3Erik Schmauss
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-11-08blk-mq: provide a helper to check if a queue is busyJens Axboe
Returns true if the queue currently has requests pending, false if not. DM can use this to replace the atomic_inc/dec they do per device to see if a device is busy. Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08blk-mq-tag: change busy_iter_fn to return whether to continue or notJens Axboe
We have this functionality in sbitmap, but we don't export it in blk-mq for users of the tags busy iteration. This can be useful for stopping the iteration, if the caller doesn't need to find more requests. Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08libceph: assume argonaut on the server sideIlya Dryomov
No one is running pre-argonaut. In addition one of the argonaut features (NOSRCADDR) has been required since day one (and a half, 2.6.34 vs 2.6.35) of the kernel client. Allow for the possibility of reusing these feature bits later. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2018-11-08regulator: core: Add new max_uV_step constraintDmitry Osipenko
On NVIDIA Tegra30 there is a requirement for regulator "A" to have voltage higher than voltage of regulator "B" by N microvolts, the N value changes depending on the voltage of regulator "B". This is similar to min-spread between voltages of regulators, the difference is that the spread value isn't fixed. This means that extra carefulness is required for regulator "A" to drop its voltage without violating the requirement, hence its voltage should be changed in steps so that its couple "B" could follow (there is also max-spread requirement). Add new "max_uV_step" constraint that breaks voltage change into several steps, each step is limited by the max_uV_step value. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-11-08regulator: core: Limit regulators coupling to a single coupleDmitry Osipenko
Device tree binding was changed in a way that now max-spread values must be defied per regulator pair. Limit number of pairs in order to adapt to the new binding without changing regulators code. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-11-08Merge tag 'compiler-attributes-for-linus-v4.20-rc2' of ↵Linus Torvalds
https://github.com/ojeda/linux Pull compiler attribute fixlets from Miguel Ojeda: "Small improvements to Compiler Attributes: - Define asm_volatile_goto for non-gcc compilers (Nick Desaulniers) - Improve the explanation of compiler_attributes.h" * tag 'compiler-attributes-for-linus-v4.20-rc2' of https://github.com/ojeda/linux: Compiler Attributes: improve explanation of header include/linux/compiler*.h: define asm_volatile_goto
2018-11-08Merge tag 'mtd/fixes-for-4.20-rc2' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD fixes from Boris Brezillon: "MTD changes: - Kill a VLA in sa1100 SPI NOR changes: - Make sure ->addr_width is restored when SFDP parsing fails - Propate errors happening in cqspi_direct_read_execute() NAND changes: - Fix kernel-doc mismatch - Fix nanddev_neraseblocks() to return the correct value - Avoid selection of BCH_CONST_PARAMS when some users require dynamic BCH settings" * tag 'mtd/fixes-for-4.20-rc2' of git://git.infradead.org/linux-mtd: mtd: nand: Fix nanddev_pos_next_page() kernel-doc header mtd: sa1100: avoid VLA in sa1100_setup_mtd mtd: spi-nor: Reset nor->addr_width when SFDP parsing failed mtd: spi-nor: cadence-quadspi: Return error code in cqspi_direct_read_execute() mtd: nand: Fix nanddev_neraseblocks() mtd: nand: drop kernel-doc notation for a deleted function parameter mtd: docg3: don't set conflicting BCH_CONST_PARAMS option
2018-11-08soc/tegra: bpmp: Update ABI headerTimo Alho
Update the firmware header file to a more recent version. The major changes in the new version are: * add a new MRQ for firmware version query ABI and deprecates the old * add ABI to query Tegra194 CPU frequency limits * add ABI to control subset of PCIE UPHY state The new header contains also some editorial changes to the documentation. Signed-off-by: Timo Alho <talho@nvidia.com> Acked-by: Sivaram Nair <sivaramn@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2018-11-08firmware: tegra: Add helper to check for supported MRQsTimo Alho
Add a helper function to check that firmware is supporting a given MRQ command. Signed-off-by: Timo Alho <talho@nvidia.com> Acked-by: Sivaram Nair <sivaramn@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2018-11-08Compiler Attributes: improve explanation of headerMiguel Ojeda
Explain better what "optional" attributes are, and avoid calling them so to avoid confusion. Simply retain "Optional" as a word to look for in the comments. Moreover, add a couple sentences to explain a bit more the intention and the documentation links. Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2018-11-08drm/syncobj: disable the timeline UAPI for now v2Christian König
Until we have sorted out all problems. v2: return -EINVAL during create if flag is set. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/260937/
2018-11-07net: add netif_is_geneve()John Hurley
Add a helper function to determine if the type of a netdev is geneve based on its rtnl_link_ops. This allows drivers that may wish to offload tunnels to check the underlying type of the device. A recent patch added a similar helper to vxlan.h Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net/vlan: remove unused #define HAVE_VLAN_GET_TAGMichał Mirosław
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net/vlan: include the shift in skb_vlan_tag_get_prio()Michał Mirosław
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net/vlan: introduce __vlan_hwaccel_copy_tag() helperMichał Mirosław
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net/vlan: introduce __vlan_hwaccel_clear_tag() helperMichał Mirosław
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: phy: make phy_trigger_machine staticHeiner Kallweit
phy_trigger_machine() is used in phy.c only, so we can make it static. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: phy: bcm7xxx: Add entry for BCM7255Justin Chen
Add support for BCM7255 EPHY. Signed-off-by: Justin Chen <justinpopo6@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07udp: cope with UDP GRO packet misdirectionPaolo Abeni
In some scenarios, the GRO engine can assemble an UDP GRO packet that ultimately lands on a non GRO-enabled socket. This patch tries to address the issue explicitly checking for the UDP socket features before enqueuing the packet, and eventually segmenting the unexpected GRO packet, as needed. We must also cope with re-insertion requests: after segmentation the UDP code calls the helper introduced by the previous patches, as needed. Segmentation is performed by a common helper, which takes care of updating socket and protocol stats is case of failure. rfc v3 -> v1 - fix compile issues with rxrpc - when gso_segment returns NULL, treat is as an error - added 'ipv4' argument to udp_rcv_segment() rfc v2 -> rfc v3 - moved udp_rcv_segment() into net/udp.h, account errors to socket and ns, always return NULL or segs list Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07ipv6: factor out protocol delivery helperPaolo Abeni
So that we can re-use it at the UDP level in the next patch rfc v3 -> v1: - add the helper declaration into the ipv6 header Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07ip: factor out protocol delivery helperPaolo Abeni
So that we can re-use it at the UDP level in a later patch rfc v3 -> v1 - add the helper declaration into the ip header Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07udp: add support for UDP_GRO cmsgPaolo Abeni
When UDP GRO is enabled, the UDP_GRO cmsg will carry the ingress datagram size. User-space can use such info to compute the original packets layout. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07udp: implement GRO for plain UDP sockets.Paolo Abeni
This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also eligible for GRO in the rx path: UDP segments directed to such socket are assembled into a larger GSO_UDP_L4 packet. The core UDP GRO support is enabled with setsockopt(UDP_GRO). Initial benchmark numbers: Before: udp rx: 1079 MB/s 769065 calls/s After: udp rx: 1466 MB/s 24877 calls/s This change introduces a side effect in respect to UDP tunnels: after a UDP tunnel creation, now the kernel performs a lookup per ingress UDP packet, while before such lookup happened only if the ingress packet carried a valid internal header csum. rfc v2 -> rfc v3: - fixed typos in macro name and comments - really enforce UDP_GRO_CNT_MAX, instead of UDP_GRO_CNT_MAX + 1 - acquire socket lock in UDP_GRO setsockopt rfc v1 -> rfc v2: - use a new option to enable UDP GRO - use static keys to protect the UDP GRO socket lookup Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07udp: implement complete book-keeping for encap_neededPaolo Abeni
The *encap_needed static keys are enabled by UDP tunnels and several UDP encapsulations type, but they are never turned off. This can cause unneeded overall performance degradation for systems where such features are used transiently. This patch introduces complete book-keeping for such keys, decreasing the usage at socket destruction time, if needed, and avoiding that the same socket could increase the key usage multiple times. rfc v3 -> v1: - add socket lock around udp_tunnel_encap_enable() rfc v2 -> rfc v3: - use udp_tunnel_encap_enable() in setsockopt() Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: fix raw socket lookup device bind matching with VRFsDuncan Eastoe
When there exist a pair of raw sockets one unbound and one bound to a VRF but equal in all other respects, when a packet is received in the VRF context, __raw_v4_lookup() matches on both sockets. This results in the packet being delivered over both sockets, instead of only the raw socket bound to the VRF. The bound device checks in __raw_v4_lookup() are replaced with a call to raw_sk_bound_dev_eq() which correctly handles whether the packet should be delivered over the unbound socket in such cases. In __raw_v6_lookup() the match on the device binding of the socket is similarly updated to use raw_sk_bound_dev_eq() which matches the handling in __raw_v4_lookup(). Importantly raw_sk_bound_dev_eq() takes the raw_l3mdev_accept sysctl into account. Signed-off-by: Duncan Eastoe <deastoe@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFsMike Manning
Add a sysctl raw_l3mdev_accept to control raw socket lookup in a manner similar to use of tcp_l3mdev_accept for stream and of udp_l3mdev_accept for datagram sockets. Have this default to enabled for reasons of backwards compatibility. This is so as to specify the output device with cmsg and IP_PKTINFO, but using a socket not bound to the corresponding VRF. This allows e.g. older ping implementations to be run with specifying the device but without executing it in the VRF. If the option is disabled, packets received in a VRF context are only handled by a raw socket bound to the VRF, and correspondingly packets in the default VRF are only handled by a socket not bound to any VRF. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: ensure unbound datagram socket to be chosen when not in a VRFMike Manning
Ensure an unbound datagram skt is chosen when not in a VRF. The check for a device match in compute_score() for UDP must be performed when there is no device match. For this, a failure is returned when there is no device match. This ensures that bound sockets are never selected, even if there is no unbound socket. Allow IPv6 packets to be sent over a datagram skt bound to a VRF. These packets are currently blocked, as flowi6_oif was set to that of the master vrf device, and the ipi6_ifindex is that of the slave device. Allow these packets to be sent by checking the device with ipi6_ifindex has the same L3 scope as that of the bound device of the skt, which is the master vrf device. Note that this check always succeeds if the skt is unbound. Even though the right datagram skt is now selected by compute_score(), a different skt is being returned that is bound to the wrong vrf. The difference between these and stream sockets is the handling of the skt option for SO_REUSEPORT. While the handling when adding a skt for reuse correctly checks that the bound device of the skt is a match, the skts in the hashslot are already incorrect. So for the same hash, a skt for the wrong vrf may be selected for the required port. The root cause is that the skt is immediately placed into a slot when it is created, but when the skt is then bound using SO_BINDTODEVICE, it remains in the same slot. The solution is to move the skt to the correct slot by forcing a rehash. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: ensure unbound stream socket to be chosen when not in a VRFMike Manning
The commit a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev") only ensures that the correct socket is selected for packets in a VRF. However, there is no guarantee that the unbound socket will be selected for packets when not in a VRF. By checking for a device match in compute_score() also for the case when there is no bound device and attaching a score to this, the unbound socket is selected. And if a failure is returned when there is no device match, this ensures that bound sockets are never selected, even if there is no unbound socket. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: allow binding socket in a VRF when there's an unbound socketRobert Shearman
Change the inet socket lookup to avoid packets arriving on a device enslaved to an l3mdev from matching unbound sockets by removing the wildcard for non sk_bound_dev_if and instead relying on check against the secondary device index, which will be 0 when the input device is not enslaved to an l3mdev and so match against an unbound socket and not match when the input device is enslaved. Change the socket binding to take the l3mdev into account to allow an unbound socket to not conflict sockets bound to an l3mdev given the datapath isolation now guaranteed. Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07EDAC, skx: Fix randconfig builds in a better wayLuck, Tony
It was previously noted that Kconfig complained about unmet dependencies when trying to configure skx_edac together with CONFIG_ACPI=n. First fix for this checked for ACPI when doing select ACPI_ADXL but this required stub functions for the case where ACPI wasn't selected. It also allowed building a driver that didn't actually work for a system that has non-volatile DIMMs. Arnd Bergmann pointed out that the right fix is to make EDAC_SKX "depend on ACPI". Fixes: a324e9396ca3 ("EDAC, skx: Fix randconfig builds") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> CC: Arnd Bergmann <arnd@arndb.de> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: linux-edac <linux-edac@vger.kernel.org> CC: qiuxu.zhuo@intel.com Link: http://lkml.kernel.org/r/20181106183914.GA26731@agluck-desk
2018-11-07nvme: add separate poll queue mapJens Axboe
Adds support for defining a variable number of poll queues, currently configurable with the 'poll_queues' module parameter. Defaults to a single poll queue. And now we finally have poll support without triggering interrupts! Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07block: add REQ_HIPRI and inherit it from IOCB_HIPRIJens Axboe
We use IOCB_HIPRI to poll for IO in the caller instead of scheduling. This information is not available for (or after) IO submission. The driver may make different queue choices based on the type of IO, so make the fact that we will poll for this IO known to the lower layers as well. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07blk-mq: initial support for multiple queue mapsJens Axboe
Add a queue offset to the tag map. This enables users to map iteratively, for each queue map type they support. Bump maximum number of supported maps to 2, we're now fully able to support more than 1 map. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07blk-mq: cache request hardware queue mappingJens Axboe
We call blk_mq_map_queue() a lot, at least two times for each request per IO, sometimes more. Since we now have an indirect call as well in that function. cache the mapping so we don't have to re-call blk_mq_map_queue() for the same request multiple times. Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07blk-mq: support multiple hctx mapsJens Axboe
Add support for the tag set carrying multiple queue maps, and for the driver to inform blk-mq how many it wishes to support through setting set->nr_maps. This adds an mq_ops helper for drivers that support more than 1 map, mq_ops->rq_flags_to_type(). The function takes request/bio flags and CPU, and returns a queue map index for that. We then use the type information in blk_mq_map_queue() to index the map set. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>