summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-07-12ALSA: hda/ca0132: Add Recon3Di quirk for Gigabyte G1.Sniper Z97Alastair Bridgewater
These motherboards have Sound Core3D and apparently "support" Recon3Di. Added to the quirk list as QUIRK_R3DI. Issue report, PCI Subsystem ID, and testing by a contributor on IRC who wished to remain anonymous. Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com> Reviewed-by: Connor McAdams <conmanx360@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-07-12Merge tag 'gvt-fixes-2018-07-11' of https://github.com/intel/gvt-linux into ↵Rodrigo Vivi
drm-intel-fixes gvt-fixes-2018-07-11 - Fix KBL virtual register update from LRI for GPU hang (Henry) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180711024056.GV1267@zhen-hp.sh.intel.com
2018-07-12Merge branch 'be2net-small-structures-clean-up'David S. Miller
Ivan Vecera says: ==================== be2net: small structures clean-up The series: - removes unused / unneccessary fields in several be2net structures - re-order fields in some structures to eliminate holes, cache-lines crosses - as result reduces size of main struct be_adapter by 4kB ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12be2net: move rss_flags field in rss_info to ensure proper alignmentIvan Vecera
The current position of .rss_flags field in struct rss_info causes that fields .rsstable and .rssqueue (both 128 bytes long) crosses cache-line boundaries. Moving it at the end properly align all fields. Before patch: struct rss_info { u64 rss_flags; /* 0 8 */ u8 rsstable[128]; /* 8 128 */ /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */ u8 rss_queue[128]; /* 136 128 */ /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */ u8 rss_hkey[40]; /* 264 40 */ }; After patch: struct rss_info { u8 rsstable[128]; /* 0 128 */ /* --- cacheline 2 boundary (128 bytes) --- */ u8 rss_queue[128]; /* 128 128 */ /* --- cacheline 4 boundary (256 bytes) --- */ u8 rss_hkey[40]; /* 256 40 */ u64 rss_flags; /* 296 8 */ }; Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12be2net: re-order fields in be_error_recovert to avoid holeIvan Vecera
- Unionize two u8 fields where only one of them is used depending on NIC chipset. - Move recovery_supported field after that union These changes eliminate 7-bytes hole in the struct and makes it smaller by 8 bytes. Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12be2net: remove unused tx_jiffies field from be_tx_statsIvan Vecera
Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12be2net: move txcp field in be_tx_obj to eliminate holes in the structIvan Vecera
Before patch: struct be_tx_obj { u32 db_offset; /* 0 4 */ /* XXX 4 bytes hole, try to pack */ struct be_queue_info q; /* 8 56 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct be_queue_info cq; /* 64 56 */ struct be_tx_compl_info txcp; /* 120 4 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ struct sk_buff * sent_skb_list[2048]; /* 128 16384 */ ... }: After patch: struct be_tx_obj { u32 db_offset; /* 0 4 */ struct be_tx_compl_info txcp; /* 4 4 */ struct be_queue_info q; /* 8 56 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct be_queue_info cq; /* 64 56 */ struct sk_buff * sent_skb_list[2048]; /* 120 16384 */ ... }; Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12be2net: reorder fields in be_eq_obj structureIvan Vecera
Re-order fields in struct be_eq_obj to ensure that .napi field begins at start of cache-line. Also the .adapter field is moved to the first cache-line next to .q field and 3 fields (idx,msi_idx,spurious_intr) and the 4-bytes hole to 3rd cache-line. Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12be2net: remove desc field from be_eq_objIvan Vecera
The event queue description (be_eq_obj.desc) field is used only to format string for IRQ name and it is not really needed to hold this value. Remove it and use local variable to format string for IRQ name. Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12be2net: remove unused old custom busy-poll fieldsIvan Vecera
The commit fb6113e688e0 ("be2net: get rid of custom busy poll code") replaced custom busy-poll code by the generic one but left several macros and fields in struct be_eq_obj that are currently unused. Remove this stuff. Fixes: fb6113e688e0 ("be2net: get rid of custom busy poll code") Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12be2net: remove unused old AIC infoIvan Vecera
The commit 2632bafd74ae ("be2net: fix adaptive interrupt coalescing") introduced a separate struct be_aic_obj to hold AIC information but unfortunately left the old stuff in be_eq_obj. So remove it. Fixes: 2632bafd74ae ("be2net: fix adaptive interrupt coalescing") Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12qed: fix spelling mistake "successffuly" -> "successfully"Ewan D. Milne
Trivial fix to spelling mistake in qed_probe message. Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12net: ethernet: ti: cpts: break cycle once late ts is matchedIvan Khoronzhuk
The late ts queue can contain a bunch of skbs while hi rate testing, no need to check all of them if timestamp is already matched. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11selftests: forwarding: mirror_gre_nh: Unset rp_filter on host VRFPetr Machata
The mirrored packets arrive at $h3 encapsulated in GRE/IPv4, with IP address from 192.0.2.128/28 network. However the interface is configured as a member of 192.0.2.160/28 and there's no route directing traffic from the former network through that interface. Correspondingly, the RP filter on the VRF rejects it. Therefore turn off the VRF's RP filter. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12nvme-pci: fix memory leak on probe failureKeith Busch
The nvme driver specific structures need to be initialized prior to enabling the generic controller so we can unwind on failure with out using the reference counting callbacks so that 'probe' and 'remove' can be symmetric. The newly added iod_mempool is the only resource that was being allocated out of order, and a failure there would leak the generic controller memory. This patch just moves that allocation above the controller initialization. Fixes: 943e942e6266f ("nvme-pci: limit max IO size and segments to avoid high order allocations") Reported-by: Weiping Zhang <zwp10758@gmail.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-11sfp: fix module initialisation with netdev already upRussell King
It was been observed that with a particular order of initialisation, the netdev can be up, but the SFP module still has its TX_DISABLE signal asserted. This occurs when the network device brought up before the SFP kernel module has been inserted by userspace. This occurs because sfp-bus layer does not hear about the change in network device state, and so assumes that it is still down. Set netdev->sfp when the upstream is registered to work around this problem. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11sfp: ensure we clean up properly on bus registration failureRussell King
We fail to correctly clean up after a bus registration failure, which can lead to an incorrect assumption about the registration state of the upstream or sfp cage. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11Merge branch 'mlxsw-ERSPAN-Take-LACP-state-into-consideration'David S. Miller
Ido Schimmel says: ==================== mlxsw: ERSPAN: Take LACP state into consideration Petr says: When offloading mirror-to-gretap, mlxsw needs to preroute the path that the encapsulated packet will take. That path may include a LAG device above a front panel port. So far, mlxsw resolved the path to the first up front panel slave of the LAG interface, but that only reflects administrative state of the port. It neglects to consider whether the port actually has a carrier, and what the LACP state is. This patch set aims to address these problems. Patch #1 publishes team_port_get_rcu(). Then in patch #2, a new function is introduced, mlxsw_sp_port_dev_check(). That returns, for a given netdevice that is a slave of a LAG device, whether that device is "txable", i.e. whether the LAG master would send traffic through it. Since there's no good place to put LAG-wide helpers, introduce a new header include/net/lag.h. Finally in patch #3, fix the slave selection logic to take into consideration whether a given slave has a carrier and whether it is txable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11mlxsw: spectrum_span: Change LAG lower selectionPetr Machata
When offloading mirror-to-gretap, mlxsw needs to preroute the path that the encapsulated packet will take. That path may include a LAG device above a front panel port. So far, mlxsw resolved the path to the first up front panel slave of the LAG interface, but that only reflects administrative state of the port. It neglects to consider whether the port actually has a carrier, and what the LACP state is. So instead of checking upness of the device, check carrier state and txability. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net: Add lag.h, net_lag_port_dev_txable()Petr Machata
LAG devices (team or bond) recognize for each one of their slave devices whether LAG traffic is going to be sent through that device. Bond calls such devices "active", team calls them "txable". When this state changes, a NETDEV_CHANGELOWERSTATE notification is distributed, together with a netdev_notifier_changelowerstate_info structure that for LAG devices includes a tx_enabled flag that refers to the new state. The notification thus makes it possible to react to the changes in txability in drivers. However there's no way to query txability from the outside on demand. That is problematic namely for mlxsw, which when resolving ERSPAN packet path, may encounter a LAG device, and needs to determine which of the slaves it should choose. To that end, introduce a new function, net_lag_port_dev_txable(), which determines whether a given slave device is "active" or "txable" (depending on the flavor of the LAG device). That function then dispatches to per-LAG-flavor helpers, bond_is_active_slave_dev() resp. team_port_dev_txable(). Because there currently is no good place where net_lag_port_dev_txable() should be added, introduce a new header file, lag.h, which should from now on hold any logic common to both team and bond. (But keep netif_is_lag_master() together with the rest of netif_is_*_master() functions). Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11team: Publish team_port_get_rcu()Petr Machata
A follow-up patch adds a new entry point, team_port_dev_txable(). Making it an ordinary exported function would mean that any module that may need the service in one of the supported configurations also unconditionally needs to pull in the team module, whether or not the user actually intends to create team interfaces. To prevent that, team_port_dev_txable() is defined in if_team.h, and therefore all dependencies of that function also need to be publicly-visible. Therefore move team_port_get_rcu() from team.c to if_team.h. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11macvlan: Change status when lower device goes downTravis Brown
Today macvlan ignores the notification when a lower device goes administratively down, preventing the lack of connectivity from bubbling up. Processing NETDEV_DOWN results in a macvlan state of LOWERLAYERDOWN with NO-CARRIER which should be easy to interpret in userspace. 2: lower: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 3: macvlan@lower: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 Signed-off-by: Suresh Krishnan <skrishnan@arista.com> Signed-off-by: Travis Brown <travisb@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11Merge branch 'tipc-make-link-protocol-more-resilient'David S. Miller
Jon Maloy says: ==================== tipc: make link protocol more resilient These two commits make the link ptotocol more resilient to infrastructures with frequent packet duplication and long delays. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11tipc: check session number before accepting link protocol messagesJon Maloy
In some virtual environments we observe a significant higher number of packet reordering and delays than we have been used to traditionally. This makes it necessary with stricter checks on incoming link protocol messages' session number, which until now only has been validated for RESET messages. Since the other two message types, ACTIVATE and STATE messages also carry this number, it is easy to extend the validation check to those messages. We also introduce a flag indicating if a link has a valid peer session number or not. This eliminates the mixing of 32- and 16-bit arithmethics we are currently using to achieve this. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11tipc: add sequence number check for link STATE messagesJon Maloy
Some switch infrastructures produce huge amounts of packet duplicates. This becomes a problem if those messages are STATE/NACK protocol messages, causing unnecessary retransmissions of already accepted packets. We now introduce a unique sequence number per STATE protocol message so that duplicates can be identified and ignored. This will also be useful when tracing such cases, and to avert replay attacks when TIPC is encrypted. For compatibility reasons we have to introduce a new capability flag TIPC_LINK_PROTO_SEQNO to handle this new feature. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11Merge branch '10GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== L2 Fwd Offload & 10GbE Intel Driver Updates 2018-07-09 This patch series is meant to allow support for the L2 forward offload, aka MACVLAN offload without the need for using ndo_select_queue. The existing solution currently requires that we use ndo_select_queue in the transmit path if we want to associate specific Tx queues with a given MACVLAN interface. In order to get away from this we need to repurpose the tc_to_txq array and XPS pointer for the MACVLAN interface and use those as a means of accessing the queues on the lower device. As a result we cannot offload a device that is configured as multiqueue, however it doesn't really make sense to configure a macvlan interfaced as being multiqueue anyway since it doesn't really have a qdisc of its own in the first place. The big changes in this set are: Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN Disable XPS for single queue devices Replace accel_priv with sb_dev in ndo_select_queue Add sb_dev parameter to fallback function for ndo_select_queue Consolidated ndo_select_queue functions that appeared to be duplicates ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11tcp: expose both send and receive intervals for rate sampleDeepti Raghavan
Congestion control algorithms, which access the rate sample through the tcp_cong_control function, only have access to the maximum of the send and receive interval, for cases where the acknowledgment rate may be inaccurate due to ACK compression or decimation. Algorithms may want to use send rates and receive rates as separate signals. Signed-off-by: Deepti Raghavan <deeptir@mit.edu> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net: sched: fix unprotected access to rcu cookie pointerVlad Buslov
Fix action attribute size calculation function to take rcu read lock and access act_cookie pointer with rcu dereference. Fixes: eec94fdb0480 ("net: sched: use rcu for action cookie update") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11Merge branch 'cxgb4-move-stats-fetched-from-firmware-to-debugfs'David S. Miller
Rahul Lakkireddy says: ==================== cxgb4: move stats fetched from firmware to debugfs Some stats are fetched via slow firmware mailbox, which can cause packet drops under heavy load. So, this series removes these stats from ethtool -S and expose them via debugfs. Patch 1 removes stats fetched via firmware from ethtool -S. Patch 2 exposes stats removed in Patch 1 via debugfs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11cxgb4: expose stats fetched from firmware via debugfsRahul Lakkireddy
Expose stats obtained from firmware via debugfs. These stats can't be part of ethtool -S because the slow firmware mailbox can cause packet drops under heavy load. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11cxgb4: remove stats fetched from firmwareRahul Lakkireddy
When running ethtool -S, some stats are requested from firmware. Since getting these stats via firmware mailbox is slow, some packets get dropped under heavy load while running ethtool -S. So, remove these stats from ethtool -S. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net: mvpp2: explicitly include linux/interrupt.hAntoine Tenart
The Marvell PPv2 driver uses interrupts and tasklet but does not explicitly include linux/interrupt.h, relying on implicit includes. This one particularly is included by chance after a long unlogical chain of inclusions. Fix this so we do not get future build breaks. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11cnic: use kvzalloc to allocate memory for csk_tblJan Dakinevich
Size of csk_tbl is about 58K, which means 3rd order page allocation. kvzalloc provides a fallback if no high order memory is available. Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11wimax/i2400m: remove redundant variables ack_status, bcf and protocolColin Ian King
Variables ack_status, bcf and protocol are being assigned but are never used hence they are redundant and can be removed. Also declare ack_type as unsigned int rather than unsigned to clean up a checkpatch warning. Cleans up clang warnings: warning: variable 'ack_status' set but not used [-Wunused-but-set-variable] warning: variable 'bcf' set but not used [-Wunused-but-set-variable] warning: variable 'protocol' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net: sched: act_ife: fix memory leak in ife initVlad Buslov
Free params if tcf_idr_check_alloc() returned error. Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11cxgb4: specify IQTYPE in fw_iq_cmdArjun Vynipadath
congestion argument passed to t4_sge_alloc_rxq() is used to differentiate between nic/ofld queues. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11Merge branch 'net-ipv6-addr_gen_mode-fixes'David S. Miller
Sabrina Dubroca says: ==================== net/ipv6: addr_gen_mode fixes This series fixes bugs in handling of the addr_gen_mode option, mainly related to the sysctl. A minor netlink issue was also present in the initial commit introducing the option on a per-netdevice basis. v2: add patch 4, requested by David Ahern during review of v1 add patch 5, missing documentation for the sysctl patches 1, 2, 3 are unchanged ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11Documentation: ip-sysctl.txt: document addr_gen_modeSabrina Dubroca
addr_gen_mode was introduced in without documentation, add it now. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net/ipv6: propagate net.ipv6.conf.all.addr_gen_mode to devicesSabrina Dubroca
This aligns the addr_gen_mode sysctl with the expected behavior of the "all" variant. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net/ipv6: reserve room for IFLA_INET6_ADDR_GEN_MODESabrina Dubroca
inet6_ifla6_size() is called to check how much space is needed by inet6_fill_link_af() and inet6_fill_ifinfo(), both of which include the IFLA_INET6_ADDR_GEN_MODE attribute. Reserve some room for it. Fixes: bc91b0f07ada ("ipv6: addrconf: implement address generation modes") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net/ipv6: don't reinitialize ndev->cnf.addr_gen_mode on new inet6_devSabrina Dubroca
The value has already been copied from this netns's devconf_dflt, it shouldn't be reset to the global kernel default. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net/ipv6: fix addrconf_sysctl_addr_gen_modeSabrina Dubroca
addrconf_sysctl_addr_gen_mode() has multiple problems. First, it ignores the errors returned by proc_dointvec(). addrconf_sysctl_addr_gen_mode() calls proc_dointvec() directly, which writes the value to memory, and then checks if it's valid and may return EINVAL. If a bad value is given, the value displayed when reading net.ipv6.conf.foo.addr_gen_mode next time will be invalid. In case the value provided by the user was valid, addrconf_dev_config() won't be called since idev->cnf.addr_gen_mode has already been updated. Fix this in the usual way we deal with values that need to be checked after the proc_do*() helper has returned: define a local ctl_table and storage, call proc_dointvec() on that temporary area, then check and store. addrconf_sysctl_addr_gen_mode() also writes the new value to the global ipv6_devconf_dflt, when we're writing to some netns's default, so that new netns will inherit the value that was set by the change occuring in any netns. That doesn't make any sense, so let's drop this assignment. Finally, since addr_gen_mode is a __u32, switch to proc_douintvec(). Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net/sched: flower: Fix null pointer dereference when run tc vlan commandJianbo Liu
Zahari issued tc vlan command without setting vlan_ethtype, which will crash kernel. To avoid this, we must check tb[TCA_FLOWER_KEY_VLAN_ETH_TYPE] is not null before use it. Also we don't need to dump vlan_ethtype or cvlan_ethtype in this case. Fixes: d64efd0926ba ('net/sched: flower: Add supprt for matching on QinQ vlan headers') Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reported-by: Zahari Doychev <zahari.doychev@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11bpf: fix panic due to oob in bpf_prog_test_run_skbDaniel Borkmann
sykzaller triggered several panics similar to the below: [...] [ 248.851531] BUG: KASAN: use-after-free in _copy_to_user+0x5c/0x90 [ 248.857656] Read of size 985 at addr ffff8808017ffff2 by task a.out/1425 [...] [ 248.865902] CPU: 1 PID: 1425 Comm: a.out Not tainted 4.18.0-rc4+ #13 [ 248.865903] Hardware name: Supermicro SYS-5039MS-H12TRF/X11SSE-F, BIOS 2.1a 03/08/2018 [ 248.865905] Call Trace: [ 248.865910] dump_stack+0xd6/0x185 [ 248.865911] ? show_regs_print_info+0xb/0xb [ 248.865913] ? printk+0x9c/0xc3 [ 248.865915] ? kmsg_dump_rewind_nolock+0xe4/0xe4 [ 248.865919] print_address_description+0x6f/0x270 [ 248.865920] kasan_report+0x25b/0x380 [ 248.865922] ? _copy_to_user+0x5c/0x90 [ 248.865924] check_memory_region+0x137/0x190 [ 248.865925] kasan_check_read+0x11/0x20 [ 248.865927] _copy_to_user+0x5c/0x90 [ 248.865930] bpf_test_finish.isra.8+0x4f/0xc0 [ 248.865932] bpf_prog_test_run_skb+0x6a0/0xba0 [...] After scrubbing the BPF prog a bit from the noise, turns out it called bpf_skb_change_head() for the lwt_xmit prog with headroom of 2. Nothing wrong in that, however, this was run with repeat >> 0 in bpf_prog_test_run_skb() and the same skb thus keeps changing until the pskb_expand_head() called from skb_cow() keeps bailing out in atomic alloc context with -ENOMEM. So upon return we'll basically have 0 headroom left yet blindly do the __skb_push() of 14 bytes and keep copying data from there in bpf_test_finish() out of bounds. Fix to check if we have enough headroom and if pskb_expand_head() fails, bail out with error. Another bug independent of this fix (but related in triggering above) is that BPF_PROG_TEST_RUN should be reworked to reset the skb/xdp buffer to it's original state from input as otherwise repeating the same test in a loop won't work for benchmarking when underlying input buffer is getting changed by the prog each time and reused for the next run leading to unexpected results. Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command") Reported-by: syzbot+709412e651e55ed96498@syzkaller.appspotmail.com Reported-by: syzbot+54f39d6ab58f39720a55@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-07-11ARM: 8780/1: ftrace: Only set kernel memory back to read-only after bootSteven Rostedt (VMware)
Dynamic ftrace requires modifying the code segments that are usually set to read-only. To do this, a per arch function is called both before and after the ftrace modifications are performed. The "before" function will set kernel code text to read-write to allow for ftrace to make the modifications, and the "after" function will set the kernel code text back to "read-only" to keep the kernel code text protected. The issue happens when dynamic ftrace is tested at boot up. The test is done before the kernel code text has been set to read-only. But the "before" and "after" calls are still performed. The "after" call will change the kernel code text to read-only prematurely, and other boot code that expects this code to be read-write will fail. The solution is to add a variable that is set when the kernel code text is expected to be converted to read-only, and make the ftrace "before" and "after" calls do nothing if that variable is not yet set. This is similar to the x86 solution from commit 162396309745 ("ftrace, x86: make kernel text writable only for conversions"). Link: http://lkml.kernel.org/r/20180620212906.24b7b66e@vmware.local.home Reported-by: Stefan Agner <stefan@agner.ch> Tested-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-07-11bpf: btf: Fix bitfield extraction for big endianOkash Khawaja
When extracting bitfield from a number, btf_int_bits_seq_show() builds a mask and accesses least significant byte of the number in a way specific to little-endian. This patch fixes that by checking endianness of the machine and then shifting left and right the unneeded bits. Thanks to Martin Lau for the help in navigating potential pitfalls when dealing with endianess and for the final solution. Fixes: b00b8daec828 ("bpf: btf: Add pretty print capability for data with BTF type info") Signed-off-by: Okash Khawaja <osk@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-11bpf: fix availability probing for seg6 helpersMathieu Xhonneux
bpf_lwt_seg6_* helpers require CONFIG_IPV6_SEG6_BPF, and currently return -EOPNOTSUPP to indicate unavailability. This patch forces the BPF verifier to reject programs using these helpers when !CONFIG_IPV6_SEG6_BPF, allowing users to more easily probe if they are available or not. Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-11RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error pathKamal Heib
Fix memory leak in the error path of mlx5_ib_create_srq() by making sure to free the allocated srq. Fixes: c2b37f76485f ("IB/mlx5: Fix integer overflows in mlx5_ib_create_srq") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11Merge branch 'bpf-bpftool-improved-prog-load'Daniel Borkmann
Jakub Kicinski says: ==================== This series starts with two minor clean ups to test_offload.py selftest script. The next 11 patches extend the abilities of bpftool prog load beyond the simple cgroup use cases. Three new parameters are added: - type - allows specifying program type, independent of how code sections are named; - map - allows reusing existing maps, instead of creating a new map on every program load; - dev - offload/binding to a device. A number of changes to libbpf is required to accomplish the task. The section - program type logic mapping is exposed. We should probably aim to use the libbpf program section naming everywhere. For reuse of maps we need to allow users to set FD for bpf map object in libbpf. Examples Load program my_xdp.o and pin it as /sys/fs/bpf/my_xdp, for xdp program type: $ bpftool prog load my_xdp.o /sys/fs/bpf/my_xdp \ type xdp As above but for offload: $ bpftool prog load my_xdp.o /sys/fs/bpf/my_xdp \ type xdp \ dev netdevsim0 Load program my_maps.o, but for the first map reuse map id 17, and for the map called "other_map" reuse pinned map /sys/fs/bpf/map0: $ bpftool prog load my_maps.o /sys/fs/bpf/prog \ map idx 0 id 17 \ map name other_map pinned /sys/fs/bpf/map0 v3: - fix return codes in patch 5; - rename libbpf_prog_type_by_string() -> libbpf_prog_type_by_name(); - fold file path into xattr in patch 8; - add patch 10; - use dup3() in patch 12; - depend on fd value in patch 12; - close old fd in patch 12. v2: - add compat for reallocarray(). ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-11tools: bpftool: allow reuse of maps with bpftool prog loadJakub Kicinski
Add map parameter to prog load which will allow reuse of existing maps instead of creating new ones. We need feature detection and compat code for reallocarray, since it's not available in many libc versions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>