summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-04-18arm64: futex: Restore oldval initialization to work around buggy compilersNathan Chancellor
Commit 045afc24124d ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value") removed oldval's zero initialization in arch_futex_atomic_op_inuser because it is not necessary. Unfortunately, Android's arm64 GCC 4.9.4 [1] does not agree: ../kernel/futex.c: In function 'do_futex': ../kernel/futex.c:1658:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized] return oldval == cmparg; ^ In file included from ../kernel/futex.c:73:0: ../arch/arm64/include/asm/futex.h:53:6: note: 'oldval' was declared here int oldval, ret, tmp; ^ GCC fails to follow that when ret is non-zero, futex_atomic_op_inuser returns right away, avoiding the uninitialized use that it claims. Restoring the zero initialization works around this issue. [1]: https://android.googlesource.com/platform/prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/ Cc: stable@vger.kernel.org Fixes: 045afc24124d ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value") Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-04-18ice: Calculate ITR increment based on direct calculationBrett Creeley
Currently when calculating how much to increment ITR by inside of ice_update_itr() we do some estimations and intermediate calculations. Instead of doing estimations, just do the calculation directly. This allows for a more accurate value and it makes it easier for the next person to understand and update. Also, remove the dividing the ITR value by 2 when latency driven because the ITR values are already so low for 100Gbps speed. This should help get to the desired ITR value faster. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Bump driver versionAnirudh Venkataramanan
Update driver version to 0.7.4 Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Add code to control FW LLDP and DCBXAnirudh Venkataramanan
This patch adds code to start or stop LLDP and DCBX in firmware through use of ethtool private flags. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Add code for DCB rebuildAnirudh Venkataramanan
This patch introduces a new function ice_dcb_rebuild which reinitializes DCB after a reset. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Add code to get DCB related statisticsAnirudh Venkataramanan
This patch adds a new function ice_update_dcb_stats to get DCB stats from the hardware and ethtool support for displaying these stats. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Add priority information into VLAN headerAnirudh Venkataramanan
This patch introduces a new function ice_tx_prepare_vlan_flags_dcb to insert 802.1p priority information into the VLAN header Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Update rings based on TC informationAnirudh Venkataramanan
This patch adds a new function ice_vsi_cfg_dcb_rings which updates a VSI's rings based on DCB traffic class information. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Add code to process LLDP MIB change eventsAnirudh Venkataramanan
This patch adds support to process LLDP MIB change notifications sent by the firmware. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Add code for DCB initialization part 4/4Anirudh Venkataramanan
When the firmware doesn't support LLDP or DCBX, the driver should switch to "software LLDP mode". This patch adds support for the same. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Add code for DCB initialization part 3/4Anirudh Venkataramanan
This patch adds a new function ice_pf_dcb_cfg (and related helpers) which applies the DCB configuration obtained from the firmware. As part of this, VSIs/netdevs are updated with traffic class information. This patch requires a bit of a refactor of existing code. 1. For a MIB change event, the associated VSI is closed and brought up again. The gap between closing and opening the VSI can cause a race condition. Fix this by grabbing the rtnl_lock prior to closing the VSI and then only free it after re-opening the VSI during a MIB change event. 2. ice_sched_query_elem is used in ice_sched.c and with this patch, in ice_dcb.c as well. However, ice_dcb.c is not built when CONFIG_DCB is unset. This results in namespace warnings (ice_sched.o: Externally defined symbols with no external references) when CONFIG_DCB is unset. To avoid this move ice_sched_query_elem from ice_sched.c to ice_common.c. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Add code for DCB initialization part 2/4Anirudh Venkataramanan
This patch introduces a new top level function ice_init_dcb (and related lower level helper functions) which continues the DCB init flow. This function uses ice_get_dcb_cfg to get, parse and store the DCB configuration. Once this is done, it sets itself up to be notified by the firmware on LLDP MIB change events. Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Add code for DCB initialization part 1/4Anirudh Venkataramanan
This patch introduces a skeleton for ice_init_pf_dcb, the top level function for DCB initialization. Subsequent patches will add to this DCB init flow. In this patch, ice_init_pf_dcb checks if DCB is a supported capability. If so, an admin queue call to start the LLDP and DCBx in firmware is issued. If not, an error is reported. Note that we don't fail the driver init if DCB init fails. Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Bump versionAnirudh Venkataramanan
Bump driver version to 0.7.3 Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Fix incorrect use of abbreviationsAnirudh Venkataramanan
Capitalize abbreviations and spell out some that aren't obvious. Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18ice: Fix typos in code commentsAnirudh Venkataramanan
This patch fixes typos in code comments. Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18signal: use fdget() since we don't allow O_PATHChristian Brauner
As stated in the original commit for pidfd_send_signal() we don't allow to signal processes through O_PATH file descriptors since it is semantically equivalent to a write on the pidfd. We already correctly error out right now and return EBADF if an O_PATH fd is passed. This is because we use file->f_op to detect whether a pidfd is passed and O_PATH fds have their file->f_op set to empty_fops in do_dentry_open() and thus fail the test. Thus, there is no regression. It's just semantically correct to use fdget() and return an error right from there instead of taking a reference and returning an error later. Signed-off-by: Christian Brauner <christian@brauner.io> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jann Horn <jann@thejh.net> Cc: David Howells <dhowells@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-18Merge tag 's390-5.1-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 bug fixes from Martin Schwidefsky: - Fix overwrite of the initial ramdisk due to misuse of IS_ENABLED - Fix integer overflow in the dasd driver resulting in incorrect number of blocks for large devices - Fix a lockdep false positive in the 3270 driver - Fix a deadlock in the zcrypt driver - Fix incorrect debug feature entries in the pkey api - Fix inline assembly constraints fallout with CONFIG_KASAN=y * tag 's390-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: correct some inline assembly constraints s390/pkey: add one more argument space for debug feature entry s390/zcrypt: fix possible deadlock situation on ap queue remove s390/3270: fix lockdep false positive on view->lock s390/dasd: Fix capacity calculation for large volumes s390/mem_detect: Use IS_ENABLED(CONFIG_BLK_DEV_INITRD)
2019-04-18Merge tag 'afs-fixes-20190413' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: - Stop using the deprecated get_seconds(). - Don't make tracepoint strings const as the section they go in isn't read-only. - Differentiate failure due to unmarshalling from other failure cases. We shouldn't abort with RXGEN_CC/SS_UNMARSHAL if it's not due to unmarshalling. - Add a missing unlock_page(). - Fix the interaction between receiving a notification from a server that it has invalidated all outstanding callback promises and a client call that we're in the middle of making that will get a new promise. * tag 'afs-fixes-20190413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix in-progess ops to ignore server-level callback invalidation afs: Unlock pages for __pagevec_release() afs: Differentiate abort due to unmarshalling from other errors afs: Avoid section confusion in CM_NAME afs: avoid deprecated get_seconds()
2019-04-18Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a bug in the implementation of the x86 accelerated version of poly1305" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/poly1305 - fix overflow during partial reduction
2019-04-18Merge tag 'drm-fixes-2019-04-18' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Since Easter is looming for me, I'm just pushing whatever is in my tree, I'll see what else turns up and maybe I'll send another pull early next week if there is anything. tegra: - stream id programming fix - avoid divide by 0 for bad hdmi audio setup code ttm: - Hugepages fix - refcount imbalance in error path fix amdgpu: - GPU VM fixes for Vega/RV - DC AUX fix for active DP-DVI dongles - DC fix for multihead regression" * tag 'drm-fixes-2019-04-18' of git://anongit.freedesktop.org/drm/drm: drm/tegra: hdmi: Setup audio only if configured drm/amd/display: If one stream full updates, full update all planes drm/amdgpu/gmc9: fix VM_L2_CNTL3 programming drm/amdgpu: shadow in shadow_list without tbo.mem.start cause page fault in sriov TDR gpu: host1x: Program stream ID to bypass without SMMU drm/amd/display: extending AUX SW Timeout drm/ttm: fix dma_fence refcount imbalance on error path drm/ttm: fix incrementing the page pointer for huge pages drm/ttm: fix start page for huge page check in ttm_put_pages() drm/ttm: fix out-of-bounds read in ttm_put_pages() v2
2019-04-18rtlwifi: rtl8188ee: Remove extraneous fileLarry Finger
Somehow file drivers/net/wireless/realtek/rtlwifi/rtl8188ee/trx.c.rej was incorporated into the sources. Obviously, it can be removed. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-04-18timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze()Chang-An Chen
tick_freeze() introduced by suspend-to-idle in commit 124cf9117c5f ("PM / sleep: Make it possible to quiesce timers during suspend-to-idle") uses timekeeping_suspend() instead of syscore_suspend() during suspend-to-idle. As a consequence generic sched_clock will keep going because sched_clock_suspend() and sched_clock_resume() are not invoked during suspend-to-idle which can result in a generic sched_clock wrap. On a ARM system with suspend-to-idle enabled, sched_clock is registered as "56 bits at 13MHz, resolution 76ns, wraps every 4398046511101ns", which means the real wrapping duration is 8796093022202ns. [ 134.551779] suspend-to-idle suspend (timekeeping_suspend()) [ 1204.912239] suspend-to-idle resume (timekeeping_resume()) ...... [ 1206.912239] suspend-to-idle suspend (timekeeping_suspend()) [ 5880.502807] suspend-to-idle resume (timekeeping_resume()) ...... [ 6000.403724] suspend-to-idle suspend (timekeeping_suspend()) [ 8035.753167] suspend-to-idle resume (timekeeping_resume()) ...... [ 8795.786684] (2)[321:charger_thread]...... [ 8795.788387] (2)[321:charger_thread]...... [ 0.057226] (0)[0:swapper/0]...... [ 0.061447] (2)[0:swapper/2]...... sched_clock was not stopped during suspend-to-idle, and sched_clock_poll hrtimer was not expired because timekeeping_suspend() was invoked during suspend-to-idle. It makes sched_clock wrap at kernel time 8796s. To prevent this, invoke sched_clock_suspend() and sched_clock_resume() in tick_freeze() together with timekeeping_suspend() and timekeeping_resume(). Fixes: 124cf9117c5f (PM / sleep: Make it possible to quiesce timers during suspend-to-idle) Signed-off-by: Chang-An Chen <chang-an.chen@mediatek.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kees Cook <keescook@chromium.org> Cc: Corey Minyard <cminyard@mvista.com> Cc: <linux-mediatek@lists.infradead.org> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: <kuohong.wang@mediatek.com> Cc: <freddy.hsin@mediatek.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1553828349-8914-1-git-send-email-chang-an.chen@mediatek.com
2019-04-18perf/x86/amd: Add event map for AMD Family 17hKim Phillips
Family 17h differs from prior families by: - Does not support an L2 cache miss event - It has re-enumerated PMC counters for: - L2 cache references - front & back end stalled cycles So we add a new amd_f17h_perfmon_event_map[] so that the generic perf event names will resolve to the correct h/w events on family 17h and above processors. Reference sections 2.1.13.3.3 (stalls) and 2.1.13.3.6 (L2): https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: <stable@vger.kernel.org> # v4.9+ Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors") [ Improved the formatting a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18x86/mm/KASLR: Fix the size of the direct mapping sectionBaoquan He
kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate the maximum amount of system RAM supported. The size of the direct mapping section is obtained from the smaller one of the below two values: (actual system RAM size + padding size) vs (max system RAM size supported) This calculation is wrong since commit b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52"). In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether the kernel is using 4-level or 5-level page tables. Thus, it will always use 4 PB as the maximum amount of system RAM, even in 4-level paging mode where it should actually be 64 TB. Thus, the size of the direct mapping section will always be the sum of the actual system RAM size plus the padding size. Even when the amount of system RAM is 64 TB, the following layout will still be used. Obviously KALSR will be weakened significantly. |____|_______actual RAM_______|_padding_|______the rest_______| 0 64TB ~120TB Instead, it should be like this: |____|_______actual RAM_______|_________the rest______________| 0 64TB ~120TB The size of padding region is controlled by CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default. The above issue only exists when CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value, which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise, using __PHYSICAL_MASK_SHIFT doesn't affect KASLR. Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS. [ bp: Massage commit message. ] Fixes: b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52") Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Garnier <thgarnie@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: frank.ramsay@hpe.com Cc: herbert@gondor.apana.org.au Cc: kirill@shutemov.name Cc: mike.travis@hpe.com Cc: thgarnie@google.com Cc: x86-ml <x86@kernel.org> Cc: yamada.masahiro@socionext.com Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv
2019-04-17s390: ctcm: fix ctcm_new_device error return codeArnd Bergmann
clang points out that the return code from this function is undefined for one of the error paths: ../drivers/s390/net/ctcm_main.c:1595:7: warning: variable 'result' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (priv->channel[direction] == NULL) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/s390/net/ctcm_main.c:1638:9: note: uninitialized use occurs here return result; ^~~~~~ ../drivers/s390/net/ctcm_main.c:1595:3: note: remove the 'if' if its condition is always false if (priv->channel[direction] == NULL) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/s390/net/ctcm_main.c:1539:12: note: initialize the variable 'result' to silence this warning int result; ^ Make it return -ENODEV here, as in the related failure cases. gcc has a known bug in underreporting some of these warnings when it has already eliminated the assignment of the return code based on some earlier optimization step. Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17nfp: abm: fix spelling mistake "offseting" -> "offsetting"Colin Ian King
There are a couple of spelling mistakes in NL_SET_ERR_MSG_MOD error messages. Fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17net: stmmac: Use bfsize1 in ndesc_init_rx_descYueHaibing
gcc warn this: drivers/net/ethernet/stmicro/stmmac/norm_desc.c: In function ndesc_init_rx_desc: drivers/net/ethernet/stmicro/stmmac/norm_desc.c:138:6: warning: variable 'bfsize1' set but not used [-Wunused-but-set-variable] Like enh_desc_init_rx_desc, we should use bfsize1 in ndesc_init_rx_desc to calculate 'p->des1' Fixes: 583e63614149 ("net: stmmac: use correct DMA buffer size in the RX descriptor") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17net ipv6: Prevent neighbor add if protocol is disabled on deviceDavid Ahern
Disabling IPv6 on an interface removes existing entries but nothing prevents new entries from being manually added. To that end, add a new neigh_table operation, allow_add, that is called on RTM_NEWNEIGH to see if neighbor entries are allowed on a given device. If IPv6 is disabled on the device, allow_add returns false and passes a message back to the user via extack. $ echo 1 > /proc/sys/net/ipv6/conf/eth1/disable_ipv6 $ ip -6 neigh add fe80::4c88:bff:fe21:2704 dev eth1 lladdr de:ad:be:ef:01:01 Error: IPv6 is disabled on this device. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17Merge branch 'ipv6-Use-fib6_result-for-fib_lookups'David S. Miller
David Ahern says: ==================== ipv6: Use fib6_result for fib_lookups Add fib6_result as a single data structure to hold results from a fib lookup. IPv6 currently has everything in 1 data structure - a fib6_info, but with nexthop objects the fib6_nh can be in a nexthop or a nexthop can be a blackhole which affects the fib6_type and flags (REJECT). v2 - fixed 2 bugs in patch12: i. checking return from fib6_table_lookup in fib6_lookup ii. call to fib6_rule_saddr in fib6_rule_action_alt should use res->nh ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Add fib6_type and fib6_flags to fib6_resultDavid Ahern
Add the fib6_flags and fib6_type to fib6_result. Update the lookup helpers to set them and update post fib lookup users to use the version from the result. This allows nexthop objects to have blackhole nexthop. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to fib lookupsDavid Ahern
Change fib6_lookup and fib6_table_lookup to take a fib6_result and set f6i and nh rather than returning a fib6_info. For now both always return 0. A later patch set can make these more like the IPv4 counterparts and return EINVAL, EACCESS, etc based on fib6_type. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to fib6_table_lookup tracepointDavid Ahern
Change fib6_table_lookup tracepoint to take the fib6_result and use the fib6_info and fib6_nh from it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to rt6_select and find_rr_leafDavid Ahern
Pass fib6_result to rt6_select. Instead of returning the fib entry, it will set f6i and nh based on the lookup. find_rr_leaf is changed to remove the match option in favor of taking fib6_result and having __find_rr_leaf set f6i in the result. In the process, update fib6_info references in __find_rr_leaf to f6i names. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to rt6_device_matchDavid Ahern
Pass fib6_result to rt6_device_match with f6i set. rt6_device_match updates f6i in the result if it finds a better match and sets nh. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to ip6_mtu_from_fib6 and fib6_mtuDavid Ahern
Change ip6_mtu_from_fib6 and fib6_mtu to take a fib6_result over a fib6_info. Update both to use the fib6_nh from fib6_result. Since the signature of ip6_mtu_from_fib6 is already changing, add const to daddr and saddr. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to rt6_insert_exceptionDavid Ahern
Update rt6_insert_exception to take a fib6_result over a fib6_info. Change ort to f6i from the fib6_result and rename to better reflect what it references (a fib6_info). Since this function is already getting changed, update the comments to reference fib6_info variables rather than the older rt6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to ip6_rt_get_dev_rcu and ip6_rt_copy_initDavid Ahern
Now that all callers are update to have a fib6_result, pass it down to ip6_rt_get_dev_rcu, ip6_rt_copy_init, and ip6_rt_init_dst. In the process, change ort to f6i in ip6_rt_copy_init to make it clear it is a reference to a fib6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to pcpu route functionsDavid Ahern
Update ip6_rt_pcpu_alloc, rt6_get_pcpu_route and rt6_make_pcpu_route to a fib6_result over a fib6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to ip6_create_rt_rcuDavid Ahern
Change ip6_create_rt_rcu to take fib6_result over a fib6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to ip6_rt_cache_allocDavid Ahern
Change ip6_rt_cache_alloc to take a fib6_result over a fib6_info. Since ip6_rt_cache_alloc is only the caller, update the rt6_is_gw_or_nonexthop helper to take fib6_result. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Pass fib6_result to rt6_find_cached_rtDavid Ahern
Simplify rt6_find_cached_rt for the fast path cases and pass fib6_result to rt6_find_cached_rt. Rename the local return variable to ret to maintain consisting with fib6_result name. Update the comment in rt6_find_cached_rt to reference the new names in a fib6_info vs the old name when fib entries were an rt6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17ipv6: Rename fib6_multipath_select and pass fib6_resultDavid Ahern
Add 'struct fib6_result' to hold the fib entry and fib6_nh from a fib lookup as separate entries, similar to what IPv4 now has with fib_result. Rename fib6_multipath_select to fib6_select_path, pass fib6_result to it, and set f6i and nh in the result once a path selection is done. Call fib6_select_path unconditionally for path selection which means moving the sibling and oif check to fib6_select_path. To handle the two different call paths (2 only call multipath_select if flowi6_oif == 0 and the other always calls it), add a new have_oif_match that controls the sibling walk if relevant. Update callers of fib6_multipath_select accordingly and have them use the fib6_info and fib6_nh from the result. This is needed for multipath nexthop objects where a single f6i can point to multiple fib6_nh (similar to IPv4). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17selftests/bpf: fix a compilation errorYonghong Song
I hit the following compilation error with gcc 4.8.5. prog_tests/flow_dissector.c: In function ‘test_flow_dissector’: prog_tests/flow_dissector.c:155:2: error: ‘for’ loop initial declarations are only allowed in C99 mode for (int i = 0; i < ARRAY_SIZE(tests); i++) { ^ prog_tests/flow_dissector.c:155:2: note: use option -std=c99 or -std=gnu99 to compile your code Let us fix the issue by avoiding this particular c99 feature. Fixes: a5cb33464e53 ("selftests/bpf: make flow dissector tests more extensible") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-17Merge branch 'bulk-cpumap-redirect'Alexei Starovoitov
Jesper Dangaard Brouer says: ==================== This patchset utilize a number of different kernel bulk APIs for optimizing the performance for the XDP cpumap redirect feature. Benchmark details are available here: https://github.com/xdp-project/xdp-project/blob/master/areas/cpumap/cpumap03-optimizations.org Performance measurements can be considered micro benchmarks, as they measure dropping packets at different stages in the network stack. Summary based on above: Baseline benchmarks - baseline-redirect: UdpNoPorts: 3,180,074 - baseline-redirect: iptables-raw drop: 6,193,534 Patch1: bpf: cpumap use ptr_ring_consume_batched - redirect: UdpNoPorts: 3,327,729 - redirect: iptables-raw drop: 6,321,540 Patch2: net: core: introduce build_skb_around - redirect: UdpNoPorts: 3,221,303 - redirect: iptables-raw drop: 6,320,066 Patch3: bpf: cpumap do bulk allocation of SKBs - redirect: UdpNoPorts: 3,290,563 - redirect: iptables-raw drop: 6,650,112 Patch4: bpf: cpumap memory prefetchw optimizations for struct page - redirect: UdpNoPorts: 3,520,250 - redirect: iptables-raw drop: 7,649,604 In this V2 submission I have chosen drop the SKB-list patch using netif_receive_skb_list() as it was not showing a performance improvement for these micro benchmarks. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-17bpf: cpumap memory prefetchw optimizations for struct pageJesper Dangaard Brouer
A lot of the performance gain comes from this patch. While analysing performance overhead it was found that the largest CPU stalls were caused when touching the struct page area. It is first read with a READ_ONCE from build_skb_around via page_is_pfmemalloc(), and when freed written by page_frag_free() call. Measurements show that the prefetchw (W) variant operation is needed to achieve the performance gain. We believe this optimization it two fold, first the W-variant saves one step in the cache-coherency protocol, and second it helps us to avoid the non-temporal prefetch HW optimizations and bring this into all cache-levels. It might be worth investigating if prefetch into L2 will have the same benefit. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-17bpf: cpumap do bulk allocation of SKBsJesper Dangaard Brouer
As cpumap now batch consume xdp_frame's from the ptr_ring, it knows how many SKBs it need to allocate. Thus, lets bulk allocate these SKBs via kmem_cache_alloc_bulk() API, and use the previously introduced function build_skb_around(). Notice that the flag __GFP_ZERO asks the slab/slub allocator to clear the memory for us. This does clear a larger area than needed, but my micro benchmarks on Intel CPUs show that this is slightly faster due to being a cacheline aligned area is cleared for the SKBs. (For SLUB allocator, there is a future optimization potential, because SKBs will with high probability originate from same page. If we can find/identify continuous memory areas then the Intel CPU memset rep stos will have a real performance gain.) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-17net: core: introduce build_skb_aroundJesper Dangaard Brouer
The function build_skb() also have the responsibility to allocate and clear the SKB structure. Introduce a new function build_skb_around(), that moves the responsibility of allocation and clearing to the caller. This allows caller to use kmem_cache (slab/slub) bulk allocation API. Next patch use this function combined with kmem_cache_alloc_bulk. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-17bpf: cpumap use ptr_ring_consume_batchedJesper Dangaard Brouer
Move ptr_ring dequeue outside loop, that allocate SKBs and calls network stack, as these operations that can take some time. The ptr_ring is a communication channel between CPUs, where we want to reduce/limit any cacheline bouncing. Do a concentrated bulk dequeue via ptr_ring_consume_batched, to shorten the period and times the remote cacheline in ptr_ring is read Batch size 8 is both to (1) limit BH-disable period, and (2) consume one cacheline on 64-bit archs. After reducing the BH-disable section further then we can consider changing this, while still thinking about L1 cacheline size being active. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-17ipv4: set the tcp_min_rtt_wlen range from 0 to one dayZhangXiaoxu
There is a UBSAN report as below: UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56 signed integer overflow: 2147483647 * 1000 cannot be represented in type 'int' CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1 Call Trace: <IRQ> dump_stack+0x8c/0xba ubsan_epilogue+0x11/0x60 handle_overflow+0x12d/0x170 ? ttwu_do_wakeup+0x21/0x320 __ubsan_handle_mul_overflow+0x12/0x20 tcp_ack_update_rtt+0x76c/0x780 tcp_clean_rtx_queue+0x499/0x14d0 tcp_ack+0x69e/0x1240 ? __wake_up_sync_key+0x2c/0x50 ? update_group_capacity+0x50/0x680 tcp_rcv_established+0x4e2/0xe10 tcp_v4_do_rcv+0x22b/0x420 tcp_v4_rcv+0xfe8/0x1190 ip_protocol_deliver_rcu+0x36/0x180 ip_local_deliver+0x15b/0x1a0 ip_rcv+0xac/0xd0 __netif_receive_skb_one_core+0x7f/0xb0 __netif_receive_skb+0x33/0xc0 netif_receive_skb_internal+0x84/0x1c0 napi_gro_receive+0x2a0/0x300 receive_buf+0x3d4/0x2350 ? detach_buf_split+0x159/0x390 virtnet_poll+0x198/0x840 ? reweight_entity+0x243/0x4b0 net_rx_action+0x25c/0x770 __do_softirq+0x19b/0x66d irq_exit+0x1eb/0x230 do_IRQ+0x7a/0x150 common_interrupt+0xf/0xf </IRQ> It can be reproduced by: echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen Fixes: f672258391b42 ("tcp: track min RTT using windowed min-filter") Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>