summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-01-25selftests: forwarding: lib: Allow reading TC rule byte countersPetr Machata
The function tc_rule_stats_get() fetches a packet counter of a given TC rule. Extend it to support byte counters as well by adding an optional argument with selector. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25selftests: forwarding: lib: Add helpers for busywaitingPetr Machata
The function busywait() is handy as a safety-latched variant of a while loop. Many selftests deal specifically with counter values, and busywaiting on them is likely to be rather common (it is not quite common now, but busywait() has not been around for very long). To facilitate expressing simply what is tested, introduce two helpers: - until_counter_is(), which can be used as a predicate passed to busywait(), which holds when expression, which is itself passed as an argument to until_counter_is(), reaches a desired value. - busywait_for_counter(), which is useful for waiting until a given counter changes "by" (as opposed to "to") a certain amount. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25selftests: Move two functions from mlxsw's qos_lib to libPetr Machata
The function humanize() is used for converting value in bits/s to a human-friendly approximate value in Kbps, Mbps or Gbps. There is nothing hardware-specific in that, so move the function to lib.sh. Similarly for the rate() function, which just does a bit of math to calculate a rate, given two counter values and a time interval. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mlxsw: spectrum_qdisc: Support offloading of TBF QdiscPetr Machata
React to the TC messages that were introduced in a preceding patch and configure egress maximum shaper as appropriate. TBF can be used as a root qdisc or under one of PRIO or strict ETS bands. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mlxsw: spectrum: Configure shaper rate and burst size togetherPetr Machata
In order to allow configuration of burst size together with shaper rate, extend mlxsw_sp_port_ets_maxrate_set() with a burst_size argument. Convert call sites to pass 0 (for default). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mlxsw: spectrum: Add lowest_shaper_bs to struct mlxsw_spPetr Machata
Lower limit of burst size configuration is dependent on system type. Add a datum to track the value. Initialize as appropriate in mlxsw_spX_init(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mlxsw: reg: Increase MLXSW_REG_QEEC_MAS_DISPetr Machata
As the port speeds grow, the current value of "unlimited shaper", 200000000Kbps, might become lower than the actually supported speeds. Bump it to the maximum value that fits in the corresponding QEEC field, which is about 2.1Tbps. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mlxsw: reg: Add max_shaper_bs to QoS ETS Element ConfigurationPetr Machata
The QEEC register configures scheduling elements. One of the bits of configuration is the burst size to use for the shaper installed on the element. Add the necessary fields to support this configuration. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mlxsw: spectrum_qdisc: Extract a common leaf unoffload functionPetr Machata
When the RED Qdisc is unoffloaded, it needs to reduce the reported backlog by the amount that is in the HW, so that only the SW backlog is contained in the counter. The same thing will need to be done by TBF, and likely any other leaf Qdisc as well. Extract a helper mlxsw_sp_qdisc_leaf_unoffload() and call it from mlxsw_sp_qdisc_red_unoffload(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mlxsw: spectrum_qdisc: Add mlxsw_sp_qdisc_get_class_stats()Petr Machata
Add a wrapper around mlxsw_sp_qdisc_collect_tc_stats() and mlxsw_sp_qdisc_update_stats() for the simple case of doing both in one go: mlxsw_sp_qdisc_get_class_stats(). Dispatch to that function from mlxsw_sp_qdisc_get_red_stats(). This new function will be useful for other leaf Qdiscs as well. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mlxsw: spectrum_qdisc: Extract a per-TC stat functionPetr Machata
Extract from mlxsw_sp_qdisc_get_prio_stats() two new functions: mlxsw_sp_qdisc_collect_tc_stats() to accumulate stats for that one TC only, and mlxsw_sp_qdisc_update_stats() that makes the stats relative to base values stored earlier. Use them from mlxsw_sp_qdisc_get_red_stats(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25net: sched: Make TBF Qdisc offloadablePetr Machata
Invoke ndo_setup_tc as appropriate to signal init / replacement, destroying and dumping of TBF Qdisc. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25net: sched: sch_tbf: Don't overwrite backlog before dumpingPetr Machata
In 2011, in commit b0460e4484f9 ("sch_tbf: report backlog information"), TBF started copying backlog depth from the child Qdisc before dumping, with the motivation that the backlog was otherwise not visible in "tc -s qdisc show". Later, in 2016, in commit 8d5958f424b6 ("sch_tbf: update backlog as well"), TBF got a full-blown backlog tracking. However it kept copying the child's backlog over before dumping. That line is now unnecessary, so remove it. As shown in the following example, backlog is still reported correctly: # tc -s qdisc show dev veth0 invisible qdisc tbf 1: root refcnt 2 rate 1Mbit burst 128Kb lat 82.8s Sent 505475370 bytes 406985 pkt (dropped 0, overlimits 812544 requeues 0) backlog 81972b 66p requeues 0 qdisc bfifo 0: parent 1:1 limit 10Mb Sent 505475370 bytes 406985 pkt (dropped 0, overlimits 0 requeues 0) backlog 81972b 66p requeues 0 Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25net: cxgb3_main: Add CAP_NET_ADMIN check to CHELSIO_GET_MEMMichael Ellerman
The cxgb3 driver for "Chelsio T3-based gigabit and 10Gb Ethernet adapters" implements a custom ioctl as SIOCCHIOCTL/SIOCDEVPRIVATE in cxgb_extension_ioctl(). One of the subcommands of the ioctl is CHELSIO_GET_MEM, which appears to read memory directly out of the adapter and return it to userspace. It's not entirely clear what the contents of the adapter memory contains, but the assumption is that it shouldn't be accessible to all users. So add a CAP_NET_ADMIN check to the CHELSIO_GET_MEM case. Put it after the is_offload() check, which matches two of the other subcommands in the same function which also check for is_offload() and CAP_NET_ADMIN. Found by Ilja by code inspection, not tested as I don't have the required hardware. Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25Merge branch 'hv_netvsc-Add-XDP-support'David S. Miller
Haiyang Zhang says: ==================== hv_netvsc: Add XDP support Add XDP support and update related document. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25hv_netvsc: Update document for XDP supportHaiyang Zhang
Added the new section in the document regarding XDP support by hv_netvsc driver. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25hv_netvsc: Add XDP supportHaiyang Zhang
This patch adds support of XDP in native mode for hv_netvsc driver, and transparently sets the XDP program on the associated VF NIC as well. Setting / unsetting XDP program on synthetic NIC (netvsc) propagates to VF NIC automatically. Setting / unsetting XDP program on VF NIC directly is not recommended, also not propagated to synthetic NIC, and may be overwritten by setting of synthetic NIC. The Azure/Hyper-V synthetic NIC receive buffer doesn't provide headroom for XDP. We thought about re-use the RNDIS header space, but it's too small. So we decided to copy the packets to a page buffer for XDP. And, most of our VMs on Azure have Accelerated Network (SRIOV) enabled, so most of the packets run on VF NIC. The synthetic NIC is considered as a fallback data-path. So the data copy on netvsc won't impact performance significantly. XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO before running XDP: ethtool -K eth0 lro off XDP actions not yet supported: XDP_REDIRECT Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25devlink: Add health recover notifications on devlink flowsMoshe Shemesh
Devlink health recover notifications were added only on driver direct updates of health_state through devlink_health_reporter_state_update(). Add notifications on updates of health_state by devlink flows of report and recover. Moved functions devlink_nl_health_reporter_fill() and devlink_recover_notify() to avoid forward declaration. Fixes: 97ff3bd37fac ("devlink: add devink notification when reporter update health state") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25net: bcmgenet: Use netif_tx_napi_add() for TX NAPIFlorian Fainelli
Before commit 7587935cfa11 ("net: bcmgenet: move NAPI initialization to ring initialization") moved the code, this used to be netif_tx_napi_add(), but we lost that small semantic change in the process, restore that. Fixes: 7587935cfa11 ("net: bcmgenet: move NAPI initialization to ring initialization") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Doug Berger <opendmb@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25tipc: change maintainer email addressJon Maloy
Reflecting new realities. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25net: fddi: skfp: Use print_hex_dump() helperAndy Shevchenko
Use the print_hex_dump() helper, instead of open-coding the same operations. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25net: atm: use %*ph to print small bufferAndy Shevchenko
Use %*ph format to print small buffer as hex string. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25efi/x86: Disable instrumentation in the EFI runtime handling codeArd Biesheuvel
We already disable KASAN instrumentation specifically for the EFI routines that are known to dereference memory addresses that KASAN does not know about, avoiding false positive KASAN splats. However, as it turns out, having GCOV or KASAN instrumentation enabled interferes with the compiler's ability to optimize away function calls that are guarded by IS_ENABLED() checks that should have resulted in those references to have been const-propagated out of existence. But with instrumenation enabled, we may get build errors like: ld: arch/x86/platform/efi/efi_64.o: in function `efi_thunk_set_virtual_address_map': ld: arch/x86/platform/efi/efi_64.o: in function `efi_set_virtual_address_map': in builds where CONFIG_EFI=y but CONFIG_EFI_MIXED or CONFIG_X86_UV are not defined, even though the invocations are conditional on IS_ENABLED() checks against the respective Kconfig symbols. So let's disable instrumentation entirely for this subdirectory, which isn't that useful here to begin with. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-efi@vger.kernel.org
2020-01-25efi/libstub/x86: Fix EFI server boot failureQian Cai
x86_64 EFI systems are unable to boot due to a typo in a recent commit: EFI config tables not found. -- System halted This was probably due to the absense of CONFIG_EFI_MIXED=y in testing. Fixes: 796eb8d26a57 ("efi/libstub/x86: Use const attribute for efi_is_64bit()") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: tglx@linutronix.de Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20200122191430.4888-1-cai@lca.pw
2020-01-25net: stmmac: platform: fix probe for ACPI devicesAjay Gupta
Use generic device API to get phy mode to fix probe failure with ACPI based devices. Signed-off-by: Ajay Gupta <ajayg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Expedited grace-period updates - kfree_rcu() updates - RCU list updates - Preemptible RCU updates - Torture-test updates - Miscellaneous fixes - Documentation updates Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-25jbd2: clean __jbd2_journal_abort_hard() and __journal_abort_soft()zhangyi (F)
__jbd2_journal_abort_hard() is no longer used, so now we can merge __jbd2_journal_abort_hard() and __journal_abort_soft() these two functions into jbd2_journal_abort() and remove them. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191204124614.45424-5-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25jbd2: make sure ESHUTDOWN to be recorded in the journal superblockzhangyi (F)
Commit fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer") want to allow jbd2 layer to distinguish shutdown journal abort from other error cases. So the ESHUTDOWN should be taken precedence over any other errno which has already been recoded after EXT4_FLAGS_SHUTDOWN is set, but it only update errno in the journal suoerblock now if the old errno is 0. Fixes: fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191204124614.45424-4-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25ext4, jbd2: ensure panic when aborting with zero errnozhangyi (F)
JBD2_REC_ERR flag used to indicate the errno has been updated when jbd2 aborted, and then __ext4_abort() and ext4_handle_error() can invoke panic if ERRORS_PANIC is specified. But if the journal has been aborted with zero errno, jbd2_journal_abort() didn't set this flag so we can no longer panic. Fix this by always record the proper errno in the journal superblock. Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191204124614.45424-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25jbd2: switch to use jbd2_journal_abort() when failed to submit the commit recordzhangyi (F)
We invoke jbd2_journal_abort() to abort the journal and record errno in the jbd2 superblock when committing journal transaction besides the failure on submitting the commit record. But there is no need for the case and we can also invoke jbd2_journal_abort() instead of __jbd2_journal_abort_hard(). Fixes: 818d276ceb83a ("ext4: Add the journal checksum feature") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191204124614.45424-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25jbd2_seq_info_next should increase position indexVasily Averin
if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. Script below generates endless output $ q=;while read -r r;do echo "$((++q)) $r";done </proc/fs/jbd2/DEV/info https://bugzilla.kernel.org/show_bug.cgi?id=206283 Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Cc: stable@kernel.org Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/d13805e5-695e-8ac3-b678-26ca2313629f@virtuozzo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25jbd2: remove pointless assertion in __journal_remove_journal_headShijie Luo
Only when jh->b_jcount = 0 in jbd2_journal_put_journal_head, we are allowed to call __journal_remove_journal_head. This assertion is meaningless, just remove it. Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200123070054.50585-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25ext4,jbd2: fix comment and code styleShijie Luo
Fix comment and remove unneccessary blank. Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200123064325.36358-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25jbd2: delete the duplicated words in the commentswangyan
Delete the duplicated words "is" in the comments Signed-off-by: Yan Wang <wangyan122@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/12087f77-ab4d-c7ba-53b4-893dbf0026f0@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25mptcp: Fix code formattingMat Martineau
checkpatch.pl had a few complaints in the last set of MPTCP patches: ERROR: code indent should use tabs where possible +^I subflow, sk->sk_family, icsk->icsk_af_ops, target, mapped);$ CHECK: Comparison to NULL could be written "!new_ctx" + if (new_ctx == NULL) { ERROR: "foo * bar" should be "foo *bar" +static const struct proto_ops * tcp_proto_ops(struct sock *sk) Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mptcp: do not inherit inet proto opsFlorian Westphal
We need to initialise the struct ourselves, else we expose tcp-specific callbacks such as tcp_splice_read which will then trigger splat because the socket is an mptcp one: BUG: KASAN: slab-out-of-bounds in tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57 Write of size 8 at addr ffff888116aa21d0 by task syz-executor.0/5478 CPU: 1 PID: 5478 Comm: syz-executor.0 Not tainted 5.5.0-rc6 #3 Call Trace: tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57 tcp_rcv_space_adjust+0x72/0x7f0 net/ipv4/tcp_input.c:612 tcp_read_sock+0x622/0x990 net/ipv4/tcp.c:1674 tcp_splice_read+0x20b/0xb40 net/ipv4/tcp.c:791 do_splice+0x1259/0x1560 fs/splice.c:1205 To prevent build error with ipv6, add the recv/sendmsg function declaration to ipv6.h. The functions are already accessible "thanks" to retpoline related work, but they are currently only made visible by socket.c specific INDIRECT_CALLABLE macros. Reported-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25ext4: fix extent_status trace pointsDmitry Monakhov
Show pblock only if it has meaningful value. # before ext4:ext4_es_lookup_extent_exit: dev 253,0 ino 12 found 1 [1/4294967294) 576460752303423487 H ext4:ext4_es_lookup_extent_exit: dev 253,0 ino 12 found 1 [2/4294967293) 576460752303423487 HR # after ext4:ext4_es_lookup_extent_exit: dev 253,0 ino 12 found 1 [1/4294967294) 0 H ext4:ext4_es_lookup_extent_exit: dev 253,0 ino 12 found 1 [2/4294967293) 0 HR Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com> Link: https://lore.kernel.org/r/20191114200147.1073-2-dmonakhov@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25ext4: fix symbolic enum printing in trace outputDmitry Monakhov
Trace's macro __print_flags() produce raw event's decraration w/o knowing actual flags value cat /sys/kernel/debug/tracing/events/ext4/ext4_ext_map_blocks_exit/format .. __print_flags(REC->mflags, "", { (1 << BH_New), For that reason we have to explicitly define it via special macro TRACE_DEFINE_ENUM() Also add missed EXTENT_STATUS_REFERENCED flag. #Before patch ext4:ext4_ext_map_blocks_exit: dev 253,0 ino 2 flags lblk 0 pblk 4177 len 1 mflags 0x20 ret 1 ext4:ext4_ext_map_blocks_exit: dev 253,0 ino 12 flags CREATE lblk 0 pblk 34304 len 1 mflags 0x60 ret 1 #With patch ext4:ext4_ext_map_blocks_exit: dev 253,0 ino 2 flags lblk 0 pblk 4177 len 1 mflags M ret 1 ext4:ext4_ext_map_blocks_exit: dev 253,0 ino 12 flags CREATE lblk 0 pblk 34816 len 1 mflags NM ret 1 Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com> Link: https://lore.kernel.org/r/20191114200147.1073-1-dmonakhov@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25ext4: choose hardlimit when softlimit is larger than hardlimit in ↵Chengguang Xu
ext4_statfs_project() Setting softlimit larger than hardlimit seems meaningless for disk quota but currently it is allowed. In this case, there may be a bit of comfusion for users when they run df comamnd to directory which has project quota. For example, we set 20M softlimit and 10M hardlimit of block usage limit for project quota of test_dir(project id 123). [root@hades mnt_ext4]# repquota -P -a *** Report for project quotas on device /dev/loop0 Block grace time: 7days; Inode grace time: 7days Block limits File limits Project used soft hard grace used soft hard grace ---------------------------------------------------------------------- 0 -- 13 0 0 2 0 0 123 -- 10237 20480 10240 5 200 100 The result of df command as below: [root@hades mnt_ext4]# df -h test_dir Filesystem Size Used Avail Use% Mounted on /dev/loop0 20M 10M 10M 50% /home/cgxu/test/mnt_ext4 Even though it looks like there is another 10M free space to use, if we write new data to diretory test_dir(inherit project id), the write will fail with errno(-EDQUOT). After this patch, the df result looks like below. [root@hades mnt_ext4]# df -h test_dir Filesystem Size Used Avail Use% Mounted on /dev/loop0 10M 10M 3.0K 100% /home/cgxu/test/mnt_ext4 Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191016022501.760-1-cgxu519@mykernel.net Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-24ext4: fix race conditions in ->d_compare() and ->d_hash()Eric Biggers
Since ->d_compare() and ->d_hash() can be called in RCU-walk mode, ->d_parent and ->d_inode can be concurrently modified, and in particular, ->d_inode may be changed to NULL. For ext4_d_hash() this resulted in a reproducible NULL dereference if a lookup is done in a directory being deleted, e.g. with: int main() { if (fork()) { for (;;) { mkdir("subdir", 0700); rmdir("subdir"); } } else { for (;;) access("subdir/file", 0); } } ... or by running the 't_encrypted_d_revalidate' program from xfstests. Both repros work in any directory on a filesystem with the encoding feature, even if the directory doesn't actually have the casefold flag. I couldn't reproduce a crash in ext4_d_compare(), but it appears that a similar crash is possible there. Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and falling back to the case sensitive behavior if the inode is NULL. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups") Cc: <stable@vger.kernel.org> # v5.2+ Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20200124041234.159740-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - add sanity checks to USB endpoints in various dirvers - max77650-onkey was missing an OF table which was preventing module autoloading - a revert and a different fix for F54 handling in Synaptics dirver - a fixup for handling register in pm8xxx vibrator driver * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: pm8xxx-vib - fix handling of separate enable register Input: keyspan-remote - fix control-message timeouts Input: max77650-onkey - add of_match table Input: rmi_f54 - read from FIFO in 32 byte blocks Revert "Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers" Input: sur40 - fix interface sanity checks Input: gtco - drop redundant variable reinit Input: gtco - fix extra-descriptor debug message Input: gtco - fix endpoint sanity check Input: aiptek - use descriptors of current altsetting Input: aiptek - fix endpoint sanity check Input: pegasus_notetaker - fix endpoint sanity check Input: sun4i-ts - add a check for devm_thermal_zone_of_sensor_register Input: evdev - convert kzalloc()/vzalloc() to kvzalloc()
2020-01-24ext4: make dioread_nolock the defaultTheodore Ts'o
This fixes the direct I/O versus writeback race which can reveal stale data, and it improves the tail latency of commits on slow devices. Link: https://lore.kernel.org/r/20200125022254.1101588-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-24ice: Allocate flow profileTony Nguyen
Create an extraction sequence based on the packet header protocols to be programmed and allocate a flow profile for the extraction sequence. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-01-24libbpf: Fix realloc usage in bpf_core_find_candsAndrii Nakryiko
Fix bug requesting invalid size of reallocated array when constructing CO-RE relocation candidate list. This can cause problems if there are many potential candidates and a very fine-grained memory allocator bucket sizes are used. Fixes: ddc7c3042614 ("libbpf: implement BPF CO-RE offset relocation algorithm") Reported-by: William Smith <williampsmith@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200124201847.212528-1-andriin@fb.com
2020-01-24ice: Enable writing hardware filtering tablesTony Nguyen
Enable the driver to write the filtering hardware tables to allow for changing of RSS rules. Upon loading of DDP package, a minimal configuration should be written to hardware. Introduce and initialize structures for storing configuration and make the top level calls to configure the RSS tables to initial values. A packet segment will be created but nothing is written to hardware yet. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-01-24libbpf: Improve handling of failed CO-RE relocationsAndrii Nakryiko
Previously, if libbpf failed to resolve CO-RE relocation for some instructions, it would either return error immediately, or, if .relaxed_core_relocs option was set, would replace relocatable offset/imm part of an instruction with a bogus value (-1). Neither approach is good, because there are many possible scenarios where relocation is expected to fail (e.g., when some field knowingly can be missing on specific kernel versions). On the other hand, replacing offset with invalid one can hide programmer errors, if this relocation failue wasn't anticipated. This patch deprecates .relaxed_core_relocs option and changes the approach to always replacing instruction, for which relocation failed, with invalid BPF helper call instruction. For cases where this is expected, BPF program should already ensure that that instruction is unreachable, in which case this invalid instruction is going to be silently ignored. But if instruction wasn't guarded, BPF program will be rejected at verification step with verifier log pointing precisely to the place in assembly where the problem is. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200124053837.2434679-1-andriin@fb.com
2020-01-24selftests: bpf: Reset global state between reuseport test runsLorenz Bauer
Currently, there is a lot of false positives if a single reuseport test fails. This is because expected_results and the result map are not cleared. Zero both after individual test runs, which fixes the mentioned false positives. Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200124112754.19664-5-lmb@cloudflare.com
2020-01-24selftests: bpf: Make reuseport test output more legibleLorenz Bauer
Include the name of the mismatching result in human readable format when reporting an error. The new output looks like the following: unexpected result result: [1, 0, 0, 0, 0, 0] expected: [0, 0, 0, 0, 0, 0] mismatch on DROP_ERR_INNER_MAP (bpf_prog_linum:153) check_results:FAIL:382 Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200124112754.19664-4-lmb@cloudflare.com
2020-01-24selftests: bpf: Ignore FIN packets for reuseport testsLorenz Bauer
The reuseport tests currently suffer from a race condition: FIN packets count towards DROP_ERR_SKB_DATA, since they don't contain a valid struct cmd. Tests will spuriously fail depending on whether check_results is called before or after the FIN is processed. Exit the BPF program early if FIN is set. Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200124112754.19664-3-lmb@cloudflare.com
2020-01-24selftests: bpf: Use a temporary file in test_sockmapLorenz Bauer
Use a proper temporary file for sendpage tests. This means that running the tests doesn't clutter the working directory, and allows running the test on read-only filesystems. Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200124112754.19664-2-lmb@cloudflare.com