summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-07-20bpf: fix mixed signed/unsigned derived min/max value boundsDaniel Borkmann
Edward reported that there's an issue in min/max value bounds tracking when signed and unsigned compares both provide hints on limits when having unknown variables. E.g. a program such as the following should have been rejected: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff8a94cda93400 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+7 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) *(u64 *)(r10 -16) = -8 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = -1 10: (2d) if r1 > r2 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 11: (65) if r1 s> 0x1 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 12: (0f) r0 += r1 13: (72) *(u8 *)(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 14: (b7) r0 = 0 15: (95) exit What happens is that in the first part ... 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = -1 10: (2d) if r1 > r2 goto pc+3 ... r1 carries an unsigned value, and is compared as unsigned against a register carrying an immediate. Verifier deduces in reg_set_min_max() that since the compare is unsigned and operation is greater than (>), that in the fall-through/false case, r1's minimum bound must be 0 and maximum bound must be r2. Latter is larger than the bound and thus max value is reset back to being 'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now 'R1=inv,min_value=0'. The subsequent test ... 11: (65) if r1 s> 0x1 goto pc+2 ... is a signed compare of r1 with immediate value 1. Here, verifier deduces in reg_set_min_max() that since the compare is signed this time and operation is greater than (>), that in the fall-through/false case, we can deduce that r1's maximum bound must be 1, meaning with prior test, we result in r1 having the following state: R1=inv,min_value=0,max_value=1. Given that the actual value this holds is -8, the bounds are wrongly deduced. When this is being added to r0 which holds the map_value(_adj) type, then subsequent store access in above case will go through check_mem_access() which invokes check_map_access_adj(), that will then probe whether the map memory is in bounds based on the min_value and max_value as well as access size since the actual unknown value is min_value <= x <= max_value; commit fce366a9dd0d ("bpf, verifier: fix alu ops against map_value{, _adj} register types") provides some more explanation on the semantics. It's worth to note in this context that in the current code, min_value and max_value tracking are used for two things, i) dynamic map value access via check_map_access_adj() and since commit 06c1c049721a ("bpf: allow helpers access to variable memory") ii) also enforced at check_helper_mem_access() when passing a memory address (pointer to packet, map value, stack) and length pair to a helper and the length in this case is an unknown value defining an access range through min_value/max_value in that case. The min_value/max_value tracking is /not/ used in the direct packet access case to track ranges. However, the issue also affects case ii), for example, the following crafted program based on the same principle must be rejected as well: 0: (b7) r2 = 0 1: (bf) r3 = r10 2: (07) r3 += -512 3: (7a) *(u64 *)(r10 -16) = -8 4: (79) r4 = *(u64 *)(r10 -16) 5: (b7) r6 = -1 6: (2d) if r4 > r6 goto pc+5 R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512 R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 7: (65) if r4 s> 0x1 goto pc+4 R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512 R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 8: (07) r4 += 1 9: (b7) r5 = 0 10: (6a) *(u16 *)(r10 -512) = 0 11: (85) call bpf_skb_load_bytes#26 12: (b7) r0 = 0 13: (95) exit Meaning, while we initialize the max_value stack slot that the verifier thinks we access in the [1,2] range, in reality we pass -7 as length which is interpreted as u32 in the helper. Thus, this issue is relevant also for the case of helper ranges. Resetting both bounds in check_reg_overflow() in case only one of them exceeds limits is also not enough as similar test can be created that uses values which are within range, thus also here learned min value in r1 is incorrect when mixed with later signed test to create a range: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff880ad081fa00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+7 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) *(u64 *)(r10 -16) = -8 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = 2 10: (3d) if r2 >= r1 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 11: (65) if r1 s> 0x4 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 12: (0f) r0 += r1 13: (72) *(u8 *)(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4 R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 14: (b7) r0 = 0 15: (95) exit This leaves us with two options for fixing this: i) to invalidate all prior learned information once we switch signed context, ii) to track min/max signed and unsigned boundaries separately as done in [0]. (Given latter introduces major changes throughout the whole verifier, it's rather net-next material, thus this patch follows option i), meaning we can derive bounds either from only signed tests or only unsigned tests.) There is still the case of adjust_reg_min_max_vals(), where we adjust bounds on ALU operations, meaning programs like the following where boundaries on the reg get mixed in context later on when bounds are merged on the dst reg must get rejected, too: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff89b2bf87ce00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+6 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) *(u64 *)(r10 -16) = -8 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = 2 10: (3d) if r2 >= r1 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 11: (b7) r7 = 1 12: (65) if r7 s> 0x0 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp 13: (b7) r0 = 0 14: (95) exit from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp 15: (0f) r7 += r1 16: (65) if r7 s> 0x4 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp 17: (0f) r0 += r7 18: (72) *(u8 *)(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp 19: (b7) r0 = 0 20: (95) exit Meaning, in adjust_reg_min_max_vals() we must also reset range values on the dst when src/dst registers have mixed signed/ unsigned derived min/max value bounds with one unbounded value as otherwise they can be added together deducing false boundaries. Once both boundaries are established from either ALU ops or compare operations w/o mixing signed/unsigned insns, then they can safely be added to other regs also having both boundaries established. Adding regs with one unbounded side to a map value where the bounded side has been learned w/o mixing ops is possible, but the resulting map value won't recover from that, meaning such op is considered invalid on the time of actual access. Invalid bounds are set on the dst reg in case i) src reg, or ii) in case dst reg already had them. The only way to recover would be to perform i) ALU ops but only 'add' is allowed on map value types or ii) comparisons, but these are disallowed on pointers in case they span a range. This is fine as only BPF_JEQ and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers which potentially turn them into PTR_TO_MAP_VALUE type depending on the branch, so only here min/max value cannot be invalidated for them. In terms of state pruning, value_from_signed is considered as well in states_equal() when dealing with adjusted map values. With regards to breaking existing programs, there is a small risk, but use-cases are rather quite narrow where this could occur and mixing compares probably unlikely. Joint work with Josef and Edward. [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html Fixes: 484611357c19 ("bpf: allow access into map value arrays") Reported-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20Merge tag 'pm-4.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These are two stable-candidate fixes for the intel_pstate driver and the generic power domains (genpd) framework. Specifics: - Fix the average CPU load computations in the intel_pstate driver on Knights Landing (Xeon Phi) processors that require an extra factor to compensate for a rate change differences between the TSC and MPERF which is missing (Srinivas Pandruvada). - Fix an initialization ordering issue in the generic power domains (genpd) framework (Sudeep Holla)" * tag 'pm-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present cpufreq: intel_pstate: Correct the busy calculation for KNL
2017-07-20x86: mark kprobe templates as character arrays, not single charactersLinus Torvalds
They really are, and the "take the address of a single character" makes the string fortification code unhappy (it believes that you can now only acccess one byte, rather than a byte range, and then raises errors for the memory copies going on in there). We could now remove a few 'addressof' operators (since arrays naturally degrade to pointers), but this is the minimal patch that just changes the C prototypes of those template arrays (the templates themselves are defined in inline asm). Reported-by: kernel test robot <xiaolong.ye@intel.com> Acked-and-tested-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-20Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc filesystem fixes from Jan Kara: "Several ACL related fixes for ext2, reiserfs, and hfsplus. And also one minor isofs cleanup" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: hfsplus: Don't clear SGID when inheriting ACLs isofs: Fix off-by-one in 'session' mount option parsing reiserfs: preserve i_mode if __reiserfs_set_acl() fails ext2: preserve i_mode if ext2_set_acl() fails ext2: Don't clear SGID when inheriting ACLs reiserfs: Don't clear SGID when inheriting ACLs
2017-07-20Merge tag 'for-f2fs-v4.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "We've filed some bug fixes: - missing f2fs case in terms of stale SGID bit, introduced by Jan - build error for seq_file.h - avoid cpu lockup - wrong inode_unlock in error case" * tag 'for-f2fs-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: avoid cpu lockup f2fs: include seq_file.h for sysfs.c f2fs: Don't clear SGID when inheriting ACLs f2fs: remove extra inode_unlock() in error path
2017-07-20Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds
Pull audit fix from Paul Moore: "A small audit fix, just a single line, to plug a memory leak in some audit error handling code" * 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit: audit: fix memleak in auditd_send_unicast_skb.
2017-07-20Merge tag 'libnvdimm-fixes-4.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A handful of small fixes for 4.13-rc2. Three of these fixes are tagged for -stable. They have all appeared in at least one -next release with no reported issues - Fix handling of media errors that span a sector - Fix support of multiple namespaces in a libnvdimm region being in device-dax mode - Clean up the machine check notifier properly when the nfit driver fails to register - Address a static analysis (smatch) report in device-dax" * tag 'libnvdimm-fixes-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: fix sysfs duplicate warnings MAINTAINERS: list drivers/acpi/nfit/ files for libnvdimm sub-system acpi/nfit: Fix memory corruption/Unregister mce decoder on failure device-dax: fix 'passing zero to ERR_PTR()' warning libnvdimm: fix badblock range handling of ARS range
2017-07-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: - HID multitouch 4.12 regression fix from Dmitry Torokhov - error handling fix for HID++ driver from Gustavo A. R. Silva * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value HID: multitouch: do not blindly set EV_KEY or EV_ABS bits
2017-07-20Merge branches 'intel_pstate' and 'pm-domains'Rafael J. Wysocki
* intel_pstate: cpufreq: intel_pstate: Correct the busy calculation for KNL * pm-domains: PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
2017-07-20HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return valueGustavo A. R. Silva
Check return value from call to devm_kmemdup() in order to prevent a NULL pointer dereference. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-07-19ipv6: avoid overflow of offset in ip6_find_1stfragoptSabrina Dubroca
In some cases, offset can overflow and can cause an infinite loop in ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and cap it at IPV6_MAXPLEN, since packets larger than that should be invalid. This problem has been here since before the beginning of git history. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19net: tehuti: don't process data if it has not been copied from userspaceColin Ian King
The array data is only populated with valid information from userspace if cmd != SIOCDEVPRIVATE, other cases the array contains garbage on the stack. The subsequent switch statement acts on a subcommand in data[0] which could be any garbage value if cmd is SIOCDEVPRIVATE which seems incorrect to me. Instead, just return EOPNOTSUPP for the case where cmd == SIOCDEVPRIVATE to avoid this issue. As a side note, I suspect that the original intention of the code was for this ioctl to work just for cmd == SIOCDEVPRIVATE (and the current logic is reversed). However, I don't wont to change the current semantics in case any userspace code relies on this existing behaviour. Detected by CoverityScan, CID#139647 ("Uninitialized scalar variable") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19Revert "rtnetlink: Do not generate notifications for CHANGEADDR event"David Ahern
This reverts commit cd8966e75ed3c6b41a37047a904617bc44fa481f. The duplicate CHANGEADDR event message is sent regardless of link status whereas the setlink changes only generate a notification when the link is up. Not sending a notification when the link is down breaks dhcpcd which only processes hwaddr changes when the link is down. Fixes reported regression: https://bugzilla.kernel.org/show_bug.cgi?id=196355 Reported-by: Yaroslav Isakov <yaroslav.isakov@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19cxgb4: Update register ranges of T4/T5/T6 adaptersArjun Vynipadath
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19net: dsa: mv88e6xxx: Enable CMODE config support for 6390XMartin Hundebøll
Commit f39908d3b1c45 ('net: dsa: mv88e6xxx: Set the CMODE for mv88e6390 ports 9 & 10') added support for setting the CMODE for the 6390X family, but only enabled it for 9290 and 6390 - and left out 6390X. Fix support for setting the CMODE on 6390X also by assigning mv88e6390x_port_set_cmode() to the .port_set_cmode function pointer in mv88e6390x_ops too. Fixes: f39908d3b1c4 ("net: dsa: mv88e6xxx: Set the CMODE for mv88e6390 ports 9 & 10") Signed-off-by: Martin Hundebøll <mnhu@prevas.dk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19Merge branch 'netvsc-lockdep-and-related-fixes'David S. Miller
Stephen Hemminger says: ==================== netvsc: lockdep and related fixes These fix sparse and lockdep warnings from netvsc driver. Targeting these at net-next since no actual related failures have been observed in non-debug kernels. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19netvsc: add rtnl annotations in rndisstephen hemminger
The rndis functions are used when changing device state. Therefore the references from network device to internal state are protected by RTNL mutex. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19netvsc: save pointer to parent netvsc_device in channel tablestephen hemminger
Keep back pointer in the per-channel data structure to avoid any possible RCU related issues when napi poll is called but netvsc_device is in RCU limbo. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19netvsc: need rcu_derefence when accessing internal device infostephen hemminger
The netvsc_device structure should be accessed by rcu_dereference in the send path. Change arguments to netvsc_send() to make this easier to do correctly. Remove no longer needed hv_device_to_netvsc_device. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19netvsc: use ERR_PTR to avoid dereference issuesstephen hemminger
The rndis_filter_device_add function is called both in probe context and RTNL context,and creates the netvsc_device inner structure. It is easier to get the RTNL lock annotation correct if it returns the object directly, rather than implicitly by updating network device private data. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19netvsc: change logic for change mtu and set_queuesstephen hemminger
Use device detach/attach to ensure that no packets are handed to device during state changes. Call rndis_filter_open/close directly as part of later VF related changes. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19netvsc: change order of steps in setting queuesstephen hemminger
This fixes the error unwind logic for incorrect number of queues. If netif_set_real_num_XX_queues failed then rndis_filter_device_add would have been called twice. Since input arguments are already ranged checked this is a hypothetical only problem, not possible in actual code. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19netvsc: add some rtnl_dereference annotationsstephen hemminger
In a couple places RTNL is held, and the netvsc_device pointer is acquired without annotation. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19netvsc: force link update after MTU changestephen hemminger
If two MTU changes are in less than update interval (2 seconds), then the netvsc network device may get stuck with no carrier. The netvsc driver debounces link status events which is fine for unsolicited updates, but blocks getting the update after down/up from MTU reinitialization. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19Merge branch 'dev_close-void'David S. Miller
Stephen Hemminger says: ==================== net: make dev_close void Noticed while working on other changes. Why is dev_close() returning int, it should be void. Should also change ndo_close to be void, but that requires more work and someone with more coccinelle foo (smpl) than me. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19net: make dev_close and related functions voidstephen hemminger
There is no useful return value from dev_close. All paths return 0. Change dev_close and helper functions to void. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19hns: remove useless void caststephen hemminger
There is no need to cast away return value of dev_close. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19bluetooth: 6lowpan dev_close never returns errorstephen hemminger
The function dev_close in current kernel will never return an error. Later changes will make it void. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19liquidio: lio_main: remove unnecessary static in setup_io_queues()Gustavo A. R. Silva
Remove unnecessary static on local variables cpu_id_modulus and cpu_id. Such variables are initialized before being used, on every execution path throughout the function. The static has no benefit and, removing it reduces the object file size. This issue was detected using Coccinelle and the following semantic patch: @bad exists@ position p; identifier x; type T; @@ static T x@p; ... x = <+...x...+> @@ identifier x; expression e; type T; position p != bad.p; @@ -static T x@p; ... when != x when strict ?x = e; In the following log you can see a significant difference in the object file size. Also, there is a significant difference in the bss segment. This log is the output of the size command, before and after the code change: before: text data bss dec hex filename 78689 15272 27808 121769 1dba9 drivers/net/ethernet/cavium/liquidio/lio_main.o after: text data bss dec hex filename 78667 15128 27680 121475 1da83 drivers/net/ethernet/cavium/liquidio/lio_main.o Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19liquidio: lio_vf_main: remove unnecessary static in setup_io_queues()Gustavo A. R. Silva
Remove unnecessary static on local variables cpu_id_modulus and cpu_id. Such variables are initialized before being used, on every execution path throughout the function. The static has no benefit and, removing it reduces the object file size. This issue was detected using Coccinelle and the following semantic patch: @bad exists@ position p; identifier x; type T; @@ static T x@p; ... x = <+...x...+> @@ identifier x; expression e; type T; position p != bad.p; @@ -static T x@p; ... when != x when strict ?x = e; In the following log you can see a significant difference in the object file size. Also, there is a significant difference in the bss segment. This log is the output of the size command, before and after the code change: before: text data bss dec hex filename 55656 10680 576 66912 10560 drivers/net/ethernet/cavium/liquidio/lio_vf_main.o after: text data bss dec hex filename 55796 10536 448 66780 104dc drivers/net/ethernet/cavium/liquidio/lio_vf_main.o Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19net: ethernet: mediatek: remove useless code in mtk_poll_tx()Gustavo A. R. Silva
Remove useless local variable _condition_ and the code related. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19qlcnic: remove unnecessary static in qlcnic_dump_fw()Gustavo A. R. Silva
Remove unnecessary static on local variable fw_dump_ops. Such variable is initialized before being used, on every execution path throughout the function. The static has no benefit and, removing it reduces the object file size. This issue was detected using Coccinelle and the following semantic patch: @bad exists@ position p; identifier x; type T; @@ static T x@p; ... x = <+...x...+> @@ identifier x; expression e; type T; position p != bad.p; @@ -static T x@p; ... when != x when strict ?x = e; In the following log you can see a difference in the object file size. This log is the output of the size command, before and after the code change: before: text data bss dec hex filename 19032 2136 64 21232 52f0 drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.o after: text data bss dec hex filename 19020 2048 0 21068 524c drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.o Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19net: tulip: remove useless code in tulip_init_one()Gustavo A. R. Silva
Remove useless local variable multiport_cnt and the code related. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19rtlwifi: remove useless codeGustavo A. R. Silva
Remove useless local variables last_read_point and last_txw_point and the code related. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19wireless: airo: remove unnecessary static in writerids()Gustavo A. R. Silva
Remove unnecessary static on local function pointer _writer_. Such pointer is initialized before being used, on every execution path throughout the function. The static has no benefit and, removing it reduces the object file size. This issue was detected using Coccinelle and the following semantic patch: @bad exists@ position p; identifier x; type T; @@ static T x@p; ... x = <+...x...+> @@ identifier x; expression e; type T; position p != bad.p; @@ -static T x@p; ... when != x when strict ?x = e; In the following log you can see a significant difference in the object file size. This log is the output of the size command, before and after the code change: before: text data bss dec hex filename 113797 19152 1216 134165 20c15 drivers/net/wireless/cisco/airo.o after: text data bss dec hex filename 113881 19096 1152 134129 20bf1 drivers/net/wireless/cisco/airo.o Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19net: dsa: unexport dsa_is_port_initializedVivien Didelot
The dsa_is_port_initialized helper is only used by dsa_switch_resume and dsa_switch_suspend, if CONFIG_PM_SLEEP is enabled. Make it static to dsa.c. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19net/packet: remove unused PGV_FROM_VMALLOC definition.Rosen, Rami
This patch removes the definition of PGV_FROM_VMALLOC from af_packet.c. The PGV_FROM_VMALLOC definition was already removed by commit 441c793a5650 ("net: cleanup unused macros in net directory"), and its usage was removed even before by commit c56b4d90123b ("af_packet: remove pgv.flags"); but it was added back by mistake later on, in commit f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation"). Signed-off-by: Rami Rosen <rami.rosen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19dt-binding: ptp: Add SoC compatibility strings for dte ptp clockArun Parameswaran
Add SoC specific compatibility strings to the Broadcom DTE based PTP clock binding document. Fixed the document heading and node name. Fixes: 80d6076140b2 ("dt-binding: ptp: add bindings document for dte based ptp clock") Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19ISDN: eicon: switch to use native bitmapsAndy Shevchenko
Two arrays are clearly bit maps, so, make that explicit by converting to bitmap API and remove custom helpers. Note sig_ind() uses out of boundary bit to (looks like) protect against potential bitmap_empty() checks for the same bitmap. This patch removes that since: 1) that didn't guarantee atomicity anyway; 2) the first operation inside the for-loop is set bit in the bitmap (which effectively makes it non-empty); 3) group_optimization() doesn't utilize possible emptiness of the bitmap in question. Thus, if there is a protection needed it should be implemented properly. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19sfc: Add ethtool -m support for QSFP modulesMartin Habets
This also adds support for non-QSFP modules attached to QSFP. Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19tcp: adjust tail loss probe timeoutYuchung Cheng
This patch adjusts the timeout formula to schedule the TCP loss probe (TLP). The previous formula uses 2*SRTT or 1.5*RTT + DelayACKMax if only one packet is in flight. It keeps a lower bound of 10 msec which is too large for short RTT connections (e.g. within a data-center). The new formula = 2*RTT + (inflight == 1 ? 200ms : 2ticks) which performs better for short and fast connections. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19llist: clang: introduce member_address_is_nonnull()Alexander Potapenko
Currently llist_for_each_entry() and llist_for_each_entry_safe() iterate until &pos->member != NULL. But when building the kernel with Clang, the compiler assumes &pos->member cannot be NULL if the member's offset is greater than 0 (which would be equivalent to the object being non-contiguous in memory). Therefore the loop condition is always true, and the loops become infinite. To work around this, introduce the member_address_is_nonnull() macro, which casts object pointer to uintptr_t, thus letting the member pointer to be NULL. Signed-off-by: Alexander Potapenko <glider@google.com> Tested-by: Sodagudi Prasad <psodagud@codeaurora.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-19NET: dwmac: Make dwmac reset unconditionalEugeniy Paltsev
Unconditional reset dwmac before HW init if reset controller is present. In existing implementation we reset dwmac only after second module probing: (module load -> unload -> load again [reset happens]) Now we reset dwmac at every module load: (module load [reset happens] -> unload -> load again [reset happens]) Also some reset controllers have only reset callback instead of assert + deassert callbacks pair, so handle this case. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19openvswitch: Optimize operations for OvS flow_stats.Tonghao Zhang
When calling the flow_free() to free the flow, we call many times (cpu_possible_mask, eg. 128 as default) cpumask_next(). That will take up our CPU usage if we call the flow_free() frequently. When we put all packets to userspace via upcall, and OvS will send them back via netlink to ovs_packet_cmd_execute(will call flow_free). The test topo is shown as below. VM01 sends TCP packets to VM02, and OvS forward packtets. When testing, we use perf to report the system performance. VM01 --- OvS-VM --- VM02 Without this patch, perf-top show as below: The flow_free() is 3.02% CPU usage. 4.23% [kernel] [k] _raw_spin_unlock_irqrestore 3.62% [kernel] [k] __do_softirq 3.16% [kernel] [k] __memcpy 3.02% [kernel] [k] flow_free 2.42% libc-2.17.so [.] __memcpy_ssse3_back 2.18% [kernel] [k] copy_user_generic_unrolled 2.17% [kernel] [k] find_next_bit When applied this patch, perf-top show as below: Not shown on the list anymore. 4.11% [kernel] [k] _raw_spin_unlock_irqrestore 3.79% [kernel] [k] __do_softirq 3.46% [kernel] [k] __memcpy 2.73% libc-2.17.so [.] __memcpy_ssse3_back 2.25% [kernel] [k] copy_user_generic_unrolled 1.89% libc-2.17.so [.] _int_malloc 1.53% ovs-vswitchd [.] xlate_actions With this patch, the TCP throughput(we dont use Megaflow Cache + Microflow Cache) between VMs is 1.18Gbs/sec up to 1.30Gbs/sec (maybe ~10% performance imporve). This patch adds cpumask struct, the cpu_used_mask stores the cpu_id that the flow used. And we only check the flow_stats on the cpu we used, and it is unncessary to check all possible cpu when getting, cleaning, and updating the flow_stats. Adding the cpu_used_mask to sw_flow struct does’t increase the cacheline number. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19openvswitch: Optimize updating for OvS flow_stats.Tonghao Zhang
In the ovs_flow_stats_update(), we only use the node var to alloc flow_stats struct. But this is not a common case, it is unnecessary to call the numa_node_id() everytime. This patch is not a bugfix, but there maybe a small increase. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19net: Zero terminate ifr_name in dev_ifname().David S. Miller
The ifr.ifr_name is passed around and assumed to be NULL terminated. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19wireless: wext: terminate ifr name coming from userspaceLevin, Alexander
ifr name is assumed to be a valid string by the kernel, but nothing was forcing username to pass a valid string. In turn, this would cause panics as we tried to access the string past it's valid memory. Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19Merge branch 'liquidio-lowmem-fixes'David S. Miller
Rick Farrington says: ==================== liquidio: avoid vm low memory crashes This patchset addresses issues brought about by low memory conditions in a VM. These conditions were not seen when the driver was exercised normally. Rather, they were brought about through manual fault injection. They are being included in the interest of hardening the driver against unforeseen circumstances. 1. Fix GPF in octeon_init_droq(); zero the allocated block 'recv_buf_list'. This prevents a GPF trying to access an invalid 'recv_buf_list[i]' entry in octeon_droq_destroy_ring_buffers() if init didn't alloc all entries. 2. Don't dereference a NULL ptr in octeon_droq_destroy_ring_buffers(). 3. For defensive programming, zero the allocated block 'oct->droq' in octeon_setup_output_queues() and 'oct->instr_queue' in octeon_setup_instr_queues(). change log: V1 -> V2: 1. Corrected syntax in 'Subject' lines; no functional or code changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19liquidio: lowmem: init allocated memory to 0Rick Farrington
For defensive programming, zero the allocated block 'oct->droq[0]' in octeon_setup_output_queues() and 'oct->instr_queue[0]' in octeon_setup_instr_queues(). Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19liquidio: lowmem: do not dereference null ptrRick Farrington
Don't dereference a NULL ptr in octeon_droq_destroy_ring_buffers(). Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>