linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-10-09	mmc: mmci: add dma_finalize callback	Ludovic Barre
	This patch adds dma_finalize callback at mmci_host_ops to allow to call specific variant. Signed-off-by: Ludovic Barre <ludovic.barre@st.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-10-09	mmc: mmci: add dma_start callback	Ludovic Barre
	This patch adds dma_start callback to mmci_host_ops. Create a generic mmci_dma_start function which regroup common action between variant. Signed-off-by: Ludovic Barre <ludovic.barre@st.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-10-09	mmc: mmci: add get_next_data callback	Ludovic Barre
	This patch adds get_next_data callback to mmci_host_ops. Generic mmci_get_next_data factorizes next_cookie check and the host ops call. Signed-off-by: Ludovic Barre <ludovic.barre@st.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-10-09	mmc: mmci: add prepare/unprepare_data callbacks	Ludovic Barre
	This patch adds prepare/unprepare callbacks to mmci_host_ops. Like this mmci_pre/post_request can be generic, mmci_prepare_data and mmci_unprepare_data provide common next_cookie management. Signed-off-by: Ludovic Barre <ludovic.barre@st.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-10-09	mmc: mmci: merge prepare data functions	Ludovic Barre
	This patch merges the prepare data functions. This allows to define a single access to prepare data service. This prepares integration for mmci host ops. Signed-off-by: Ludovic Barre <ludovic.barre@st.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-10-09	mmc: mmci: introduce dma_priv pointer to mmci_host	Ludovic Barre
	-Introduces dma_priv pointer to define specific needs for each dma engine. This patch is needed to prepare sdmmc variant with internal dma which not use dmaengine API. -Moves next cookie to mmci host structure to share same cookie management between all variants. Signed-off-by: Ludovic Barre <ludovic.barre@st.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-10-09	mmc: mmci: create common mmci_dma_setup/release	Ludovic Barre
	This patch creates a common mmci_dma_setup/release which calls dma_setup/release callbacks of mmci_host_ops and manages common features like use_dma... If there is a fallbacks to pio mode, dma functions must check use_dma. error management: -mmci_dmae_setup fail if Tx and Rx dma channels are not defined -qcom_dma_setup fail if one of both dma channels is not defined, Qcom has no specific resource to release, just mmci dmae resource. Signed-off-by: Ludovic Barre <ludovic.barre@st.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-10-09	mmc: sdhci-of-arasan: Add Support for AM654 MMC and PHY	Faiz Abbas
	The current arasan sdhci PHY configuration isn't compatible with the PHY on TI's AM654 devices. Therefore, add a new compatible, AM654 specific quirks and a new AM654 specific set_clock function which configures the PHY in a sane way. Signed-off-by: Faiz Abbas <faiz_abbas@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-10-09	mmc: sdhci-of-arasan: Add a single data structure to incorporate pdata and ↵	Faiz Abbas
	soc_ctl_map Currently, the driver passes platform data as a global structure and uses the .data of of_device_id to pass the soc_ctl_map. To make the implementation more flexible add a single data structure that incorporates both of the above and pass it in the .data of of_device_id. Signed-off-by: Faiz Abbas <faiz_abbas@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-10-09	dt-bindings: mmc: sdhci-of-arasan: Add new compatible for AM654 MMC PHY	Faiz Abbas
	Add a new compatible to use the host controller driver with the MMC PHY on TI's AM654 SOCs Signed-off-by: Faiz Abbas <faiz_abbas@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-10-09	x86/mm: Avoid VLA in pgd_alloc()	Kees Cook
	Arnd Bergmann reported that turning on -Wvla found a new (unintended) VLA usage: arch/x86/mm/pgtable.c: In function 'pgd_alloc': include/linux/build_bug.h:29:45: error: ISO C90 forbids variable length array 'u_pmds' [-Werror=vla] arch/x86/mm/pgtable.c:190:34: note: in expansion of macro 'static_cpu_has' #define PREALLOCATED_USER_PMDS (static_cpu_has(X86_FEATURE_PTI) ? \ ^~~~~~~~~~~~~~ arch/x86/mm/pgtable.c:431:16: note: in expansion of macro 'PREALLOCATED_USER_PMDS' pmd_t *u_pmds[PREALLOCATED_USER_PMDS]; ^~~~~~~~~~~~~~~~~~~~~~ Use the actual size of the array that is used for X86_FEATURE_PTI, which is known at build time, instead of the variable size. [ mingo: Squashed original fix with followup fix to avoid bisection breakage, wrote new changelog. ] Reported-by: Arnd Bergmann <arnd@arndb.de> Original-written-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Fixes: 1be3f247c288 ("x86/mm: Avoid VLA in pgd_alloc()") Link: http://lkml.kernel.org/r/20181008235434.GA35035@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-09	x86/intel_rdt: Fix initial allocation to consider CDP	Reinette Chatre
	When a new resource group is created it is initialized with a default allocation that considers which portions of cache are currently available for sharing across all resource groups or which portions of cache are currently unused. If a CDP allocation forms part of a resource group that is in exclusive mode then it should be ensured that no new allocation overlaps with any resource that shares the underlying hardware. The current initial allocation does not take this sharing of hardware into account and a new allocation in a resource that shares the same hardware would affect the exclusive resource group. Fix this by considering the allocation of a peer RDT domain - a RDT domain sharing the same hardware - as part of the test to determine which portion of cache is in use and available for use. Fixes: 95f0b77efa57 ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Cc: tony.luck@intel.com Cc: jithu.joseph@intel.com Cc: gavin.hindman@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/b1f7ec08b1695be067de416a4128466d49684317.1538603665.git.reinette.chatre@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-09	x86/intel_rdt: CBM overlap should also check for overlap with CDP peer	Reinette Chatre
	The CBM overlap test is used to manage the allocations of RDT resources where overlap is possible between resource groups. When a resource group is in exclusive mode then there should be no overlap between resource groups. The current overlap test only considers overlap between the same resources, for example, that usage of a RDT_RESOURCE_L2DATA resource in one resource group does not overlap with usage of a RDT_RESOURCE_L2DATA resource in another resource group. The problem with this is that it allows overlap between a RDT_RESOURCE_L2DATA resource in one resource group with a RDT_RESOURCE_L2CODE resource in another resource group - even if both resource groups are in exclusive mode. This is a problem because even though these appear to be different resources they end up sharing the same underlying hardware and thus does not fulfill the user's request for exclusive use of hardware resources. Fix this by including the CDP peer (if there is one) in every CBM overlap test. This does not impact the overlap between resources within the same exclusive resource group that is allowed. Fixes: 49f7b4efa110 ("x86/intel_rdt: Enable setting of exclusive mode") Reported-by: Jithu Joseph <jithu.joseph@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jithu Joseph <jithu.joseph@intel.com> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Cc: tony.luck@intel.com Cc: gavin.hindman@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/e538b7f56f7ca15963dce2e00ac3be8edb8a68e1.1538603665.git.reinette.chatre@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-09	x86/intel_rdt: Introduce utility to obtain CDP peer	Reinette Chatre
	Introduce a utility that, when provided with a RDT resource and an instance of this RDT resource (a RDT domain), would return pointers to the RDT resource and RDT domain that share the same hardware. This is specific to the CDP resources that share the same hardware. For example, if a pointer to the RDT_RESOURCE_L2DATA resource (struct rdt_resource) and a pointer to an instance of this resource (struct rdt_domain) is provided, then it will return a pointer to the RDT_RESOURCE_L2CODE resource as well as the specific instance that shares the same hardware as the provided rdt_domain. This utility is created in support of the "exclusive" resource group mode where overlap of resource allocation between resource groups need to be avoided. The overlap test need to consider not just the matching resources, but also the resources that share the same hardware. Temporarily mark it as unused in support of patch testing to avoid compile warnings until it is used. Fixes: 49f7b4efa110 ("x86/intel_rdt: Enable setting of exclusive mode") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jithu Joseph <jithu.joseph@intel.com> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Cc: tony.luck@intel.com Cc: gavin.hindman@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/9b4bc4d59ba2e903b6a3eb17e16ef41a8e7b7c3e.1538603665.git.reinette.chatre@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-09	Merge branch 'x86/urgent' into x86/cache, to pick up dependent fix	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-08	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	David S. Miller
	Alexei Starovoitov says: ==================== pull-request: bpf-next 2018-10-08 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) sk_lookup_[tcp\|udp] and sk_release helpers from Joe Stringer which allow BPF programs to perform lookups for sockets in a network namespace. This would allow programs to determine early on in processing whether the stack is expecting to receive the packet, and perform some action (eg drop, forward somewhere) based on this information. 2) per-cpu cgroup local storage from Roman Gushchin. Per-cpu cgroup local storage is very similar to simple cgroup storage except all the data is per-cpu. The main goal of per-cpu variant is to implement super fast counters (e.g. packet counters), which don't require neither lookups, neither atomic operations in a fast path. The example of these hybrid counters is in selftests/bpf/netcnt_prog.c 3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet 4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski 5) rename of libbpf interfaces from Andrey Ignatov. libbpf is maturing as a library and should follow good practices in library design and implementation to play well with other libraries. This patch set brings consistent naming convention to global symbols. 6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov to let Apache2 projects use libbpf 7) various AF_XDP fixes from Björn and Magnus ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09	mm, sched/numa: Remove remaining traces of NUMA rate-limiting	Srikar Dronamraju
	Remove the leftover pglist_data::numabalancing_migrate_lock and its initialization, we stopped using this lock with: efaffc5e40ae ("mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration") [ mingo: Rewrote the changelog. ] Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux-MM <linux-mm@kvack.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1538824999-31230-1-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-09	x86/intel_rdt: Fix out-of-bounds memory access in CBM tests	Reinette Chatre
	While the DOC at the beginning of lib/bitmap.c explicitly states that "The number of valid bits in a given bitmap does _not_ need to be an exact multiple of BITS_PER_LONG.", some of the bitmap operations do indeed access BITS_PER_LONG portions of the provided bitmap no matter the size of the provided bitmap. For example, if bitmap_intersects() is provided with an 8 bit bitmap the operation will access BITS_PER_LONG bits from the provided bitmap. While the operation ensures that these extra bits do not affect the result, the memory is still accessed. The capacity bitmasks (CBMs) are typically stored in u32 since they can never exceed 32 bits. A few instances exist where a bitmap_* operation is performed on a CBM by simply pointing the bitmap operation to the stored u32 value. The consequence of this pattern is that some bitmap_* operations will access out-of-bounds memory when interacting with the provided CBM. This is confirmed with a KASAN test that reports: BUG: KASAN: stack-out-of-bounds in __bitmap_intersects+0xa2/0x100 and BUG: KASAN: stack-out-of-bounds in __bitmap_weight+0x58/0x90 Fix this by moving any CBM provided to a bitmap operation needing BITS_PER_LONG to an 'unsigned long' variable. [ tglx: Changed related function arguments to unsigned long and got rid of the _cbm extra step ] Fixes: 72d505056604 ("x86/intel_rdt: Add utilities to test pseudo-locked region possibility") Fixes: 49f7b4efa110 ("x86/intel_rdt: Enable setting of exclusive mode") Fixes: d9b48c86eb38 ("x86/intel_rdt: Display resource groups' allocations' size in bytes") Fixes: 95f0b77efa57 ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/69a428613a53f10e80594679ac726246020ff94f.1538686926.git.reinette.chatre@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-09	dma-direct: document the zone selection logic	Christoph Hellwig
	What we are doing here isn't quite obvious, so add a comment explaining it. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-09	Merge tag 'perf-core-for-mingo-4.20-20181008' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: - Fix building the python bindings with python3, which fixes some problems with building with clang on Clear Linux (Eduardo Habkost) - Fix coverity warnings, fixing up some error paths and plugging some temporary small buffer leaks (Sanskriti Sharma) - Adopt a wrapper for strerror_r() for the same reasons as recently for libbpf (Steven Rostedt) - S390 does not support watchpoints in perf test 22', check if that test is supported by the arch. (Thomas Richter) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-09	Merge branch 'perf/urgent' into perf/core, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-08	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next	David S. Miller
	Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree: 1) Support for matching on ipsec policy already set in the route, from Florian Westphal. 2) Split set destruction into deactivate and destroy phase to make it fit better into the transaction infrastructure, also from Florian. This includes a patch to warn on imbalance when setting the new activate and deactivate interfaces. 3) Release transaction list from the workqueue to remove expensive synchronize_rcu() from configuration plane path. This speeds up configuration plane quite a bit. From Florian Westphal. 4) Add new xfrm/ipsec extension, this new extension allows you to match for ipsec tunnel keys such as source and destination address, spi and reqid. From Máté Eckl and Florian Westphal. 5) Add secmark support, this includes connsecmark too, patches from Christian Gottsche. 6) Allow to specify remaining bytes in xt_quota, from Chenbo Feng. One follow up patch to calm a clang warning for this one, from Nathan Chancellor. 7) Flush conntrack entries based on layer 3 family, from Kristian Evensen. 8) New revision for cgroups2 to shrink the path field. 9) Get rid of obsolete need_conntrack(), as a result from recent demodularization works. 10) Use WARN_ON instead of BUG_ON, from Florian Westphal. 11) Unused exported symbol in nf_nat_ipv4_fn(), from Florian. 12) Remove superfluous check for timeout netlink parser and dump functions in layer 4 conntrack helpers. 13) Unnecessary redundant rcu read side locks in NAT redirect, from Taehee Yoo. 14) Pass nf_hook_state structure to error handlers, patch from Florian Westphal. 15) Remove ->new() interface from layer 4 protocol trackers. Place them in the ->packet() interface. From Florian. 16) Place conntrack ->error() handling in the ->packet() interface. Patches from Florian Westphal. 17) Remove unused parameter in the pernet initialization path, also from Florian. 18) Remove additional parameter to specify layer 3 protocol when looking up for protocol tracker. From Florian. 19) Shrink array of layer 4 protocol trackers, from Florian. 20) Check for linear skb only once from the ALG NAT mangling codebase, from Taehee Yoo. 21) Use rhashtable_walk_enter() instead of deprecated rhashtable_walk_init(), also from Taehee. 22) No need to flush all conntracks when only one single address is gone, from Tan Hu. 23) Remove redundant check for NAT flags in flowtable code, from Taehee Yoo. 24) Use rhashtable_lookup() instead of rhashtable_lookup_fast() from netfilter codebase, since rcu read lock side is already assumed in this path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09	spi: spi-ep93xx: Use dma_data_direction for ep93xx_spi_dma_{finish,prepare}	Nathan Chancellor
	Clang warns when one enumerated type is implicitly converted to another. drivers/spi/spi-ep93xx.c:342:62: warning: implicit conversion from enumeration type 'enum dma_transfer_direction' to different enumeration type 'enum dma_data_direction' [-Wenum-conversion] nents = dma_map_sg(chan->device->dev, sgt->sgl, sgt->nents, dir); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~ ./include/linux/dma-mapping.h:428:58: note: expanded from macro 'dma_map_sg' #define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, 0) ~~~~~~~~~~~~~~~~ ^ drivers/spi/spi-ep93xx.c:348:57: warning: implicit conversion from enumeration type 'enum dma_transfer_direction' to different enumeration type 'enum dma_data_direction' [-Wenum-conversion] dma_unmap_sg(chan->device->dev, sgt->sgl, sgt->nents, dir); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~ ./include/linux/dma-mapping.h:429:62: note: expanded from macro 'dma_unmap_sg' #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0) ~~~~~~~~~~~~~~~~~~ ^ drivers/spi/spi-ep93xx.c:377:56: warning: implicit conversion from enumeration type 'enum dma_transfer_direction' to different enumeration type 'enum dma_data_direction' [-Wenum-conversion] dma_unmap_sg(chan->device->dev, sgt->sgl, sgt->nents, dir); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~ ./include/linux/dma-mapping.h:429:62: note: expanded from macro 'dma_unmap_sg' #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0) ~~~~~~~~~~~~~~~~~~ ^ 3 warnings generated. dma_{,un}map_sg expect an enum of type dma_data_direction but this driver uses dma_transfer_direction for everything. Convert the driver to use dma_data_direction for these two functions. There are two places that strictly require an enum of type dma_transfer_direction: the direction member in struct dma_slave_config and the direction parameter in dmaengine_prep_slave_sg. To avoid using an explicit cast, add a simple function, ep93xx_dma_data_to_trans_dir, to safely map between the two types because they are not 1 to 1 in meaning. Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-09	bpf: fix building without CONFIG_INET	Arnd Bergmann
	The newly added TCP and UDP handling fails to link when CONFIG_INET is disabled: net/core/filter.o: In function `sk_lookup': filter.c:(.text+0x7ff8): undefined reference to `tcp_hashinfo' filter.c:(.text+0x7ffc): undefined reference to `tcp_hashinfo' filter.c:(.text+0x8020): undefined reference to `__inet_lookup_established' filter.c:(.text+0x8058): undefined reference to `__inet_lookup_listener' filter.c:(.text+0x8068): undefined reference to `udp_table' filter.c:(.text+0x8070): undefined reference to `udp_table' filter.c:(.text+0x808c): undefined reference to `__udp4_lib_lookup' net/core/filter.o: In function `bpf_sk_release': filter.c:(.text+0x82e8): undefined reference to `sock_gen_put' Wrap the related sections of code in #ifdefs for the config option. Furthermore, sk_lookup() should always have been marked 'static', this also avoids a warning about a missing prototype when building with 'make W=1'. Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-09	netfilter: xt_quota: Don't use aligned attribute in sizeof	Nathan Chancellor
	Clang warns: net/netfilter/xt_quota.c:47:44: warning: 'aligned' attribute ignored when parsing type [-Wignored-attributes] BUILD_BUG_ON(sizeof(atomic64_t) != sizeof(__aligned_u64)); ^~~~~~~~~~~~~ Use 'sizeof(__u64)' instead, as the alignment doesn't affect the size of the type. Fixes: e9837e55b020 ("netfilter: xt_quota: fix the behavior of xt_quota module") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-08	rxrpc: Fix the packet reception routine	David Howells
	The rxrpc_input_packet() function and its call tree was built around the assumption that data_ready() handler called from UDP to inform a kernel service that there is data to be had was non-reentrant. This means that certain locking could be dispensed with. This, however, turns out not to be the case with a multi-queue network card that can deliver packets to multiple cpus simultaneously. Each of those cpus can be in the rxrpc_input_packet() function at the same time. Fix by adding or changing some structure members: (1) Add peer->rtt_input_lock to serialise access to the RTT buffer. (2) Make conn->service_id into a 32-bit variable so that it can be cmpxchg'd on all arches. (3) Add call->input_lock to serialise access to the Rx/Tx state. Note that although the Rx and Tx states are (almost) entirely separate, there's no point completing the separation and having separate locks since it's a bi-phasal RPC protocol rather than a bi-direction streaming protocol. Data transmission and data reception do not take place simultaneously on any particular call. and making the following functional changes: (1) In rxrpc_input_data(), hold call->input_lock around the core to prevent simultaneous producing of packets into the Rx ring and updating of tracking state for a particular call. (2) In rxrpc_input_ping_response(), only read call->ping_serial once, and check it before checking RXRPC_CALL_PINGING as that's a cheaper test. The bit test and bit clear can then be combined. No further locking is needed here. (3) In rxrpc_input_ack(), take call->input_lock after we've parsed much of the ACK packet. The superseded ACK check is then done both before and after the lock is taken. The handing of ackinfo data is split, parsing before the lock is taken and processing with it held. This is keyed on rxMTU being non-zero. Congestion management is also done within the locked section. (4) In rxrpc_input_ackall(), take call->input_lock around the Tx window rotation. The ACKALL packet carries no information and is only really useful after all packets have been transmitted since it's imprecise. (5) In rxrpc_input_implicit_end_call(), we use rx->incoming_lock to prevent calls being simultaneously implicitly ended on two cpus and also to prevent any races with incoming call setup. (6) In rxrpc_input_packet(), use cmpxchg() to effect the service upgrade on a connection. It is only permitted to happen once for a connection. (7) In rxrpc_new_incoming_call(), we have to recheck the routing inside rx->incoming_lock to see if someone else set up the call, connection or peer whilst we were getting there. We can't trust the values from the earlier routing check unless we pin refs on them - which we want to avoid. Further, we need to allow for an incoming call to have its state changed on another CPU between us making it live and us adjusting it because the conn is now in the RXRPC_CONN_SERVICE state. (8) In rxrpc_peer_add_rtt(), take peer->rtt_input_lock around the access to the RTT buffer. Don't need to lock around setting peer->rtt. For reference, the inventory of state-accessing or state-altering functions used by the packet input procedure is: > rxrpc_input_packet() * PACKET CHECKING * ROUTING > rxrpc_post_packet_to_local() > rxrpc_find_connection_rcu() - uses RCU > rxrpc_lookup_peer_rcu() - uses RCU > rxrpc_find_service_conn_rcu() - uses RCU > idr_find() - uses RCU * CONNECTION-LEVEL PROCESSING - Service upgrade - Can only happen once per conn ! Changed to use cmpxchg > rxrpc_post_packet_to_conn() - Setting conn->hi_serial - Probably safe not using locks - Maybe use cmpxchg * CALL-LEVEL PROCESSING > Old-call checking > rxrpc_input_implicit_end_call() > rxrpc_call_completed() > rxrpc_queue_call() ! Need to take rx->incoming_lock > __rxrpc_disconnect_call() > rxrpc_notify_socket() > rxrpc_new_incoming_call() - Uses rx->incoming_lock for the entire process - Might be able to drop this earlier in favour of the call lock > rxrpc_incoming_call() ! Conflicts with rxrpc_input_implicit_end_call() > rxrpc_send_ping() - Don't need locks to check rtt state > rxrpc_propose_ACK * PACKET DISTRIBUTION > rxrpc_input_call_packet() > rxrpc_input_data() * QUEUE DATA PACKET ON CALL > rxrpc_reduce_call_timer() - Uses timer_reduce() ! Needs call->input_lock() > rxrpc_receiving_reply() ! Needs locking around ack state > rxrpc_rotate_tx_window() > rxrpc_end_tx_phase() > rxrpc_proto_abort() > rxrpc_input_dup_data() - Fills the Rx buffer - rxrpc_propose_ACK() - rxrpc_notify_socket() > rxrpc_input_ack() * APPLY ACK PACKET TO CALL AND DISCARD PACKET > rxrpc_input_ping_response() - Probably doesn't need any extra locking ! Need READ_ONCE() on call->ping_serial > rxrpc_input_check_for_lost_ack() - Takes call->lock to consult Tx buffer > rxrpc_peer_add_rtt() ! Needs to take a lock (peer->rtt_input_lock) ! Could perhaps manage with cmpxchg() and xadd() instead > rxrpc_input_requested_ack - Consults Tx buffer ! Probably needs a lock > rxrpc_peer_add_rtt() > rxrpc_propose_ack() > rxrpc_input_ackinfo() - Changes call->tx_winsize ! Use cmpxchg to handle change ! Should perhaps track serial number - Uses peer->lock to record MTU specification changes > rxrpc_proto_abort() ! Need to take call->input_lock > rxrpc_rotate_tx_window() > rxrpc_end_tx_phase() > rxrpc_input_soft_acks() - Consults the Tx buffer > rxrpc_congestion_management() - Modifies the Tx annotations ! Needs call->input_lock() > rxrpc_queue_call() > rxrpc_input_abort() * APPLY ABORT PACKET TO CALL AND DISCARD PACKET > rxrpc_set_call_completion() > rxrpc_notify_socket() > rxrpc_input_ackall() * APPLY ACKALL PACKET TO CALL AND DISCARD PACKET ! Need to take call->input_lock > rxrpc_rotate_tx_window() > rxrpc_end_tx_phase() > rxrpc_reject_packet() There are some functions used by the above that queue the packet, after which the procedure is terminated: - rxrpc_post_packet_to_local() - local->event_queue is an sk_buff_head - local->processor is a work_struct - rxrpc_post_packet_to_conn() - conn->rx_queue is an sk_buff_head - conn->processor is a work_struct - rxrpc_reject_packet() - local->reject_queue is an sk_buff_head - local->processor is a work_struct And some that offload processing to process context: - rxrpc_notify_socket() - Uses RCU lock - Uses call->notify_lock to call call->notify_rx - Uses call->recvmsg_lock to queue recvmsg side - rxrpc_queue_call() - call->processor is a work_struct - rxrpc_propose_ACK() - Uses call->lock to wrap __rxrpc_propose_ACK() And a bunch that complete a call, all of which use call->state_lock to protect the call state: - rxrpc_call_completed() - rxrpc_set_call_completion() - rxrpc_abort_call() - rxrpc_proto_abort() - Also uses rxrpc_queue_call() Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-08	rxrpc: Fix the rxrpc_tx_packet trace line	David Howells
	Fix the rxrpc_tx_packet trace line by storing the where parameter. Fixes: 4764c0da69dc ("rxrpc: Trace packet transmission") Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-08	rxrpc: Fix connection-level abort handling	David Howells
	Fix connection-level abort handling to cache the abort and error codes properly so that a new incoming call can be properly aborted if it races with the parent connection being aborted by another CPU. The abort_code and error parameters can then be dropped from rxrpc_abort_calls(). Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state") Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-08	rxrpc: Only take the rwind and mtu values from latest ACK	David Howells
	Move the out-of-order and duplicate ACK packet check to before the call to rxrpc_input_ackinfo() so that the receive window size and MTU size are only checked in the latest ACK packet and don't regress. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-08	mtd: maps: gpio-addr-flash: Convert to gpiod	Ricardo Ribalda Delgado
	Convert from legacy gpio API to gpiod. Board files will have to use gpiod_lookup_tables. Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Suggested-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-10-08	mtd: maps: gpio-addr-flash: Replace array with an integer	Ricardo Ribalda Delgado
	By replacing the array with an integer we can avoid completely the bit comparison loop if the value has not changed (by far the most common case). Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-10-08	mtd: maps: gpio-addr-flash: Use order instead of size	Ricardo Ribalda Delgado
	By using the order of the window instead of the size, we can replace a lot of expensive division and modulus on the code with simple bit operations. Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-10-08	mtd: spi-nor: fsl-quadspi: Don't let -EINVAL on the bus	Ahmad Fatoum
	fsl_qspi_get_seqid() may return -EINVAL, but fsl_qspi_init_ahb_read() doesn't check for error codes with the result that -EINVAL could find itself signalled over the bus. In conjunction with the LS1046A SoC's A-009283 errata ("Illegal accesses to SPI flash memory can result in a system hang") this illegal access to SPI flash memory results in a system hang if userspace attempts reading later on. Avoid this by always checking fsl_qspi_get_seqid()'s return value and bail out otherwise. Fixes: e46ecda764dc ("mtd: spi-nor: Add Freescale QuadSPI driver") Cc: stable@vger.kernel.org Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-10-08	mtd: devices: m25p80: Make sure WRITE_EN is issued before each write	Yogesh Gaur
	Some SPI controllers can't write nor->page_size bytes in a single step because their TX FIFO is too small, but when that happens we should make sure a WRITE_EN command before each write access and READ_SR command after each write access is issued. The core is already taking care of that, so all we have to do here is return the actual number of bytes that were written during the spi_mem_exec_op() operation. Signed-off-by: Yogesh Gaur <yogeshnarayan.gaur@nxp.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-10-08	mtd: spi-nor: Support controllers with limited TX FIFO size	Yogesh Gaur
	Some SPI controllers can't write nor->page_size bytes in a single step because their TX FIFO is too small. Allow nor->write() to return a size that is smaller than the requested write size to gracefully handle this case. Signed-off-by: Yogesh Gaur <yogeshnarayan.gaur@nxp.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-10-08	mtd: spi-nor: cadence-quadspi: Use proper enum for dma_[un]map_single	Nathan Chancellor
	Clang warns when one enumerated type is converted implicitly to another. drivers/mtd/spi-nor/cadence-quadspi.c:962:47: warning: implicit conversion from enumeration type 'enum dma_transfer_direction' to different enumeration type 'enum dma_data_direction' [-Wenum-conversion] dma_dst = dma_map_single(nor->dev, buf, len, DMA_DEV_TO_MEM); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~ ./include/linux/dma-mapping.h:428:66: note: expanded from macro 'dma_map_single' ~~~~~~~~~~~~~~~~~~~~ ^ drivers/mtd/spi-nor/cadence-quadspi.c:997:43: warning: implicit conversion from enumeration type 'enum dma_transfer_direction' to different enumeration type 'enum dma_data_direction' [-Wenum-conversion] dma_unmap_single(nor->dev, dma_dst, len, DMA_DEV_TO_MEM); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~ ./include/linux/dma-mapping.h:429:70: note: expanded from macro 'dma_unmap_single' ~~~~~~~~~~~~~~~~~~~~~~ ^ 2 warnings generated. Use the proper enums from dma_data_direction to satisfy Clang. DMA_FROM_DEVICE = DMA_DEV_TO_MEM = 2 Link: https://github.com/ClangBuiltLinux/linux/issues/108 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-10-08	mtd: spi-nor: parse SFDP Sector Map Parameter Table	Tudor Ambarus
	Add support for the SFDP (JESD216B) Sector Map Parameter Table. This table is optional, but when available, we parse it to identify the location and size of sectors within the main data array of the flash memory device and to identify which Erase Types are supported by each sector. Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Marek Vasut <marek.vasut@gmail.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-10-08	mtd: spi-nor: add support to non-uniform SFDP SPI NOR flash memories	Tudor Ambarus
	Based on Cyrille Pitchen's patch https://lkml.org/lkml/2017/3/22/935. This patch is a transitional patch in introducing the support of SFDP SPI memories with non-uniform erase sizes like Spansion s25fs512s. Non-uniform erase maps will be used later when initialized based on the SFDP data. Introduce the memory erase map which splits the memory array into one or many erase regions. Each erase region supports up to 4 erase types, as defined by the JEDEC JESD216B (SFDP) specification. To be backward compatible, the erase map of uniform SPI NOR flash memories is initialized so it contains only one erase region and this erase region supports only one erase command. Hence a single size is used to erase any sector/block of the memory. Besides, since the algorithm used to erase sectors on non-uniform SPI NOR flash memories is quite expensive, when possible, the erase map is tuned to come back to the uniform case. The 'erase with the best command, move forward and repeat' approach was suggested by Cristian Birsan in a brainstorm session, so: Suggested-by: Cristian Birsan <cristian.birsan@microchip.com> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Marek Vasut <marek.vasut@gmail.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-10-08	filesystem-dax: Fix dax_layout_busy_page() livelock	Dan Williams
	In the presence of multi-order entries the typical pagevec_lookup_entries() pattern may loop forever: while (index < end && pagevec_lookup_entries(&pvec, mapping, index, min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) { ... for (i = 0; i < pagevec_count(&pvec); i++) { index = indices[i]; ... } index++; /* BUG */ } The loop updates 'index' for each index found and then increments to the next possible page to continue the lookup. However, if the last entry in the pagevec is multi-order then the next possible page index is more than 1 page away. Fix this locally for the filesystem-dax case by checking for dax-multi-order entries. Going forward new users of multi-order entries need to be similarly careful, or we need a generic way to report the page increment in the radix iterator. Fixes: 5fac7408d828 ("mm, fs, dax: handle layout changes to pinned dax...") Cc: <stable@vger.kernel.org> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-08	sunvdc: Remove VLA usage	Kees Cook
	In the quest to remove all stack VLA usage from the kernel[1], this moves the math for cookies calculation into macros and allocates a fixed size array for the maximum number of cookies and adds a runtime sanity check. (Note that the size was always fixed, but just hidden from the compiler.) [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	tools lib traceevent, perf tools: Move struct tep_handler definition in a ↵	Tzvetomir Stoyanov
	local header file As traceevent is going to be transferred into a proper library, its local data should be protected from the library users. This patch encapsulates struct tep_handler into a local header, not visible outside of the library. It implements also a bunch of new APIs, which library users can use to access tep_handler members. Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linux trace devel <linux-trace-devel@vger.kernel.org> Cc: tzvetomir stoyanov <tstoyanov@vmware.com> Link: http://lkml.kernel.org/r/20181005122225.522155df@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-08	dpaa2-eth: Don't account Tx confirmation frames on NAPI poll	Ioana Ciocoi Radulescu
	Until now, both Rx and Tx confirmation frames handled during NAPI poll were counted toward the NAPI budget. However, Tx confirmations are lighter to process than Rx frames, which can skew the amount of work actually done inside one NAPI cycle. Update the code to only count Rx frames toward the NAPI budget and set a separate threshold on how many Tx conf frames can be processed in one poll cycle. The NAPI poll routine stops when either the budget is consumed by Rx frames or when Tx confirmation frames reach this threshold. Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: mscc: ocelot: remove set but not used variable 'phy_mode'	YueHaibing
	Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/mscc/ocelot_board.c: In function 'mscc_ocelot_probe': drivers/net/ethernet/mscc/ocelot_board.c:262:17: warning: variable 'phy_mode' set but not used [-Wunused-but-set-variable] enum phy_mode phy_mode; It never used since introduction in commit 71e32a20cfbf ("net: mscc: ocelot: make use of SerDes PHYs for handling their configuration") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	Merge branch 'more-pmtu-selftests'	David S. Miller
	Sabrina Dubroca says: ==================== selftests: add more PMTU tests The current selftests for PMTU cover VTI tunnels, but there's nothing about the generation and handling of PMTU exceptions by intermediate routers. This series adds and improves existing helpers, then adds IPv4 and IPv6 selftests with a setup involving an intermediate router. Joint work with Stefano Brivio. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	selftests: pmtu: add basic IPv4 and IPv6 PMTU tests	Sabrina Dubroca
	Commit d1f1b9cbf34c ("selftests: net: Introduce first PMTU test") and follow-ups introduced some PMTU tests, but they all rely on tunneling, and, particularly, on VTI. These new tests use simple routing to exercise the generation and update of PMTU exceptions in IPv4 and IPv6. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	selftests: pmtu: extend MTU parsing helper to locked MTU	Sabrina Dubroca
	The mtu_parse helper introduced in commit f2c929feeccd ("selftests: pmtu: Factor out MTU parsing helper") can only handle "mtu 1234", but not "mtu lock 1234". Extend it, so that we can do IPv4 tests with PMTU smaller than net.ipv4.route.min_pmtu Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	selftests: pmtu: Introduce check_pmtu_value()	Stefano Brivio
	Introduce and use a function that checks PMTU values against expected values and logs error messages, to remove some clutter. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	isdn/gigaset/isocdata: mark expected switch fall-through	Gustavo A. R. Silva
	Notice that in this particular case, I replaced the "--v-- fall through --v--" comment with a proper "fall through", which is what GCC is expecting to find. This fix is part of the ongoing efforts to enabling -Wimplicit-fallthrough Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	Merge branch 'rtnetlink-Add-support-for-rigid-checking-of-data-in-dump-request'	David S. Miller
	David Ahern says: ==================== rtnetlink: Add support for rigid checking of data in dump request There are many use cases where a user wants to influence what is returned in a dump for some rtnetlink command: one is wanting data for a different namespace than the one the request is received and another is limiting the amount of data returned in the dump to a specific set of interest to userspace, reducing the cpu overhead of both kernel and userspace. Unfortunately, the kernel has historically not been strict with checking for the proper header or checking the values passed in the header. This lenient implementation has allowed iproute2 and other packages to pass any struct or data in the dump request as long as the family is the first byte. For example, ifinfomsg struct is used by iproute2 for all generic dump requests - links, addresses, routes and rules when it is really only valid for link requests. There is 1 is example where the kernel deals with the wrong struct: link dumps after VF support was added. Older iproute2 was sending rtgenmsg as the header instead of ifinfomsg so a patch was added to try and detect old userspace vs new: e5eca6d41f53 ("rtnetlink: fix userspace API breakage for iproute2 < v3.9.0") The latest example is Christian's patch set wanting to return addresses for a target namespace. It guesses the header struct is an ifaddrmsg and if it guesses wrong a netlink warning is generated in the kernel log on every address dump which is unacceptable. Another example where the kernel is a bit lenient is route dumps: iproute2 can send either a request with either ifinfomsg or a rtmsg as the header struct, yet the kernel always treats the header as an rtmsg (see inet_dump_fib and rtm_flags check). The header inconsistency impacts the ability to add kernel side filters for route dumps - a necessary feature for scale setups with 100k+ routes. How to resolve the problem of not breaking old userspace yet be able to move forward with new features such as kernel side filtering which are crucial for efficient operation at high scale? This patch set addresses the problem by adding a new socket flag, NETLINK_DUMP_STRICT_CHK, that userspace can use with setsockopt to request strict checking of headers and attributes on dump requests and hence unlock the ability to use kernel side filters as they are added. Kernel side, the dump handlers are updated to verify the message contains at least the expected header struct: RTM_GETLINK: ifinfomsg RTM_GETADDR: ifaddrmsg RTM_GETMULTICAST: ifaddrmsg RTM_GETANYCAST: ifaddrmsg RTM_GETADDRLABEL: ifaddrlblmsg RTM_GETROUTE: rtmsg RTM_GETSTATS: if_stats_msg RTM_GETNEIGH: ndmsg RTM_GETNEIGHTBL: ndtmsg RTM_GETNSID: rtgenmsg RTM_GETRULE: fib_rule_hdr RTM_GETNETCONF: netconfmsg RTM_GETMDB: br_port_msg And then every field in the header struct should be 0 with the exception of the family. There are a few exceptions to this rule where the kernel already influences the data returned by values in the struct. Next the message should not contain attributes unless the kernel implements filtering for it. Any unexpected data causes the dump to fail with EINVAL. If the new flag is honored by the kernel and the dump contents adjusted by any data passed in the request, the dump handler can set the NLM_F_DUMP_FILTERED flag in the netlink message header. For old userspace on new kernel there is no impact as all checks are wrapped in a check on the new strict flag. For new userspace on old kernel, the data in the headers and any appended attributes are silently ignored though the setsockopt failing is the clue to userspace the feature is not supported. New userspace on new kernel gets the requested data dump. iproute2 patches can be found here: https://github.com/dsahern/iproute2 dump-enhancements Major changes since v1 - inner header is supposed to be 4-bytes aligned. So for dumps that should not have attributes appended changed the check to use: if (nlmsg_attrlen(nlh, sizeof(hdr))) Only impacts patches with headers that are not multiples of 4-bytes (rtgenmsg, netconfmsg), but applied the change to all patches not calling nlmsg_parse for consistency. - Added nlmsg_parse_strict and nla_parse_strict for tighter control on attribute parsing. There should be no unknown attribute types or extra bytes. - Moved validation to a helper in most cases Changes since rfc-v2 - dropped the NLM_F_DUMP_FILTERED flag from target nsid dumps per Jiri's objections - changed the opt-in uapi from a netlink message flag to a socket flag. setsockopt provides an api for userspace to definitively know if the kernel supports strict checking on dumps. - re-ordered patches to peel off the extack on dumps if needed to keep this set size within limits - misc cleanups in patches based on testing ==================== Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	rtnetlink: Update rtnl_fdb_dump for strict data checking	David Ahern
	Update rtnl_fdb_dump for strict data checking. If the flag is set, the dump request is expected to have an ndmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the NDA_IFINDEX and NDA_MASTER attributes are supported. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>