summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2019-12-09rculist: Describe variadic macro argument in a Sphinx-compatible wayJonathan Neuschäfer
Without this patch, Sphinx shows "variable arguments" as the description of the cond argument, rather than the intended description, and prints the following warnings: ./include/linux/rculist.h:374: warning: Excess function parameter 'cond' description in 'list_for_each_entry_rcu' ./include/linux/rculist.h:651: warning: Excess function parameter 'cond' description in 'hlist_for_each_entry_rcu' Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09rcu: Enable tick for nohz_full CPUs slow to provide expedited QSPaul E. McKenney
An expedited grace period can be stalled by a nohz_full CPU looping in kernel context. This possibility is currently handled by some carefully crafted checks in rcu_read_unlock_special() that enlist help from ksoftirqd when permitted by the scheduler. However, it is exactly these checks that require the scheduler avoid holding any of its rq or pi locks across rcu_read_unlock() without also having held them across the entire RCU read-side critical section. It would therefore be very nice if expedited grace periods could handle nohz_full CPUs looping in kernel context without such checks. This commit therefore adds code to the expedited grace period's wait and cleanup code that forces the scheduler-clock interrupt on for CPUs that fail to quickly supply a quiescent state. "Quickly" is currently a hard-coded single-jiffy delay. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09rcu: Fix data-race due to atomic_t copy-by-valueMarco Elver
This fixes a data-race where `atomic_t dynticks` is copied by value. The copy is performed non-atomically, resulting in a data-race if `dynticks` is updated concurrently. This data-race was found with KCSAN: ================================================================== BUG: KCSAN: data-race in dyntick_save_progress_counter / rcu_irq_enter write to 0xffff989dbdbe98e0 of 4 bytes by task 10 on cpu 3: atomic_add_return include/asm-generic/atomic-instrumented.h:78 [inline] rcu_dynticks_snap kernel/rcu/tree.c:310 [inline] dyntick_save_progress_counter+0x43/0x1b0 kernel/rcu/tree.c:984 force_qs_rnp+0x183/0x200 kernel/rcu/tree.c:2286 rcu_gp_fqs kernel/rcu/tree.c:1601 [inline] rcu_gp_fqs_loop+0x71/0x880 kernel/rcu/tree.c:1653 rcu_gp_kthread+0x22c/0x3b0 kernel/rcu/tree.c:1799 kthread+0x1b5/0x200 kernel/kthread.c:255 <snip> read to 0xffff989dbdbe98e0 of 4 bytes by task 154 on cpu 7: rcu_nmi_enter_common kernel/rcu/tree.c:828 [inline] rcu_irq_enter+0xda/0x240 kernel/rcu/tree.c:870 irq_enter+0x5/0x50 kernel/softirq.c:347 <snip> Reported by Kernel Concurrency Sanitizer on: CPU: 7 PID: 154 Comm: kworker/7:1H Not tainted 5.3.0+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Workqueue: kblockd blk_mq_run_work_fn ================================================================== Signed-off-by: Marco Elver <elver@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09soc: fsl: qe: avoid IS_ERR_VALUE in ucc_fast.cRasmus Villemoes
When building this on a 64-bit platform gcc rightly warns that the error checking is broken (-ENOMEM stored in an u32 does not compare greater than (unsigned long)-MAX_ERRNO). Instead, change the ucc_fast_[tr]x_virtual_fifo_base_offset members to s32 and use an ordinary check-for-negative. Also, this avoids treating 0 as "this cannot have been returned from qe_muram_alloc() so don't free it". Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: qe: avoid IS_ERR_VALUE in ucc_slow.cRasmus Villemoes
When trying to build this for a 64-bit platform, one gets warnings from using IS_ERR_VALUE on something which is not sizeof(long). Instead, change the various *_offset fields to store a signed integer, and simply check for a negative return from qe_muram_alloc(). Since qe_muram_free() now accepts and ignores a negative argument, we only need to make sure these fields are initialized with -1, and we can just unconditionally call qe_muram_free() in ucc_slow_free(). Note that the error case for us_pram_offset failed to set that field to 0 (which, as noted earlier, is anyway a bogus sentinel value). Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: qe: make cpm_muram_free() return voidRasmus Villemoes
Nobody uses the return value from cpm_muram_free, and functions that free resources usually return void. One could imagine a use for a "how much have I allocated" a la ksize(), but knowing how much one had access to after the fact is useless. Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: qe: change return type of cpm_muram_alloc() to s32Rasmus Villemoes
There are a number of problems with cpm_muram_alloc() and its callers. Most callers assign the return value to some variable and then use IS_ERR_VALUE to check for allocation failure. However, when that variable is not sizeof(long), this leads to warnings - and it is indeed broken to do e.g. u32 foo = cpm_muram_alloc(); if (IS_ERR_VALUE(foo)) on a 64-bit platform, since the condition foo >= (unsigned long)-ENOMEM is tautologically false. There are also callers that ignore the possibility of error, and then there are those that check for error by comparing the return value to 0... One could fix that by changing all callers to store the return value temporarily in an "unsigned long" and test that. However, use of IS_ERR_VALUE() is error-prone and should be restricted to things which are inherently long-sized (stuff in pt_regs etc.). Instead, let's aim for changing to the standard kernel style int foo = cpm_muram_alloc(); if (foo < 0) deal_with_it() some->where = foo; Changing the return type from unsigned long to s32 (aka signed int) doesn't change the value that gets stored into any of the callers' variables except if the caller was storing the result in a u64 _and_ the allocation failed, so in itself this patch should be a no-op. Another problem with cpm_muram_alloc() is that it can certainly validly return 0 - and except if some cpm_muram_alloc_fixed() call interferes, the very first cpm_muram_alloc() call will return just that. But that shows that both ucc_slow_free() and ucc_fast_free() are buggy, since they assume that a value of 0 means "that field was never allocated". We'll later change cpm_muram_free() to accept (and ignore) a negative offset, so callers can use a sentinel of -1 instead of 0 and just unconditionally call cpm_muram_free(). Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc/fsl/qe/qe.h: update include path for cpm.hRasmus Villemoes
asm/cpm.h under arch/powerpc is now just a wrapper for including soc/fsl/cpm.h. In order to make the qe.h header usable on other architectures, use the latter path directly. Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: move cpm.h from powerpc/include/asm to include/soc/fslRasmus Villemoes
Some drivers, e.g. ucc_uart, need definitions from cpm.h. In order to allow building those drivers for non-ppc based SOCs, move the header to include/soc/fsl. For now, leave a trivial wrapper at the old location so drivers can be updated one by one. Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: qe: merge qe_ic.h headers into qe_ic.cRasmus Villemoes
The public qe_ic.h header is no longer included by anything but qe_ic.c. Merge both headers into qe_ic.c, and drop the unused constants. Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: qe: make qe_ic_get_{low,high}_irq staticRasmus Villemoes
These are only called from within qe_ic.c, so make them static. Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: qe: remove unused qe_ic_set_* functionsRasmus Villemoes
There are no current callers of these functions, and they use the ppc-specific virq_to_hw(). So removing them gets us one step closer to building QE support for ARM. If the functionality is ever actually needed, the code can be dug out of git and then adapted to work on all architectures, but for future reference please note that I believe qe_ic_set_priority is buggy: The "priority < 4" should be "priority <= 4", and in the else branch 24 should be replaced by 28, at least if I'm reading the data sheet right. Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: qe: move qe_ic_cascade_* functions to qe_ic.cRasmus Villemoes
These functions are only ever called through a function pointer, and therefore it makes no sense for them to be "static inline" - gcc has no choice but to emit a copy in each translation unit that takes the address of one of these. Since they are now only referenced from qe_ic.c, just make them local to that file. Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: qe: move calls of qe_ic_init out of arch/powerpc/Rasmus Villemoes
Having to call qe_ic_init() from platform-specific code makes it awkward to allow building the QE drivers for ARM. It's also a needless duplication of code, and slightly error-prone: Instead of the caller needing to know the details of whether the QUICC Engine High and QUICC Engine Low are actually the same interrupt (see e.g. the machine_is() in mpc85xx_mds_qeic_init), just let the init function choose the appropriate handlers after it has parsed the DT and figured it out. If the two interrupts are distinct, use separate handlers, otherwise use the handler which first checks the CHIVEC register (for the high priority interrupts), then the CIVEC. All existing callers pass 0 for flags, so continue to do that from the new single caller. Later cleanups will remove that argument from qe_ic_init and simplify the body, as well as make qe_ic_init into a proper init function for an IRQCHIP_DECLARE, eliminating the need to manually look up the fsl,qe-ic node. Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: qe: use qe_ic_cascade_{low, high}_mpic also on 83xxRasmus Villemoes
The *_ipic and *_mpic handlers are almost identical - the only difference is that the latter end with an unconditional chip->irq_eoi() call. Since IPIC does not have ->irq_eoi, we can reduce some code duplication by calling irq_eoi conditionally. This is similar to what is already done in mpc8xxx_gpio_irq_cascade(). This leaves the functions slightly misnamed, but that will be fixed in a subsequent patch. Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: qe: introduce qe_io{read,write}* wrappersRasmus Villemoes
The QUICC engine drivers use the powerpc-specific out_be32() etc. In order to allow those drivers to build for other architectures, those must be replaced by iowrite32be(). However, on powerpc, out_be32() is a simple inline function while iowrite32be() is out-of-line. So in order not to introduce a performance regression on powerpc when making the drivers work on other architectures, introduce qe_io* helpers. Also define the qe_{clr,set,clrset}bits* helpers in terms of these new macros. Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09soc: fsl: qe: rename qe_(clr/set/clrset)bit* helpersRasmus Villemoes
Make it clear that these operate on big-endian registers (i.e. use the iowrite*be primitives) before we introduce more uses of them and allow the QE drivers to be built for platforms other than ppc32. Reviewed-by: Timur Tabi <timur@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-12-09Merge tag 'printk-for-5.5-pr-warning-removal' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull pr_warning() removal from Petr Mladek. - Final removal of the unused pr_warning() alias. You're supposed to use just "pr_warn()" in the kernel. * tag 'printk-for-5.5-pr-warning-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: checkpatch: Drop pr_warning check printk: Drop pr_warning definition Fix up for "printk: Drop pr_warning definition" workqueue: Use pr_warn instead of pr_warning
2019-12-09ASoC: SOF: nocodec: Amend arguments for sof_nocodec_setup()Ranjani Sridharan
Set the drv_name and tplg_filename for nocodec machine driver in sof_machine_check(). This means the sof_nocodec_setup() does not need the mach, plat_data or desc arguments any longer. Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20191204211556.12671-14-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-12-09ASoC: SOF: Remove unused drv_name in sof_pdataDaniel Baluta
This field is only set but never used. Let's remove it to make code cleaner. Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20191204211556.12671-13-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-12-09ASoC: SOF: remove nocodec_fw_filenameRanjani Sridharan
Remove nocodec_fw_filename from struct sof_dev_desc as it is not longer needed. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20191204211556.12671-12-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-12-09ASoC: SOF: Introduce default_fw_filename member in sof_dev_descRanjani Sridharan
Currently the FW filename is obtained from the ACPI matching table when determining which machine driver to use. In preparation for making the machine driver ACPI match optional for Device Tree platforms and moving the machine driver selection out of the SOF core, this patch introduces the default_fw_filename member in struct sof_dev_desc. Once the machine driver selection is moved out of SOF core, the nocodec_fw_filename will become obsolete and will be removed. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20191204211556.12671-8-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-12-09treewide: Use sizeof_field() macroPankaj Bharadiya
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09net/x25: add new state X25_STATE_5Martin Schiller
This is needed, because if the flag X25_ACCPT_APPRV_FLAG is not set on a socket (manual call confirmation) and the channel is cleared by remote before the manual call confirmation was sent, this situation needs to be handled. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09neighbour: remove neigh_cleanup() methodEric Dumazet
neigh_cleanup() has not been used for seven years, and was a wrong design. Messing with shared pointer in bond_neigh_init() without proper memory barriers would at least trigger syzbot complains eventually. It is time to remove this stuff. Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09netfilter: uapi: Avoid undefined left-shift in xt_sctp.hPhil Sutter
With 'bytes(__u32)' being 32, a left-shift of 31 may happen which is undefined for the signed 32-bit value 1. Avoid this by declaring 1 as unsigned. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-09moduleparam: fix kerneldocFabien Dessenne
Document missing @arg in xxx_param_cb(). Describe all parameters of module_param_[named_]unsafe() and all *_param_cb() to make ./scripts/kernel-doc happy. Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Fabien Dessenne <fabien.dessenne@st.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-12-09drm/bridge: Clarify the atomic enable/disable hooks semanticsBoris Brezillon
The [pre_]enable/[post_]disable hooks are passed the old atomic state. Update the doc and rename the arguments to make it clear. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203141515.3597631-8-boris.brezillon@collabora.com
2019-12-09drm/bridge: Add the drm_bridge_get_prev_bridge() helperBoris Brezillon
The drm_bridge_get_prev_bridge() helper will be useful for bridge drivers that want to do bus format negotiation with their neighbours. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203141515.3597631-7-boris.brezillon@collabora.com
2019-12-09drm/bridge: Add the drm_for_each_bridge_in_chain() helperBoris Brezillon
To iterate over all bridges attached to a specific encoder. Suggested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203141515.3597631-6-boris.brezillon@collabora.com
2019-12-09drm/bridge: Make the bridge chain a double-linked listBoris Brezillon
So that each element in the chain can easily access its predecessor. This will be needed to support bus format negotiation between elements of the bridge chain. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203141515.3597631-5-boris.brezillon@collabora.com
2019-12-09drm: Stop accessing encoder->bridge directlyBoris Brezillon
We are about to replace the single-linked bridge list by a double-linked one based on list.h, leading to the suppression of the encoder->bridge field. But before we can do that we must provide a drm_bridge_chain_get_first_bridge() bridge helper and patch all drivers and core helpers to use it instead of directly accessing encoder->bridge. Note that we still have 2 drivers (VC4 and Exynos) manipulating the encoder->bridge field directly because they need to cut the bridge chain in order to control the enable/disable sequence. This is definitely not something we want to encourage, so let's keep those 2 oddities around until we find a better solution. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203141515.3597631-4-boris.brezillon@collabora.com
2019-12-09drm/bridge: Introduce drm_bridge_get_next_bridge()Boris Brezillon
And use it in drivers accessing the bridge->next field directly. This is part of our attempt to make the bridge chain a double-linked list based on the generic list helpers. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203141515.3597631-3-boris.brezillon@collabora.com
2019-12-09drm/bridge: Rename bridge helpers targeting a bridge chainBoris Brezillon
Change the prefix of bridge helpers targeting a bridge chain from drm_bridge_ to drm_bridge_chain_ to better reflect the fact that the operation will happen on all elements of chain, starting at the bridge passed in argument. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203141515.3597631-2-boris.brezillon@collabora.com
2019-12-09xfrm: add espintcp (RFC 8229)Sabrina Dubroca
TCP encapsulation of IKE and IPsec messages (RFC 8229) is implemented as a TCP ULP, overriding in particular the sendmsg and recvmsg operations. A Stream Parser is used to extract messages out of the TCP stream using the first 2 bytes as length marker. Received IKE messages are put on "ike_queue", waiting to be dequeued by the custom recvmsg implementation. Received ESP messages are sent to XFRM, like with UDP encapsulation. Some of this code is taken from the original submission by Herbert Xu. Currently, only IPv4 is supported, like for UDP encapsulation. Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-12-09xfrm: introduce xfrm_trans_queue_netSabrina Dubroca
This will be used by TCP encapsulation to write packets to the encap socket without holding the user socket's lock. Without this reinjection, we're already holding the lock of the user socket, and then try to lock the encap socket as well when we enqueue the encrypted packet. While at it, add a BUILD_BUG_ON like we usually do for skb->cb, since it's missing for struct xfrm_trans_cb. Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-12-09net: add queue argument to __skb_wait_for_more_packets and ↵Sabrina Dubroca
__skb_{,try_}recv_datagram This will be used by ESP over TCP to handle the queue of IKE messages. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-12-09PM / devfreq: Use PM QoS for sysfs min/max_freqLeonard Crestez
Switch the handling of min_freq and max_freq from sysfs to use the dev_pm_qos_request interface. Since PM QoS handles frequencies as kHz this change reduces the precision of min_freq and max_freq. This shouldn't introduce problems because frequencies which are not an integer number of kHz are likely not an integer number of Hz either. Try to ensure compatibility by rounding min values down and rounding max values up. Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Tested-by: Matthias Kaehlcke <mka@chromium.org> [cw00.choi: Return -EAGAIN instead of -EINVAL if dev_pm_qos is inactive] Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2019-12-09PM / devfreq: Add PM QoS supportLeonard Crestez
Register notifiers with the PM QoS framework in order to respond to requests for DEV_PM_QOS_MIN_FREQUENCY and DEV_PM_QOS_MAX_FREQUENCY. No notifiers are added by this patch but PM QoS constraints can be imposed externally (for example from other devices). Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Tested-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2019-12-08net: WireGuard secure network tunnelJason A. Donenfeld
WireGuard is a layer 3 secure networking tunnel made specifically for the kernel, that aims to be much simpler and easier to audit than IPsec. Extensive documentation and description of the protocol and considerations, along with formal proofs of the cryptography, are available at: * https://www.wireguard.com/ * https://www.wireguard.com/papers/wireguard.pdf This commit implements WireGuard as a simple network device driver, accessible in the usual RTNL way used by virtual network drivers. It makes use of the udp_tunnel APIs, GRO, GSO, NAPI, and the usual set of networking subsystem APIs. It has a somewhat novel multicore queueing system designed for maximum throughput and minimal latency of encryption operations, but it is implemented modestly using workqueues and NAPI. Configuration is done via generic Netlink, and following a review from the Netlink maintainer a year ago, several high profile userspace tools have already implemented the API. This commit also comes with several different tests, both in-kernel tests and out-of-kernel tests based on network namespaces, taking profit of the fact that sockets used by WireGuard intentionally stay in the namespace the WireGuard interface was originally created, exactly like the semantics of userspace tun devices. See wireguard.com/netns/ for pictures and examples. The source code is fairly short, but rather than combining everything into a single file, WireGuard is developed as cleanly separable files, making auditing and comprehension easier. Things are laid out as follows: * noise.[ch], cookie.[ch], messages.h: These implement the bulk of the cryptographic aspects of the protocol, and are mostly data-only in nature, taking in buffers of bytes and spitting out buffers of bytes. They also handle reference counting for their various shared pieces of data, like keys and key lists. * ratelimiter.[ch]: Used as an integral part of cookie.[ch] for ratelimiting certain types of cryptographic operations in accordance with particular WireGuard semantics. * allowedips.[ch], peerlookup.[ch]: The main lookup structures of WireGuard, the former being trie-like with particular semantics, an integral part of the design of the protocol, and the latter just being nice helper functions around the various hashtables we use. * device.[ch]: Implementation of functions for the netdevice and for rtnl, responsible for maintaining the life of a given interface and wiring it up to the rest of WireGuard. * peer.[ch]: Each interface has a list of peers, with helper functions available here for creation, destruction, and reference counting. * socket.[ch]: Implementation of functions related to udp_socket and the general set of kernel socket APIs, for sending and receiving ciphertext UDP packets, and taking care of WireGuard-specific sticky socket routing semantics for the automatic roaming. * netlink.[ch]: Userspace API entry point for configuring WireGuard peers and devices. The API has been implemented by several userspace tools and network management utility, and the WireGuard project distributes the basic wg(8) tool. * queueing.[ch]: Shared function on the rx and tx path for handling the various queues used in the multicore algorithms. * send.c: Handles encrypting outgoing packets in parallel on multiple cores, before sending them in order on a single core, via workqueues and ring buffers. Also handles sending handshake and cookie messages as part of the protocol, in parallel. * receive.c: Handles decrypting incoming packets in parallel on multiple cores, before passing them off in order to be ingested via the rest of the networking subsystem with GRO via the typical NAPI poll function. Also handles receiving handshake and cookie messages as part of the protocol, in parallel. * timers.[ch]: Uses the timer wheel to implement protocol particular event timeouts, and gives a set of very simple event-driven entry point functions for callers. * main.c, version.h: Initialization and deinitialization of the module. * selftest/*.h: Runtime unit tests for some of the most security sensitive functions. * tools/testing/selftests/wireguard/netns.sh: Aforementioned testing script using network namespaces. This commit aims to be as self-contained as possible, implementing WireGuard as a standalone module not needing much special handling or coordination from the network subsystem. I expect for future optimizations to the network stack to positively improve WireGuard, and vice-versa, but for the time being, this exists as intentionally standalone. We introduce a menu option for CONFIG_WIREGUARD, as well as providing a verbose debug log and self-tests via CONFIG_WIREGUARD_DEBUG. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: David Miller <davem@davemloft.net> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-08fs: Delete timespec64_trunc()Deepa Dinamani
There are no more callers to the function remaining. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-08namei: LOOKUP_IN_ROOT: chroot-like scoped resolutionAleksa Sarai
/* Background. */ Container runtimes or other administrative management processes will often interact with root filesystems while in the host mount namespace, because the cost of doing a chroot(2) on every operation is too prohibitive (especially in Go, which cannot safely use vfork). However, a malicious program can trick the management process into doing operations on files outside of the root filesystem through careful crafting of symlinks. Most programs that need this feature have attempted to make this process safe, by doing all of the path resolution in userspace (with symlinks being scoped to the root of the malicious root filesystem). Unfortunately, this method is prone to foot-guns and usually such implementations have subtle security bugs. Thus, what userspace needs is a way to resolve a path as though it were in a chroot(2) -- with all absolute symlinks being resolved relative to the dirfd root (and ".." components being stuck under the dirfd root). It is much simpler and more straight-forward to provide this functionality in-kernel (because it can be done far more cheaply and correctly). More classical applications that also have this problem (which have their own potentially buggy userspace path sanitisation code) include web servers, archive extraction tools, network file servers, and so on. /* Userspace API. */ LOOKUP_IN_ROOT will be exposed to userspace through openat2(2). /* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_IN_ROOT applies to all components of the path. With LOOKUP_IN_ROOT, any path component which attempts to cross the starting point of the pathname lookup (the dirfd passed to openat) will remain at the starting point. Thus, all absolute paths and symlinks will be scoped within the starting point. There is a slight change in behaviour regarding pathnames -- if the pathname is absolute then the dirfd is still used as the root of resolution of LOOKUP_IN_ROOT is specified (this is to avoid obvious foot-guns, at the cost of a minor API inconsistency). As with LOOKUP_BENEATH, Jann's security concern about ".."[1] applies to LOOKUP_IN_ROOT -- therefore ".." resolution is blocked. This restriction will be lifted in a future patch, but requires more work to ensure that permitting ".." is done safely. Magic-link jumps are also blocked, because they can beam the path lookup across the starting point. It would be possible to detect and block only the "bad" crossings with path_is_under() checks, but it's unclear whether it makes sense to permit magic-links at all. However, userspace is recommended to pass LOOKUP_NO_MAGICLINKS if they want to ensure that magic-link crossing is entirely disabled. /* Testing. */ LOOKUP_IN_ROOT is tested as part of the openat2(2) selftests. [1]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/ Cc: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-08namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolutionAleksa Sarai
/* Background. */ There are many circumstances when userspace wants to resolve a path and ensure that it doesn't go outside of a particular root directory during resolution. Obvious examples include archive extraction tools, as well as other security-conscious userspace programs. FreeBSD spun out O_BENEATH from their Capsicum project[1,2], so it also seems reasonable to implement similar functionality for Linux. This is part of a refresh of Al's AT_NO_JUMPS patchset[3] (which was a variation on David Drysdale's O_BENEATH patchset[4], which in turn was based on the Capsicum project[5]). /* Userspace API. */ LOOKUP_BENEATH will be exposed to userspace through openat2(2). /* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_BENEATH applies to all components of the path. With LOOKUP_BENEATH, any path component which attempts to "escape" the starting point of the filesystem lookup (the dirfd passed to openat) will yield -EXDEV. Thus, all absolute paths and symlinks are disallowed. Due to a security concern brought up by Jann[6], any ".." path components are also blocked. This restriction will be lifted in a future patch, but requires more work to ensure that permitting ".." is done safely. Magic-link jumps are also blocked, because they can beam the path lookup across the starting point. It would be possible to detect and block only the "bad" crossings with path_is_under() checks, but it's unclear whether it makes sense to permit magic-links at all. However, userspace is recommended to pass LOOKUP_NO_MAGICLINKS if they want to ensure that magic-link crossing is entirely disabled. /* Testing. */ LOOKUP_BENEATH is tested as part of the openat2(2) selftests. [1]: https://reviews.freebsd.org/D2808 [2]: https://reviews.freebsd.org/D17547 [3]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk/ [4]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@google.com/ [5]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@google.com/ [6]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/ Cc: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by: David Drysdale <drysdale@google.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Andy Lutomirski <luto@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-08namei: LOOKUP_NO_XDEV: block mountpoint crossingAleksa Sarai
/* Background. */ The need to contain path operations within a mountpoint has been a long-standing usecase that userspace has historically implemented manually with liberal usage of stat(). find, rsync, tar and many other programs implement these semantics -- but it'd be much simpler to have a fool-proof way of refusing to open a path if it crosses a mountpoint. This is part of a refresh of Al's AT_NO_JUMPS patchset[1] (which was a variation on David Drysdale's O_BENEATH patchset[2], which in turn was based on the Capsicum project[3]). /* Userspace API. */ LOOKUP_NO_XDEV will be exposed to userspace through openat2(2). /* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_NO_XDEV applies to all components of the path. With LOOKUP_NO_XDEV, any path component which crosses a mount-point during path resolution (including "..") will yield an -EXDEV. Absolute paths, absolute symlinks, and magic-links will only yield an -EXDEV if the jump involved changing mount-points. /* Testing. */ LOOKUP_NO_XDEV is tested as part of the openat2(2) selftests. [1]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk/ [2]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@google.com/ [3]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@google.com/ Cc: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by: David Drysdale <drysdale@google.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Andy Lutomirski <luto@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-08namei: LOOKUP_NO_MAGICLINKS: block magic-link resolutionAleksa Sarai
/* Background. */ There has always been a special class of symlink-like objects in procfs (and a few other pseudo-filesystems) which allow for non-lexical resolution of paths using nd_jump_link(). These "magic-links" do not follow traditional mount namespace boundaries, and have been used consistently in container escape attacks because they can be used to trick unsuspecting privileged processes into resolving unexpected paths. It is also non-trivial for userspace to unambiguously avoid resolving magic-links, because they do not have a reliable indication that they are a magic-link (in order to verify them you'd have to manually open the path given by readlink(2) and then verify that the two file descriptors reference the same underlying file, which is plagued with possible race conditions or supplementary attack scenarios). It would therefore be very helpful for userspace to be able to avoid these symlinks easily, thus hopefully removing a tool from attackers' toolboxes. This is part of a refresh of Al's AT_NO_JUMPS patchset[1] (which was a variation on David Drysdale's O_BENEATH patchset[2], which in turn was based on the Capsicum project[3]). /* Userspace API. */ LOOKUP_NO_MAGICLINKS will be exposed to userspace through openat2(2). /* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_NO_MAGICLINKS applies to all components of the path. With LOOKUP_NO_MAGICLINKS, any magic-link path component encountered during path resolution will yield -ELOOP. The handling of ~LOOKUP_FOLLOW for a trailing magic-link is identical to LOOKUP_NO_SYMLINKS. LOOKUP_NO_SYMLINKS implies LOOKUP_NO_MAGICLINKS. /* Testing. */ LOOKUP_NO_MAGICLINKS is tested as part of the openat2(2) selftests. [1]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk/ [2]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@google.com/ [3]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@google.com/ Cc: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by: David Drysdale <drysdale@google.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Andy Lutomirski <luto@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-08namei: LOOKUP_NO_SYMLINKS: block symlink resolutionAleksa Sarai
/* Background. */ Userspace cannot easily resolve a path without resolving symlinks, and would have to manually resolve each path component with O_PATH and O_NOFOLLOW. This is clearly inefficient, and can be fairly easy to screw up (resulting in possible security bugs). Linus has mentioned that Git has a particular need for this kind of flag[1]. It also resolves a fairly long-standing perceived deficiency in O_NOFOLLOw -- that it only blocks the opening of trailing symlinks. This is part of a refresh of Al's AT_NO_JUMPS patchset[2] (which was a variation on David Drysdale's O_BENEATH patchset[3], which in turn was based on the Capsicum project[4]). /* Userspace API. */ LOOKUP_NO_SYMLINKS will be exposed to userspace through openat2(2). /* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_NO_SYMLINKS applies to all components of the path. With LOOKUP_NO_SYMLINKS, any symlink path component encountered during path resolution will yield -ELOOP. If the trailing component is a symlink (and no other components were symlinks), then O_PATH|O_NOFOLLOW will not error out and will instead provide a handle to the trailing symlink -- without resolving it. /* Testing. */ LOOKUP_NO_SYMLINKS is tested as part of the openat2(2) selftests. [1]: https://lore.kernel.org/lkml/CA+55aFyOKM7DW7+0sdDFKdZFXgptb5r1id9=Wvhd8AgSP7qjwQ@mail.gmail.com/ [2]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk/ [3]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@google.com/ [4]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@google.com/ Cc: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-08namei: allow nd_jump_link() to produce errorsAleksa Sarai
In preparation for LOOKUP_NO_MAGICLINKS, it's necessary to add the ability for nd_jump_link() to return an error which the corresponding get_link() caller must propogate back up to the VFS. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-08nsfs: clean-up ns_get_path() signature to return intAleksa Sarai
ns_get_path() and ns_get_path_cb() only ever return either NULL or an ERR_PTR. It is far more idiomatic to simply return an integer, and it makes all of the callers of ns_get_path() more straightforward to read. Fixes: e149ed2b805f ("take the targets of /proc/*/ns/* symlinks to separate fs") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) More jumbo frame fixes in r8169, from Heiner Kallweit. 2) Fix bpf build in minimal configuration, from Alexei Starovoitov. 3) Use after free in slcan driver, from Jouni Hogander. 4) Flower classifier port ranges don't work properly in the HW offload case, from Yoshiki Komachi. 5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin. 6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk. 7) Fix flow dissection in dsa TX path, from Alexander Lobakin. 8) Stale syncookie timestampe fixes from Guillaume Nault. [ Did an evil merge to silence a warning introduced by this pull - Linus ] * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits) r8169: fix rtl_hw_jumbo_disable for RTL8168evl net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() r8169: add missing RX enabling for WoL on RTL8125 vhost/vsock: accept only packets with the right dst_cid net: phy: dp83867: fix hfs boot in rgmii mode net: ethernet: ti: cpsw: fix extra rx interrupt inet: protect against too small mtu values. gre: refetch erspan header from skb->data after pskb_may_pull() pppoe: remove redundant BUG_ON() check in pppoe_pernet tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() tcp: tighten acceptance of ACKs not matching a child socket tcp: fix rejected syncookies due to stale timestamps lpc_eth: kernel BUG on remove tcp: md5: fix potential overestimation of TCP option space net: sched: allow indirect blocks to bind to clsact in TC net: core: rename indirect block ingress cb function net-sysfs: Call dev_hold always in netdev_queue_add_kobject net: dsa: fix flow dissection on Tx path net/tls: Fix return values to avoid ENOTSUPP net: avoid an indirect call in ____sys_recvmsg() ...
2019-12-08sched/rt, fs: Use CONFIG_PREEMPTIONThomas Gleixner
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the i_size() and part_nr_sects_…() code over to use CONFIG_PREEMPTION. Update the comment for fsstack_copy_inode_size() also to refer to CONFIG_PREEMPTION. [bigeasy: +PREEMPT comments] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20191015191821.11479-24-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>