summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-08-11net: Add ID (if needed) to sock_reuseport and expose reuseport_lockMartin KaFai Lau
A later patch will introduce a BPF_MAP_TYPE_REUSEPORT_ARRAY which allows a SO_REUSEPORT sk to be added to a bpf map. When a sk is removed from reuse->socks[], it also needs to be removed from the bpf map. Also, when adding a sk to a bpf map, the bpf map needs to ensure it is indeed in a reuse->socks[]. Hence, reuseport_lock is needed by the bpf map to ensure its map_update_elem() and map_delete_elem() operations are in-sync with the reuse->socks[]. The BPF_MAP_TYPE_REUSEPORT_ARRAY map will only acquire the reuseport_lock after ensuring the adding sk is already in a reuseport group (i.e. reuse->socks[]). The map_lookup_elem() will be lockless. This patch also adds an ID to sock_reuseport. A later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT which allows a bpf prog to select a sk from a bpf map. It is inflexible to statically enforce a bpf map can only contain the sk belonging to a particular reuse->socks[] (i.e. same IP:PORT) during the bpf verification time. For example, think about the the map-in-map situation where the inner map can be dynamically changed in runtime and the outer map may have inner maps belonging to different reuseport groups. Hence, when the bpf prog (in the new BPF_PROG_TYPE_SK_REUSEPORT type) selects a sk, this selected sk has to be checked to ensure it belongs to the requesting reuseport group (i.e. the group serving that IP:PORT). The "sk->sk_reuseport_cb" pointer cannot be used for this checking purpose because the pointer value will change after reuseport_grow(). Instead of saving all checking conditions like the ones preced calling "reuseport_add_sock()" and compare them everytime a bpf_prog is run, a 32bits ID is introduced to survive the reuseport_grow(). The ID is only acquired if any of the reuse->socks[] is added to the newly introduced "BPF_MAP_TYPE_REUSEPORT_ARRAY" map. If "BPF_MAP_TYPE_REUSEPORT_ARRAY" is not used, the changes in this patch is a no-op. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-11tcp: Avoid TCP syncookie rejected by SO_REUSEPORT socketMartin KaFai Lau
Although the actual cookie check "__cookie_v[46]_check()" does not involve sk specific info, it checks whether the sk has recent synq overflow event in "tcp_synq_no_recent_overflow()". The tcp_sk(sk)->rx_opt.ts_recent_stamp is updated every second when it has sent out a syncookie (through "tcp_synq_overflow()"). The above per sk "recent synq overflow event timestamp" works well for non SO_REUSEPORT use case. However, it may cause random connection request reject/discard when SO_REUSEPORT is used with syncookie because it fails the "tcp_synq_no_recent_overflow()" test. When SO_REUSEPORT is used, it usually has multiple listening socks serving TCP connection requests destinated to the same local IP:PORT. There are cases that the TCP-ACK-COOKIE may not be received by the same sk that sent out the syncookie. For example, if reuse->socks[] began with {sk0, sk1}, 1) sk1 sent out syncookies and tcp_sk(sk1)->rx_opt.ts_recent_stamp was updated. 2) the reuse->socks[] became {sk1, sk2} later. e.g. sk0 was first closed and then sk2 was added. Here, sk2 does not have ts_recent_stamp set. There are other ordering that will trigger the similar situation below but the idea is the same. 3) When the TCP-ACK-COOKIE comes back, sk2 was selected. "tcp_synq_no_recent_overflow(sk2)" returns true. In this case, all syncookies sent by sk1 will be handled (and rejected) by sk2 while sk1 is still alive. The userspace may create and remove listening SO_REUSEPORT sockets as it sees fit. E.g. Adding new thread (and SO_REUSEPORT sock) to handle incoming requests, old process stopping and new process starting...etc. With or without SO_ATTACH_REUSEPORT_[CB]BPF, the sockets leaving and joining a reuseport group makes picking the same sk to check the syncookie very difficult (if not impossible). The later patches will allow bpf prog more flexibility in deciding where a sk should be located in a bpf map and selecting a particular SO_REUSEPORT sock as it sees fit. e.g. Without closing any sock, replace the whole bpf reuseport_array in one map_update() by using map-in-map. Getting the syncookie check working smoothly across socks in the same "reuse->socks[]" is important. A partial solution is to set the newly added sk's ts_recent_stamp to the max ts_recent_stamp of a reuseport group but that will require to iterate through reuse->socks[] OR pessimistically set it to "now - TCP_SYNCOOKIE_VALID" when a sk is joining a reuseport group. However, neither of them will solve the existing sk getting moved around the reuse->socks[] and that sk may not have ts_recent_stamp updated, unlikely under continuous synflood but not impossible. This patch opts to treat the reuseport group as a whole when considering the last synq overflow timestamp since they are serving the same IP:PORT from the userspace (and BPF program) perspective. "synq_overflow_ts" is added to "struct sock_reuseport". The tcp_synq_overflow() and tcp_synq_no_recent_overflow() will update/check reuse->synq_overflow_ts if the sk is in a reuseport group. Similar to the reuseport decision in __inet_lookup_listener(), both sk->sk_reuseport and sk->sk_reuseport_cb are tested for SO_REUSEPORT usage. Update on "synq_overflow_ts" happens at roughly once every second. A synflood test was done with a 16 rx-queues and 16 reuseport sockets. No meaningful performance change is observed. Before and after the change is ~9Mpps in IPv4. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-10smb3: create smb3 equivalent alias for cifs pseudo-xattrsSteve French
We really, really don't want to be encouraging people to use cifs (the dialect) since it is insecure, so to avoid confusion we want to move them to names which include 'smb3' instead of 'cifs' - so this simply creates an alias for the pseudo-xattrs e.g. can now do: getfattr -n user.smb3.creationtime /mnt1/file and getfattr -n user.smb3.dosattrib /mnt1/file and getfattr -n system.smb3_acl /mnt1/file instead of forcing you to use the string 'cifs' in these (e.g. getfattr -n system.cifs_acl /mnt1/file) Signed-off-by: Steve French <stfrench@microsoft.com>
2018-08-10drm/msm: a6xx: fix spelling mistake: "initalization" -> "initialization"Colin Ian King
Trivial fix to spelling mistake in dev_err message and comment Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-08-10drm/msm/disp/dpu: fix early dereference of physical encoderJeykumar Sankaran
This change validates the physical encoder before it is dereferenced. Signed-off-by: Jeykumar Sankaran <jsanka@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-08-10drm/msm: Add A6XX device supportJordan Crouse
Add support for the A6XX family of Adreno GPUs. The biggest addition is the GMU (Graphics Management Unit) which takes over most of the power management of the GPU itself but in a ironic twist of fate needs a goodly amount of management itself. Add support for the A6XX core code, the GMU and the HFI (hardware firmware interface) queue that the CPU uses to communicate with the GMU. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-08-10drm/msm: update generated headersRob Clark
Resync generated headers to pull in a6xx registers. Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-08-10drm/msm/adreno: Load the firmware before bringing up the hardwareJordan Crouse
Failure to load firmware is the primary reason to fail adreno_load_gpu(). Try to load it first before going into the hardware initialization code and unwinding it. This is important for a6xx because the GMU gets loaded from the runtime power code and it is more costly to fail in that path because of missing firmware. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-08-10drm/msm: Add a helper function to parse clock namesJordan Crouse
Add a helper function to parse the clock names and set up the bulk data so we can take advantage of the bulk clock functions instead of rolling our own. This is added as a helper function so the upcoming a6xx GMU code can also take advantage of it. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-08-10Documentation: corrections to console/console.txtRandy Dunlap
Fix typos, line length, grammar, punctuation, and capitalization in console.txt. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Antonino A. Daplas <adaplas@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-08-10Documentation: add ioctl number entry for v4l2-subdev.hRandy Dunlap
Update ioctl-number.txt for ioctl's that are defined in <media/v4l2-subdev.h>. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-08-10Remove gendered language from management style documentationFox Foster
This small commit replaces gendered pronouns for neutral ones. Signed-off-by: Fox Foster <fox@tardis.ed.ac.uk> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-08-10IB/uverbs: Remove the ib_uverbs_attr pointer from each attrJason Gunthorpe
Memory in the bundle is valuable, do not waste it holding an 8 byte pointer for the rare case of writing to a PTR_OUT. We can compute the pointer by storing a small 1 byte array offset and the base address of the uattr memory in the bundle private memory. This also means we can access the kernel's copy of the ib_uverbs_attr, so drop the copy of flags as well. Since the uattr base should be private bundle information this also de-inlines the already too big uverbs_copy_to inline and moves create_udata into uverbs_ioctl.c so they can see the private struct definition. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-10IB/uverbs: Provide implementation private memory for the uverbs_attr_bundleJason Gunthorpe
This already existed as the anonymous 'ctx' structure, but this was not really a useful form. Hoist this struct into bundle_priv and rework the internal things to use it instead. Move a bunch of the processing internal state into the priv and reduce the excessive use of function arguments. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-10IB/uverbs: Use uverbs_api to manage the object type inside the uobjectJason Gunthorpe
Currently the struct uverbs_obj_type stored in the ib_uobject is part of the .rodata segment of the module that defines the object. This is a problem if drivers define new uapi objects as we will be left with a dangling pointer after device disassociation. Switch the uverbs_obj_type for struct uverbs_api_object, which is allocated memory that is part of the uverbs_api and is guaranteed to always exist. Further this moves the 'type_class' into this memory which means access to the IDR/FD function pointers is also guaranteed. Drivers cannot define new types. This makes it safe to continue to use all uobjects, including driver defined ones, after disassociation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-10IB/uverbs: Build the specs into a radix tree at runtimeJason Gunthorpe
This radix tree datastructure is intended to replace the 'hash' structure used today for parsing ioctl methods during system calls. This first commit introduces the structure and builds it from the existing .rodata descriptions. The so-called hash arrangement is actually a 5 level open coded radix tree. This new version uses a 3 level radix tree built using the radix tree library. Overall this is much less code and much easier to build as the radix tree API allows for dynamic modification during the building. There is a small memory penalty to pay for this, but since the radix tree is allocated on a per device basis, a few kb of RAM seems immaterial considering the gained simplicity. The radix tree is similar to the existing tree, but also has a 'attr_bkey' concept, which is a small value'd index for each method attribute. This is used to simplify and improve performance of everything in the next patches. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2018-08-10IB/uverbs: Have the core code create the uverbs_root_specJason Gunthorpe
There is no reason for drivers to do this, the core code should take of everything. The drivers will provide their information from rodata to describe their modifications to the core's base uapi specification. The core uses this to build up the runtime uapi for each device. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-10liquidio: copperhead LED identificationRaghu Vatsavayi
Add LED identification support for liquidio TP copperhead cards. Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10qed/qede: qede_setup_tc() can be statickbuild test robot
Fixes: 5e7baf0fcb2a ("qed/qede: Multi CoS support.") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10mlxsw: core: remove unnecessary function mlxsw_core_driver_putYueHaibing
The function mlxsw_core_driver_put only traverse mlxsw_core_driver_list to find the matched mlxsw_driver,but never used it. So it can be removed safely. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10net: mvneta: fix mvneta_config_rss on armada 3700Jisheng Zhang
The mvneta Ethernet driver is used on a few different Marvell SoCs. Some SoCs have per cpu interrupts for Ethernet events, the driver uses a per CPU napi structure for this case. Some SoCs such as armada 3700 have a single interrupt for Ethernet events, the driver uses a global napi structure for this case. Current mvneta_config_rss() always operates the per cpu napi structure. Fix it by operating a global napi for "single interrupt" case, and per cpu napi structure for remaining cases. Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Fixes: 2636ac3cc2b4 ("net: mvneta: Add network support for Armada 3700 SoC") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10net/smc: send response to test link signalUrsula Braun
With SMC-D z/OS sends a test link signal every 10 seconds. Linux is supposed to answer, otherwise the SMC-D connection breaks. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10Merge branch 'r8169-smaller-improvements'David S. Miller
Heiner Kallweit says: ==================== r8169: smaller improvements This series includes smaller improvements, no functional change intended. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10r8169: don't configure max jumbo frame size per chip versionHeiner Kallweit
We don't have to configure the max jumbo frame size per chip (sub-)version. It can be easily determined based on the chip family. And new members of the RTL8168 family (if there are any) should be automatically covered. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10r8169: don't configure csum function per chip versionHeiner Kallweit
We don't have to configure the csum function per chip (sub-)version. The distinction is simple, versions RTL8102e and from RTL8168c onwards support csum_v2. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10r8169: simplify interrupt handlerHeiner Kallweit
Simplify the interrupt handler a little and make it better readable. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10r8169: don't include asm headers directlyHeiner Kallweit
The asm headers shouldn't be included directly. asm/irq.h is implicitly included by linux/interrupt.h, and instead of asm/io.h include linux/io.h. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10r8169: remove version infoHeiner Kallweit
The version number hasn't changed for ages and in general I doubt it provides any benefit. The message in rtl_init_one() may even be misleading because it's printed also if something fails in probe. Therefore let's remove the version information. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2018-08-10 Here's one more (most likely last) bluetooth-next pull request for the 4.19 kernel. - Added support for MediaTek serial Bluetooth devices - Initial skeleton for controller-side address resolution support - Fix BT_HCIUART_RTL related Kconfig dependencies - A few other minor fixes/cleanups Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10gpio: it87: Add support for IT8613Leonid Bloch
This was tested on actual hardware and found to work fine, but currently the official specifications of this chip could not be obtained to confirm the numbers. Signed-off-by: Leonid Bloch <lbloch@janustech.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10gpio: it87: add support for IT8718F Super I/O.Ivan Podovalov
The DIO connector on the WAFER-945GSE is interfaced to GPIO ports on the ITE IT8718F Super I/O chipset. From the datasheet of ITE IT8718F, the GPIO interface is identical to IT8728, so just add it to the same case as the other chip. Signed-off-by: Ivan Podovalov <ipodovalov@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10gpiolib: Avoid calling chip->request() for unused gpiosBiju Das
Add a check for unused gpios to avoid chip->request() call to client driver for unused gpios. Signed-off-by: Biju Das <biju.das@bp.renesas.com> Reviewed-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10gpio: tegra: Include the right headerLinus Walleij
This is a GPIO driver so include only <linux/gpio/driver.h>. Drop the use of GPIOF_* flags: these are for consumers, not drivers. Just return 0/1. Cc: Stefan Agner <stefan@agner.ch> Cc: Thierry Reding <treding@nvidia.com> Reviewed-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10gpio: mmio: Fix up inverted direction registersLinus Walleij
The bgpio_init() takes one of two arguments to specify a register to set the direction of the GPIO line: either dirout that indicates that a 1 in the bit in that register sets the corresponding line to output, or dirin which indicates that a 1 in the bit in that register sets the corresponding line to input. Conversely setting the bit to 0 on these will turn the line into input and output respectively. One of these can be defined but not both. This means that a platform that sets a bit to 1 for output only defines dirout and a platform that sets a bit to 0 for output only defines dirin. In short this defines the polarity of the direction register. Both can also be left as NULL meaning the GPIO chip is either input only or output only. Tomer Maimon discovered that for get/set chips (those where the get and set registers are defined but no separate clear register, and specifying BGPIOF_READ_OUTPUT_REG_SET so that we say we want to read the output value from the SET register) we are unconditionally reading the value from the SET register when the direction bit is 1 and from the DAT register when the direction bit is 0, not taking the direction bit polarity into account. It would be expected that when the direction bit is inverted (dirin is defined but not dirout) we read the current value from the DAT register when the bit is 1 and from the SET register when the bit is 0. Currently only some versions of ATH79, brcmstb, some versions of CLP711x, GE, IOP and Loongson use the dirin mode (a 1 in the register means input). They are unaffected because BGPIOF_READ_OUTPUT_REG_SET is not set on any of them. (They do not read back the SET register to figure out the output value.) So this is no regression with current drivers. However the behaviour is wrong and does not work with Tomer's new driver where he needs to use the BGIOF_READ_OUTPUT_REG_SET. This fixes the above issue by: - Instead of defining separate functions for the inverted case, set up a flag in the gpio_chip that indicates that the direction is inverted. - Remove the special inverted functions for setting input/output and getting the direction, rely on the flag instead. - Respect this flag in bgpio_get_set() and bgpio_get_set_multiple() Reported-by: Tomer Maimon <tmaimon77@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10gpio: xilinx: Use the right includeLinus Walleij
This is a GPIO driver so use only <linux/gpio/driver.h>. Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10pinctrl: nomadik: silence uninitialized variable warningDan Carpenter
This is harmless, but "val" isn't necessarily initialized if abx500_get_register_interruptible() fails. I've re-arranged the code to just return an error code in that situation. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10pinctrl: axp209: Fix NULL pointer dereference after allocationAnton Vasilyev
There is no check that allocation in axp20x_funcs_groups_from_mask is successful. The patch adds corresponding check and return values. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru> Acked-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10gpio: timberdale: Include the right headerLinus Walleij
This is a GPIO driver so include only <linux/gpio/driver.h>. Cc: Richard Röjfors <richard.rojfors@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10gpio: tb10x: Use the right includeLinus Walleij
This driver includes the legacy <linux/gpio.h> and <linux/of_gpio.h> but all it needs is really <linux/gpio/driver.h>. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10gpiolib: Fix of_node inconsistencyBiju Das
Some platforms are not setting of_node in the driver. On these platforms defining gpio-reserved-ranges on device tree leads to kernel crash. It is due to some parts of the gpio core relying on the driver to set up of_node,while other parts do themselves.This inconsistent behaviour leads to a crash. gpiochip_add_data_with_key() calls gpiochip_init_valid_mask() with of_node as NULL. of_gpiochip_add() fills "of_node" and calls of_gpiochip_init_valid_mask(). The fix is to move the assignment to chip->of_node from of_gpiochip_add() to gpiochip_add_data_with_key(). Signed-off-by: Biju Das <biju.das@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10pinctrl: samsung: Remove duplicated "wakeup" in printkKrzysztof Kozlowski
Double "wakeup" appears in printed message. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-10tracepoints: Free early tracepoints after RCU is initializedSteven Rostedt (VMware)
When enabling trace events via the kernel command line, I hit this warning: WARNING: CPU: 0 PID: 13 at kernel/rcu/srcutree.c:236 check_init_srcu_struct+0xe/0x61 Modules linked in: CPU: 0 PID: 13 Comm: watchdog/0 Not tainted 4.18.0-rc6-test+ #6 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 RIP: 0010:check_init_srcu_struct+0xe/0x61 Code: 48 c7 c6 ec 8a 65 b4 e8 ff 79 fe ff 48 89 df 31 f6 e8 f2 fa ff ff 5a 5b 41 5c 5d c3 0f 1f 44 00 00 83 3d 68 94 b8 01 01 75 02 <0f> 0b 48 8b 87 f0 0a 00 00 a8 03 74 45 55 48 89 e5 41 55 41 54 4c RSP: 0000:ffff96eb9ea03e68 EFLAGS: 00010246 RAX: ffff96eb962b5b01 RBX: ffffffffb4a87420 RCX: 0000000000000001 RDX: ffffffffb3107969 RSI: ffff96eb962b5b40 RDI: ffffffffb4a87420 RBP: ffff96eb9ea03eb0 R08: ffffabbd00cd7f48 R09: 0000000000000000 R10: ffff96eb9ea03e68 R11: ffffffffb4a6eec0 R12: ffff96eb962b5b40 R13: ffff96eb9ea03ef8 R14: ffffffffb3107969 R15: ffffffffb3107948 FS: 0000000000000000(0000) GS:ffff96eb9ea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff96eb13ab2000 CR3: 0000000192a1e001 CR4: 00000000001606f0 Call Trace: <IRQ> ? __call_srcu+0x2d/0x290 ? rcu_process_callbacks+0x26e/0x448 ? allocate_probes+0x2b/0x2b call_srcu+0x13/0x15 rcu_free_old_probes+0x1f/0x21 rcu_process_callbacks+0x2ed/0x448 __do_softirq+0x172/0x336 irq_exit+0x62/0xb2 smp_apic_timer_interrupt+0x161/0x19e apic_timer_interrupt+0xf/0x20 </IRQ> The problem is that the enabling of trace events before RCU is set up will cause SRCU to give this warning. To avoid this, add a list to store probes that need to be freed till after RCU is initialized, and then free them then. Link: http://lkml.kernel.org/r/20180810113554.1df28050@gandalf.local.home Link: http://lkml.kernel.org/r/20180810123517.5e9714ad@gandalf.local.home Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Fixes: e6753f23d961d ("tracepoint: Make rcuidle tracepoint callers use SRCU") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-10uprobes: Use synchronize_rcu() not synchronize_sched()Steven Rostedt (VMware)
While debugging another bug, I was looking at all the synchronize*() functions being used in kernel/trace, and noticed that trace_uprobes was using synchronize_sched(), with a comment to synchronize with {u,ret}_probe_trace_func(). When looking at those functions, the data is protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This is using the wrong synchronize_*() function. Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 70ed91c6ec7f8 ("tracing/uprobes: Support ftrace_event_file base multibuffer") Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-10x86/mm/pti: Move user W+X check into pti_finalize()Joerg Roedel
The user page-table gets the updated kernel mappings in pti_finalize(), which runs after the RO+X permissions got applied to the kernel page-table in mark_readonly(). But with CONFIG_DEBUG_WX enabled, the user page-table is already checked in mark_readonly() for insecure mappings. This causes false-positive warnings, because the user page-table did not get the updated mappings yet. Move the W+X check for the user page-table into pti_finalize() after it updated all required mappings. [ tglx: Folded !NX supported fix ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1533727000-9172-1-git-send-email-joro@8bytes.org
2018-08-10tracing: Fix synchronizing to event changes with ↵Steven Rostedt (VMware)
tracepoint_synchronize_unregister() Now that some trace events can be protected by srcu_read_lock(tracepoint_srcu), we need to make sure all locations that depend on this are also protected. There were many places that did a synchronize_sched() thinking that it was enough to protect againts access to trace events. This use to be the case, but now that we use SRCU for _rcuidle() trace events, they may not be protected by synchronize_sched(), as they may be called in paths that RCU is not watching for preempt disable. Fixes: e6753f23d961d ("tracepoint: Make rcuidle tracepoint callers use SRCU") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-10ftrace: Remove unused pointer ftrace_swapper_pidColin Ian King
Pointer ftrace_swapper_pid is defined but is never used hence it is redundant and can be removed. The use of this variable was removed in commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do"). Cleans up clang warning: warning: 'ftrace_swapper_pid' defined but not used [-Wunused-const-variable=] Link: http://lkml.kernel.org/r/20180809125609.13142-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-10tracing: More reverting of "tracing: Centralize preemptirq tracepoints and ↵Steven Rostedt (VMware)
unify their usage" Joel Fernandes created a nice patch that cleaned up the duplicate hooks used by lockdep and irqsoff latency tracer. It made both use tracepoints. But the latency tracer is triggering warnings when using tracepoints to call into the latency tracer's routines. Mainly, they can be called from NMI context. If that happens, then the SRCU may not work properly because on some architectures, SRCU is not safe to be called in both NMI and non-NMI context. This is a partial revert of the clean up patch c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage") that adds back the direct calls into the latency tracer. It also only calls the trace events when not in NMI. Link: http://lkml.kernel.org/r/20180809210654.622445925@goodmis.org Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-10tracing/irqsoff: Handle preempt_count for different configsSteven Rostedt (VMware)
I was hitting the following warning: WARNING: CPU: 0 PID: 1 at kernel/trace/trace_irqsoff.c:631 tracer_hardirqs_off+0x15/0x2a Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc6-test+ #13 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 EIP: tracer_hardirqs_off+0x15/0x2a Code: ff 85 c0 74 0e 8b 45 00 8b 50 04 8b 45 04 e8 35 ff ff ff 5d c3 55 64 a1 cc 37 51 c1 a9 ff ff ff 7f 89 e5 53 89 d3 89 ca 75 02 <0f> 0b e8 90 fc ff ff 85 c0 74 07 89 d8 e8 0c ff ff ff 5b 5d c3 55 EAX: 80000000 EBX: c04337f0 ECX: c04338e3 EDX: c04338e3 ESI: c04337f0 EDI: c04338e3 EBP: f2aa1d68 ESP: f2aa1d64 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210046 CR0: 80050033 CR2: 00000000 CR3: 01668000 CR4: 001406f0 Call Trace: trace_irq_disable_rcuidle+0x63/0x6c trace_hardirqs_off+0x26/0x30 default_send_IPI_mask_allbutself_logical+0x31/0x93 default_send_IPI_allbutself+0x37/0x48 native_send_call_func_ipi+0x4d/0x6a smp_call_function_many+0x165/0x19d ? add_nops+0x34/0x34 ? trace_hardirqs_on_caller+0x2d/0x2d ? add_nops+0x34/0x34 smp_call_function+0x1f/0x23 on_each_cpu+0x15/0x43 ? trace_hardirqs_on_caller+0x2d/0x2d ? trace_hardirqs_on_caller+0x2d/0x2d ? trace_irq_disable_rcuidle+0x1/0x6c text_poke_bp+0xa0/0xc2 ? trace_hardirqs_on_caller+0x2d/0x2d arch_jump_label_transform+0xa7/0xcb ? trace_irq_disable_rcuidle+0x5/0x6c __jump_label_update+0x3e/0x6d jump_label_update+0x7d/0x81 static_key_slow_inc_cpuslocked+0x58/0x6d static_key_slow_inc+0x19/0x20 tracepoint_probe_register_prio+0x19e/0x1d1 ? start_critical_timings+0x1c/0x1c tracepoint_probe_register+0xf/0x11 irqsoff_tracer_init+0x21/0xf2 tracer_init+0x16/0x1a trace_selftest_startup_irqsoff+0x25/0xc4 run_tracer_selftest+0xca/0x131 register_tracer+0xd5/0x172 ? trace_event_define_fields_preemptirq_template+0x45/0x45 init_irqsoff_tracer+0xd/0x11 do_one_initcall+0xab/0x1e8 ? rcu_read_lock_sched_held+0x3d/0x44 ? trace_initcall_level+0x52/0x86 kernel_init_freeable+0x195/0x21a ? rest_init+0xb4/0xb4 kernel_init+0xd/0xe4 ret_from_fork+0x2e/0x38 It is due to running a CONFIG_PREEMPT_VOLUNTARY kernel, which would trigger this warning every time: WARN_ON_ONCE(preempt_count()); Because on CONFIG_PREEMPT_VOLUNTARY, preempt_count() is always zero. This warning is to make sure preempt_count is set because tracer_hardirqs_on() does a preempt_enable_notrace() to make the preempt_trace() work properly, as being called by a trace event, the trace event code disables preemption, and the tracer wants to know what the preemption was before it was called. Instead of enabling preemption like this, just record the preempt_count, subtract PREEMPT_DISABLE_OFFSET from it (which is zero with !CONFIG_PREEMPT set), and pass that value to the necessary functions, which should use the passed in parameter instead of calling preempt_count() directly. Fixes: da5b3ebb45277 ("tracing: irqsoff: Account for additional preempt_disable") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-10tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and ↵Steven Rostedt (VMware)
unify their usage" Joel Fernandes created a nice patch that cleaned up the duplicate hooks used by lockdep and irqsoff latency tracer. It made both use tracepoints. But it caused lockdep to trigger several false positives. We have not figured out why yet, but removing lockdep from using the trace event hooks and just call its helper functions directly (like it use to), makes the problem go away. This is a partial revert of the clean up patch c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage") that adds direct calls for lockdep, but also keeps most of the clean up done to get rid of the horrible preprocessor if statements. Link: http://lkml.kernel.org/r/20180806155058.5ee875f4@gandalf.local.home Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-10Revert "media: vivid: shut up warnings due to a non-trivial logic"Mauro Carvalho Chehab
0day kernel testing robot got the below dmesg and the first bad commit is https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master commit 3354b54f9f7037a1122d3b6009aa9d39829d6843 [ 248.847809] BUG: unable to handle kernel paging request at ffffc90000393131 [ 248.848015] Call Trace: [ 248.848015] ? vivid_dev_release+0xc0/0xc0 [ 248.848015] ? acpi_dev_pm_attach+0x27/0xd0 This reverts commit 3354b54f9f7037a1122d3b6009aa9d39829d6843.