summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2015-05-22tcp: fix a potential deadlock in tcp_get_info()Eric Dumazet
Taking socket spinlock in tcp_get_info() can deadlock, as inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i], while packet processing can use the reverse locking order. We could avoid this locking for TCP_LISTEN states, but lockdep would certainly get confused as all TCP sockets share same lockdep classes. [ 523.722504] ====================================================== [ 523.728706] [ INFO: possible circular locking dependency detected ] [ 523.734990] 4.1.0-dbg-DEV #1676 Not tainted [ 523.739202] ------------------------------------------------------- [ 523.745474] ss/18032 is trying to acquire lock: [ 523.750002] (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360 [ 523.758129] [ 523.758129] but task is already holding lock: [ 523.763968] (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0 [ 523.774661] [ 523.774661] which lock already depends on the new lock. [ 523.774661] [ 523.782850] [ 523.782850] the existing dependency chain (in reverse order) is: [ 523.790326] -> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}: [ 523.796599] [<ffffffff811126bb>] lock_acquire+0xbb/0x270 [ 523.802565] [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50 [ 523.808628] [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110 [ 523.815273] [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350 [ 523.822067] [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500 [ 523.828199] [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0 [ 523.834331] [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10 [ 523.840202] [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0 [ 523.847214] [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0 [ 523.853440] [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0 [ 523.859624] [<ffffffff81659db7>] ip_rcv+0x307/0x420 Lets use u64_sync infrastructure instead. As a bonus, 64bit arches get optimized, as these are nop for them. Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22time: Remove read_boot_clock()Xunlei Pang
Now that we have a read_boot_clock64() function available on every architecture, and converted all the users to it, it's time to remove the (now unused) read_boot_clock() completely from the kernel. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> [jstultz: Minor commit message tweak suggested by Ingo] Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-05-22time: Include math64.h in time64.hXunlei Pang
On 32-bit systems, timespec64_add_ns() calls __iter_div_u64_rem() which needs math64.h, and we want to include time64.h in some cases. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-05-22time: Rework debugging variables so they aren't globalJohn Stultz
Ingo suggested that the timekeeping debugging variables recently added should not be global, and should be tied to the timekeeper's read_base. Thus this patch implements that suggestion. This version is different from the earlier versions as it keeps the variables in the timekeeper structure rather then in the tkr. Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-05-22timekeeping: Provide new API to get the current time resolutionHarald Geyer
This patch series introduces a new function u32 ktime_get_resolution_ns(void) which allows to clean up some driver code. In particular the IIO subsystem has a function to provide timestamps for events but no means to get their resolution. So currently the dht11 driver tries to guess the resolution in a rather messy and convoluted way. We can do much better with the new code. This API is not designed to be exposed to user space. This has been tested on i386, sunxi and mxs. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Harald Geyer <harald@ccbib.org> [jstultz: Tweaked to make it build after upstream changes] Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-05-22PCI: Add dev->has_secondary_link to track downstream PCIe linksYijing Wang
A PCIe Port is an interface to a Link. A Root Port is a PCI-PCI bridge in a Root Complex and has a Link on its secondary (downstream) side. For other Ports, the Link may be on either the upstream (closer to the Root Complex) or downstream side of the Port. The usual topology has a Root Port connected to an Upstream Port. We previously assumed this was the only possible topology, and that a Downstream Port's Link was always on its downstream side, like this: +---------------------+ +------+ | Downstream | | Root | | Upstream Port +--Link-- | Port +--Link--+ Port | +------+ | Downstream | | Port +--Link-- +---------------------+ But systems do exist (see URL below) where the Root Port is connected to a Downstream Port. In this case, a Downstream Port's Link may be on either the upstream or downstream side: +---------------------+ +------+ | Upstream | | Root | | Downstream Port +--Link-- | Port +--Link--+ Port | +------+ | Downstream | | Port +--Link-- +---------------------+ We can't use the Port type to determine which side the Link is on, so add a bit in struct pci_dev to keep track. A Root Port's Link is always on the Port's secondary side. A component (Endpoint or Port) on the other end of the Link obviously has the Link on its upstream side. If that component is a Port, it is part of a Switch or a Bridge. A Bridge has a PCI or PCI-X bus on its secondary side, not a Link. The internal bus of a Switch connects the Port to another Port whose Link is on the downstream side. [bhelgaas: changelog, comment, cache "type", use if/else] Link: http://lkml.kernel.org/r/54EB81B2.4050904@pobox.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=94361 Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2015-05-22block, dm: don't copy bios for request clonesChristoph Hellwig
Currently dm-multipath has to clone the bios for every request sent to the lower devices, which wastes cpu cycles and ties down memory. This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio to not complete bios attached to a request, which we set on clone requests similar to bios in a flush sequence. With this change I/O errors on a path failure only get propagated to dm-multipath, which can then either resubmit the I/O or complete the bios on the original request. I've done some basic testing of this on a Linux target with ALUA support, and it survives path failures during I/O nicely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22block: remove management of bi_remaining when restoring original bi_end_ioMike Snitzer
Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains") regressed all existing callers that followed this pattern: 1) saving a bio's original bi_end_io 2) wiring up an intermediate bi_end_io 3) restoring the original bi_end_io from intermediate bi_end_io 4) calling bio_endio() to execute the restored original bi_end_io The regression was due to BIO_CHAIN only ever getting set if bio_inc_remaining() is called. For the above pattern it isn't set until step 3 above (step 2 would've needed to establish BIO_CHAIN). As such the first bio_endio(), in step 2 above, never decremented __bi_remaining before calling the intermediate bi_end_io -- leaving __bi_remaining with the value 1 instead of 0. When bio_inc_remaining() occurred during step 3 it brought it to a value of 2. When the second bio_endio() was called, in step 4 above, it should've called the original bi_end_io but it didn't because there was an extra reference that wasn't dropped (due to atomic operations being optimized away since BIO_CHAIN wasn't set upfront). Fix this issue by removing the __bi_remaining management complexity for all callers that use the above pattern -- bio_chain() is the only interface that _needs_ to be concerned with __bi_remaining. For the above pattern callers just expect the bi_end_io they set to get called! Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls that aren't associated with the bio_chain() interface. Also, the bio_inc_remaining() interface has been moved local to bio.c. Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22Merge tag 'at91-cleanup' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux into next/cleanup Merge "First batch of cleanup for 4.2" from Alexandre Belloni: - use syscon in the ata and cf drivers to configure the SMC and drop sam9_smc.c - switch the at91rm9200 memory controller to syscon - remove last useless headers - remove now useless Makefile.boot * tag 'at91-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: ARM: at91: remove useless Makefile.boot ARM: at91: remove at91rm9200_sdramc.h ARM: at91: remove mach/at91_ramc.h and mach/at91rm9200_mc.h ARM: at91/pm: use the atmel-mc syscon defines pcmcia: at91_cf: Use syscon to configure the MC/smc ARM: at91: declare the at91rm9200 memory controller as a syscon mfd: syscon: Add Atmel MC (Memory Controller) registers definition ARM: at91: drop sam9_smc.c ata: at91: use syscon to configure the smc
2015-05-22nvme: submit internal commands through the block layerChristoph Hellwig
Use block layer queues with an internal cmd_type to submit internally generated NVMe commands. This both simplifies the code a lot and allow for a better structure. For example now the LighNVM code can construct commands without knowing the details of the underlying I/O descriptors. Or a future NVMe over network target could inject commands, as well as could the SCSI translation and ioctl code be reused for such a beast. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22nvme: store a struct device pointer in struct nvme_devChristoph Hellwig
Most users want the generic device, so store that in struct nvme_dev instead of the pci_dev. This also happens to be a nice step towards making some code reusable for non-PCI transports. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22nvme: consolidate synchronous command submission helpersChristoph Hellwig
Note that we keep the unused timeout argument, but allow callers to pass 0 instead of a timeout if they want the default. This will allow adding a timeout to the pass through path later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22Merge tag 'arm-soc/for-4.2/soc-take2' of http://github.com/broadcom/stblinux ↵Arnd Bergmann
into next/soc Merge mach-bcm changes from Florian Fainelli: This pull request contains the following changes: - Rafal adds an additional fault code to be ignored by the kernel on BCM5301X SoC - BCM63138 SMP support which: * common code to control the PMB bus, to be shared with a reset controller driver in drivers/reset * secondary CPU initialization sequence using PMB helpers * small changes suggested by Russell King to allow platforms to disable VFP * tag 'arm-soc/for-4.2/soc-take2' of http://github.com/broadcom/stblinux: ARM: BCM63xx: Add SMP support for BCM63138 ARM: vfp: Add vfp_disable for problematic platforms ARM: vfp: Add include guards ARM: BCM63xx: Add secondary CPU PMB initialization sequence ARM: BCM63xx: Add Broadcom BCM63xx PMB controller helpers ARM: BCM5301X: Ignore another (BCM4709 specific) fault code
2015-05-22regulator: max8973: add mechanism to enable/disable through GPIOLaxman Dewangan
MAX8973 supports the voltage output enable/disable through its EN pin. This EN pin can be connected through GPIO from host processor. Add support to provide GPIO number from platform/DT and if it is valid GPIO then enable external control default. Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-05-22regmap: Introduce regmap_get_reg_strideSrinivas Kandagatla
This patch introduces regmap_get_reg_stride() function which would be used by the infrastructures like nvmem framework built on top of regmap. Mostly this function would be used for sanity checks on inputs within such infrastructure. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-05-22regmap: Introduce regmap_get_max_registerSrinivas Kandagatla
This patch introduces regmap_get_max_register() function which would be used by the infrastructures like nvmem framework built on top of regmap. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-05-22extcon: Update the prototype of extcon_register_notifier() with enum extconChanwoo Choi
Previously, extcon consumer driver used the extcon_register_interest() to register the notifier chain and then to receive the notifier event when external connector's state is changed. When registering the notifier chain for specific external connector with extcon_register_interest(), it used the the string name of external connector directly. There are potential problem because of unclear, non-standard and inconsequent cable name. Namely, it is not appropriate method to identify each external connector. So, this patch modify the prototype of extcon_register_notifier() by using the 'enum extcon' which are the unique id for each external connector instead of unclear string method. - Previously, the extcon consumer driver used the extcon_register_interest() with 'cable_name' to point out the specific external connector. Also. it used the un-needed structure (struct extcon_specific_cable_nb). : int extcon_register_interest(struct extcon_specific_cable_nb *obj, const char *extcon_name, const char *cable_name, struct notifier_block *nb) - Newly, the updated extcon_register_notifier() would definitely support the same feature to detech the changed state of external connector without any specific structure (struct extcon_specific_cable_nb). : int extcon_register_notifier(struct extcon_dev *edev, enum extcon id, struct notifier_block *nb) This patch support the both extcon_register_interest() and new extcon_register_ notifier(). But the extcon_{register|unregister}_interest() will be deprecated because extcon core would support the notifier event for extcon consumer driver with only updated extcon_register_notifier() and 'extcon_specific_cable_nb' will be removed if there are no extcon consumer driver with legacy extcon_{register|unregister}_interest(). Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
2015-05-22extcon: Use the unique id for external connector instead of stringChanwoo Choi
This patch uses the unique id to identify the type of external connector instead of string name. The string name have the many potential issues. So, this patch defines the 'extcon' enumeration which includes all supported external connector on EXTCON subsystem. If new external connector is necessary, the unique id of new connector have to be added in 'extcon' enumeration. There are current supported external connector in 'enum extcon' as following: enum extcon { EXTCON_NONE = 0x0, /* USB external connector */ EXTCON_USB = 0x1, EXTCON_USB_HOST = 0x2, /* Charger external connector */ EXTCON_TA = 0x10, EXTCON_FAST_CHARGER = 0x11, EXTCON_SLOW_CHARGER = 0x12, EXTCON_CHARGE_DOWNSTREAM = 0x13, /* Audio and video external connector */ EXTCON_LINE_IN = 0x20, EXTCON_LINE_OUT = 0x21, EXTCON_MICROPHONE = 0x22, EXTCON_HEADPHONE = 0x23, EXTCON_HDMI = 0x30, EXTCON_MHL = 0x31, EXTCON_DVI = 0x32, EXTCON_VGA = 0x33, EXTCON_SPDIF_IN = 0x34, EXTCON_SPDIF_OUT = 0x35, EXTCON_VIDEO_IN = 0x36, EXTCON_VIDEO_OUT = 0x37, /* Miscellaneous external connector */ EXTCON_DOCK = 0x50, EXTCON_JIG = 0x51, EXTCON_MECHANICAL = 0x52, EXTCON_END, }; For example in extcon-arizona.c: To use unique id removes the potential issue about handling the inconsistent name of external connector with string. - Previously, use the string to register the type of arizona jack connector static const char *arizona_cable[] = { "Mechanical", "Microphone", "Headphone", "Line-out", }; - Newly, use the unique id to register the type of arizona jack connector static const enum extcon arizona_cable[] = { EXTCON_MECHANICAL, EXTCON_MICROPHONE, EXTCON_HEADPHONE, EXTCON_LINE_OUT, EXTCON_NONE, }; And this patch modify the prototype of extcon_{get|set}_cable_state_() which uses the 'enum extcon id' instead of 'cable_index'. Because although one more extcon drivers support USB cable, each extcon driver might has the differnt 'cable_index' for USB cable. All extcon drivers can use the unique id number for same external connector with modified extcon_{get|set}_cable_state_(). - Previously, use 'cable_index' on these functions: extcon_get_cable_state_(struct extcon_dev*, int cable_index) extcon_set_cable_state_(struct extcon_dev*, int cable_index, bool state) -Newly, use 'enum extcon id' on these functions: extcon_get_cable_state_(struct extcon_dev*, enum extcon id) extcon_set_cable_state_(struct extcon_dev*, enum extcon id, bool state) Cc: Arnd Bergmann <arnd@arndb.de> Cc: Felipe Balbi <balbi@ti.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Acked-by: Roger Quadros <rogerq@ti.com> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Acked-by: Ramakrishna Pallala <ramakrishna.pallala@intel.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> [arnd: Report the build break about drivers/usb/phy/phy-tahvo.c after using the unique id for external connector insteadf of string] Reported-by: Arnd Bergmann <arnd@arndb.de> [dan.carpenter: Report the build warning of extcon_{set|get}_cable_state_()] Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
2015-05-22crypto: aead - Rename aead_alg to old_aead_algHerbert Xu
This patch is the first step in the introduction of a new AEAD alg type. Unlike normal conversions this patch only renames the existing aead_alg structure because there are external references to it. Those references will be removed after this patch. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-21tcp: add tcpi_segs_in and tcpi_segs_out to tcp_infoMarcelo Ricardo Leitner
This patch tracks the total number of inbound and outbound segments on a TCP socket. One may use this number to have an idea on connection quality when compared against the retransmissions. RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut These are a 32bit field each and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->segs_out was placed near tp->snd_nxt for good data locality and minimal performance impact, while tp->segs_in was placed near tp->bytes_received for the same reason. Join work with Eric Dumazet. Note that received SYN are accounted on the listener, but sent SYNACK are not accounted. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: "Bugfixes for HID subsystem that should go in 4.1. Important highlights: - the patch that extended support for HID++ protocol for TK820 touchpad turns out to be causing regressions due to firmware issues; patch reverting back to basic support from Benjamin Tissoires - Wacom driver can oops for devices that report non-touch data on touch interfaces. Fix from Ping Cheng - gpiolib is not mandatory for i2c-hid, so the driver shouldn't fail if gpiolib is not enabled. Fix from Mika Westerberg" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: wacom: fix an Oops caused by wacom_wac_finger_count_touches HID: usbhid: Add HID_QUIRK_NOGET for Aten DVI KVM switch HID: hid-sensor-hub: Fix debug lock warning Revert "HID: logitech-hidpp: support combo keyboard touchpad TK820" HID: i2c-hid: Do not fail probing if gpiolib is not enabled
2015-05-21Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Michael Turquette: "The first set of clk fixes for 4.1 are all driver bugs, with the exception of a single locking fix in the core code. All driver fixes are for code that was merged recently. The Samsung stuff is mostly fixes around suspend/resume, the Qualcomm fixes are for invalid hardware configuration data and the Silicon Labs patches are fixes following their move away from platform_data to Device Tree" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: si5351: Do not pass struct clk in platform_data clk: si5351: Mention clock-names in the binding documentation clk: add missing lock when call clk_core_enable in clk_set_parent clk: exynos5420: Restore GATE_BUS_TOP on suspend clk: qcom: Fix MSM8916 gfx3d_clk_src configuration clk: qcom: Fix MSM8916 venus divider value clk: exynos5433: Fix wrong PMS value of exynos5433_pll_rates clk: exynos5433: Fix wrong parent clock of sclk_apollo clock clk: exynos5433: Fix CLK_PCLK_MONOTONIC_CNT clk register assignment clk: exynos5433: Fix wrong offset of PCLK_MSCL_SECURE_SMMU_JPEG clk: Use CONFIG_ARCH_EXYNOS instead of CONFIG_ARCH_EXYNOS5433
2015-05-21workqueue: move flush_scheduled_work() to workqueue.hLai Jiangshan
flush_scheduled_work() is just a simple call to flush_work(). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2015-05-21bpf: allow bpf programs to tail-call other bpf programsAlexei Starovoitov
introduce bpf_tail_call(ctx, &jmp_table, index) helper function which can be used from BPF programs like: int bpf_prog(struct pt_regs *ctx) { ... bpf_tail_call(ctx, &jmp_table, index); ... } that is roughly equivalent to: int bpf_prog(struct pt_regs *ctx) { ... if (jmp_table[index]) return (*jmp_table[index])(ctx); ... } The important detail that it's not a normal call, but a tail call. The kernel stack is precious, so this helper reuses the current stack frame and jumps into another BPF program without adding extra call frame. It's trivially done in interpreter and a bit trickier in JITs. In case of x64 JIT the bigger part of generated assembler prologue is common for all programs, so it is simply skipped while jumping. Other JITs can do similar prologue-skipping optimization or do stack unwind before jumping into the next program. bpf_tail_call() arguments: ctx - context pointer jmp_table - one of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table index - index in the jump table Since all BPF programs are idenitified by file descriptor, user space need to populate the jmp_table with FDs of other BPF programs. If jmp_table[index] is empty the bpf_tail_call() doesn't jump anywhere and program execution continues as normal. New BPF_MAP_TYPE_PROG_ARRAY map type is introduced so that user space can populate this jmp_table array with FDs of other bpf programs. Programs can share the same jmp_table array or use multiple jmp_tables. The chain of tail calls can form unpredictable dynamic loops therefore tail_call_cnt is used to limit the number of calls and currently is set to 32. Use cases: Acked-by: Daniel Borkmann <daniel@iogearbox.net> ========== - simplify complex programs by splitting them into a sequence of small programs - dispatch routine For tracing and future seccomp the program may be triggered on all system calls, but processing of syscall arguments will be different. It's more efficient to implement them as: int syscall_entry(struct seccomp_data *ctx) { bpf_tail_call(ctx, &syscall_jmp_table, ctx->nr /* syscall number */); ... default: process unknown syscall ... } int sys_write_event(struct seccomp_data *ctx) {...} int sys_read_event(struct seccomp_data *ctx) {...} syscall_jmp_table[__NR_write] = sys_write_event; syscall_jmp_table[__NR_read] = sys_read_event; For networking the program may call into different parsers depending on packet format, like: int packet_parser(struct __sk_buff *skb) { ... parse L2, L3 here ... __u8 ipproto = load_byte(skb, ... offsetof(struct iphdr, protocol)); bpf_tail_call(skb, &ipproto_jmp_table, ipproto); ... default: process unknown protocol ... } int parse_tcp(struct __sk_buff *skb) {...} int parse_udp(struct __sk_buff *skb) {...} ipproto_jmp_table[IPPROTO_TCP] = parse_tcp; ipproto_jmp_table[IPPROTO_UDP] = parse_udp; - for TC use case, bpf_tail_call() allows to implement reclassify-like logic - bpf_map_update_elem/delete calls into BPF_MAP_TYPE_PROG_ARRAY jump table are atomic, so user space can build chains of BPF programs on the fly Implementation details: ======================= - high performance of bpf_tail_call() is the goal. It could have been implemented without JIT changes as a wrapper on top of BPF_PROG_RUN() macro, but with two downsides: . all programs would have to pay performance penalty for this feature and tail call itself would be slower, since mandatory stack unwind, return, stack allocate would be done for every tailcall. . tailcall would be limited to programs running preempt_disabled, since generic 'void *ctx' doesn't have room for 'tail_call_cnt' and it would need to be either global per_cpu variable accessed by helper and by wrapper or global variable protected by locks. In this implementation x64 JIT bypasses stack unwind and jumps into the callee program after prologue. - bpf_prog_array_compatible() ensures that prog_type of callee and caller are the same and JITed/non-JITed flag is the same, since calling JITed program from non-JITed is invalid, since stack frames are different. Similarly calling kprobe type program from socket type program is invalid. - jump table is implemented as BPF_MAP_TYPE_PROG_ARRAY to reuse 'map' abstraction, its user space API and all of verifier logic. It's in the existing arraymap.c file, since several functions are shared with regular array map. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21power_supply: Fix possible NULL pointer dereference on early ueventKrzysztof Kozlowski
Don't call the power_supply_changed() from power_supply_register() when parent is still probing because it may lead to accessing parent too early. In bq27x00_battery this caused NULL pointer exception because uevent of power_supply_changed called back the the get_property() method provided by the driver. The get_property() method accessed pointer which should be returned by power_supply_register(). Starting from bq27x00_battery_probe(): di->bat = power_supply_register() power_supply_changed() kobject_uevent() power_supply_uevent() power_supply_show_property() power_supply_get_property() bq27x00_battery_get_property() dereference of di->bat which is NULL here The dereference of di->bat (value returned by power_supply_register()) is the currently visible problem. However calling back the methods provided by driver before ending the probe may lead to accessing other driver-related data which is not yet initialized. The call to power_supply_changed() is postponed till probing ends - mutex of parent device is released. Reported-by: H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Fixes: 297d716f6260 ("power_supply: Change ownership from driver to core") Tested-By: Dr. H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Sebastian Reichel <sre@kernel.org>
2015-05-20ARM: BCM63xx: Add Broadcom BCM63xx PMB controller helpersFlorian Fainelli
This patch adds both common register definitions and helper functions used to issue read/write commands to the Broadcom BCM63xx PMB controller which is used to power on and release from reset internal on-chip peripherals such as the integrated Ethernet switch, AHCI, USB, as well as the secondary CPU core. This is going to be utilized by the BCM63138 SMP code, as well as by the BCM63138 reset controller later. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2015-05-20Merge branch 'irq/for-arm' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next/soc * 'irq/for-arm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: vf610-mscm: Support NVIC parent chip irqchip: nvic: Support hierarchy irq domain genirq: generic chip: Support hierarchy domain genirq: Add irq_chip_(enable/disable)_parent irqdomain: Add non-hierarchy helper irq_domain_set_info
2015-05-20block: export blkdev_reread_part() and __blkdev_reread_part()Jarod Wilson
This patch exports blkdev_reread_part() for block drivers, also introduce __blkdev_reread_part(). For some drivers, such as loop, reread of partitions can be run from the release path, and bd_mutex may already be held prior to calling ioctl_by_bdev(bdev, BLKRRPART, 0), so introduce __blkdev_reread_part for use in such cases. CC: Christoph Hellwig <hch@lst.de> CC: Jens Axboe <axboe@kernel.dk> CC: Tejun Heo <tj@kernel.org> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Markus Pargmann <mpa@pengutronix.de> CC: Stefan Weinhuber <wein@de.ibm.com> CC: Stefan Haberland <stefan.haberland@de.ibm.com> CC: Sebastian Ott <sebott@linux.vnet.ibm.com> CC: Fabian Frederick <fabf@skynet.be> CC: Ming Lei <ming.lei@canonical.com> CC: David Herrmann <dh.herrmann@gmail.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Peter Zijlstra <peterz@infradead.org> CC: nbd-general@lists.sourceforge.net CC: linux-s390@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20mfd: syscon: Add Atmel MC (Memory Controller) registers definitionBoris Brezillon
The at91rm9200 SoC embeds a Memory Controller block which is used to configure several aspects of the platform: - AHB/APB Bus behavior - SDRAM Controller - EBI (External Bus Interface) and SMC (Static Memory Controller) config Those registers might be accessed by different drivers, hence we need to define it as a syscon device. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Lee Jones <lee.jones@linaro.org> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2015-05-20Revert "netfilter: bridge: query conntrack about skb dnat"Florian Westphal
This reverts commit c055d5b03bb4cb69d349d787c9787c0383abd8b2. There are two issues: 'dnat_took_place' made me think that this is related to -j DNAT/MASQUERADE. But thats only one part of the story. This is also relevant for SNAT when we undo snat translation in reverse/reply direction. Furthermore, I originally wanted to do this mainly to avoid storing ipv6 addresses once we make DNAT/REDIRECT work for ipv6 on bridges. However, I forgot about SNPT/DNPT which is stateless. So we can't escape storing address for ipv6 anyway. Might as well do it for ipv4 too. Reported-and-tested-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-20module: add core_param_unsafeDmitry Torokhov
Similarly to module_param_unsafe(), add the helper to be used by core code wishing to expose unsafe debugging or testing parameters that taint the kernel when set. Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20driver-core: enable drivers to opt-out of async probeLuis R. Rodriguez
There are drivers that can not be probed asynchronously. One such group is platform drivers registered with platform_driver_probe(), which expects driver's probe routine be discarded after the driver has been registered and initial binding attempt executed. Also platform_driver_probe() an error when no devices were bound to the driver, allowing failing to load such driver module altogether. Other drivers do not work well with asynchronous probing because of driver bug or not optimal driver organization. To allow using such drivers even when user requests asynchronous probing as default boot strategy, let's allow them to opt out. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20driver-core: add driver module asynchronous probe supportLuis R. Rodriguez
Some init systems may wish to express the desire to have device drivers run their probe() code asynchronously. This implements support for this and allows userspace to request async probe as a preference through a generic shared device driver module parameter, async_probe. Implementation for async probe is supported through a module parameter given that since synchronous probe has been prevalent for years some userspace might exist which relies on the fact that the device driver will probe synchronously and the assumption that devices it provides will be immediately available after this. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20driver-core: add asynchronous probing support for driversDmitry Torokhov
Some devices take a long time when initializing, and not all drivers are suited to initialize their devices when they are open. For example, input drivers need to interrogate their devices in order to publish device's capabilities before userspace will open them. When such drivers are compiled into kernel they may stall entire kernel initialization. This change allows drivers request for their probe functions to be called asynchronously during driver and device registration (manual binding is still synchronous). Because async_schedule is used to perform asynchronous calls module loading will still wait for the probing to complete. Note that the end goal is to make the probing asynchronous by default, so annotating drivers with PROBE_PREFER_ASYNCHRONOUS is a temporary measure that allows us to speed up boot process while we validating and fixing the rest of the drivers and preparing userspace. This change is based on earlier patch by "Luis R. Rodriguez" <mcgrof@suse.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20module: add extra argument for parse_params() callbackLuis R. Rodriguez
This adds an extra argument onto parse_params() to be used as a way to make the unused callback a bit more useful and generic by allowing the caller to pass on a data structure of its choice. An example use case is to allow us to easily make module parameters for every module which we will do next. @ parse @ identifier name, args, params, num, level_min, level_max; identifier unknown, param, val, doing; type s16; @@ extern char *parse_args(const char *name, char *args, const struct kernel_param *params, unsigned num, s16 level_min, s16 level_max, + void *arg, int (*unknown)(char *param, char *val, const char *doing + , void *arg )); @ parse_mod @ identifier name, args, params, num, level_min, level_max; identifier unknown, param, val, doing; type s16; @@ char *parse_args(const char *name, char *args, const struct kernel_param *params, unsigned num, s16 level_min, s16 level_max, + void *arg, int (*unknown)(char *param, char *val, const char *doing + , void *arg )) { ... } @ parse_args_found @ expression R, E1, E2, E3, E4, E5, E6; identifier func; @@ ( R = parse_args(E1, E2, E3, E4, E5, E6, + NULL, func); | R = parse_args(E1, E2, E3, E4, E5, E6, + NULL, &func); | R = parse_args(E1, E2, E3, E4, E5, E6, + NULL, NULL); | parse_args(E1, E2, E3, E4, E5, E6, + NULL, func); | parse_args(E1, E2, E3, E4, E5, E6, + NULL, &func); | parse_args(E1, E2, E3, E4, E5, E6, + NULL, NULL); ) @ parse_args_unused depends on parse_args_found @ identifier parse_args_found.func; @@ int func(char *param, char *val, const char *unused + , void *arg ) { ... } @ mod_unused depends on parse_args_found @ identifier parse_args_found.func; expression A1, A2, A3; @@ - func(A1, A2, A3); + func(A1, A2, A3, NULL); Generated-by: Coccinelle SmPL Cc: cocci@systeme.lip6.fr Cc: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Hellwig <hch@infradead.org> Cc: Felipe Contreras <felipe.contreras@gmail.com> Cc: Ewan Milne <emilne@redhat.com> Cc: Jean Delvare <jdelvare@suse.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Jani Nikula <jani.nikula@intel.com> Cc: linux-kernel@vger.kernel.org Reviewed-by: Tejun Heo <tj@kernel.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20PM / Wakeirq: Add automated device wake IRQ handlingTony Lindgren
Turns out we can automate the handling for the device_may_wakeup() quite a bit by using the kernel wakeup source list as suggested by Rafael J. Wysocki <rjw@rjwysocki.net>. And as some hardware has separate dedicated wake-up interrupt in addition to the IO interrupt, we can automate the handling by adding a generic threaded interrupt handler that just calls the device PM runtime to wake up the device. This allows dropping code from device drivers as we currently are doing it in multiple ways, and often wrong. For most drivers, we should be able to drop the following boilerplate code from runtime_suspend and runtime_resume functions: ... device_init_wakeup(dev, true); ... if (device_may_wakeup(dev)) enable_irq_wake(irq); ... if (device_may_wakeup(dev)) disable_irq_wake(irq); ... device_init_wakeup(dev, false); ... We can replace it with just the following init and exit time code: ... device_init_wakeup(dev, true); dev_pm_set_wake_irq(dev, irq); ... dev_pm_clear_wake_irq(dev); device_init_wakeup(dev, false); ... And for hardware with dedicated wake-up interrupts: ... device_init_wakeup(dev, true); dev_pm_set_dedicated_wake_irq(dev, irq); ... dev_pm_clear_wake_irq(dev); device_init_wakeup(dev, false); ... Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-19livepatch: introduce patch/func-walking helpersJiri Slaby
klp_for_each_object and klp_for_each_func are now used all over the code. One need not think what is the proper condition to check in the for loop now. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-05-19livepatch: make kobject in klp_object statically allocatedMiroslav Benes
Make kobj variable (of type struct kobject) statically allocated in klp_object structure. It will allow us to move in the func-object-patch hierarchy through kobject links. The only reason to have it dynamic was to not have empty release callback in the code. However we have empty callbacks for function and patch in the code now, so it is no longer valid and the advantage of static allocation is clear. Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-05-19KVM: export __gfn_to_pfn_memslot, drop gfn_to_pfn_asyncPaolo Bonzini
gfn_to_pfn_async is used in just one place, and because of x86-specific treatment that place will need to look at the memory slot. Hence inline it into try_async_pf and export __gfn_to_pfn_memslot. The patch also switches the subsequent call to gfn_to_pfn_prot to use __gfn_to_pfn_memslot. This is a small optimization. Finally, remove the now-unused async argument of __gfn_to_pfn. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19suspend: simplify block I/O handlingChristoph Hellwig
Stop abusing struct page functionality and the swap end_io handler, and instead add a modified version of the blk-lib.c bio_batch helpers. Also move the block I/O code into swap.c as they are directly tied into each other. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Pavel Machek <pavel@ucw.cz> Tested-by: Ming Lin <mlin@kernel.org> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19block: collapse bio bit spaceJens Axboe
Various previous patches removed bits and left holes, collapse them all. Leave the reset start bit where it is, we don't need to change that. Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19jiffies: Remove the extra indentation levelThomas Gleixner
Somehow I missed to clean that up when applying the patches. Fix it up now. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Nicholas Mc Guire <der.herr@hofr.at>
2015-05-19block: remove unused BIO_RW_BLOCK and BIO_EOF flagsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19block: remove BIO_EOPNOTSUPPChristoph Hellwig
Since the big barrier rewrite/removal in 2007 we never fail FLUSH or FUA requests, which means we can remove the magic BIO_EOPNOTSUPP flag to help propagating those to the buffer_head layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19block: use an atomic_t for mq_freeze_depthChristoph Hellwig
lockdep gets unhappy about the not disabling irqs when using the queue_lock around it. Instead of trying to fix that up just switch to an atomic_t and get rid of the lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED stateViresh Kumar
When no timers/hrtimers are pending, the expiry time is set to a special value: 'KTIME_MAX'. This normally happens with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes. When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from the tick-handler for next tick. As the clock event device is programmed in ONESHOT mode it will at least fire one more time (unnecessarily). Timers on few implementations (like arm_arch_timer, etc.) only support PERIODIC mode and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts periodically (at last programmed interval rate, normally tick rate). In order to avoid spurious interrupts, the clockevent device should be stopped or its interrupts should be masked. A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this. However, Thomas suggested to add an optional state ONESHOT_STOPPED to solve this problem: lkml.org/lkml/2014/5/9/508. This patch adds support for ONESHOT_STOPPED state in clockevents core. It will only be available to drivers that implement the state-specific callbacks instead of the legacy ->set_mode() callback. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Cc: linaro-kernel@lists.linaro.org Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/b8b383a03ac07b13312c16850b5106b82e4245b5.1428031396.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19Merge branch 'linus' into timers/coreThomas Gleixner
Make sure the upstream fixes are applied before adding further modifications.
2015-05-19Merge branch 'irq/for-x86' into x86/apicThomas Gleixner
Pull the irq core change which is required to merge the preparatory patches for posted interrupts.
2015-05-19Merge branch 'irq/for-x86' into irq/coreThomas Gleixner
Pull in the branch which can be consumed by x86 to build their changes on top.
2015-05-19genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPUJiang Liu
With Posted-Interrupts support in Intel CPU and IOMMU, an external interrupt from assigned-devices could be directly delivered to a virtual CPU in a virtual machine. Instead of hacking KVM and Intel IOMMU drivers, we propose a platform independent interface to target an interrupt to a specific virtual CPU in a virtual machine, or set virtual CPU affinity for an interrupt. By adopting this new interface and the hierarchy irqdomain, we could easily support posted-interrupts on Intel platforms, and also provide flexible enough interfaces for other platforms to support similar features. Here is the usage scenario for this interface: Guest update MSI/MSI-X interrupt configuration -->QEMU and KVM handle this -->KVM call this interface (passing posted interrupts descriptor and guest vector) -->irq core will transfer the control to IOMMU -->IOMMU will do the real work of updating IRTE (IRTE has new format for VT-d Posted-Interrupts) Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Feng Wu <feng.wu@intel.com> Link: http://lkml.kernel.org/r/1432026437-16560-2-git-send-email-feng.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>