summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-09-21blkcg: associate writeback bios with a blkgDennis Zhou (Facebook)
One of the goals of this series is to remove a separate reference to the css of the bio. This can and should be accessed via bio_blkcg. In this patch, the wbc_init_bio call is changed such that it must be called after a queue has been associated with the bio. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21blkcg: associate a blkg for pages being evicted by swapDennis Zhou (Facebook)
A prior patch in this series added blkg association to bios issued by cgroups. There are two other paths that we want to attribute work back to the appropriate cgroup: swap and writeback. Here we modify the way swap tags bios to include the blkg. Writeback will be tackle in the next patch. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21blkcg: consolidate bio_issue_init to be a part of coreDennis Zhou (Facebook)
bio_issue_init among other things initializes the timestamp for an IO. Rather than have this logic handled by policies, this consolidates it to be on the init paths (normal, clone, bounce clone). Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21blkcg: always associate a bio with a blkgDennis Zhou (Facebook)
Previously, blkg's were only assigned as needed by blk-iolatency and blk-throttle. bio->css was also always being associated while blkg was being looked up and then thrown away in blkcg_bio_issue_check. This patch begins the cleanup of bio->css and bio->bi_blkg by always associating a blkg in blkcg_bio_issue_check. This tries to create the blkg, but if it is not possible, falls back to using the root_blkg of the request_queue. Therefore, a bio will always be associated with a blkg. The duplicate association logic is removed from blk-throttle and blk-iolatency. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21blkcg: convert blkg_lookup_create to find closest blkgDennis Zhou (Facebook)
There are several scenarios where blkg_lookup_create can fail. Examples include the blkcg dying, request_queue is dying, or simply being OOM. At the end of the day, most handle this by simply falling back to the q->root_blkg and calling it a day. This patch implements the notion of closest blkg. During blkg_lookup_create, if it fails to create, return the closest blkg found or the q->root_blkg. blkg_try_get_closest is introduced and used during association so a bio is always attached to a blkg. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21blkcg: update blkg_lookup_create to do lockingDennis Zhou (Facebook)
To know when to create a blkg, the general pattern is to do a blkg_lookup and if that fails, lock and then do a lookup again and if that fails finally create. It doesn't make much sense for everyone who wants to do creation to write this themselves. This changes blkg_lookup_create to do locking and implement this pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create. If a call site wants to do its own error handling or already owns the queue lock, they can use __blkg_lookup_create. This will be used in upcoming patches. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21blkcg: fix ref count issue with bio_blkcg using task_cssDennis Zhou (Facebook)
The accessor function bio_blkcg either returns the blkcg associated with the bio or finds one in the current context. This can cause an issue when trying to associate a bio with a blkcg. Particularly, it's the third case that is problematic: return css_to_blkcg(task_css(current, io_cgrp_id)); As the above may race against task migration and the cgroup exiting, it is not always ok to take a reference on the blkcg returned from bio_blkcg. This patch adds association ahead of calling bio_blkcg rather than after. This makes association a required and explicit step along the code paths for calling bio_blkcg. blk_get_rl is modified as well to get a reference to the blkcg it may use and blk_put_rl will always put the reference back. Association is also moved above the bio_blkcg call to ensure it will not return NULL in blk-iolatency. BFQ and CFQ utilize this flaw, but due to the complexity, I do not want to address this in this series. I've created a private version of the function with notes not to use it describing the flaw. Hopefully soon, that code can be cleaned up. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-20Blk-throttle: update to use rbtree with leftmost node cachedLiu Bo
As rbtree has native support of caching leftmost node, i.e. rb_root_cached, no need to do the caching by ourselves. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-20block: use bio_add_page in bio_iov_iter_get_pagesChristoph Hellwig
Replace a nasty hack with a different nasty hack to prepare for multipage bio_vecs. By moving the temporary page array as far up as possible in the space allocated for the bio_vec array we can iterate forward over it and thus use bio_add_page. Using bio_add_page means we'll be able to merge physically contiguous pages once support for multipath bio_vecs is merged. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-14blok, bfq: do not plug I/O if all queues are weight-raisedPaolo Valente
To reduce latency for interactive and soft real-time applications, bfq privileges the bfq_queues containing the I/O of these applications. These privileged queues, referred-to as weight-raised queues, get a much higher share of the device throughput w.r.t. non-privileged queues. To preserve this higher share, the I/O of any non-weight-raised queue must be plugged whenever a sync weight-raised queue, while being served, remains temporarily empty. To attain this goal, bfq simply plugs any I/O (from any queue), if a sync weight-raised queue remains empty while in service. Unfortunately, this plugging typically lowers throughput with random I/O, on devices with internal queueing (because it reduces the filling level of the internal queues of the device). This commit addresses this issue by restricting the cases where plugging is performed: if a sync weight-raised queue remains empty while in service, then I/O plugging is performed only if some of the active bfq_queues are *not* weight-raised (which is actually the only circumstance where plugging is needed to preserve the higher share of the throughput of weight-raised queues). This restriction proved able to boost throughput in really many use cases needing only maximum throughput. Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-14block, bfq: inject other-queue I/O into seeky idle queues on NCQ flashPaolo Valente
The Achilles' heel of BFQ is its failing to reach a high throughput with sync random I/O on flash storage with internal queueing, in case the processes doing I/O have differentiated weights. The cause of this failure is as follows. If at least two processes do sync I/O, and have a different weight from each other, then BFQ plugs I/O dispatching every time one of these processes, while it is being served, remains temporarily without pending I/O requests. This plugging is necessary to guarantee that every process enjoys a bandwidth proportional to its weight; but it empties the internal queue(s) of the drive. And this kills throughput with random I/O. So, if some processes have differentiated weights and do both sync and random I/O, the end result is a throughput collapse. This commit tries to counter this problem by injecting the service of other processes, in a controlled way, while the process in service happens to have no I/O. This injection is performed only if the medium is non rotational and performs internal queueing, and the process in service does random I/O (service injection might be beneficial for sequential I/O too, we'll work on that). As an example of the benefits of this commit, on a PLEXTOR PX-256M5S SSD, and with five processes having differentiated weights and doing sync random 4KB I/O, this commit makes the throughput with bfq grow by 400%, from 25 to 100MB/s. This higher throughput is 10MB/s lower than that reached with none. As some less random I/O is added to the mix, the throughput becomes equal to or higher than that with none. This commit is a very first attempt to recover throughput without losing control, and certainly has many limitations. One is, e.g., that the processes whose service is injected are not chosen so as to distribute the extra bandwidth they receive in accordance to their weights. Thus there might be loss of weighted fairness in some cases. Anyway, this loss concerns extra service, which would not have been received at all without this commit. Other limitations and issues will probably show up with usage. Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-14block, bfq: correctly charge and reset entity service in all casesPaolo Valente
BFQ schedules entities (which represent either per-process queues or groups of queues) as a function of their timestamps. In particular, as a function of their (virtual) finish times. The finish time of an entity is computed as a function of the budget assigned to the entity, assuming, tentatively, that the entity, once in service, will receive an amount of service equal to its budget. Then, when the entity is expired because it finishes to be served, this finish time is updated as a function of the actual service received by the entity. This allows the entity to be correctly charged with only the service received, and then to be correctly re-scheduled. Yet an entity may receive service also while not being the entity in service (in the scheduling environment of its parent entity), for several reasons. If the entity remains with no backlog while receiving this 'unofficial' service, then it is expired. Also on such an expiration, the finish time of the entity should be updated to account for only the service actually received by the entity. Unfortunately, such an update is not performed for an entity expiring without being the entity in service. In a similar vein, the service counter of the entity in service is reset when the entity is expired, to be ready to be used for next service cycle. This reset too should be performed also in case an entity is expired because it remains empty after receiving service while not being the entity in service. But in this case the reset is not performed. This commit performs the above update of the finish time and reset of the service received, also for an entity expiring while not being the entity in service. Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-13blk-iolatency: remove set but not used variables 'changed' and 'blkiolat'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: block/blk-iolatency.c: In function 'scale_change': block/blk-iolatency.c:301:7: warning: variable 'changed' set but not used [-Wunused-but-set-variable] block/blk-iolatency.c: In function 'iolatency_set_limit': block/blk-iolatency.c:765:24: warning: variable 'blkiolat' set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-11rsxx: Remove unnecessary parenthesesNathan Chancellor
Clang warns when more than one set of parentheses is used for a single conditional statement: drivers/block/rsxx/cregs.c:279:15: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((cmd->op == CREG_OP_READ)) { ~~~~~~~~^~~~~~~~~~~~~~~ drivers/block/rsxx/cregs.c:279:15: note: remove extraneous parentheses around the comparison to silence this warning if ((cmd->op == CREG_OP_READ)) { ~ ^ ~ drivers/block/rsxx/cregs.c:279:15: note: use '=' to turn this equality comparison into an assignment if ((cmd->op == CREG_OP_READ)) { ^~ = 1 warning generated. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-07block: umem: replace spin_lock_bh with spin_lock in tasklet callbackjun qian
As you are already in a tasklet, it is unnecessary to call spin_lock_bh. Signed-off-by: jun qian <hangdianqj@163.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-06block: remove bio_rewind_iter()Ming Lei
It is pointed that bio_rewind_iter() is one very bad API[1]: 1) bio size may not be restored after rewinding 2) it causes some bogus change, such as 5151842b9d8732 (block: reset bi_iter.bi_done after splitting bio) 3) rewinding really makes things complicated wrt. bio splitting 4) unnecessary updating of .bi_done in fast path [1] https://marc.info/?t=153549924200005&r=1&w=2 So this patch takes Kent's suggestion to restore one bio into its original state via saving bio iterator(struct bvec_iter) in bio_integrity_prep(), given now bio_rewind_iter() is only used by bio integrity code. Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Hannes Reinecke <hare@suse.com> Suggested-by: Kent Overstreet <kent.overstreet@gmail.com> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-06drbd: Convert from ahash to shashKees Cook
In preparing to remove all stack VLA usage from the kernel[1], this removes the discouraged use of AHASH_REQUEST_ON_STACK in favor of the smaller SHASH_DESC_ON_STACK by converting from ahash-wrapped-shash to direct shash. By removing a layer of indirection this both improves performance and reduces stack usage. The stack allocation will be made a fixed size in a later patch to the crypto subsystem. The bulk of the lines in this change are simple s/ahash/shash/, but the main logic differences are in drbd_csum_ee() and drbd_csum_bio(), which externalizes the page walking with k(un)map_atomic() instead of using scattergather. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-06Merge tag 'for-linus-20180906' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Small collection of fixes that should go into this release. This contains: - Small series that fixes a race between blkcg teardown and writeback (Dennis Zhou) - Fix disallowing invalid block size settings from the nbd ioctl (me) - BFQ fix for a use-after-free on last release of a bfqg (Konstantin Khlebnikov) - Fix for the "don't warn for flush" fix (Mikulas)" * tag 'for-linus-20180906' of git://git.kernel.dk/linux-block: block: bfq: swap puts in bfqg_and_blkg_put block: don't warn when doing fsync on read-only devices nbd: don't allow invalid blocksize settings blkcg: use tryget logic when associating a blkg with a bio blkcg: delay blkg destruction until after writeback has finished Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"
2018-09-06block: bfq: swap puts in bfqg_and_blkg_putKonstantin Khlebnikov
Fix trivial use-after-free. This could be last reference to bfqg. Fixes: 8f9bebc33dd7 ("block, bfq: access and cache blkg data only when safe") Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-06Merge tag 'apparmor-pr-2018-09-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor fix from John Johansen: "A fix for an issue syzbot discovered last week: - Fix for bad debug check when converting secids to secctx" * tag 'apparmor-pr-2018-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: fix bad debug check in apparmor_secid_to_secctx()
2018-09-06Merge tag 'trace-v4.19-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This fixes two annoying bugs: - The first one is a side effect caused by using SRCU for rcuidle tracepoints. It seems that the perf was depending on the rcuidle tracepoints to make RCU watch when it wasn't. The real fix will be to have perf use SRCU instead of depending on RCU watching, but that can't be done until SRCU is safe to use in NMI context (Paul's working on that). - The second bug fix is for a bug that's been periodically making my tests fail randomly for some time. I haven't had time to track it down, but finally have. It has to do with stressing NMIs (via perf) while enabling or disabling ftrace function handling with lockdep enabled. If an interrupt happens and just as it returns, it sets lockdep back to "interrupts enabled" but before it returns an NMI is triggered, and if this happens while printk_nmi_enter has a breakpoint attached to it (because ftrace is converting it to or from nop to call fentry), the breakpoint trap also calls into lockdep, and since returning from the NMI to a interrupt handler, interrupts were disabled when the NMI went off, lockdep keeps its state as interrupts disabled when it returns back from the interrupt handler where interrupts are enabled. This causes lockdep_assert_irqs_enabled() to trigger a false positive" * tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: printk/tracing: Do not trace printk_nmi_enter() tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
2018-09-06Merge tag 'for-4.19-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix for improper fsync after hardlink - fix for a corruption during file deduplication - use after free fixes - RCU warning fix - fix for buffered write to nodatacow file * tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: Fix suspicious RCU usage warning in btrfs_debug_in_rcu btrfs: use after free in btrfs_quota_enable btrfs: btrfs_shrink_device should call commit transaction at the end btrfs: fix qgroup_free wrong num_bytes in btrfs_subvolume_reserve_metadata Btrfs: fix data corruption when deduplicating between different files Btrfs: sync log after logging new name Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space
2018-09-06printk/tracing: Do not trace printk_nmi_enter()Steven Rostedt (VMware)
I hit the following splat in my tests: ------------[ cut here ]------------ IRQs not enabled as expected WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 EIP: tick_nohz_idle_enter+0x44/0x8c Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00 75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0 e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007 ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296 CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0 Call Trace: do_idle+0x33/0x202 cpu_startup_entry+0x61/0x63 start_secondary+0x18e/0x1ed startup_32_smp+0x164/0x168 irq event stamp: 18773830 hardirqs last enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10 hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10 softirqs last enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b ---[ end trace b7c64aa79e17954a ]--- After a bit of debugging, I found what was happening. This would trigger when performing "perf" with a high NMI interrupt rate, while enabling and disabling function tracer. Ftrace uses breakpoints to convert the nops at the start of functions to calls to the function trampolines. The breakpoint traps disable interrupts and this makes calls into lockdep via the trace_hardirqs_off_thunk in the entry.S code. What happens is the following: do_idle { [interrupts enabled] <interrupt> [interrupts disabled] TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET test if pt_regs say return to interrupts enabled [yes] TRACE_IRQS_ON [lockdep says irqs are on] <nmi> nmi_enter() { printk_nmi_enter() [traced by ftrace] [ hit ftrace breakpoint ] <breakpoint exception> TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET [return from breakpoint] test if pt_regs say interrupts enabled [no] [iret back to interrupt] [iret back to code] tick_nohz_idle_enter() { lockdep_assert_irqs_enabled() [lockdep say no!] Although interrupts are indeed enabled, lockdep thinks it is not, and since we now do asserts via lockdep, it gives a false warning. The issue here is that printk_nmi_enter() is called before lockdep_off(), which disables lockdep (for this reason) in NMIs. By simply not allowing ftrace to see printk_nmi_enter() (via notrace annotation) we keep lockdep from getting confused. Cc: stable@vger.kernel.org Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI") Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-09-05block: don't warn when doing fsync on read-only devicesMikulas Patocka
It is possible to call fsync on a read-only handle (for example, fsck.ext2 does it when doing read-only check), and this call results in kernel warning. The patch b089cfd95d32 ("block: don't warn for flush on read-only device") attempted to disable the warning, but it is buggy and it doesn't (op_is_flush tests flags, but bio_op strips off the flags). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 721c7fc701c7 ("block: fail op_is_write() requests to read-only partitions") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-05Merge tag 'gpio-v4.19-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "Some GPIO fixes. The ACPI stuff is probably the most annoying for users that get fixed this time. - Atomic contexts, cansleep* calls and such fastpath/slopwpath things. - Defer ACPI event handler registration to late_initcall() so IRQs do not fire in our face before other drivers have a chance to register handlers. - Race condition if a consumer requests a GPIO after gpiochip_add_data_with_key() but before of_gpiochip_add() - Probe errorpath in the dwapb driver" * tag 'gpio-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: Fix crash due to registration race gpio: dwapb: Fix error handling in dwapb_gpio_probe() gpiolib-acpi: Register GpioInt ACPI event handlers from a late_initcall gpiolib: acpi: Switch to cansleep version of GPIO library call gpio: adp5588: Fix sleep-in-atomic-context bug
2018-09-05Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "A set of very minor fixes and a couple of reverts to fix a major problem (the attempt to change the busy count causes a hang when attempting to change the drive cache type)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: aacraid: fix a signedness bug Revert "scsi: core: avoid host-wide host_busy counter for scsi_mq" Revert "scsi: core: fix scsi_host_queue_ready" scsi: libata: Add missing newline at end of file scsi: target: iscsi: cxgbit: use pr_debug() instead of pr_info() scsi: hpsa: limit transfer length to 1MB, not 512kB scsi: lpfc: Correct MDS diag and nvmet configuration scsi: lpfc: Default fdmi_on to on scsi: csiostor: fix incorrect port capabilities scsi: csiostor: add a check for NULL pointer after kmalloc() scsi: documentation: add scsi_mod.use_blk_mq to scsi-parameters scsi: core: Update SCSI_MQ_DEFAULT help text to match default
2018-09-05Merge tag 'nds32-for-linus-4.19-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux Pull nds32 updates from Greentime Hu: "Contained in here are the bug fixes, building error fixes and ftrace support for nds32" * tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux: nds32: linker script: GCOV kernel may refers data in __exit nds32: fix build error because of wrong semicolon nds32: Fix a kernel panic issue because of wrong frame pointer access. nds32: Only print one page of stack when die to prevent printing too much information. nds32: Add macro definition for offset of lp register on stack nds32: Remove the deprecated ABI implementation nds32/stack: Get real return address by using ftrace_graph_ret_addr nds32/ftrace: Support dynamic function graph tracer nds32/ftrace: Support dynamic function tracer nds32/ftrace: Add RECORD_MCOUNT support nds32/ftrace: Support static function graph tracer nds32/ftrace: Support static function tracer nds32: Extract the checking and getting pointer to a macro nds32: Clean up the coding style nds32: Fix get_user/put_user macro expand pointer problem nds32: Fix empty call trace nds32: add NULL entry to the end of_device_id array nds32: fix logic for module
2018-09-05tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepointsSteven Rostedt (VMware)
Borislav reported the following splat: ============================= WARNING: suspicious RCU usage 4.19.0-rc1+ #1 Not tainted ----------------------------- ./include/linux/rcupdate.h:631 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 2, debug_locks = 1 RCU used illegally from extended quiescent state! 1 lock held by swapper/0/0: #0: 000000004557ee0e (rcu_read_lock){....}, at: perf_event_output_forward+0x0/0x130 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc1+ #1 Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012 Call Trace: dump_stack+0x85/0xcb perf_event_output_forward+0xf6/0x130 __perf_event_overflow+0x52/0xe0 perf_swevent_overflow+0x91/0xb0 perf_tp_event+0x11a/0x350 ? find_held_lock+0x2d/0x90 ? __lock_acquire+0x2ce/0x1350 ? __lock_acquire+0x2ce/0x1350 ? retint_kernel+0x2d/0x2d ? find_held_lock+0x2d/0x90 ? tick_nohz_get_sleep_length+0x83/0xb0 ? perf_trace_cpu+0xbb/0xd0 ? perf_trace_buf_alloc+0x5a/0xa0 perf_trace_cpu+0xbb/0xd0 cpuidle_enter_state+0x185/0x340 do_idle+0x1eb/0x260 cpu_startup_entry+0x5f/0x70 start_kernel+0x49b/0x4a6 secondary_startup_64+0xa4/0xb0 This is due to the tracepoints moving to SRCU usage which does not require RCU to be "watching". But perf uses these tracepoints with RCU and expects it to be. Hence, we still need to add in the rcu_irq_enter/exit_irqson() calls for "rcuidle" tracepoints. This is a temporary fix until we have SRCU working in NMI context, and then perf can be converted to use that instead of normal RCU. Link: http://lkml.kernel.org/r/20180904162611.6a120068@gandalf.local.home Cc: x86-ml <x86@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov <bp@alien8.de> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Fixes: e6753f23d961d ("tracepoint: Make rcuidle tracepoint callers use SRCU") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-09-05nds32: linker script: GCOV kernel may refers data in __exitGreentime Hu
This patch is used to fix nds32 allmodconfig/allyesconfig build error because GCOV kernel embeds counters in the kernel for each line and a part of that embed in __exit text. So we need to keep the EXIT_TEXT and EXIT_DATA if CONFIG_GCOV_KERNEL=y. Link: https://lkml.org/lkml/2018/9/1/125 Signed-off-by: Greentime Hu <greentime@andestech.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
2018-09-04Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "17 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: nilfs2: convert to SPDX license tags drivers/dax/device.c: convert variable to vm_fault_t type lib/Kconfig.debug: fix three typos in help text checkpatch: add __ro_after_init to known $Attribute mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name memory_hotplug: fix kernel_panic on offline page processing checkpatch: add optional static const to blank line declarations test ipc/shm: properly return EIDRM in shm_lock() mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported. mm/util.c: improve kvfree() kerneldoc tools/vm/page-types.c: fix "defined but not used" warning tools/vm/slabinfo.c: fix sign-compare warning kmemleak: always register debugfs file mm: respect arch_dup_mmap() return value mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm(). mm: memcontrol: print proper OOM header when no eligible victim left
2018-09-04nilfs2: convert to SPDX license tagsRyusuke Konishi
Remove the verbose license text from NILFS2 files and replace them with SPDX tags. This does not change the license of any of the code. Link: http://lkml.kernel.org/r/1535624528-5982-1-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04drivers/dax/device.c: convert variable to vm_fault_t typeSouptick Joarder
As part of 226ab561075f ("device-dax: Convert to vmf_insert_mixed and vm_fault_t") in 4.19-rc1, 'rc' was not converted to vm_fault_t. Now converted. Link: http://lkml.kernel.org/r/20180830153813.GA26059@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04lib/Kconfig.debug: fix three typos in help textThibaut Sautereau
Fix three typos in CONFIG_WARN_ALL_UNSEEDED_RANDOM help text. Link: http://lkml.kernel.org/r/20180830194505.4778-1-thibaut@sautereau.fr Signed-off-by: Thibaut Sautereau <thibaut@sautereau.fr> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04checkpatch: add __ro_after_init to known $AttributeJoe Perches
__ro_after_init is a specific __attribute__ that checkpatch does currently not understand. Add it to the known $Attribute types so that code that uses variables declared with __ro_after_init are not thought to be a modifier type. This appears as a defect in checkpatch output of code like: static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU); [...] if (trust_cpu && arch_init) { where checkpatch reports: ERROR: space prohibited after that '&&' (ctx:WxW) if (trust_cpu && arch_init) { Link: http://lkml.kernel.org/r/0fa8a2cb83ade4c525e18261ecf6cfede3015983.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removalDave Jiang
It looks like I missed the PUD path when doing VM_MIXEDMAP removal. This can be triggered by: 1. Boot with memmap=4G!8G 2. build ndctl with destructive flag on 3. make TESTS=device-dax check [ +0.000675] kernel BUG at mm/huge_memory.c:824! Applying the same change that was applied to vmf_insert_pfn_pmd() in the original patch. Link: http://lkml.kernel.org/r/153565957352.35524.1005746906902065126.stgit@djiang5-desk3.ch.intel.com Fixes: e1fb4a08649 ("dax: remove VM_MIXEDMAP for fsdax and device dax") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member nameRandy Dunlap
Since this header is in "include/uapi/linux/", apparently people want to use it in userspace programs -- even in C++ ones. However, the header uses a C++ reserved keyword ("private"), so change that to "dh_private" instead to allow the header file to be used in C++ userspace. Fixes https://bugzilla.kernel.org/show_bug.cgi?id=191051 Link: http://lkml.kernel.org/r/0db6c314-1ef4-9bfa-1baa-7214dd2ee061@infradead.org Fixes: ddbb41148724 ("KEYS: Add KEYCTL_DH_COMPUTE command") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Mat Martineau <mathew.j.martineau@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04memory_hotplug: fix kernel_panic on offline page processingMikhail Zaslonko
Within show_valid_zones() the function test_pages_in_a_zone() should be called for online memory blocks only. Otherwise it might lead to the VM_BUG_ON due to uninitialized struct pages (when CONFIG_DEBUG_VM_PGFLAGS kernel option is set): page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) ------------[ cut here ]------------ Call Trace: ([<000000000038f91e>] test_pages_in_a_zone+0xe6/0x168) [<0000000000923472>] show_valid_zones+0x5a/0x1a8 [<0000000000900284>] dev_attr_show+0x3c/0x78 [<000000000046f6f0>] sysfs_kf_seq_show+0xd0/0x150 [<00000000003ef662>] seq_read+0x212/0x4b8 [<00000000003bf202>] __vfs_read+0x3a/0x178 [<00000000003bf3ca>] vfs_read+0x8a/0x148 [<00000000003bfa3a>] ksys_read+0x62/0xb8 [<0000000000bc2220>] system_call+0xdc/0x2d8 That VM_BUG_ON was triggered by the page poisoning introduced in mm/sparse.c with the git commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug"). With the same commit the new 'nid' field has been added to the struct memory_block in order to store and later on derive the node id for offline pages (instead of accessing struct page which might be uninitialized). But one reference to nid in show_valid_zones() function has been overlooked. Fixed with current commit. Also, nr_pages will not be used any more after test_pages_in_a_zone() call, do not update it. Link: http://lkml.kernel.org/r/20180828090539.41491-1-zaslonko@linux.ibm.com Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug") Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: <stable@vger.kernel.org> [4.17+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04checkpatch: add optional static const to blank line declarations testJoe Perches
Using a static const struct definition as part of a series of declarations produces a false positive "Missing a blank line after declarations" for code like: WARNING: Missing a blank line after declarations #710: FILE: drivers/gpu/drm/tidss/tidss_scale_coefs.c:137: + int inc; + static const struct { So fix it. Link: http://lkml.kernel.org/r/5905126e70b0ed1781e49265fd5c49c5090d0223.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Jyri Sarha <jsarha@ti.com> Cc: "Valkeinen, Tomi" <tomi.valkeinen@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04ipc/shm: properly return EIDRM in shm_lock()Davidlohr Bueso
When getting rid of the general ipc_lock(), this was missed furthermore, making the comment around the ipc object validity check bogus. Under EIDRM conditions, callers will in turn not see the error and continue with the operation. Link: http://lkml.kernel.org/r/20180824030920.GD3677@linux-r8p5 Link: http://lkml.kernel.org/r/20180823024051.GC13343@shao2-debian Fixes: 82061c57ce9 ("ipc: drop ipc_lock()") Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reported-by: kernel test robot <rong.a.chen@intel.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.Aneesh Kumar K.V
When scanning for movable pages, filter out Hugetlb pages if hugepage migration is not supported. Without this we hit infinte loop in __offline_pages() where we do pfn = scan_movable_pages(start_pfn, end_pfn); if (pfn) { /* We have movable pages */ ret = do_migrate_range(pfn, end_pfn); goto repeat; } Fix this by checking hugepage_migration_supported both in has_unmovable_pages which is the primary backoff mechanism for page offlining and for consistency reasons also into scan_movable_pages because it doesn't make any sense to return a pfn to non-migrateable huge page. This issue was revealed by, but not caused by 72b39cfc4d75 ("mm, memory_hotplug: do not fail offlining too early"). Link: http://lkml.kernel.org/r/20180824063314.21981-1-aneesh.kumar@linux.ibm.com Fixes: 72b39cfc4d75 ("mm, memory_hotplug: do not fail offlining too early") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Haren Myneni <haren@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04mm/util.c: improve kvfree() kerneldocAndrew Morton
Scooped from an email from Matthew. Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04tools/vm/page-types.c: fix "defined but not used" warningNaoya Horiguchi
debugfs_known_mountpoints[] is not used any more, so let's remove it. Link: http://lkml.kernel.org/r/1535102651-19418-1-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04tools/vm/slabinfo.c: fix sign-compare warningNaoya Horiguchi
Currently we get the following compiler warning: slabinfo.c:854:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (s->object_size < min_objsize) ^ due to the mismatch of signed/unsigned comparison. ->object_size and ->slab_size are never expected to be negative, so let's define them as unsigned int. [n-horiguchi@ah.jp.nec.com: convert everything - none of these can be negative] Link: http://lkml.kernel.org/r/20180826234947.GA9787@hori1.linux.bs1.fc.nec.co.jp Link: http://lkml.kernel.org/r/1535103134-20239-1-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04kmemleak: always register debugfs fileVincent Whitchurch
If kmemleak built in to the kernel, but is disabled by default, the debugfs file is never registered. Because of this, it is not possible to find out if the kernel is built with kmemleak support by checking for the presence of this file. To allow this, always register the file. After this patch, if the file doesn't exist, kmemleak is not available in the kernel. If writing "scan" or any other value than "clear" to this file results in EBUSY, then kmemleak is available but is disabled by default and can be activated via the kernel command line. Catalin: "that's also consistent with a late disabling of kmemleak when the debugfs entry sticks around." Link: http://lkml.kernel.org/r/20180824131220.19176-1-vincent.whitchurch@axis.com Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04mm: respect arch_dup_mmap() return valueNadav Amit
Commit d70f2a14b72a ("include/linux/sched/mm.h: uninline mmdrop_async(), etc") ignored the return value of arch_dup_mmap(). As a result, on x86, a failure to duplicate the LDT (e.g. due to memory allocation error) would leave the duplicated memory mapping in an inconsistent state. Fix by using the return value, as it was before the change. Link: http://lkml.kernel.org/r/20180823051229.211856-1-namit@vmware.com Fixes: d70f2a14b72a4 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc") Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm().Tetsuo Handa
Commit 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") has added an ability to skip over vmas with blockable mmu notifiers. This however didn't call tlb_finish_mmu as it should. As a result inc_tlb_flush_pending has been called without its pairing dec_tlb_flush_pending and all callers mm_tlb_flush_pending would flush even though this is not really needed. This alone is not harmful and it seems there shouldn't be any such callers for oom victims at all but there is no real reason to skip tlb_finish_mmu on early skip either so call it. [mhocko@suse.com: new changelog] Link: http://lkml.kernel.org/r/b752d1d5-81ad-7a35-2394-7870641be51c@i-love.sakura.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04mm: memcontrol: print proper OOM header when no eligible victim leftJohannes Weiner
When the memcg OOM killer runs out of killable tasks, it currently prints a WARN with no further OOM context. This has caused some user confusion. Warnings indicate a kernel problem. In a reported case, however, the situation was triggered by a nonsensical memcg configuration (hard limit set to 0). But without any VM context this wasn't obvious from the report, and it took some back and forth on the mailing list to identify what is actually a trivial issue. Handle this OOM condition like we handle it in the global OOM killer: dump the full OOM context and tell the user we ran out of tasks. This way the user can identify misconfigurations easily by themselves and rectify the problem - without having to go through the hassle of running into an obscure but unsettling warning, finding the appropriate kernel mailing list and waiting for a kernel developer to remote-analyze that the memcg configuration caused this. If users cannot make sense of why the OOM killer was triggered or why it failed, they will still report it to the mailing list, we know that from experience. So in case there is an actual kernel bug causing this, kernel developers will very likely hear about it. Link: http://lkml.kernel.org/r/20180821160406.22578-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Must perform TXQ teardown before unregistering interfaces in mac80211, from Toke Høiland-Jørgensen. 2) Don't allow creating mac80211_hwsim with less than one channel, from Johannes Berg. 3) Division by zero in cfg80211, fix from Johannes Berg. 4) Fix endian issue in tipc, from Haiqing Bai. 5) BPF sockmap use-after-free fixes from Daniel Borkmann. 6) Spectre-v1 in mac80211_hwsim, from Jinbum Park. 7) Missing rhashtable_walk_exit() in tipc, from Cong Wang. 8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when kvzalloc() tries to use kmalloc() pages. From Eric Dumazet. 9) Fix deadlock in hv_netvsc, from Dexuan Cui. 10) Do not restart timewait timer on RST, from Florian Westphal. 11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev. 12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu. 13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai. 14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu. 15) Use-after-free in act_ife, fix from Cong Wang. 16) Missing hold to meta module in act_ife, from Vlad Buslov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits) net: phy: sfp: Handle unimplemented hwmon limits and alarms net: sched: action_ife: take reference to meta module act_ife: fix a potential use-after-free net/mlx5: Fix SQ offset in QPs with small RQ tipc: correct spelling errors for tipc_topsrv_queue_evt() comments tipc: correct spelling errors for struct tipc_bc_base's comment bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA. bnxt_en: Clean up unused functions. bnxt_en: Fix firmware signaled resource change logic in open. sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel sctp: fix invalid reference to the index variable of the iterator net/ibm/emac: wrong emac_calc_base call was used by typo net: sched: null actions array pointer before releasing action vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition r8169: add support for NCube 8168 network card ip6_tunnel: respect ttl inherit for ip6tnl mac80211: shorten the IBSS debug messages mac80211: don't Tx a deauth frame if the AP forbade Tx mac80211: Fix station bandwidth setting after channel switch mac80211: fix a race between restart and CSA flows ...
2018-09-04net: phy: sfp: Handle unimplemented hwmon limits and alarmsAndrew Lunn
Not all SFPs implement the registers containing sensor limits and alarms. Luckily, there is a bit indicating if they are implemented or not. Add checking for this bit, when deciding if the hwmon attributes should be visible. Fixes: 1323061a018a ("net: phy: sfp: Add HWMON support for module sensors") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-04net: sched: action_ife: take reference to meta moduleVlad Buslov
Recent refactoring of add_metainfo() caused use_all_metadata() to add metainfo to ife action metalist without taking reference to module. This causes warning in module_put called from ife action cleanup function. Implement add_metainfo_and_get_ops() function that returns with reference to module taken if metainfo was added successfully, and call it from use_all_metadata(), instead of calling __add_metainfo() directly. Example warning: [ 646.344393] WARNING: CPU: 1 PID: 2278 at kernel/module.c:1139 module_put+0x1cb/0x230 [ 646.352437] Modules linked in: act_meta_skbtcindex act_meta_mark act_meta_skbprio act_ife ife veth nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc mlx5_ib ib_uverbs ib_core intel_rapl sb_edac x86_pkg_temp_thermal mlx5_core coretemp kvm_intel kvm nfsd igb irqbypass crct10dif_pclmul devlink crc32_pclmul mei_me joydev ses crc32c_intel enclosure auth_rpcgss i2c_algo_bit ioatdma ptp mei pps_core ghash_clmulni_intel iTCO_wdt iTCO_vendor_support pcspkr dca ipmi_ssif lpc_ich target_core_mod i2c_i801 ipmi_si ipmi_devintf pcc_cpufreq wmi ipmi_msghandler nfs_acl lockd acpi_pad acpi_power_meter grace sunrpc mpt3sas raid_class scsi_transport_sas [ 646.425631] CPU: 1 PID: 2278 Comm: tc Not tainted 4.19.0-rc1+ #799 [ 646.432187] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 646.440595] RIP: 0010:module_put+0x1cb/0x230 [ 646.445238] Code: f3 66 94 02 e8 26 ff fa ff 85 c0 74 11 0f b6 1d 51 30 94 02 80 fb 01 77 60 83 e3 01 74 13 65 ff 0d 3a 83 db 73 e9 2b ff ff ff <0f> 0b e9 00 ff ff ff e8 59 01 fb ff 85 c0 75 e4 48 c7 c2 20 62 6b [ 646.464997] RSP: 0018:ffff880354d37068 EFLAGS: 00010286 [ 646.470599] RAX: 0000000000000000 RBX: ffffffffc0a52518 RCX: ffffffff8c2668db [ 646.478118] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffc0a52518 [ 646.485641] RBP: ffffffffc0a52180 R08: fffffbfff814a4a4 R09: fffffbfff814a4a3 [ 646.493164] R10: ffffffffc0a5251b R11: fffffbfff814a4a4 R12: 1ffff1006a9a6e0d [ 646.500687] R13: 00000000ffffffff R14: ffff880362bab890 R15: dead000000000100 [ 646.508213] FS: 00007f4164c99800(0000) GS:ffff88036fe40000(0000) knlGS:0000000000000000 [ 646.516961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 646.523080] CR2: 00007f41638b8420 CR3: 0000000351df0004 CR4: 00000000001606e0 [ 646.530595] Call Trace: [ 646.533408] ? find_symbol_in_section+0x260/0x260 [ 646.538509] tcf_ife_cleanup+0x11b/0x200 [act_ife] [ 646.543695] tcf_action_cleanup+0x29/0xa0 [ 646.548078] __tcf_action_put+0x5a/0xb0 [ 646.552289] ? nla_put+0x65/0xe0 [ 646.555889] __tcf_idr_release+0x48/0x60 [ 646.560187] tcf_generic_walker+0x448/0x6b0 [ 646.564764] ? tcf_action_dump_1+0x450/0x450 [ 646.569411] ? __lock_is_held+0x84/0x110 [ 646.573720] ? tcf_ife_walker+0x10c/0x20f [act_ife] [ 646.578982] tca_action_gd+0x972/0xc40 [ 646.583129] ? tca_get_fill.constprop.17+0x250/0x250 [ 646.588471] ? mark_lock+0xcf/0x980 [ 646.592324] ? check_chain_key+0x140/0x1f0 [ 646.596832] ? debug_show_all_locks+0x240/0x240 [ 646.601839] ? memset+0x1f/0x40 [ 646.605350] ? nla_parse+0xca/0x1a0 [ 646.609217] tc_ctl_action+0x215/0x230 [ 646.613339] ? tcf_action_add+0x220/0x220 [ 646.617748] rtnetlink_rcv_msg+0x56a/0x6d0 [ 646.622227] ? rtnl_fdb_del+0x3f0/0x3f0 [ 646.626466] netlink_rcv_skb+0x18d/0x200 [ 646.630752] ? rtnl_fdb_del+0x3f0/0x3f0 [ 646.634959] ? netlink_ack+0x500/0x500 [ 646.639106] netlink_unicast+0x2d0/0x370 [ 646.643409] ? netlink_attachskb+0x340/0x340 [ 646.648050] ? _copy_from_iter_full+0xe9/0x3e0 [ 646.652870] ? import_iovec+0x11e/0x1c0 [ 646.657083] netlink_sendmsg+0x3b9/0x6a0 [ 646.661388] ? netlink_unicast+0x370/0x370 [ 646.665877] ? netlink_unicast+0x370/0x370 [ 646.670351] sock_sendmsg+0x6b/0x80 [ 646.674212] ___sys_sendmsg+0x4a1/0x520 [ 646.678443] ? copy_msghdr_from_user+0x210/0x210 [ 646.683463] ? lock_downgrade+0x320/0x320 [ 646.687849] ? debug_show_all_locks+0x240/0x240 [ 646.692760] ? do_raw_spin_unlock+0xa2/0x130 [ 646.697418] ? _raw_spin_unlock+0x24/0x30 [ 646.701798] ? __handle_mm_fault+0x1819/0x1c10 [ 646.706619] ? __pmd_alloc+0x320/0x320 [ 646.710738] ? debug_show_all_locks+0x240/0x240 [ 646.715649] ? restore_nameidata+0x7b/0xa0 [ 646.720117] ? check_chain_key+0x140/0x1f0 [ 646.724590] ? check_chain_key+0x140/0x1f0 [ 646.729070] ? __fget_light+0xbc/0xd0 [ 646.733121] ? __sys_sendmsg+0xd7/0x150 [ 646.737329] __sys_sendmsg+0xd7/0x150 [ 646.741359] ? __ia32_sys_shutdown+0x30/0x30 [ 646.746003] ? up_read+0x53/0x90 [ 646.749601] ? __do_page_fault+0x484/0x780 [ 646.754105] ? do_syscall_64+0x1e/0x2c0 [ 646.758320] do_syscall_64+0x72/0x2c0 [ 646.762353] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 646.767776] RIP: 0033:0x7f4163872150 [ 646.771713] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24 [ 646.791474] RSP: 002b:00007ffdef7d6b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 646.799721] RAX: ffffffffffffffda RBX: 0000000000000024 RCX: 00007f4163872150 [ 646.807240] RDX: 0000000000000000 RSI: 00007ffdef7d6bd0 RDI: 0000000000000003 [ 646.814760] RBP: 000000005b8b9482 R08: 0000000000000001 R09: 0000000000000000 [ 646.822286] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdef7dad20 [ 646.829807] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000679bc0 [ 646.837360] irq event stamp: 6083 [ 646.841043] hardirqs last enabled at (6081): [<ffffffff8c220a7d>] __call_rcu+0x17d/0x500 [ 646.849882] hardirqs last disabled at (6083): [<ffffffff8c004f06>] trace_hardirqs_off_thunk+0x1a/0x1c [ 646.859775] softirqs last enabled at (5968): [<ffffffff8d4004a1>] __do_softirq+0x4a1/0x6ee [ 646.868784] softirqs last disabled at (6082): [<ffffffffc0a78759>] tcf_ife_cleanup+0x39/0x200 [act_ife] [ 646.878845] ---[ end trace b1b8c12ffe51e657 ]--- Fixes: 5ffe57da29b3 ("act_ife: fix a potential deadlock") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>