2018-08-13  selftests/bpf: Add cgroup id helpers to bpf_helpers.h  (Andrey Ignatov)

Add bpf_skb_cgroup_id and bpf_skb_ancestor_cgroup_id helpers to bpf_helpers.h to use them in tests and samples.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-13  bpf: Sync bpf.h to tools/  (Andrey Ignatov)

Sync skb_ancestor_cgroup_id() related bpf UAPI changes to tools/.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-13  bpf: Introduce bpf_skb_ancestor_cgroup_id helper  (Andrey Ignatov)

== Problem description ==

It's useful to be able to identify the cgroup associated with an skb in TC so that a policy can be applied to that skb, and the existing bpf_skb_cgroup_id helper can help with this. In real life, though, the cgroup hierarchy and the hierarchy a policy applies to don't map 1:1.

It's often the case that there is a container and a corresponding cgroup, but there are many more sub-cgroups inside the container, e.g. because control over resources for its subsystems is delegated to the containerized application, or to separate the application inside the container from infrastructure that belongs to the containerization system (e.g. sshd). At the same time it may be useful to apply a policy to the container as a whole.

If multiple containers like this run on a host (which is often the case) and many of them have sub-cgroups, it may not be possible to apply a per-container policy in TC with existing helpers such as bpf_skb_under_cgroup or bpf_skb_cgroup_id:

* bpf_skb_cgroup_id returns the id of the immediate cgroup associated with the skb, i.e. if that is a sub-cgroup inside a container, it can't be used to identify the container's cgroup;

* bpf_skb_under_cgroup can work with only one cgroup and doesn't scale, i.e. if there are N containers on a host and a policy has to be applied to M of them (0 <= M <= N), it would require M calls to bpf_skb_under_cgroup, and if M changes, a new BPF program would have to be rebuilt and loaded.

== Solution ==

The patch introduces a new helper, bpf_skb_ancestor_cgroup_id, that returns the id of the cgroup v2 ancestor, at a specified level of the cgroup hierarchy, of the cgroup associated with the skb.

That way an admin can place all containers on one level of the cgroup hierarchy (which is good practice in general and already used in many configurations) and identify the specific cgroup on that level no matter what sub-cgroup the skb is associated with.

E.g. given the cgroup hierarchy:

  root/
  root/container1/
  root/container1/app11/
  root/container1/app11/sub-app-a/
  root/container1/app12/
  root/container2/
  root/container2/app21/
  root/container2/app22/
  root/container2/app22/sub-app-b/

an skb associated with root/container1/app11/sub-app-a/ can be resolved to its ancestor at level 1, which is container1, and the policy for that container can be applied, or another policy if it's container2. Policies can be kept e.g. in a hash map where the key is a container cgroup id and the value is an action.

Levels where container cgroups are created are usually known in advance, whereas the cgroup hierarchy inside a container may be hard to predict, especially when its creation is delegated to the containerized application.

== Implementation details ==

The helper gets the ancestor by walking parents up to the specified level. Another option would be to get a different kind of "id" from cgroup->ancestor_ids[level] and use it with idr_find() to get the struct cgroup of the ancestor, but that would require a radix lookup, which doesn't seem better (at least it's not obviously better).

The format of the new helper's return value is the same as that of bpf_skb_cgroup_id.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
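As a rough illustration, a TC BPF program using the new helper could look like the sketch below. The map name, section names, map size, and verdict values are illustrative and not part of this patch; only bpf_skb_ancestor_cgroup_id() and bpf_map_lookup_elem() are taken from the series itself.

  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include "bpf_helpers.h"

  /* Hypothetical policy map: container cgroup id -> TC verdict. */
  struct bpf_map_def SEC("maps") container_policy = {
  	.type        = BPF_MAP_TYPE_HASH,
  	.key_size    = sizeof(__u64),
  	.value_size  = sizeof(__u32),
  	.max_entries = 1024,
  };

  SEC("classifier")
  int container_filter(struct __sk_buff *skb)
  {
  	/* Level 1 is where the per-container cgroups sit in the example
  	 * hierarchy above (root/container1/, root/container2/, ...). */
  	__u64 cgid = bpf_skb_ancestor_cgroup_id(skb, 1);
  	__u32 *verdict = bpf_map_lookup_elem(&container_policy, &cgid);

  	if (verdict)
  		return *verdict;	/* e.g. TC_ACT_SHOT to drop */
  	return TC_ACT_OK;
  }

  char _license[] SEC("license") = "GPL";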
2018-08-13  bpf: decouple btf from seq bpf fs dump and enable more maps  (Daniel Borkmann)

Commit a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap") and 699c86d6ec21 ("bpf: btf: add pretty print for hash/lru_hash maps") enabled support for BTF and dumping via BPF fs for array and hash/lru maps. However, both can be decoupled from each other such that regular BPF maps can be supported for attaching BTF key/value information, while not all maps necessarily need to dump via the map_seq_show_elem() callback.

The basic sanity check which is a prerequisite for all maps is that key/value size has to match in any case, and some maps can have extra checks via the map_check_btf() callback, e.g. probing certain types or indicating no support in general. With that we can also enable retrieving BTF info for per-cpu map types and lpm.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
2018-08-12  Linux 4.18  (Linus Torvalds)
2018-08-12  Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi  (Linus Torvalds)

Pull SCSI fixes from James Bottomley:
 "Eight fixes. The most important one is the mpt3sas fix which makes the driver work again on big endian systems. The rest are mostly minor error path or checker issues and the vmw_scsi one fixes a performance problem"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: vmw_pvscsi: Return DID_RESET for status SAM_STAT_COMMAND_TERMINATED
  scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
  scsi: mpt3sas: Swap I/O memory read value back to cpu endianness
  scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO
  scsi: fcoe: drop frames in ELS LOGO error path
  scsi: fcoe: fix use-after-free in fcoe_ctlr_els_send
  scsi: qedi: Fix a potential buffer overflow
  scsi: qla2xxx: Fix memory leak for allocating abort IOCB
2018-08-12  init: rename and re-order boot_cpu_state_init()  (Linus Torvalds)

This is purely a preparatory patch for upcoming changes during the 4.19 merge window.

We have a function called "boot_cpu_state_init()" that isn't really about the bootup cpu state: that is done much earlier by the similarly named "boot_cpu_init()" (note the lack of "state" in the name).

This function initializes some hotplug CPU state, and needs to run after the percpu data has been properly initialized. It even has a comment to that effect.

Except it _doesn't_ actually run after the percpu data has been properly initialized. On x86 it happens to do that, but on at least arm and arm64, the percpu base pointers are initialized by the arch-specific 'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().

This had some unexpected results, and in particular we have a patch pending for the merge window that did the obvious cleanup of using 'this_cpu_write()' in the cpu hotplug init code:

  -   per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
  +   this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);

which is obviously the right thing to do. Except because of the ordering issue, it actually failed miserably and unexpectedly on arm64.

So this just fixes the ordering, and changes the name of the function to 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu hotplug state, because the core CPU state was supposed to have already been done earlier.

Marked for stable, since the (not yet merged) patch that will show this problem is marked for stable.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-12  Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  (Linus Torvalds)

Pull vfs fixes from Al Viro:
 "A bunch of race fixes, mostly around lazy pathwalk. All of it is -stable fodder, a large part going back to 2013"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make sure that __dentry_kill() always invalidates d_seq, unhashed or not
  fix __legitimize_mnt()/mntput() race
  fix mntput/mntput race
  root dentries need RCU-delayed freeing
2018-08-12  tty: serial: 8250: Revert NXP SC16C2552 workaround  (Mark Hounschell)

Revert commit ecb988a3b7985913d1f0112f66667cdd15e40711 ("tty: serial: 8250: 8250_core: NXP SC16C2552 workaround").

The above commit causes a userland application's first write to a dumb terminal connected to /dev/ttyS0 to no longer be output correctly. This commit seems to be the culprit; it's as though the TX FIFO is being reset during that write.

What should be displayed is:

  PSW 80000000 INST 00000000 HALT //

What is displayed is some variation of:

  T 00000000 HAL//

Reverting this commit via this patch fixes my problem.

Signed-off-by: Mark Hounschell <dmarkh@cfl.rr.com>
Fixes: ecb988a3b798 ("tty: serial: 8250: 8250_core: NXP SC16C2552 workaround")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-12  xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree  (Shan Hai)

A fuzzing tool reports a write-to-NULL-pointer error in xfs_bmap_extents_to_btree(); fix it by bailing out on encountering a NULL pointer.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-12  xfs: remove b_last_holder & associated macros  (Eric Sandeen)

The old lock tracking infrastructure in xfs using the b_last_holder field seems to only be useful if you can get into the system with a debugger; the existing tracepoints would be the way to go these days, and this old infrastructure can be removed.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-12  iomap: Switch to offset_in_page for clarity  (Andreas Gruenbacher)

Instead of open-coding pos & (PAGE_SIZE - 1) and pos & ~PAGE_MASK, use the offset_in_page macro.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
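For illustration (the helper below is hypothetical, not a hunk from the patch), all three spellings compute the same in-page offset; offset_in_page() simply names the intent:

  #include <linux/mm.h>	/* PAGE_SIZE, PAGE_MASK, offset_in_page() */

  /* Hypothetical helper showing the equivalent spellings. */
  static unsigned int example_offset_in_page(loff_t pos)
  {
  	unsigned int open_coded1 = pos & (PAGE_SIZE - 1);
  	unsigned int open_coded2 = pos & ~PAGE_MASK;

  	(void)open_coded1;
  	(void)open_coded2;
  	/* Expands to the same mask, but reads as what it means. */
  	return offset_in_page(pos);
  }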
2018-08-12  xfs: Close race between direct IO and xfs_break_layouts()  (Dave Jiang)

This patch is the xfs counterpart of Ross's fix for ext4.

If the refcount of a page is lowered between the time that it is returned by dax_busy_page() and when the refcount is again checked in xfs_break_layouts() => ___wait_var_event(), the waiting function xfs_wait_dax_page() will never be called. This means that xfs_break_layouts() will still have 'retry' set to false, so we'll stop looping and never check the refcount of other pages in this inode.

Instead, always continue looping as long as dax_layout_busy_page() gives us a page which it found with an elevated refcount.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-12  Revert "uio: use request_threaded_irq instead"  (Xiubo Li)

The mutex lock in the IRQ handler is currently useless, so remove it together with the threaded-IRQ conversion that introduced it.

This reverts commit 9421e45f5ff3d558cf8b75a8cc0824530caf3453.

Reported-by: james.r.harris@intel.com
CC: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-12  KVM: arm: Use true and false for boolean values  (Gustavo A. R. Silva)

Return statements in functions returning bool should use true or false instead of an integer value. This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
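A generic before/after sketch of the pattern (the functions below are hypothetical and not taken from the patch):

  #include <linux/types.h>

  /* Before: integer literals in a function declared to return bool. */
  static bool example_bit_set_old(u32 flags, u32 bit)
  {
  	if (flags & bit)
  		return 1;
  	return 0;
  }

  /* After: boolean literals match the declared return type. */
  static bool example_bit_set_new(u32 flags, u32 bit)
  {
  	if (flags & bit)
  		return true;
  	return false;
  }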
2018-08-12  KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabled  (Jia He)

kvm_vgic_sync_hwstate is only called with IRQs disabled. There is thus no need to call spin_lock_irqsave/restore in vgic_fold_lr_state and vgic_prune_ap_list.

This patch replaces them with the non-irq-safe versions.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
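Schematically the change looks like this (a simplified sketch with the surrounding code omitted; since the callers already run with IRQs disabled, the plain lock variant suffices):

  /* Before: irq-save variant, even though IRQs are already off. */
  	spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
  	/* ... fold LR state / prune the ap_list ... */
  	spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);

  /* After: the plain variant is sufficient and cheaper here. */
  	spin_lock(&vgic_cpu->ap_list_lock);
  	/* ... fold LR state / prune the ap_list ... */
  	spin_unlock(&vgic_cpu->ap_list_lock);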
2018-08-12  KVM: arm/arm64: vgic: Move DEBUG_SPINLOCK_BUG_ON to vgic.h  (Jia He)

DEBUG_SPINLOCK_BUG_ON can be used with both vgic-v2 and vgic-v3, so let's move it to vgic.h.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12  KVM: arm: vgic-v3: Add support for ICC_SGI0R and ICC_ASGI1R accesses  (Marc Zyngier)

In order to generate Group0 SGIs, let's add some decoding logic to access_gic_sgi(), and pass the generating group accordingly.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12  KVM: arm64: vgic-v3: Add support for ICC_SGI0R_EL1 and ICC_ASGI1R_EL1 accesses  (Marc Zyngier)

In order to generate Group0 SGIs, let's add some decoding logic to access_gic_sgi(), and pass the generating group accordingly.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12  KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs  (Marc Zyngier)

Although vgic-v3 now supports Group0 interrupts, it still doesn't deal with Group0 SGIs. As usual with the GIC, nothing is simple:

- ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1 with KVM (as per 8.1.10, Non-secure EL1 access)

- ICC_SGI0R can only generate Group0 SGIs

- ICC_ASGI1R sees its scope refocussed to generate only Group0 SGIs (as per the note at the bottom of Table 8-14)

We only support Group1 SGIs so far, so there is no material change.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12  KVM: arm64: Remove non-existent AArch32 ICC_SGI1R encoding  (Marc Zyngier)

ICC_SGI1R is a 64bit system register, even on AArch32. It is thus pointless to have such an encoding in the 32bit cp15 array. Let's drop it.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12  Merge branch 'for-next' into for-linus  (Takashi Iwai)

Preparation for 4.19 merge material.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-08-11  Merge branch 'ip-faster-in-order-IP-fragments'  (David S. Miller)

Peter Oskolkov says:

====================
ip: faster in-order IP fragments

Added "Signed-off-by" in v2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11  ip: process in-order fragments efficiently  (Peter Oskolkov)

This patch changes the runtime behavior of the IP defrag queue: incoming in-order fragments are appended to the current list/"run" of in-order fragments at the tail.

On some workloads, UDP stream performance is substantially improved:

  RX: ./udp_stream -F 10 -T 2 -l 60
  TX: ./udp_stream -c -H <host> -F 10 -T 5 -l 60

with this patchset applied on a 10Gbps receiver:

  throughput=9524.18
  throughput_units=Mbit/s

upstream (net-next):

  throughput=4608.93
  throughput_units=Mbit/s

Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11  ip: add helpers to process in-order fragments faster  (Peter Oskolkov)

This patch introduces several helper functions/macros that will be used in the follow-up patch. No runtime changes yet.

The new logic (fully implemented in the second patch) is as follows:

* Nodes in the rb-tree will now contain not single fragments, but lists of consecutive fragments ("runs").

* At each point in time, the current "active" run at the tail is maintained/tracked. Fragments that arrive in order, adjacent to the previous tail fragment, are added to this tail run without triggering a re-balance of the rb-tree.

* If a fragment arrives out of order, with its offset _before_ the tail run, it is inserted into the rb-tree as a single fragment.

* If a fragment arrives after the current tail fragment (with a gap), it starts a new "tail" run and is inserted into the rb-tree at the end as the head of the new run.

skb->cb is used to store the additional information needed here (suggested by Eric Dumazet).

Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
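A rough pseudocode sketch of the insertion decision described above; the helper and field names here are illustrative, not the actual helpers introduced by this patch:

  /* Sketch of where an arriving fragment goes. */
  if (!q->fragments_tail || fragment_offset == tail_run_end_offset) {
  	/* In order: append to the current tail run; no rb-tree
  	 * insertion or rebalancing is needed. */
  	append_to_tail_run(q, skb);
  } else if (fragment_offset < tail_run_start_offset) {
  	/* Out of order, before the tail run: insert as a
  	 * single-fragment node in the rb-tree. */
  	rbtree_insert_single(q, skb);
  } else {
  	/* Arrives after the tail with a gap: becomes the head of a
  	 * new tail run, inserted at the right edge of the rb-tree. */
  	rbtree_insert_new_run(q, skb);
  }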
2018-08-11  Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
2018-08-11  bcache: add the missing comments for smp_mb()/smp_wmb()  (Coly Li)

Checkpatch.pl warns that there are 2 locations with smp_mb() or smp_wmb() calls lacking a code comment. This patch adds the missing code comments for these memory barrier calls.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: remove unnecessary space before ioctl function pointer arguments  (Coly Li)

Checkpatch.pl warns about the extra space before the ioctl function pointer arguments; this patch removes it.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: add missing SPDX header  (Coly Li)

The SPDX header is missing from closure.c, super.c and util.c; this patch adds the SPDX header for GPL-2.0 to these files.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: move open brace at end of function definitions to next line  (Coly Li)

Placing the open brace '{' at the end of a function definition line is not the preferred style, and checkpatch.pl reports an error for it. This patch moves the braces to the start of the next line.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: add static const prefix to char * array declarations  (Coly Li)

This patch declares the char * arrays in sysfs.c with the const prefix, as suggested by checkpatch.pl.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: fix code comments style  (Coly Li)

This patch fixes 3 style issues warned by checkpatch.pl:
- Comment lines are not aligned
- Comments use "/*" on subsequent lines
- Comment lines use a trailing "*/"

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: do not check NULL pointer before calling kmem_cache_destroy  (Coly Li)

kmem_cache_destroy() is safe with a NULL pointer as input, so the NULL pointer check is unnecessary. This patch just removes the NULL pointer check to make the code simpler.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
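The pattern being removed, with a hypothetical cache pointer name for illustration:

  /* Before: redundant NULL check around a NULL-safe destructor. */
  if (bch_example_cache)
  	kmem_cache_destroy(bch_example_cache);

  /* After: kmem_cache_destroy() already returns early for NULL. */
  kmem_cache_destroy(bch_example_cache);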
2018-08-11  bcache: prefer 'help' in Kconfig  (Coly Li)

The current bcache Kconfig uses '---help---' to introduce help text; nowadays 'help' is preferred. This patch fixes the style by replacing '---help---' with 'help' in the bcache Kconfig file.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: fix typo 'succesfully' to 'successfully'  (Coly Li)

This patch fixes the typo 'succesfully' to the correct 'successfully', as suggested by checkpatch.pl.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: replace '%pF' by '%pS' in seq_printf()  (Coly Li)

'%pF' and '%pf' are deprecated vsprintf pointer extensions; this patch replaces them with '%pS', as suggested by checkpatch.pl.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: fix indent by replacing blank by tabs  (Coly Li)

bch_btree_insert_check_key() has unaligned indentation, or indentation using spaces. This patch aligns the indentation and replaces the spaces with tabs.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: replace printk() by pr_*() routines  (Coly Li)

There are still many places in bcache that use printk() to display kernel messages; these should be replaced with pr_*() routines such as pr_err(), pr_info(), or pr_notice(). This patch replaces all printk() calls with the proper pr_*() routine in the bcache code.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
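A representative before/after (the message text is illustrative, not quoted from the patch):

  /* Before: explicit severity prefix in printk(). */
  printk(KERN_ERR "bcache: couldn't register device\n");

  /* After: pr_err() carries the severity (and any pr_fmt prefix). */
  pr_err("couldn't register device\n");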
2018-08-11  bcache: replace symbolic permissions by octal permission numbers  (Coly Li)

Symbolic permission names are used in bcache; octal permission numbers are now encouraged for readability. This patch replaces all symbolic permissions with octal permission numbers.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
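A generic example of the substitution (the parameter name is hypothetical; S_IRUGO | S_IWUSR corresponds to 0644):

  /* Before: symbolic permission macros. */
  module_param(verbose, int, S_IRUGO | S_IWUSR);

  /* After: the octal form is easier to read at a glance. */
  module_param(verbose, int, 0644);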
2018-08-11  bcache: style fixes for lines over 80 characters  (Coly Li)

This patch splits lines over 80 characters into multiple lines, to minimize warnings from checkpatch.pl. A few lines still exceed 80 characters, but they are better kept as single lines, so I don't change them.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: add identifier names to arguments of function definitions  (Coly Li)

Many function definitions do not have identifier names for their arguments; scripts/checkpatch.pl complains with warnings like this:

  WARNING: function definition argument 'struct bcache_device *' should also have an identifier name
  #16735: FILE: writeback.h:120:
  +void bch_sectors_dirty_init(struct bcache_device *);

This patch adds identifier argument names to all bcache function definitions to fix such warnings.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
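Using the declaration quoted in the warning above, the fix looks like this (the chosen identifier name is illustrative):

  /* Before: no identifier name for the argument. */
  void bch_sectors_dirty_init(struct bcache_device *);

  /* After: the name documents what the pointer refers to. */
  void bch_sectors_dirty_init(struct bcache_device *d);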
2018-08-11  bcache: style fix to add a blank line after declarations  (Coly Li)

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  bcache: style fix to replace 'unsigned' by 'unsigned int'  (Coly Li)

This patch fixes warnings reported by checkpatch.pl by replacing 'unsigned' with 'unsigned int'.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  blkcg: Make blkg_root_lookup() work for queues in bypass mode  (Bart Van Assche)

For legacy queues, the only call of blkg_root_lookup() happens after bypass mode has been enabled. Since blkg_lookup() returns NULL for queues in bypass mode, modify blkg_root_lookup() such that it no longer depends on bypass mode. Rename the function to blk_queue_root_blkg() as suggested by Tejun.

Suggested-by: Tejun Heo <tj@kernel.org>
Fixes: 6bad9b210a22 ("blkcg: Introduce blkg_root_lookup()")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11  Merge branch 'Remove-rtnl-lock-dependency-from-all-action-implementations'  (David S. Miller)

Vlad Buslov says:

====================
Remove rtnl lock dependency from all action implementations

Currently, all netlink protocol handlers for updating rules, actions and qdiscs are protected with a single global rtnl lock, which removes any possibility of parallelism. This patch set is the second step towards removing the rtnl lock dependency from the TC rules update path.

Recently, a new rtnl registration flag, RTNL_FLAG_DOIT_UNLOCKED, was added. Handlers registered with this flag are called without RTNL taken. The end goal is to have the rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) registered with the UNLOCKED flag to allow parallel execution. However, there is no intention to completely remove or split the rtnl lock itself. This patch set addresses specific problems in the implementation of tc actions that prevent their control path from being executed concurrently. Additional changes are required to refactor the classifiers API and individual classifiers for parallel execution. This patch set lays the groundwork to eventually register rule update handlers as rtnl-unlocked.

The action API is already prepared for parallel execution by the previous patch set, which means that action ops that use the action API for their implementation (delete, search, etc.) do not require additional modifications. The action API implements concurrency-safe reference counting and guarantees that cleanup/delete is called only once, after the last reference to the action is released.

The goal of this change is to update the APIs of specific actions that access action private state directly, in order to make them independent from external locking. The general approach is to re-use the existing tcf_lock spinlock (used by some action implementations to synchronize the control path with the data path) to protect action private state from concurrent modification. If an action has an rcu-protected pointer, the tcf spinlock is used to protect its update code, instead of relying on the rtnl lock.

Some actions need to determine the rtnl mutex status in order to release it. For example, the ife action can load additional kernel modules (meta ops) and must make sure that no locks are held during module load. In such cases an 'rtnl_held' argument is used to conditionally release the rtnl mutex.

Changes from V1 to V2:
- Patch 12:
  - new patch
- Patch 14:
  - refactor gen_new_estimator() to reuse stats_lock when re-assigning rate estimator statistics pointer
- Remove mirred and tunnel_key helper function changes. (to be submitted as a standalone patch)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11  net: sched: act_police: remove dependency on rtnl lock  (Vlad Buslov)

Use tcf spinlock to protect police action private data from concurrent modification during dump. (init already uses tcf spinlock when changing police action state)

Pass tcf spinlock as estimator lock argument to gen_replace_estimator() during action init.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11  net: core: protect rate estimator statistics pointer with lock  (Vlad Buslov)

Extend gen_new_estimator() to also take stats_lock when re-assigning the rate estimator statistics pointer. (to be used by unlocked actions)

Rename 'stats_lock' to 'lock' and change the argument description to explain that it is now also used for the control path.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11  net: sched: act_mirred: remove dependency on rtnl lock  (Vlad Buslov)

Re-introduce the mirred list spinlock, which was removed some time ago, in order to protect the list from concurrent modifications instead of relying on the rtnl lock.

Use the tcf spinlock to protect mirred action private data from concurrent modification in init and dump. Rearrange access to mirred data so that it is performed only while holding the lock.

Rearrange net dev access to always hold a reference while working with it, instead of relying on the rtnl lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11  net: sched: extend action ops with put_dev callback  (Vlad Buslov)

As a preparation for removing dependency on rtnl lock from rules update path, all users of shared objects must take a reference while working with them.

Extend action ops with put_dev() API to be used on the net device returned by get_dev().

Modify mirred action (the only action that implements the get_dev callback):
- Take reference to net device in get_dev.
- Implement put_dev API that releases reference to net device.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
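Roughly, the ops table gains a release hook symmetric to get_dev(); a sketch of the shape rather than the exact hunk (callback placement and caller code are assumptions):

  /* Sketch: tc_action_ops extended with put_dev() to balance get_dev(). */
  struct tc_action_ops {
  	/* ... existing fields and callbacks ... */
  	struct net_device *(*get_dev)(const struct tc_action *a);
  	void (*put_dev)(struct net_device *dev);
  };

  /* A caller would then pair the two, e.g.: */
  	dev = a->ops->get_dev(a);	/* takes a reference on dev */
  	if (dev) {
  		/* ... use dev without holding rtnl ... */
  		a->ops->put_dev(dev);	/* releases the reference */
  	}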
2018-08-11  net: sched: act_vlan: remove dependency on rtnl lock  (Vlad Buslov)

Use tcf spinlock to protect vlan action private data from concurrent modification during dump and init. Use rcu swap operation to reassign the params pointer under protection of tcf lock. (old params value is not used by init, so there is no need for a standalone rcu dereference step)

Remove rtnl assertion that is no longer necessary.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
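The resulting update pattern in init is approximately as follows; a sketch under the assumption that the vlan action keeps its params behind an RCU pointer and that the old params carry an rcu head, not the exact hunk:

  /* Sketch: publish the new params under the per-action tcf spinlock,
   * then free the old params after an RCU grace period. */
  	spin_lock_bh(&v->tcf_lock);
  	v->tcf_action = parm->action;
  	rcu_swap_protected(v->vlan_p, p,
  			   lockdep_is_held(&v->tcf_lock));
  	spin_unlock_bh(&v->tcf_lock);

  	if (p)
  		kfree_rcu(p, rcu);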