summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-03-05rbd: copyup with an empty snapshot context (aka deep-copyup)Ilya Dryomov
This is the core of deep-flatten feature: sending a copyup request (i.e. a guarded write of the data read from the parent) with an empty snapshot context (snaps = [], seq = 0) causes the OSD to reflect the write in all existing snapshots. This allows "rbd flatten" to fully disconnect the clone image and its snapshots from the parent and make the parent snapshot removable. The actual modification request is sent only after deep-copyup request is completed. Waiting for deep-copyup reply is unnecessary, this will be improved in the future. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: introduce rbd_obj_issue_copyup_ops()Ilya Dryomov
In preparation for deep-flatten feature, split rbd_obj_issue_copyup() into two functions and add a new write state to make the state machine slightly more clear. Make the copyup op optional and start using that for when the overlap goes to 0. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: stop copying num_osd_ops in rbd_obj_issue_copyup()Ilya Dryomov
In preparation for deep-flatten feature, stop copying num_osd_ops from the original request in rbd_obj_issue_copyup(). Split the calculation into count_{write,zeroout}_ops() respectively and determine whether the assert_exists guard is needed with the new rbd_obj_copyup_enabled(). As a nice side effect, we no longer guard in the writefull case as the copyup'ed object is always fully overwritten. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: factor out __rbd_osd_req_create()Ilya Dryomov
Allow passing a custom snapshot context: NULL for read and an empty snapshot context for deep-copyup. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: clear ->xferred on error from rbd_obj_issue_copyup()Ilya Dryomov
Otherwise the assert in rbd_obj_end_request() is triggered. Fixes: 3da691bf4366 ("rbd: new request handling code") Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: remove experimental designation from kernel layeringIlya Dryomov
Support for kernel layering hasn't been considered experimental for a few years now. All the issues that I'm aware of were shaken out in 2014 and early 2015. Moreover, most of that code was rewritten with the addition of support for fancy striping. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-05ceph: add mount option to limit caps countYan, Zheng
If number of caps exceed the limit, ceph_trim_dentires() also trim dentries with valid leases. Trimming dentry releases references to associated inode, which may evict inode and release caps. By default, there is no limit for caps count. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: periodically trim stale dentriesYan, Zheng
Previous commit make VFS delete stale dentry when last reference is dropped. Lease also can become invalid when corresponding dentry has no reference. This patch make cephfs periodically scan lease list, delete corresponding dentry if lease is invalid. There are two types of lease, dentry lease and dir lease. dentry lease has life time and applies to singe dentry. Dentry lease is added to tail of a list when it's updated, leases at front of the list will expire first. Dir lease is CEPH_CAP_FILE_SHARED on directory inode, it applies to all dentries in the directory. Dentries have dir leases are added to another list. Dentries in the list are periodically checked in a round robin manner. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: delete stale dentry when last reference is droppedYan, Zheng
introduce ceph_d_delete(), which checks if dentry has valid lease. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: remove dentry_lru file from debugfsYan, Zheng
The file shows all dentries in cephfs mount. It's not very useful. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: touch existing cap when handling replyYan, Zheng
Move cap to tail of session->s_caps list. So ceph_trim_caps() will trim older caps first. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: pass inclusive lend parameter to filemap_write_and_wait_range()zhengbin
The 'lend' parameter of filemap_write_and_wait_range is required to be inclusive, so follow the rule. Same for invalidate_inode_pages2_range. Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: round off and ignore discards that are too smallIlya Dryomov
If, after rounding off, the discard request is smaller than alloc_size, drop it on the floor in __rbd_img_fill_request(). Default alloc_size to 64k. This should cover both HDD and SSD based bluestore OSDs and somewhat improve things for filestore. For OSDs on filestore with filestore_punch_hole = false, alloc_size is best set to object size in order to allow deletes and truncates and disallow zero op. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-05rbd: handle DISCARD and WRITE_ZEROES separatelyIlya Dryomov
With discard_zeroes_data gone in commit 48920ff2a5a9 ("block: remove the discard_zeroes_data flag"), continuing to provide this guarantee is pointless: applications can't query it and discards can only be used for deallocating. Add OBJ_OP_ZEROOUT and move the existing logic under it. As the first step to divorcing OBJ_OP_DISCARD, stop worrying about copyups but keep special casing whole-object layered discards. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-05rbd: get rid of obj_req->obj_request_countIlya Dryomov
It is effectively unused. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-05libceph: use struct_size() for kmalloc() in crush_decode()Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kmalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: send cap releases more aggressivelyYan, Zheng
When pending cap releases fill up one message, start a work to send cap release message. (old way is sending cap releases every 5 seconds) Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: support getting ceph.dir.pin vxattrYan, Zheng
Link: http://tracker.ceph.com/issues/37576 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: support versioned replyYan, Zheng
In versioned reply, inodestat, dirstat and lease are encoded with version, compat_version and struct_len. Based on a patch from Jos Collin <jcollin@redhat.com>. Link: http://tracker.ceph.com/issues/26936 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: map snapid to anonymous bdev IDYan, Zheng
ceph_getattr() return zero dev ID for head inodes and set dev ID to snapid directly for snaphost inodes. This is not good because userspace utilities may consider device ID of 0 as invalid, snapid may conflict with other device's ID. This patch introduces "snapids to anonymous bdev IDs" map. we create a new mapping when we see a snapid for the first time. we trim unused mapping after it is ilde for 5 minutes. Link: http://tracker.ceph.com/issues/22353 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: split large reconnect into multiple messagesYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: decode feature bits in session messageYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05ceph: set special inode's blocksize to page sizeYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-03Linux 5.0Linus Torvalds
2019-03-02Merge tag 'armsoc-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "One more set of simple ARM platform fixes: - A boot regression on qualcomm msm8998 - Gemini display controllers got turned off by accident - incorrect reference counting in optee" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: tee: optee: add missing of_node_put after of_device_is_available arm64: dts: qcom: msm8998: Extend TZ reserved memory area ARM: dts: gemini: Re-enable display controller
2019-03-02Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two last minute fixes: - Prevent value evaluation via functions happening in the user access enabled region of __put_user() (put another way: make sure to evaluate the value to be stored in user space _before_ enabling user space accesses) - Correct the definition of a Hyper-V hypercall constant" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyper-v: Fix definition of HV_MAX_FLUSH_REP_COUNT x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
2019-03-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Nine small fixes. The resume fix is a cosmetic removal of a warning with an incorrect condition causing it to alarm people wrongly. The other eight patches correct a thinko in Christoph Hellwig's DMA conversion series. Without it all these drivers end up with 32 bit DMA masks meaning they bounce any page over 4GB before sending it to the controller. Nowadays, even laptops mostly have memory above 4GB, so this can lead to significant performance degradation with all the bouncing" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: core: Avoid that system resume triggers a kernel warning scsi: hptiop: fix calls to dma_set_mask() scsi: hisi_sas: fix calls to dma_set_mask_and_coherent() scsi: csiostor: fix calls to dma_set_mask_and_coherent() scsi: bfa: fix calls to dma_set_mask_and_coherent() scsi: aic94xx: fix calls to dma_set_mask_and_coherent() scsi: 3w-sas: fix calls to dma_set_mask_and_coherent() scsi: 3w-9xxx: fix calls to dma_set_mask_and_coherent() scsi: lpfc: fix calls to dma_set_mask_and_coherent()
2019-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix refcount leak in act_ipt during replace, from Davide Caratti. 2) Set task state properly in tun during blocking reads, from Timur Celik. 3) Leaked reference in DSA, from Wen Yang. 4) NULL deref in act_tunnel_key, from Vlad Buslov. 5) cipso_v4_erro can reference the skb IPCB in inappropriate contexts thus referencing garbage, from Nazarov Sergey. 6) Don't accept RTA_VIA and RTA_GATEWAY in contexts where those attributes make no sense. 7) Fix hung sendto in tipc, from Tung Nguyen. 8) Out-of-bounds access in netlabel, from Paul Moore. 9) Grant reference leak in xen-netback, from Igor Druzhinin. 10) Fix tx stalls with lan743x, from Bryan Whitehead. 11) Fix interrupt storm with mv88e6xxx, from Hein Kallweit. 12) Memory leak in sit on device registry failure, from Mao Wenan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) net: sit: fix memory leak in sit_init_net() net: dsa: mv88e6xxx: Fix statistics on mv88e6161 geneve: correctly handle ipv6.disable module parameter net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode bpf: fix sanitation rewrite in case of non-pointers ipv4: Add ICMPv6 support when parse route ipproto MIPS: eBPF: Fix icache flush end address lan743x: Fix TX Stall Issue net: phy: phylink: fix uninitialized variable in phylink_get_mac_state net: aquantia: regression on cpus with high cores: set mode with 8 queues selftests: fixes for UDP GRO bpf: drop refcount if bpf_map_new_fd() fails in map_create() net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X net: dsa: mv88e6xxx: Fix u64 statistics xen-netback: don't populate the hash cache on XenBus disconnect xen-netback: fix occasional leak of grant ref mappings under memory pressure sctp: chunk.c: correct format string for size_t in printk net: netem: fix skb length BUG_ON in __skb_to_sgvec netlabel: fix out-of-bounds memory accesses ipv4: Pass original device to ip_rcv_finish_core ...
2019-03-02Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull more crypto fixes from Herbert Xu: "This fixes a couple of issues in arm64/chacha that was introduced in 5.0" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: arm64/chacha - fix hchacha_block_neon() for big endian crypto: arm64/chacha - fix chacha_4block_xor_neon() for big endian
2019-03-02net: sit: fix memory leak in sit_init_net()Mao Wenan
If register_netdev() is failed to register sitn->fb_tunnel_dev, it will go to err_reg_dev and forget to free netdev(sitn->fb_tunnel_dev). BUG: memory leak unreferenced object 0xffff888378daad00 (size 512): comm "syz-executor.1", pid 4006, jiffies 4295121142 (age 16.115s) hex dump (first 32 bytes): 00 e6 ed c0 83 88 ff ff 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d6dcb63e>] kvmalloc include/linux/mm.h:577 [inline] [<00000000d6dcb63e>] kvzalloc include/linux/mm.h:585 [inline] [<00000000d6dcb63e>] netif_alloc_netdev_queues net/core/dev.c:8380 [inline] [<00000000d6dcb63e>] alloc_netdev_mqs+0x600/0xcc0 net/core/dev.c:8970 [<00000000867e172f>] sit_init_net+0x295/0xa40 net/ipv6/sit.c:1848 [<00000000871019fa>] ops_init+0xad/0x3e0 net/core/net_namespace.c:129 [<00000000319507f6>] setup_net+0x2ba/0x690 net/core/net_namespace.c:314 [<0000000087db4f96>] copy_net_ns+0x1dc/0x330 net/core/net_namespace.c:437 [<0000000057efc651>] create_new_namespaces+0x382/0x730 kernel/nsproxy.c:107 [<00000000676f83de>] copy_namespaces+0x2ed/0x3d0 kernel/nsproxy.c:165 [<0000000030b74bac>] copy_process.part.27+0x231e/0x6db0 kernel/fork.c:1919 [<00000000fff78746>] copy_process kernel/fork.c:1713 [inline] [<00000000fff78746>] _do_fork+0x1bc/0xe90 kernel/fork.c:2224 [<000000001c2e0d1c>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 [<00000000ec48bd44>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<0000000039acff8a>] 0xffffffffffffffff Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02net: dsa: mv88e6xxx: Fix statistics on mv88e6161Andrew Lunn
Despite what the datesheet says, the silicon implements the older way of snapshoting the statistics. Change the op. Reported-by: Chris.Healy@zii.aero Tested-by: Chris.Healy@zii.aero Fixes: 0ac64c394900 ("net: dsa: mv88e6xxx: mv88e6161 uses mv88e6320 stats snapshot") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01geneve: correctly handle ipv6.disable module parameterJiri Benc
When IPv6 is compiled but disabled at runtime, geneve_sock_add returns -EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole operation of bringing up the tunnel. Ignore failure of IPv6 socket creation for metadata based tunnels caused by IPv6 not being available. This is the same fix as what commit d074bf960044 ("vxlan: correctly handle ipv6.disable module parameter") is doing for vxlan. Note there's also commit c0a47e44c098 ("geneve: should not call rt6_lookup() when ipv6 was disabled") which fixes a similar issue but for regular tunnels, while this patch is needed for metadata based tunnels. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2019-03-01 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix sanitation rewrite, from Daniel. 2) fix error path on map_new_fd, from Peng. 3) fix icache flush address, from Paul. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmodeHeiner Kallweit
When debugging another issue I faced an interrupt storm in this driver (88E6390, port 9 in SGMII mode), consisting of alternating link-up / link-down interrupts. Analysis showed that the driver wanted to set a cmode that was set already. But so far mv88e6390x_port_set_cmode() doesn't check this and powers down SERDES, what causes the link to break, and eventually results in the described interrupt storm. Fix this by checking whether the cmode actually changes. We want that the very first call to mv88e6390x_port_set_cmode() always configures the registers, therefore initialize port.cmode with a value that is different from any supported cmode value. We have to take care that we only init the ports cmode once chip->info->num_ports is set. v2: - add small helper and init the number of actual ports only Fixes: 364e9d7776a3 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01bpf: fix sanitation rewrite in case of non-pointersDaniel Borkmann
Marek reported that he saw an issue with the below snippet in that timing measurements where off when loaded as unpriv while results were reasonable when loaded as privileged: [...] uint64_t a = bpf_ktime_get_ns(); uint64_t b = bpf_ktime_get_ns(); uint64_t delta = b - a; if ((int64_t)delta > 0) { [...] Turns out there is a bug where a corner case is missing in the fix d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths"), namely fixup_bpf_calls() only checks whether aux has a non-zero alu_state, but it also needs to test for the case of BPF_ALU_NON_POINTER since in both occasions we need to skip the masking rewrite (as there is nothing to mask). Fixes: d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths") Reported-by: Marek Majkowski <marek@cloudflare.com> Reported-by: Arthur Fabre <afabre@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/ Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-01ipv4: Add ICMPv6 support when parse route ipprotoHangbin Liu
For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers. But for ip -6 route, currently we only support tcp, udp and icmp. Add ICMPv6 support so we can match ipv6-icmp rules for route lookup. v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: eacb9384a3fe ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02MIPS: eBPF: Fix icache flush end addressPaul Burton
The MIPS eBPF JIT calls flush_icache_range() in order to ensure the icache observes the code that we just wrote. Unfortunately it gets the end address calculation wrong due to some bad pointer arithmetic. The struct jit_ctx target field is of type pointer to u32, and as such adding one to it will increment the address being pointed to by 4 bytes. Therefore in order to find the address of the end of the code we simply need to add the number of 4 byte instructions emitted, but we mistakenly add the number of instructions multiplied by 4. This results in the call to flush_icache_range() operating on a memory region 4x larger than intended, which is always wasteful and can cause crashes if we overrun into an unmapped page. Fix this by correcting the pointer arithmetic to remove the bogus multiplication, and use braces to remove the need for a set of brackets whilst also making it obvious that the target field is a pointer. Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.") Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: netdev@vger.kernel.org Cc: bpf@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01lan743x: Fix TX Stall IssueBryan Whitehead
It has been observed that tx queue stalls while downloading from certain web sites (example www.speedtest.net) The cause has been tracked down to a corner case where dma descriptors where not setup properly. And there for a tx completion interrupt was not signaled. This fix corrects the problem by properly marking the end of a multi descriptor transmission. Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: phy: phylink: fix uninitialized variable in phylink_get_mac_stateHeiner Kallweit
When debugging an issue I found implausible values in state->pause. Reason in that state->pause isn't initialized and later only single bits are changed. Also the struct itself isn't initialized in phylink_resolve(). So better initialize state->pause and other not yet initialized fields. v2: - use right function name in subject v3: - initialize additional fields Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: aquantia: regression on cpus with high cores: set mode with 8 queuesDmitry Bogdanov
Recently the maximum number of queues was increased up to 8, but NIC was not fully configured for 8 queues. In setups with more than 4 CPU cores parts of TX traffic gets lost if the kernel routes it to queues 4th-8th. This patch sets a tx hw traffic mode with 8 queues. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202651 Fixes: 71a963cfc50b ("net: aquantia: increase max number of hw queues") Reported-by: Nicholas Johnson <nicholas.johnson@outlook.com.au> Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01selftests: fixes for UDP GROPaolo Abeni
The current implementation for UDP GRO tests is racy: the receiver may flush the RX queue while the sending is still transmitting and incorrectly report RX errors, with a wrong number of packet received. Add explicit timeouts to the receiver for both connection activation (first packet received for UDP) and reception completion, so that in the above critical scenario the receiver will wait for the transfer completion. Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01Merge tag 'iommu-fix-v5.0-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fix from Joerg Roedel: "One important fix for a memory corruption issue in the Intel VT-d driver that triggers on hardware with deep PCI hierarchies" * tag 'iommu-fix-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/dmar: Fix buffer overflow during PCI bus notification
2019-03-01Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "2 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: hugetlbfs: fix races and page leaks during migration kasan: turn off asan-stack for clang-8 and earlier
2019-03-01hugetlbfs: fix races and page leaks during migrationMike Kravetz
hugetlb pages should only be migrated if they are 'active'. The routines set/clear_page_huge_active() modify the active state of hugetlb pages. When a new hugetlb page is allocated at fault time, set_page_huge_active is called before the page is locked. Therefore, another thread could race and migrate the page while it is being added to page table by the fault code. This race is somewhat hard to trigger, but can be seen by strategically adding udelay to simulate worst case scheduling behavior. Depending on 'how' the code races, various BUG()s could be triggered. To address this issue, simply delay the set_page_huge_active call until after the page is successfully added to the page table. Hugetlb pages can also be leaked at migration time if the pages are associated with a file in an explicitly mounted hugetlbfs filesystem. For example, consider a two node system with 4GB worth of huge pages available. A program mmaps a 2G file in a hugetlbfs filesystem. It then migrates the pages associated with the file from one node to another. When the program exits, huge page counts are as follows: node0 1024 free_hugepages 1024 nr_hugepages node1 0 free_hugepages 1024 nr_hugepages Filesystem Size Used Avail Use% Mounted on nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool That is as expected. 2G of huge pages are taken from the free_hugepages counts, and 2G is the size of the file in the explicitly mounted filesystem. If the file is then removed, the counts become: node0 1024 free_hugepages 1024 nr_hugepages node1 1024 free_hugepages 1024 nr_hugepages Filesystem Size Used Avail Use% Mounted on nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool Note that the filesystem still shows 2G of pages used, while there actually are no huge pages in use. The only way to 'fix' the filesystem accounting is to unmount the filesystem If a hugetlb page is associated with an explicitly mounted filesystem, this information in contained in the page_private field. At migration time, this information is not preserved. To fix, simply transfer page_private from old to new page at migration time if necessary. There is a related race with removing a huge page from a file and migration. When a huge page is removed from the pagecache, the page_mapping() field is cleared, yet page_private remains set until the page is actually freed by free_huge_page(). A page could be migrated while in this state. However, since page_mapping() is not set the hugetlbfs specific routine to transfer page_private is not called and we leak the page count in the filesystem. To fix that, check for this condition before migrating a huge page. If the condition is detected, return EBUSY for the page. Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com Fixes: bcc54222309c ("mm: hugetlb: introduce page_huge_active") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: <stable@vger.kernel.org> [mike.kravetz@oracle.com: v2] Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com [mike.kravetz@oracle.com: update comment and changelog] Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-01kasan: turn off asan-stack for clang-8 and earlierArnd Bergmann
Building an arm64 allmodconfig kernel with clang results in over 140 warnings about overly large stack frames, the worst ones being: drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare' drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable' drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread' drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe' drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd' drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo' None of these happen with gcc today, and almost all of these are the result of a single known issue in llvm. Hopefully it will eventually get fixed with the clang-9 release. In the meantime, the best idea I have is to turn off asan-stack for clang-8 and earlier, so we can produce a kernel that is safe to run. I have posted three patches that address the frame overflow warnings that are not addressed by turning off asan-stack, so in combination with this change, we get much closer to a clean allmodconfig build, which in turn is necessary to do meaningful build regression testing. It is still possible to turn on the CONFIG_ASAN_STACK option on all versions of clang, and it's always enabled for gcc, but when CONFIG_COMPILE_TEST is set, the option remains invisible, so allmodconfig and randconfig builds (which are normally done with a forced CONFIG_COMPILE_TEST) will still result in a mostly clean build. Link: http://lkml.kernel.org/r/20190222222950.3997333-1-arnd@arndb.de Link: https://bugs.llvm.org/show_bug.cgi?id=38809 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Qian Cai <cai@lca.pw> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-01Merge tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Three final fixes, one for a feature that is new in this kernel, one bochs fix for qemu riscv and one atomic modesetting fix. I've left a few of the other late fixes until next as I didn't want to throw in anything that wasn't really necessary" * tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drm: drm/bochs: Fix the ID mismatch error drm: Block fb changes for async plane updates drm/amd/display: Use vrr friendly pageflip throttling in DC.
2019-03-01bpf: drop refcount if bpf_map_new_fd() fails in map_create()Peng Sun
In bpf/syscall.c, map_create() first set map->usercnt to 1, a file descriptor is supposed to return to userspace. When bpf_map_new_fd() fails, drop the refcount. Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID") Signed-off-by: Peng Sun <sironhide0null@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01Merge tag 'qcom-fixes-for-5.0-rc8' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into arm/fixes Qualcomm ARM64 Fixes for 5.0-rc8 * Fix TZ memory area size to avoid crashes during boot * tag 'qcom-fixes-for-5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux: arm64: dts: qcom: msm8998: Extend TZ reserved memory area
2019-03-01Merge tag 'tee-fix-for-v5.0' of ↵Arnd Bergmann
https://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes OP-TEE driver - add missing of_node_put after of_device_is_available * tag 'tee-fix-for-v5.0' of https://git.linaro.org/people/jens.wiklander/linux-tee: tee: optee: add missing of_node_put after of_device_is_available
2019-02-28Merge tag 'mips_fixes_5.0_4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Paul Burton: "A few more MIPS fixes: - Fix 16b cmpxchg() operations which could erroneously fail if bits 15:8 of the old value are non-zero. In practice I'm not aware of any actual users of 16b cmpxchg() on MIPS, but this fixes the support for it was was introduced in v4.13. - Provide a struct device to dma_alloc_coherent for Lantiq XWAY systems with a "Voice MIPS Macro Core" (VMMC) device. - Provide DMA masks for BCM63xx ethernet devices, fixing a regression introduced in v4.19. - Fix memblock reservation for the kernel when the system has a non-zero PHYS_OFFSET, correcting the memblock conversion performed in v4.20" * tag 'mips_fixes_5.0_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: fix memory setup for platforms with PHYS_OFFSET != 0 MIPS: BCM63XX: provide DMA masks for ethernet devices MIPS: lantiq: pass struct device to DMA API functions MIPS: fix truncation in __cmpxchg_small for short values