summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-09-25crypto: sun8i-ss - Add SS_START defineCorentin Labbe
Instead of using an hardcoded value, let's use a defined value for SS_START. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: hisilicon/qm - Convert to DEFINE_SHOW_ATTRIBUTEQinglang Miao
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: cavium/zip - Convert to DEFINE_SHOW_ATTRIBUTEQinglang Miao
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: caam - Convert to DEFINE_SHOW_ATTRIBUTEQinglang Miao
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: amlogic - Convert to DEFINE_SHOW_ATTRIBUTEQinglang Miao
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: allwinner - Convert to DEFINE_SHOW_ATTRIBUTEQinglang Miao
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: proc - Removing some useless only space linesCorentin Labbe
Some line got only spaces, remove them Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: marvell/cesa - use devm_platform_ioremap_resource_bynameZhang Qilong
Use the devm_platform_ioremap_resource_byname() helper instead of calling platform_get_resource_byname() and devm_ioremap_resource() separately. Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: arm/aes-neonbs - use typed init/exit routines for XTSArd Biesheuvel
Use the typed skcipher init/exit routines instead of the generic cra_init/_exit routines when instantiating/releasing the XTS skciphers. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: arm/aes-neonbs - avoid loading reorder argument on encryptionArd Biesheuvel
Reordering the tweak is never necessary for encryption, so avoid the argument load on the encryption path. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: arm/aes-neonbs - avoid hacks to prevent Thumb2 mode switchesArd Biesheuvel
Instead of using a homegrown macrofied version of the adr instruction that sets the Thumb bit in the output value, only to ensure that any bx instructions consuming that value will not switch out of Thumb mode when branching, use non-interworking mov (to PC) instructions, which achieve the same thing. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: stm32/crc32 - Avoid lock if hardware is already usedNicolas Toromanoff
If STM32 CRC device is already in use, calculate CRC by software. This will release CPU constraint for a concurrent access to the hardware, and avoid masking irqs during the whole block processing. Fixes: 7795c0baf5ac ("crypto: stm32/crc32 - protect from concurrent accesses") Signed-off-by: Nicolas Toromanoff <nicolas.toromanoff@st.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: qat - remove unnecessary mutex_init()Qinglang Miao
The mutex adf_ctl_lock is initialized statically. It is unnecessary to initialize by mutex_init(). Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: arm/sha512-neon - avoid ADRL pseudo instructionArd Biesheuvel
The ADRL pseudo instruction is not an architectural construct, but a convenience macro that was supported by the ARM proprietary assembler and adopted by binutils GAS as well, but only when assembling in 32-bit ARM mode. Therefore, it can only be used in assembler code that is known to assemble in ARM mode only, but as it turns out, the Clang assembler does not implement ADRL at all, and so it is better to get rid of it entirely. So replace the ADRL instruction with a ADR instruction that refers to a nearer symbol, and apply the delta explicitly using an additional instruction. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: arm/sha256-neon - avoid ADRL pseudo instructionArd Biesheuvel
The ADRL pseudo instruction is not an architectural construct, but a convenience macro that was supported by the ARM proprietary assembler and adopted by binutils GAS as well, but only when assembling in 32-bit ARM mode. Therefore, it can only be used in assembler code that is known to assemble in ARM mode only, but as it turns out, the Clang assembler does not implement ADRL at all, and so it is better to get rid of it entirely. So replace the ADRL instruction with a ADR instruction that refers to a nearer symbol, and apply the delta explicitly using an additional instruction. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: qat - convert to use DEFINE_SEQ_ATTRIBUTE macroLiu Shixin
Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: lib/chacha20poly1305 - Set SG_MITER_ATOMIC unconditionallyHerbert Xu
There is no reason for the chacha20poly1305 SG miter code to use kmap instead of kmap_atomic as the critical section doesn't sleep anyway. So we can simply get rid of the preemptible check and set SG_MITER_ATOMIC unconditionally. Even if we need to reenable preemption to lower latency we should be doing that by interrupting the SG miter walk rather than using kmap. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: inside-secure - Reuse code in safexcel_hmac_alg_setkeyHerbert Xu
The code in the current implementation of safexcel_hmac_alg_setkey can be reused by safexcel_cipher. This patch does just that by renaming the previous safexcel_hmac_setkey to __safexcel_hmac_setkey. The now-shared safexcel_hmac_alg_setkey becomes safexcel_hmac_setkey and a new safexcel_hmac_alg_setkey has been added for use by ahash transforms. As a result safexcel_aead_setkey's stack frame has been reduced by about half in size, or about 512 bytes. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: inside-secure - Move ipad/opad into safexcel_contextHerbert Xu
As both safexcel_ahash_ctx and safexcel_cipher_ctx contain ipad and opad buffers this patch moves them into the common struct safexcel_context. It also adds a union so that they can be accessed in the appropriate endian without crazy casts. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: inside-secure - Move priv pointer into safexcel_contextHerbert Xu
This patch moves the priv pointer into struct safexcel_context because both structs that extend safexcel_context have that pointer as well. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: atmel-aes - convert to use be32_add_cpu()Liu Shixin
Convert cpu_to_be32(be32_to_cpu(E1) + E2) to use be32_add_cpu(). Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25cypto: mediatek - fix leaks in mtk_desc_ring_allocXiaoliang Pang
In the init loop, if an error occurs in function 'dma_alloc_coherent', then goto the err_cleanup section, after run i--, in the array ring, the struct mtk_ring with index i will not be released, causing memory leaks Fixes: 785e5c616c849 ("crypto: mediatek - Add crypto driver support for some MediaTek chips") Cc: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: Xiaoliang Pang <dawning.pang@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25hwrng: ingenic - Add hardware TRNG for Ingenic X1830周琰杰 (Zhou Yanjie)
Add X1830 SoC digital true random number generator driver. Tested-by: 周正 (Zhou Zheng) <sernia.zhou@foxmail.com> Co-developed-by: 漆鹏振 (Qi Pengzhen) <aric.pzqi@ingenic.com> Signed-off-by: 漆鹏振 (Qi Pengzhen) <aric.pzqi@ingenic.com> Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25dt-bindings: RNG: Add Ingenic TRNG bindings.周琰杰 (Zhou Yanjie)
Add the TRNG bindings for the X1830 SoC from Ingenic. Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-24tcp: skip DSACKs with dubious sequence rangesPriyaranjan Jha
Currently, we use length of DSACKed range to compute number of delivered packets. And if sequence range in DSACK is corrupted, we can get bogus dsacked/acked count, and bogus cwnd. This patch put bounds on DSACKed range to skip update of data delivery and spurious retransmission information, if the DSACK is unlikely caused by sender's action: - DSACKed range shouldn't be greater than maximum advertised rwnd. - Total no. of DSACKed segments shouldn't be greater than total no. of retransmitted segs. Unlike spurious retransmits, network duplicates or corrupted DSACKs shouldn't be counted as delivery. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net/fsl: quieten expected MDIO access failuresJamie Iles
MDIO reads can happen during PHY probing, and printing an error with dev_err can result in a large number of error messages during device probe. On a platform with a serial console this can result in excessively long boot times in a way that looks like an infinite loop when multiple busses are present. Since 0f183fd151c (net/fsl: enable extended scanning in xgmac_mdio) we perform more scanning so there are potentially more failures. Reduce the logging level to dev_dbg which is consistent with the Freescale enetc driver. Cc: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net: dsa: microchip: really look for phy-mode in port nodesHelmut Grohne
The previous implementation failed to account for the "ports" node. The actual port nodes are not child nodes of the switch node, but a "ports" node sits in between. Fixes: edecfa98f602 ("net: dsa: microchip: look for phy-mode in port nodes") Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net/tls: race causes kernel panicRohit Maheshwari
BUG: kernel NULL pointer dereference, address: 00000000000000b8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 80000008b6fef067 P4D 80000008b6fef067 PUD 8b6fe6067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 12 PID: 23871 Comm: kworker/12:80 Kdump: loaded Tainted: G S 5.9.0-rc3+ #1 Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.1 03/29/2018 Workqueue: events tx_work_handler [tls] RIP: 0010:tx_work_handler+0x1b/0x70 [tls] Code: dc fe ff ff e8 16 d4 a3 f6 66 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 8b 6f 58 48 8b bd a0 04 00 00 48 85 ff 74 1c 48 8b 47 28 <48> 8b 90 b8 00 00 00 83 e2 02 75 0c f0 48 0f ba b0 b8 00 00 00 00 RSP: 0018:ffffa44ace61fe88 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff91da9e45cc30 RCX: dead000000000122 RDX: 0000000000000001 RSI: ffff91da9e45cc38 RDI: ffff91d95efac200 RBP: ffff91da133fd780 R08: 0000000000000000 R09: 000073746e657665 R10: 8080808080808080 R11: 0000000000000000 R12: ffff91dad7d30700 R13: ffff91dab6561080 R14: 0ffff91dad7d3070 R15: ffff91da9e45cc38 FS: 0000000000000000(0000) GS:ffff91dad7d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000b8 CR3: 0000000906478003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: process_one_work+0x1a7/0x370 worker_thread+0x30/0x370 ? process_one_work+0x370/0x370 kthread+0x114/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30 tls_sw_release_resources_tx() waits for encrypt_pending, which can have race, so we need similar changes as in commit 0cada33241d9de205522e3858b18e506ca5cce2c here as well. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net/ethernet/broadcom: fix spelling typoWang Qing
Modify the comment typo: "compliment" -> "complement". Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net: mscc: ocelot: fix fields offset in SG_CONFIG_REG_3Xiaoliang Yang
INIT_IPS and GATE_ENABLE fields have a wrong offset in SG_CONFIG_REG_3. This register is used by stream gate control of PSFP, and it has not been used before, because PSFP is not implemented in ocelot driver. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net: dsa: felix: convert TAS link speed based on phylink speedXiaoliang Yang
state->speed holds a value of 10, 100, 1000 or 2500, but QSYS_TAG_CONFIG_LINK_SPEED expects a value of 0, 1, 2, 3. So convert the speed to a proper value. Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24hinic: fix wrong return value of mac-set cmdLuo bin
It should also be regarded as an error when hw return status=4 for PF's setting mac cmd. Only if PF return status=4 to VF should this cmd be taken special treatment. Fixes: 7dd29ee12865 ("hinic: add sriov feature support") Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24drivers/net/wan/x25_asy: Correct the ndo_open and ndo_stop functionsXie He
1. Move the lapb_register/lapb_unregister calls into the ndo_open/ndo_stop functions. This makes the LAPB protocol start/stop when the network interface starts/stops. When the network interface is down, the LAPB protocol shouldn't be running and the LAPB module shoudn't be generating control frames. 2. Move netif_start_queue/netif_stop_queue into the ndo_open/ndo_stop functions. This makes the TX queue start/stop when the network interface starts/stops. (netif_stop_queue was originally in the ndo_stop function. But to make the code look better, I created a new function to use as ndo_stop, and made it call the original ndo_stop function. I moved netif_stop_queue from the original ndo_stop function to the new ndo_stop function.) Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net/ipv4: always honour route mtu during forwardingMaciej Żenczykowski
Documentation/networking/ip-sysctl.txt:46 says: ip_forward_use_pmtu - BOOLEAN By default we don't trust protocol path MTUs while forwarding because they could be easily forged and can lead to unwanted fragmentation by the router. You only need to enable this if you have user-space software which tries to discover path mtus by itself and depends on the kernel honoring this information. This is normally not the case. Default: 0 (disabled) Possible values: 0 - disabled 1 - enabled Which makes it pretty clear that setting it to 1 is a potential security/safety/DoS issue, and yet it is entirely reasonable to want forwarded traffic to honour explicitly administrator configured route mtus (instead of defaulting to device mtu). Indeed, I can't think of a single reason why you wouldn't want to. Since you configured a route mtu you probably know better... It is pretty common to have a higher device mtu to allow receiving large (jumbo) frames, while having some routes via that interface (potentially including the default route to the internet) specify a lower mtu. Note that ipv6 forwarding uses device mtu unless the route is locked (in which case it will use the route mtu). This approach is not usable for IPv4 where an 'mtu lock' on a route also has the side effect of disabling TCP path mtu discovery via disabling the IPv4 DF (don't frag) bit on all outgoing frames. I'm not aware of a way to lock a route from an IPv6 RA, so that also potentially seems wrong. Signed-off-by: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com> Cc: Vinay Paradkar <vparadka@qti.qualcomm.com> Cc: Tyler Wear <twear@quicinc.com> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24Merge branch 'net_sched-fix-a-UAF-in-tcf_action_init'David S. Miller
Cong Wang says: ==================== net_sched: fix a UAF in tcf_action_init() This patchset fixes a use-after-free triggered by syzbot. Please find more details in each patch description. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net_sched: commit action insertions togetherCong Wang
syzbot is able to trigger a failure case inside the loop in tcf_action_init(), and when this happens we clean up with tcf_action_destroy(). But, as these actions are already inserted into the global IDR, other parallel process could free them before tcf_action_destroy(), then we will trigger a use-after-free. Fix this by deferring the insertions even later, after the loop, and committing all the insertions in a separate loop, so we will never fail in the middle of the insertions any more. One side effect is that the window between alloction and final insertion becomes larger, now it is more likely that the loop in tcf_del_walker() sees the placeholder -EBUSY pointer. So we have to check for error pointer in tcf_del_walker(). Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net_sched: defer tcf_idr_insert() in tcf_action_init_1()Cong Wang
All TC actions call tcf_idr_insert() for new action at the end of their ->init(), so we can actually move it to a central place in tcf_action_init_1(). And once the action is inserted into the global IDR, other parallel process could free it immediately as its refcnt is still 1, so we can not fail after this, we need to move it after the goto action validation to avoid handling the failure case after insertion. This is found during code review, is not directly triggered by syzbot. And this prepares for the next patch. Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25Merge tag 'drm-misc-fixes-2020-09-24' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.9: - Single null pointer deref fix for dma-buf. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/4106c21e-f52c-4c05-6cdb-daa743bb8617@linux.intel.com
2020-09-25Merge tag 'drm-intel-fixes-2020-09-24' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.9-rc7: - Fix selftest reference to stack data out of scope - Fix GVT null pointer dereference - Backmerge from Linus' master to fix build Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87zh5fpmha.fsf@intel.com
2020-09-25BackMerge commit '98477740630f270aecf648f1d6a9dbc6027d4ff1' into drm-fixesDave Airlie
The dax mess had some fallout, and i915 used a later base to fix their CI. Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-09-24md/raid10: improve discard request for far layoutXiao Ni
For far layout, the discard region is not continuous on disks. So it needs far copies r10bio to cover all regions. It needs a way to know all r10bios have finish or not. Similar with raid10_sync_request, only the first r10bio master_bio records the discard bio. Other r10bios master_bio record the first r10bio. The first r10bio can finish after other r10bios finish and then return the discard bio. Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24md/raid10: improve raid10 discard requestXiao Ni
Now the discard request is split by chunk size. So it takes a long time to finish mkfs on disks which support discard function. This patch improve handling raid10 discard request. It uses the similar way with patch 29efc390b (md/md0: optimize raid0 discard handling). But it's a little complex than raid0. Because raid10 has different layout. If raid10 is offset layout and the discard request is smaller than stripe size. There are some holes when we submit discard bio to underlayer disks. For example: five disks (disk1 - disk5) D01 D02 D03 D04 D05 D05 D01 D02 D03 D04 D06 D07 D08 D09 D10 D10 D06 D07 D08 D09 The discard bio just wants to discard from D03 to D10. For disk3, there is a hole between D03 and D08. For disk4, there is a hole between D04 and D09. D03 is a chunk, raid10_write_request can handle one chunk perfectly. So the part that is not aligned with stripe size is still handled by raid10_write_request. If reshape is running when discard bio comes and the discard bio spans the reshape position, raid10_write_request is responsible to handle this discard bio. I did a test with this patch set. Without patch: time mkfs.xfs /dev/md0 real4m39.775s user0m0.000s sys0m0.298s With patch: time mkfs.xfs /dev/md0 real0m0.105s user0m0.000s sys0m0.007s nvme3n1 259:1 0 477G 0 disk └─nvme3n1p1 259:10 0 50G 0 part nvme4n1 259:2 0 477G 0 disk └─nvme4n1p1 259:11 0 50G 0 part nvme5n1 259:6 0 477G 0 disk └─nvme5n1p1 259:12 0 50G 0 part nvme2n1 259:9 0 477G 0 disk └─nvme2n1p1 259:15 0 50G 0 part nvme0n1 259:13 0 477G 0 disk └─nvme0n1p1 259:14 0 50G 0 part Reviewed-by: Coly Li <colyli@suse.de> Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24md/raid10: pull codes that wait for blocked dev into one functionXiao Ni
The following patch will reuse these logics, so pull the same codes into one function. Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24md/raid10: extend r10bio devs to raid disksXiao Ni
Now it allocs r10bio->devs[conf->copies]. Discard bio needs to submit to all member disks and it needs to use r10bio. So extend to r10bio->devs[geo.raid_disks]. Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24md: add md_submit_discard_bio() for submitting discard bioXiao Ni
Move these logic from raid0.c to md.c, so that we can also use it in raid10.c. Reviewed-by: Coly Li <colyli@suse.de> Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24md: Simplify code with existing definition RESYNC_SECTORS in raid10.cZhen Lei
#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9) "RESYNC_BLOCK_SIZE/512" is equal to "RESYNC_BLOCK_SIZE >> 9", replace it with RESYNC_SECTORS. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24md/raid5: reallocate page array after setting new stripe_sizeYufen Yu
When try to resize stripe_size, we also need to free old shared page array and allocate new. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24md/raid5: resize stripe_head when reshape arrayYufen Yu
When reshape array, we try to reuse shared pages of old stripe_head, and allocate more for the new one if needed. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24md/raid5: let multiple devices of stripe_head share pageYufen Yu
In current implementation, grow_buffers() uses alloc_page() to allocate the buffers for each stripe_head, i.e. allocate a page for each dev[i] in stripe_head. After setting stripe_size as a configurable value by writing sysfs entry, it means that we always allocate 64K buffers, but just use 4K of them when stripe_size is 4K in 64KB arm64. To avoid wasting memory, we try to let multiple sh->dev share one real page. That means, multiple sh->dev[i].page will point to the only page with different offset. Example of 64K PAGE_SIZE and 4K stripe_size as following: 64K PAGE_SIZE +---+---+---+---+------------------------------+ | | | | | | | | | | +-+-+-+-+-+-+-+-+------------------------------+ ^ ^ ^ ^ | | | +----------------------------+ | | | | | | +-------------------+ | | | | | | +----------+ | | | | | | +-+ | | | | | | | +-----+-----+------+-----+------+-----+------+------+ sh | offset(0) | offset(4K) | offset(8K) | offset(12K) | + +-----------+------------+------------+-------------+ +----> dev[0].page dev[1].page dev[2].page dev[3].page A new 'pages' array will be added into stripe_head to record shared page used by this stripe_head. Allocate them when grow_buffers() and free them when shrink_buffers(). After trying to share page, the users of sh->dev[i].page need to take care of the related page offset: page of issued bio and page passed to xor compution functions. But thanks for previous different page offset supported. Here, we just need to set correct dev[i].offset. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24md/raid6: let async recovery function support different page offsetYufen Yu
For now, asynchronous raid6 recovery calculate functions are require common offset for pages. But, we expect them to support different page offset after introducing stripe shared page. Do that by simplily adding page offset where each page address are referred. Then, replace the old interface with the new ones in raid6 and raid6test. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>