summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-12-12drivers/net/virtio_net.c: Added USO support.Andrew Melnychenko
Now, it possible to enable GSO_UDP_L4("tx-udp-segmentation") for VirtioNet. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12linux/virtio_net.h: Support USO offload in vnet header.Andrew Melnychenko
Now, it's possible to convert USO vnet packets from/to skb. Added support for GSO_UDP_L4 offload. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12uapi/linux/virtio_net.h: Added USO types.Andrew Melnychenko
Added new GSO type for USO: VIRTIO_NET_HDR_GSO_UDP_L4. Feature VIRTIO_NET_F_HOST_USO allows to enable NETIF_F_GSO_UDP_L4. Separated VIRTIO_NET_F_GUEST_USO4 & VIRTIO_NET_F_GUEST_USO6 features required for Windows guests. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12driver/net/tun: Added features for USO.Andrew Melnychenko
Added support for USO4 and USO6. For now, to "enable" USO, it's required to set both USO4 and USO6 simultaneously. USO enables NETIF_F_GSO_UDP_L4. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12uapi/linux/if_tun.h: Added new offload types for USO4/6.Andrew Melnychenko
Added 2 additional offlloads for USO(IPv4 & IPv6). Separate offloads are required for Windows VM guests, g.e. Windows may set USO rx only for IPv4. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12udp: allow header check for dodgy GSO_UDP_L4 packets.Andrew Melnychenko
Allow UDP_L4 for robust packets. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-11ipc: fix memory leak in init_mqueue_fs()Zhengchao Shao
When setup_mq_sysctls() failed in init_mqueue_fs(), mqueue_inode_cachep is not released. In order to fix this issue, the release path is reordered. Link: https://lkml.kernel.org/r/20221209092929.1978875-1-shaozhengchao@huawei.com Fixes: dc55e35f9e81 ("ipc: Store mqueue sysctls in the ipc namespace") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Cc: Alexey Gladkov <legion@kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jingyu Wang <jingyuwang_vip@163.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Waiman Long <longman@redhat.com> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: Yu Zhe <yuzhe@nfschina.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11hfsplus: fix bug causing custom uid and gid being unable to be assigned with ↵Aditya Garg
mount Despite specifying UID and GID in mount command, the specified UID and GID were not being assigned. This patch fixes this issue. Link: https://lkml.kernel.org/r/C0264BF5-059C-45CF-B8DA-3A3BD2C803A2@live.com Signed-off-by: Aditya Garg <gargaditya08@live.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11rapidio: devices: fix missing put_device in mport_cdev_openCai Xinchen
When kfifo_alloc fails, the refcount of chdev->dev is left incremental. We should use put_device(&chdev->dev) to decrease the ref count of chdev->dev to avoid refcount leak. Link: https://lkml.kernel.org/r/20221203085721.13146-1-caixinchen1@huawei.com Fixes: e8de370188d0 ("rapidio: add mport char device driver") Signed-off-by: Cai Xinchen <caixinchen1@huawei.com> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Jakob Koschel <jakobkoschel@gmail.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Wang Weiyang <wangweiyang2@huawei.com> Cc: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11kcov: fix spelling typos in commentsRong Tao
Fix the typo of 'suport' in kcov.h Link: https://lkml.kernel.org/r/tencent_922CA94B789587D79FD154445D035AA19E07@qq.com Signed-off-by: Rong Tao <rongtao@cestc.cn> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11hfs: Fix OOB Write in hfs_asc2macZhangPeng
Syzbot reported a OOB Write bug: loop0: detected capacity change from 0 to 64 ================================================================== BUG: KASAN: slab-out-of-bounds in hfs_asc2mac+0x467/0x9a0 fs/hfs/trans.c:133 Write of size 1 at addr ffff88801848314e by task syz-executor391/3632 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:284 print_report+0x107/0x1f0 mm/kasan/report.c:395 kasan_report+0xcd/0x100 mm/kasan/report.c:495 hfs_asc2mac+0x467/0x9a0 fs/hfs/trans.c:133 hfs_cat_build_key+0x92/0x170 fs/hfs/catalog.c:28 hfs_lookup+0x1ab/0x2c0 fs/hfs/dir.c:31 lookup_open fs/namei.c:3391 [inline] open_last_lookups fs/namei.c:3481 [inline] path_openat+0x10e6/0x2df0 fs/namei.c:3710 do_filp_open+0x264/0x4f0 fs/namei.c:3740 If in->len is much larger than HFS_NAMELEN(31) which is the maximum length of an HFS filename, a OOB write could occur in hfs_asc2mac(). In that case, when the dst reaches the boundary, the srclen is still greater than 0, which causes a OOB write. Fix this by adding a check on dstlen in while() before writing to dst address. Link: https://lkml.kernel.org/r/20221202030038.1391945-1-zhangpeng362@huawei.com Fixes: 328b92278650 ("[PATCH] hfs: NLS support") Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Reported-by: <syzbot+dc3b1cf9111ab5fe98e7@syzkaller.appspotmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11hfs: fix OOB Read in __hfs_brec_findZhangPeng
Syzbot reported a OOB read bug: ================================================================== BUG: KASAN: slab-out-of-bounds in hfs_strcmp+0x117/0x190 fs/hfs/string.c:84 Read of size 1 at addr ffff88807eb62c4e by task kworker/u4:1/11 CPU: 1 PID: 11 Comm: kworker/u4:1 Not tainted 6.1.0-rc6-syzkaller-00308-g644e9524388a #0 Workqueue: writeback wb_workfn (flush-7:0) Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:284 print_report+0x107/0x1f0 mm/kasan/report.c:395 kasan_report+0xcd/0x100 mm/kasan/report.c:495 hfs_strcmp+0x117/0x190 fs/hfs/string.c:84 __hfs_brec_find+0x213/0x5c0 fs/hfs/bfind.c:75 hfs_brec_find+0x276/0x520 fs/hfs/bfind.c:138 hfs_write_inode+0x34c/0xb40 fs/hfs/inode.c:462 write_inode fs/fs-writeback.c:1440 [inline] If the input inode of hfs_write_inode() is incorrect: struct inode struct hfs_inode_info struct hfs_cat_key struct hfs_name u8 len # len is greater than HFS_NAMELEN(31) which is the maximum length of an HFS filename OOB read occurred: hfs_write_inode() hfs_brec_find() __hfs_brec_find() hfs_cat_keycmp() hfs_strcmp() # OOB read occurred due to len is too large Fix this by adding a Check on len in hfs_write_inode() before calling hfs_brec_find(). Link: https://lkml.kernel.org/r/20221130065959.2168236-1-zhangpeng362@huawei.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Reported-by: <syzbot+e836ff7133ac02be825f@syzkaller.appspotmail.com> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11relay: fix type mismatch when allocating memory in relay_create_buf()Gavrilov Ilia
The 'padding' field of the 'rchan_buf' structure is an array of 'size_t' elements, but the memory is allocated for an array of 'size_t *' elements. Found by Linux Verification Center (linuxtesting.org) with SVACE. Link: https://lkml.kernel.org/r/20221129092002.3538384-1-Ilia.Gavrilov@infotecs.ru Fixes: b86ff981a825 ("[PATCH] relay: migrate from relayfs to a generic relay API") Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: wuchi <wuchi.zero@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11ocfs2: always read both high and low parts of dinode link countAlexey Asemov
When filesystem is using indexed-dirs feature, maximum link count values can spill over to i_links_count_hi, up to OCFS2_DX_LINK_MAX links. ocfs2_read_links_count() checks for OCFS2_INDEXED_DIR_FL flag in dinode, but this flag is only valid for directories so for files the check causes high part of the link count not being read back from file dinodes resulting in wrong link count value when file has >65535 links. As ocfs2_set_links_count() always writes both high and low parts of link count, the flag check on reading may be removed. Link: https://lkml.kernel.org/r/cbfca02b-b39f-89de-e1a8-904a6c60407e@alex-at.net Signed-off-by: Alexey Asemov <alex@alex-at.net> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11io-mapping: move some code within the include guarded sectionChristophe JAILLET
It is spurious to have some code out-side the include guard in a .h file. Fix it. Link: https://lkml.kernel.org/r/4dbaf427d4300edba6c6bbfaf4d57493b9bec6ee.1669565241.git.christophe.jaillet@wanadoo.fr Fixes: 1fbaf8fc12a0 ("mm: add a io_mapping_map_user helper") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11kernel: kcsan: kcsan_test: build without structleak pluginAnders Roxell
Building kcsan_test with structleak plugin enabled makes the stack frame size to grow. kernel/kcsan/kcsan_test.c:704:1: error: the frame size of 3296 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Turn off the structleak plugin checks for kcsan_test. Link: https://lkml.kernel.org/r/20221128104358.2660634-1-anders.roxell@linaro.org Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marco Elver <elver@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Gow <davidgow@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11mailmap: update email for Iskren ChernevIskren Chernev
I'm sunsetting my gmail account and moving to personal domain. Link: https://lkml.kernel.org/r/20221124114356.2187901-1-me@iskren.info Signed-off-by: Iskren Chernev <me@iskren.info> Acked-by: Iskren Chernev <iskren.chernev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFDZhang Qilong
Commit ee62c6b2dc93 ("eventfd: change int to __u64 in eventfd_signal()") forgot to change int to __u64 in the CONFIG_EVENTFD=n stub function. Link: https://lkml.kernel.org/r/20221124140154.104680-1-zhangqilong3@huawei.com Fixes: ee62c6b2dc93 ("eventfd: change int to __u64 in eventfd_signal()") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Cc: Dylan Yudaken <dylany@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sha Zhengju <handai.szj@taobao.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11rapidio: fix possible UAF when kfifo_alloc() failsWang Weiyang
If kfifo_alloc() fails in mport_cdev_open(), goto err_fifo and just free priv. But priv is still in the chdev->file_list, then list traversal may cause UAF. This fixes the following smatch warning: drivers/rapidio/devices/rio_mport_cdev.c:1930 mport_cdev_open() warn: '&priv->list' not removed from list Link: https://lkml.kernel.org/r/20221123095147.52408-1-wangweiyang2@huawei.com Fixes: e8de370188d0 ("rapidio: add mport char device driver") Signed-off-by: Wang Weiyang <wangweiyang2@huawei.com> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Jakob Koschel <jakobkoschel@gmail.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11relay: use strscpy() is more robust and saferXu Panda
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL terminated strings. Link: https://lkml.kernel.org/r/202211220853259244666@zte.com.cn Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: wuchi <wuchi.zero@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11dt-bindings: leds: Add missing references to common LED schemaRob Herring
'led' nodes should have a reference to LED common.yaml schema. Add it where missing and drop any duplicate properties. Acked-by: Lee Jones <lee@kernel.org> Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221207204327.2810001-2-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-11dt-bindings: leds: intel,lgm: Add missing 'led-gpios' propertyRob Herring
The example has 'led-gpio' properties, but that's not documented. As the 'gpio' form is deprecated, add 'led-gpios' to the schema and update the example. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221207204327.2810001-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-11of: overlay: fix null pointer dereferencing in find_dup_cset_node_entry() ↵ruanjinjie
and find_dup_cset_prop() When kmalloc() fail to allocate memory in kasprintf(), fn_1 or fn_2 will be NULL, and strcmp() will cause null pointer dereference. Fixes: 2fe0e8769df9 ("of: overlay: check prevents multiple fragments touching same property") Signed-off-by: ruanjinjie <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20221211023337.592266-1-ruanjinjie@huawei.com Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-11Linux 6.1v6.1Linus Torvalds
2022-12-11Merge tag 'iommu-fix-v6.1-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fix from Joerg Roedel: - Fix device mask to catch all affected devices in the recently added quirk for QAT devices in the Intel VT-d driver. * tag 'iommu-fix-v6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Fix buggy QAT device mask
2022-12-10Merge tag 'mm-hotfixes-stable-2022-12-10-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Nine hotfixes. Six for MM, three for other areas. Four of these patches address post-6.0 issues" * tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: memcg: fix possible use-after-free in memcg_write_event_control() MAINTAINERS: update Muchun Song's email mm/gup: fix gup_pud_range() for dax mmap: fix do_brk_flags() modifying obviously incorrect VMAs mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit tmpfs: fix data loss from failed fallocate kselftests: cgroup: update kmem test precision tolerance mm: do not BUG_ON missing brk mapping, because userspace can unmap it mailmap: update Matti Vaittinen's email address
2022-12-10ipvs: run_estimation should control the kthread tasksJulian Anastasov
Change the run_estimation flag to start/stop the kthread tasks. Signed-off-by: Julian Anastasov <ja@ssi.bg> Cc: yunhong-cgl jiang <xintian1976@gmail.com> Cc: "dust.li" <dust.li@linux.alibaba.com> Reviewed-by: Jiri Wiesner <jwiesner@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-12-10ipvs: add est_cpulist and est_nice sysctl varsJulian Anastasov
Allow the kthreads for stats to be configured for specific cpulist (isolation) and niceness (scheduling priority). Signed-off-by: Julian Anastasov <ja@ssi.bg> Cc: yunhong-cgl jiang <xintian1976@gmail.com> Cc: "dust.li" <dust.li@linux.alibaba.com> Reviewed-by: Jiri Wiesner <jwiesner@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-12-10ipvs: use kthreads for stats estimationJulian Anastasov
Estimating all entries in single list in timer context by single CPU causes large latency with multiple IPVS rules as reported in [1], [2], [3]. Spread the estimator structures in multiple chains and use kthread(s) for the estimation. The chains are processed in multiple (50) timer ticks to ensure the 2-second interval between estimations with some accuracy. Every chain is processed under RCU lock. Every kthread works over its own data structure and all such contexts are attached to array. The contexts can be preserved while the kthread tasks are stopped or restarted. When estimators are removed, unused kthread contexts are released and the slots in array are left empty. First kthread determines parameters to use, eg. maximum number of estimators to process per kthread based on chain's length (chain_max), allowing sub-100us cond_resched rate and estimation taking up to 1/8 of the CPU capacity to avoid any problems if chain_max is not correctly calculated. chain_max is calculated taking into account factors such as CPU speed and memory/cache speed where the cache_factor (4) is selected from real tests with current generation of CPU/NUMA configurations to correct the difference in CPU usage between cached (during calc phase) and non-cached (working) state of the estimated per-cpu data. First kthread also plays the role of distributor of added estimators to all kthreads, keeping low the time to add estimators. The optimization is based on the fact that newly added estimator should be estimated after 2 seconds, so we have the time to offload the adding to chain from controlling process to kthread 0. The allocated kthread context may grow from 1 to 50 allocated structures for timer ticks which saves memory for setups with small number of estimators. We also add delayed work est_reload_work that will make sure the kthread tasks are properly started/stopped. ip_vs_start_estimator() is changed to report errors which allows to safely store the estimators in allocated structures. Many thanks to Jiri Wiesner for his valuable comments and for spending a lot of time reviewing and testing the changes on different platforms with 48-256 CPUs and 1-8 NUMA nodes under different cpufreq governors. [1] Report from Yunhong Jiang: https://lore.kernel.org/netdev/D25792C1-1B89-45DE-9F10-EC350DC04ADC@gmail.com/ [2] https://marc.info/?l=linux-virtual-server&m=159679809118027&w=2 [3] Report from Dust: https://archive.linuxvirtualserver.org/html/lvs-devel/2020-12/msg00000.html Signed-off-by: Julian Anastasov <ja@ssi.bg> Cc: yunhong-cgl jiang <xintian1976@gmail.com> Cc: "dust.li" <dust.li@linux.alibaba.com> Reviewed-by: Jiri Wiesner <jwiesner@suse.de> Tested-by: Jiri Wiesner <jwiesner@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-12-10ipvs: use u64_stats_t for the per-cpu countersJulian Anastasov
Use the provided u64_stats_t type to avoid load/store tearing. Fixes: 316580b69d0a ("u64_stats: provide u64_stats_t type") Signed-off-by: Julian Anastasov <ja@ssi.bg> Cc: yunhong-cgl jiang <xintian1976@gmail.com> Cc: "dust.li" <dust.li@linux.alibaba.com> Reviewed-by: Jiri Wiesner <jwiesner@suse.de> Tested-by: Jiri Wiesner <jwiesner@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-12-10ipvs: use common functions for stats allocationJulian Anastasov
Move alloc_percpu/free_percpu logic in new functions Signed-off-by: Julian Anastasov <ja@ssi.bg> Cc: yunhong-cgl jiang <xintian1976@gmail.com> Cc: "dust.li" <dust.li@linux.alibaba.com> Reviewed-by: Jiri Wiesner <jwiesner@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-12-10ipvs: add rcu protection to statsJulian Anastasov
In preparation to using RCU locking for the list with estimators, make sure the struct ip_vs_stats are released after RCU grace period by using RCU callbacks. This affects ipvs->tot_stats where we can not use RCU callbacks for ipvs, so we use allocated struct ip_vs_stats_rcu. For services and dests we force RCU callbacks for all cases. Signed-off-by: Julian Anastasov <ja@ssi.bg> Cc: yunhong-cgl jiang <xintian1976@gmail.com> Cc: "dust.li" <dust.li@linux.alibaba.com> Reviewed-by: Jiri Wiesner <jwiesner@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-12-10Merge branch 'stricter register ID checking in regsafe()'Alexei Starovoitov
Eduard Zingerman says: ==================== This patch-set consists of a series of bug fixes for register ID tracking in verifier.c:states_equal()/regsafe() functions: - for registers of type PTR_TO_MAP_{KEY,VALUE}, PTR_TO_PACKET[_META] the regsafe() should call check_ids() even if registers are byte-to-byte equal; - states_equal() must maintain idmap that covers all function frames in the state because functions like mark_ptr_or_null_regs() operate on all registers in the state; - regsafe() must compare spin lock ids for PTR_TO_MAP_VALUE registers. The last point covers issue reported by Kumar Kartikeya Dwivedi in [1], I borrowed the test commit from there. Note, that there is also an issue with register id tracking for scalars described here [2], it would be addressed separately. [1] https://lore.kernel.org/bpf/20221111202719.982118-1-memxor@gmail.com/ [2] https://lore.kernel.org/bpf/20221128163442.280187-2-eddyz87@gmail.com/ Eduard Zingerman (6): bpf: regsafe() must not skip check_ids() selftests/bpf: test cases for regsafe() bug skipping check_id() bpf: states_equal() must build idmap for all function frames selftests/bpf: verify states_equal() maintains idmap across all frames bpf: use check_ids() for active_lock comparison selftests/bpf: test case for relaxed prunning of active_lock.id ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-10selftests/bpf: test case for relaxed prunning of active_lock.idEduard Zingerman
Check that verifier.c:states_equal() uses check_ids() to match consistent active_lock/map_value configurations. This allows to prune states with active spin locks even if numerical values of active_lock ids do not match across compared states. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20221209135733.28851-8-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-10selftests/bpf: Add pruning test case for bpf_spin_lockKumar Kartikeya Dwivedi
Test that when reg->id is not same for the same register of type PTR_TO_MAP_VALUE between current and old explored state, we currently return false from regsafe and continue exploring. Without the fix in prior commit, the test case fails. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20221209135733.28851-7-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-10bpf: use check_ids() for active_lock comparisonEduard Zingerman
An update for verifier.c:states_equal()/regsafe() to use check_ids() for active spin lock comparisons. This fixes the issue reported by Kumar Kartikeya Dwivedi in [1] using technique suggested by Edward Cree. W/o this commit the verifier might be tricked to accept the following program working with a map containing spin locks: 0: r9 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=1. 1: r8 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=2. 2: if r9 == 0 goto exit ; r9 -> PTR_TO_MAP_VALUE. 3: if r8 == 0 goto exit ; r8 -> PTR_TO_MAP_VALUE. 4: r7 = ktime_get_ns() ; Unbound SCALAR_VALUE. 5: r6 = ktime_get_ns() ; Unbound SCALAR_VALUE. 6: bpf_spin_lock(r8) ; active_lock.id == 2. 7: if r6 > r7 goto +1 ; No new information about the state ; is derived from this check, thus ; produced verifier states differ only ; in 'insn_idx'. 8: r9 = r8 ; Optionally make r9.id == r8.id. --- checkpoint --- ; Assume is_state_visisted() creates a ; checkpoint here. 9: bpf_spin_unlock(r9) ; (a,b) active_lock.id == 2. ; (a) r9.id == 2, (b) r9.id == 1. 10: exit(0) Consider two verification paths: (a) 0-10 (b) 0-7,9-10 The path (a) is verified first. If checkpoint is created at (8) the (b) would assume that (8) is safe because regsafe() does not compare register ids for registers of type PTR_TO_MAP_VALUE. [1] https://lore.kernel.org/bpf/20221111202719.982118-1-memxor@gmail.com/ Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Suggested-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20221209135733.28851-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-10selftests/bpf: verify states_equal() maintains idmap across all framesEduard Zingerman
A test case that would erroneously pass verification if verifier.c:states_equal() maintains separate register ID mappings for call frames. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20221209135733.28851-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-10bpf: states_equal() must build idmap for all function framesEduard Zingerman
verifier.c:states_equal() must maintain register ID mapping across all function frames. Otherwise the following example might be erroneously marked as safe: main: fp[-24] = map_lookup_elem(...) ; frame[0].fp[-24].id == 1 fp[-32] = map_lookup_elem(...) ; frame[0].fp[-32].id == 2 r1 = &fp[-24] r2 = &fp[-32] call foo() r0 = 0 exit foo: 0: r9 = r1 1: r8 = r2 2: r7 = ktime_get_ns() 3: r6 = ktime_get_ns() 4: if (r6 > r7) goto skip_assign 5: r9 = r8 skip_assign: ; <--- checkpoint 6: r9 = *r9 ; (a) frame[1].r9.id == 2 ; (b) frame[1].r9.id == 1 7: if r9 == 0 goto exit: ; mark_ptr_or_null_regs() transfers != 0 info ; for all regs sharing ID: ; (a) r9 != 0 => &frame[0].fp[-32] != 0 ; (b) r9 != 0 => &frame[0].fp[-24] != 0 8: r8 = *r8 ; (a) r8 == &frame[0].fp[-32] ; (b) r8 == &frame[0].fp[-32] 9: r0 = *r8 ; (a) safe ; (b) unsafe exit: 10: exit While processing call to foo() verifier considers the following execution paths: (a) 0-10 (b) 0-4,6-10 (There is also path 0-7,10 but it is not interesting for the issue at hand. (a) is verified first.) Suppose that checkpoint is created at (6) when path (a) is verified, next path (b) is verified and (6) is reached. If states_equal() maintains separate 'idmap' for each frame the mapping at (6) for frame[1] would be empty and regsafe(r9)::check_ids() would add a pair 2->1 and return true, which is an error. If states_equal() maintains single 'idmap' for all frames the mapping at (6) would be { 1->1, 2->2 } and regsafe(r9)::check_ids() would return false when trying to add a pair 2->1. This issue was suggested in the following discussion: https://lore.kernel.org/bpf/CAEf4BzbFB5g4oUfyxk9rHy-PJSLQ3h8q9mV=rVoXfr_JVm8+1Q@mail.gmail.com/ Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20221209135733.28851-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-10selftests/bpf: test cases for regsafe() bug skipping check_id()Eduard Zingerman
Under certain conditions it was possible for verifier.c:regsafe() to skip check_id() call. This commit adds negative test cases previously errorneously accepted as safe. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20221209135733.28851-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-10bpf: regsafe() must not skip check_ids()Eduard Zingerman
The verifier.c:regsafe() has the following shortcut: equal = memcmp(rold, rcur, offsetof(struct bpf_reg_state, parent)) == 0; ... if (equal) return true; Which is executed regardless old register type. This is incorrect for register types that might have an ID checked by check_ids(), namely: - PTR_TO_MAP_KEY - PTR_TO_MAP_VALUE - PTR_TO_PACKET_META - PTR_TO_PACKET The following pattern could be used to exploit this: 0: r9 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=1. 1: r8 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=2. 2: r7 = ktime_get_ns() ; Unbound SCALAR_VALUE. 3: r6 = ktime_get_ns() ; Unbound SCALAR_VALUE. 4: if r6 > r7 goto +1 ; No new information about the state ; is derived from this check, thus ; produced verifier states differ only ; in 'insn_idx'. 5: r9 = r8 ; Optionally make r9.id == r8.id. --- checkpoint --- ; Assume is_state_visisted() creates a ; checkpoint here. 6: if r9 == 0 goto <exit> ; Nullness info is propagated to all ; registers with matching ID. 7: r1 = *(u64 *) r8 ; Not always safe. Verifier first visits path 1-7 where r8 is verified to be not null at (6). Later the jump from 4 to 6 is examined. The checkpoint for (6) looks as follows: R8_rD=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0) R9_rwD=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0) R10=fp0 The current state is: R0=... R6=... R7=... fp-8=... R8=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0) R9=map_value_or_null(id=1,off=0,ks=4,vs=8,imm=0) R10=fp0 Note that R8 states are byte-to-byte identical, so regsafe() would exit early and skip call to check_ids(), thus ID mapping 2->2 will not be added to 'idmap'. Next, states for R9 are compared: these are not identical and check_ids() is executed, but 'idmap' is empty, so check_ids() adds mapping 2->1 to 'idmap' and returns success. This commit pushes the 'equal' down to register types that don't need check_ids(). Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20221209135733.28851-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-10fs: sysv: Fix sysv_nblocks() returns wrong valueChen Zhongjin
sysv_nblocks() returns 'blocks' rather than 'res', which only counting the number of triple-indirect blocks and causing sysv_getattr() gets a wrong result. [AV: this is actually a sysv counterpart of minixfs fix - 0fcd426de9d0 "[PATCH] minix block usage counting fix" in historical tree; mea culpa, should've thought to check fs/sysv back then...] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-12-10NFSv4.2: Change the default KConfig value for READ_PLUSAnna Schumaker
Now that we've worked out performance issues and have a server patch addressing the failed xfstests, we can safely enable this feature by default. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-10Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM fix from Russell King: "One further ARM fix for 6.1 from Wang Kefeng, fixing up the handling for kfence faults" * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9278/1: kfence: only handle translation faults
2022-12-10NFSD: Avoid clashing function prototypesKees Cook
When built with Control Flow Integrity, function prototypes between caller and function declaration must match. These mismatches are visible at compile time with the new -Wcast-function-type-strict in Clang[1]. There were 97 warnings produced by NFS. For example: fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)') to 'nfsd4_dec' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, void *)') converts to incompatible function type [-Wcast-function-type-strict] [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The enc/dec callbacks were defined as passing "void *" as the second argument, but were being implicitly cast to a new type. Replace the argument with union nfsd4_op_u, and perform explicit member selection in the function body. There are no resulting binary differences. Changes were made mechanically using the following Coccinelle script, with minor by-hand fixes for members that didn't already match their existing argument name: @find@ identifier func; type T, opsT; identifier ops, N; @@ opsT ops[] = { [N] = (T) func, }; @already_void@ identifier find.func; identifier name; @@ func(..., -void +union nfsd4_op_u *name) { ... } @proto depends on !already_void@ identifier find.func; type T; identifier name; position p; @@ func@p(..., T name ) { ... } @script:python get_member@ type_name << proto.T; member; @@ coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0]) @convert@ identifier find.func; type proto.T; identifier proto.name; position proto.p; identifier get_member.member; @@ func@p(..., - T name + union nfsd4_op_u *u ) { + T name = &u->member; ... } @cast@ identifier find.func; type T, opsT; identifier ops, N; @@ opsT ops[] = { [N] = - (T) func, }; Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: linux-nfs@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10SUNRPC: Fix crasher in unwrap_integ_data()Chuck Lever
If a zero length is passed to kmalloc() it returns 0x10, which is not a valid address. gss_verify_mic() subsequently crashes when it attempts to dereference that pointer. Instead of allocating this memory on every call based on an untrusted size value, use a piece of dynamically-allocated scratch memory that is always available. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10SUNRPC: Make the svc_authenticate tracepoint conditionalChuck Lever
Clean up: Simplify the tracepoint's only call site. Also, I noticed that when svc_authenticate() returns SVC_COMPLETE, it leaves rq_auth_stat set to an error value. That doesn't need to be recorded in the trace log. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10NFSD: Use only RQ_DROPME to signal the need to drop a replyChuck Lever
Clean up: NFSv2 has the only two usages of rpc_drop_reply in the NFSD code base. Since NFSv2 is going away at some point, replace these in order to simplify the "drop this reply?" check in nfsd_dispatch(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10SUNRPC: Clean up xdr_write_pages()Chuck Lever
Make it more evident how xdr_write_pages() updates the tail buffer by using the convention of naming the iov pointer variable "tail". I spent more than a couple of hours chasing through code to understand this, so someone is likely to find this useful later. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() failsChuck Lever
Fixes: 030d794bf498 ("SUNRPC: Use gssproxy upcall for server RPCGSS authentication.") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10NFSD: add CB_RECALL_ANY tracepointsDai Ngo
Add tracepoints to trace start and end of CB_RECALL_ANY operation. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> [ cel: added show_rca_mask() macro ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>