path: root/include
Age | Commit message | Author
2020-06-24 | block: move block-related definitions out of fs.h | Christoph Hellwig
Move most of the block related definitions out of fs.h into more suitable headers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 | block: simplify sb_is_blkdev_sb | Christoph Hellwig
Just use IS_ENABLED instead of providing a stub for !CONFIG_BLOCK. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 | fs: remove the mount_bdev and kill_block_super stubs | Christoph Hellwig
No one calls these functions without CONFIG_BLOCK, so don't bother stubbing them out. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 | fs: remove the HAVE_UNLOCKED_IOCTL and HAVE_COMPAT_IOCTL defines | Christoph Hellwig
These are not defined anywhere, and contrary to the comments we really do not care about out of tree code at all. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 | fs: remove an unused block_device_operations forward declaration | Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 | block: mark bd_finish_claiming static | Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 | tty/sysrq: emergency_thaw_all does not depend on CONFIG_BLOCK | Christoph Hellwig
We can also thaw non-block file systems. Remove the CONFIG_BLOCK in sysrq.c after making the prototype available unconditionally. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 | block: create the request_queue debugfs_dir on registration | Luis Chamberlain
We were creating the request_queue debugfs_dir only for make_request block drivers (multiqueue), but never for request-based block drivers. We did this as we were only creating non-blktrace additional debugfs files on that directory for make_request drivers. However, since blktrace *always* creates that directory anyway, we special-case the use of that directory on blktrace. Other than this being an eyesore, this exposes request-based block drivers to the same fragile debugfs race that used to exist with make_request block drivers: if we start adding files onto that directory, we can later race with a double removal of dentries on the directory if we don't deal with this carefully on blktrace. Instead, just simplify things by always creating the request_queue debugfs_dir on request_queue registration. Also rename the mutex to reflect the fact that it is used outside of the blktrace context. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 | block: revert back to synchronous request_queue removal | Luis Chamberlain
Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") merged on v4.12 moved the work behind blk_release_queue() into a workqueue after a splat floated around which indicated some work on blk_release_queue() could sleep in blk_exit_rl(). This splat would be possible when a driver called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue() as its final call) from an atomic context. blk_put_queue() decrements the refcount for the request_queue kobject, and upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() is now removed through commit db6d99523560 ("block: remove request_list code") on v5.0, we reserve the right to be able to sleep within blk_release_queue() context. The last reference for the request_queue must not be called from atomic context. *When* the last reference to the request_queue reaches 0 varies, and so let's take the opportunity to document when that is expected to happen and also document the context of the related calls as best as possible so we can avoid future issues, and with the hopes that the synchronous request_queue removal sticks. We revert back to synchronous request_queue removal because asynchronous removal creates a regression with expected userspace interaction with several drivers. An example is when removing the loopback driver, one uses ioctls from userspace to do so, but upon return and if successful, one expects the device to be removed. Likewise if one races to add another device the new one may not be added as it is still being removed. This was expected behavior before and it now fails as the device is still present and busy still. Moving to asynchronous request_queue removal could have broken many scripts which relied on the removal to have been completed if there was no error. Document this expectation as well so that this doesn't regress userspace again. Using asynchronous request_queue removal however has helped us find other bugs. 
In the future we can test what could break with this arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE. While at it, update the docs with the context expectations for the request_queue / gendisk refcount decrement, and make these expectations explicit by using might_sleep(). Fixes: dc9edc44de6c ("block: Fix a blk_exit_rl() regression") Suggested-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: yu kuai <yukuai3@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 | blk-mq: add a new blk_mq_complete_request_remote API | Christoph Hellwig
This is a variant of blk_mq_complete_request that only completes the request if it needs to be bounced to another CPU or a softirq. If the request can be completed locally the function returns false and lets the driver complete it without requiring an indirect function call. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
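The calling convention described in this commit can be modelled in user space. The sketch below is only an illustration: `struct mock_request`, `mock_complete_request_remote()` and `mock_driver_complete()` are hypothetical stand-ins, and the "different CPU" test is a simplifying assumption, not the kernel's actual decision logic.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a block-layer request; not the kernel's struct request. */
struct mock_request {
	int submit_cpu;     /* CPU the request was submitted on */
	bool done_locally;  /* set when the driver completed it inline */
	bool bounced;       /* set when completion was handed off elsewhere */
};

/*
 * Models the contract from the commit message: returns true if the
 * completion had to be bounced (and is then taken care of), false if
 * the caller may complete the request itself.
 * Assumption: "needs bouncing" is reduced to "submitted on another CPU".
 */
static bool mock_complete_request_remote(struct mock_request *rq, int this_cpu)
{
	if (rq->submit_cpu != this_cpu) {
		rq->bounced = true;
		return true;
	}
	return false;
}

/* Driver-side completion path: a direct call, no indirect function pointer. */
static void mock_driver_complete(struct mock_request *rq, int this_cpu)
{
	if (!mock_complete_request_remote(rq, this_cpu))
		rq->done_locally = true;
}
```

The point of the pattern is that the common local case ends in a direct call inside the driver rather than a call through a function pointer.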
2020-06-24 | blk-mq: move failure injection out of blk_mq_complete_request | Christoph Hellwig
Move the call to blk_should_fake_timeout out of blk_mq_complete_request and into the drivers, skipping call sites that are obvious error handlers, and remove the now superfluous blk_mq_force_complete_rq helper. This ensures we don't keep injecting errors into completions that just terminate the Linux request after the hardware has been reset or the command has been aborted. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 | blk-mq: merge blk-softirq.c into blk-mq.c | Christoph Hellwig
__blk_complete_request is only called from the blk-mq code, and duplicates a lot of code from blk-mq.c. Move it there to prepare for better code sharing and simplifications. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 | PM / EM: update callback structure and add device pointer | Lukasz Luba
The Energy Model framework is going to support devices other than CPUs. In order to make this happen, change the callback function and add a pointer to a device as an argument. Update the related users to use the new function and new callback from the Energy Model. Acked-by: Quentin Perret <qperret@google.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24 | PM / EM: introduce em_dev_register_perf_domain function | Lukasz Luba
Add a new function to the Energy Model framework, which is going to support new devices. This function will help in the transition and make it smoother. For now it still checks if the cpumask is a valid pointer, which will be removed later when the new structures and infrastructure are ready. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Quentin Perret <qperret@google.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24 | PM / EM: change naming convention from 'capacity' to 'performance' | Lukasz Luba
The Energy Model uses the concept of performance domains and capacity states in order to calculate power used by CPUs. Changing the naming convention from capacity to performance state enables wider usage in future, e.g. upcoming support for devices other than CPUs. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Quentin Perret <qperret@google.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24 | vt: redefine world of cursor macros | Jiri Slaby
The cursor code used to use magic constants, ANDs, ORs, and some macros. Redefine all this to make some sense. In particular: * Drop CUR_DEFAULT, which is CUR_UNDERLINE. CUR_DEFAULT was used only for cur_default variable initialization, so use CUR_UNDERLINE there to make obvious what's the default. * Drop CUR_HWMASK. Instead, define CUR_SIZE() which explains it more. And use it all over the places. * Define few more masks and bits which will be used in next patches instead of magic constants. * Define CUR_MAKE to build up cursor value. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-fbdev@vger.kernel.org Link: https://lore.kernel.org/r/20200615074910.19267-25-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24 | vt: move vc_translate to vt.c and rename it | Jiri Slaby
vc_translate is used only in vt.c, so move the definition from a header there. Also, it used to be a macro, so be modern and make a static inline from it. This makes the code actually readable. And as a preparation for next patches, rename it to vc_translate_ascii. vc_translate will be a wrapper for both unicode and this one. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200615074910.19267-10-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24 | vt: get rid of VT10.ID macros | Jiri Slaby
VT100ID is unused, but defined twice. Kill it. VT102ID is used only in respond_ID. Define there a variable with proper type and use that instead. Then drop both defines of VT102ID too. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200615074910.19267-9-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24 | vt: remove 25 years stale comment | Jiri Slaby
vc_cons was made global (non-static) in 1.3.38, almost 25 years ago. Remove a comment which says that it would be a disadvantage to do so :P. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200615074910.19267-7-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24 | vt: convert vc_tab_stop to bitmap | Jiri Slaby
vc_tab_stop is used as a bitmap, but defined as an unsigned int array. Switch it to bitmap and convert all users to the bitmap interface. Note the difference in behavior! We no longer mask the top 24 bits away from x, hence we do not wrap tabs at 256th column. Instead, we silently drop attempts to set a tab behind 256 columns. And we will also seek by '\t' to the rightmost column, when behind that boundary. I do not think the original behavior was desired and that someone relies on that. If this turns out to be the case, we can change the added 'if's back to masks here and there instead... (Or we can increase the limit as fb consoles now have 240 chars here. And they could have more with higher than my resolution, of course.) Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200615074910.19267-6-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
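The behaviour change described in this commit (no wrapping at the 256th column, out-of-range tab stops silently dropped, '\t' seeking to the rightmost column) can be sketched in user space. This is a minimal sketch under stated assumptions: the `ex_`-prefixed names and the hand-rolled bitmap are hypothetical and stand in for the kernel's bitmap interface; they are not the vt.c code.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_TAB_COLUMNS   256  /* limit taken from the commit message */
#define EX_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define EX_TAB_WORDS     ((EX_TAB_COLUMNS + EX_BITS_PER_LONG - 1) / EX_BITS_PER_LONG)

/* Hypothetical container for the tab-stop bitmap. */
struct ex_tabs {
	unsigned long bits[EX_TAB_WORDS];
};

/* Set a tab stop at column x; columns past the bitmap are silently dropped. */
static void ex_tab_set(struct ex_tabs *t, unsigned int x)
{
	if (x >= EX_TAB_COLUMNS)
		return;
	t->bits[x / EX_BITS_PER_LONG] |= 1UL << (x % EX_BITS_PER_LONG);
}

/* Test for a tab stop at column x; there is no masking, so no wrap-around. */
static bool ex_tab_test(const struct ex_tabs *t, unsigned int x)
{
	if (x >= EX_TAB_COLUMNS)
		return false;
	return (t->bits[x / EX_BITS_PER_LONG] >> (x % EX_BITS_PER_LONG)) & 1UL;
}

/* Column reached by '\t' from column x on a line of cols (> 0) columns. */
static unsigned int ex_tab_next(const struct ex_tabs *t, unsigned int x,
				unsigned int cols)
{
	for (unsigned int i = x + 1; i < cols && i < EX_TAB_COLUMNS; i++)
		if (ex_tab_test(t, i))
			return i;
	return cols - 1;  /* no stop found: seek to the rightmost column */
}
```

With the old masking, setting a stop at column 300 would have landed on column 44 (300 & 0xff); in this sketch the request is simply ignored.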
2020-06-24 | vt: switch G0/1_charset to an array | Jiri Slaby
Declare Gx_charset[2] instead of G0_charset and G1_charset. It makes the code simpler (without ternary operators). Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200615074910.19267-5-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
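As a rough illustration of why the array form avoids ternary operators, consider the sketch below. Only `Gx_charset` comes from the commit; the struct, the `charset` selector field and the function name are hypothetical.

```c
/* Hypothetical reduced console state; only Gx_charset is named in the commit. */
struct ex_charset_state {
	unsigned int charset;        /* active slot: 0 selects G0, 1 selects G1 */
	unsigned int Gx_charset[2];  /* replaces separate G0_charset / G1_charset */
};

/*
 * With two separate members this would read
 *     s->charset == 0 ? s->G0_charset : s->G1_charset
 * The array form is a plain index.
 */
static unsigned int ex_active_charset(const struct ex_charset_state *s)
{
	return s->Gx_charset[s->charset];
}
```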
2020-06-24 | vc: switch state to bool | Jiri Slaby
The code currently uses bitfields to store true-false values. Switch all of that to bools. Apart from the cleanup, it saves 20B of code as many shifts, ANDs, and ORs became simple movzb's. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200615074910.19267-3-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24 | vt: introduce enum vc_intensity for intensity | Jiri Slaby
Introduce names (an enum) for the 0, 1, and 2 constants. We now have VCI_HALF_BRIGHT, VCI_NORMAL, and VCI_BOLD instead. Apart from the cleanup, 1) the enum allows for better type checking, and 2) this saves some code. No more fiddling with bits is needed in assembly now. (OTOH, the structure is larger.) Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200615074910.19267-2-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24 | vc: separate state | Jiri Slaby
There are two copies of some members of struct vc_data. This is because we need to save them and restore later. Move these members to a separate structure called vc_state. So now instead of members like:
  vc_x, vc_y and vc_saved_x, vc_saved_y
we have state and saved_state (of type: struct vc_state) containing
  state.x, state.y and saved_state.x, saved_state.y
This change:
* makes clear what is saved & restored
* eases save & restore by using memcpy (see save_cur and restore_cur)
Finally, we document the newly added struct vc_state using kernel-doc. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200615074910.19267-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
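The save/restore simplification this commit describes can be sketched as follows. The `ex_` types are hypothetical and carry only a few illustrative fields; the real struct vc_state has more members.

```c
#include <string.h>

/* Hypothetical subset of the state that gets saved and restored. */
struct ex_vc_state {
	unsigned int x, y;    /* cursor position */
	unsigned char color;  /* current attribute */
};

/* Hypothetical container holding the live state and its saved copy. */
struct ex_vc_data {
	struct ex_vc_state state;
	struct ex_vc_state saved_state;
};

/* Saving is one copy instead of one assignment per vc_saved_* member. */
static void ex_save_cur(struct ex_vc_data *vc)
{
	memcpy(&vc->saved_state, &vc->state, sizeof(vc->state));
}

/* Restoring copies the whole block back. */
static void ex_restore_cur(struct ex_vc_data *vc)
{
	memcpy(&vc->state, &vc->saved_state, sizeof(vc->state));
}
```

A member added to the state struct is then saved and restored automatically, which is the "makes clear what is saved & restored" benefit noted in the commit.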
2020-06-24 | gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving | Alexander Popov
The kernel code instrumentation in stackleak gcc plugin works in two stages. At first, stack tracking is added to GIMPLE representation of every function (except some special cases). And later, when stack frame size info is available, stack tracking is removed from the RTL representation of the functions with small stack frame. There is an unwanted side-effect for these functions: some of them do useless work with caller-saved registers. As an example of such case, proc_sys_write() without instrumentation:

    55                      push   %rbp
    41 b8 01 00 00 00       mov    $0x1,%r8d
    48 89 e5                mov    %rsp,%rbp
    e8 11 ff ff ff          callq  ffffffff81284610 <proc_sys_call_handler>
    5d                      pop    %rbp
    c3                      retq
    0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
    66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
    00 00 00

proc_sys_write() with instrumentation:

    55                      push   %rbp
    48 89 e5                mov    %rsp,%rbp
    41 56                   push   %r14
    41 55                   push   %r13
    41 54                   push   %r12
    53                      push   %rbx
    49 89 f4                mov    %rsi,%r12
    48 89 fb                mov    %rdi,%rbx
    49 89 d5                mov    %rdx,%r13
    49 89 ce                mov    %rcx,%r14
    4c 89 f1                mov    %r14,%rcx
    4c 89 ea                mov    %r13,%rdx
    4c 89 e6                mov    %r12,%rsi
    48 89 df                mov    %rbx,%rdi
    41 b8 01 00 00 00       mov    $0x1,%r8d
    e8 f2 fe ff ff          callq  ffffffff81298e80 <proc_sys_call_handler>
    5b                      pop    %rbx
    41 5c                   pop    %r12
    41 5d                   pop    %r13
    41 5e                   pop    %r14
    5d                      pop    %rbp
    c3                      retq
    66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
    00 00

Let's improve the instrumentation to avoid this:

1. Make stackleak_track_stack() save all registers that it works with. Use no_caller_saved_registers attribute for that function. This attribute is available for x86_64 and i386 starting from gcc-7.

2. Insert calling stackleak_track_stack() in asm:

    asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer))

   Here we use ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h. The input constraint is taken into account during gcc shrink-wrapping optimization. It is needed to be sure that stackleak_track_stack() call is inserted after the prologue of the containing function, when the stack frame is prepared.
This work is a deep reengineering of the idea described on grsecurity blog https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Link: https://lore.kernel.org/r/20200624123330.83226-5-alex.popov@linux.com Signed-off-by: Kees Cook <keescook@chromium.org>
2020-06-24 | USB: ch9: add "USB_" prefix in front of TEST defines | Greg Kroah-Hartman
For some reason, the TEST_ defines in the usb/ch9.h files did not have the USB_ prefix on it, making it a bit confusing when reading the file, as well as not the nicest thing to do in a uapi file. So fix that up and add the USB_ prefix on to them, and fix up all in-kernel usages. This included deleting the duplicate copy in the net2272.h file. Cc: Felipe Balbi <balbi@kernel.org> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Mathias Nyman <mathias.nyman@intel.com> Cc: Pawel Laszczak <pawell@cadence.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jia-Ju Bai <baijiaju1990@gmail.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jules Irenge <jbi.octave@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Cc: Rob Gill <rrobgill@protonmail.com> Cc: Macpaul Lin <macpaul.lin@mediatek.com> Acked-by: Minas Harutyunyan <hminas@synopsys.com> Acked-by: Bin Liu <b-liu@ti.com> Acked-by: Chunfeng Yun <chunfeng.yun@mediatek.com> Acked-by: Peter Chen <peter.chen@nxp.com> Link: https://lore.kernel.org/r/20200618144206.2655890-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24 | RDMA: Add support to dump resource tracker in RAW format | Maor Gottlieb
Add support to get resource dump in raw format. It enables drivers to return the entire device specific QP/CQ/MR context without a need from the driver to set each field separately. The raw query returns only the device specific data, general data is still returned by using the existing queries. Example:

    $ rdma res show mr dev mlx5_1 mrn 2 -r -j
    [{"ifindex":7,"ifname":"mlx5_1", "data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,0,216,...]}]

Link: https://lore.kernel.org/r/20200623113043.1228482-9-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 | drm/mipi-dbi: Remove ->enabled | Daniel Vetter
The atomic helpers try really hard to not lose track of things, duplicating enabled tracking in the driver is at best confusing. Double-enabling or disabling is a bug in atomic helpers. In the fb_dirty function we can just assume that the fb always exists, simple display pipe helpers guarantee that the crtc is only enabled together with the output, so we always have a primary plane around. Now in the update function we need to be a notch more careful, since that can also get called when the crtc is off. And we don't want to upload frames when that's the case, so filter that out too. Reviewed-by: Noralf Trønnes <noralf@tronnes.org> Acked-by: David Lechner <david@lechnology.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Lechner <david@lechnology.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200612160056.2082681-7-daniel.vetter@ffwll.ch
2020-06-24 | xfrm: policy: match with both mark and mask on user interfaces | Xin Long
In commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"), it would take 'priority' to make a policy unique, and allow duplicated policies with different 'priority' to be added, which is not expected by userland, as Tobias reported in strongswan. To fix this duplicated policies issue, and also fix the issue in commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"), when doing add/del/get/update on user interfaces, this patch is to change to look up a policy with both mark and mask by doing: mark.v == pol->mark.v && mark.m == pol->mark.m and leave the check: (mark & pol->mark.m) == pol->mark.v for tx/rx path only. As the userland expects an exact mark and mask match to manage policies. v1->v2: - make xfrm_policy_mark_match inline and fix the changelog as Tobias suggested. Fixes: 295fae568885 ("xfrm: Allow user space manipulation of SPD mark") Fixes: ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list") Reported-by: Tobias Brunner <tobias@strongswan.org> Tested-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
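The two comparisons quoted in this commit message differ in an important way, which a small sketch makes visible. The struct and function names below are hypothetical; only the two expressions are taken from the commit text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mark: v is the value, m is the mask, as in the commit text. */
struct ex_mark {
	uint32_t v;
	uint32_t m;
};

/* User interfaces (add/del/get/update): value and mask must both be identical. */
static bool ex_mark_match_exact(const struct ex_mark *mark,
				const struct ex_mark *pol)
{
	return mark->v == pol->v && mark->m == pol->m;
}

/* tx/rx path: a packet mark matches if its masked bits equal the policy value. */
static bool ex_mark_match_datapath(uint32_t pkt_mark, const struct ex_mark *pol)
{
	return (pkt_mark & pol->m) == pol->v;
}
```

For a policy with v = 0x10 and m = 0xf0, a packet mark of 0x1f passes the datapath check, while a management request for v = 0x10, m = 0xff does not find that policy under the exact rule.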
2020-06-24 | dmaengine: idxd: fix hw descriptor fields for delta record | Dave Jiang
Fix the hw descriptor fields for delta record in user exported idxd.h header. Missing the "expected result mask" field. Reported-by: Mona Hossain <mona.hossain@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/159120526866.65385.536565786678052944.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-06-24 | xfrm: introduce oseq-may-wrap flag | Petr Vaněk
RFC 4303 in section 3.3.3 suggests to disable anti-replay for manually distributed ICVs in which case the sender does not need to monitor or reset the counter. However, the sender still increments the counter and when it reaches the maximum value, the counter rolls over back to zero. This patch introduces new extra_flag XFRM_SA_XFLAG_OSEQ_MAY_WRAP which allows sequence number to cycle in outbound packets if set. This flag is used only in legacy and bmp code, because esn should not be negotiated if anti-replay is disabled (see note in 3.3.3 section). Signed-off-by: Petr Vaněk <pv@excello.cz> Acked-by: Christophe Gouault <christophe.gouault@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
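The effect of the flag on the outbound counter can be sketched in isolation. This is a simplified model, not the xfrm replay code: `struct ex_sa`, its fields and `ex_next_oseq()` are hypothetical, and the boolean merely stands in for XFRM_SA_XFLAG_OSEQ_MAY_WRAP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced security association. */
struct ex_sa {
	uint32_t oseq;       /* last outbound sequence number used */
	bool oseq_may_wrap;  /* models XFRM_SA_XFLAG_OSEQ_MAY_WRAP */
};

/*
 * Stores the next outbound sequence number in *seq and returns 0,
 * or returns -1 without touching the counter if it would roll over
 * to zero and wrapping has not been allowed.
 */
static int ex_next_oseq(struct ex_sa *sa, uint32_t *seq)
{
	uint32_t next = (uint32_t)(sa->oseq + 1u);  /* wraps to 0 after UINT32_MAX */

	if (next == 0 && !sa->oseq_may_wrap)
		return -1;
	sa->oseq = next;
	*seq = next;
	return 0;
}
```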
2020-06-24 | Merge tag 'drm-misc-next-2020-06-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-next | Dave Airlie
drm-misc-next for v5.9:

UAPI Changes:
- Add DRM_MODE_TYPE_USERDEF for video modes specified in cmdline.

Cross-subsystem Changes:
- Assorted devicetree binding updates.
- Add might_sleep() to dma_fence_wait().
- Fix fbdev's get_user_pages_fast() handling, and use pin_user_pages.
- Small cleanup with IS_BUILTIN in video/fbdev drivers.
- Fix video/hdmi coding style for infoframe size.

Core Changes:
- Silence vblank output during init.
- Fix DP-MST corruption during send msg timeout.
- Clear leak in drm_gem_objects_lookup().
- Make newlines work with force connector attribute.
- Fix module refcounting error in drm_encoder_slave, and use new i2c api.
- Header fix for drm_managed.c
- More struct_mutex removal for !legacy drivers:
  - Remove gem_free_object()
  - Removal of drm_gem_object_put_unlocked().
- Show current->comm alongside pid in debug printfs.
- Add drm_client_modeset_check() + drm_client_framebuffer_flush().
- Replace drm_fb_swab16 with drm_fb_swap that also supports 32-bits.
- Remove mode->vrefresh, and compactify drm_display_mode.
- Use drm_* macros for logging and warnings.
- Add WARN when drm_gem_get_pages is used on a private obj.
- Handle importing and imported dmabuf better in shmem helpers.
- Small fix for drm/mm hole size comparison, and remove invalid entry optimization.
- Add a drm/mm selftest.
- Set DSI connector type for DSI panels.
- Assorted small fixes and documentation updates.
- Fix DDI I2C device registration for MST ports, and flushing on destroy.
- Fix master_set return type, used by vmwgfx.
- Make the drm_set/drop_master ioctl symmetrical.

Driver Changes:
- Allow iommu in the sun4i driver and use it for sun8i.
- Simplify backlight lookup for omap, amba-clcd and tilcdc.
- Hold reg_lock for rockchip.
- Add support for bridge gpio and lane reordering + polarity to ti-sn65dsi86, and fix clock choice.
- Small assorted fixes to tilcdc, vc4 (multiple), i915, omap, fbdev/sm712fb, fbdev/pxafb, console/newport_con, msm, virtio, udl, malidp, hdlcd, bridge/ti-sn65dsi86, panfrost.
- Remove hw cursor support for mgag200, and use simple kms helper + shmem helpers.
- Add support for KOE TX26D202VM0BWA panel.
- Use GEM CMA functions in arc, arm, atmel-hlcdc, fsl-dcu, hisilicon, imx, ingenic, komeda, malidp, mcde, meson, mxsfb, rcar-du, shmobile, stm, sti, tilcdc, tve200, zte.
- Remove gem_print_info.
- Improve gem_create_object_helper so udl can use shmem helpers.
- Convert vc4 dt bindings to schemas, and add clock properties.
- Device initialization cleanups for mgag200.
- Add a workaround to fix DP-MST short pulses handling on broken hardware in i915.
- Allow build test compiling arm drivers.
- Use managed pci functions in mgag200 and ast.
- Use dev_groups in malidp.
- Add per pixel alpha support for PX30 VOP in rockchip.
- Silence deferred probe logs in panfrost.

Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/001cd9a6-405d-4e29-43d8-354f53ae4e8b@linux.intel.com
2020-06-23 | scsi: libata: Fix the ata_scsi_dma_need_drain stub | Christoph Hellwig
We not only need the stub when libata is disabled, but also if it is modular and there are built-in SAS drivers (which can happen when SCSI_SAS_ATA is disabled). Link: https://lore.kernel.org/r/20200620071302.462974-2-hch@lst.de Fixes: b8f1d1e05817 ("scsi: Wire up ata_scsi_dma_need_drain for SAS HBA drivers") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-06-23 | net: Do not clear the sock TX queue in sk_set_socket() | Tariq Toukan
Clearing the sock TX queue in sk_set_socket() might cause unexpected out-of-order transmit when called from sock_orphan(), as outstanding packets can pick a different TX queue and bypass the ones already queued. This is undesired in general. More specifically, it breaks the in-order scheduling property guarantee for device-offloaded TLS sockets. Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it explicitly only where needed. Fixes: e022f0b4a03f ("net: Introduce sk_tx_queue_mapping") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23 | net: phy: Allow mdio buses to auto-probe c45 devices | Jeremy Linton
The mdiobus_scan logic is currently hardcoded to only work with c22 devices. This works fairly well in most cases, but it's possible that a c45 device doesn't respond despite being a standard phy. If the parent hardware is capable, it makes sense to scan for c22 devices before falling back to c45. As we want this to reflect the capabilities of the STA, let's add a field to the mii_bus structure to represent the capability. That way devices can opt into the extended scanning. Existing users should continue to default to c22 only scanning as long as they are zero'ing the structure before use. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Calvin Johnson <calvin.johnson@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
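The scan order described in this commit (c22 first, c45 as a fallback, zeroed structures keeping the old behaviour) can be modelled with a small sketch. All names here are hypothetical, including the enum values and the two demo probe stubs; the sketch only assumes that a zero capability value means "c22 only".

```c
#include <stdbool.h>

/* Hypothetical capability values; 0 is the default for a zeroed bus struct. */
enum ex_probe_cap { EX_NO_CAP = 0, EX_C22, EX_C45, EX_C22_C45 };

enum ex_found { EX_FOUND_NONE, EX_FOUND_C22, EX_FOUND_C45 };

/* Hypothetical bus with one probe callback per access method. */
struct ex_bus {
	enum ex_probe_cap probe_capabilities;
	bool (*probe_c22)(int addr);
	bool (*probe_c45)(int addr);
};

/* Demo stubs: no device answers c22; a c45 device sits at address 3. */
static bool ex_probe_none(int addr)  { (void)addr; return false; }
static bool ex_probe_addr3(int addr) { return addr == 3; }

/* Scan one address according to what the bus says it can do. */
static enum ex_found ex_scan(const struct ex_bus *bus, int addr)
{
	switch (bus->probe_capabilities) {
	case EX_NO_CAP:  /* zeroed struct: legacy c22-only scanning */
	case EX_C22:
		return bus->probe_c22(addr) ? EX_FOUND_C22 : EX_FOUND_NONE;
	case EX_C45:
		return bus->probe_c45(addr) ? EX_FOUND_C45 : EX_FOUND_NONE;
	case EX_C22_C45:
		if (bus->probe_c22(addr))
			return EX_FOUND_C22;
		return bus->probe_c45(addr) ? EX_FOUND_C45 : EX_FOUND_NONE;
	}
	return EX_FOUND_NONE;
}
```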
2020-06-23 | net: ipv6: Use struct_size() helper and kcalloc() | Gustavo A. R. Silva
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. Also, remove unnecessary function ipv6_rpl_srh_alloc_size() and replace kzalloc() with kcalloc(), which has a 2-factor argument form for multiplication. This code was detected with the help of Coccinelle and audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
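A user-space analogue of the idea behind struct_size() is sketched below: compute "header plus n trailing elements" with overflow checking rather than open-coded arithmetic. The `ex_` names are hypothetical, the sketch assumes the GCC/Clang `__builtin_*_overflow` builtins, and it mirrors the kernel helper only in saturating to SIZE_MAX on overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header followed by a flexible array of entries. */
struct ex_hdr {
	uint8_t count;
	uint32_t entries[];  /* flexible array member */
};

/* base + elem * n, saturating to SIZE_MAX if either step overflows. */
static size_t ex_struct_size(size_t base, size_t elem, size_t n)
{
	size_t bytes;

	if (__builtin_mul_overflow(elem, n, &bytes) ||
	    __builtin_add_overflow(bytes, base, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Allocate a zeroed header with room for n entries, or NULL on overflow. */
static struct ex_hdr *ex_hdr_alloc(size_t n)
{
	size_t bytes = ex_struct_size(sizeof(struct ex_hdr), sizeof(uint32_t), n);

	if (bytes == SIZE_MAX)
		return NULL;
	return calloc(1, bytes);
}
```

The two-argument form of `calloc()` plays a similar role for plain arrays: the multiplication happens inside the allocator, where an overflow can be detected, rather than at the call site.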
2020-06-23 | udp: move gro declarations to net/udp.h | Eric Dumazet
This removes following warnings : CC net/ipv4/udp_offload.o net/ipv4/udp_offload.c:504:17: warning: no previous prototype for 'udp4_gro_receive' [-Wmissing-prototypes] 504 | struct sk_buff *udp4_gro_receive(struct list_head *head, struct sk_buff *skb) | ^~~~~~~~~~~~~~~~ net/ipv4/udp_offload.c:584:29: warning: no previous prototype for 'udp4_gro_complete' [-Wmissing-prototypes] 584 | INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff) | ^~~~~~~~~~~~~~~~~ CHECK net/ipv6/udp_offload.c net/ipv6/udp_offload.c:115:16: warning: symbol 'udp6_gro_receive' was not declared. Should it be static? net/ipv6/udp_offload.c:148:29: warning: symbol 'udp6_gro_complete' was not declared. Should it be static? CC net/ipv6/udp_offload.o net/ipv6/udp_offload.c:115:17: warning: no previous prototype for 'udp6_gro_receive' [-Wmissing-prototypes] 115 | struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb) | ^~~~~~~~~~~~~~~~ net/ipv6/udp_offload.c:148:29: warning: no previous prototype for 'udp6_gro_complete' [-Wmissing-prototypes] 148 | INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff) | ^~~~~~~~~~~~~~~~~ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23 | net: move tcp gro declarations to net/tcp.h | Eric Dumazet
This patch removes following (C=1 W=1) warnings for CONFIG_RETPOLINE=y : net/ipv4/tcp_offload.c:306:16: warning: symbol 'tcp4_gro_receive' was not declared. Should it be static? net/ipv4/tcp_offload.c:306:17: warning: no previous prototype for 'tcp4_gro_receive' [-Wmissing-prototypes] net/ipv4/tcp_offload.c:319:29: warning: symbol 'tcp4_gro_complete' was not declared. Should it be static? net/ipv4/tcp_offload.c:319:29: warning: no previous prototype for 'tcp4_gro_complete' [-Wmissing-prototypes] CHECK net/ipv6/tcpv6_offload.c net/ipv6/tcpv6_offload.c:16:16: warning: symbol 'tcp6_gro_receive' was not declared. Should it be static? net/ipv6/tcpv6_offload.c:29:29: warning: symbol 'tcp6_gro_complete' was not declared. Should it be static? CC net/ipv6/tcpv6_offload.o net/ipv6/tcpv6_offload.c:16:17: warning: no previous prototype for 'tcp6_gro_receive' [-Wmissing-prototypes] 16 | struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) | ^~~~~~~~~~~~~~~~ net/ipv6/tcpv6_offload.c:29:29: warning: no previous prototype for 'tcp6_gro_complete' [-Wmissing-prototypes] 29 | INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff) | ^~~~~~~~~~~~~~~~~ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23 | tcp: move ipv4_specific to tcp include file | Eric Dumazet
Declare ipv4_specific once, in tcp.h where it belongs. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23 | tcp: move ipv6_specific declaration to remove a warning | Eric Dumazet
ipv6_specific should be declared in tcp include files, not mptcp. This removes the following warning : CHECK net/ipv6/tcp_ipv6.c net/ipv6/tcp_ipv6.c:78:42: warning: symbol 'ipv6_specific' was not declared. Should it be static? Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23tcp: add declarations to avoid warningsEric Dumazet
Remove these warnings:

net/ipv6/tcp_ipv6.c:1550:29: warning: symbol 'tcp_v6_rcv' was not declared. Should it be static?
net/ipv6/tcp_ipv6.c:1770:30: warning: symbol 'tcp_v6_early_demux' was not declared. Should it be static?
net/ipv6/tcp_ipv6.c:1550:29: warning: no previous prototype for 'tcp_v6_rcv' [-Wmissing-prototypes]
 1550 | INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
      |                             ^~~~~~~~~~
net/ipv6/tcp_ipv6.c:1770:30: warning: no previous prototype for 'tcp_v6_early_demux' [-Wmissing-prototypes]
 1770 | INDIRECT_CALLABLE_SCOPE void tcp_v6_early_demux(struct sk_buff *skb)
      |                              ^~~~~~~~~~~~~~~~~~

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23bpf: Fix formatting in documentation for BPF helpersQuentin Monnet
When producing the bpf-helpers.7 man page from the documentation from the BPF user space header file, rst2man complains: <stdin>:2636: (ERROR/3) Unexpected indentation. <stdin>:2640: (WARNING/2) Block quote ends without a blank line; unexpected unindent. Let's fix formatting for the relevant chunk (item list in bpf_ringbuf_query()'s description), and for a couple other functions. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200623153935.6215-1-quentin@isovalent.com
2020-06-23audit: log nftables configuration change eventsRichard Guy Briggs
iptables, ip6tables, arptables and ebtables table registration, replacement and unregistration configuration events are logged for the native (legacy) iptables setsockopt api, but not for the nftables netlink api which is used by the nft-variant of iptables in addition to nftables itself.

Add calls to log the configuration actions in the nftables netlink api.

This uses the same NETFILTER_CFG record format but overloads the table field.

  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.878:162) : table=?:0;?:0 family=unspecified entries=2 op=nft_register_gen pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
  ...
  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.878:162) : table=firewalld:1;?:0 family=inet entries=0 op=nft_register_table pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
  ...
  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;filter_FORWARD:85 family=inet entries=8 op=nft_register_chain pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
  ...
  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;filter_FORWARD:85 family=inet entries=101 op=nft_register_rule pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
  ...
  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;__set0:87 family=inet entries=87 op=nft_register_setelem pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
  ...
  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;__set0:87 family=inet entries=0 op=nft_register_set pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld

For further information please see issue https://github.com/linux-audit/audit-kernel/issues/124

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-06-23security: Fix hook iteration and default value for inode_copy_up_xattrKP Singh
inode_copy_up_xattr returns 0 to indicate the acceptance of the xattr and 1 to reject it. If the LSM does not know about the xattr, it's expected to return -EOPNOTSUPP, which is the correct default value for this hook. BPF LSM, currently, uses 0 as the default value and thereby falsely allows all overlay fs xattributes to be copied up. The iteration logic is also updated from the "bail-on-fail" call_int_hook to continue on the non-decisive -EOPNOTSUPP and bail out on other values. Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks") Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: James Morris <jmorris@namei.org>
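The iteration rule described in this commit message (keep going on the non-decisive -EOPNOTSUPP, stop on any other value, default to -EOPNOTSUPP) can be sketched in user-space C as follows. This is only an illustration of the control flow, not the kernel code: the hook type, the two demo hooks, the xattr name "security.demo" and the helper names are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook type: 0 = accept, 1 = reject, -EOPNOTSUPP = no opinion. */
typedef int (*copy_up_xattr_hook)(const char *name);

/* Hypothetical module that knows nothing about any xattr. */
static int hook_no_opinion(const char *name)
{
	(void)name;
	return -EOPNOTSUPP;
}

/* Hypothetical module that rejects exactly one xattr it owns. */
static int hook_reject_demo(const char *name)
{
	return strcmp(name, "security.demo") == 0 ? 1 : -EOPNOTSUPP;
}

/*
 * Walk the hooks in order. -EOPNOTSUPP is non-decisive, so iteration
 * continues; any other value is a decision and ends the walk. If no
 * hook decides, the default is -EOPNOTSUPP rather than 0.
 */
static int run_copy_up_xattr_hooks(const copy_up_xattr_hook *hooks,
				   size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		int rc = hooks[i](name);

		if (rc != -EOPNOTSUPP)
			return rc;
	}
	return -EOPNOTSUPP;
}

static const copy_up_xattr_hook demo_hooks[] = {
	hook_no_opinion,
	hook_reject_demo,
};

/* Convenience wrapper over the demo hook list. */
static int decide(const char *name)
{
	return run_copy_up_xattr_hooks(demo_hooks,
				       sizeof(demo_hooks) / sizeof(demo_hooks[0]),
				       name);
}
```

With a default of 0 instead of -EOPNOTSUPP, decide("user.other") would read as an explicit acceptance, which is the false "allow" the commit message describes.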
2020-06-23Merge up to bpf_probe_read_kernel_str() fix into bpf-nextAlexei Starovoitov
2020-06-23bonding/xfrm: use real_dev instead of slave_devJarod Wilson
Rather than requiring every hw crypto capable NIC driver to do a check for slave_dev being set, set real_dev in the xfrm layer at xso init time, and then override it in the bonding driver as needed. Then NIC drivers can always use real_dev, and at the same time, we eliminate the use of a variable name that probably shouldn't have been used in the first place, particularly given recent current events. CC: Boris Pismenny <borisp@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Leon Romanovsky <leon@kernel.org> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org Suggested-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23ipv6: fib6: avoid indirect calls from fib6_rule_lookupBrian Vazquez
It was reported that a considerable number of cycles was spent on the expensive indirect calls in fib6_rule_lookup. This patch introduces an inline helper called pol_route_func that uses the indirect_call_wrappers to avoid the indirect calls.

This patch saves around 50ns per call.

Performance was measured on the receiver by checking the number of syncookies that the server was able to generate under a synflood load.

Traffic was generated using trafgen[1], which was pushing around 1Mpps on a single queue. The receiver was using only one rx queue, which helps to create a bottleneck and make the experiment rx-bounded.

These are the syncookies generated over 10s from the different runs:

Without the patch:
TcpExtSyncookiesSent            3553749            0.0
TcpExtSyncookiesSent            3550895            0.0
TcpExtSyncookiesSent            3553845            0.0
TcpExtSyncookiesSent            3541050            0.0
TcpExtSyncookiesSent            3539921            0.0
TcpExtSyncookiesSent            3557659            0.0
TcpExtSyncookiesSent            3526812            0.0
TcpExtSyncookiesSent            3536121            0.0
TcpExtSyncookiesSent            3529963            0.0
TcpExtSyncookiesSent            3536319            0.0

With the patch:
TcpExtSyncookiesSent            3611786            0.0
TcpExtSyncookiesSent            3596682            0.0
TcpExtSyncookiesSent            3606878            0.0
TcpExtSyncookiesSent            3599564            0.0
TcpExtSyncookiesSent            3601304            0.0
TcpExtSyncookiesSent            3609249            0.0
TcpExtSyncookiesSent            3617437            0.0
TcpExtSyncookiesSent            3608765            0.0
TcpExtSyncookiesSent            3620205            0.0
TcpExtSyncookiesSent            3601895            0.0

Without the patch the average is 354263 pkt/s or 2822 ns/pkt and with the patch the average is 360738 pkt/s or 2772 ns/pkt, which gives an estimate of 50 ns per packet.

[1] http://netsniff-ng.org/

Changelog since v1:
 - Change ordering in the ICW (Paolo Abeni)

Cc: Luigi Rizzo <lrizzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
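The averages quoted in this commit message can be re-derived from the twenty counter samples. The sketch below is not part of the patch; the array and function names are made up for illustration, and it assumes the pkt/s figure was rounded to the nearest integer while the ns/pkt figure was truncated, which is what reproduces the quoted numbers.

```c
#include <stddef.h>
#include <stdint.h>

/* TcpExtSyncookiesSent samples copied from the commit message (10 s each). */
static const uint64_t without_patch[10] = {
	3553749, 3550895, 3553845, 3541050, 3539921,
	3557659, 3526812, 3536121, 3529963, 3536319,
};
static const uint64_t with_patch[10] = {
	3611786, 3596682, 3606878, 3599564, 3601304,
	3609249, 3617437, 3608765, 3620205, 3601895,
};

/* Mean packets per second over n runs of `seconds` each, rounded to nearest. */
static uint64_t mean_pps(const uint64_t *counts, size_t n, uint64_t seconds)
{
	uint64_t sum = 0;
	uint64_t div = (uint64_t)n * seconds;

	for (size_t i = 0; i < n; i++)
		sum += counts[i];
	return (sum + div / 2) / div;
}

/* Nanoseconds spent per packet at a given rate, truncated. */
static uint64_t ns_per_pkt(uint64_t pps)
{
	return UINT64_C(1000000000) / pps;
}

/* Estimated per-packet saving in nanoseconds between the two data sets. */
static uint64_t saved_ns(void)
{
	return ns_per_pkt(mean_pps(without_patch, 10, 10)) -
	       ns_per_pkt(mean_pps(with_patch, 10, 10));
}
```

The two means differ by under 2%, so the 50 ns estimate is a small difference between two large per-packet costs and depends on the run-to-run spread visible in the samples.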
2020-06-23indirect_call_wrapper: extend indirect wrapper to support up to 4 callsBrian Vazquez
There are many places where 2 annotations are not enough. This patch adds INDIRECT_CALL_3 and INDIRECT_CALL_4 to cover such cases. Signed-off-by: Brian Vazquez <brianvv@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
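To illustrate what chaining up to four candidates means, here is a simplified user-space approximation of the idea: compare the function pointer against a short list of likely targets and call the match directly, falling back to a true indirect call otherwise. It reuses the macro names from the commit message for readability, but it is a sketch and not the kernel header; the op_* functions and dispatch() are hypothetical, and branch-prediction hints and configuration guards are deliberately left out.

```c
/* Direct call if f matches the single candidate, else indirect call. */
#define INDIRECT_CALL_1(f, f1, ...) \
	((f) == (f1) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))

/* Each wider variant tests one more candidate, then defers to the narrower one. */
#define INDIRECT_CALL_2(f, f2, f1, ...) \
	((f) == (f2) ? f2(__VA_ARGS__) : INDIRECT_CALL_1(f, f1, __VA_ARGS__))
#define INDIRECT_CALL_3(f, f3, f2, f1, ...) \
	((f) == (f3) ? f3(__VA_ARGS__) : INDIRECT_CALL_2(f, f2, f1, __VA_ARGS__))
#define INDIRECT_CALL_4(f, f4, f3, f2, f1, ...) \
	((f) == (f4) ? f4(__VA_ARGS__) : INDIRECT_CALL_3(f, f3, f2, f1, __VA_ARGS__))

typedef int (*op_fn)(int);

/* Hypothetical targets; the first four are the "expected" ones. */
static int op_inc(int x) { return x + 1; }
static int op_dbl(int x) { return x * 2; }
static int op_neg(int x) { return -x; }
static int op_sqr(int x) { return x * x; }
static int op_other(int x) { return x - 100; }

/* Candidates are tested in the order given: op_sqr first, op_inc last. */
static int dispatch(op_fn f, int x)
{
	return INDIRECT_CALL_4(f, op_sqr, op_neg, op_dbl, op_inc, x);
}
```

Calling dispatch(op_other, 3) still works: none of the four comparisons match, so the final branch performs the ordinary indirect call. Candidate order matters because each miss costs one more comparison before the call is made.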
2020-06-24bpf: Switch most helper return values from 32-bit int to 64-bit longAndrii Nakryiko
Switch most BPF helper definitions from returning int to long. These definitions come from comments in the BPF UAPI header and are used to generate bpf_helper_defs.h (under libbpf), to be later included and used from BPF programs.

In the actual in-kernel implementation, all the helpers are defined as returning u64, but for historical reasons most of them are defined as returning int in UAPI (usually, to return 0 on success and a negative value on error).

This causes Clang to quite often generate sub-optimal code, because the compiler believes that the return value is 32-bit, and in a lot of cases it has to be up-converted (usually with a pair of 32-bit bit shifts) to a 64-bit value before it can be used further in BPF code.

Besides just "polluting" the code, these 32-bit shifts quite often cause problems for cases in which the return value matters. This is especially the case for the family of bpf_probe_read_str() functions. There are a few other similar helpers (e.g., bpf_read_branch_records()), in which the return value is used by BPF program logic to record variable-length data and process it. For such cases, BPF program logic carefully manages offsets within some array or map to read variable-length data. For such uses, it's crucial for the BPF verifier to track the possible range of register values to prove that all the accesses happen within given memory bounds. Those extraneous zero-extending bit shifts, inserted by Clang (and quite often interleaved with other code, which makes the issues even more challenging and sometimes requires employing extra per-variable compiler barriers), throw off the verifier logic and make it mark registers as having unknown variable offset. We'll study this pattern a bit later below.

Another common pattern is to check the return value of a BPF helper for non-zero state to detect error conditions and attempt alternative actions in such a case.
Even in this simple and straightforward case, the mismatch between 32-bit and BPF's native 64-bit mode quite often leads to sub-optimal and unnecessary extra code. We'll look at this pattern as well.

Clang's BPF target supports two modes of code generation: ALU32, in which it is capable of using the lower 32-bit parts of registers, and no-ALU32, in which only full 64-bit registers are used. ALU32 mode somewhat mitigates the problems described above, but not in all cases.

This patch switches all the cases in which BPF helpers return 0 or a negative error from returning int to returning long. It is shown below that such a change in definition leads to equivalent or better code. No-ALU32 mode benefits more, but ALU32 mode doesn't degrade, or still gets improved code generation.

Another class of cases switched from int to long are bpf_probe_read_str()-like helpers, which encode the successful case as non-negative values, while still returning a negative value for errors.

In all such cases, correctness is preserved due to two's complement encoding of negative values and the fact that all helpers return values whose absolute value fits in 32 bits. Two's complement ensures that for negative values the higher 32 bits are all ones and, when truncated, leave a valid negative 32-bit value with the same value. Non-negative values have the upper 32 bits set to zero and similarly preserve their value when the high 32 bits are truncated. This means that just casting to int/u32 is correct and efficient (and in ALU32 mode doesn't require any extra shifts).

To minimize the chances of regressions, two code patterns were investigated, as mentioned above. For both patterns, BPF assembly was analyzed in ALU32/NO-ALU32 compiler modes, both with the current 32-bit int return type and the new 64-bit long return type.

Case 1. Variable-length data reading and concatenation. This is quite a ubiquitous pattern in tracing/monitoring applications, reading data like a process's environment variables, file path, etc.
In such case, many pieces of string-like variable-length data are read into a single big buffer, and at the end of the process, only a part of array containing actual data is sent to user-space for further processing. This case is tested in test_varlen.c selftest (in the next patch). Code flow is roughly as follows:

  void *payload = &sample->payload;
  u64 len;

  len = bpf_probe_read_kernel_str(payload, MAX_SZ1, &source_data1);
  if (len <= MAX_SZ1) {
      payload += len;
      sample->len1 = len;
  }
  len = bpf_probe_read_kernel_str(payload, MAX_SZ2, &source_data2);
  if (len <= MAX_SZ2) {
      payload += len;
      sample->len2 = len;
  }
  /* and so on */
  sample->total_len = payload - &sample->payload;
  /* send over, e.g., perf buffer */

There could be two variations with slightly different code generated: when len is 64-bit integer and when it is 32-bit integer. Both variations were analysed. BPF assembly instructions between two successive invocations of bpf_probe_read_kernel_str() were used to check code regressions. Results are below, followed by short analysis. Left side is using helpers with int return type, the right one is after the switch to long.

ALU32 + INT                             ALU32 + LONG
===========                             ============

64-BIT (13 insns):                      64-BIT (10 insns):
------------------------------------    ------------------------------------
  17: call 115                            17: call 115
  18: if w0 > 256 goto +9 <LBB0_4>        18: if r0 > 256 goto +6 <LBB0_4>
  19: w1 = w0                             19: r1 = 0 ll
  20: r1 <<= 32                           21: *(u64 *)(r1 + 0) = r0
  21: r1 s>>= 32                          22: r6 = 0 ll
  22: r2 = 0 ll                           24: r6 += r0
  24: *(u64 *)(r2 + 0) = r1             00000000000000c8 <LBB0_4>:
  25: r6 = 0 ll                           25: r1 = r6
  27: r6 += r1                            26: w2 = 256
00000000000000e0 <LBB0_4>:                27: r3 = 0 ll
  28: r1 = r6                             29: call 115
  29: w2 = 256
  30: r3 = 0 ll
  32: call 115

32-BIT (11 insns):                      32-BIT (12 insns):
------------------------------------    ------------------------------------
  17: call 115                            17: call 115
  18: if w0 > 256 goto +7 <LBB1_4>        18: if w0 > 256 goto +8 <LBB1_4>
  19: r1 = 0 ll                           19: r1 = 0 ll
  21: *(u32 *)(r1 + 0) = r0               21: *(u32 *)(r1 + 0) = r0
  22: w1 = w0                             22: r0 <<= 32
  23: r6 = 0 ll                           23: r0 >>= 32
  25: r6 += r1                            24: r6 = 0 ll
00000000000000d0 <LBB1_4>:                26: r6 += r0
  26: r1 = r6                           00000000000000d8 <LBB1_4>:
  27: w2 = 256                            27: r1 = r6
  28: r3 = 0 ll                           28: w2 = 256
  30: call 115                            29: r3 = 0 ll
                                          31: call 115

In ALU32 mode, the variant using 64-bit length variable clearly wins and avoids unnecessary zero-extension bit shifts. In practice, this is even more important and good, because BPF code won't need to do extra checks to "prove" that payload/len are within good bounds.

32-bit len is one instruction longer. Clang decided to do 64-to-32 casting with two bit shifts, instead of equivalent `w1 = w0` assignment. The former uses extra register. The latter might potentially lose some range information, but not for 32-bit value. So in this case, verifier infers that r0 is [0, 256] after check at 18:, and shifting 32 bits left/right keeps that range intact. We should probably look into Clang's logic and see why it chooses bitshifts over sub-register assignments for this.

NO-ALU32 + INT                          NO-ALU32 + LONG
==============                          ===============

64-BIT (14 insns):                      64-BIT (10 insns):
------------------------------------    ------------------------------------
  17: call 115                            17: call 115
  18: r0 <<= 32                           18: if r0 > 256 goto +6 <LBB0_4>
  19: r1 = r0                             19: r1 = 0 ll
  20: r1 >>= 32                           21: *(u64 *)(r1 + 0) = r0
  21: if r1 > 256 goto +7 <LBB0_4>        22: r6 = 0 ll
  22: r0 s>>= 32                          24: r6 += r0
  23: r1 = 0 ll                         00000000000000c8 <LBB0_4>:
  25: *(u64 *)(r1 + 0) = r0               25: r1 = r6
  26: r6 = 0 ll                           26: r2 = 256
  28: r6 += r0                            27: r3 = 0 ll
00000000000000e8 <LBB0_4>:                29: call 115
  29: r1 = r6
  30: r2 = 256
  31: r3 = 0 ll
  33: call 115

32-BIT (13 insns):                      32-BIT (13 insns):
------------------------------------    ------------------------------------
  17: call 115                            17: call 115
  18: r1 = r0                             18: r1 = r0
  19: r1 <<= 32                           19: r1 <<= 32
  20: r1 >>= 32                           20: r1 >>= 32
  21: if r1 > 256 goto +6 <LBB1_4>        21: if r1 > 256 goto +6 <LBB1_4>
  22: r2 = 0 ll                           22: r2 = 0 ll
  24: *(u32 *)(r2 + 0) = r0               24: *(u32 *)(r2 + 0) = r0
  25: r6 = 0 ll                           25: r6 = 0 ll
  27: r6 += r1                            27: r6 += r1
00000000000000e0 <LBB1_4>:              00000000000000e0 <LBB1_4>:
  28: r1 = r6                             28: r1 = r6
  29: r2 = 256                            29: r2 = 256
  30: r3 = 0 ll                           30: r3 = 0 ll
  32: call 115                            32: call 115

In NO-ALU32 mode, for the case of a 64-bit len variable, Clang generates much superior code, as expected, eliminating unnecessary bit shifts. For 32-bit len, the code is identical.

So overall, only the ALU32 32-bit len case is more-or-less equivalent, and the difference stems from an internal Clang decision rather than the compiler lacking enough information about types.

Case 2. Let's look at the simpler case of checking the return result of a BPF helper for errors.
The code is very simple:

  long bla;
  if (bpf_probe_read_kernel(&bla, sizeof(bla), 0))
      return 1;
  else
      return 0;

ALU32 + CHECK (9 insns)                 ALU32 + CHECK (9 insns)
====================================    ====================================
   0: r1 = r10                             0: r1 = r10
   1: r1 += -8                             1: r1 += -8
   2: w2 = 8                               2: w2 = 8
   3: r3 = 0                               3: r3 = 0
   4: call 113                             4: call 113
   5: w1 = w0                              5: r1 = r0
   6: w0 = 1                               6: w0 = 1
   7: if w1 != 0 goto +1 <LBB2_2>          7: if r1 != 0 goto +1 <LBB2_2>
   8: w0 = 0                               8: w0 = 0
0000000000000048 <LBB2_2>:              0000000000000048 <LBB2_2>:
   9: exit                                 9: exit

Almost identical code, the only difference is the use of full register assignment (r1 = r0) vs half-registers (w1 = w0) in instruction #5. On 32-bit architectures, new BPF assembly might be slightly less optimal, in theory. But one can argue that's not a big issue, given that use of full registers is still prevalent (e.g., for parameter passing).

NO-ALU32 + CHECK (11 insns)             NO-ALU32 + CHECK (9 insns)
====================================    ====================================
   0: r1 = r10                             0: r1 = r10
   1: r1 += -8                             1: r1 += -8
   2: r2 = 8                               2: r2 = 8
   3: r3 = 0                               3: r3 = 0
   4: call 113                             4: call 113
   5: r1 = r0                              5: r1 = r0
   6: r1 <<= 32                            6: r0 = 1
   7: r1 >>= 32                            7: if r1 != 0 goto +1 <LBB2_2>
   8: r0 = 1                               8: r0 = 0
   9: if r1 != 0 goto +1 <LBB2_2>       0000000000000048 <LBB2_2>:
  10: r0 = 0                               9: exit
0000000000000058 <LBB2_2>:
  11: exit

NO-ALU32 is a clear improvement, getting rid of unnecessary zero-extension bit shifts.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-1-andriin@fb.com
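The two's-complement argument made in this commit message (a 64-bit return value whose magnitude fits in 32 bits survives truncation to 32 bits) can be checked with a small user-space sketch. The function names are hypothetical and nothing here is BPF code; it only models the bit-level reasoning with portable integer arithmetic.

```c
#include <stdint.h>

/* Upper 32 bits of a 64-bit helper return value. */
static uint32_t upper32(int64_t ret)
{
	return (uint32_t)((uint64_t)ret >> 32);
}

/*
 * Keep only the low 32 bits and reinterpret them as a signed 32-bit
 * value, widened back to 64 bits. Written without relying on
 * implementation-defined narrowing conversions.
 */
static int64_t sign_extend_low32(int64_t ret)
{
	uint32_t low = (uint32_t)(uint64_t)ret;

	if (low & UINT32_C(0x80000000))
		return (int64_t)low - INT64_C(4294967296);
	return (int64_t)low;
}

/* True when truncating to 32 bits loses no information. */
static int survives_truncation(int64_t ret)
{
	return sign_extend_low32(ret) == ret;
}
```

For a small negative error code such as -14, upper32() is all ones and the low half alone still decodes to -14; for a length such as 256, the upper half is zero. A value like 2^32 + 5 does not survive, which is why the argument depends on helpers never returning values outside the 32-bit range.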
2020-06-23net: qed: fix left elements count calculationAlexander Lobakin
qed_chain_get_element_left{,_u32} returned 0 when the difference between producer and consumer page count was equal to the total page count. Fix this by expanding the producer value conditionally (rather than unconditionally). This makes it possible to eliminate the normalization against total page count, which was the cause of this bug. Misc: replace open-coded constants with common defines. Fixes: a91eb52abb50 ("qed: Revisit chain implementation") Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>