summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-10-27nvmet: switch check for subsystem typeHannes Reinecke
Invert the check for discovery subsystem type to allow for additional discovery subsystem types. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvme: add new discovery log page entry definitionsHannes Reinecke
TP8014 adds a new SUBTYPE value and a new field EFLAGS for the discovery log page entry. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvmet-tcp: fix data digest pointer calculationVarun Prakash
exp_ddgst is of type __le32, &cmd->exp_ddgst + cmd->offset increases &cmd->exp_ddgst by 4 * cmd->offset, fix this by type casting &cmd->exp_ddgst to u8 *. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvme-tcp: fix data digest pointer calculationVarun Prakash
ddgst is of type __le32, &req->ddgst + req->offset increases &req->ddgst by 4 * req->offset, fix this by type casting &req->ddgst to u8 *. Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvme-tcp: fix possible req->offset corruptionVarun Prakash
With commit db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context") r2t and response PDU can get processed while send function is executing. Current data digest send code uses req->offset after kernel_sendmsg(), this creates a race condition where req->offset gets reset before it is used in send function. This can happen in two cases - 1. Target sends r2t PDU which resets req->offset. 2. Target send response PDU which completes the req and then req is used for a new command, nvme_tcp_setup_cmd_pdu() resets req->offset. Fix this by storing req->offset in a local variable and using this local variable after kernel_sendmsg(). Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context") Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-26doc: Fix typo in request queue sysfs documentationDamien Le Moal
Fix a typo (are -> as) in the introduction paragraph of Documentation/block/queue-sysfs.rst. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-6-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26doc: document sysfs queue/independent_access_ranges attributesDamien Le Moal
Update the file Documentation/block/queue-sysfs.rst to add a description of a device queue sysfs entries related to independent access ranges (e.g. concurrent positioning ranges for multi-actuator hard-disks). Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-5-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26libata: support concurrent positioning ranges logDamien Le Moal
Add support to discover if an ATA device supports the Concurrent Positioning Ranges data log (address 0x47), indicating that the device is capable of seeking to multiple different locations in parallel using multiple actuators serving different LBA ranges. Also add support to translate the concurrent positioning ranges log into its equivalent Concurrent Positioning Ranges VPD page B9h in libata-scsi.c. The format of the Concurrent Positioning Ranges Log is defined in ACS-5 r9. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-4-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26scsi: sd: add concurrent positioning ranges supportDamien Le Moal
Add the sd_read_cpr() function to the sd scsi disk driver to discover if a device has multiple concurrent positioning ranges (i.e. multiple actuators on an HDD). The existence of VPD page B9h indicates if a device has multiple concurrent positioning ranges. The page content describes each range supported by the device. sd_read_cpr() is called from sd_revalidate_disk() and uses the block layer functions disk_alloc_independent_access_ranges() and disk_set_independent_access_ranges() to represent the set of actuators of the device as independent access ranges. The format of the Concurrent Positioning Ranges VPD page B9h is defined in section 6.6.6 of SBC-5. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-3-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26block: Add independent access ranges supportDamien Le Moal
The Concurrent Positioning Ranges VPD page (for SCSI) and data log page (for ATA) contain parameters describing the set of contiguous LBAs that can be served independently by a single LUN multi-actuator hard-disk. Similarly, a logically defined block device composed of multiple disks can in some cases execute requests directed at different sector ranges in parallel. A dm-linear device aggregating 2 block devices together is an example. This patch implements support for exposing a block device independent access ranges to the user through sysfs to allow optimizing device accesses to increase performance. To describe the set of independent sector ranges of a device (actuators of a multi-actuator HDDs or table entries of a dm-linear device), The type struct blk_independent_access_ranges is introduced. This structure describes the sector ranges using an array of struct blk_independent_access_range structures. This range structure defines the start sector and number of sectors of the access range. The ranges in the array cannot overlap and must contain all sectors within the device capacity. The function disk_set_independent_access_ranges() allows a device driver to signal to the block layer that a device has multiple independent access ranges. In this case, a struct blk_independent_access_ranges is attached to the device request queue by the function disk_set_independent_access_ranges(). The function disk_alloc_independent_access_ranges() is provided for drivers to allocate this structure. struct blk_independent_access_ranges contains kobjects (struct kobject) to expose to the user through sysfs the set of independent access ranges supported by a device. When the device is initialized, sysfs registration of the ranges information is done from blk_register_queue() using the block layer internal function disk_register_independent_access_ranges(). If a driver calls disk_set_independent_access_ranges() for a registered queue, e.g. when a device is revalidated, disk_set_independent_access_ranges() will execute disk_register_independent_access_ranges() to update the sysfs attribute files. The sysfs file structure created starts from the independent_access_ranges sub-directory and contains the start sector and number of sectors of each range, with the information for each range grouped in numbered sub-directories. E.g. for a dual actuator HDD, the user sees: $ tree /sys/block/sdk/queue/independent_access_ranges/ /sys/block/sdk/queue/independent_access_ranges/ |-- 0 | |-- nr_sectors | `-- sector `-- 1 |-- nr_sectors `-- sector For a regular device with a single access range, the independent_access_ranges sysfs directory does not exist. Device revalidation may lead to changes to this structure and to the attribute values. When manipulated, the queue sysfs_lock and sysfs_dir_lock mutexes are held for atomicity, similarly to how the blk-mq and elevator sysfs queue sub-directories are protected. The code related to the management of independent access ranges is added in the new file block/blk-ia-ranges.c. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-2-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26net/mlx5: Lag, Make mlx5_lag_is_multipath() be static inlineMaor Dickman
Fix "no previous prototype" W=1 warnings when CONFIG_MLX5_CORE_EN is not set: drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h:34:6: error: no previous prototype for ‘mlx5_lag_is_multipath’ [-Werror=missing-prototypes] 34 | bool mlx5_lag_is_multipath(struct mlx5_core_dev *dev) { return false; } | ^~~~~~~~~~~~~~~~~~~~~ Fixes: 14fe2471c628 ("net/mlx5: Lag, change multipath and bonding to be mutually exclusive") Signed-off-by: Maor Dickman <maord@nvidia.com>
2021-10-26net/mlx5e: Prevent HW-GRO and CQE-COMPRESS features operate togetherKhalid Manaa
HW-GRO and CQE-COMPRESS are mutually exclusive, this commit adds this restriction. Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net/mlx5e: Add HW-GRO offloadKhalid Manaa
This commit introduces HW-GRO offload by using the SHAMPO feature - Add set feature handler for HW-GRO. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net/mlx5e: Add HW_GRO statisticsKhalid Manaa
This patch adds HW_GRO counters to RX packets statistics: - gro_match_packets: counter of received packets with set match flag. - gro_packets: counter of received packets over the HW_GRO feature, this counter is increased by one for every received HW_GRO cqe. - gro_bytes: counter of received bytes over the HW_GRO feature, this counter is increased by the received bytes for every received HW_GRO cqe. - gro_skbs: counter of built HW_GRO skbs, increased by one when we flush HW_GRO skb (when we call a napi_gro_receive with hw_gro skb). - gro_large_hds: counter of received packets with large headers size, in case the packet needs new SKB, the driver will allocate new one and will not use the headers entry to build it. Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net/mlx5e: HW_GRO cqe handler implementationKhalid Manaa
this patch updates the SHAMPO CQE handler to support HW_GRO, changes in the SHAMPO CQE handler: - CQE match and flush fields are used to determine if to build new skb using the new received packet, or to add the received packet data to the existing RQ.hw_gro_skb, also this fields are used to determine when to flush the skb. - in the end of the function mlx5e_poll_rx_cq the RQ.hw_gro_skb is flushed. Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net/mlx5e: Add data path for SHAMPO featureBen Ben-Ishay
The header buffer is used to store the headers of the rx packets. The header buffer size deduced from WorkQueue size + restriction of max packets per WorkQueueElement. This commit adds the functionality for posting/updating memory for the header buffer during the posting/updating of WQEs. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net/mlx5e: Add handle SHAMPO cqe supportKhalid Manaa
This patch adds the new CQE SHAMPO fields: - flush: indicates that we must close the current session and pass the SKB to the network stack. - match: indicates that the current packet matches the oppened session, the packet will be merge into the current SKB. - header_size: the size of the packet headers that written into the headers buffer. - header_entry_index: the entry index in the headers buffer. - data_offset: packets data offset in the WQE. Also new cqe handler is added to handle SHAMPO packets: - The new handler uses CQE SHAMPO fields to build the SKB. CQE's Flush and match fields are not used in this patch, packets are not merged in this patch. Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net/mlx5e: Add control path for SHAMPO featureBen Ben-Ishay
This commit introduces the control path infrastructure for SHAMPO feature. SHAMPO feature enables packet stitching by splitting packets to header and payload, the header is placed on a dedicated buffer and the payload on the RX ring, this allows stitching the data part of a flow together continuously in the receive buffer. SHAMPO feature is implemented as linked list striding RQ feature. To support packets splitting and payload stitching: - Enlarge the ICOSQ and the correspond CQ to support the header buffer memory regions. - Add support to create linked list striding RQ with SHAMPO feature set in the open_rq function. - Add deallocation function and corresponded calls for SHAMPO header buffer. - Add mlx5e_create_umr_klm_mkey to support KLM mkey for the header buffer. - Rename mlx5e_create_umr_mkey to mlx5e_create_umr_mtt_mkey. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net/mlx5e: Add support to klm_umr_wqeBen Ben-Ishay
This commit adds the needed definitions for using the klm_umr_wqe. UMR stands for user-mode memory registration, is a mechanism to alter address translation properties of MKEY by posting WorkQueueElement aka WQE on send queue. MKEY stands for memory key, MKEY are used to describe a region in memory that can be later used by HW. KLM stands for {Key, Length, MemVa}, KLM_MKEY is indirect MKEY that enables to map multiple memory spaces with different sizes in unified MKEY. klm_umr_wqe is a UMR that use to update a KLM_MKEY. SHAMPO feature uses KLM_MKEY for memory registration of his header buffer. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net/mlx5e: Rename TIR lro functions to TIR packet merge functionsKhalid Manaa
This series introduces new packet merge type, therefore rename lro functions to packet merge to support the new merge type: - Generalize + rename mlx5e_build_tir_ctx_lro to mlx5e_build_tir_ctx_packet_merge. - Rename mlx5e_modify_tirs_lro to mlx5e_modify_tirs_packet_merge. - Rename lro bit in mlx5_ifc_modify_tir_bitmask_bits to packet_merge. - Rename lro_en in mlx5e_params to packet_merge_type type and combine packet_merge params into one struct mlx5e_packet_merge_param. Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net/mlx5: Add SHAMPO caps, HW bits and enumerationsBen Ben-Ishay
This commit adds SHAMPO bit to hca_cap and SHAMPO capabilities structure, SHAMPO related HW spec hardware fields and enumerations. SHAMPO stands for: split headers and merge payload offload. SHAMPO new fields: WQ: - headers_mkey: mkey that represents the headers buffer, where the packets headers will be written by the HW. - shampo_enable: flag to verify if the WQ supports SHAMPO feature. - log_reservation_size: the log of the reservation size where the data of the packet will be written by the HW. - log_max_num_of_packets_per_reservation: log of the maximum number of packets that can be written to the same reservation. - log_headers_entry_size: log of the header entry size of the headers buffer. - log_headers_buffer_entry_num: log of the entries number of the headers buffer. RQ: - shampo_no_match_alignment_granularity: the HW alignment granularity in case the received packet doesn't match the current session. - shampo_match_criteria_type: the type of match criteria. - reservation_timeout: the maximum time that the HW will hold the reservation. mlx5_ifc_shampo_cap_bits, the capabilities of the SHAMPO feature: - shampo_log_max_reservation_size: the maximum allowed value of the field WQ.log_reservation_size. - log_reservation_size: the minimum allowed value of the field WQ.log_reservation_size. - shampo_min_mss_size: the minimum payload size of packet that can open a new session or be merged to a session. - shampo_max_log_headers_entry_size: the maximum allowed value of the field WQ.log_headers_entry_size Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net/mlx5e: Rename lro_timeout to packet_merge_timeoutBen Ben-Ishay
TIR stands for transport interface receive, the TIR object is responsible for performing all transport related operations on the receive side like packet processing, demultiplexing the packets to different RQ's, etc. lro_timeout is a field in the TIR that is used to set the timeout for lro session, this series introduces new packet merge type, therefore rename lro_timeout to packet_merge_timeout for all packet merge types. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net: Prevent HW-GRO and LRO features operate togetherBen Ben-ishay
LRO and HW-GRO are mutually exclusive, this commit adds this restriction in netdev_fix_feature. HW-GRO is preferred, that means in case both HW-GRO and LRO features are requested, LRO is cleared. Signed-off-by: Ben Ben-ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26lib: bitmap: Introduce node-aware alloc APITariq Toukan
Expose new node-aware API for bitmap allocation: bitmap_alloc_node() / bitmap_zalloc_node(). Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26tracing/histogram: Document expression arithmetic and constantsKalesh Singh
Histogram expressions now support division, and multiplication in addition to the already supported subtraction and addition operators. Numeric constants can also be used in a hist trigger expressions or assigned to a variable and used by refernce in an expression. Link: https://lkml.kernel.org/r/20211025200852.3002369-9-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing/histogram: Optimize division by a power of 2Kalesh Singh
The division is a slow operation. If the divisor is a power of 2, use a shift instead. Results were obtained using Android's version of perf (simpleperf[1]) as described below: 1. hist_field_div() is modified to call 2 test functions: test_hist_field_div_[not]_optimized(); passing them the same args. Use noinline and volatile to ensure these are not optimized out by the compiler. 2. Create a hist event trigger that uses division: events/kmem/rss_stat$ echo 'hist:keys=common_pid:x=size/<divisor>' >> trigger events/kmem/rss_stat$ echo 'hist:keys=common_pid:vals=$x' >> trigger 3. Run Android's lmkd_test[2] to generate rss_stat events, and record CPU samples with Android's simpleperf: simpleperf record -a --exclude-perf --post-unwind=yes -m 16384 -g -f 2000 -o perf.data == Results == Divisor is a power of 2 (divisor == 32): test_hist_field_div_not_optimized | 8,717,091 cpu-cycles test_hist_field_div_optimized | 1,643,137 cpu-cycles If the divisor is a power of 2, the optimized version is ~5.3x faster. Divisor is not a power of 2 (divisor == 33): test_hist_field_div_not_optimized | 4,444,324 cpu-cycles test_hist_field_div_optimized | 5,497,958 cpu-cycles If the divisor is not a power of 2, as expected, the optimized version is slightly slower (~24% slower). [1] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md [2] https://cs.android.com/android/platform/superproject/+/master:system/memory/lmkd/tests/lmkd_test.cpp Link: https://lkml.kernel.org/r/20211025200852.3002369-7-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing/histogram: Covert expr to const if both operands are constantsKalesh Singh
If both operands of a hist trigger expression are constants, convert the expression to a constant. This optimization avoids having to perform the same calculation multiple times and also saves on memory since the merged constants are represented by a single struct hist_field instead or multiple. Link: https://lkml.kernel.org/r/20211025200852.3002369-6-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing/histogram: Simplify handling of .sym-offset in expressionsKalesh Singh
The '-' in .sym-offset can confuse the hist trigger arithmetic expression parsing. Simplify the handling of this by replacing the 'sym-offset' with 'symXoffset'. This allows us to correctly evaluate expressions where the user may have inadvertently added a .sym-offset modifier to one of the operands in an expression, instead of bailing out. In this case the .sym-offset has no effect on the evaluation of the expression. The only valid use of the .sym-offset is as a hist key modifier. Link: https://lkml.kernel.org/r/20211025200852.3002369-5-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Fix operator precedence for hist triggers expressionKalesh Singh
The current histogram expression evaluation logic evaluates the expression from right to left. This can lead to incorrect results if the operations are not associative (as is the case for subtraction and, the now added, division operators). e.g. 16-8-4-2 should be 2 not 10 --> 16-8-4-2 = ((16-8)-4)-2 64/8/4/2 should be 1 not 16 --> 64/8/4/2 = ((64/8)/4)/2 Division and multiplication are currently limited to single operation expression due to operator precedence support not yet implemented. Rework the expression parsing to support the correct evaluation of expressions containing operators of different precedences; and fix the associativity error by evaluating expressions with operators of the same precedence from left to right. Examples: (1) echo 'hist:keys=common_pid:a=8,b=4,c=2,d=1,w=$a-$b-$c-$d' \ >> event/trigger (2) echo 'hist:keys=common_pid:x=$a/$b/3/2' >> event/trigger (3) echo 'hist:keys=common_pid:y=$a+10/$c*1024' >> event/trigger (4) echo 'hist:keys=common_pid:z=$a/$b+$c*$d' >> event/trigger Link: https://lkml.kernel.org/r/20211025200852.3002369-4-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Add division and multiplication support for hist triggersKalesh Singh
Adds basic support for division and multiplication operations for hist trigger variable expressions. For simplicity this patch only supports, division and multiplication for a single operation expression (e.g. x=$a/$b), as currently expressions are always evaluated right to left. This can lead to some incorrect results: e.g. echo 'hist:keys=common_pid:x=8-4-2' >> event/trigger 8-4-2 should evaluate to 2 i.e. (8-4)-2 but currently x evaluate to 6 i.e. 8-(4-2) Multiplication and division in sub-expressions will work correctly, once correct operator precedence support is added (See next patch in this series). For the undefined case of division by 0, the histogram expression evaluates to (u64)(-1). Since this cannot be detected when the expression is created, it is the responsibility of the user to be aware and account for this possibility. Examples: echo 'hist:keys=common_pid:a=8,b=4,x=$a/$b' \ >> event/trigger echo 'hist:keys=common_pid:y=5*$b' \ >> event/trigger Link: https://lkml.kernel.org/r/20211025200852.3002369-3-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Add support for creating hist trigger variables from literalKalesh Singh
Currently hist trigger expressions don't support the use of numeric literals: e.g. echo 'hist:keys=common_pid:x=$y-1234' --> is not valid expression syntax Having the ability to use numeric constants in hist triggers supports a wider range of expressions for creating variables. Add support for creating trace event histogram variables from numeric literals. e.g. echo 'hist:keys=common_pid:x=1234,y=size-1024' >> event/trigger A negative numeric constant is created, using unary minus operator (parentheses are required). e.g. echo 'hist:keys=common_pid:z=-(2)' >> event/trigger Constants can be used with division/multiplication (added in the next patch in this series) to implement granularity filters for frequent trace events. For instance we can limit emitting the rss_stat trace event to when there is a 512KB cross over in the rss size: # Create a synthetic event to monitor instead of the high frequency # rss_stat event echo 'rss_stat_throttled unsigned int mm_id; unsigned int curr; int member; long size' >> tracing/synthetic_events # Create a hist trigger that emits the synthetic rss_stat_throttled # event only when the rss size crosses a 512KB boundary. echo 'hist:keys=keys=mm_id,member:bucket=size/0x80000:onchange($bucket) .rss_stat_throttled(mm_id,curr,member,size)' >> events/kmem/rss_stat/trigger A use case for using constants with addition/subtraction is not yet known, but for completeness the use of constants are supported for all operators. Link: https://lkml.kernel.org/r/20211025200852.3002369-2-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26selftests/ftrace: Stop tracing while reading the trace file by defaultMasami Hiramatsu
Stop tracing while reading the trace file by default, to prevent the test results while checking it and to avoid taking a long time to check the result. If there is any testcase which wants to test the tracing while reading the trace file, please override this setting inside the test case. This also recovers the pause-on-trace when clean it up. Link: https://lkml.kernel.org/r/163529053143.690749.15365238954175942026.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27Merge tag 'amd-drm-fixes-5.15-2021-10-21' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.15-2021-10-21: amdgpu: - Fix a potential out of bounds write in debugfs - Fix revision handling for Yellow Carp - Display fixes for Yellow Carp Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211021203430.4578-1-alexander.deucher@amd.com
2021-10-26firmware/psci: fix application of sizeof to pointerjing yangyang
sizeof when applied to a pointer typed expression gives the size of the pointer. ./drivers/firmware/psci/psci_checker.c:158:41-47: ERROR application of sizeof to pointer This issue was detected with the help of Coccinelle. Fixes: 7401056de5f8 ("drivers/firmware: psci_checker: stash and use topology_core_cpumask for hotplug tests") Cc: stable@vger.kernel.org Reported-by: Zeal Robot <zealci@zte.com.cn> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: jing yangyang <jing.yangyang@zte.com.cn> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-10-26Merge tag 'arm-soc-fixes-5.15-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "One last set of small fixes for the soc tree: - Incorrect ethernet phy settings found on i.mx and allwinner platforms - a revert for a Qualcomm DT change that caused a boot regression - four patches for incorrect settings in i.MX DT files - new MAINTAINER file entries for dhcom boards - a Kconfig fix for a reset driver that became unselectable - three more code changes for bugs in reset drivers" * tag 'arm-soc-fixes-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: MAINTAINERS: Add maintainers for DHCOM i.MX6 and DHCOM/DHCOR STM32MP1 Revert "arm64: dts: qcom: sm8250: remove bus clock from the mdss node for sm8250 target" arm64: dts: imx8mm-kontron: Fix connection type for VSC8531 RGMII PHY arm64: dts: imx8mm-kontron: Fix CAN SPI clock frequency arm64: dts: imx8mm-kontron: Fix polarity of reg_rst_eth2 arm64: dts: imx8mm-kontron: Set lower limit of VDD_SNVS to 800 mV arm64: dts: imx8mm-kontron: Make sure SOC and DRAM supply voltages are correct reset: socfpga: add empty driver allowing consumers to probe reset: tegra-bpmp: Handle errors in BPMP response reset: pistachio: Re-enable driver selection reset: brcmstb-rescal: fix incorrect polarity of status bit ARM: dts: sun7i: A20-olinuxino-lime2: Fix ethernet phy-mode arm64: dts: allwinner: h5: NanoPI Neo 2: Fix ethernet node
2021-10-26block: schedule queue restart after BLK_STS_ZONE_RESOURCENaohiro Aota
When dispatching a zone append write request to a SCSI zoned block device, if the target zone of the request is already locked, the device driver will return BLK_STS_ZONE_RESOURCE and the request will be pushed back to the hctx dipatch queue. The queue will be marked as RESTART in dd_finish_request() and restarted in __blk_mq_free_request(). However, this restart applies to the hctx of the completed request. If the requeued request is on a different hctx, dispatch will no be retried until another request is submitted or the next periodic queue run triggers, leading to up to 30 seconds latency for the requeued request. Fix this problem by scheduling a queue restart similarly to the BLK_STS_RESOURCE case or when we cannot get the budget. Also, consolidate the checks into the "need_resource" variable to simplify the condition. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Niklas Cassel <Niklas.Cassel@wdc.com> Link: https://lore.kernel.org/r/20211026165127.4151055-1-naohiro.aota@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26io_uring: don't assign write hint in the read pathJens Axboe
Move this out of the generic read/write prep path, and place it in the write specific kiocb setup instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2021-10-26 We've added 12 non-merge commits during the last 7 day(s) which contain a total of 23 files changed, 118 insertions(+), 98 deletions(-). The main changes are: 1) Fix potential race window in BPF tail call compatibility check, from Toke Høiland-Jørgensen. 2) Fix memory leak in cgroup fs due to missing cgroup_bpf_offline(), from Quanyang Wang. 3) Fix file descriptor reference counting in generic_map_update_batch(), from Xu Kuohai. 4) Fix bpf_jit_limit knob to the max supported limit by the arch's JIT, from Lorenz Bauer. 5) Fix BPF sockmap ->poll callbacks for UDP and AF_UNIX sockets, from Cong Wang and Yucong Sun. 6) Fix BPF sockmap concurrency issue in TCP on non-blocking sendmsg calls, from Liu Jian. 7) Fix build failure of INODE_STORAGE and TASK_STORAGE maps on !CONFIG_NET, from Tejun Heo. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix potential race in tail call compatibility check bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NET selftests/bpf: Use recv_timeout() instead of retries net: Implement ->sock_is_readable() for UDP and AF_UNIX skmsg: Extract and reuse sk_msg_is_readable() net: Rename ->stream_memory_read to ->sock_is_readable tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function cgroup: Fix memory leak caused by missing cgroup_bpf_offline bpf: Fix error usage of map_fd and fdget() in generic_map_update_batch() bpf: Prevent increasing bpf_jit_limit above max bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT bpf: Define bpf_jit_alloc_exec_limit for riscv JIT ==================== Link: https://lore.kernel.org/r/20211026201920.11296-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-26MAINTAINERS: Update KPROBES and TRACING entriesTiezhu Yang
There is no git tree for KPROBES in MAINTAINERS, it is not convinent to rebase, lib/test_kprobes.c and samples/kprobes belong to kprobe, so add git tree and missing files for KPROBES, and also use linux-trace.git for TRACING to avoid confusing. Link: https://lkml.kernel.org/r/1635213091-24387-5-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26test_kprobes: Move it from kernel/ to lib/Tiezhu Yang
Since config KPROBES_SANITY_TEST is in lib/Kconfig.debug, it is better to let test_kprobes.c in lib/, just like other similar tests found in lib/. Link: https://lkml.kernel.org/r/1635213091-24387-4-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26docs, kprobes: Remove invalid URL and add new referenceTiezhu Yang
The following reference is invalid, remove it. https://www.ibm.com/developerworks/library/l-kprobes/index.html Add the following new reference "An introduction to KProbes": https://lwn.net/Articles/132196/ Link: https://lkml.kernel.org/r/1635213091-24387-3-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26samples/kretprobes: Fix return value if register_kretprobe() failedTiezhu Yang
Use the actual return value instead of always -1 if register_kretprobe() failed. E.g. without this patch: # insmod samples/kprobes/kretprobe_example.ko func=no_such_func insmod: ERROR: could not insert module samples/kprobes/kretprobe_example.ko: Operation not permitted With this patch: # insmod samples/kprobes/kretprobe_example.ko func=no_such_func insmod: ERROR: could not insert module samples/kprobes/kretprobe_example.ko: Unknown symbol in module Link: https://lkml.kernel.org/r/1635213091-24387-2-git-send-email-yangtiezhu@loongson.cn Fixes: 804defea1c02 ("Kprobes: move kprobe examples to samples/") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26lib/bootconfig: Fix the xbc_get_info kerneldocMasami Hiramatsu
Fix the kernel doc of xbc_get_info() to add '@' to the parameters. Link: https://lkml.kernel.org/r/163525086738.676803.15352231787913236933.stgit@devnote2 Fixes: e306220cb7b7 ("bootconfig: Add xbc_get_info() for the node information") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26kprobes: Add a test case for stacktrace from kretprobe handlerMasami Hiramatsu
Add a test case for stacktrace from kretprobe handler and nested kretprobe handlers. This test checks both of stack trace inside kretprobe handler and stack trace from pt_regs. Those stack trace must include actual function return address instead of kretprobe trampoline. The nested kretprobe stacktrace test checks whether the unwinder can correctly unwind the call frame on the stack which has been modified by the kretprobe. Since the stacktrace on kretprobe is correctly fixed only on x86, this introduces a meta kconfig ARCH_CORRECT_STACKTRACE_ON_KRETPROBE which tells user that the stacktrace on kretprobe is correct or not. The test results will be shown like below; TAP version 14 1..1 # Subtest: kprobes_test 1..6 ok 1 - test_kprobe ok 2 - test_kprobes ok 3 - test_kretprobe ok 4 - test_kretprobes ok 5 - test_stacktrace_on_kretprobe ok 6 - test_stacktrace_on_nested_kretprobe # kprobes_test: pass:6 fail:0 skip:0 total:6 # Totals: pass:6 fail:0 skip:0 total:6 ok 1 - kprobes_test Link: https://lkml.kernel.org/r/163516211244.604541.18350507860972214415.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26lib/bootconfig: Make xbc_alloc_mem() and xbc_free_mem() as __init functionMasami Hiramatsu
Since the xbc_alloc_mem() and xbc_free_mem() are used from the __init functions and memblock_alloc() is __init function, make them __init functions too. Link: https://lkml.kernel.org/r/163515075747.547467.5746167540626712819.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 4ee1b4cac236 ("bootconfig: Cleanup dummy headers in tools/bootconfig") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26ftrace/sh: Add arch_ftrace_ops_list_func stub to have compressed image still ↵Steven Rostedt (VMware)
link Using the linker script to fix an issue where some archs call the function tracer with just the ip (instruction pointer) and pip (parent instruction pointer) where as more up to date archs also pass in the associated ftrace_ops and the ftrace_regs pointer, the generic code will be called either with two parameters or four. To avoid any C undefined behavior of calling two parameters to four or four to two parameter function, two functions are created, where a preprocessor macro uses the one that matches the architecture. As the function pointers for them may be different, a typecast is used. But this triggers issues with newer compilers that will fail due to -Werror. A linker trick is now used to map the generic function to the function that is used (note the generic function is only used to set the default function callback). The linker trick defines ftrace_ops_list_func (the generic function) to arch_ftrace_ops_list_func (the arch defined one). Link: https://lore.kernel.org/all/20200617165616.52241bde@oasis.local.home/ But this fails sh arch because their linker script is included in their compressed image that does not define arch_ftrace_ops_list_func at all sh4-linux-ld:arch/sh/boot/compressed/../../kernel/vmlinux.lds:32: undefined symbol `arch_ftrace_ops_list_func' referenced in expression Included a stub by that name in the misc.c to allow the code to compile and link, even though it's not used. This is similar to what was done for ftrace_stub: b83b43ffc6e4b ("fgraph: Fix function type mismatches of ftrace_graph_return using ftrace_stub") Link: https://lkml.kernel.org/r/20211021221627.5d7270de@rorschach.local.home Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: linux-sh@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26net: phy: fixed warning: Function parameter not describedLuo Jie
Fixed warning: Function parameter or member 'enable' not described in 'genphy_c45_fast_retrain' Signed-off-by: Luo Jie <luoj@codeaurora.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20211026102957.17100-1-luoj@codeaurora.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-26bpf: Fix potential race in tail call compatibility checkToke Høiland-Jørgensen
Lorenzo noticed that the code testing for program type compatibility of tail call maps is potentially racy in that two threads could encounter a map with an unset type simultaneously and both return true even though they are inserting incompatible programs. The race window is quite small, but artificially enlarging it by adding a usleep_range() inside the check in bpf_prog_array_compatible() makes it trivial to trigger from userspace with a program that does, essentially: map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0); pid = fork(); if (pid) { key = 0; value = xdp_fd; } else { key = 1; value = tc_fd; } err = bpf_map_update_elem(map_fd, &key, &value, 0); While the race window is small, it has potentially serious ramifications in that triggering it would allow a BPF program to tail call to a program of a different type. So let's get rid of it by protecting the update with a spinlock. The commit in the Fixes tag is the last commit that touches the code in question. v2: - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei) v3: - Put lock and the members it protects into an embedded 'owner' struct (Daniel) Fixes: 3324b584b6f6 ("ebpf: misc core cleanup") Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
2021-10-26bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NETTejun Heo
bpf_types.h has BPF_MAP_TYPE_INODE_STORAGE and BPF_MAP_TYPE_TASK_STORAGE declared inside #ifdef CONFIG_NET although they are built regardless of CONFIG_NET. So, when CONFIG_BPF_SYSCALL && !CONFIG_NET, they are built without the declarations leading to spurious build failures and not registered to bpf_map_types making them unavailable. Fix it by moving the BPF_MAP_TYPE for the two map types outside of CONFIG_NET. Reported-by: kernel test robot <lkp@intel.com> Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/YXG1cuuSJDqHQfRY@slm.duckdns.org
2021-10-26Merge branch 'sock_map: fix ->poll() and update selftests'Alexei Starovoitov
Cong Wang says: ==================== This patchset fixes ->poll() for sockets in sockmap and updates selftests accordingly with select(). Please check each patch for more details. Fixes: c50524ec4e3a ("Merge branch 'sockmap: add sockmap support for unix datagram socket'") Fixes: 89d69c5d0fbc ("Merge branch 'sockmap: introduce BPF_SK_SKB_VERDICT and support UDP'") Acked-by: John Fastabend <john.fastabend@gmail.com> --- v4: add a comment in udp_poll() v3: drop sk_psock_get_checked() reuse tcp_bpf_sock_is_readable() v2: rename and reuse ->stream_memory_read() fix a compile error in sk_psock_get_checked() Cong Wang (3): net: rename ->stream_memory_read to ->sock_is_readable skmsg: extract and reuse sk_msg_is_readable() net: implement ->sock_is_readable() for UDP and AF_UNIX ==================== Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>