Age  Commit message  Author
2024-11-13  tpm: Opt-in in disable PCR integrity protection  Jarkko Sakkinen
The initial HMAC session feature added TPM bus encryption and/or integrity protection to various in-kernel TPM operations. This can cause performance bottlenecks with IMA, as it heavily utilizes PCR extend operations. In order to mitigate this performance issue, introduce a kernel command-line parameter to the TPM driver for disabling the integrity protection for PCR extend operations (i.e. TPM2_PCR_Extend). Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Link: https://lore.kernel.org/linux-integrity/20241015193916.59964-1-zohar@linux.ibm.com/ Fixes: 6519fea6fd37 ("tpm: add hmac checks to tpm2_pcr_extend()") Tested-by: Mimi Zohar <zohar@linux.ibm.com> Co-developed-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Co-developed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-11-13  block: don't reorder requests in blk_mq_add_to_batch  Christoph Hellwig
LIFO ordering for batched completions is a bit unexpected and also defeats some merging optimizations in e.g. the XFS buffered write code. Now that we can easily add the request to the tail of the list do that. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13  block: don't reorder requests in blk_add_rq_to_plug  Christoph Hellwig
Add requests to the tail of the list instead of the front so that they are queued up in submission order. Remove the re-reordering in blk_mq_dispatch_plug_list, virtio_queue_rqs and nvme_queue_rqs now that the list is ordered as expected. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13  block: add a rq_list type  Christoph Hellwig
Replace the semi-open coded request list helpers with a proper rq_list type that mirrors the bio_list and has head and tail pointers. Besides better type safety this actually allows to insert at the tail of the list, which will be useful soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
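The benefit of keeping both a head and a tail pointer can be sketched outside the kernel. The following is a minimal Python model, not the kernel's C implementation; `Request`, `RqList`, `add_head`, `add_tail` and `build` are hypothetical names chosen for this illustration. It shows that head insertion yields LIFO order while tail insertion preserves submission order.

```python
class Request:
    """Stand-in for struct request: a tag plus an rq_next pointer."""

    def __init__(self, tag):
        self.tag = tag
        self.rq_next = None


class RqList:
    """Singly linked list with head and tail pointers (modelled on bio_list)."""

    def __init__(self):
        self.head = None
        self.tail = None

    def add_head(self, rq):
        # Push to the front: the newest request is seen first (LIFO).
        rq.rq_next = self.head
        self.head = rq
        if self.tail is None:
            self.tail = rq

    def add_tail(self, rq):
        # The tail pointer makes appending O(1) and keeps submission order.
        rq.rq_next = None
        if self.tail is None:
            self.head = rq
        else:
            self.tail.rq_next = rq
        self.tail = rq

    def pop(self):
        # Remove and return the first request, or None if the list is empty.
        rq = self.head
        if rq is not None:
            self.head = rq.rq_next
            if self.head is None:
                self.tail = None
            rq.rq_next = None
        return rq

    def tags(self):
        out, rq = [], self.head
        while rq is not None:
            out.append(rq.tag)
            rq = rq.rq_next
        return out


def build(order, at_tail):
    """Insert requests tagged as in `order`, at the tail or at the head."""
    lst = RqList()
    for tag in order:
        rq = Request(tag)
        if at_tail:
            lst.add_tail(rq)
        else:
            lst.add_head(rq)
    return lst
```

For example, `build([1, 2, 3], at_tail=True).tags()` keeps the order 1, 2, 3, whereas `at_tail=False` reverses it.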
2024-11-13  block: remove rq_list_move  Christoph Hellwig
Unused now. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13  virtio_blk: reverse request order in virtio_queue_rqs  Christoph Hellwig
blk_mq_flush_plug_list submits requests in the reverse order that they were submitted, which leads to a rather suboptimal I/O pattern especially in rotational devices. Fix this by rewriting virtio_queue_rqs so that it always pops the requests from the passed in request list, and then adds them to the head of a local submit list. This actually simplifies the code a bit as it removes the complicated list splicing, at the cost of extra updates of the rq_next pointer. As that should be cache hot anyway it should be an easy price to pay. Fixes: 0e9911fa768f ("virtio-blk: support mq_ops->queue_rqs()") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13  nvme-pci: reverse request order in nvme_queue_rqs  Christoph Hellwig
blk_mq_flush_plug_list submits requests in the reverse order that they were submitted, which leads to a rather suboptimal I/O pattern especially in rotational devices. Fix this by rewriting nvme_queue_rqs so that it always pops the requests from the passed in request list, and then adds them to the head of a local submit list. This actually simplifies the code a bit as it removes the complicated list splicing, at the cost of extra updates of the rq_next pointer. As that should be cache hot anyway it should be an easy price to pay. Fixes: d62cbcf62f2f ("nvme: add support for mq_ops->queue_rqs()") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
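The "pop from the passed-in list, push onto the head of a local list" idea can be illustrated with a small Python model. This is only a sketch of the ordering effect, with plain Python lists standing in for the kernel's linked request lists; `resubmit_in_order` is a hypothetical name, not a kernel function.

```python
def resubmit_in_order(plug_list):
    """Drain a newest-first plug list into a local submit list.

    Popping from the front of the passed-in list and adding each request
    to the head of the local list reverses the order, so a list that
    arrived newest-first ends up oldest-first (submission order).
    """
    submit_list = []
    while plug_list:
        rq = plug_list.pop(0)       # pop from the passed-in request list
        submit_list.insert(0, rq)   # add to the head of the local list
    return submit_list
```

With requests submitted as a, b, c and handed over as `["c", "b", "a"]`, the function returns them in submission order again.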
2024-11-13  btrfs: validate queue limits  Christoph Hellwig
Call blk_validate_limits on the queue limits used for zone append splitting so that calculated values get filled in and any stacking conflicts get caught. Without this there isn't a max_zone_append_sectors limit as of commit 559218d43ec9 ("block: pre-calculate max_zone_append_sectors"). Fixes: 559218d43ec9 ("block: pre-calculate max_zone_append_sectors") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241113084541.34315-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13  block: export blk_validate_limits  Christoph Hellwig
While block drivers do the validation as part of committing them to the queue, users that use the limit outside of a block device context have to validate the limits and fill in the calculated values as well. So far btrfs is the only user of queue limits without a block device, and it has gotten away with that more or less by accident. But with commit 559218d43ec9 ("block: pre-calculate max_zone_append_sectors") this became fatal for setups that have small max zone append size, as it won't be limited now. Export blk_validate_limits so that it can be called directly from btrfs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241113084541.34315-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13  e1000: Hold RTNL when e1000_down can be called  Joe Damato
e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.

There are a few paths for e1000_down to be called in e1000 where RTNL is not currently being held:
  - e1000_shutdown (pci shutdown)
  - e1000_suspend (power management)
  - e1000_reinit_locked (via e1000_reset_task delayed work)
  - e1000_io_error_detected (via pci error handler)

Hold RTNL in three places to fix this issue:
  - e1000_reset_task: igc, igb, and e1000e all hold rtnl in this path.
  - e1000_io_error_detected (pci error handler): e1000e and ixgbe hold rtnl in this path. A patch has been posted for igc to do the same [1].
  - __e1000_shutdown (which is called from both e1000_shutdown and e1000_suspend): igb, ixgbe, and e1000e all hold rtnl in the same path.

The other paths which call e1000_down seemingly hold RTNL and are OK:
  - e1000_close (ndo_stop)
  - e1000_change_mtu (ndo_change_mtu)

Based on the above analysis and mailing list discussion [2], I believe adding rtnl in the three places mentioned above is correct.

Fixes: 8f7ff18a5ec7 ("e1000: Link NAPI instances to queues and IRQs")
Reported-by: Dmitry Antipov <dmantipov@yandex.ru>
Closes: https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090ff9@yandex.ru/
Link: https://lore.kernel.org/netdev/20241022215246.307821-3-jdamato@fastly.com/ [1]
Link: https://lore.kernel.org/netdev/ZxgVRX7Ne-lTjwiJ@LQ3V64L9R2/ [2]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  igbvf: remove unused spinlock  Wander Lairson Costa
tx_queue_lock and stats_lock are declared and initialized, but never used. Remove them. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  igb: Fix 2 typos in comments in igb_main.c  Johnny Park
Fix 2 spelling mistakes in comments in `igb_main.c`. Signed-off-by: Johnny Park <pjohnny0508@gmail.com> Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  igc: remove autoneg parameter from igc_mac_info  Vitaly Lifshits
Since the igc driver doesn't support forced speed configuration and its current related hardware doesn't support it either, there is no use of the mac.autoneg parameter. Moreover, in one case this usage might result in a NULL pointer dereference due to an uninitialized function pointer, phy.ops.force_speed_duplex. Therefore, remove this parameter from the igc code. Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  ixgbe: Break include dependency cycle  Diomidis Spinellis
Header ixgbe_type.h includes ixgbe_mbx.h. Also, header ixgbe_mbx.h included ixgbe_type.h, thus introducing a circular dependency.

- Remove ixgbe_mbx.h inclusion from ixgbe_type.h.
- ixgbe_mbx.h requires the definition of struct ixgbe_mbx_operations so move its definition there. While at it, add missing argument identifier names.
- Add required forward structure declarations.
- Include ixgbe_mbx.h in the .c files that need it, for the following reasons:
    ixgbe_sriov.c uses ixgbe_check_for_msg
    ixgbe_main.c uses ixgbe_init_mbx_params_pf
    ixgbe_82599.c uses mbx_ops_generic
    ixgbe_x540.c uses mbx_ops_generic
    ixgbe_x550.c uses mbx_ops_generic

Signed-off-by: Diomidis Spinellis <dds@aueb.gr>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
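As an illustration of what "circular dependency" means here, the sketch below detects a cycle in an include graph with a depth-first search. It is a generic Python model, not a tool used by the commit. The `before` and `after` graphs are simplified assumptions that contain only the edges named in the message, and `find_cycle` is a hypothetical helper.

```python
def find_cycle(includes):
    """Return one include cycle as a list of file names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for dep in includes.get(node, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: a cycle is closed.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        colour[node] = BLACK
        return None

    for header in includes:
        if colour.get(header, WHITE) == WHITE:
            cycle = visit(header)
            if cycle:
                return cycle
    return None


# Simplified include graphs (assumed; only the edges named in the message).
before = {
    "ixgbe_type.h": ["ixgbe_mbx.h"],
    "ixgbe_mbx.h": ["ixgbe_type.h"],
}
after = {
    "ixgbe_type.h": [],
    "ixgbe_mbx.h": ["ixgbe_type.h"],
    "ixgbe_sriov.c": ["ixgbe_mbx.h"],
}
```

Running `find_cycle(before)` reports the two headers including each other, while `find_cycle(after)` finds nothing once the edge from ixgbe_type.h to ixgbe_mbx.h is removed.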
2024-11-13  ice: Unbind the workqueue  Frederic Weisbecker
The ice workqueue doesn't seem to rely on any CPU locality and should therefore be able to run on any CPU. In practice this is already happening through the unbound ice_service_timer that may fire anywhere and queue the workqueue accordingly to any CPU. Make this official so that the ice workqueue is only ever queued to housekeeping CPUs on nohz_full. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  ice: use stack variable for virtchnl_supported_rxdids  Jacob Keller
The ice_vc_query_rxdid() function allocates memory to store the virtchnl_supported_rxdids structure used to communicate the bitmap of supported RXDIDs to a VF. This structure is only 8 bytes in size. The function must hold the allocated length on the stack as well as the pointer to the structure which itself is 8 bytes. Allocating this storage on the heap adds unnecessary overhead including a potential error path that must be handled in case kzalloc fails. Because this structure is so small, we're not saving stack space. Additionally, because we must ensure that we free the allocated memory, the return value from ice_vc_send_msg_to_vf() must also be saved in the stack ret variable. Depending on compiler optimization, this means allocating the 8-byte structure is requiring up to 16-bytes of stack memory! Simplify this function to keep the rxdid variable on the stack, saving memory and removing a potential failure exit path from this function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  ice: initialize pf->supported_rxdids immediately after loading DDP  Jacob Keller
The pf->supported_rxdids field is used to populate the list of valid RXDIDs that a VF may use when negotiating VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. The set of supported RXDIDs is dependent on the DDP, and can be read from the GLXFLXP_RXDID_FLAGS register.

The PF needs to send this list to the VF upon receiving the VIRTCHNL_OP_GET_SUPPORTED_RXDIDs. It also needs to use this list to validate the requested descriptor ID from the VF when programming the Rx queues. A future update to support VF live migration will also want to validate that the target VF can support the same descriptor ID when migrating.

Currently, pf->supported_rxdids is initialized inside the ice_vc_query_rxdid() function. This means that it is only ever initialized if at least one VF actually tries to negotiate VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. It is also unnecessarily re-initialized every time the VF loads and requests the descriptor list.

This worked before because the PF only checks pf->supported_rxdids when programming the Rx queue if the VF actually negotiates the VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC feature.

This will be problematic for VF live migration. We need the list of supported Rx descriptor IDs when migrating. It is possible that no VF on the target PF has ever actually issued a VIRTCHNL_OP_GET_SUPPORTED_RXDIDs.

Refactor the driver to initialize pf->supported_rxdids during driver initialization after the DDP is loaded. This is simpler, avoids unnecessary duplicate work, and avoids issues with the live migration process.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  ice: only allow Tx promiscuous for multicast  Brett Creeley
Currently when any VF is trusted and true promiscuous mode is enabled on the PF, the VF will receive all unicast traffic directed to the device's internal switch. This includes traffic external to the NIC and also from other VSI (i.e. VFs). This does not match the expected behavior as unicast traffic should only be visible from external sources in this case. Disable the Tx promiscuous mode bits for unicast promiscuous mode. Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  ice: Add support for persistent NAPI config  Joe Damato
Use netif_napi_add_config to assign persistent per-NAPI config when initializing NAPIs. This preserves NAPI config settings when queue counts are adjusted.

Tested with an E810-2CQDA2 NIC.

Begin by setting the queue count to 4:

  $ sudo ethtool -L eth4 combined 4

Check the queue settings:

  $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
        --dump napi-get --json='{"ifindex": 4}'
  [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782},
   {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8451, 'ifindex': 4, 'irq': 2781},
   {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780},
   {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}]

Now, set the queue with NAPI ID 8451 to have a gro-flush-timeout of 1111:

  $ sudo ./tools/net/ynl/cli.py \
        --spec Documentation/netlink/specs/netdev.yaml \
        --do napi-set --json='{"id": 8451, "gro-flush-timeout": 1111}'
  None

Check that worked:

  $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
        --dump napi-get --json='{"ifindex": 4}'
  [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782},
   {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781},
   {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780},
   {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}]

Now reduce the queue count to 2, which would destroy the queue with NAPI ID 8451:

  $ sudo ethtool -L eth4 combined 2

Check the queue settings, noting that NAPI ID 8451 is gone:

  $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
        --dump napi-get --json='{"ifindex": 4}'
  [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780},
   {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}]

Now, increase the number of queues back to 4:

  $ sudo ethtool -L eth4 combined 4

Dump the settings, expecting to see the same NAPI IDs as above and for NAPI ID 8451 to have its gro-flush-timeout set to 1111:

  $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
        --dump napi-get --json='{"ifindex": 4}'
  [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782},
   {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781},
   {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780},
   {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}]

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
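The behaviour exercised by the test run above can be modelled in a few lines of Python. This is a rough sketch under stated assumptions, not the kernel mechanism: it assumes one persistent config slot per queue index that outlives the NAPI instance, and the `NetDevice` class with its methods is hypothetical. The dump order also differs from the tool output above (ascending instead of descending IDs).

```python
class NetDevice:
    """Toy model: per-index NAPI config that survives queue count changes."""

    def __init__(self, max_queues, first_napi_id):
        # One persistent slot per possible queue index (assumed layout).
        self.napi_config = [
            {"id": first_napi_id + i, "defer-hard-irqs": 0, "gro-flush-timeout": 0}
            for i in range(max_queues)
        ]
        self.active = 0

    def set_queue_count(self, count):
        # NAPIs beyond `count` go away, but their config slots are kept.
        self.active = count

    def napi_set(self, napi_id, **settings):
        # Only NAPIs that currently exist can be configured.
        for cfg in self.napi_config[: self.active]:
            if cfg["id"] == napi_id:
                for key, value in settings.items():
                    cfg[key.replace("_", "-")] = value
                return
        raise KeyError(napi_id)

    def napi_get(self):
        # Dump the settings of the active NAPIs only.
        return [dict(cfg) for cfg in self.napi_config[: self.active]]
```

Replaying the sequence from the commit message (4 queues, set gro-flush-timeout on ID 8451, shrink to 2, grow back to 4) leaves ID 8451 with its timeout of 1111 in this model.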
2024-11-13  ice: support optional flags in signature segment header  Przemek Kitszel
An optional flag field has been added to the signature segment header. The field contains two flags, a "valid" bit, and a "last segment" bit that indicates whether the segment is the last segment that will be sent to firmware. If the flag field's valid bit is NOT set, then as was done before, assume that this is the last segment being downloaded. However, if the flag field's valid bit IS set, then use the last segment flag to determine if this segment is the last segment to download. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
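The decision rule described above is small enough to write down. The sketch below is a Python model of that rule only; the bit positions and the names `SIGN_SEG_FLAGS_VALID`, `SIGN_SEG_FLAGS_LAST` and `is_last_segment` are assumptions made for this illustration and are not taken from the driver.

```python
# Hypothetical bit positions, chosen only for this sketch.
SIGN_SEG_FLAGS_VALID = 1 << 31
SIGN_SEG_FLAGS_LAST = 1 << 0


def is_last_segment(flags):
    """Decide whether a signature segment is the last one to download."""
    if not flags & SIGN_SEG_FLAGS_VALID:
        # Flag field not valid: behave as before and assume this is
        # the last segment being downloaded.
        return True
    # Flag field valid: the "last segment" bit decides.
    return bool(flags & SIGN_SEG_FLAGS_LAST)
```

Note that in this model a header with the "last" bit set but the "valid" bit clear is still treated as the last segment, because the flag field is ignored entirely when it is not marked valid.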
2024-11-13  ice: refactor "last" segment of DDP pkg  Przemek Kitszel
Add ice_ddp_send_hunk() that buffers "sent FW hunk" calls to AQ in order to mark the "last" one in a more elegant way. The next commit will add an even more complicated "sent FW" flow, so it's better to untangle things a bit beforehand.

Note that metadata buffers were not skipped for NOT-@indicate_last segments; this is fixed now.

Minor:
 + use ice_is_buffer_metadata() instead of open coding it in ice_dwnld_cfg_bufs();
 + ice_dwnld_cfg_bufs_no_lock() + dependencies were moved up a bit to have better git-diff, as this function was rewritten (in terms of git-blame)

CC: Paul Greenwalt <paul.greenwalt@intel.com>
CC: Dan Nowlin <dan.nowlin@intel.com>
CC: Ahmed Zaki <ahmed.zaki@intel.com>
CC: Simon Horman <horms@kernel.org>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  ice: extend dump serdes equalizer values feature  Mateusz Polchlopek
Extend the work done in commit 70838938e89c ("ice: Implement driver functionality to dump serdes equalizer values") by adding the new set of Rx registers that can be read using command: $ ethtool -d interface_name Rx equalization parameters are E810 PHY registers used by end user to gather information about configuration and status to debug link and connection issues in the field. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  ice: rework of dump serdes equalizer values feature  Mateusz Polchlopek
Refactor function ice_get_tx_rx_equa() to iterate over a new table of params instead of making multiple calls to ice_aq_get_phy_equalization(). A subsequent commit will extend that function by adding more serdes equalizer values to dump. Shorten the fields of struct ice_serdes_equalization_to_ethtool for readability purposes. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  jbd2: Fix comment describing journal_init_common()  Daniel Martín Gómez
The code indicates that journal_init_common() fills the journal_t object it returns while the comment incorrectly states that only a few fields are initialised. Also, the comment claims that journal structures could be created from scratch which isn't possible as journal_init_common() calls journal_load_superblock() which loads and checks journal superblock from disk. Signed-off-by: Daniel Martín Gómez <dalme@riseup.net> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20241107144538.3544-1-dalme@riseup.net Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-13  ext4: prevent an infinite loop in the lazyinit thread  Mathieu Othacehe
Use ktime_get_ns instead of ktime_get_real_ns when computing the lr_timeout not to be affected by system time jumps. Use a boolean instead of the MAX_JIFFY_OFFSET value to determine whether the next_wakeup value has been set. Comparing elr->lr_next_sched to MAX_JIFFY_OFFSET can cause the lazyinit thread to loop indefinitely. Co-developed-by: Lukas Skupinski <lukas.skupinski@landisgyr.com> Signed-off-by: Lukas Skupinski <lukas.skupinski@landisgyr.com> Signed-off-by: Mathieu Othacehe <othacehe@gnu.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241106134741.26948-2-othacehe@gnu.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
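Both ideas in this message, timing with a monotonic clock and tracking "has a value been set" with a boolean instead of a sentinel, can be shown in a small Python analogue. This is not the ext4 code; `timed_ns` and `next_wakeup` are hypothetical helpers, and Python integers do not wrap the way jiffies do, so the sketch only mirrors the structure of the fix.

```python
import time


def timed_ns(work):
    """Measure how long work() takes using a monotonic clock.

    time.monotonic_ns() is not affected by wall-clock adjustments,
    unlike time.time_ns(), so the measured duration cannot jump.
    """
    start = time.monotonic_ns()
    work()
    return time.monotonic_ns() - start


def next_wakeup(scheduled_times):
    """Return the earliest scheduled time, or None if nothing is scheduled.

    A separate boolean records whether a value has been found, rather
    than comparing the result against a sentinel "maximum" value.
    """
    found = False
    earliest = 0
    for lr_next_sched in scheduled_times:
        if not found or lr_next_sched < earliest:
            earliest = lr_next_sched
            found = True
    return earliest if found else None
```

The boolean makes the "nothing scheduled" case explicit, so a legitimate time value can never be mistaken for "unset".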
2024-11-13  ext4: use struct_size() to improve ext4_htree_store_dirent()  Thorsten Blum
Inline and use struct_size() to calculate the number of bytes to allocate for new_fn and remove the local variable len. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20241105103353.11590-2-thorsten.blum@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>
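For readers unfamiliar with the helper, the arithmetic behind struct_size() is the size of the fixed part of a structure plus the element size times the element count, saturating at SIZE_MAX instead of wrapping on overflow. Below is a Python model of that arithmetic only; the 64-bit SIZE_MAX and the example sizes are assumptions, and the function takes sizes as plain numbers rather than a pointer and member name as the C macro does.

```python
SIZE_MAX = 2**64 - 1  # assuming a 64-bit size_t


def struct_size(header_size, elem_size, count):
    """Bytes needed for a struct with a trailing flexible array.

    Mirrors header + elem_size * count, but saturates at SIZE_MAX
    instead of wrapping around on overflow.
    """
    total = header_size + elem_size * count
    return SIZE_MAX if total > SIZE_MAX else total
```

With a hypothetical 24-byte header and an 8-byte name (one byte per element), `struct_size(24, 1, 8)` gives 32 bytes; an absurdly large count yields SIZE_MAX, which an allocator would refuse rather than silently under-allocate.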
2024-11-13  ext4: annotate struct fname with __counted_by()  Thorsten Blum
Add the __counted_by compiler attribute to the flexible array member name to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20241105101813.10864-2-thorsten.blum@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-13  jbd2: avoid dozens of -Wflex-array-member-not-at-end warnings  Gustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Use the `DEFINE_RAW_FLEX()` helper for an on-stack definition of a flexible structure (`struct shash_desc`) where the size of the flexible-array member (`__ctx`) is known at compile-time, and refactor the rest of the code, accordingly. So, with this, fix 77 of the following warnings: include/linux/jbd2.h:1800:35: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/ZyU94w0IALVhc9Jy@kspp Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-13  ext4: use str_yes_no() helper function  Thorsten Blum
Remove hard-coded strings by using the str_yes_no() helper function. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20241021100056.5521-2-thorsten.blum@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-13  Merge tag 'nvme-6.13-2024-11-13' of git://git.infradead.org/nvme into for-6.13/block  Jens Axboe
Pull NVMe updates from Keith:

"nvme updates for Linux 6.13

 - Use uring_cmd helper (Pavel)
 - Host Memory Buffer allocation enhancements (Christoph)
 - Target persistent reservation support (Guixin)
 - Persistent reservation tracing (Guixin)
 - NVMe 2.1 specification support (Keith)
 - Rotational Meta Support (Matias, Wang, Keith)
 - Volatile cache detection enhancement (Guixin)"

* tag 'nvme-6.13-2024-11-13' of git://git.infradead.org/nvme: (22 commits)
  nvmet: add tracing of reservation commands
  nvme: parse reservation commands's action and rtype to string
  nvmet: report ns's vwc not present
  nvme: check ns's volatile write cache not present
  nvme: add rotational support
  nvme: use command set independent id ns if available
  nvmet: support for csi identify ns
  nvmet: implement rotational media information log
  nvmet: implement endurance groups
  nvmet: declare 2.1 version compliance
  nvmet: implement crto property
  nvmet: implement supported features log
  nvmet: implement supported log pages
  nvmet: implement active command set ns list
  nvmet: implement id ns for nvm command set
  nvmet: support reservation feature
  nvme: add reservation command's defines
  nvme-core: remove repeated wq flags
  nvmet: make nvmet_wq visible in sysfs
  nvme-pci: use dma_alloc_noncontigous if possible
  ...
2024-11-13  Merge tag 'v6.13-armsoc/drivers1' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt  Arnd Bergmann
Compatibles for some additional "General Register Files" syscons

* tag 'v6.13-armsoc/drivers1' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  dt-bindings: soc: rockchip: add rk3588 mipi dcphy syscon
  dt-bindings: soc: rockchip: add rk3576 usb2phy syscon
  dt-bindings: soc: rockchip: add rk3576 vo1-grf syscon

Link: https://lore.kernel.org/r/4605658.LvFx2qVVIh@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-13  Merge tag 'qcom-drivers-for-6.13-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers  Arnd Bergmann
A few more Qualcomm driver updates for v6.13

Make the Adreno driver invoke the SMMU aperture setup firmware function, which is required to allow the GPU to manage per-process page tables in some firmware versions - as an example Rb3Gen2 has no GPU without this.

Add X1E Devkit to the list of devices that has functional EFI variable access through the uefisecapp.

Flip the "manual slice configuration quirk" in the Qualcomm LLCC driver, as this only applies to a single platform, and introduce support for QCS8300, QCS615, SAR2130P, and SAR1130P.

Lastly, add IPQ5424 and IPQ5404 to the Qualcomm socinfo driver.

* tag 'qcom-drivers-for-6.13-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  soc: qcom: ice: Remove the device_link field in qcom_ice
  drm/msm/adreno: Setup SMMU aparture for per-process page table
  firmware: qcom: scm: Introduce CP_SMMU_APERTURE_ID
  soc: qcom: socinfo: add IPQ5424/IPQ5404 SoC ID
  dt-bindings: arm: qcom,ids: add SoC ID for IPQ5424/IPQ5404
  soc: qcom: llcc: Flip the manual slice configuration condition
  dt-bindings: firmware: qcom,scm: Document sm8750 SCM
  firmware: qcom: uefisecapp: Allow X1E Devkit devices
  soc: qcom: llcc: Add LLCC configuration for the QCS8300 platform
  dt-bindings: cache: qcom,llcc: Document the QCS8300 LLCC
  soc: qcom: llcc: Add configuration data for QCS615
  dt-bindings: cache: qcom,llcc: Document the QCS615 LLCC
  soc: qcom: llcc: add support for SAR2130P and SAR1130P
  soc: qcom: llcc: use deciman integers for bit shift values
  dt-bindings: cache: qcom,llcc: document SAR2130P and SAR1130P

Link: https://lore.kernel.org/r/20241113032425.356306-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-13  fsnotify: fix sending inotify event with unexpected filename  Amir Goldstein
We got a report that adding a fanotify filesystem watch prevents tail -f from receiving events.

Reproducer:

1. Create 3 windows / login sessions. Become root in each session.
2. Choose a mounted filesystem that is pretty quiet; I picked /boot.
3. In the first window, run: fsnotifywait -S -m /boot
4. In the second window, run: echo data >> /boot/foo
5. In the third window, run: tail -f /boot/foo
6. Go back to the second window and run: echo more data >> /boot/foo
7. Observe that the tail command doesn't show the new data.
8. In the first window, hit control-C to interrupt fsnotifywait.
9. In the second window, run: echo still more data >> /boot/foo
10. Observe that the tail command in the third window has now printed the missing data.

When stracing tail, we observed that when fanotify filesystem mark is set, tail does get the inotify event, but the event is received with the filename:

  read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0foo\0\0\0\0\0\0\0\0\0\0\0\0\0", 50) = 32

This is unexpected, because tail is watching the file itself and not its parent and is inconsistent with the inotify event received by tail when fanotify filesystem mark is not set:

  read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 50) = 16

The interference between different fsnotify groups was caused by the fact that the mark on the sb requires the filename, so the filename is passed to fsnotify(). Later on, fsnotify_handle_event() tries to take care of not passing the filename to groups (such as inotify) that are interested in the filename only when the parent is watching.

But the logic was incorrect for the case that no group is watching the parent, some groups are watching the sb and some watching the inode.

Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: 7372e79c9eb9 ("fanotify: fix logic of reporting name info with watched parent")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
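The two read() buffers quoted above can be decoded by hand against the layout of struct inotify_event (wd, mask, cookie, len, followed by len bytes of name). The Python sketch below does that. It assumes a little-endian machine, and `parse_inotify_events` is a hypothetical helper written for this illustration; the 13 NUL bytes of padding follow from the 32-byte read() result.

```python
import struct

# struct inotify_event header: int wd; uint32 mask, cookie, len.
# Little-endian is assumed here.
HEADER = struct.Struct("<iIII")

IN_MODIFY = 0x00000002


def parse_inotify_events(buf):
    """Split a buffer returned by read() on an inotify fd into events."""
    events = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        wd, mask, cookie, name_len = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        # The name is NUL-padded up to name_len bytes.
        raw_name = buf[offset:offset + name_len]
        name = raw_name.split(b"\0", 1)[0].decode()
        offset += name_len
        events.append(
            {"wd": wd, "mask": mask, "cookie": cookie, "len": name_len, "name": name}
        )
    return events


# The event seen while the fanotify filesystem mark was set (32 bytes).
with_name = b"\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0" + b"foo" + b"\0" * 13
# The event seen without the fanotify mark (16 bytes, no name).
without_name = b"\1\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0"
```

Both buffers decode to an IN_MODIFY event on watch descriptor 1. The first one carries a 16-byte name field containing "foo", and the second has len 0 and no name.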
2024-11-13  Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  Linus Torvalds
Pull bpf fixes from Daniel Borkmann:

 - Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6 (Jiawei Ye)
 - Fix BPF sockmap with kTLS to reject vsock and unix sockets upon kTLS context retrieval (Zijian Zhang)
 - Fix BPF bits iterator selftest for s390x (Hou Tao)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6
  bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx
  selftests/bpf: Use -4095 as the bad address for bits iterator
2024-11-13  Merge tag 'loongarch-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson  Linus Torvalds
Pull LoongArch fixes from Huacai Chen:

 - fix possible CPUs setup logical-physical CPU mapping, in order to avoid CPU hotplug issue
 - fix some KASAN bugs
 - fix AP booting issue in VM mode
 - some trivial cleanups

* tag 'loongarch-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Fix AP booting issue in VM mode
  LoongArch: Add WriteCombine shadow mapping in KASAN
  LoongArch: Disable KASAN if PGDIR_SIZE is too large for cpu_vabits
  LoongArch: Make KASAN work with 5-level page-tables
  LoongArch: Define a default value for VM_DATA_DEFAULT_FLAGS
  LoongArch: Fix early_numa_add_cpu() usage for FDT systems
  LoongArch: For all possible CPUs setup logical-physical CPU mapping
2024-11-13  Merge tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  Linus Torvalds
Pull misc fixes from Andrew Morton:

"10 hotfixes, 7 of which are cc:stable. 7 are MM, 3 are not. All singletons"

* tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: swapfile: fix cluster reclaim work crash on rotational devices
  selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
  mm/thp: fix deferred split queue not partially_mapped: fix
  mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
  nommu: pass NULL argument to vma_iter_prealloc()
  ocfs2: fix UBSAN warning in ocfs2_verify_volume()
  nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint
  nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint
  mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
  mm: count zeromap read and set for swapout and swapin
2024-11-13 | nvmet: add tracing of reservation commands | Guixin Liu
Add tracing of reservation commands, including register, acquire, release and report, and also parse the action and rtype to string to make the trace log more human-readable. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-13 | nvme: parse reservation commands's action and rtype to string | Guixin Liu
Parse the reservation commands' action (including rrega, racqa and rrela) and rtype to string to make the trace log more human-readable. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
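The code-to-string parsing described above can be sketched as a table lookup. This is a minimal userspace sketch, not the driver's code: every `demo_*` name and the exact output strings are hypothetical, and the caller is assumed to have already extracted the action or type code from the command. The code values follow the NVMe reservation command encodings.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Index = action code, already extracted from the command by the caller. */
static const char *const demo_rrega_names[] = { "register", "unregister", "replace" };
static const char *const demo_racqa_names[] = { "acquire", "preempt", "preempt_abort" };
static const char *const demo_rrela_names[] = { "release", "clear" };

/* Reservation type 0 is reserved, hence the NULL slot. */
static const char *const demo_rtype_names[] = {
	NULL,
	"write_exclusive",
	"exclusive_access",
	"write_exclusive_reg_only",
	"exclusive_access_reg_only",
	"write_exclusive_all_regs",
	"exclusive_access_all_regs",
};

/* Hypothetical helper: unknown or reserved codes map to "reserved". */
static const char *demo_lookup(const char *const *names, size_t n, uint8_t code)
{
	return (code < n && names[code]) ? names[code] : "reserved";
}

static const char *demo_rrega_str(uint8_t code)
{
	return demo_lookup(demo_rrega_names, DEMO_ARRAY_LEN(demo_rrega_names), code);
}

static const char *demo_racqa_str(uint8_t code)
{
	return demo_lookup(demo_racqa_names, DEMO_ARRAY_LEN(demo_racqa_names), code);
}

static const char *demo_rrela_str(uint8_t code)
{
	return demo_lookup(demo_rrela_names, DEMO_ARRAY_LEN(demo_rrela_names), code);
}

static const char *demo_rtype_str(uint8_t code)
{
	return demo_lookup(demo_rtype_names, DEMO_ARRAY_LEN(demo_rtype_names), code);
}
```

A trace line would then print, for instance, `demo_racqa_str(1)` as "preempt" rather than the bare number 1.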
2024-11-13 | nvmet: report ns's vwc not present | Guixin Liu
Currently, we report that controller has vwc even though the ns may not have vwc. Report ns's vwc not present when not buffered_io or backdev doesn't have vwc. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-13 | libfs: kill empty_dir_getattr() | Al Viro
It's used only to initialize ->getattr in one inode_operations instance (empty_dir_inode_operations) and its behaviour had always been equivalent to what we get with NULL ->getattr. Just remove that initializer, along with empty_dir_getattr() itself. While we are at it, the same instance has ->permission initialized to generic_permission, which is what NULL ->permission ends up doing. Again, no point keeping it. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
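Why an initializer can be dropped when it matches the NULL behaviour is easier to see in a small model. The sketch below is a userspace analogy only: all `demo_*` types and functions are hypothetical stand-ins, loosely modeled on a dispatcher that falls back to a generic fill routine when the method pointer is NULL.

```c
#include <stddef.h>

struct demo_inode { int mode; };
struct demo_stat { int mode; };

struct demo_inode_ops {
	/* Optional method; NULL means "use the generic behaviour". */
	int (*getattr)(const struct demo_inode *inode, struct demo_stat *st);
};

/* Stand-in for the generic attribute fill used when no method is set. */
static void demo_generic_fillattr(const struct demo_inode *inode, struct demo_stat *st)
{
	st->mode = inode->mode;
}

/* A method that only repeats the generic behaviour, so it is redundant. */
static int demo_empty_dir_getattr(const struct demo_inode *inode, struct demo_stat *st)
{
	demo_generic_fillattr(inode, st);
	return 0;
}

/* Dispatcher: call the method if present, otherwise fall back. */
static int demo_getattr(const struct demo_inode_ops *ops,
			const struct demo_inode *inode, struct demo_stat *st)
{
	if (ops->getattr)
		return ops->getattr(inode, st);
	demo_generic_fillattr(inode, st);
	return 0;
}
```

An ops table with `.getattr = demo_empty_dir_getattr` and one with `.getattr = NULL` give the same result through `demo_getattr()`, which is the argument for removing the initializer.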
2024-11-13 | fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag | Stefan Berger
Commit 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function") introduced the AT_GETATTR_NOSEC flag to ensure that the call paths only call vfs_getattr_nosec if it is set instead of vfs_getattr. Now, simplify the getattr interface functions of filesystems where the flag AT_GETATTR_NOSEC is checked.
There is only a single caller of the inode_operations getattr function and it is located in fs/stat.c in vfs_getattr_nosec. That caller is the only one that passes the AT_GETATTR_NOSEC flag.
Two filesystems are checking this flag in .getattr and the flag is always passed to them unconditionally from only vfs_getattr_nosec:
 - ecryptfs: Simplify by always calling vfs_getattr_nosec in ecryptfs_getattr. From there the flag is passed to no other function and this function is not called otherwise.
 - overlayfs: Simplify by always calling vfs_getattr_nosec in ovl_getattr. From there the flag is passed to no other function and this function is not called otherwise.
The query_flags in vfs_getattr_nosec will mask-out AT_GETATTR_NOSEC from any caller using AT_STATX_SYNC_TYPE as mask so that the flag is not important inside this function. Also, since no filesystem is checking the flag anymore, remove the flag entirely now, including the BUG_ON check that never triggered.
The net effect of the changes here combined with the original commit is that ecryptfs and overlayfs do not call vfs_getattr but only vfs_getattr_nosec.
Fixes: 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Closes: https://lore.kernel.org/linux-fsdevel/20241101011724.GN1350452@ZenIV/T/#u Cc: Tyler Hicks <code@tyhicks.com> Cc: ecryptfs@vger.kernel.org Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Amir Goldstein <amir73il@gmail.com> Cc: linux-unionfs@vger.kernel.org Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
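The masking step mentioned in this commit message can be illustrated with plain bit arithmetic. This is a sketch under assumptions: the `DEMO_*` constants are local copies, with 0x6000 being the uapi value of AT_STATX_SYNC_TYPE and 0x80000000 assumed to be the value the removed AT_GETATTR_NOSEC flag had; `demo_mask_query_flags()` is a hypothetical name.

```c
/* Local copies for illustration; see the assumptions stated above. */
#define DEMO_AT_STATX_FORCE_SYNC 0x2000u
#define DEMO_AT_STATX_DONT_SYNC  0x4000u
#define DEMO_AT_STATX_SYNC_TYPE  0x6000u
#define DEMO_AT_GETATTR_NOSEC    0x80000000u /* assumed value of the removed flag */

/*
 * Keep only the sync-type bits. Any other bit, including the
 * NOSEC bit, is dropped and never reaches the code that follows.
 */
static unsigned int demo_mask_query_flags(unsigned int query_flags)
{
	return query_flags & DEMO_AT_STATX_SYNC_TYPE;
}
```

Because the NOSEC bit lies outside the mask, a function that masks this way cannot observe whether the bit was set, which is why the flag does not matter inside it.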
2024-11-13 | fs/stat.c: switch to CLASS(fd_raw) | Al Viro
... and use fd_empty() consistently Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-13 | kill getname_statx_lookup_flags() | Al Viro
LOOKUP_EMPTY is ignored by the only remaining user, and without that 'getname_' prefix makes no sense. Remove LOOKUP_EMPTY part, rename to statx_lookup_flags() and make static. It most likely is _not_ statx() specific, either, but that's the next step. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-13 | io_statx_prep(): use getname_uflags() | Al Viro
the only thing in flags getname_flags() ever cares about is LOOKUP_EMPTY; anything else is none of its damn business. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-13 | genirq/proc: Use seq_put_decimal_ull_width() for decimal values | David Wang
seq_printf() is more expensive than seq_put_decimal_ull_width() due to the format string parsing costs. Profiling on an x86 8-core system indicates seq_printf() takes ~47% of the samples of show_interrupts(). Replacing it with seq_put_decimal_ull_width() yields an almost 30% performance gain. [ tglx: Massaged changelog and fixed up coding style ] Signed-off-by: David Wang <00107082@163.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241108160717.9547-1-00107082@163.com
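The cost difference comes from skipping format-string parsing: a dedicated routine can emit digits directly. The following is a userspace sketch of that idea, not the kernel helper itself. `demo_put_decimal_ull_width()` is a hypothetical function that writes a number right-aligned with space padding into a caller-supplied buffer.

```c
#include <stddef.h>

/*
 * Write val in decimal, right-aligned in at least `width` characters
 * and NUL-terminated. Returns the number of characters written
 * (excluding the NUL), or 0 if the buffer is too small.
 */
static size_t demo_put_decimal_ull_width(char *buf, size_t size,
					 unsigned long long val, unsigned int width)
{
	/* 3 decimal digits per byte is always enough (256 < 1000). */
	char tmp[3 * sizeof(unsigned long long)];
	size_t ndigits = 0, pad, len, i;

	/* Produce digits least-significant first; 0 yields a single '0'. */
	do {
		tmp[ndigits++] = (char)('0' + val % 10);
		val /= 10;
	} while (val);

	pad = width > ndigits ? width - ndigits : 0;
	len = pad + ndigits;
	if (len + 1 > size)
		return 0;

	for (i = 0; i < pad; i++)
		buf[i] = ' ';
	for (i = 0; i < ndigits; i++)
		buf[pad + i] = tmp[ndigits - 1 - i];
	buf[len] = '\0';
	return len;
}
```

Unlike a printf-style call, nothing here inspects a format string at run time; the width and base are fixed by the function itself.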
2024-11-13 | statmount: add flag to retrieve unescaped options | Miklos Szeredi
Filesystem options can be retrieved with STATMOUNT_MNT_OPTS, which returns a string of comma separated options, where some characters are escaped using the \OOO notation. Add a new flag, STATMOUNT_OPT_ARRAY, which instead returns the raw option values separated with '\0' characters. Since escaped characters are rare, this interface is preferable for non-libmount users which likely don't want to deal with option de-escaping.
Example code:
    if (st->mask & STATMOUNT_OPT_ARRAY) {
            const char *opt = st->str + st->opt_array;

            for (unsigned int i = 0; i < st->opt_num; i++) {
                    printf("opt_array[%u]: <%s>\n", i, opt);
                    opt += strlen(opt) + 1;
            }
    }
Example output:
(1) mnt_opts: <lowerdir+=/l\054w\054r,lowerdir+=/l\054w\054r1,upperdir=/upp\054r,workdir=/w\054rk,redirect_dir=nofollow,uuid=null>
(2) opt_array[0]: <lowerdir+=/l,w,r>
    opt_array[1]: <lowerdir+=/l,w,r1>
    opt_array[2]: <upperdir=/upp,r>
    opt_array[3]: <workdir=/w,rk>
    opt_array[4]: <redirect_dir=nofollow>
    opt_array[5]: <uuid=null>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20241112101006.30715-1-mszeredi@redhat.com Acked-by: Jeff Layton <jlayton@kernel.org> [brauner: tweak variable naming and parsing add example output] Signed-off-by: Christian Brauner <brauner@kernel.org>
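The NUL-separated layout can also be walked with explicit bounds checking. The sketch below works on a plain byte buffer and does not call statmount(). `demo_opt_at()` is a hypothetical helper, and the `len` parameter assumes the caller knows how many bytes of option data are valid.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the idx-th string in a buffer holding `num` consecutive
 * NUL-terminated strings, or NULL if idx is out of range or an
 * entry is not terminated within `len` bytes.
 */
static const char *demo_opt_at(const char *opts, size_t len,
			       unsigned int num, unsigned int idx)
{
	const char *p = opts;
	const char *end = opts + len;
	unsigned int i;

	if (idx >= num)
		return NULL;

	for (i = 0; i < num; i++) {
		/* Find this entry's terminator without reading past the end. */
		const char *nul = memchr(p, '\0', (size_t)(end - p));

		if (!nul)
			return NULL;
		if (i == idx)
			return p;
		p = nul + 1;
	}
	return NULL;
}
```

Since the values are not escaped, an entry such as `upperdir=/upp,r` comes back with its comma intact, and no de-escaping pass is needed.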
2024-11-13 | drm/xe: handle flat ccs during hibernation on igpu | Matthew Auld
Starting from LNL, CCS has moved over to flat CCS model where there is now dedicated memory reserved for storing compression state. On platforms like LNL this reserved memory lives inside graphics stolen memory, which is not treated like normal RAM and is therefore skipped by the core kernel when creating the hibernation image. Currently if something was compressed and we enter hibernation all the corresponding CCS state is lost on such HW, resulting in corrupted memory. To fix this evict user buffers from TT -> SYSTEM to ensure we take a snapshot of the raw CCS state when entering hibernation, where upon resuming we can restore the raw CCS state back when next validating the buffer. This has been confirmed to fix display corruption on LNL when coming back from hibernation. Fixes: cbdc52c11c9b ("drm/xe/xe2: Support flat ccs") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3409 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241112162827.116523-2-matthew.auld@intel.com (cherry picked from commit c8b3c6db941299d7cc31bd9befed3518fdebaf68) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-13 | drm/xe: improve hibernation on igpu | Matthew Auld
The GGTT looks to be stored inside stolen memory on igpu which is not treated as normal RAM. The core kernel skips this memory range when creating the hibernation image, therefore when coming back from hibernation the GGTT programming is lost. This seems to cause issues with broken resume where GuC FW fails to load:
[drm] *ERROR* GT0: load failed: status = 0x400000A0, time = 10ms, freq = 1250MHz (req 1300MHz), done = -1
[drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x50, UKernel = 0x00, MIA = 0x00, Auth = 0x01
[drm] *ERROR* GT0: firmware signature verification failed
[drm] *ERROR* CRITICAL: Xe has declared device 0000:00:02.0 as wedged.
Current GGTT users are kernel internal and tracked as pinned, so it should be possible to hook into the existing save/restore logic that we use for dgpu, where the actual evict is skipped but on restore we importantly restore the GGTT programming. This has been confirmed to fix hibernation on at least ADL and MTL, though likely all igpu platforms are affected.
This also means we have a hole in our testing, where the existing s4 tests only really test the driver hooks, and don't go as far as actually rebooting and restoring from the hibernation image and in turn powering down RAM (and therefore losing the contents of stolen).
v2 (Brost) - Remove extra newline and drop unnecessary parentheses.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3275 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241101170156.213490-2-matthew.auld@intel.com (cherry picked from commit f2a6b8e396666d97ada8e8759dfb6a69d8df6380) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-13 | drm/xe: Restore system memory GGTT mappings | Matthew Brost
GGTT mappings reside on the device and this state is lost during suspend / d3cold, so it must be restored on resume regardless of whether the BO is in system memory or VRAM. v2: - Unnecessary parentheses around bo->placements[0] (Checkpatch) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241031182257.2949579-1-matthew.brost@intel.com (cherry picked from commit a19d1db9a3fa89fabd7c83544b84f393ee9b851f) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-13 | drm/xe: Ensure all locks released in exec IOCTL | Matthew Brost
In a couple of places the wrong error handling goto was used to release locks. Fix these to ensure all locks are dropped on exec IOCTL errors. Cc: Francois Dugast <francois.dugast@intel.com> Fixes: d16ef1a18e39 ("drm/xe/exec: Switch hw engine group execution mode upon job submission") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241106224944.30130-1-matthew.brost@intel.com (cherry picked from commit 9e7aacd8402b88394e6a83cb242901fde77a1773) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
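The class of bug described here is a goto that jumps to a label which skips an unlock. A minimal sketch of the intended unwind order follows. It is not the driver's code: the `demo_*` names and the two-lock setup are hypothetical, and the locks are simple counters so the result can be checked.

```c
/* Counter standing in for a real lock so the balance can be inspected. */
struct demo_lock { int held; };

static void demo_acquire(struct demo_lock *l) { l->held++; }
static void demo_release(struct demo_lock *l) { l->held--; }

/*
 * Take two locks in order and release them in reverse order.
 * fail_step selects where an error is injected (0 = no error).
 * Each error path jumps to the label that releases exactly the
 * locks taken so far.
 */
static int demo_exec(struct demo_lock *group, struct demo_lock *vm, int fail_step)
{
	int err = 0;

	demo_acquire(group);
	if (fail_step == 1) {
		err = -1;
		goto err_unlock_group;	/* only the group lock is held */
	}

	demo_acquire(vm);
	if (fail_step == 2) {
		err = -2;
		goto err_unlock_vm;	/* both locks are held */
	}

	/* The work protected by both locks would happen here. */

err_unlock_vm:
	demo_release(vm);
err_unlock_group:
	demo_release(group);
	return err;
}
```

Jumping to `err_unlock_group` from the second failure point would leave the vm lock held, which is the kind of mistake the commit message describes.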