summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-15alpha: Replace one-element array with flexible-array memberGustavo A. R. Silva
One-element and zero-length arrays are deprecated. So, replace one-element array in struct osf_dirent with flexible-array member. This results in no differences in binary output. Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZMpZZBShlLqyD3ax@work Signed-off-by: Kees Cook <keescook@chromium.org>
2023-08-15sysctl: Use ctl_table_size as stopping criteria for list macroJoel Granados
This is a preparation commit to make it easy to remove the sentinel elements (empty end markers) from the ctl_table arrays. It both allows the systematic removal of the sentinels and adds the ctl_table_size variable to the stopping criteria of the list_for_each_table_entry macro that traverses all ctl_table arrays. Once all the sentinels are removed by subsequent commits, ctl_table_size will become the only stopping criteria in the macro. We don't actually remove any elements in this commit, but it sets things up to for the removal process to take place. By adding header->ctl_table_size as an additional stopping criteria for the list_for_each_table_entry macro, it will execute until it finds an "empty" ->procname or until the size runs out. Therefore if a ctl_table array with a sentinel is passed its size will be too big (by one element) but it will stop on the sentinel. On the other hand, if the ctl_table array without a sentinel is passed its size will be just write and there will be no need for a sentinel. Signed-off-by: Joel Granados <j.granados@samsung.com> Suggested-by: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctlJoel Granados
Replace SIZE_MAX with ARRAY_SIZE in the register_net_sysctl macro. Now that all the callers to register_net_sysctl are actual arrays, we can call ARRAY_SIZE() without any compilation warnings. By calculating the actual array size, this commit is making sure that register_net_sysctl and all its callers forward the table_size into sysctl backend for when the sentinel elements in the ctl_table arrays (last empty markers) are removed. Without it the removal would fail lacking a stopping criteria for traversing the ctl_table arrays. Stopping condition continues to be based on both table size and the procname null test. This is needed in order to allow for the systematic removal al the sentinel element in subsequent commits: Before removing sentinel the stopping criteria will be the last null element. When the sentinel is removed then the (correct) size will take over. Signed-off-by: Joel Granados <j.granados@samsung.com> Suggested-by: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15vrf: Update to register_net_sysctl_szJoel Granados
Move from register_net_sysctl to register_net_sysctl_sz and pass the ARRAY_SIZE of the ctl_table array that was used to create the table variable. We need to move to the new function in preparation for when we change SIZE_MAX to ARRAY_SIZE() in the register_net_sysctl macro. Failing to do so would erroneously allow ARRAY_SIZE() to be called on a pointer. The actual change from SIZE_MAX to ARRAY_SIZE will take place in subsequent commits. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15networking: Update to register_net_sysctl_szJoel Granados
Move from register_net_sysctl to register_net_sysctl_sz for all the networking related files. Do this while making sure to mirror the NULL assignments with a table_size of zero for the unprivileged users. We need to move to the new function in preparation for when we change SIZE_MAX to ARRAY_SIZE() in the register_net_sysctl macro. Failing to do so would erroneously allow ARRAY_SIZE() to be called on a pointer. We hold off the SIZE_MAX to ARRAY_SIZE change until we have migrated all the relevant net sysctl registering functions to register_net_sysctl_sz in subsequent commits. An additional size function was added to the following files in order to calculate the size of an array that is defined in another file: include/net/ipv6.h net/ipv6/icmp.c net/ipv6/route.c net/ipv6/sysctl_net_ipv6.c Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15netfilter: Update to register_net_sysctl_szJoel Granados
Move from register_net_sysctl to register_net_sysctl_sz for all the netfilter related files. Do this while making sure to mirror the NULL assignments with a table_size of zero for the unprivileged users. We need to move to the new function in preparation for when we change SIZE_MAX to ARRAY_SIZE() in the register_net_sysctl macro. Failing to do so would erroneously allow ARRAY_SIZE() to be called on a pointer. We hold off the SIZE_MAX to ARRAY_SIZE change until we have migrated all the relevant net sysctl registering functions to register_net_sysctl_sz in subsequent commits. Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15ax.25: Update to register_net_sysctl_szJoel Granados
Move from register_net_sysctl to register_net_sysctl_sz and pass the ARRAY_SIZE of the ctl_table array that was used to create the table variable. We need to move to the new function in preparation for when we change SIZE_MAX to ARRAY_SIZE() in the register_net_sysctl macro. Failing to do so would erroneously allow ARRAY_SIZE() to be called on a pointer. We hold off the SIZE_MAX to ARRAY_SIZE change until we have migrated all the relevant net sysctl registering functions to register_net_sysctl_sz in subsequent commits. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Add size to register_net_sysctl functionJoel Granados
This commit adds size to the register_net_sysctl indirection function to facilitate the removal of the sentinel elements (last empty markers) from the ctl_table arrays. Though we don't actually remove any sentinels in this commit, register_net_sysctl* now has the capability of forwarding table_size for when that happens. We create a new function register_net_sysctl_sz with an extra size argument. A macro replaces the existing register_net_sysctl. The size in the macro is SIZE_MAX instead of ARRAY_SIZE to avoid compilation errors while we systematically migrate to register_net_sysctl_sz. Will change to ARRAY_SIZE in subsequent commits. Care is taken to add table_size to the stopping criteria in such a way that when we remove the empty sentinel element, it will continue stopping in the last element of the ctl_table array. Signed-off-by: Joel Granados <j.granados@samsung.com> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Add size arg to __register_sysctl_initJoel Granados
This commit adds table_size to __register_sysctl_init in preparation for the removal of the sentinel elements in the ctl_table arrays (last empty markers). And though we do *not* remove any sentinels in this commit, we set things up by calculating the ctl_table array size with ARRAY_SIZE. We add a table_size argument to __register_sysctl_init and modify the register_sysctl_init macro to calculate the array size with ARRAY_SIZE. The original callers do not need to be updated as they will go through the new macro. Signed-off-by: Joel Granados <j.granados@samsung.com> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Add size to register_sysctlJoel Granados
This commit adds table_size to register_sysctl in preparation for the removal of the sentinel elements in the ctl_table arrays (last empty markers). And though we do *not* remove any sentinels in this commit, we set things up by either passing the table_size explicitly or using ARRAY_SIZE on the ctl_table arrays. We replace the register_syctl function with a macro that will add the ARRAY_SIZE to the new register_sysctl_sz function. In this way the callers that are already using an array of ctl_table structs do not change. For the callers that pass a ctl_table array pointer, we pass the table_size to register_sysctl_sz instead of the macro. Signed-off-by: Joel Granados <j.granados@samsung.com> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Add a size arg to __register_sysctl_tableJoel Granados
We make these changes in order to prepare __register_sysctl_table and its callers for when we remove the sentinel element (empty element at the end of ctl_table arrays). We don't actually remove any sentinels in this commit, but we *do* make sure to use ARRAY_SIZE so the table_size is available when the removal occurs. We add a table_size argument to __register_sysctl_table and adjust callers, all of which pass ctl_table pointers and need an explicit call to ARRAY_SIZE. We implement a size calculation in register_net_sysctl in order to forward the size of the array pointer received from the network register calls. The new table_size argument does not yet have any effect in the init_header call which is still dependent on the sentinel's presence. table_size *does* however drive the `kzalloc` allocation in __register_sysctl_table with no adverse effects as the allocated memory is either one element greater than the calculated ctl_table array (for the calls in ipc_sysctl.c, mq_sysctl.c and ucount.c) or the exact size of the calculated ctl_table array (for the call from sysctl_net.c and register_sysctl). This approach will allows us to "just" remove the sentinel without further changes to __register_sysctl_table as table_size will represent the exact size for all the callers at that point. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Add size argument to init_headerJoel Granados
In this commit, we add a table_size argument to the init_header function in order to initialize the ctl_table_size variable in ctl_table_header. Even though the size is not yet used, it is now initialized within the sysctl subsys. We need this commit for when we start adding the table_size arguments to the sysctl functions (e.g. register_sysctl, __register_sysctl_table and __register_sysctl_init). Note that in __register_sysctl_table we temporarily use a calculated size until we add the size argument to that function in subsequent commits. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Add ctl_table_size to ctl_table_headerJoel Granados
The new ctl_table_size element will hold the size of the ctl_table arrays contained in the ctl_table_header. This value should eventually be passed by the callers to the sysctl register infrastructure. And while this commit introduces the variable, it does not set nor use it because that requires case by case considerations for each caller. It provides two important things: (1) A place to put the result of the ctl_table array calculation when it gets introduced for each caller. And (2) the size that will be used as the additional stopping criteria in the list_for_each_table_entry macro (to be added when all the callers are migrated) Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Use ctl_table_header in list_for_each_table_entryJoel Granados
We replace the ctl_table with the ctl_table_header pointer in list_for_each_table_entry which is the macro responsible for traversing the ctl_table arrays. This is a preparation commit that will make it easier to add the ctl_table array size (that will be added to ctl_table_header in subsequent commits) to the already existing loop logic based on empty ctl_table elements (so called sentinels). Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Prefer ctl_table_header in proc_sysctlJoel Granados
This is a preparation commit that replaces ctl_table with ctl_table_header as the pointer that is passed around in proc_sysctl.c. This will become necessary in subsequent commits when the size of the ctl_table array can no longer be calculated by searching for an empty sentinel (last empty ctl_table element) but will be carried along inside the ctl_table_header struct. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-16netfilter: nft_dynset: disallow object mapsPablo Neira Ayuso
Do not allow to insert elements from datapath to objects maps. Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: GC transaction race with netns dismantlePablo Neira Ayuso
Use maybe_get_net() since GC workqueue might race with netns exit path. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: fix GC transaction races with netns and netlink event ↵Pablo Neira Ayuso
exit path Netlink event path is missing a synchronization point with GC transactions. Add GC sequence number update to netns release path and netlink event path, any GC transaction losing race will be discarded. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16ipvs: fix racy memcpy in proc_do_sync_thresholdSishuai Gong
When two threads run proc_do_sync_threshold() in parallel, data races could happen between the two memcpy(): Thread-1 Thread-2 memcpy(val, valp, sizeof(val)); memcpy(valp, val, sizeof(val)); This race might mess up the (struct ctl_table *) table->data, so we add a mutex lock to serialize them. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/netdev/B6988E90-0A1E-4B85-BF26-2DAF6D482433@gmail.com/ Signed-off-by: Sishuai Gong <sishuai.system@gmail.com> Acked-by: Simon Horman <horms@kernel.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: set default timeout to 3 secs for sctp shutdown send and recv stateXin Long
In SCTP protocol, it is using the same timer (T2 timer) for SHUTDOWN and SHUTDOWN_ACK retransmission. However in sctp conntrack the default timeout value for SCTP_CONNTRACK_SHUTDOWN_ACK_SENT state is 3 secs while it's 300 msecs for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV state. As Paolo Valerio noticed, this might cause unwanted expiration of the ct entry. In my test, with 1s tc netem delay set on the NAT path, after the SHUTDOWN is sent, the sctp ct entry enters SCTP_CONNTRACK_SHUTDOWN_SEND state. However, due to 300ms (too short) delay, when the SHUTDOWN_ACK is sent back from the peer, the sctp ct entry has expired and been deleted, and then the SHUTDOWN_ACK has to be dropped. Also, it is confusing these two sysctl options always show 0 due to all timeout values using sec as unit: net.netfilter.nf_conntrack_sctp_timeout_shutdown_recd = 0 net.netfilter.nf_conntrack_sctp_timeout_shutdown_sent = 0 This patch fixes it by also using 3 secs for sctp shutdown send and recv state in sctp conntrack, which is also RTO.initial value in SCTP protocol. Note that the very short time value for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV was probably used for a rare scenario where SHUTDOWN is sent on 1st path but SHUTDOWN_ACK is replied on 2nd path, then a new connection started immediately on 1st path. So this patch also moves from SHUTDOWN_SEND/RECV to CLOSE when receiving INIT in the ORIGINAL direction. Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Reported-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: don't fail inserts if duplicate has expiredFlorian Westphal
nftables selftests fail: run-tests.sh testcases/sets/0044interval_overlap_0 Expected: 0-2 . 0-3, got: W: [FAILED] ./testcases/sets/0044interval_overlap_0: got 1 Insertion must ignore duplicate but expired entries. Moreover, there is a strange asymmetry in nft_pipapo_activate: It refetches the current element, whereas the other ->activate callbacks (bitmap, hash, rhash, rbtree) use elem->priv. Same for .remove: other set implementations take elem->priv, nft_pipapo_remove fetches elem->priv, then does a relookup, remove this. I suspect this was the reason for the change that prompted the removal of the expired check in pipapo_get() in the first place, but skipping exired elements there makes no sense to me, this helper is used for normal get requests, insertions (duplicate check) and deactivate callback. In first two cases expired elements must be skipped. For ->deactivate(), this gets called for DELSETELEM, so it seems to me that expired elements should be skipped as well, i.e. delete request should fail with -ENOENT error. Fixes: 24138933b97b ("netfilter: nf_tables: don't skip expired elements during walk") Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: deactivate catchall elements in next generationFlorian Westphal
When flushing, individual set elements are disabled in the next generation via the ->flush callback. Catchall elements are not disabled. This is incorrect and may lead to double-deactivations of catchall elements which then results in memory leaks: WARNING: CPU: 1 PID: 3300 at include/net/netfilter/nf_tables.h:1172 nft_map_deactivate+0x549/0x730 CPU: 1 PID: 3300 Comm: nft Not tainted 6.5.0-rc5+ #60 RIP: 0010:nft_map_deactivate+0x549/0x730 [..] ? nft_map_deactivate+0x549/0x730 nf_tables_delset+0xb66/0xeb0 (the warn is due to nft_use_dec() detecting underflow). Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Reported-by: lonial con <kongln9170@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: fix kdoc warnings after gc reworkFlorian Westphal
Jakub Kicinski says: We've got some new kdoc warnings here: net/netfilter/nft_set_pipapo.c:1557: warning: Function parameter or member '_set' not described in 'pipapo_gc' net/netfilter/nft_set_pipapo.c:1557: warning: Excess function parameter 'set' description in 'pipapo_gc' include/net/netfilter/nf_tables.h:577: warning: Function parameter or member 'dead' not described in 'nft_set' Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20230810104638.746e46f1@kernel.org/ Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: fix false-positive lockdep splatFlorian Westphal
->abort invocation may cause splat on debug kernels: WARNING: suspicious RCU usage net/netfilter/nft_set_pipapo.c:1697 suspicious rcu_dereference_check() usage! [..] rcu_scheduler_active = 2, debug_locks = 1 1 lock held by nft/133554: [..] (nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid [..] lockdep_rcu_suspicious+0x1ad/0x260 nft_pipapo_abort+0x145/0x180 __nf_tables_abort+0x5359/0x63d0 nf_tables_abort+0x24/0x40 nfnetlink_rcv+0x1a0a/0x22c0 netlink_unicast+0x73c/0x900 netlink_sendmsg+0x7f0/0xc20 ____sys_sendmsg+0x48d/0x760 Transaction mutex is held, so parallel updates are not possible. Switch to _protected and check mutex is held for lockdep enabled builds. Fixes: 212ed75dc5fb ("netfilter: nf_tables: integrate pipapo into commit protocol") Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-15Merge branch 'genetlink-provide-struct-genl_info-to-dumps'Jakub Kicinski
Jakub Kicinski says: ==================== genetlink: provide struct genl_info to dumps One of the biggest (which is not to say only) annoyances with genetlink handling today is that doit and dumpit need some of the same information, but it is passed to them in completely different structs. The implementations commonly end up writing a _fill() method which populates a message and have to pass at least 6 parameters. 3 of which are extracted manually from request info. After a lot of umming and ahing I decided to populate struct genl_info for dumps, without trying to factor out only the common parts. This makes the adoption easiest. In the future we may add a new version of dump which takes struct genl_info *info as the second argument, instead of struct netlink_callback *cb. For now developers have to call genl_info_dump(cb) to get the info. Typical genetlink families no longer get exposed to netlink protocol internals like pid and seq numbers. v3: - correct the condition in ethtool code (patch 10) v2: https://lore.kernel.org/all/20230810233845.2318049-1-kuba@kernel.org/ - replace the GENL_INFO_NTF() macro with init helper - fix the commit messages v1: https://lore.kernel.org/all/20230809182648.1816537-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20230814214723.2924989-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15ethtool: netlink: always pass genl_info to .prepare_dataJakub Kicinski
We had a number of bugs in the past because developers forgot to fully test dumps, which pass NULL as info to .prepare_data. .prepare_data implementations would try to access info->extack leading to a null-deref. Now that dumps and notifications can access struct genl_info we can pass it in, and remove the info null checks. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # pause Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15ethtool: netlink: simplify arguments to ethnl_default_parse()Jakub Kicinski
Pass struct genl_info directly instead of its members. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15netdev-genl: use struct genl_info for reply constructionJakub Kicinski
Use the just added APIs to make the code simpler. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: add genlmsg_iput() APIJakub Kicinski
Add some APIs and helpers required for convenient construction of replies and notifications based on struct genl_info. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: add a family pointer to struct genl_infoJakub Kicinski
Having family in struct genl_info is quite useful. It cuts down the number of arguments which need to be passed to helpers which already take struct genl_info. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: use attrs from struct genl_infoJakub Kicinski
Since dumps carry struct genl_info now, use the attrs pointer from genl_info and remove the one in struct genl_dumpit_info. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15hardening: Move BUG_ON_DATA_CORRUPTION to hardening optionsMarco Elver
BUG_ON_DATA_CORRUPTION is turning detected corruptions of list data structures from WARNings into BUGs. This can be useful to stop further corruptions or even exploitation attempts. However, the option has less to do with debugging than with hardening. With the introduction of LIST_HARDENED, it makes more sense to move it to the hardening options, where it selects LIST_HARDENED instead. Without this change, combining BUG_ON_DATA_CORRUPTION with LIST_HARDENED alone wouldn't be possible, because DEBUG_LIST would always be selected by BUG_ON_DATA_CORRUPTION. Signed-off-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/20230811151847.1594958-4-elver@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
2023-08-15list: Introduce CONFIG_LIST_HARDENEDMarco Elver
Numerous production kernel configs (see [1, 2]) are choosing to enable CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened configs [3]. The motivation behind this is that the option can be used as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025 are mitigated by the option [4]). The feature has never been designed with performance in mind, yet common list manipulation is happening across hot paths all over the kernel. Introduce CONFIG_LIST_HARDENED, which performs list pointer checking inline, and only upon list corruption calls the reporting slow path. To generate optimal machine code with CONFIG_LIST_HARDENED: 1. Elide checking for pointer values which upon dereference would result in an immediate access fault (i.e. minimal hardening checks). The trade-off is lower-quality error reports. 2. Use the __preserve_most function attribute (available with Clang, but not yet with GCC) to minimize the code footprint for calling the reporting slow path. As a result, function size of callers is reduced by avoiding saving registers before calling the rarely called reporting slow path. Note that all TUs in lib/Makefile already disable function tracing, including list_debug.c, and __preserve_most's implied notrace has no effect in this case. 3. Because the inline checks are a subset of the full set of checks in __list_*_valid_or_report(), always return false if the inline checks failed. This avoids redundant compare and conditional branch right after return from the slow path. As a side-effect of the checks being inline, if the compiler can prove some condition to always be true, it can completely elide some checks. Since DEBUG_LIST is functionally a superset of LIST_HARDENED, the Kconfig variables are changed to reflect that: DEBUG_LIST selects LIST_HARDENED, whereas LIST_HARDENED itself has no dependency on DEBUG_LIST. Running netperf with CONFIG_LIST_HARDENED (using a Clang compiler with "preserve_most") shows throughput improvements, in my case of ~7% on average (up to 20-30% on some test cases). Link: https://r.android.com/1266735 [1] Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2] Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3] Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4] Signed-off-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/20230811151847.1594958-3-elver@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
2023-08-15list_debug: Introduce inline wrappers for debug checksMarco Elver
Turn the list debug checking functions __list_*_valid() into inline functions that wrap the out-of-line functions. Care is taken to ensure the inline wrappers are always inlined, so that additional compiler instrumentation (such as sanitizers) does not result in redundant outlining. This change is preparation for performing checks in the inline wrappers. No functional change intended. Signed-off-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/20230811151847.1594958-2-elver@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
2023-08-15compiler_types: Introduce the Clang __preserve_most function attributeMarco Elver
[1]: "On X86-64 and AArch64 targets, this attribute changes the calling convention of a function. The preserve_most calling convention attempts to make the code in the caller as unintrusive as possible. This convention behaves identically to the C calling convention on how arguments and return values are passed, but it uses a different set of caller/callee-saved registers. This alleviates the burden of saving and recovering a large register set before and after the call in the caller. If the arguments are passed in callee-saved registers, then they will be preserved by the callee across the call. This doesn't apply for values returned in callee-saved registers. * On X86-64 the callee preserves all general purpose registers, except for R11. R11 can be used as a scratch register. Floating-point registers (XMMs/YMMs) are not preserved and need to be saved by the caller. * On AArch64 the callee preserve all general purpose registers, except x0-X8 and X16-X18." [1] https://clang.llvm.org/docs/AttributeReference.html#preserve-most Introduce the attribute to compiler_types.h as __preserve_most. Use of this attribute results in better code generation for calls to very rarely called functions, such as error-reporting functions, or rarely executed slow paths. Beware that the attribute conflicts with instrumentation calls inserted on function entry which do not use __preserve_most themselves. Notably, function tracing which assumes the normal C calling convention for the given architecture. Where the attribute is supported, __preserve_most will imply notrace. It is recommended to restrict use of the attribute to functions that should or already disable tracing. Note: The additional preprocessor check against architecture should not be necessary if __has_attribute() only returns true where supported; also see https://github.com/ClangBuiltLinux/linux/issues/1908. But until __has_attribute() does the right thing, we also guard by known-supported architectures to avoid build warnings on other architectures. The attribute may be supported by a future GCC version (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899). Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: "Steven Rostedt (Google)" <rostedt@goodmis.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230811151847.1594958-1-elver@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
2023-08-15genetlink: add struct genl_info to struct genl_dumpit_infoJakub Kicinski
Netlink GET implementations must currently juggle struct genl_info and struct netlink_callback, depending on whether they were called from doit or dumpit. Add genl_info to the dump state and populate the fields. This way implementations can simply pass struct genl_info around. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: remove userhdr from struct genl_infoJakub Kicinski
Only three families use info->userhdr today and going forward we discourage using fixed headers in new families. So having the pointer to user header in struct genl_info is an overkill. Compute the header pointer at runtime. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20230814214723.2924989-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: make genl_info->nlhdr constJakub Kicinski
struct netlink_callback has a const nlh pointer, make the pointer in struct genl_info const as well, to make copying between the two easier. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: push conditional locking into dumpit/doneJakub Kicinski
Add helpers which take/release the genl mutex based on family->parallel_ops. Remove the separation between handling of ops in locked and parallel families. Future patches would make the duplicated code grow even more. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15fbdev: goldfishfb: Do not check 0 for platform_get_irq()Zhu Wang
Since platform_get_irq() never returned zero, so it need not to check whether it returned zero, and we use the return error code of platform_get_irq() to replace the current return error code. Please refer to the commit a85a6c86c25b ("driver core: platform: Clarify that IRQ 0 is invalid") to get that platform_get_irq() never returned zero. Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-15fbdev: atmel_lcdfb: Remove redundant of_match_ptr()Ruan Jinjie
The driver depends on CONFIG_OF, it is not necessary to use of_match_ptr() here. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-15fbdev: kyro: Remove unused declarationsYue Haibing
These declarations is never implemented since the beginning of git history. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-15net: Fix slab-out-of-bounds in inet[6]_steal_sockLorenz Bauer
Kumar reported a KASAN splat in tcp_v6_rcv: bash-5.2# ./test_progs -t btf_skc_cls_ingress ... [ 51.810085] BUG: KASAN: slab-out-of-bounds in tcp_v6_rcv+0x2d7d/0x3440 [ 51.810458] Read of size 2 at addr ffff8881053f038c by task test_progs/226 The problem is that inet[6]_steal_sock accesses sk->sk_protocol without accounting for request or timewait sockets. To fix this we can't just check sock_common->skc_reuseport since that flag is present on timewait sockets. Instead, add a fullsock check to avoid the out of bands access of sk_protocol. Fixes: 9c02bec95954 ("bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign") Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/r/20230815-bpf-next-v2-1-95126eaa4c1b@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-15Merge tag 'parisc-for-6.5-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "Fix the parisc TLB ptlock checks so that they can be enabled together with the lightweight spinlock checks" * tag 'parisc-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix CONFIG_TLB_PTLOCK to work with lightweight spinlock checks
2023-08-15Merge tag '6.5-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: "Three smb client fixes, all for stable: - fix for oops in unmount race with lease break of deferred close - debugging improvement for reconnect - fix for fscache deadlock (folio_wait_bit_common hang)" * tag '6.5-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: display network namespace in debug information cifs: Release folio lock on fscache read hit. cifs: fix potential oops in cifs_oplock_break
2023-08-15Merge tag 'regulator-fix-v6.5-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "Two small driver specific fixes: one incorrect definition for one of the Qualcomm regulators and better handling of poorly formed DTs in the DA9063 driver" * tag 'regulator-fix-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: qcom-rpmh: Fix LDO 12 regulator for PM8550 regulator: da9063: better fix null deref with partial DT
2023-08-15net: fix the RTO timer retransmitting skb every 1ms if linear option is enabledJason Xing
In the real workload, I encountered an issue which could cause the RTO timer to retransmit the skb per 1ms with linear option enabled. The amount of lost-retransmitted skbs can go up to 1000+ instantly. The root cause is that if the icsk_rto happens to be zero in the 6th round (which is the TCP_THIN_LINEAR_RETRIES value), then it will always be zero due to the changed calculation method in tcp_retransmit_timer() as follows: icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX); Above line could be converted to icsk->icsk_rto = min(0 << 1, TCP_RTO_MAX) = 0 Therefore, the timer expires so quickly without any doubt. I read through the RFC 6298 and found that the RTO value can be rounded up to a certain value, in Linux, say TCP_RTO_MIN as default, which is regarded as the lower bound in this patch as suggested by Eric. Fixes: 36e31b0af587 ("net: TCP thin linear timeouts") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-15spi: tegra114: Remove unnecessary NULL-pointer checksAlexander Danilenko
cs_setup, cs_hold and cs_inactive points to fields of spi_device struct, so there is no sense in checking them for NULL. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 04e6bb0d6bb1 ("spi: modify set_cs_timing parameter") Signed-off-by: Alexander Danilenko <al.b.danilenko@gmail.com> Link: https://lore.kernel.org/r/20230815092058.4083-1-al.b.danilenko@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-08-15gpio: sim: replace memmove() + strstrip() with skip_spaces() + strim()Bartosz Golaszewski
Turns out we can avoid the memmove() by using skip_spaces() and strim(). We did that in gpio-consumer, let's do it in gpio-sim. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2023-08-15Merge tag 'asoc-fix-v6.5-rc6' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.5 A fairly large collection of fixes here, mostly SOF and Intel related. The one core fix is Hans' change which reduces the log spam when working out new use cases for DPCM.