summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-04-27Merge branch 'Implement formatted output helpers with bstr_printf'Alexei Starovoitov
Florent Revest says: ==================== BPF's formatted output helpers are currently implemented with snprintf-like functions which use variadic arguments. The types of all arguments need to be known at compilation time. BPF_CAST_FMT_ARG casts all arguments to the size they should be (known at runtime), but the C type promotion rules cast them back to u64s. On 32 bit architectures, this can cause misaligned va_lists and generate mangled output. This series refactors these helpers to avoid variadic arguments. It uses a "binary printf" instead, where arguments are passed in a buffer constructed at runtime. --- Changes in v2: - Reworded the second patch's description to better describe how arguments get mangled on 32 bit architectures ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-04-27bpf: Implement formatted output helpers with bstr_printfFlorent Revest
BPF has three formatted output helpers: bpf_trace_printk, bpf_seq_printf and bpf_snprintf. Their signatures specify that all arguments are provided from the BPF world as u64s (in an array or as registers). All of these helpers are currently implemented by calling functions such as snprintf() whose signatures take a variable number of arguments, then placed in a va_list by the compiler to call vsnprintf(). "d9c9e4db bpf: Factorize bpf_trace_printk and bpf_seq_printf" introduced a bpf_printf_prepare function that fills an array of u64 sanitized arguments with an array of "modifiers" which indicate what the "real" size of each argument should be (given by the format specifier). The BPF_CAST_FMT_ARG macro consumes these arrays and casts each argument to its real size. However, the C promotion rules implicitely cast them all back to u64s. Therefore, the arguments given to snprintf are u64s and the va_list constructed by the compiler will use 64 bits for each argument. On 64 bit machines, this happens to work well because 32 bit arguments in va_lists need to occupy 64 bits anyway, but on 32 bit architectures this breaks the layout of the va_list expected by the called function and mangles values. In "88a5c690b6 bpf: fix bpf_trace_printk on 32 bit archs", this problem had been solved for bpf_trace_printk only with a "horrid workaround" that emitted multiple calls to trace_printk where each call had different argument types and generated different va_list layouts. One of the call would be dynamically chosen at runtime. This was ok with the 3 arguments that bpf_trace_printk takes but bpf_seq_printf and bpf_snprintf accept up to 12 arguments. Because this approach scales code exponentially, it is not a viable option anymore. Because the promotion rules are part of the language and because the construction of a va_list is an arch-specific ABI, it's best to just avoid variadic arguments and va_lists altogether. Thankfully the kernel's snprintf() has an alternative in the form of bstr_printf() that accepts arguments in a "binary buffer representation". These binary buffers are currently created by vbin_printf and used in the tracing subsystem to split the cost of printing into two parts: a fast one that only dereferences and remembers values, and a slower one, called later, that does the pretty-printing. This patch refactors bpf_printf_prepare to construct binary buffers of arguments consumable by bstr_printf() instead of arrays of arguments and modifiers. This gets rid of BPF_CAST_FMT_ARG and greatly simplifies the bpf_printf_prepare usage but there are a few gotchas that change how bpf_printf_prepare needs to do things. Currently, bpf_printf_prepare uses a per cpu temporary buffer as a generic storage for strings and IP addresses. With this refactoring, the temporary buffers now holds all the arguments in a structured binary format. To comply with the format expected by bstr_printf, certain format specifiers also need to be pre-formatted: %pB and %pi6/%pi4/%pI4/%pI6. Because vsnprintf subroutines for these specifiers are hard to expose, we pre-format these arguments with calls to snprintf(). Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210427174313.860948-3-revest@chromium.org
2021-04-27PCI/sysfs: Convert "reset" to static attributeKrzysztof Wilczyński
The "reset" sysfs attribute allows for resetting a PCI function. Previously it was dynamically created either by pci_bus_add_device() or the pci_sysfs_init() initcall, but since it doesn't need to be created or removed dynamically, we can use a static attribute so the device model takes care of addition and removal automatically. Convert "reset" to a static attribute and use the .is_visible() callback to check whether the device supports reset. Clear reset_fn in pci_stop_dev() instead of pci_remove_capabilities_sysfs() since we no longer explicitly remove the "reset" sysfs file. [bhelgaas: commit log] Suggested-by: Oliver O'Halloran <oohall@gmail.com> Link: https://lore.kernel.org/r/20210416205856.3234481-4-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-04-27PCI/sysfs: Convert "rom" to static attributeKrzysztof Wilczyński
The "rom" sysfs attribute allows access to the PCI Option ROM. Previously it was dynamically created either by pci_bus_add_device() or the pci_sysfs_init() initcall, but since it doesn't need to be created or removed dynamically, we can use a static attribute so the device model takes care of addition and removal automatically. Convert "rom" to a static attribute and use the .is_bin_visible() callback to set the correct object size based on the ROM size. Remove "rom_attr" from the struct pci_dev since it is no longer needed. This attribute was added in the pre-git era by https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/drivers/pci/pci-sysfs.c?id=f6d553444da2 [bhelgaas: commit log] Suggested-by: Oliver O'Halloran <oohall@gmail.com> Link: https://lore.kernel.org/r/20210416205856.3234481-3-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-04-27PCI/sysfs: Convert "config" to static attributeKrzysztof Wilczyński
The "config" sysfs attribute allows access to either the legacy (PCI and PCI-X Mode 1) or the extended (PCI-X Mode 2 and PCIe) device configuration space. Previously it was dynamically created either when a device was added (for hot-added devices) or via a late_initcall (for devices present at boot): pci_bus_add_devices pci_bus_add_device pci_create_sysfs_dev_files if (!sysfs_initialized) return sysfs_create_bin_file # for hot-added devices pci_sysfs_init # late_initcall sysfs_initialized = 1 for_each_pci_dev(pdev) pci_create_sysfs_dev_files(pdev) # for devices present at boot And dynamically removed when the PCI device is stopped and removed: pci_stop_bus_device pci_stop_dev pci_remove_sysfs_dev_files sysfs_remove_bin_file This attribute does not need to be created or removed dynamically, so we can use a static attribute so the device model takes care of addition and removal automatically. Convert "config" to a static attribute and use the .is_bin_visible() callback to set the correct object size (either 256 bytes or 4 KiB) at runtime. The pci_sysfs_init() scheme was added in the pre-git era by https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/drivers/pci/pci-sysfs.c?id=f6d553444da2 [bhelgaas: commit log] Suggested-by: Oliver O'Halloran <oohall@gmail.com> Link: https://lore.kernel.org/r/CAOSf1CHss03DBSDO4PmTtMp0tCEu5kScn704ZEwLKGXQzBfqaA@mail.gmail.com Link: https://lore.kernel.org/r/20210416205856.3234481-2-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-04-27seq_file: Add a seq_bprintf functionFlorent Revest
Similarly to seq_buf_bprintf in lib/seq_buf.c, this function writes a printf formatted string with arguments provided in a "binary representation" built by functions such as vbin_printf. Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210427174313.860948-2-revest@chromium.org
2021-04-27drm/dp_mst: Convert drm_dp_mst_topology.c to drm_err()/drm_dbg*()Lyude Paul
And finally, convert all of the code in drm_dp_mst_topology.c over to using drm_err() and drm_dbg*(). Note that this refactor would have been a lot more complicated to have tried writing a coccinelle script for, so this whole thing was done by hand. v2: * Fix line-wrapping in drm_dp_mst_atomic_check_mstb_bw_limit() Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: Robert Foss <robert.foss@linaro.org> Reviewed-by: Robert Foss <robert.foss@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-18-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp_dual_mode: Convert drm_dp_dual_mode_helper.c to using ↵Lyude Paul
drm_err/drm_dbg_kms() Next step in the conversion, move everything in drm_dp_dual_mode_helper.c over to using drm_err() and drm_dbg_kms(). This was done using the following cocci script: @@ expression list expr; @@ ( - DRM_DEBUG_KMS(expr); + drm_dbg_kms(dev, expr); | - DRM_ERROR(expr); + drm_err(dev, expr); ) And correcting the indentation of the resulting code by hand. Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-17-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp: Convert drm_dp_helper.c to using drm_err/drm_dbg_*()Lyude Paul
Now that we've added a back-pointer to drm_device to drm_dp_aux, made drm_dp_aux available to any functions in drm_dp_helper.c which need to print to the kernel log, and ensured all of our logging uses a consistent format, let's do the final step of the conversion and actually move everything over to using drm_err() and drm_dbg_*(). This was done by using the following cocci script: @@ expression list expr; @@ ( - DRM_DEBUG_KMS(expr); + drm_dbg_kms(aux->drm_dev, expr); | - DRM_DEBUG_DP(expr); + drm_dbg_dp(aux->drm_dev, expr); | - DRM_DEBUG_ATOMIC(expr); + drm_dbg_atomic(aux->drm_dev, expr); | - DRM_DEBUG_KMS_RATELIMITED(expr); + drm_dbg_kms_ratelimited(aux->drm_dev, expr); | - DRM_ERROR(expr); + drm_err(aux->drm_dev, expr); ) Followed by correcting the resulting line-wrapping in the results by hand. v2: * Fix indenting in drm_dp_dump_access Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: Robert Foss <robert.foss@linaro.org> Reviewed-by: Robert Foss <robert.foss@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-16-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/print: Handle potentially NULL drm_devices in drm_dbg_*Lyude Paul
While this shouldn't really be something that happens all that often, since we're going to be using the drm_dbg_* log helpers in DRM helpers it's technically possible that a driver could use an AUX adapter before it's been associated with it's respective drm_device. While drivers should take care to avoid this, there's likely going to be situations where it's difficult to workaround. And since other logging helpers in the kernel tend to be OK with NULL pointers (for instance, passing a NULL pointer to a "%s" argument for a printk-like function in the kernel doesn't break anything), we should do the same for ours. Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-15-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp_mst: Pass drm_dp_mst_topology_mgr to drm_dp_get_vc_payload_bw()Lyude Paul
Since this is one of the few functions in drm_dp_mst_topology.c that doesn't have any way of getting access to a drm_device, let's pass the drm_dp_mst_topology_mgr down to this function so that it can use drm_dbg_kms(). Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-14-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp_dual_mode: Pass drm_device to drm_lspcon_(get|set)_mode()Lyude Paul
So that we can start using drm_dbg_*() throughout the DRM DP helpers. Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-13-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp_dual_mode: Pass drm_device to drm_dp_dual_mode_get_tmds_output()Lyude Paul
Another function to pass drm_device * down to so we can start using the drm_dbg_*() in the DRM DP helpers. Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-12-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp_dual_mode: Pass drm_device to drm_dp_dual_mode_max_tmds_clock()Lyude Paul
Another function we need to pass drm_device down to in order to start using drm_dbg_*(). Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-11-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp_dual_mode: Pass drm_device to drm_dp_dual_mode_set_tmds_output()Lyude Paul
Another function that we'll need to pass a drm_device (and not drm_dp_aux) down to so that we can move over to using drm_dbg_*(). Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-10-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp_dual_mode: Pass drm_device to drm_dp_dual_mode_detect()Lyude Paul
Since we're about to be using drm_dbg_*() throughout the DP helpers, we'll need to be able to access the DRM device in the dual mode DP helpers as well. Note however that since drm_dp_dual_mode_detect() can be called with DDC adapters that aren't part of a drm_dp_aux struct, we need to pass down the drm_device to these functions instead of using drm_dp_aux. Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-9-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp: Always print aux channel name in logsLyude Paul
Since we're about to convert everything in drm_dp_helper.c over to using drm_dbg_*(), let's also make our logging more consistent in drm_dp_helper.c while we're at it to ensure that we always print the name of the AUX channel in question. Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-8-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp: Pass drm_dp_aux to drm_dp*_link_train_channel_eq_delay()Lyude Paul
So that we can start using drm_dbg_*() for drm_dp_link_train_channel_eq_delay() and drm_dp_lttpr_link_train_channel_eq_delay(). Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-7-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp: Pass drm_dp_aux to drm_dp_link_train_clock_recovery_delay()Lyude Paul
So that we can start using drm_dbg_*() in drm_dp_link_train_clock_recovery_delay(). Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-6-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp: Clarify DP AUX registration timeLyude Paul
The docs we had for drm_dp_aux_init() and drm_dp_aux_register() were mostly correct, except for the fact that they made the assumption that all AUX devices were grandchildren of their respective DRM devices. This is the case for most normal GPUs, but is almost never the case with SoCs and display bridges. So, let's fix this documentation to clarify when the right time to use drm_dp_aux_init() or drm_dp_aux_register() is. Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-5-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/dp: Add backpointer to drm_device in drm_dp_auxLyude Paul
This is something that we've wanted for a while now: the ability to actually look up the respective drm_device for a given drm_dp_aux struct. This will also allow us to transition over to using the drm_dbg_*() helpers for debug message printing, as we'll finally have a drm_device to reference for doing so. Note that there is one limitation with this - because some DP AUX adapters exist as platform devices which are initialized independently of their respective DRM devices, one cannot rely on drm_dp_aux->drm_dev to always be non-NULL until drm_dp_aux_register() has been called. We make sure to point this out in the documentation for struct drm_dp_aux. v3: * Add WARN_ON_ONCE() to drm_dp_aux_register() if drm_dev isn't filled out Signed-off-by: Lyude Paul <lyude@redhat.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-4-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/nouveau/kms/nv50-: Move AUX adapter reg to connector late register/early ↵Lyude Paul
unregister Since AUX adapters on nouveau have their respective DRM connectors as parents, we need to make sure that we register then after their connectors. Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-3-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27drm/bridge/cdns-mhdp8546: Register DP aux channel with userspaceLyude Paul
Just adds some missing calls to drm_dp_aux_register()/drm_dp_aux_unregister() for when we attach/detach the bridge. Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-2-lyude@redhat.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queuesIgnat Korchagin
efx->xdp_tx_queue_count is initially initialized to num_possible_cpus() and is later used to allocate and traverse efx->xdp_tx_queues lookup array. However, we may end up not initializing all the array slots with real queues during probing. This results, for example, in a NULL pointer dereference, when running "# ethtool -S <iface>", similar to below [2570283.664955][T4126959] BUG: kernel NULL pointer dereference, address: 00000000000000f8 [2570283.681283][T4126959] #PF: supervisor read access in kernel mode [2570283.695678][T4126959] #PF: error_code(0x0000) - not-present page [2570283.710013][T4126959] PGD 0 P4D 0 [2570283.721649][T4126959] Oops: 0000 [#1] SMP PTI [2570283.734108][T4126959] CPU: 23 PID: 4126959 Comm: ethtool Tainted: G O 5.10.20-cloudflare-2021.3.1 #1 [2570283.752641][T4126959] Hardware name: <redacted> [2570283.781408][T4126959] RIP: 0010:efx_ethtool_get_stats+0x2ca/0x330 [sfc] [2570283.796073][T4126959] Code: 00 85 c0 74 39 48 8b 95 a8 0f 00 00 48 85 d2 74 2d 31 c0 eb 07 48 8b 95 a8 0f 00 00 48 63 c8 49 83 c4 08 83 c0 01 48 8b 14 ca <48> 8b 92 f8 00 00 00 49 89 54 24 f8 39 85 a0 0f 00 00 77 d7 48 8b [2570283.831259][T4126959] RSP: 0018:ffffb79a77657ce8 EFLAGS: 00010202 [2570283.845121][T4126959] RAX: 0000000000000019 RBX: ffffb799cd0c9280 RCX: 0000000000000018 [2570283.860872][T4126959] RDX: 0000000000000000 RSI: ffff96dd970ce000 RDI: 0000000000000005 [2570283.876525][T4126959] RBP: ffff96dd86f0a000 R08: ffff96dd970ce480 R09: 000000000000005f [2570283.892014][T4126959] R10: ffffb799cd0c9fff R11: ffffb799cd0c9000 R12: ffffb799cd0c94f8 [2570283.907406][T4126959] R13: ffffffffc11b1090 R14: ffff96dd970ce000 R15: ffffffffc11cd66c [2570283.922705][T4126959] FS: 00007fa7723f8740(0000) GS:ffff96f51fac0000(0000) knlGS:0000000000000000 [2570283.938848][T4126959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [2570283.952524][T4126959] CR2: 00000000000000f8 CR3: 0000001a73e6e006 CR4: 00000000007706e0 [2570283.967529][T4126959] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [2570283.982400][T4126959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [2570283.997308][T4126959] PKRU: 55555554 [2570284.007649][T4126959] Call Trace: [2570284.017598][T4126959] dev_ethtool+0x1832/0x2830 Fix this by adjusting efx->xdp_tx_queue_count after probing to reflect the true value of initialized slots in efx->xdp_tx_queues. Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Fixes: e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues") Cc: <stable@vger.kernel.org> # 5.12.x Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net:nfc:digital: Fix a double free in digital_tg_recv_dep_reqLv Yunlong
In digital_tg_recv_dep_req, it calls nfc_tm_data_received(..,resp). If nfc_tm_data_received() failed, the callee will free the resp via kfree_skb() and return error. But in the exit branch, the resp will be freed again. My patch sets resp to NULL if nfc_tm_data_received() failed, to avoid the double free. Fixes: 1c7a4c24fbfd9 ("NFC Digital: Add target NFC-DEP support") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add support for the catch-all set element. This special element can be used to define a default action to be applied in case that the set lookup returns no matching element. 2) Fix incorrect #ifdef dependencies in the nftables cgroupsv2 support, from Arnd Bergmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27libceph: don't set global_id until we get an auth ticketIlya Dryomov
With the introduction of enforcing mode, setting global_id as soon as we get it in the first MAuth reply will result in EACCES if the connection is reset before we get the second MAuth reply containing an auth ticket -- because on retry we would attempt to reclaim that global_id with no auth ticket at hand. Neither ceph_auth_client nor ceph_mon_client depend on global_id being set ealy, so just delay the setting until we get and process the second MAuth reply. While at it, complain if the monitor sends a zero global_id or changes our global_id as the session is likely to fail after that. Cc: stable@vger.kernel.org # needs backporting for < 5.11 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2021-04-27libceph: bump CephXAuthenticate encoding versionIlya Dryomov
A dummy v3 encoding (exactly the same as v2) was introduced so that the monitors can distinguish broken clients that may not include their auth ticket in CEPHX_GET_AUTH_SESSION_KEY request on reconnects, thus failing to prove previous possession of their global_id (one part of CVE-2021-20288). The kernel client has always included its auth ticket, so it is compatible with enforcing mode as is. However we want to bump the encoding version to avoid having to authenticate twice on the initial connect -- all legacy (CephXAuthenticate < v3) are now forced do so in order to expose insecure global_id reclaim. Marking for stable since at least for 5.11 and 5.12 it is trivial (v2 -> v3). Cc: stable@vger.kernel.org # 5.11+ URL: https://tracker.ceph.com/issues/50452 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2021-04-27ceph: don't allow access to MDS-private inodesJeff Layton
The MDS reserves a set of inodes for its own usage, and these should never be accessible to clients. Add a new helper to vet a proposed inode number against that range, and complain loudly and refuse to create or look it up if it's in it. Also, ensure that the MDS doesn't try to delegate inodes that are in that range or lower. Print a warning if it does, and don't save the range in the xarray. URL: https://tracker.ceph.com/issues/49922 Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: fix up some bare fetches of i_sizeJeff Layton
We need to use i_size_read(), which properly handles the torn read case on 32-bit arches. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: convert some PAGE_SIZE invocations to thp_size()Jeff Layton
Start preparing to allow the use of THPs in the pagecache with ceph by making it use thp_size() in lieu of PAGE_SIZE in the appropriate places. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: support getting ceph.dir.rsnaps vxattrYanhu Cao
Add support for grabbing the rsnaps value out of the inode info in traces, and exposing that via ceph.dir.rsnaps xattr. Signed-off-by: Yanhu Cao <gmayyyha@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: drop pinned_page parameter from ceph_get_capsJeff Layton
All of the existing callers that don't set this to NULL just drop the page reference at some arbitrary point later in processing. There's no point in keeping a page reference that we don't use, so just drop the reference immediately after checking the Uptodate flag. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: fix inode leak on getattr error in __fh_to_dentryJeff Layton
Fixes: 878dabb64117 ("ceph: don't return -ESTALE if there's still an open file") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: only check pool permissions for regular filesJeff Layton
There is no need to do a ceph_pool_perm_check() on anything that isn't a regular file, as the MDS is what handles talking to the OSD in those cases. Just return 0 if it's not a regular file. Reported-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: send opened files/pinned caps/opened inodes metrics to MDS daemonXiubo Li
For the old ceph version, if it received this metric info, it will just ignore them. URL: https://tracker.ceph.com/issues/46866 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: avoid counting the same request twice or moreXiubo Li
If the request will retry, skip updating the latency metric. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: rename the metric helpersXiubo Li
Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: fix kerneldoc copypasta over ceph_start_io_directJeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: use attach/detach_page_private for tracking snap contextJeff Layton
There is some ambiguity around the use of PagePrivate. It's generally expected in core code that if PagePrivate is set then you have a reference to it. It's not clear that ceph always does (and I believe it may not). Change ceph to use attach/detach_page_private so that we keep a reference to the page until the snap context is detached. Link: https://lore.kernel.org/ceph-devel/2503810.1616508988@warthog.procyon.org.uk/T/#mf29e5abbb0ec8035cde0de30778690de7d956f84 Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: don't use d_add in ceph_handle_snapdirJeff Layton
It's possible ceph_get_snapdir could end up finding a (disconnected) inode that already exists in the cache. Change the prototype for ceph_handle_snapdir to return a dentry pointer and have it use d_splice_alias so we don't end up with an aliased dentry in the cache. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: don't clobber i_snap_caps on non-I_NEW inodeJeff Layton
We want the snapdir to mirror the non-snapped directory's attributes for most things, but i_snap_caps represents the caps granted on the snapshot directory by the MDS itself. A misbehaving MDS could issue different caps for the snapdir and we lose them here. Only reset i_snap_caps when the inode is I_NEW. Also, move the setting of i_op and i_fop inside the if block since they should never change anyway. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: fix fall-through warnings for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple of warnings by explicitly adding a break and a goto statements instead of just letting the code fall through to the next case. URL: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: convert ceph_readpages to ceph_readaheadJeff Layton
Convert ceph_readpages to ceph_readahead and make it use netfs_readahead. With this we can rip out a lot of the old readpage/readpages infrastructure. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: convert ceph_write_begin to netfs_write_beginJeff Layton
Convert ceph_write_begin to use the netfs_write_begin helper. Most of the ops we need for it are already in place from the readpage conversion but we do add a new check_write_begin op since ceph needs to be able to vet whether there is an incompatible writeback already in flight before reading in the page. With this, we can also remove the old ceph_do_readpage helper. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: convert ceph_readpage to netfs_readpageJeff Layton
Have the ceph KConfig select NETFS_SUPPORT. Add a new netfs ops structure and the operations for it. Convert ceph_readpage to use the new netfs_readpage helper. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: fix fscache invalidationJeff Layton
Ensure that we invalidate the fscache whenever we invalidate the pagecache. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: rework PageFsCache handlingJeff Layton
With the new fscache API, the PageFsCache bit now indicates that the page is being written to the cache and shouldn't be modified or released until it's finished. Change releasepage and invalidatepage to wait on that bit before returning. Also define FSCACHE_USE_NEW_IO_API so that we opt into the new fscache API. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27ceph: rip out old fscache readpage handlingJeff Layton
With the new netfs read helper functions, we won't need a lot of this infrastructure as it handles the pagecache pages itself. Rip out the read handling for now, and much of the old infrastructure that deals in individual pages. The cookie handling is mostly unchanged, however. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-27net: fix a concurrency bug in l2tp_tunnel_register()Gong, Sishuai
l2tp_tunnel_register() registers a tunnel without fully initializing its attribute. This can allow another kernel thread running l2tp_xmit_core() to access the uninitialized data and then cause a kernel NULL pointer dereference error, as shown below. Thread 1 Thread 2 //l2tp_tunnel_register() list_add_rcu(&tunnel->list, &pn->l2tp_tunnel_list); //pppol2tp_connect() tunnel = l2tp_tunnel_get(sock_net(sk), info.tunnel_id); // Fetch the new tunnel ... //l2tp_xmit_core() struct sock *sk = tunnel->sock; ... bh_lock_sock(sk); //Null pointer error happens tunnel->sock = sk; Fix this bug by initializing tunnel->sock before adding the tunnel into l2tp_tunnel_list. Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Sishuai Gong <sishuai@purdue.edu> Reported-by: Sishuai Gong <sishuai@purdue.edu> Signed-off-by: David S. Miller <davem@davemloft.net>