summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2020-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-10-12 The main changes are: 1) The BPF verifier improvements to track register allocation pattern, from Alexei and Yonghong. 2) libbpf relocation support for different size load/store, from Andrii. 3) bpf_redirect_peer() helper and support for inner map array with different max_entries, from Daniel. 4) BPF support for per-cpu variables, form Hao. 5) sockmap improvements, from John. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-12can: remove obsolete version stringsOliver Hartkopp
As pointed out by Jakub Kicinski here: http://lore.kernel.org/r/20201009175751.5c54097f@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com this patch removes the obsolete version information of the different CAN protocols and the AF_CAN core module. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20201012074354.25839-2-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-10-11bpf, sockmap: Allow skipping sk_skb parser programJohn Fastabend
Currently, we often run with a nop parser namely one that just does this, 'return skb->len'. This happens when either our verdict program can handle streaming data or it is only looking at socket data such as IP addresses and other metadata associated with the flow. The second case is common for a L3/L4 proxy for instance. So lets allow loading programs without the parser then we can skip the stream parser logic and avoid having to add a BPF program that is effectively a nop. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160239297866.8495.13345662302749219672.stgit@john-Precision-5820-Tower
2020-10-11bpf: Allow for map-in-map with dynamic inner array map entriesDaniel Borkmann
Recent work in f4d05259213f ("bpf: Add map_meta_equal map ops") and 134fede4eecf ("bpf: Relax max_entries check for most of the inner map types") added support for dynamic inner max elements for most map-in-map types. Exceptions were maps like array or prog array where the map_gen_lookup() callback uses the maps' max_entries field as a constant when emitting instructions. We recently implemented Maglev consistent hashing into Cilium's load balancer which uses map-in-map with an outer map being hash and inner being array holding the Maglev backend table for each service. This has been designed this way in order to reduce overall memory consumption given the outer hash map allows to avoid preallocating a large, flat memory area for all services. Also, the number of service mappings is not always known a-priori. The use case for dynamic inner array map entries is to further reduce memory overhead, for example, some services might just have a small number of back ends while others could have a large number. Right now the Maglev backend table for small and large number of backends would need to have the same inner array map entries which adds a lot of unneeded overhead. Dynamic inner array map entries can be realized by avoiding the inlined code generation for their lookup. The lookup will still be efficient since it will be calling into array_map_lookup_elem() directly and thus avoiding retpoline. The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips inline code generation and relaxes array_map_meta_equal() check to ignore both maps' max_entries. This also still allows to have faster lookups for map-in-map when BPF_F_INNER_MAP is not specified and hence dynamic max_entries not needed. Example code generation where inner map is dynamic sized array: # bpftool p d x i 125 int handle__sys_enter(void * ctx): ; int handle__sys_enter(void *ctx) 0: (b4) w1 = 0 ; int key = 0; 1: (63) *(u32 *)(r10 -4) = r1 2: (bf) r2 = r10 ; 3: (07) r2 += -4 ; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key); 4: (18) r1 = map[id:468] 6: (07) r1 += 272 7: (61) r0 = *(u32 *)(r2 +0) 8: (35) if r0 >= 0x3 goto pc+5 9: (67) r0 <<= 3 10: (0f) r0 += r1 11: (79) r0 = *(u64 *)(r0 +0) 12: (15) if r0 == 0x0 goto pc+1 13: (05) goto pc+1 14: (b7) r0 = 0 15: (b4) w6 = -1 ; if (!inner_map) 16: (15) if r0 == 0x0 goto pc+6 17: (bf) r2 = r10 ; 18: (07) r2 += -4 ; val = bpf_map_lookup_elem(inner_map, &key); 19: (bf) r1 = r0 | No inlining but instead 20: (85) call array_map_lookup_elem#149280 | call to array_map_lookup_elem() ; return val ? *val : -1; | for inner array lookup. 21: (15) if r0 == 0x0 goto pc+1 ; return val ? *val : -1; 22: (61) r6 = *(u32 *)(r0 +0) ; } 23: (bc) w0 = w6 24: (95) exit Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
2020-10-11bpf: Add redirect_peer helperDaniel Borkmann
Add an efficient ingress to ingress netns switch that can be used out of tc BPF programs in order to redirect traffic from host ns ingress into a container veth device ingress without having to go via CPU backlog queue [0]. For local containers this can also be utilized and path via CPU backlog queue only needs to be taken once, not twice. On a high level this borrows from ipvlan which does similar switch in __netif_receive_skb_core() and then iterates via another_round. This helps to reduce latency for mentioned use cases. Pod to remote pod with redirect(), TCP_RR [1]: # percpu_netperf 10.217.1.33 RT_LATENCY: 122.450 (per CPU: 122.666 122.401 122.333 122.401 ) MEAN_LATENCY: 121.210 (per CPU: 121.100 121.260 121.320 121.160 ) STDDEV_LATENCY: 120.040 (per CPU: 119.420 119.910 125.460 115.370 ) MIN_LATENCY: 46.500 (per CPU: 47.000 47.000 47.000 45.000 ) P50_LATENCY: 118.500 (per CPU: 118.000 119.000 118.000 119.000 ) P90_LATENCY: 127.500 (per CPU: 127.000 128.000 127.000 128.000 ) P99_LATENCY: 130.750 (per CPU: 131.000 131.000 129.000 132.000 ) TRANSACTION_RATE: 32666.400 (per CPU: 8152.200 8169.842 8174.439 8169.897 ) Pod to remote pod with redirect_peer(), TCP_RR: # percpu_netperf 10.217.1.33 RT_LATENCY: 44.449 (per CPU: 43.767 43.127 45.279 45.622 ) MEAN_LATENCY: 45.065 (per CPU: 44.030 45.530 45.190 45.510 ) STDDEV_LATENCY: 84.823 (per CPU: 66.770 97.290 84.380 90.850 ) MIN_LATENCY: 33.500 (per CPU: 33.000 33.000 34.000 34.000 ) P50_LATENCY: 43.250 (per CPU: 43.000 43.000 43.000 44.000 ) P90_LATENCY: 46.750 (per CPU: 46.000 47.000 47.000 47.000 ) P99_LATENCY: 52.750 (per CPU: 51.000 54.000 53.000 53.000 ) TRANSACTION_RATE: 90039.500 (per CPU: 22848.186 23187.089 22085.077 21919.130 ) [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
2020-10-09netlink: export policy in extended ACKJohannes Berg
Add a new attribute NLMSGERR_ATTR_POLICY to the extended ACK to advertise the policy, e.g. if an attribute was out of range, you'll know the range that's permissible. Add new NL_SET_ERR_MSG_ATTR_POL() and NL_SET_ERR_MSG_ATTR_POL() macros to set this, since realistically it's only useful to do this when the bad attribute (offset) is also returned. Use it in lib/nlattr.c which practically does all the policy validation. v2: - add and use netlink_policy_dump_attr_size_estimate() v3: - remove redundant break v4: - really remove redundant break ... sorry Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09Merge tag 'linux-can-next-for-5.10-20201007' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== linux-can-next-for-5.10-20201007 The first 3 patches are by me and fix several warnings found when compiling the kernel with W=1. Lukas Bulwahn's patch adjusts the MAINTAINERS file, to accommodate the renaming of the mcp251xfd driver. Vincent Mailhol contributes 3 patches for the CAN networking layer. First error queue support is added the the CAN RAW protocol. The second patch converts the get_can_dlc() and get_canfd_dlc() in-Kernel-only macros from using __u8 to u8. The third patch adds a helper function to calculate the length of one bit in in multiple of time quanta. Oliver Hartkopp's patch add support for the ISO 15765-2:2016 transport protocol to the CAN stack. Three patches by Lad Prabhakar add documentation for various new rcar controllers to the device tree bindings of the rcar_can and rcan_canfd driver. Michael Walle's patch adds various processors to the flexcan driver binding documentation. The next two patches are by me and target the flexcan driver aswell. The remove the ack_grp and ack_bit from the fsl,stop-mode DT property and the driver, as they are not used anymore. As these are the last two arguments this change will not break existing device trees. The last three patches are by Srinivas Neeli and target the xilinx_can driver. The first one increases the lower limit for the bit rate prescaler to 2, the other two fix sparse and coverity findings. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09net/mlx5: Add support for fw live patch eventMoshe Shemesh
Firmware live patch event notifies the driver that the firmware was just updated using live patch. In such case the driver should not reload or re-initiate entities, part to updating the firmware version and re-initiate the firmware tracer which can be updated by live patch with new strings database to help debugging an issue. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09net/mlx5: Handle sync reset request eventMoshe Shemesh
Once the driver gets sync_reset_request from firmware it prepares for the coming reset and sends acknowledge. After getting this event the driver expects device reset, either it will trigger PCI reset on sync_reset_now event or such PCI reset will be triggered by another PF of the same device. So it moves to reset requested mode and if it gets PCI reset triggered by the other PF it detect the reset and reloads. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Small conflict around locking in rxrpc_process_event() - channel_lock moved to bundle in next, while state lock needs _bh() from net. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-07can: dev: add a helper function to calculate the duration of one bitVincent Mailhol
Rename macro CAN_CALC_SYNC_SEG to CAN_SYNC_SEG and make it available through include/linux/can/dev.h Add an helper function can_bit_time() which returns the duration (in time quanta) of one CAN bit. Rationale for this patch: the sync segment and the bit time are two concepts which are defined in the CAN ISO standard. Device drivers for CAN might need those. Please refer to ISO 11898-1:2015, section 11.3.1.1 "Bit time" for additional information. Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/r/20201002154219.4887-6-mailhol.vincent@wanadoo.fr [mkl: Let can_bit_time() return an unsinged int, make argument const] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-10-06can: dev: fix type of get_can_dlc() and get_canfd_dlc() macrosVincent Mailhol
The macros get_can_dlc() and get_canfd_dlc() are not visible in userland. As such, type u8 should be preferred over type __u8. Reference: https://lkml.org/lkml/2020/10/1/708 Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/r/20201002154219.4887-3-mailhol.vincent@wanadoo.fr Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-10-06Merge tag 'drm-fixes-2020-10-06-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Daniel queued these up last week and I took a long weekend so didn't get them out, but fixing the OOB access on get font seems like something we should land and it's cc'ed stable as well. The other big change is a partial revert for a regression on android on the clcd fbdev driver, and one other docs fix. fbdev: - Re-add FB_ARMCLCD for android - Fix global-out-of-bounds read in fbcon_get_font() core: - Small doc fix" * tag 'drm-fixes-2020-10-06-1' of git://anongit.freedesktop.org/drm/drm: drm: drm_dsc.h: fix a kernel-doc markup Partially revert "video: fbdev: amba-clcd: Retire elder CLCD driver" fbcon: Fix global-out-of-bounds read in fbcon_get_font() Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h
2020-10-06net: netdevice.h: sw_netstats_rx_add helperFabian Frederick
some drivers/network protocols update rx bytes/packets under u64_stats_update_begin/end sequence. Add a specific helper like dev_lstats_add() Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06ethtool: allow netdev driver to define phy tunablesIgor Russkikh
Define get/set phy tunable callbacks in ethtool ops. This will allow MAC drivers with integrated PHY still to implement these tunables. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06Merge tag 'drm-misc-fixes-2020-10-01' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.9: - Small doc fix. - Re-add FB_ARMCLCD for android. - Fix global-out-of-bounds read in fbcon_get_font(). Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/8585daa2-fcbc-3924-ac4f-e7b5668808e0@linux.intel.com
2020-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-05Merge tag 'platform-drivers-x86-v5.9-2' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver fixes from Andy Shevchenko: "We have some fixes for Tablet Mode reporting in particular, that users are complaining a lot about. Summary: - Attempt #3 of enabling Tablet Mode reporting w/o regressions - Improve battery recognition code in ASUS WMI driver - Fix Kconfig dependency warning for Fujitsu and LG laptop drivers - Add fixes in Thinkpad ACPI driver for _BCL method and NVRAM polling - Fix power supply extended topology in Mellanox driver - Fix memory leak in OLPC EC driver - Avoid static struct device in Intel PMC core driver - Add support for the touchscreen found in MPMAN Converter9 2-in-1 - Update MAINTAINERS to reflect the real state of affairs" * tag 'platform-drivers-x86-v5.9-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse MAINTAINERS: Add Mark Gross and Hans de Goede as x86 platform drivers maintainers platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting platform/x86: intel-vbtn: Revert "Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360" platform/x86: intel_pmc_core: do not create a static struct device platform/x86: mlx-platform: Fix extended topology configuration for power supply units platform/x86: pcengines-apuv2: Fix typo on define of AMD_FCH_GPIO_REG_GPIO55_DEVSLP0 platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP platform/x86: fix kconfig dependency warning for LG_LAPTOP platform/x86: thinkpad_acpi: initialize tp_nvram_state variable platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360 platform/x86: asus-wmi: Add BATC battery name to the list of supported platform/x86: asus-nb-wmi: Revert "Do not load on Asus T100TA and T200TA" platform/x86: touchscreen_dmi: Add info for the MPMAN Converter9 2-in-1 Documentation: laptops: thinkpad-acpi: fix underline length build warning Platform: OLPC: Fix memleak in olpc_ec_probe
2020-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Make sure SKB control block is in the proper state during IPSEC ESP-in-TCP encapsulation. From Sabrina Dubroca. 2) Various kinds of attributes were not being cloned properly when we build new xfrm_state objects from existing ones. Fix from Antony Antony. 3) Make sure to keep BTF sections, from Tony Ambardar. 4) TX DMA channels need proper locking in lantiq driver, from Hauke Mehrtens. 5) Honour route MTU during forwarding, always. From Maciej Żenczykowski. 6) Fix races in kTLS which can result in crashes, from Rohit Maheshwari. 7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan Jha. 8) Use correct address family in xfrm state lookups, from Herbert Xu. 9) A bridge FDB flush should not clear out user managed fdb entries with the ext_learn flag set, from Nikolay Aleksandrov. 10) Fix nested locking of netdev address lists, from Taehee Yoo. 11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau. 12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit. 13) Don't free command entries in mlx5 while comp handler could still be running, from Eran Ben Elisha. 14) Error flow of request_irq() in mlx5 is busted, due to an off by one we try to free and IRQ never allocated. From Maor Gottlieb. 15) Fix leak when dumping netlink policies, from Johannes Berg. 16) Sendpage cannot be performed when a page is a slab page, or the page count is < 1. Some subsystems such as nvme were doing so. Create a "sendpage_ok()" helper and use it as needed, from Coly Li. 17) Don't leak request socket when using syncookes with mptcp, from Paolo Abeni. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) net/core: check length before updating Ethertype in skb_mpls_{push,pop} net: mvneta: fix double free of txq->buf net_sched: check error pointer in tcf_dump_walker() net: team: fix memory leak in __team_options_register net: typhoon: Fix a typo Typoon --> Typhoon net: hinic: fix DEVLINK build errors net: stmmac: Modify configuration method of EEE timers tcp: fix syn cookied MPTCP request socket leak libceph: use sendpage_ok() in ceph_tcp_sendpage() scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map() drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage() tcp: use sendpage_ok() to detect misused .sendpage nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage() net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send net: introduce helper sendpage_ok() in include/linux/net.h net: usb: pegasus: Proper error handing when setting pegasus' MAC address net: core: document two new elements of struct net_device netlink: fix policy dump leak net/mlx5e: Fix race condition on nhe->n pointer in neigh update net/mlx5e: Fix VLAN create flow ...
2020-10-03net/sched: act_vlan: Add {POP,PUSH}_ETH actionsGuillaume Nault
Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to respectively pop and push a base Ethernet header at the beginning of a frame. POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any, must be stripped before calling POP_ETH. PUSH_ETH is restricted to skbs with no mac_header, and only the MAC addresses can be configured. The Ethertype is automatically set from skb->protocol. These restrictions ensure that all skb's fields remain consistent, so that this action can't confuse other part of the networking stack (like GSO). Since openvswitch already had these actions, consolidate the code in skbuff.c (like for vlan and mpls push/pop). Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03net: remove NETDEV_HW_ADDR_T_SLAVETaehee Yoo
NETDEV_HW_ADDR_T_SLAVE is not used anymore, remove it. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02Merge tag 'mlx5-fixes-2020-09-30' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux From: Saeed Mahameed <saeedm@nvidia.com> ==================== This series introduces some fixes to mlx5 driver. v1->v2: - Patch #1 Don't return while mutex is held. (Dave) v2->v3: - Drop patch #1, will consider a better approach (Jakub) - use cpu_relax() instead of cond_resched() (Jakub) - while(i--) to reveres a loop (Jakub) - Drop old mellanox email sign-off and change the committer email (Jakub) Please pull and let me know if there is any problem. For -stable v4.15 ('net/mlx5e: Fix VLAN cleanup flow') ('net/mlx5e: Fix VLAN create flow') For -stable v4.16 ('net/mlx5: Fix request_irqs error flow') For -stable v5.4 ('net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU') ('net/mlx5: Avoid possible free of command entry while timeout comp handler') For -stable v5.7 ('net/mlx5e: Fix return status when setting unsupported FEC mode') For -stable v5.8 ('net/mlx5e: Fix race condition on nhe->n pointer in neigh update') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02Merge tag 'mac80211-next-for-net-next-2020-10-02' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Another set of changes, this time with: * lots more S1G band support * 6 GHz scanning, finally * kernel-doc fixes * non-split wiphy dump fixes in nl80211 * various other small cleanups/features ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02net: introduce helper sendpage_ok() in include/linux/net.hColy Li
The original problem was from nvme-over-tcp code, who mistakenly uses kernel_sendpage() to send pages allocated by __get_free_pages() without __GFP_COMP flag. Such pages don't have refcount (page_count is 0) on tail pages, sending them by kernel_sendpage() may trigger a kernel panic from a corrupted kernel heap, because these pages are incorrectly freed in network stack as page_count 0 pages. This patch introduces a helper sendpage_ok(), it returns true if the checking page, - is not slab page: PageSlab(page) is false. - has page refcount: page_count(page) is not zero All drivers who want to send page to remote end by kernel_sendpage() may use this helper to check whether the page is OK. If the helper does not return true, the driver should try other non sendpage method (e.g. sock_no_sendpage()) to handle the page. Signed-off-by: Coly Li <colyli@suse.de> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Vlastimil Babka <vbabka@suse.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02net: core: document two new elements of struct net_deviceMauro Carvalho Chehab
As warned by "make htmldocs", there are two new struct elements that aren't documented: ../include/linux/netdevice.h:2159: warning: Function parameter or member 'unlink_list' not described in 'net_device' ../include/linux/netdevice.h:2159: warning: Function parameter or member 'nested_level' not described in 'net_device' Fixes: 1fc70edb7d7b ("net: core: add nested_level variable in net_device") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02bpf: Introducte bpf_this_cpu_ptr()Hao Luo
Add bpf_this_cpu_ptr() to help access percpu var on this cpu. This helper always returns a valid pointer, therefore no need to check returned value for NULL. Also note that all programs run with preemption disabled, which means that the returned pointer is stable during all the execution of the program. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-6-haoluo@google.com
2020-10-02bpf: Introduce bpf_per_cpu_ptr()Hao Luo
Add bpf_per_cpu_ptr() to help bpf programs access percpu vars. bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel except that it may return NULL. This happens when the cpu parameter is out of range. So the caller must check the returned value. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com
2020-10-02bpf: Introduce pseudo_btf_idHao Luo
Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a ksym so that further dereferences on the ksym can use the BTF info to validate accesses. Internally, when seeing a pseudo_btf_id ld insn, the verifier reads the btf_id stored in the insn[0]'s imm field and marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND, which is encoded in btf_vminux by pahole. If the VAR is not of a struct type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID and the mem_size is resolved to the size of the VAR's type. >From the VAR btf_id, the verifier can also read the address of the ksym's corresponding kernel var from kallsyms and use that to fill dst_reg. Therefore, the proper functionality of pseudo_btf_id depends on (1) kallsyms and (2) the encoding of kernel global VARs in pahole, which should be available since pahole v1.18. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-2-haoluo@google.com
2020-10-02net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessibleSaeed Mahameed
In case of pci is offline reclaim_pages_cmd() will still try to call the FW to release FW pages, cmd_exec() in this case will return a silent success without actually calling the FW. This is wrong and will cause page leaks, what we should do is to detect pci offline or command interface un-available before tying to access the FW and manually release the FW pages in the driver. In this patch we share the code to check for FW command interface availability and we call it in sensitive places e.g. reclaim_pages_cmd(). Alternative fix: 1. Remove MLX5_CMD_OP_MANAGE_PAGES form mlx5_internal_err_ret_value, command success simulation list. 2. Always Release FW pages even if cmd_exec fails in reclaim_pages_cmd(). Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02net/mlx5: Avoid possible free of command entry while timeout comp handlerEran Ben Elisha
Upon command completion timeout, driver simulates a forced command completion. In a rare case where real interrupt for that command arrives simultaneously, it might release the command entry while the forced handler might still access it. Fix that by adding an entry refcount, to track current amount of allowed handlers. Command entry to be released only when this refcount is decremented to zero. Command refcount is always initialized to one. For callback commands, command completion handler is the symmetric flow to decrement it. For non-callback commands, it is wait_func(). Before ringing the doorbell, increment the refcount for the real completion handler. Once the real completion handler is called, it will decrement it. For callback commands, once the delayed work is scheduled, increment the refcount. Upon callback command completion handler, we will try to cancel the timeout callback. In case of success, we need to decrement the callback refcount as it will never run. In addition, gather the entry index free and the entry free into a one flow for all command types release. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02Merge tag 'mmc-v5.9-rc4-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: - Fix deadlock when removing MEMSTICK host - Workaround broken CMDQ on Intel GLK based IRBIS models * tag 'mmc-v5.9-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci: Workaround broken command queuing on Intel GLK based IRBIS models memstick: Skip allocating card when removing host
2020-10-02mm: memcg/slab: fix slab statistics in !SMP configurationRoman Gushchin
Since commit ea426c2a7de8 ("mm: memcg: prepare for byte-sized vmstat items") the write side of slab counters accepts a value in bytes and converts it to pages. It happens in __mod_node_page_state(). However a non-SMP version of __mod_node_page_state() doesn't perform this conversion. It leads to incorrect (unrealistically high) slab counters values. Fix this by adding a similar conversion to the non-SMP version of __mod_node_page_state(). Signed-off-by: Roman Gushchin <guro@fb.com> Reported-and-tested-by: Bastian Bittorf <bb@npl.de> Fixes: ea426c2a7de8 ("mm: memcg: prepare for byte-sized vmstat items") Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-01pipe: remove pipe_wait() and fix wakeup race with spliceLinus Torvalds
The pipe splice code still used the old model of waiting for pipe IO by using a non-specific "pipe_wait()" that waited for any pipe event to happen, which depended on all pipe IO being entirely serialized by the pipe lock. So by checking the state you were waiting for, and then adding yourself to the wait queue before dropping the lock, you were guaranteed to see all the wakeups. Strictly speaking, the actual wakeups were not done under the lock, but the pipe_wait() model still worked, because since the waiter held the lock when checking whether it should sleep, it would always see the current state, and the wakeup was always done after updating the state. However, commit 0ddad21d3e99 ("pipe: use exclusive waits when reading or writing") split the single wait-queue into two, and in the process also made the "wait for event" code wait for _two_ wait queues, and that then showed a race with the wakers that were not serialized by the pipe lock. It's only splice that used that "pipe_wait()" model, so the problem wasn't obvious, but Josef Bacik reports: "I hit a hang with fstest btrfs/187, which does a btrfs send into /dev/null. This works by creating a pipe, the write side is given to the kernel to write into, and the read side is handed to a thread that splices into a file, in this case /dev/null. The box that was hung had the write side stuck here [pipe_write] and the read side stuck here [splice_from_pipe_next -> pipe_wait]. [ more details about pipe_wait() scenario ] The problem is we're doing the prepare_to_wait, which sets our state each time, however we can be woken up either with reads or writes. In the case above we race with the WRITER waking us up, and re-set our state to INTERRUPTIBLE, and thus never break out of schedule" Josef had a patch that avoided the issue in pipe_wait() by just making it set the state only once, but the deeper problem is that pipe_wait() depends on a level of synchonization by the pipe mutex that it really shouldn't. And the whole "wait for any pipe state change" model really isn't very good to begin with. So rather than trying to work around things in pipe_wait(), remove that legacy model of "wait for arbitrary pipe event" entirely, and actually create functions that wait for the pipe actually being readable or writable, and can do so without depending on the pipe lock serializing everything. Fixes: 0ddad21d3e99 ("pipe: use exclusive waits when reading or writing") Link: https://lore.kernel.org/linux-fsdevel/bfa88b5ad6f069b2b679316b9e495a970130416c.1601567868.git.josef@toxicpanda.com/ Reported-by: Josef Bacik <josef@toxicpanda.com> Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-10-01 The following pull-request contains BPF updates for your *net-next* tree. We've added 90 non-merge commits during the last 8 day(s) which contain a total of 103 files changed, 7662 insertions(+), 1894 deletions(-). Note that once bpf(/net) tree gets merged into net-next, there will be a small merge conflict in tools/lib/bpf/btf.c between commit 1245008122d7 ("libbpf: Fix native endian assumption when parsing BTF") from the bpf tree and the commit 3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness") from the bpf-next tree. Correct resolution would be to stick with bpf-next, it should look like: [...] /* check BTF magic */ if (fread(&magic, 1, sizeof(magic), f) < sizeof(magic)) { err = -EIO; goto err_out; } if (magic != BTF_MAGIC && magic != bswap_16(BTF_MAGIC)) { /* definitely not a raw BTF */ err = -EPROTO; goto err_out; } /* get file size */ [...] The main changes are: 1) Add bpf_snprintf_btf() and bpf_seq_printf_btf() helpers to support displaying BTF-based kernel data structures out of BPF programs, from Alan Maguire. 2) Speed up RCU tasks trace grace periods by a factor of 50 & fix a few race conditions exposed by it. It was discussed to take these via BPF and networking tree to get better testing exposure, from Paul E. McKenney. 3) Support multi-attach for freplace programs, needed for incremental attachment of multiple XDP progs using libxdp dispatcher model, from Toke Høiland-Jørgensen. 4) libbpf support for appending new BTF types at the end of BTF object, allowing intrusive changes of prog's BTF (useful for future linking), from Andrii Nakryiko. 5) Several BPF helper improvements e.g. avoid atomic op in cookie generator and add a redirect helper into neighboring subsys, from Daniel Borkmann. 6) Allow map updates on sockmaps from bpf_iter context in order to migrate sockmaps from one to another, from Lorenz Bauer. 7) Fix 32 bit to 64 bit assignment from latest alu32 bounds tracking which caused a verifier issue due to type downgrade to scalar, from John Fastabend. 8) Follow-up on tail-call support in BPF subprogs which optimizes x64 JIT prologue and epilogue sections, from Maciej Fijalkowski. 9) Add an option to perf RB map to improve sharing of event entries by avoiding remove- on-close behavior. Also, add BPF_PROG_TEST_RUN for raw_tracepoint, from Song Liu. 10) Fix a crash in AF_XDP's socket_release when memory allocation for UMEMs fails, from Magnus Karlsson. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-01Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "A previous commit to prevent AML memory opregions from accessing the kernel memory turned out to be too restrictive. Relax the permission check to permit the ACPI core to map kernel memory used for table overrides" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: permit ACPI core to map kernel memory used for table overrides
2020-09-30net/mlx5: E-switch, Use PF num in metadata reg c0sunils
Currently only 256 vports can be supported as only 8 bits are reserved for them and 8 bits are reserved for vhca_ids in metadata reg c0. To support more than 256 vports, replace vhca_id with a unique shorter 4-bit PF number which covers upto 16 PF's. Use remaining 12 bits for vports ranging 1-4095. This will continue to generate unique metadata even if multiple PCI devices have same switch_id. Signed-off-by: sunils <sunils@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-30arm64: permit ACPI core to map kernel memory used for table overridesArd Biesheuvel
Jonathan reports that the strict policy for memory mapped by the ACPI core breaks the use case of passing ACPI table overrides via initramfs. This is due to the fact that the memory type used for loading the initramfs in memory is not recognized as a memory type that is typically used by firmware to pass firmware tables. Since the purpose of the strict policy is to ensure that no AML or other ACPI code can manipulate any memory that is used by the kernel to keep its internal state or the state of user tasks, we can relax the permission check, and allow mappings of memory that is reserved and marked as NOMAP via memblock, and therefore not covered by the linear mapping to begin with. Fixes: 1583052d111f ("arm64/acpi: disallow AML memory opregions to access kernel memory") Fixes: 325f5585ec36 ("arm64/acpi: disallow writeable AML opregion mapping for EFI code regions") Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Link: https://lore.kernel.org/r/20200929132522.18067-1-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-09-30net: macb: move pdata to private headerAlexandre Belloni
struct macb_platform_data is only used by macb_pci to register the platform device, move its definition to cadence/macb.h and remove platform_data/macb.h Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30bpf: Add redirect_neigh helper as redirect drop-inDaniel Borkmann
Add a redirect_neigh() helper as redirect() drop-in replacement for the xmit side. Main idea for the helper is to be very similar in semantics to the latter just that the skb gets injected into the neighboring subsystem in order to let the stack do the work it knows best anyway to populate the L2 addresses of the packet and then hand over to dev_queue_xmit() as redirect() does. This solves two bigger items: i) skbs don't need to go up to the stack on the host facing veth ingress side for traffic egressing the container to achieve the same for populating L2 which also has the huge advantage that ii) the skb->sk won't get orphaned in ip_rcv_core() when entering the IP routing layer on the host stack. Given that skb->sk neither gets orphaned when crossing the netns as per 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.") the helper can then push the skbs directly to the phys device where FQ scheduler can do its work and TCP stack gets proper backpressure given we hold on to skb->sk as long as skb is still residing in queues. With the helper used in BPF data path to then push the skb to the phys device, I observed a stable/consistent TCP_STREAM improvement on veth devices for traffic going container -> host -> host -> container from ~10Gbps to ~15Gbps for a single stream in my test environment. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
2020-09-30bpf, net: Rework cookie generator as per-cpu oneDaniel Borkmann
With its use in BPF, the cookie generator can be called very frequently in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg) and attached to the root cgroup, for example, when used in v1/v2 mixed environments. In particular, when there's a high churn on sockets in the system there can be many parallel requests to the bpf_get_socket_cookie() and bpf_get_netns_cookie() helpers which then cause contention on the atomic counter. As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino allocator"), add a small helper library that both can use for the 64 bit counters. Given this can be called from different contexts, we also need to deal with potential nested calls even though in practice they are considered extremely rare. One idea as suggested by Eric Dumazet was to use a reverse counter for this situation since we don't expect 64 bit overflows anyways; that way, we can avoid bigger gaps in the 64 bit counter space compared to just batch-wise increase. Even on machines with small number of cores (e.g. 4) the cookie generation shrinks from min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run in parallel from multiple CPUs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
2020-09-30Partially revert "video: fbdev: amba-clcd: Retire elder CLCD driver"Peter Collingbourne
Also partially revert the follow-up change "drm: pl111: Absorb the external register header". This reverts the parts of commits 7e4e589db76a3cf4c1f534eb5a09cc6422766b93 and 0fb8125635e8eb5483fb095f98dcf0651206a7b8 that touch paths outside of drivers/gpu/drm/pl111. The fbdev driver is used by Android's FVP configuration. Using the DRM driver together with DRM's fbdev emulation results in a failure to boot Android. The root cause is that Android's generic fbdev userspace driver relies on the ability to set the pixel format via FBIOPUT_VSCREENINFO, which is not supported by fbdev emulation. There have been other less critical behavioral differences identified between the fbdev driver and the DRM driver with fbdev emulation. The DRM driver exposes different values for the panel's width, height and refresh rate, and the DRM driver fails a FBIOPUT_VSCREENINFO syscall with yres_virtual greater than the maximum supported value instead of letting the syscall succeed and setting yres_virtual based on yres. Signed-off-by: Peter Collingbourne <pcc@google.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200929195344.2219796-1-pcc@google.com
2020-09-29net: Add netif_rx_any_context()Sebastian Andrzej Siewior
Quite some drivers make conditional decisions based on in_interrupt() to invoke either netif_rx() or netif_rx_ni(). Conditionals based on in_interrupt() or other variants of preempt count checks in drivers should not exist for various reasons and Linus clearly requested to either split the code pathes or pass an argument to the common functions which provides the context. This is obviously the correct solution, but for some of the affected drivers this needs a major rewrite due to their convoluted structure. As in_interrupt() usage in drivers needs to be phased out, provide netif_rx_any_context() as a stop gap for these drivers. This confines the in_interrupt() conditional to core code which in turn allows to remove the access to this check for driver code and provides one central place to do further modifications once the driver maze is cleaned up. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29bpf: Support attaching freplace programs to multiple attach pointsToke Høiland-Jørgensen
This enables support for attaching freplace programs to multiple attach points. It does this by amending the UAPI for bpf_link_Create with a target btf ID that can be used to supply the new attachment point along with the target program fd. The target must be compatible with the target that was supplied at program load time. The implementation reuses the checks that were factored out of check_attach_btf_id() to ensure compatibility between the BTF types of the old and new attachment. If these match, a new bpf_tracing_link will be created for the new attach target, allowing multiple attachments to co-exist simultaneously. The code could theoretically support multiple-attach of other types of tracing programs as well, but since I don't have a use case for any of those, there is no API support for doing so. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk
2020-09-29bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attachToke Høiland-Jørgensen
In preparation for allowing multiple attachments of freplace programs, move the references to the target program and trampoline into the bpf_tracing_link structure when that is created. To do this atomically, introduce a new mutex in prog->aux to protect writing to the two pointers to target prog and trampoline, and rename the members to make it clear that they are related. With this change, it is no longer possible to attach the same tracing program multiple times (detaching in-between), since the reference from the tracing program to the target disappears on the first attach. However, since the next patch will let the caller supply an attach target, that will also make it possible to attach to the same place multiple times. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/160138355059.48470.2503076992210324984.stgit@toke.dk
2020-09-28stmmac: intel: Adding ref clock 1us tic for LPI cntrRusaimi Amira Ruslan
Adding reference clock (1us tic) for all LPI timer on Intel platforms. The reference clock is derived from ptp clk. This also enables all LPI counter. Signed-off-by: Rusaimi Amira Ruslan <rusaimi.amira.rusaimi@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28bpf: Add bpf_seq_printf_btf helperAlan Maguire
A helper is added to allow seq file writing of kernel data structures using vmlinux BTF. Its signature is long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags); Flags and struct btf_ptr definitions/use are identical to the bpf_snprintf_btf helper, and the helper returns 0 on success or a negative error value. Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com
2020-09-28bpf: Add bpf_snprintf_btf helperAlan Maguire
A helper is added to support tracing kernel type information in BPF using the BPF Type Format (BTF). Its signature is long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags); struct btf_ptr * specifies - a pointer to the data to be traced - the BTF id of the type of data pointed to - a flags field is provided for future use; these flags are not to be confused with the BTF_F_* flags below that control how the btf_ptr is displayed; the flags member of the struct btf_ptr may be used to disambiguate types in kernel versus module BTF, etc; the main distinction is the flags relate to the type and information needed in identifying it; not how it is displayed. For example a BPF program with a struct sk_buff *skb could do the following: static struct btf_ptr b = { }; b.ptr = skb; b.type_id = __builtin_btf_type_id(struct sk_buff, 1); bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0, 0); Default output looks like this: (struct sk_buff){ .transport_header = (__u16)65535, .mac_header = (__u16)65535, .end = (sk_buff_data_t)192, .head = (unsigned char *)0x000000007524fd8b, .data = (unsigned char *)0x000000007524fd8b, .truesize = (unsigned int)768, .users = (refcount_t){ .refs = (atomic_t){ .counter = (int)1, }, }, } Flags modifying display are as follows: - BTF_F_COMPACT: no formatting around type information - BTF_F_NONAME: no struct/union member names/types - BTF_F_PTR_RAW: show raw (unobfuscated) pointer values; equivalent to %px. - BTF_F_ZERO: show zero-valued struct/union members; they are not displayed by default Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com
2020-09-28bpf: Move to generic BTF show support, apply it to seq files/stringsAlan Maguire
generalize the "seq_show" seq file support in btf.c to support a generic show callback of which we support two instances; the current seq file show, and a show with snprintf() behaviour which instead writes the type data to a supplied string. Both classes of show function call btf_type_show() with different targets; the seq file or the string to be written. In the string case we need to track additional data - length left in string to write and length to return that we would have written (a la snprintf). By default show will display type information, field members and their types and values etc, and the information is indented based upon structure depth. Zeroed fields are omitted. Show however supports flags which modify its behaviour: BTF_SHOW_COMPACT - suppress newline/indent. BTF_SHOW_NONAME - suppress show of type and member names. BTF_SHOW_PTR_RAW - do not obfuscate pointer values. BTF_SHOW_UNSAFE - do not copy data to safe buffer before display. BTF_SHOW_ZERO - show zeroed values (by default they are not shown). Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1601292670-1616-3-git-send-email-alan.maguire@oracle.com
2020-09-28bpf: Provide function to get vmlinux BTF informationAlan Maguire
It will be used later for BPF structure display support Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1601292670-1616-2-git-send-email-alan.maguire@oracle.com
2020-09-28bpf: verifier: refactor check_attach_btf_id()Toke Høiland-Jørgensen
The check_attach_btf_id() function really does three things: 1. It performs a bunch of checks on the program to ensure that the attachment is valid. 2. It stores a bunch of state about the attachment being requested in the verifier environment and struct bpf_prog objects. 3. It allocates a trampoline for the attachment. This patch splits out (1.) and (3.) into separate functions which will perform the checks, but return the computed values instead of directly modifying the environment. This is done in preparation for reusing the checks when the actual attachment is happening, which will allow tracing programs to have multiple (compatible) attachments. This also fixes a bug where a bunch of checks were skipped if a trampoline already existed for the tracing target. Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN") Fixes: 1e6c62a88215 ("bpf: Introduce sleepable BPF programs") Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>