Age | Commit message (Collapse) | Author |
|
This converts the PXA2xx SPI driver to use GPIO descriptors
exclusively to retrieve GPIO chip select lines.
The device tree and ACPI paths of the driver already use
descriptors, hence ->use_gpio_descriptors is already set and
this codepath is well tested.
Convert all the PXA boards providing chip select GPIOs as
platform data and drop the old GPIO chipselect handling in
favor of the core managing it exclusively.
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Daniel Mack <daniel@zonque.org>
Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220125005836.494807-1-linus.walleij@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Remove most references to 'master' in the code and replace them with
some form of 'aggregate device'. This better reflects the reality of
what this code does, i.e. an aggregate device that represents a
device like a GPU card once some set of devices that make up the
aggregate device probe and register with the component framework.
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Saravana Kannan <saravanak@google.com>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20220127200141.1295328-2-swboyd@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
After following the call tree of phy_set_max_speed(), it became clear
that this function never returns anything but 0, so we can change its
result type to *void* and drop the result checks from the three drivers
that actually bothered to do it...
Found by Linux Verification Center (linuxtesting.org) with the SVACE static
analysis tool.
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need the tty/serial fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We need the char/misc fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of custom data type re-use generic struct u32_fract.
No changes intended.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220126135353.24007-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
Introduce a macro to produce data types like
struct TYPE_fract {
__TYPE numerator;
__TYPE denominator;
};
to be used in the code wherever it's needed.
In the following changes convert some users to it.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220126135353.24007-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
When CONFIG_CGROUPS is disabled psi code generates the following
warnings:
kernel/sched/psi.c:1112:21: warning: no previous prototype for 'psi_trigger_create' [-Wmissing-prototypes]
1112 | struct psi_trigger *psi_trigger_create(struct psi_group *group,
| ^~~~~~~~~~~~~~~~~~
kernel/sched/psi.c:1182:6: warning: no previous prototype for 'psi_trigger_destroy' [-Wmissing-prototypes]
1182 | void psi_trigger_destroy(struct psi_trigger *t)
| ^~~~~~~~~~~~~~~~~~~
kernel/sched/psi.c:1249:10: warning: no previous prototype for 'psi_trigger_poll' [-Wmissing-prototypes]
1249 | __poll_t psi_trigger_poll(void **trigger_ptr,
| ^~~~~~~~~~~~~~~~
Change the declarations of these functions in the header to provide the
prototypes even when they are unused.
Link: https://lkml.kernel.org/r/20220119223940.787748-2-surenb@google.com
Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It has been reported that the tag setting operation on newly-allocated
pages can cause the page flags to be corrupted when performed
concurrently with other flag updates as a result of the use of
non-atomic operations.
Fix the problem by using a compare-exchange loop to update the tag.
Link: https://lkml.kernel.org/r/20220120020148.1632253-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/I456b24a2b9067d93968d43b4bb3351c0cec63101
Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As with the other members of folio, the offset of page->mapping and
folio->mapping must be the same. The compile-time check was
inadvertently removed during development. Add it back.
[willy@infradead.org: changelog redo]
Link: https://lkml.kernel.org/r/20220104011734.21714-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The CONFIG_SYSCTL=n stub returns the wrong type.
Fixes: ee9efac48a082 ("sysctl: add helper to register a sysctl mount point")
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes for 5.17-rc2 that resolve a
number of reported problems. These include:
- typec driver fixes
- xhci platform driver fixes for suspending
- ulpi core fix
- role.h build fix
- new device ids
- syzbot-reported bugfixes
- gadget driver fixes
- dwc3 driver fixes
- other small fixes
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: cdnsp: Fix segmentation fault in cdns_lost_power function
usb: dwc2: gadget: don't try to disable ep0 in dwc2_hsotg_suspend
usb: gadget: at91_udc: fix incorrect print type
usb: dwc3: xilinx: Fix error handling when getting USB3 PHY
usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0 mode
usb: xhci-plat: fix crash when suspend if remote wake enable
usb: common: ulpi: Fix crash in ulpi_match()
usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
ucsi_ccg: Check DEV_INT bit only when starting CCG4
USB: core: Fix hang in usb_kill_urb by adding memory barriers
usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
usb: typec: tcpm: Do not disconnect when receiving VSAFE0V
usb: typec: tcpm: Do not disconnect while receiving VBUS off
usb: typec: Don't try to register component master without components
usb: typec: Only attempt to link USB ports if there is fwnode
usb: typec: tcpci: don't touch CC line if it's Vconn source
usb: roles: fix include/linux/usb/role.h compile issue
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request
- add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs (Wu
Zheng)
- remove the unneeded ret variable in nvmf_dev_show (Changcheng
Deng)
- Fix for a hang regression introduced with a patch in the merge
window, where low queue depth devices would not always get woken
correctly (Laibin)
- Small series fixing an IO accounting issue with bio backed dm devices
(Mike, Yu)
* tag 'block-5.17-2022-01-28' of git://git.kernel.dk/linux-block:
dm: properly fix redundant bio-based IO accounting
dm: revert partial fix for redundant bio-based IO accounting
block: add bio_start_io_acct_time() to control start_time
blk-mq: Fix wrong wakeup batch configuration which will cause hang
nvme-fabrics: remove the unneeded ret variable in nvmf_dev_show
nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs
blk-mq: fix missing blk_account_io_done() in error path
block: fix memory leak in disk_register_independent_access_ranges
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security sybsystem fix from James Morris:
"Fix NULL pointer crash in LSM via Ceph, from Vivek Goyal"
* tag 'fixes-v5.17-lsm-ceph-null' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security, lsm: dentry_init_security() Handle multi LSM registration
|
|
Change of_*() functions to device_*() for firmware agnostic usage.
This allows to have the smbus_alert interrupt without any changes
in the controller drivers using the ACPI table.
Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Add fwnode_irq_get_byname() to get an interrupt by name from either
ACPI table or Device Tree, whichever is used for enumeration.
In the ACPI case, this allow us to use 'interrupt-names' in
_DSD which can be mapped to Interrupt() resource by index.
The implementation is similar to 'interrupt-names' in the
Device Tree.
Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
bio_start_io_acct_time() interface is like bio_start_io_acct() that
allows start_time to be passed in. This gives drivers the ability to
defer starting accounting until after IO is issued (but possibily not
entirely due to bio splitting).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-2-snitzer@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A ceph user has reported that ceph is crashing with kernel NULL pointer
dereference. Following is the backtrace.
/proc/version: Linux version 5.16.2-arch1-1 (linux@archlinux) (gcc (GCC)
11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Thu, 20 Jan 2022
16:18:29 +0000
distro / arch: Arch Linux / x86_64
SELinux is not enabled
ceph cluster version: 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)
relevant dmesg output:
[ 30.947129] BUG: kernel NULL pointer dereference, address:
0000000000000000
[ 30.947206] #PF: supervisor read access in kernel mode
[ 30.947258] #PF: error_code(0x0000) - not-present page
[ 30.947310] PGD 0 P4D 0
[ 30.947342] Oops: 0000 [#1] PREEMPT SMP PTI
[ 30.947388] CPU: 5 PID: 778 Comm: touch Not tainted 5.16.2-arch1-1 #1
86fbf2c313cc37a553d65deb81d98e9dcc2a3659
[ 30.947486] Hardware name: Gigabyte Technology Co., Ltd. B365M
DS3H/B365M DS3H, BIOS F5 08/13/2019
[ 30.947569] RIP: 0010:strlen+0x0/0x20
[ 30.947616] Code: b6 07 38 d0 74 16 48 83 c7 01 84 c0 74 05 48 39 f7 75
ec 31 c0 31 d2 89 d6 89 d7 c3 48 89 f8 31 d2 89 d6 89 d7 c3 0
f 1f 40 00 <80> 3f 00 74 12 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 31
ff
[ 30.947782] RSP: 0018:ffffa4ed80ffbbb8 EFLAGS: 00010246
[ 30.947836] RAX: 0000000000000000 RBX: ffffa4ed80ffbc60 RCX:
0000000000000000
[ 30.947904] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
0000000000000000
[ 30.947971] RBP: ffff94b0d15c0ae0 R08: 0000000000000000 R09:
0000000000000000
[ 30.948040] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[ 30.948106] R13: 0000000000000001 R14: ffffa4ed80ffbc60 R15:
0000000000000000
[ 30.948174] FS: 00007fc7520f0740(0000) GS:ffff94b7ced40000(0000)
knlGS:0000000000000000
[ 30.948252] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 30.948308] CR2: 0000000000000000 CR3: 0000000104a40001 CR4:
00000000003706e0
[ 30.948376] Call Trace:
[ 30.948404] <TASK>
[ 30.948431] ceph_security_init_secctx+0x7b/0x240 [ceph
49f9c4b9bf5be8760f19f1747e26da33920bce4b]
[ 30.948582] ceph_atomic_open+0x51e/0x8a0 [ceph
49f9c4b9bf5be8760f19f1747e26da33920bce4b]
[ 30.948708] ? get_cached_acl+0x4d/0xa0
[ 30.948759] path_openat+0x60d/0x1030
[ 30.948809] do_filp_open+0xa5/0x150
[ 30.948859] do_sys_openat2+0xc4/0x190
[ 30.948904] __x64_sys_openat+0x53/0xa0
[ 30.948948] do_syscall_64+0x5c/0x90
[ 30.948989] ? exc_page_fault+0x72/0x180
[ 30.949034] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 30.949091] RIP: 0033:0x7fc7521e25bb
[ 30.950849] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00
00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 0
0 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14
25
Core of the problem is that ceph checks for return code from
security_dentry_init_security() and if return code is 0, it assumes
everything is fine and continues to call strlen(name), which crashes.
Typically SELinux LSM returns 0 and sets name to "security.selinux" and
it is not a problem. Or if selinux is not compiled in or disabled, it
returns -EOPNOTSUP and ceph deals with it.
But somehow in this configuration, 0 is being returned and "name" is
not being initialized and that's creating the problem.
Our suspicion is that BPF LSM is registering a hook for
dentry_init_security() and returns hook default of 0.
LSM_HOOK(int, 0, dentry_init_security, struct dentry *dentry,...)
I have not been able to reproduce it just by doing CONFIG_BPF_LSM=y.
Stephen has tested the patch though and confirms it solves the problem
for him.
dentry_init_security() is written in such a way that it expects only one
LSM to register the hook. Atleast that's the expectation with current code.
If another LSM returns a hook and returns default, it will simply return
0 as of now and that will break ceph.
Hence, suggestion is that change semantics of this hook a bit. If there
are no LSMs or no LSM is taking ownership and initializing security context,
then return -EOPNOTSUP. Also allow at max one LSM to initialize security
context. This hook can't deal with multiple LSMs trying to init security
context. This patch implements this new behavior.
Reported-by: Stephen Muth <smuth4@gmail.com>
Tested-by: Stephen Muth <smuth4@gmail.com>
Suggested-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: <stable@vger.kernel.org> # 5.16.0
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: James Morris <jmorris@namei.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These make the buffer handling in pm_show_wakelocks() more robust and
drop an unused hibernation-related function.
Specifics:
- Make the buffer handling in pm_show_wakelocks() more robust by
using sysfs_emit_at() in it to generate output (Greg
Kroah-Hartman).
- Drop register_nosave_region_late() which is not used (Amadeusz
Sławiński)"
* tag 'pm-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: hibernate: Remove register_nosave_region_late()
PM: wakeup: simplify the output logic of pm_show_wakelocks()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify fixes from Jan Kara:
"Fixes for userspace breakage caused by fsnotify changes ~3 years ago
and one fanotify cleanup"
* tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix fsnotify hooks in pseudo filesystems
fsnotify: invalidate dcache before IN_DELETE event
fanotify: remove variable set but not used
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull udf and quota fixes from Jan Kara:
"Fixes for crashes in UDF when inode expansion fails and one quota
cleanup"
* tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: cleanup double word in comment
udf: Restore i_lenAlloc when inode expansion fails
udf: Fix NULL ptr deref when converting from inline format
|
|
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
struct svc_xprt holds a long lived reference to a netns,
it is worth tracking it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For applications running on a mix of platforms it's useful
to have a clear indication whether host's NIC supports the
geometry requirements of TCP zero-copy. TCP zero-copy Rx
requires data to be neatly placed into memory pages.
Most NICs can't do that.
This patch is adding GET support only, since the NICs
I work with either always have the feature enabled or
enable it whenever MTU is set to jumbo. In other words
I don't need SET. But adding set should be trivial.
(The only note on SET is that we will likely want
the setting to be "sticky" and use 0 / `unknown`
to reset it back to driver default.)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vladimir points out that since we removed mii_lpa_to_linkmode_lpa_sgmii(),
mii_lpa_mod_linkmode_lpa_sgmii() is also no longer called.
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-01-27
1) Dima, adds an internal mlx5 steering callback per steering provider
(FW vs SW steering), to advertise steering capabilities implemented by
each module, this helps upper modules in mlx5 to know what is
supported and what's not without the need to tell what is the underlying
steering mode.
2nd patch is the usecase where this interface is used to implement
Vlan Push/pop for uplink with SW steering, where in FW mode it's not
supported yet.
2) Roi Dayan improves code readability and maintainability
as preparation step for multi attribute instance per flow
in mlx5 TC module
Currently the mlx5_flow object contains a single mlx5_attr instance.
However, multi table actions (e.g. CT) instantiate multiple attr instances.
This is a refactoring series in a preparation to support multiple
attribute instances per flow.
The commits prepare functions to get attr instance instead of using
flow->attr and also using attr->flags if the flag is more relevant
to be attr flag and not a flow flag considering there will be multiple
attr instances. i.e. CT and SAMPLE flags.
* tag 'mlx5-updates-2022-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: VLAN push on RX, pop on TX
net/mlx5: Introduce software defined steering capabilities
net/mlx5: Remove unused TIR modify bitmask enums
net/mlx5e: CT, Remove redundant flow args from tc ct calls
net/mlx5e: TC, Store mapped tunnel id on flow attr
net/mlx5e: Test CT and SAMPLE on flow attr
net/mlx5e: Refactor eswitch attr flags to just attr flags
net/mlx5e: CT, Don't set flow flag CT for ct clear flow
net/mlx5e: TC, Hold sample_attr on stack instead of pointer
net/mlx5e: TC, Reject rules with multiple CT actions
net/mlx5e: TC, Refactor mlx5e_tc_add_flow_mod_hdr() to get flow attr
net/mlx5e: TC, Pass attr to tc_act can_offload()
net/mlx5e: TC, Split pedit offloads verify from alloc_tc_pedit_action()
net/mlx5e: TC, Move pedit_headers_action to parse_attr
net/mlx5e: Move counter creation call to alloc_flow_attr_counter()
net/mlx5e: Pass attr arg for attaching/detaching encaps
net/mlx5e: Move code chunk setting encap dests into its own function
====================
Link: https://lore.kernel.org/r/20220127204007.146300-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
struct mlx5_ifc_modify_tir_bitmask_bits is used for the bitmask
of MODIFY_TIR operations.
Remove the unused bitmask enums.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
BPF verifier supports direct memory access for BPF_PROG_TYPE_TRACING type
of bpf programs, e.g., a->b. If "a" is a pointer
pointing to kernel memory, bpf verifier will allow user to write
code in C like a->b and the verifier will translate it to a kernel
load properly. If "a" is a pointer to user memory, it is expected
that bpf developer should be bpf_probe_read_user() helper to
get the value a->b. Without utilizing BTF __user tagging information,
current verifier will assume that a->b is a kernel memory access
and this may generate incorrect result.
Now BTF contains __user information, it can check whether the
pointer points to a user memory or not. If it is, the verifier
can reject the program and force users to use bpf_probe_read_user()
helper explicitly.
In the future, we can easily extend btf_add_space for other
address space tagging, for example, rcu/percpu etc.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220127154606.654961-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The __user attribute is currently mainly used by sparse for type checking.
The attribute indicates whether a memory access is in user memory address
space or not. Such information is important during tracing kernel
internal functions or data structures as accessing user memory often
has different mechanisms compared to accessing kernel memory. For example,
the perf-probe needs explicit command line specification to indicate a
particular argument or string in user-space memory ([1], [2], [3]).
Currently, vmlinux BTF is available in kernel with many distributions.
If __user attribute information is available in vmlinux BTF, the explicit
user memory access information from users will not be necessary as
the kernel can figure it out by itself with vmlinux BTF.
Besides the above possible use for perf/probe, another use case is
for bpf verifier. Currently, for bpf BPF_PROG_TYPE_TRACING type of bpf
programs, users can write direct code like
p->m1->m2
and "p" could be a function parameter. Without __user information in BTF,
the verifier will assume p->m1 accessing kernel memory and will generate
normal loads. Let us say "p" actually tagged with __user in the source
code. In such cases, p->m1 is actually accessing user memory and direct
load is not right and may produce incorrect result. For such cases,
bpf_probe_read_user() will be the correct way to read p->m1.
To support encoding __user information in BTF, a new attribute
__attribute__((btf_type_tag("<arbitrary_string>")))
is implemented in clang ([4]). For example, if we have
#define __user __attribute__((btf_type_tag("user")))
during kernel compilation, the attribute "user" information will
be preserved in dwarf. After pahole converting dwarf to BTF, __user
information will be available in vmlinux BTF.
The following is an example with latest upstream clang (clang14) and
pahole 1.23:
[$ ~] cat test.c
#define __user __attribute__((btf_type_tag("user")))
int foo(int __user *arg) {
return *arg;
}
[$ ~] clang -O2 -g -c test.c
[$ ~] pahole -JV test.o
...
[1] INT int size=4 nr_bits=32 encoding=SIGNED
[2] TYPE_TAG user type_id=1
[3] PTR (anon) type_id=2
[4] FUNC_PROTO (anon) return=1 args=(3 arg)
[5] FUNC foo type_id=4
[$ ~]
You can see for the function argument "int __user *arg", its type is
described as
PTR -> TYPE_TAG(user) -> INT
The kernel can use this information for bpf verification or other
use cases.
Current btf_type_tag is only supported in clang (>= clang14) and
pahole (>= 1.23). gcc support is also proposed and under development ([5]).
[1] http://lkml.kernel.org/r/155789874562.26965.10836126971405890891.stgit@devnote2
[2] http://lkml.kernel.org/r/155789872187.26965.4468456816590888687.stgit@devnote2
[3] http://lkml.kernel.org/r/155789871009.26965.14167558859557329331.stgit@devnote2
[4] https://reviews.llvm.org/D111199
[5] https://lore.kernel.org/bpf/0cbeb2fb-1a18-f690-e360-24b1c90c2a91@fb.com/
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220127154600.652613-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and can.
Current release - new code bugs:
- tcp: add a missing sk_defer_free_flush() in tcp_splice_read()
- tcp: add a stub for sk_defer_free_flush(), fix CONFIG_INET=n
- nf_tables: set last expression in register tracking area
- nft_connlimit: fix memleak if nf_ct_netns_get() fails
- mptcp: fix removing ids bitmap setting
- bonding: use rcu_dereference_rtnl when getting active slave
- fix three cases of sleep in atomic context in drivers: lan966x, gve
- handful of build fixes for esoteric drivers after netdev->dev_addr
was made const
Previous releases - regressions:
- revert "ipv6: Honor all IPv6 PIO Valid Lifetime values", it broke
Linux compatibility with USGv6 tests
- procfs: show net device bound packet types
- ipv4: fix ip option filtering for locally generated fragments
- phy: broadcom: hook up soft_reset for BCM54616S
Previous releases - always broken:
- ipv4: raw: lock the socket in raw_bind()
- ipv4: decrease the use of shared IPID generator to decrease the
chance of attackers guessing the values
- procfs: fix cross-netns information leakage in /proc/net/ptype
- ethtool: fix link extended state for big endian
- bridge: vlan: fix single net device option dumping
- ping: fix the sk_bound_dev_if match in ping_lookup"
* tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
net: bridge: vlan: fix memory leak in __allowed_ingress
net: socket: rename SKB_DROP_REASON_SOCKET_FILTER
ipv4: remove sparse error in ip_neigh_gw4()
ipv4: avoid using shared IP generator for connected sockets
ipv4: tcp: send zero IPID in SYNACK messages
ipv4: raw: lock the socket in raw_bind()
MAINTAINERS: add missing IPv4/IPv6 header paths
MAINTAINERS: add more files to eth PHY
net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
net: bridge: vlan: fix single net device option dumping
net: stmmac: skip only stmmac_ptp_register when resume from suspend
net: stmmac: configure PTP clock source prior to PTP initialization
Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
connector/cn_proc: Use task_is_in_init_pid_ns()
pid: Introduce helper task_is_in_init_pid_ns()
gve: Fix GFP flags when allocing pages
net: lan966x: Fix sleep in atomic context when updating MAC table
net: lan966x: Fix sleep in atomic context when injecting frames
ethernet: seeq/ether3: don't write directly to netdev->dev_addr
ethernet: 8390/etherh: don't write directly to netdev->dev_addr
...
|
|
Even though there is a static key protecting from overhead from
cgroup-bpf skb filtering when there is nothing attached, in many cases
it's not enough as registering a filter for one type will ruin the fast
path for all others. It's observed in production servers I've looked
at but also in laptops, where registration is done during init by
systemd or something else.
Add a per-socket fast path check guarding from such overhead. This
affects both receive and transmit paths of TCP, UDP and other
protocols. It showed ~1% tx/s improvement in small payload UDP
send benchmarks using a real NIC and in a server environment and the
number jumps to 2-3% for preemtible kernels.
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/d8c58857113185a764927a46f4b5a058d36d3ec3.1643292455.git.asml.silence@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Rename SKB_DROP_REASON_SOCKET_FILTER, which is used
as the reason of skb drop out of socket filter before
it's part of a released kernel. It will be used for
more protocols than just TCP in future series.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/all/20220127091308.91401-2-imagedong@tencent.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Not used since v5.10.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Not used since added in v3.8.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The stubs under !CONFIG_IPV6 were missed when real functions
got deleted ca. v3.13.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No callers since v5.7, the initial use case seems pretty
esoteric so removing this should not harm the completeness
of the API.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No callers since v3.15.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The only caller of mii_lpa_to_linkmode_lpa_sgmii()
disappeared in v5.10.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When CONFIG_CGROUPS is disabled psi code generates the following warnings:
kernel/sched/psi.c:1112:21: warning: no previous prototype for 'psi_trigger_create' [-Wmissing-prototypes]
1112 | struct psi_trigger *psi_trigger_create(struct psi_group *group,
| ^~~~~~~~~~~~~~~~~~
kernel/sched/psi.c:1182:6: warning: no previous prototype for 'psi_trigger_destroy' [-Wmissing-prototypes]
1182 | void psi_trigger_destroy(struct psi_trigger *t)
| ^~~~~~~~~~~~~~~~~~~
kernel/sched/psi.c:1249:10: warning: no previous prototype for 'psi_trigger_poll' [-Wmissing-prototypes]
1249 | __poll_t psi_trigger_poll(void **trigger_ptr,
| ^~~~~~~~~~~~~~~~
Change declarations of these functions in the header to provide the
prototypes even when they are unused.
Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220119223940.787748-2-surenb@google.com
|
|
Hot-unplug all firmware-framebuffer devices as part of removing
them via remove_conflicting_framebuffers() et al. Releases all
memory regions to be acquired by native drivers.
Firmware, such as EFI, install a framebuffer while posting the
computer. After removing the firmware-framebuffer device from fbdev,
a native driver takes over the hardware and the firmware framebuffer
becomes invalid.
Firmware-framebuffer drivers, specifically simplefb, don't release
their device from Linux' device hierarchy. It still owns the firmware
framebuffer and blocks the native drivers from loading. This has been
observed in the vmwgfx driver. [1]
Initiating a device removal (i.e., hot unplug) as part of
remove_conflicting_framebuffers() removes the underlying device and
returns the memory range to the system.
[1] https://lore.kernel.org/dri-devel/20220117180359.18114-1-zack@kde.org/
v2:
* rename variable 'dev' to 'device' (Javier)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reported-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
CC: stable@vger.kernel.org # v5.11+
Link: https://patchwork.freedesktop.org/patch/msgid/20220125091222.21457-2-tzimmermann@suse.de
|
|
Remove two dead stubs, sk_msg_clear_meta() was never
used, use of xskq_cons_is_full() got replaced by
xsk_tx_writeable() in v5.10.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220126185412.2776254-1-kuba@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently the kernel uses open code in multiple places to check if a
task is in the root PID namespace with the kind of format:
if (task_active_pid_ns(current) == &init_pid_ns)
do_something();
This patch creates a new helper function, task_is_in_init_pid_ns(), it
returns true if a passed task is in the root PID namespace, otherwise
returns false. So it will be used to replace open codes.
Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Set more registers in force_power_down callback to avoid S3 wakeup from
hotplugging cards.
This is originally written by Ricky WU.
Link: https://lore.kernel.org/lkml/c4525b4738f94483b9b8f8571fc80646@realtek.com/
Cc: Ricky WU <ricky_wu@realtek.com>
Tested-by: Ricky WU <ricky_wu@realtek.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Link: https://lore.kernel.org/r/20220125055010.1866563-4-kai.heng.feng@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit 5b4258f6721f ("misc: rtsx: rts5249 support runtime PM")
uses "rtd3_work" and "idle_work" to manage it's own runtime PM state
machine.
When its child device, rtsx_pci_sdmmc, uses runtime PM refcount
correctly, all the additional works can be managed by generic runtime PM
helpers.
So consolidate "idle_work" and "rtd3_work" into generic runtime idle
callback and runtime suspend callback, respectively.
Fixes: 5b4258f6721f ("misc: rtsx: rts5249 support runtime PM")
Cc: Ricky WU <ricky_wu@realtek.com>
Tested-by: Ricky WU <ricky_wu@realtek.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Link: https://lore.kernel.org/r/20220125055010.1866563-2-kai.heng.feng@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Trond Myklebust reported soft lockups in XFS IO completion such as
this:
watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106]
CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1
Workqueue: xfs-conv/md127 xfs_end_io [xfs]
RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
Call Trace:
wake_up_page_bit+0x8a/0x110
iomap_finish_ioend+0xd7/0x1c0
iomap_finish_ioends+0x7f/0xb0
xfs_end_ioend+0x6b/0x100 [xfs]
xfs_end_io+0xb9/0xe0 [xfs]
process_one_work+0x1a7/0x360
worker_thread+0x1fa/0x390
kthread+0x116/0x130
ret_from_fork+0x35/0x40
Ioends are processed as an atomic completion unit when all the
chained bios in the ioend have completed their IO. Logically
contiguous ioends can also be merged and completed as a single,
larger unit. Both of these things can be problematic as both the
bio chains per ioend and the size of the merged ioends processed as
a single completion are both unbound.
If we have a large sequential dirty region in the page cache,
write_cache_pages() will keep feeding us sequential pages and we
will keep mapping them into ioends and bios until we get a dirty
page at a non-sequential file offset. These large sequential runs
can will result in bio and ioend chaining to optimise the io
patterns. The pages iunder writeback are pinned within these chains
until the submission chaining is broken, allowing the entire chain
to be completed. This can result in huge chains being processed
in IO completion context.
We get deep bio chaining if we have large contiguous physical
extents. We will keep adding pages to the current bio until it is
full, then we'll chain a new bio to keep adding pages for writeback.
Hence we can build bio chains that map millions of pages and tens of
gigabytes of RAM if the page cache contains big enough contiguous
dirty file regions. This long bio chain pins those pages until the
final bio in the chain completes and the ioend can iterate all the
chained bios and complete them.
OTOH, if we have a physically fragmented file, we end up submitting
one ioend per physical fragment that each have a small bio or bio
chain attached to them. We do not chain these at IO submission time,
but instead we chain them at completion time based on file
offset via iomap_ioend_try_merge(). Hence we can end up with unbound
ioend chains being built via completion merging.
XFS can then do COW remapping or unwritten extent conversion on that
merged chain, which involves walking an extent fragment at a time
and running a transaction to modify the physical extent information.
IOWs, we merge all the discontiguous ioends together into a
contiguous file range, only to then process them individually as
discontiguous extents.
This extent manipulation is computationally expensive and can run in
a tight loop, so merging logically contiguous but physically
discontigous ioends gains us nothing except for hiding the fact the
fact we broke the ioends up into individual physical extents at
submission and then need to loop over those individual physical
extents at completion.
Hence we need to have mechanisms to limit ioend sizes and
to break up completion processing of large merged ioend chains:
1. bio chains per ioend need to be bound in length. Pure overwrites
go straight to iomap_finish_ioend() in softirq context with the
exact bio chain attached to the ioend by submission. Hence the only
way to prevent long holdoffs here is to bound ioend submission
sizes because we can't reschedule in softirq context.
2. iomap_finish_ioends() has to handle unbound merged ioend chains
correctly. This relies on any one call to iomap_finish_ioend() being
bound in runtime so that cond_resched() can be issued regularly as
the long ioend chain is processed. i.e. this relies on mechanism #1
to limit individual ioend sizes to work correctly.
3. filesystems have to loop over the merged ioends to process
physical extent manipulations. This means they can loop internally,
and so we break merging at physical extent boundaries so the
filesystem can easily insert reschedule points between individual
extent manipulations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-and-tested-by: Trond Myklebust <trondmy@hammerspace.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Define a kfifo inside struct tty_port. We use DECLARE_KFIFO_PTR and let
the preexisting tty_port::xmit_buf be also the buffer for the kfifo.
And handle the initialization/decomissioning along with xmit_buf, i.e.
in tty_port_alloc_xmit_buf() and tty_port_free_xmit_buf(), respectively.
This allows for kfifo use in drivers which opt-in, while others still
may use the old xmit_buf. mxser will be the first user in the next
few patches.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20220124071430.14907-4-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
stmmac explicitly calls the xpcs driver to validate the ethtool
linkmodes. This is no longer necessary as phylink now supports
validation through a PCS method. Convert both drivers to use this
new mechanism.
Tested-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> # Intel EHL
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a function to the xpcs driver to retrieve the supported PHY
interface modes, which can be used by drivers to fill in phylink's
supported_interfaces mask.
We validate the interface bit index to ensure that it fits within the
bitmap as xpcs lists PHY_INTERFACE_MODE_MAX in an entry.
Tested-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> # Intel EHL Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Former are always inlined so do that for the latter too, for
consistency.
Size impact is a whopping 5 bytes increase! :-)
text data bss dec hex filename
22350551 8213184 1917164 32480899 1ef9e83 vmlinux.x86-64.defconfig.before
22350556 8213152 1917164 32480872 1ef9e68 vmlinux.x86-64.defconfig.after
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20220113155357.4706-3-bp@alien8.de
|