Age | Commit message (Collapse) | Author |
|
If the SKB is cloned, or has an elevated users count, someone else
can be looking at it at the same time.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
and 'regmap/topic/seq' into regmap-next
|
|
'regmap/topic/irq-type' into regmap-next
|
|
|
|
|
|
Rename method spi_test_fill_tx to spi_test_fill_pattern
to better describe what it does.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Currently the rx_buf does not get set with the
SPI_TEST_PATTERN_UNWRITTEN when tx_buf == NULL in the transfer.
Reorder code so that it gets done also under this specific condition.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Unlike the registers file we don't have any substantial performance
concerns rendering the entire file (it involves no device accesses) so
just use seq_printf() to simplify the code.
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Some of devices supports the trigger level for interrupt
like rising/falling edge specially for GPIOs. The interrupt
support of such devices may have uses the generic regmap irq
framework for implementation.
Add support to configure the trigger type device interrupt
register via regmap-irq framework. The regmap-irq framework
configures the trigger register only if the details of trigger
type registers are provided.
[Fixed use of terery operator for legibility -- broonie]
Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Xin Long says:
====================
sctp: use transport hashtable to replace association's with rhashtable
for telecom center, the usual case is that a server is connected by thousands
of clients. but if the server with only one enpoint(udp style) use the same
sport and dport to communicate with every clients, and every assoc in server
will be hashed in the same chain of global assoc hashtable due to currently we
choose dport and sport as the hash key.
when a packet is received, sctp_rcv try to find the assoc with sport and dport,
since that chain is too long to find it fast, it make the performance turn to
very low, some test data is as follow:
in server:
$./ss [start a udp style server there]
in client:
$./cc [start 2500 sockets to connect server with same port and different ip,
and use one of them to send data to server]
===== test on net-next
-- perf top
server:
55.73% [kernel] [k] sctp_assoc_is_match
6.80% [kernel] [k] sctp_assoc_lookup_paddr
4.81% [kernel] [k] sctp_v4_cmp_addr
3.12% [kernel] [k] _raw_spin_unlock_irqrestore
1.94% [kernel] [k] sctp_cmp_addr_exact
client:
46.01% [kernel] [k] sctp_endpoint_lookup_assoc
5.55% libc-2.17.so [.] __libc_calloc
5.39% libc-2.17.so [.] _int_free
3.92% libc-2.17.so [.] _int_malloc
3.23% [kernel] [k] __memset
-- spent time
time is 487s, send pkt is 10000000
we need to change the way to calculate the hash key, to use lport +
rport + paddr as the hash key can avoid this issue.
besides, this patchset will use transport hashtable to replace
association hashtable to lookup with rhashtable api. get transport
first then get association by t->asoc. and also it will make tcp
style work better.
===== test with this patchset:
-- perf top
server:
15.98% [kernel] [k] _raw_spin_unlock_irqrestore
9.92% [kernel] [k] __pv_queued_spin_lock_slowpath
7.22% [kernel] [k] copy_user_generic_string
2.38% libpthread-2.17.so [.] __recvmsg_nocancel
1.88% [kernel] [k] sctp_recvmsg
client:
11.90% [kernel] [k] sctp_hash_cmp
8.52% [kernel] [k] rht_deferred_worker
4.94% [kernel] [k] __pv_queued_spin_lock_slowpath
3.95% [kernel] [k] sctp_bind_addr_match
2.49% [kernel] [k] __memset
-- spent time
time is 22s, send pkt is 10000000
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sctp_endpoint_lookup_assoc is called in the protection of sock lock
there is no need to call local_bh_disable in this function. so remove
them.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
transport hashtable will replace the association hashtable,
so association hashtable is not used in sctp any more, so
drop the codes about that.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Traversal the transport rhashtable, get the association only once through
the condition assoc->peer.primary_path != transport.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
and __sctp_lookup_association, it's invoked in the protection of sock
lock, it will be safe, but sctp_lookup_association need to call
rcu_read_lock() and to detect the t->dead to protect it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tranport hashtbale will replace the association hashtable to do the
lookup for transport, and then get association by t->assoc, rhashtable
apis will be used because of it's resizable, scalable and using rcu.
lport + rport + paddr will be the base hashkey to locate the chain,
with net to protect one netns from another, then plus the laddr to
compare to get the target.
this patch will provider the lookup functions:
- sctp_epaddr_lookup_transport
- sctp_addrs_lookup_transport
hash/unhash functions:
- sctp_hash_transport
- sctp_unhash_transport
init/destroy functions:
- sctp_transport_hashtable_init
- sctp_transport_hashtable_destroy
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Removed the unused quirks. These quirks don't used anywhere.
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Use to_pci_dev() instead of open-coding it.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Use to_platform_device() instead of open-coding it.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The misc control register is 32bit wide, the used readw/writew
accessors only mainipulate the low 16bit of this register. It
currently doesn't matter as all the bit changed are located in
the lower half, but together with the u32 variable used to hold
the contents of the register it is seriously confusing.
Switch to 32bit accessors to avoid any future breakage.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Keep the quirk bits, as Tegra30 and Tegra114 host have different levels
of support for UHS-I modes and so need different spare bits to be set,
but change the logic to be positive.
Tegra210 needs a different tuning sequence than Tegra30+. Disable
UHS modes until support for this is properly added.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The fsl-espi hardware can trasfer at most 64K data so report teh
limitation.
Based on patch by Heiner Kallweit <hkallweit1@gmail.com>
CC: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Michal Suchanek <hramrach@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
On some SPI controllers it is not feasible to transfer arbitrary amount
of data at once.
When the limit on transfer size is a few kilobytes at least it makes
sense to use the SPI hardware rather than reverting to gpio driver.
The protocol drivers need a way to check that they do not sent overly
long messages, though.
Signed-off-by: Michal Suchanek <hramrach@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
This patch implements the mgmt Start Limited Discovery command. Most
of existing Start Discovery code is reused since the only difference
is the presence of a 'limited' flag as part of the discovery state.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
To make the EIR parsing helper more general purpose, make it return
the found data and its length rather than just saying whether the data
was present or not.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Initialising the suppport for EFI runtime services requires us to
allocate a pgd off the back of an early_initcall. On systems where the
PGD_SIZE is smaller than PAGE_SIZE (e.g. 64k pages and 48-bit VA), the
pgd_cache isn't initialised at this stage, and we panic with a NULL
dereference during boot:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
__create_mapping.isra.5+0x84/0x350
create_pgd_mapping+0x20/0x28
efi_create_mapping+0x5c/0x6c
arm_enable_runtime_services+0x154/0x1e4
do_one_initcall+0x8c/0x190
kernel_init_freeable+0x84/0x1ec
kernel_init+0x10/0xe0
ret_from_fork+0x10/0x50
This patch fixes the problem by initialising the pgd_cache earlier, in
the pgtable_cache_init callback, which sounds suspiciously like what it
was intended for.
Reported-by: Dennis Chen <dennis.chen@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Signed-off-by: Yuan Sun <sunyuan3@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Property names do not match real names needed by driver itself.
This patch fix this problem.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Fix bogus indentation of the PSCI compatible values, reformat.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Currently, drivers/bus/uniphier-system-bus.c is kept from being a
module due to the unresolved reference to of_default_bus_match_table.
Refer to commit 326ea45aa827 ("bus: uniphier: allow only built-in
driver").
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Add a single resource to the test bus device to exercise the platform
bus code a little more. This isn't strictly a devicetree test, but it is
a corner case that the devicetree runs into. Until we've got platform
device unittests, it can live here. It doesn't need to be an explicit
text because the kernel will oops when it is wrong.
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
[wsa: added the comment provided by Grant, rebased, and tested]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
This allows the build system to know that it can't attempt to
configure the Lustre virtual block device, for example, when tilepro
is using 64KB pages (as it does by default). The tilegx build
already provided those symbols.
Previously we required that the tilepro hypervisor be rebuilt with
a different hardcoded page size in its headers, and then Linux be
rebuilt using the updated hypervisor header. Now we allow each of
the hypervisor and Linux to be built independently. We still check
at boot time to ensure that the page size provided by the hypervisor
matches what Linux expects.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: stable@vger.kernel.org [3.19+]
|
|
The latest vendor SDK contained this patch.
Signed-off-by: John Crispin <blogic@openwrt.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
This is an attempt to make documentation more user friendly.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Doug Smythies <dsmythies@telus.net>
Reviewed-by: Chen, Yu C <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
source is decleared as a 4 byte char array in struct acpi_pci_routing_table
so !prt->source is a redundant null string pointer check. Detected with
smatch:
drivers/acpi/pci_irq.c:134 do_prt_fixups() warn: this array is probably
non-NULL. 'prt->source'
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
acpi_video_handles_brightness_key_presses() may use an uninitialized mutex.
The error has been reported by lockdep: DEBUG_LOCKS_WARN_ON(l->magic != l).
The function assumes that the video driver has been registered before being
called. As explained in the comment of acpi_video_init(), the registration
of the video class may be defered and thus may not take place in the init
function of the module.
Use completion mechanisms to make sure that
acpi_video_handles_brightness_key_presses() wait for the completion of
acpi_video_register() before using the mutex.
Also get rid of register_count since task completion can replace it.
Signed-off-by: Adrien Schildknecht <adrien+dev@schischi.me>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This provide the fix for firmware memory by freeing the pointer in driver
remove where it is safe to do so
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
This reverts commit 87b5ed8ecb9fe05a696e1c0b53c7a49ea66432c1 ("ASoC: Intel:
Skylake: fix memory leak") as it causes regression on Skylake devices
The SKL drivers can be deferred probe. The topology file based widgets can
have references to topology file so this can't be freed until card is fully
created, so revert this patch for now
[ 66.682767] BUG: unable to handle kernel paging request at ffffc900001363fc
[ 66.690735] IP: [<ffffffff806c94dd>] strnlen+0xd/0x40
[ 66.696509] PGD 16e035067 PUD 16e036067 PMD 16e038067 PTE 0
[ 66.702925] Oops: 0000 [#1] PREEMPT SMP
[ 66.768390] CPU: 3 PID: 57 Comm: kworker/u16:3 Tainted: G O 4.4.0-rc7-skl #62
[ 66.778869] Hardware name: Intel Corporation Skylake Client platform
[ 66.793201] Workqueue: deferwq deferred_probe_work_func
[ 66.799173] task: ffff88008b700f40 ti: ffff88008b704000 task.ti: ffff88008b704000
[ 66.807692] RIP: 0010:[<ffffffff806c94dd>] [<ffffffff806c94dd>] strnlen+0xd/0x40
[ 66.816243] RSP: 0018:ffff88008b707878 EFLAGS: 00010286
[ 66.822293] RAX: ffffffff80e60a82 RBX: 000000000000000e RCX: fffffffffffffffe
[ 66.830406] RDX: ffffc900001363fc RSI: ffffffffffffffff RDI: ffffc900001363fc
[ 66.838520] RBP: ffff88008b707878 R08: 000000000000ffff R09: 000000000000ffff
[ 66.846649] R10: 0000000000000001 R11: ffffffffa01c6368 R12: ffffc900001363fc
[ 66.854765] R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000000
[ 66.862910] FS: 0000000000000000(0000) GS:ffff88016ecc0000(0000) knlGS:0000000000000000
[ 66.872150] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 66.878696] CR2: ffffc900001363fc CR3: 0000000002c09000 CR4: 00000000003406e0
[ 66.886820] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 66.894938] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 66.903052] Stack:
[ 66.905346] ffff88008b7078b0 ffffffff806cb1db 000000000000000e 0000000000000000
[ 66.913854] ffff88008b707928 ffffffffa00d1050 ffffffffa00d104e ffff88008b707918
[ 66.922353] ffffffff806ccbd6 ffff88008b707948 0000000000000046 ffff88008b707940
[ 66.930855] Call Trace:
[ 66.933646] [<ffffffff806cb1db>] string.isra.4+0x3b/0xd0
[ 66.939793] [<ffffffff806ccbd6>] vsnprintf+0x116/0x540
[ 66.945742] [<ffffffff806d02f0>] kvasprintf+0x40/0x80
[ 66.951591] [<ffffffff806d0370>] kasprintf+0x40/0x50
[ 66.957359] [<ffffffffa00c085f>] dapm_create_or_share_kcontrol+0x1cf/0x300 [snd_soc_core]
[ 66.966771] [<ffffffff8057dd1e>] ? __kmalloc+0x16e/0x2a0
[ 66.972931] [<ffffffffa00c0dab>] snd_soc_dapm_new_widgets+0x41b/0x4b0 [snd_soc_core]
[ 66.981857] [<ffffffffa00be8c0>] ? snd_soc_dapm_add_routes+0xb0/0xd0 [snd_soc_core]
[ 67.007828] [<ffffffffa00b92ed>] soc_probe_component+0x23d/0x360 [snd_soc_core]
[ 67.016244] [<ffffffff80b14e69>] ? mutex_unlock+0x9/0x10
[ 67.022405] [<ffffffffa00ba02f>] snd_soc_instantiate_card+0x47f/0xd10 [snd_soc_core]
[ 67.031329] [<ffffffff8049eeb2>] ? debug_mutex_init+0x32/0x40
[ 67.037973] [<ffffffffa00baa92>] snd_soc_register_card+0x1d2/0x2b0 [snd_soc_core]
[ 67.046619] [<ffffffffa00c8b54>] devm_snd_soc_register_card+0x44/0x80 [snd_soc_core]
[ 67.055539] [<ffffffffa01c303b>] skylake_audio_probe+0x1b/0x20 [snd_soc_skl_rt286]
[ 67.064292] [<ffffffff808aa887>] platform_drv_probe+0x37/0x90
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Use to_platform_device() instead of open-coding it.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: Moritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Use to_platform_device() instead of open-coding it.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: Moritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Compilers may engage the improbability drive when encountering shifts
by a distance that is a multiple of the size of the operand type. Since
the required bounds check is very simple here, we can get rid of all the
fuzzy masking, shifting and comparing, and use the documented bounds
directly.
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The test whether a movz instruction with a signed immediate should be
turned into a movn instruction (i.e., when the immediate is negative)
is flawed, since the value of imm is always positive. Also, the
subsequent bounds check is incorrect since the limit update never
executes, due to the fact that the imm_type comparison will always be
false for negative signed immediates.
Let's fix this by performing the sign test on sval directly, and
replacing the bounds check with a simple comparison against U16_MAX.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up use of sval, renamed MOVK enum value to MOVKZ]
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
|
|
Using mremap() to shrink the map size of a VM_PFNMAP range causes
the following error message, and leaves the pfn range allocated.
x86/PAT: test:3493 freeing invalid memtype [mem 0x483200000-0x4863fffff]
This is because rbt_memtype_erase(), called from free_memtype()
with spin_lock held, only supports to free a whole memtype node in
memtype_rbroot. Therefore, this patch changes rbt_memtype_erase()
to support a request that shrinks the size of a memtype node for
mremap().
memtype_rb_exact_match() is renamed to memtype_rb_match(), and
is enhanced to support EXACT_MATCH and END_MATCH in @match_type.
Since the memtype_rbroot tree allows overlapping ranges,
rbt_memtype_erase() checks with EXACT_MATCH first, i.e. free
a whole node for the munmap case. If no such entry is found,
it then checks with END_MATCH, i.e. shrink the size of a node
from the end for the mremap case.
On the mremap case, rbt_memtype_erase() proceeds in two steps,
1) remove the node, and then 2) insert the updated node. This
allows proper update of augmented values, subtree_max_end, in
the tree.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: stsp@list.ru
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1450832064-10093-3-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
mremap() with MREMAP_FIXED on a VM_PFNMAP range causes the following
WARN_ON_ONCE() message in untrack_pfn().
WARNING: CPU: 1 PID: 3493 at arch/x86/mm/pat.c:985 untrack_pfn+0xbd/0xd0()
Call Trace:
[<ffffffff817729ea>] dump_stack+0x45/0x57
[<ffffffff8109e4b6>] warn_slowpath_common+0x86/0xc0
[<ffffffff8109e5ea>] warn_slowpath_null+0x1a/0x20
[<ffffffff8106a88d>] untrack_pfn+0xbd/0xd0
[<ffffffff811d2d5e>] unmap_single_vma+0x80e/0x860
[<ffffffff811d3725>] unmap_vmas+0x55/0xb0
[<ffffffff811d916c>] unmap_region+0xac/0x120
[<ffffffff811db86a>] do_munmap+0x28a/0x460
[<ffffffff811dec33>] move_vma+0x1b3/0x2e0
[<ffffffff811df113>] SyS_mremap+0x3b3/0x510
[<ffffffff817793ee>] entry_SYSCALL_64_fastpath+0x12/0x71
MREMAP_FIXED moves a pfnmap from old vma to new vma. untrack_pfn() is
called with the old vma after its pfnmap page table has been removed,
which causes follow_phys() to fail. The new vma has a new pfnmap to
the same pfn & cache type with VM_PAT set. Therefore, we only need to
clear VM_PAT from the old vma in this case.
Add untrack_pfn_moved(), which clears VM_PAT from a given old vma.
move_vma() is changed to call this function with the old vma when
VM_PFNMAP is set. move_vma() then calls do_munmap(), and untrack_pfn()
is a no-op since VM_PAT is cleared.
Reported-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1450832064-10093-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice
system call and AF_UNIX sockets,
http://lists.openwall.net/netdev/2015/11/06/24
The situation was analyzed as
(a while ago) A: socketpair()
B: splice() from a pipe to /mnt/regular_file
does sb_start_write() on /mnt
C: try to freeze /mnt
wait for B to finish with /mnt
A: bind() try to bind our socket to /mnt/new_socket_name
lock our socket, see it not bound yet
decide that it needs to create something in /mnt
try to do sb_start_write() on /mnt, block (it's
waiting for C).
D: splice() from the same pipe to our socket
lock the pipe, see that socket is connected
try to lock the socket, block waiting for A
B: get around to actually feeding a chunk from
pipe to file, try to lock the pipe. Deadlock.
on 2015/11/10 by Al Viro,
http://lists.openwall.net/netdev/2015/11/10/4
The patch fixes this by removing the kern_path_create related code from
unix_mknod and executing it as part of unix_bind prior acquiring the
readlock of the socket in question. This means that A (as used above)
will sb_start_write on /mnt before it acquires the readlock, hence, it
won't indirectly block B which first did a sb_start_write and then
waited for a thread trying to acquire the readlock. Consequently, A
being blocked by C waiting for B won't cause a deadlock anymore
(effectively, both A and B acquire two locks in opposite order in the
situation described above).
Dmitry Vyukov(<dvyukov@google.com>) tested the original patch.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commands run in a vrf context are not failing as expected on a route lookup:
root@kenny:~# ip ro ls table vrf-red
unreachable default
root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
ping: Warning: source address might be selected on device other than vrf-red.
PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data.
--- 10.100.1.254 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
Since the vrf table does not have a route for 10.100.1.254 the ping
should have failed. The saddr lookup causes a full VRF table lookup.
Propogating a lookup failure to the user allows the command to fail as
expected:
root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
connect: No route to host
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Craig Gallek says:
====================
Faster SO_REUSEPORT
This series contains two optimizations for the SO_REUSEPORT feature:
Faster lookup when selecting a socket for an incoming packet and
the ability to select the socket from the group using a BPF program.
This series only includes the UDP path. I plan to submit a follow-up
including the TCP path if the implementation in this series is
acceptable.
Changes in v4:
- pskb_may_pull is unnecessary with pskb_pull (per Alexei Starovoitov)
Changes in v3:
- skb_pull_inline -> pskb_pull (per Alexei Starovoitov)
- reuseport_attach* -> sk_reuseport_attach* and simple return statement
syntax change (per Daniel Borkmann)
Changes in v2:
- Fix ARM build; remove unnecessary include.
- Handle case where protocol header is not in linear section (per
Alexei Starovoitov).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This program will build classic and extended BPF programs and
validate the socket selection logic when used with
SO_ATTACH_REUSEPORT_CBPF and SO_ATTACH_REUSEPORT_EBPF.
It also validates the re-programing flow and several edge cases.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group. These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.
This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|