Age | Commit message (Collapse) | Author |
|
Pull networking fixes from David Miller:
1) Handle v4/v6 mixed sockets properly in soreuseport, from Craig
Gallak.
2) Bug fixes for the new macsec facility (missing kmalloc NULL checks,
missing locking around netdev list traversal, etc.) from Sabrina
Dubroca.
3) Fix handling of host routes on ifdown in ipv6, from David Ahern.
4) Fix double-fdput in bpf verifier. From Jann Horn.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
bpf: fix double-fdput in replace_map_fd_with_map_ptr()
net: ipv6: Delete host routes on an ifdown
Revert "ipv6: Revert optional address flusing on ifdown."
net/mlx4_en: fix spurious timestamping callbacks
net: dummy: remove note about being Y by default
cxgbi: fix uninitialized flowi6
ipv6: Revert optional address flusing on ifdown.
ipv4/fib: don't warn when primary address is missing if in_dev is dead
net/mlx5: Add pci shutdown callback
net/mlx5_core: Remove static from local variable
net/mlx5e: Use vport MTU rather than physical port MTU
net/mlx5e: Fix minimum MTU
net/mlx5e: Device's mtu field is u16 and not int
net/mlx5_core: Add ConnectX-5 to list of supported devices
net/mlx5e: Fix MLX5E_100BASE_T define
net/mlx5_core: Fix soft lockup in steering error flow
qlcnic: Update version to 5.3.64
net: stmmac: socfpga: Remove re-registration of reset controller
macsec: fix netlink attribute validation
macsec: add missing macsec prefix in uapi
...
|
|
This is more prep-work for the upcoming pty changes. Still just code
cleanup with no actual semantic changes.
This removes a bunch pointless complexity by just having the slave pty
side remember the dentry associated with the devpts slave rather than
the inode. That allows us to remove all the "look up the dentry" code
for when we want to remove it again.
Together with moving the tty pointer from "inode->i_private" to
"dentry->d_fsdata" and getting rid of pointless inode locking, this
removes about 30 lines of code. Not only is the end result smaller,
it's simpler and easier to understand.
The old code, for example, depended on the d_find_alias() to not just
find the dentry, but also to check that it is still hashed, which in
turn validated the tty pointer in the inode.
That is a _very_ roundabout way to say "invalidate the cached tty
pointer when the dentry is removed".
The new code just does
dentry->d_fsdata = NULL;
in devpts_pty_kill() instead, invalidating the tty pointer rather more
directly and obviously. Don't do something complex and subtle when the
obvious straightforward approach will do.
The rest of the patch (ie apart from code deletion and the above tty
pointer clearing) is just switching the calling convention to pass the
dentry or file pointer around instead of the inode.
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jann Horn <jann@thejh.net>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-testing
Jonathan writes:
2nd set of new device support, features and cleanup for IIO in the 4.7 cycle.
Bit of a bumper set for new drivers but plenty of other stuff here as well!
New device support
* ad5592R ADC/DAC
- new driver supporting ad5592r and ad5593r combined ADC/DAC and gpio chips.
* Aosong am2315 relative humidity
- new driver with triggered buffer support in follow up patch.
* bmi160 imu
- new driver
* bmp280
- bmp180 support - note there is support in the misc/bmp085 driver. Intent
is to remove that driver long term.
* invensense mpu6050
- cleanup leading to explicit support of mpu9150 with a good few cleanups
along the way.
* Hope RF hp03 pressure and temperature sensor.
- new driver
* maxim DS1803 potentiometer
- new driver
* maxim max44000 light and proximity sensor
- new driver built in a series of steps to support pretty much everything.
* ROHM BH1780 light sensor
- new driver. There is an existing driver in misc that this is pretty much
intended to replace. The discussion on whether to support the non standard
interface of that driver is some way is continuing.
* st-gyro
- lsm9ds0-gyro. The accel/magn side of this will take a while longer as
extensions to the st library are needed for cases where two types of sensor
share a single i2c address.
* ti-adc081c
- support the adc101c and adc121c
* Vishay VEML6070 UV sensor
- new driver.
New features
* core
- devm_ APIs for channel_get and channel_get_all. The first user of these
is the generic ADC based thermal driver. As it is going through the
thermal tree these will be picked up as a patch to that next cycle as that
is how the author preferred to do it.
- mounting matrix support. This new core support allows devices to provide
to userspace (typically from the device tree) allowing compensation for how
the sensor is mounted on the device. First examples are on UAVs but it
has a more mundane use on typical phone where the chip may be on the front
or the back of the circuit board and soldered at any angle. Includes
support for this ABI in ak8975 (which has an older interface, now
deprecated) and mpu6050.
* tools
- add a -a option to enable all available channels in generic_buffer sample.
Makes it somewhat easier to use.
* adis library and drivers
- support manual self test flag clearing. This has technically been broken
for a very long time - result is an offset on readings as the applied field
is on all the time.
* ak8975
- triggered buffer support
* bmc150
- spi support (including splitting the driver into core and i2c parts)
* bmp280
- oversampling support.
* dht11
- improved logging - useful to debug timing issues on this quirky device.
* st-sensors
- read each channel invidivually as not all support the optimization of
reading in bulk. This is technically a fix, but will need to be backported
if desired.
- support open drain and shared interrupts.
* ti-adc081c
- triggered buffer support.
Cleanups
* inkern
- white space fix.
* ad7606
- use the iio_device_claim_direct_mode call rather than open coding equiv.
* ad799x
- white space fix.
* ad9523
- unsigned -> unsigned int
* apds9660
- brace location tidying up.
- silence an uninitialized variable warning.
* ak8975
- else and brace on same line fix.
* at91_adc
- white space fixes.
* bmc150
- use regmap stored copy of the device pointer rather than having an
additional copy.
* bmg160
- use regmap stored copy of the device pointer rather than having an
additional copy.
* hid-sensors
- white space fixes.
* mcp3422
- white space fix.
* mma7455
- use regmap to retrieve the device struct rather than carrying another copy
in the private data.
* ms_sensors
- white space fix.
* mxs-lradc
- move current bindings out of staging - some will be shortly deprecated but
the reality is that we have device trees out there using them so they will
need to be supported for some time. They accidentally got left behind
when the driver graduated from staging.
- white space cleanup.
- set INPUT_PROP_DIRECT.
- move ts config into a better function.
- move the STMP reset out of the ADC init.
* vf610_adc
- case label indenting fix.
|
|
SS+ also expresses intervals in units of 125ms. Testing must
be for SS or faster, not SS exactly.
Signed-off-by: Oliver neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
LTM is also defined for SS+. The correct test is to check for anything
slower than SS not exactly SS.
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The tty field was missing from AUDIT_LOGIN events.
Refactor code to create a new function audit_get_tty(), using it to
replace the call in audit_log_task_info() and to add it to
audit_log_set_loginuid(). Lock and bump the kref to protect it, adding
audit_put_tty() alias to decrement it.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Now as rx-vlan offload can be disabled, packets can be received
with vlan tag not stripped, which means is_first_ethertype_ip will
return false, for that we need to check if the hardware reported
csum OK so we will report CHECKSUM_UNNECESSARY for those packets.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use ethtool -K <interface> rxvlan <on/off> to enable/disable
C-TAG vlan stripping by hardware.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add query MCIA, PMLP registers infrastructure and commands.
Add ethtool support for get_module_info() and get_module_eeprom()
callbacks.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the needed hardware command and mlx5_ifc structs for managing LED
control.
Add set_phys_id ethtool callback to support ethtool -p flag.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce new access register named Ports Check Mask Register (PCMR) to
control all HW checks on port. With this register, the driver can
enable/disable Hardware FCS validation.
When RXALL is enabled/disabled using ndo_set_features, enable/disable
fcs check at HW.
User can change HW configuration using rx-all flag at ethtool.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Expose link_down_events counter through ethtool -S.
This counter is read from PPort statistics, then proccessed and stored as
a special handling software counter.
This counter is stored along software counters since it is the only PPort
counter that it's size is not 64 bits.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Redesign ethtool statistics handling and reporting in the driver:
1. Move counters to a separate file (en_stats.h).
2. Remove unnecessary dependencies between stats and strings.
3. Use counter descriptors which hold a name and offset for each counter,
and will be used to decide which counters will be exposed.
For example when adding a new software counter to ethtool, instead of:
1. Add to stats struct.
2. Add to strings struct in the same order.
3. Change macro defining number of software counters.
The only thing needed is to link the new counter to a counter descriptor.
VPort counters are a set of hardware traffic counters created automatically
for each virtual port opened.
PPort counters are a set of counters describing per physical port
performance statistics.
These counters are gathered from hardware register and divided to groups
according to different protocols.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No more users in the tree, remove NETDEV_TX_LOCKED support.
Adds another hole in softnet_stats struct, but better than keeping
the unused collision counter around.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2016-04-26
Here's another set of Bluetooth & 802.15.4 patches for the 4.7 kernel:
- Cleanups & refactoring of ieee802154 & 6lowpan code
- Security related additions to ieee802154 and mrf24j40 driver
- Memory corruption fix to Bluetooth 6lowpan code
- Race condition fix in vhci driver
- Enhancements to the atusb 802.15.4 driver
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 841645b5f2dfceac69b78fcd0c9050868d41ea61.
Ok, this puts the feature back. I've decided to apply David A.'s
bug fix and run with that rather than make everyone wait another
whole release for this feature.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/drivers
Merge "Second Round of Renesas ARM Based SoC R-Car SYSC Updates for v4.7" from Simon Horman:
Introduce a DT-based driver for the R-Car System Controller, as found on
Renesas R-Car H1, R-Car Gen2, and R-Car Gen3 SoCs.
* tag 'renesas-rcar-sysc2-for-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: (30 commits)
soc: renesas: rcar-sysc: Add support for R-Car H3 power areas
soc: renesas: rcar-sysc: Add support for R-Car E2 power areas
soc: renesas: rcar-sysc: Add support for R-Car M2-N power areas
soc: renesas: rcar-sysc: Add support for R-Car M2-W power areas
soc: renesas: rcar-sysc: Add support for R-Car H2 power areas
soc: renesas: rcar-sysc: Add support for R-Car H1 power areas
soc: renesas: rcar-sysc: Enable Clock Domain for I/O devices
soc: renesas: rcar-sysc: Make rcar_sysc_power_is_off() static
soc: renesas: rcar-sysc: Add DT support for SYSC PM domains
soc: renesas: rcar-sysc: Improve rcar_sysc_power() debug info
soc: renesas: Move pm-rcar to drivers/soc/renesas/rcar-sysc
clk: renesas: cpg-mssr: Export cpg_mssr_{at,de}tach_dev()
clk: renesas: mstp: Provide dummy attach/detach_dev callbacks
clk: renesas: Provide Kconfig symbols for CPG/MSSR and CPG/MSTP support
soc: renesas: Add r8a7795 SYSC PM Domain Binding Definitions
soc: renesas: Add r8a7794 SYSC PM Domain Binding Definitions
soc: renesas: Add r8a7793 SYSC PM Domain Binding Definitions
soc: renesas: Add r8a7791 SYSC PM Domain Binding Definitions
soc: renesas: Add r8a7790 SYSC PM Domain Binding Definitions
soc: renesas: Add r8a7779 SYSC PM Domain Binding Definitions
...
|
|
The omap_mbox_save_ctx() and omap_mbox_restore_ctx() API were
previously provided to OMAP mailbox clients to save and restore
the mailbox context during system suspend/resume. The save and
restore functionality is now implemented through System PM driver
callbacks, and there is no need for these functions, so kill these
API.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
OMAP mailbox devices can no longer be created in legacy non-DT
mode, all the relevant code has been cleaned up. The OMAP mailbox
driver will only support devices created from DT going forward,
so drop the legacy platform device support from the driver.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
bio might have NOMERGE flag set, for example blk_queue_split sets it.
When we initiate request, copy this flag too.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into next/drivers
Merge "Qualcomm ARM Based SoC Updates for v4.7 part 2" from Andy Gross:
* Change SMD callback parameters
* Use writecombine mapping for SMEM
* tag 'qcom-soc-for-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux:
soc: qcom: smd: Make callback pass channel reference
soc: qcom: smem: Use write-combine remap for SMEM
|
|
psci_power_state_loses_context() and psci_power_state_is_valid are only
used internally now, so make them static.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
A pattern of skb usage seen in modules such as RDS-TCP is to
extract `to_copy' bytes from the received TCP segment, starting
at some offset `off' into a new skb `clone'. This is done in
the ->data_ready callback, where the clone skb is queued up for rx on
the PF_RDS socket, while the parent TCP segment is returned unchanged
back to the TCP engine.
The existing code uses the sequence
clone = skb_clone(..);
pskb_pull(clone, off, ..);
pskb_trim(clone, to_copy, ..);
with the intention of discarding the first `off' bytes. However,
skb_clone() + pskb_pull() implies pksb_expand_head(), which ends
up doing a redundant memcpy of bytes that will then get discarded
in __pskb_pull_tail().
To avoid this inefficiency, this commit adds pskb_extract() that
creates the clone, and memcpy's only the relevant header/frag/frag_list
to the start of `clone'. pskb_trim() is then invoked to trim clone
down to the requested to_copy bytes.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There's some inconsistency in current logic determining whether the
link settings of a given interface can be changed; I.e., in all modes
other than the so-called `deault' mode the interfaces are forbidden from
changing the configuration - but even this rule is not applied to all
user APIs that may change the configuration.
Instead, let the core-module [qed] decide whether an interface can change
the configuration by supporting a new API function. We also revise the
current rule, allowing all interfaces to change their configurations while
laying the infrastructure for future modes where an interface would be
blocked from making such a configuration.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There's a difference in statsitics' names starting at qed and
propagating to qede, where egress counters indicate ranges while ingress
counters indiciate high-end.
Align all statistcs to follow the same conventions - name indicates range.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
cgroup_subsys->post_attach callback
Since e93ad19d0564 ("cpuset: make mm migration asynchronous"), cpuset
kicks off asynchronous NUMA node migration if necessary during task
migration and flushes it from cpuset_post_attach_flush() which is
called at the end of __cgroup_procs_write(). This is to avoid
performing migration with cgroup_threadgroup_rwsem write-locked which
can lead to deadlock through dependency on kworker creation.
memcg has a similar issue with charge moving, so let's convert it to
an official callback rather than the current one-off cpuset specific
function. This patch adds cgroup_subsys->post_attach callback and
makes cpuset register cpuset_post_attach_flush() as its ->post_attach.
The conversion is mostly one-to-one except that the new callback is
called under cgroup_mutex. This is to guarantee that no other
migration operations are started before ->post_attach callbacks are
finished. cgroup_mutex is one of the outermost mutex in the system
and has never been and shouldn't be a problem. We can add specialized
synchronization around __cgroup_procs_write() but I don't think
there's any noticeable benefit.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> # 4.4+ prerequisite for the next patch
|
|
'pci/thunderbolt' and 'pci/virtualization' into next
* pci/enumeration:
x86/PCI: Refine PCI support check in pcibios_init()
* pci/hotplug:
PCI: acpiphp_ibm: Avoid uninitialized variable reference
* pci/misc:
PCI: Fix spelling errors
* pci/ntb:
PCI: Add DMA alias quirk for mic_x200_dma
PCI: Add support for multiple DMA aliases
PCI: Move informational printk to pci_add_dma_alias()
PCI: Add pci_add_dma_alias() to abstract implementation
* pci/thunderbolt:
thunderbolt: Support 1st gen Light Ridge controller
thunderbolt: Fix typos and magic number
PCI: Add Intel Thunderbolt device IDs
* pci/virtualization:
PCI: Work around Intel Sunrise Point PCH incorrect ACS capability
PCI: Reverse standard ACS vs device-specific ACS enabling
PCI: Mark Intel i40e NIC INTx masking as broken
|
|
This reverts the following three commits:
70af921db6f8835f4b11c65731116560adb00c14
799977d9aafbf0ca0b9c39b04cbfb16db71302c9
f1705ec197e705b79ea40fe7a2cc5acfa1d3bfac
The feature was ill conceived, has terrible semantics, and has added
nothing but regressions to the already fragile ipv6 stack.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Starting the kernel client with cephx disabled and then enabling cephx
and restarting userspace daemons can result in a crash:
[262671.478162] BUG: unable to handle kernel paging request at ffffebe000000000
[262671.531460] IP: [<ffffffff811cd04a>] kfree+0x5a/0x130
[262671.584334] PGD 0
[262671.635847] Oops: 0000 [#1] SMP
[262672.055841] CPU: 22 PID: 2961272 Comm: kworker/22:2 Not tainted 4.2.0-34-generic #39~14.04.1-Ubuntu
[262672.162338] Hardware name: Dell Inc. PowerEdge R720/068CDY, BIOS 2.4.3 07/09/2014
[262672.268937] Workqueue: ceph-msgr con_work [libceph]
[262672.322290] task: ffff88081c2d0dc0 ti: ffff880149ae8000 task.ti: ffff880149ae8000
[262672.428330] RIP: 0010:[<ffffffff811cd04a>] [<ffffffff811cd04a>] kfree+0x5a/0x130
[262672.535880] RSP: 0018:ffff880149aeba58 EFLAGS: 00010286
[262672.589486] RAX: 000001e000000000 RBX: 0000000000000012 RCX: ffff8807e7461018
[262672.695980] RDX: 000077ff80000000 RSI: ffff88081af2be04 RDI: 0000000000000012
[262672.803668] RBP: ffff880149aeba78 R08: 0000000000000000 R09: 0000000000000000
[262672.912299] R10: ffffebe000000000 R11: ffff880819a60e78 R12: ffff8800aec8df40
[262673.021769] R13: ffffffffc035f70f R14: ffff8807e5b138e0 R15: ffff880da9785840
[262673.131722] FS: 0000000000000000(0000) GS:ffff88081fac0000(0000) knlGS:0000000000000000
[262673.245377] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[262673.303281] CR2: ffffebe000000000 CR3: 0000000001c0d000 CR4: 00000000001406e0
[262673.417556] Stack:
[262673.472943] ffff880149aeba88 ffff88081af2be04 ffff8800aec8df40 ffff88081af2be04
[262673.583767] ffff880149aeba98 ffffffffc035f70f ffff880149aebac8 ffff8800aec8df00
[262673.694546] ffff880149aebac8 ffffffffc035c89e ffff8807e5b138e0 ffff8805b047f800
[262673.805230] Call Trace:
[262673.859116] [<ffffffffc035f70f>] ceph_x_destroy_authorizer+0x1f/0x50 [libceph]
[262673.968705] [<ffffffffc035c89e>] ceph_auth_destroy_authorizer+0x3e/0x60 [libceph]
[262674.078852] [<ffffffffc0352805>] put_osd+0x45/0x80 [libceph]
[262674.134249] [<ffffffffc035290e>] remove_osd+0xae/0x140 [libceph]
[262674.189124] [<ffffffffc0352aa3>] __reset_osd+0x103/0x150 [libceph]
[262674.243749] [<ffffffffc0354703>] kick_requests+0x223/0x460 [libceph]
[262674.297485] [<ffffffffc03559e2>] ceph_osdc_handle_map+0x282/0x5e0 [libceph]
[262674.350813] [<ffffffffc035022e>] dispatch+0x4e/0x720 [libceph]
[262674.403312] [<ffffffffc034bd91>] try_read+0x3d1/0x1090 [libceph]
[262674.454712] [<ffffffff810ab7c2>] ? dequeue_entity+0x152/0x690
[262674.505096] [<ffffffffc034cb1b>] con_work+0xcb/0x1300 [libceph]
[262674.555104] [<ffffffff8108fb3e>] process_one_work+0x14e/0x3d0
[262674.604072] [<ffffffff810901ea>] worker_thread+0x11a/0x470
[262674.652187] [<ffffffff810900d0>] ? rescuer_thread+0x310/0x310
[262674.699022] [<ffffffff810957a2>] kthread+0xd2/0xf0
[262674.744494] [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0
[262674.789543] [<ffffffff817bd81f>] ret_from_fork+0x3f/0x70
[262674.834094] [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0
What happens is the following:
(1) new MON session is established
(2) old "none" ac is destroyed
(3) new "cephx" ac is constructed
...
(4) old OSD session (w/ "none" authorizer) is put
ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer)
osd->o_auth.authorizer in the "none" case is just a bare pointer into
ac, which contains a single static copy for all services. By the time
we get to (4), "none" ac, freed in (2), is long gone. On top of that,
a new vtable installed in (3) points us at ceph_x_destroy_authorizer(),
so we end up trying to destroy a "none" authorizer with a "cephx"
destructor operating on invalid memory!
To fix this, decouple authorizer destruction from ac and do away with
a single static "none" authorizer by making a copy for each OSD or MDS
session. Authorizers themselves are independent of ac and so there is
no reason for destroy_authorizer() to be an ac op. Make it an op on
the authorizer itself by turning ceph_authorizer into a real struct.
Fixes: http://tracker.ceph.com/issues/15447
Reported-by: Alan Zhang <alan.zhang@linux.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
|
dev_pm_opp_set_sharing_cpus() doesn't do any DT specific stuff and its
declarations are added within the CONFIG_OF ifdef by mistake. Take them
out of that.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The complete common architectural and micro-architectural
event number structure is filtered based on PMCEIDn_EL0 and
exposed to /sys using is_visibile function pointer in events
attribute_group.
To filter the events in is_visible function, pmceid based bitmap
is stored in arm_pmu structure and the id field from
perf_pmu_events_attr is used to check against the bitmap.
The function which derives event bitmap from PMCEIDn_EL0 is
executed in the cpus, which has the pmu being initialized,
for heterogeneous pmu support.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch introduces kexec support for mlx5.
When switching kernels, kexec() calls shutdown, which unloads
the driver and cleans its resources.
In addition, remove unregister netdev from shutdown flow. This will
allow a clean shutdown, even if some netdev clients did not release their
reference from this netdev. Releasing The HW resources only is enough as
the kernel is shutting down
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set and report vport MTU rather than physical MTU,
Driver will set both vport and physical port mtu and will
rely on the query of vport mtu.
SRIOV VFs have to report their MTU to their vport manager (PF),
and this will allow them to work with any MTU they need
without failing the request.
Also for some cases where the PF is not a port owner, PF can
work with MTU less than the physical port mtu if set physical
port mtu didn't take effect.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For set/query MTU port firmware commands the MTU field
is 16 bits, here I changed all the "int mtu" parameters
of the functions wrapping those firmware commands to be u16.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, mostly from Florian Westphal to sort out the lack of sufficient
validation in x_tables and connlabel preparation patches to add
nf_tables support. They are:
1) Ensure we don't go over the ruleset blob boundaries in
mark_source_chains().
2) Validate that target jumps land on an existing xt_entry. This extra
sanitization comes with a performance penalty when loading the ruleset.
3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables.
4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables.
5) Make sure the minimal possible target size in x_tables.
6) Similar to #3, add xt_compat_check_entry_offsets() for compat code.
7) Check that standard target size is valid.
8) More sanitization to ensure that the target_offset field is correct.
9) Add xt_check_entry_match() to validate that matches are well-formed.
10-12) Three patch to reduce the number of parameters in
translate_compat_table() for {arp,ip,ip6}tables by using a container
structure.
13) No need to return value from xt_compat_match_from_user(), so make
it void.
14) Consolidate translate_table() so it can be used by compat code too.
15) Remove obsolete check for compat code, so we keep consistent with
what was already removed in the native layout code (back in 2007).
16) Get rid of target jump validation from mark_source_chains(),
obsoleted by #2.
17) Introduce xt_copy_counters_from_user() to consolidate counter
copying, and use it from {arp,ip,ip6}tables.
18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump
functions.
19) Move nf_connlabel_match() to xt_connlabel.
20) Skip event notification if connlabel did not change.
21) Update of nf_connlabels_get() to make the upcoming nft connlabel
support easier.
23) Remove spinlock to read protocol state field in conntrack.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal fixes from Eduardo Valentin:
"Specifics in this pull request:
- Fixes in mediatek and OF thermal drivers
- Fixes in power_allocator governor
- More fixes of unsigned to int type change in thermal_core.c.
These change have been CI tested using KernelCI bot. \o/"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: fix Mediatek thermal controller build
thermal: consistently use int for trip temp
thermal: fix mtk_thermal build dependency
thermal: minor mtk_thermal.c cleanups
thermal: power_allocator: req_range multiplication should be a 64 bit type
thermal: of: add __init attribute
|
|
nla_data() is now aligned on a 64-bit area.
The temporary function nla_put_be64_32bit() is removed in this patch.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts were two cases of simple overlapping changes,
nothing serious.
In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new rotation matrix sysfs attribute compliant with IIO core
mounting matrix API.
Matrix is retrieved from "in_anglvel_mount_matrix" and
"in_accel_mount_matrix" sysfs attributes. It is declared into mpu6050 DTS
entry as a "mount-matrix" property.
Old interface is kept for backward userspace compatibility and may be
retrieved from legacy platform_data mechanism only.
Signed-off-by: Gregor Boirie <gregor.boirie@parrot.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
|
|
Expose a rotation matrix to indicate userspace the chip orientation with
respect to the overall hardware system.
Matrix is retrieved from "in_mount_matrix". It is declared into ak8975 DTS
entry as a "mount-matrix" property.
Signed-off-by: Gregor Boirie <gregor.boirie@parrot.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
|
|
Expose a rotation matrix to indicate userspace the chip placement with
respect to the overall hardware system. This is needed to adjust
coordinates sampled from a sensor chip when its position deviates from the
main hardware system.
Final coordinates computation is delegated to userspace since:
* computation may involve floating point arithmetics ;
* it allows an application to combine adjustments with arbitrary
transformations.
This 3 dimentional space rotation matrix is expressed as 3x3 array of
strings to support floating point numbers. It may be retrieved from a
"[<dir>_][<type>_]mount_matrix" sysfs attribute file. It is declared into a
device / driver specific DTS property or platform data.
Signed-off-by: Gregor Boirie <gregor.boirie@parrot.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
|
|
Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask
mode. In fact "nohz" or "dynticks" only mean that we exit the periodic
mode and we try to minimize the ticks as much as possible. The nohz
subsystem uses a confusing terminology with the internal state
"ts->tick_stopped" which is also available through its public interface
with tick_nohz_tick_stopped(). This is a misnomer as the tick is instead
reduced with the best effort rather than stopped. In the best case the
tick can indeed be actually stopped but there is no guarantee about that.
If a timer needs to fire one second later, a tick will fire while the
CPU is in nohz mode and this is a very common scenario.
Now this confusion happens to be a problem with CPU load updates:
cpu_load_update_active() doesn't handle nohz ticks correctly because it
assumes that ticks are completely stopped in nohz mode and that
cpu_load_update_active() can't be called in dynticks mode. When that
happens, the whole previous tickless load is ignored and the function
just records the load for the current tick, ignoring potentially long
idle periods behind.
In order to solve this, we could account the current load for the
previous nohz time but there is a risk that we account the load of a
task that got freshly enqueued for the whole nohz period.
So instead, lets record the dynticks load on nohz frame entry so we know
what to record in case of nohz ticks, then use this record to account
the tickless load on nohz ticks and nohz frame end.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460555812-25375-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The CPU load update related functions have a weak naming convention
currently, starting with update_cpu_load_*() which isn't ideal as
"update" is a very generic concept.
Since two of these functions are public already (and a third is to come)
that's enough to introduce a more conventional naming scheme. So let's
do the following rename instead:
update_cpu_load_*() -> cpu_load_update_*()
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460555812-25375-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This patch introduces 'write_backward' bit to perf_event_attr, which
controls the direction of a ring buffer. After set, the corresponding
ring buffer is written from end to beginning. This feature is design to
support reading from overwritable ring buffer.
Ring buffer can be created by mapping a perf event fd. Kernel puts event
records into ring buffer, user tooling like perf fetch them from
address returned by mmap(). To prevent racing between kernel and tooling,
they communicate to each other through 'head' and 'tail' pointers.
Kernel maintains 'head' pointer, points it to the next free area (tail
of the last record). Tooling maintains 'tail' pointer, points it to the
tail of last consumed record (record has already been fetched). Kernel
determines the available space in a ring buffer using these two
pointers to avoid overwrite unfetched records.
By mapping without 'PROT_WRITE', an overwritable ring buffer is created.
Different from normal ring buffer, tooling is unable to maintain 'tail'
pointer because writing is forbidden. Therefore, for this type of ring
buffers, kernel overwrite old records unconditionally, works like flight
recorder. This feature would be useful if reading from overwritable ring
buffer were as easy as reading from normal ring buffer. However,
there's an obscure problem.
The following figure demonstrates a full overwritable ring buffer. In
this figure, the 'head' pointer points to the end of last record, and a
long record 'E' is pending. For a normal ring buffer, a 'tail' pointer
would have pointed to position (X), so kernel knows there's no more
space in the ring buffer. However, for an overwritable ring buffer,
kernel ignore the 'tail' pointer.
(X) head
. |
. V
+------+-------+----------+------+---+
|A....A|B.....B|C........C|D....D| |
+------+-------+----------+------+---+
Record 'A' is overwritten by event 'E':
head
|
V
+--+---+-------+----------+------+---+
|.E|..A|B.....B|C........C|D....D|E..|
+--+---+-------+----------+------+---+
Now tooling decides to read from this ring buffer. However, none of these
two natural positions, 'head' and the start of this ring buffer, are
pointing to the head of a record. Even the full ring buffer can be
accessed by tooling, it is unable to find a position to start decoding.
The first attempt tries to solve this problem AFAIK can be found from
[1]. It makes kernel to maintain 'tail' pointer: updates it when ring
buffer is half full. However, this approach introduces overhead to
fast path. Test result shows a 1% overhead [2]. In addition, this method
utilizes no more tham 50% records.
Another attempt can be found from [3], which allows putting the size of
an event at the end of each record. This approach allows tooling to find
records in a backward manner from 'head' pointer by reading size of a
record from its tail. However, because of alignment requirement, it
needs 8 bytes to record the size of a record, which is a huge waste. Its
performance is also not good, because more data need to be written.
This approach also introduces some extra branch instructions to fast
path.
'write_backward' is a better solution to this problem.
Following figure demonstrates the state of the overwritable ring buffer
when 'write_backward' is set before overwriting:
head
|
V
+---+------+----------+-------+------+
| |D....D|C........C|B.....B|A....A|
+---+------+----------+-------+------+
and after overwriting:
head
|
V
+---+------+----------+-------+---+--+
|..E|D....D|C........C|B.....B|A..|E.|
+---+------+----------+-------+---+--+
In each situation, 'head' points to the beginning of the newest record.
From this record, tooling can iterate over the full ring buffer and fetch
records one by one.
The only limitation that needs to be considered is back-to-back reading.
Due to the non-deterministic of user programs, it is impossible to ensure
the ring buffer keeps stable during reading. Consider an extreme situation:
tooling is scheduled out after reading record 'D', then a burst of events
come, eat up the whole ring buffer (one or multiple rounds). When the
tooling process comes back, reading after 'D' is incorrect now.
To prevent this problem, we need to find a way to ensure the ring buffer
is stable during reading. ioctl(PERF_EVENT_IOC_PAUSE_OUTPUT) is
suggested because its overhead is lower than
ioctl(PERF_EVENT_IOC_ENABLE).
By carefully verifying 'header' pointer, reader can avoid pausing the
ring-buffer. For example:
/* A union of all possible events */
union perf_event event;
p = head = perf_mmap__read_head();
while (true) {
/* copy header of next event */
fetch(&event.header, p, sizeof(event.header));
/* read 'head' pointer */
head = perf_mmap__read_head();
/* check overwritten: is the header good? */
if (!verify(sizeof(event.header), p, head))
break;
/* copy the whole event */
fetch(&event, p, event.header.size);
/* read 'head' pointer again */
head = perf_mmap__read_head();
/* is the whole event good? */
if (!verify(event.header.size, p, head))
break;
p += event.header.size;
}
However, the overhead is high because:
a) In-place decoding is not safe.
Copying-verifying-decoding is required.
b) Fetching 'head' pointer requires additional synchronization.
(From Alexei Starovoitov:
Even when this trick works, pause is needed for more than stability of
reading. When we collect the events into overwrite buffer we're waiting
for some other trigger (like all cpu utilization spike or just one cpu
running and all others are idle) and when it happens the buffer has
valuable info from the past. At this point new events are no longer
interesting and buffer should be paused, events read and unpaused until
next trigger comes.)
This patch utilizes event's default overflow_handler introduced
previously. perf_event_output_backward() is created as the default
overflow handler for backward ring buffers. To avoid extra overhead to
fast path, original perf_event_output() becomes __perf_event_output()
and marked '__always_inline'. In theory, there's no extra overhead
introduced to fast path.
Performance testing:
Calling 3000000 times of 'close(-1)', use gettimeofday() to check
duration. Use 'perf record -o /dev/null -e raw_syscalls:*' to capture
system calls. In ns.
Testing environment:
CPU : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Kernel : v4.5.0
MEAN STDVAR
BASE 800214.950 2853.083
PRE1 2253846.700 9997.014
PRE2 2257495.540 8516.293
POST 2250896.100 8933.921
Where 'BASE' is pure performance without capturing. 'PRE1' is test
result of pure 'v4.5.0' kernel. 'PRE2' is test result before this
patch. 'POST' is test result after this patch. See [4] for the detailed
experimental setup.
Considering the stdvar, this patch doesn't introduce performance
overhead to the fast path.
[1] http://lkml.iu.edu/hypermail/linux/kernel/1304.1/04584.html
[2] http://lkml.iu.edu/hypermail/linux/kernel/1307.1/00535.html
[3] http://lkml.iu.edu/hypermail/linux/kernel/1512.0/01265.html
[4] http://lkml.kernel.org/g/56F89DCD.1040202@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: <acme@kernel.org>
Cc: <pi3orama@163.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1459865478-53413-1-git-send-email-wangnan0@huawei.com
[ Fixed the changelog some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
lock_chain::base is used to store an index into the chain_hlocks[]
array, however that array contains more elements than can be indexed
using the u16.
Change the lock_chain structure to use a bitfield to encode the data
it needs and add BUILD_BUG_ON() assertions to check the fields are
wide enough.
Also, for DEBUG_LOCKDEP, assert that we don't run out of elements of
that array; as that would wreck the collision detectoring.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alfredo Alvarez Fernandez <alfredoalvarezfernandez@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160330093659.GS3408@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In preparation for providing an alternative (to block device) access
mechanism to persistent memory, convert pmem_rw_bytes() to
nsio_rw_bytes(). This allows ->rw_bytes() functionality without
requiring a 'struct pmem_device' to be instantiated.
In other words, when ->rw_bytes() is in use i/o is driven through
'struct nd_namespace_io', otherwise it is driven through 'struct
pmem_device' and the block layer. This consolidates the disjoint calls
to devm_exit_badblocks() and devm_memunmap() into a common
devm_nsio_disable() and cleans up the init path to use a unified
pmem_attach_disk() implementation.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Consolidate the information for issuing i/o to a blk-namespace, and
eliminate some pointer chasing.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|