Age | Commit message (Collapse) | Author |
|
Some of the Broadcom iProc SoCs have FlexRM ring manager
which provides a ring-based programming interface to various
offload engines (e.g. RAID, Crypto, etc).
This patch adds a common mailbox driver for Broadcom FlexRM
ring manager which can be shared by various offload engine
drivers (implemented as mailbox clients).
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Pramod KUMAR <pramod.kumar@broadcom.com>
Signed-off-by: Anup Patel <anup.patel@broadcom.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
User configures latency target, but the latency threshold for each
request size isn't fixed. For a SSD, the IO latency highly depends on
request size. To calculate latency threshold, we sample some data, eg,
average latency for request size 4k, 8k, 16k, 32k .. 1M. The latency
threshold of each request size will be the sample latency (I'll call it
base latency) plus latency target. For example, the base latency for
request size 4k is 80us and user configures latency target 60us. The 4k
latency threshold will be 80 + 60 = 140us.
To sample data, we calculate the order base 2 of rounded up IO sectors.
If the IO size is bigger than 1M, it will be accounted as 1M. Since the
calculation does round up, the base latency will be slightly smaller
than actual value. Also if there isn't any IO dispatched for a specific
IO size, we will use the base latency of smaller IO size for this IO
size.
But we shouldn't sample data at any time. The base latency is supposed
to be latency where disk isn't congested, because we use latency
threshold to schedule IOs between cgroups. If disk is congested, the
latency is higher, using it for scheduling is meaningless. Hence we only
do the sampling when block throttling is in the LOW limit, with
assumption disk isn't congested in such state. If the assumption isn't
true, eg, low limit is too high, calculated latency threshold will be
higher.
Hard disk is completely different. Latency depends on spindle seek
instead of request size. Currently this feature is SSD only, we probably
can use a fixed threshold like 4ms for hard disk though.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently there is no way to know the request size when the request is
finished. Next patch will need this info. We could add extra field to
record the size, but blk_issue_stat has enough space to record it, so
this patch just overloads blk_issue_stat. With this, we will have 49bits
to track time, which still is very long time.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
A cgroup gets assigned a low limit, but the cgroup could never dispatch
enough IO to cross the low limit. In such case, the queue state machine
will remain in LIMIT_LOW state and all other cgroups will be throttled
according to low limit. This is unfair for other cgroups. We should
treat the cgroup idle and upgrade the state machine to lower state.
We also have a downgrade logic. If the state machine upgrades because of
cgroup idle (real idle), the state machine will downgrade soon as the
cgroup is below its low limit. This isn't what we want. A more
complicated case is cgroup isn't idle when queue is in LIMIT_LOW. But
when queue gets upgraded to lower state, other cgroups could dispatch
more IO and this cgroup can't dispatch enough IO, so the cgroup is below
its low limit and looks like idle (fake idle). In this case, the queue
should downgrade soon. The key to determine if we should do downgrade is
to detect if cgroup is truely idle.
Unfortunately it's very hard to determine if a cgroup is real idle. This
patch uses the 'think time check' idea from CFQ for the purpose. Please
note, the idea doesn't work for all workloads. For example, a workload
with io depth 8 has disk utilization 100%, hence think time is 0, eg,
not idle. But the workload can run higher bandwidth with io depth 16.
Compared to io depth 16, the io depth 8 workload is idle. We use the
idea to roughly determine if a cgroup is idle.
We treat a cgroup idle if its think time is above a threshold (by
default 1ms for SSD and 100ms for HD). The idea is think time above the
threshold will start to harm performance. HD is much slower so a longer
think time is ok.
The patch (and the latter patches) uses 'unsigned long' to track time.
We convert 'ns' to 'us' with 'ns >> 10'. This is fast but loses
precision, should not a big deal.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
definitions
Add the definitions related to creation/deletion of a modify header
context and the modify header steering action which are used for HW
packet header modify (re-write) as part of steering. Add as well the
modify header id into two intermediate structs and set it to the FTE.
Note that as the push/pop vlan steering actions are emulated by the
ewitch management code, we're not breaking any compatibility while
changing their values to make room for the modify header action which
is not emulated and whose value is part of the FW API. The new bit
values for the emulated actions are at the end of the possible range.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
There are bunch of places in the code where the intermediate struct
that keeps the elements related to flow actions is initialized with
the same default values. Put that into a small DECLARE type helper.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
This fixes compilation for files, that try to include the
cpcap header in alphabetically sorted #include lists.
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
|
|
The patch fix syntax errors introduced by commit 0c8893c9095d
("clockevents: Add a clkevt-of mechanism like clksrc-of").
Fixes: 0c8893c9095d ("clockevents: Add a clkevt-of mechanism like clksrc-of")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Introduce a simple data structure for collecting correctable errors
along with accessors. More detailed description in the code itself.
The error decoding is done with the decoding chain now and
mce_first_notifier() gets to see the error first and the CEC decides
whether to log it and then the rest of the chain doesn't hear about it -
basically the main reason for the CE collector - or to continue running
the notifiers.
When the CEC hits the action threshold, it will try to soft-offine the
page containing the ECC and then the whole decoding chain gets to see
the error.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
0day testing by Fengguang Wu triggered this crash while running Trinity:
kernel BUG at include/linux/pagemap.h:151!
...
CPU: 0 PID: 458 Comm: trinity-c0 Not tainted 4.11.0-rc2-00251-g2947ba0 #1
...
Call Trace:
__get_user_pages_fast()
get_user_pages_fast()
get_futex_key()
futex_requeue()
do_futex()
SyS_futex()
do_syscall_64()
entry_SYSCALL64_slow_path()
It' VM_BUG_ON() due to false-negative in_atomic(). We call
page_cache_get_speculative() with disabled local interrupts.
It should be atomic enough.
So let's check for disabled interrupts in the VM_BUG_ON() condition
too, to resolve this.
( This got triggered by the conversion of the x86 GUP code to the
generic GUP code. )
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170324114709.pcytvyb3d6ajux33@black.fi.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
blob.
We switched from "struct task_struct"->security to "struct cred"->security
in Linux 2.6.29. But not all LSM modules were happy with that change.
TOMOYO LSM module is an example which want to use per "struct task_struct"
security blob, for TOMOYO's security context is defined based on "struct
task_struct" rather than "struct cred". AppArmor LSM module is another
example which want to use it, for AppArmor is currently abusing the cred
a little bit to store the change_hat and setexeccon info. Although
security_task_free() hook was revived in Linux 3.4 because Yama LSM module
wanted to release per "struct task_struct" security blob,
security_task_alloc() hook and "struct task_struct"->security field were
not revived. Nowadays, we are getting proposals of lightweight LSM modules
which want to use per "struct task_struct" security blob.
We are already allowing multiple concurrent LSM modules (up to one fully
armored module which uses "struct cred"->security field or exclusive hooks
like security_xfrm_state_pol_flow_match(), plus unlimited number of
lightweight modules which do not use "struct cred"->security nor exclusive
hooks) as long as they are built into the kernel. But this patch does not
implement variable length "struct task_struct"->security field which will
become needed when multiple LSM modules want to use "struct task_struct"->
security field. Although it won't be difficult to implement variable length
"struct task_struct"->security field, let's think about it after we merged
this patch.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Tested-by: Djalal Harouni <tixxdz@gmail.com>
Acked-by: José Bollo <jobol@nonadev.net>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: James Morris <james.l.morris@oracle.com>
Cc: José Bollo <jobol@nonadev.net>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"A smattering of different small fixes for some random driver
subsystems. Nothing all that major, just resolutions for reported
issues and bugs.
All have been in linux-next with no reported issues"
* tag 'char-misc-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
extcon: int3496: Set the id pin to direction-input if necessary
extcon: int3496: Use gpiod_get instead of gpiod_get_index
extcon: int3496: Add dependency on X86 as it's Intel specific
extcon: int3496: Add GPIO ACPI mapping table
extcon: int3496: Rename GPIO pins in accordance with binding
vmw_vmci: handle the return value from pci_alloc_irq_vectors correctly
ppdev: fix registering same device name
parport: fix attempt to write duplicate procfiles
auxdisplay: img-ascii-lcd: add missing sentinel entry in img_ascii_lcd_matches
Drivers: hv: vmbus: Don't leak memory when a channel is rescinded
Drivers: hv: vmbus: Don't leak channel ids
Drivers: hv: util: don't forget to init host_ts.lock
Drivers: hv: util: move waiting for release to hv_utils_transport itself
vmbus: remove hv_event_tasklet_disable/enable
vmbus: use rcu for per-cpu channel list
mei: don't wait for os version message reply
mei: fix deadlock on mei reset
intel_th: pci: Add Gemini Lake support
intel_th: pci: Add Denverton SOC support
intel_th: Don't leak module refcount on failure to activate
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull IIO driver fixes from Greg KH:
"Here are some small IIO driver fixes for 4.11-rc4 that resolve a
number of tiny reported issues. All of these have been in linux-next
for a while with no reported issues"
* tag 'staging-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: imu: st_lsm6dsx: fix FIFO_CTRL2 overwrite during watermark configuration
iio: adc: ti_am335x_adc: fix fifo overrun recovery
iio: sw-device: Fix config group initialization
iio: magnetometer: ak8974: remove incorrect __exit markups
iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/PHY fixes from Greg KH:
"Here are a number of small USB and PHY driver fixes for 4.11-rc4.
Nothing major here, just an bunch of small fixes, and a handfull of
good fixes from Johan for devices with crazy descriptors. There are a
few new device ids in here as well.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
usb: gadget: f_hid: fix: Don't access hidg->req without spinlock held
usb: gadget: udc: remove pointer dereference after free
usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
usb: gadget: f_uvc: Fix SuperSpeed companion descriptor's wBytesPerInterval
usb: gadget: acm: fix endianness in notifications
usb: dwc3: gadget: delay unmap of bounced requests
USB: serial: qcserial: add Dell DW5811e
usb: hub: Fix crash after failure to read BOS descriptor
ACM gadget: fix endianness in notifications
USB: usbtmc: fix probe error path
USB: usbtmc: add missing endpoint sanity check
USB: serial: option: add Quectel UC15, UC20, EC21, and EC25 modems
usb: musb: fix possible spinlock deadlock
usb: musb: dsps: fix iounmap in error and exit paths
usb: musb: cppi41: don't check early-TX-interrupt for Isoch transfer
usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirk
uwb: i1480-dfu: fix NULL-deref at probe
uwb: hwa-rc: fix NULL-deref at probe
USB: wusbcore: fix NULL-deref at probe
USB: uss720: fix NULL-deref at probe
...
|
|
Since commit 383776fa7527 ("locking/lockdep: Handle statically initialized
PER_CPU locks properly") we try to collapse per-cpu locks into a single
class by giving them all the same key. For this key we choose the canonical
address of the per-cpu object, which would be the offset into the per-cpu
area.
This has two problems:
- there is a case where we run !0 lock->key through static_obj() and
expect this to pass; it doesn't for canonical pointers.
- 0 is a valid canonical address.
Cure both issues by redefining the canonical address as the address of the
per-cpu variable on the boot CPU.
Since I didn't want to rely on CPU0 being the boot-cpu, or even existing at
all, track the boot CPU in a variable.
Fixes: 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU locks properly")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-mm@kvack.org
Cc: wfg@linux.intel.com
Cc: kernel test robot <fengguang.wu@intel.com>
Cc: LKP <lkp@01.org>
Link: http://lkml.kernel.org/r/20170320114108.kbvcsuepem45j5cr@hirez.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypto fixes from Ted Ts'o:
"A code cleanup and bugfix for fs/crypto"
* tag 'fscrypt-for-linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: eliminate ->prepare_context() operation
fscrypt: remove broken support for detecting keyring key revocation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- bug fixes in asus_atk0110, it87 and max31790 drivers
- added missing API definition to hwmon core
* tag 'hwmon-for-linus-v4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (asus_atk0110) fix uninitialized data access
hwmon: Add missing HWMON_T_ALARM
hwmon: (it87) Avoid registering the same chip on both SIO addresses
hwmon: (max31790) Set correct PWM value
|
|
blk_integrity_profile's are never modified, so mark them 'const' so that
they are placed in .rodata and benefit from memory protection.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
One is sufficient since Blue Flame is not supported anymore.
This will also come in handy for switchdev mode to save resources, since
VF representors will use same single UAR as well for their own SQs.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
- a couple of OMAP 4.11 regression fixes, including a boot regression
for SmartReflex, hypervisor mode in thumb2 mode, and reference
counting of device nodes
- a fix for cpu_idle on at91
- minor DT fixes on across several platforms: sunxi, bcm53xx, at91,
nsp, ns2, ux500, omap
- a fix to correct an API change in the reset controllers
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (22 commits)
arm64: dts: NS2: Add dma-coherent to relevant DT entries
reset: fix optional reset_control_get stubs to return NULL
ARM: sun8i: a23/a33: drop bl_en_pin GPIO pinmux in reference design DTSI
ARM: dts: sun7i: lamobo-r1: Fix CPU port RGMII settings
ARM: dts: NSP: GPIO reboot open-source
ARM: at91: pm: cpu_idle: switch DDR to power-down mode
ARM: dts: add the AB8500 clocks to the device tree
ARM: dts: imx6sx-udoo-neo: Fix reboot hang
ARM: sun8i: Fix the mali clock rate
ARM: dts: BCM5301X: Correct GIC_PPI interrupt flags
ARM: dts: BCM5301X: Fix memory start address
ARM: dts: BCM5301X: Fix UARTs on bcm953012k
Revert "ARM: at91/dt: sama5d2: Use new compatible for ohci node"
ARM: OMAP2+: Release device node after it is no longer needed.
ARM: OMAP2+: Fix device node reference counts
ARM: OMAP2+: Remove legacy gpmc-nand.c
ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure
ARM: dts: am335x-pcm953: Fix legacy wakeup source binding
ARM: omap2plus_defconfig: Enable INPUT_MOUSEDEV as loadable modules
ARM: dts: am57xx-idk: tpic2810 is on I2C bus, not SPI
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes regressions in the crypto ccp driver and the hwrng drivers
for amd and geode"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: geode - Revert managed API changes
hwrng: amd - Revert managed API changes
crypto: ccp - Assign DMA commands to the channel's CCP
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"A few fixes piled up:
- fix a NULL-ptr dereference that happens in VT-d on some platforms
- a fix for ARM MSI region reporting, so that a sane interface makes
it to a released kernel
- fixes for leaf-checking in ARM io-page-table code
- two fixes for IO/TLB flushing code on ARM Exynos platforms"
* tag 'iommu-fixes-v4.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Disambiguate MSI region types
iommu/exynos: Workaround FLPD cache flush issues for SYSMMU v5
iommu/exynos: Block SYSMMU while invalidating FLPD cache
iommu/vt-d: Fix NULL pointer dereference in device_to_iommu
iommu/io-pgtable-arm-v7s: Check for leaf entry before dereferencing it
iommu/io-pgtable-arm: Check for leaf entry before dereferencing it
|
|
Introduce a new configuration symbol: MDIO_DEVICE which allows building
the MDIO devices and bus code, without pulling in the entire Ethernet
PHY library and devices code.
PHYLIB nows select MDIO_DEVICE and the relevant Makefile files are
updated to reflect that.
When MDIO_DEVICE (MDIO bus/device only) is selected, but not PHYLIB, we
have mdio-bus.ko as a loadable module, and it does not have a
module_exit() function because the safety of removing a bus class is
unclear.
When both MDIO_DEVICE and PHYLIB are enabled, we need to assemble
everything into a common loadable module: libphy.ko because of nasty
circular dependencies between phy.c, phy_device.c and mdio_bus.c which
are really tough to untangle.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CONFIG_OF_MDIO is actually what triggers the build of drivers/of/of_mdio.c, so
providing inline stubs when CONFIG_OF_MDIO=y should be based on that symbol as
well.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Drop duplicate header nsproxy.h from linux/cgroup.h.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
This patch fix spelling typos found in
Documentation/output/xml/driver-api/basics.xml.
It is because the xml file was generated from comments in source,
so I had to fix the comments.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
The md5_transform function is no longer used any where in the tree,
except for the crypto api's actual implementation of md5, so we can drop
the function from lib and put it as a static function of the crypto
file, where it belongs. There should be no new users of md5_transform,
anyway, since there are more modern ways of doing what it once achieved.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
A version 5 device provides the primitive commands
required for AES GCM. This patch adds support for
en/decryption.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Wire up support for Triple DES in ECB mode.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Incorporate 384-bit and 512-bit hashing for a version 5 CCP
device
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Retrieve the socket cookie generated by sock_gen_cookie() from a sk_buff
with a known socket. Generates a new cookie if one was not yet set.If
the socket pointer inside sk_buff is NULL, 0 is returned. The helper
function coud be useful in monitoring per socket networking traffic
statistics and provide a unique socket identifier per namespace.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
drivers/net/ethernet/broadcom/genet/bcmmii.c
drivers/net/hyperv/netvsc.c
kernel/bpf/hashtab.c
Almost entirely overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus
Felipe writes:
usb: fixes for v4.11-rc4
f_acm got an endianness fix by Oliver Neukum. This has been around for a
long time but it's finally fixed.
f_hid learned that it should never access hidg->req without first
grabbing the spinlock.
Roger Quadros fixed two bugs in the f_uvc function driver.
Janusz Dziedzic fixed a very peculiar bug with EP0, one that's rather
difficult to trigger. When we're dealing with bounced EP0 requests, we
should delay unmap until after ->complete() is called.
UDC class got a use-after-free fix.
|
|
A clockevent device's rate should be configured before or at registration
and changed afterwards through clockevents_update_freq() only.
For the configuration at registration, we already have
clockevents_config_and_register().
Right now, there are no clockevents_config() users outside of the
clockevents core.
To mitigiate the risk of drivers errorneously reconfiguring their rates
through clockevents_config() *after* device registration, make
clockevents_config() static.
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
Pull networking fixes from David Miller:
1) Several netfilter fixes from Pablo and the crew:
- Handle fragmented packets properly in netfilter conntrack, from
Florian Westphal.
- Fix SCTP ICMP packet handling, from Ying Xue.
- Fix big-endian bug in nftables, from Liping Zhang.
- Fix alignment of fake conntrack entry, from Steven Rostedt.
2) Fix feature flags setting in fjes driver, from Taku Izumi.
3) Openvswitch ipv6 tunnel source address not set properly, from Or
Gerlitz.
4) Fix jumbo MTU handling in amd-xgbe driver, from Thomas Lendacky.
5) sk->sk_frag.page not released properly in some cases, from Eric
Dumazet.
6) Fix RTNL deadlocks in nl80211, from Johannes Berg.
7) Fix erroneous RTNL lockdep splat in crypto, from Herbert Xu.
8) Cure improper inflight handling during AF_UNIX GC, from Andrey
Ulanov.
9) sch_dsmark doesn't write to packet headers properly, from Eric
Dumazet.
10) Fix SCM_TIMESTAMPING_OPT_STATS handling in TCP, from Soheil Hassas
Yeganeh.
11) Add some IDs for Motorola qmi_wwan chips, from Tony Lindgren.
12) Fix nametbl deadlock in tipc, from Ying Xue.
13) GRO and LRO packets not counted correctly in mlx5 driver, from Gal
Pressman.
14) Fix reset of internal PHYs in bcmgenet, from Doug Berger.
15) Fix hashmap allocation handling, from Alexei Starovoitov.
16) nl_fib_input() needs stronger netlink message length checking, from
Eric Dumazet.
17) Fix double-free of sk->sk_filter during sock clone, from Daniel
Borkmann.
18) Fix RX checksum offloading in aquantia driver, from Pavel Belous.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
net:ethernet:aquantia: Fix for RX checksum offload.
amd-xgbe: Fix the ECC-related bit position definitions
sfc: cleanup a condition in efx_udp_tunnel_del()
Bluetooth: btqcomsmd: fix compile-test dependency
inet: frag: release spinlock before calling icmp_send()
tcp: initialize icsk_ack.lrcvtime at session start time
genetlink: fix counting regression on ctrl_dumpfamily()
socket, bpf: fix sk_filter use after free in sk_clone_lock
ipv4: provide stronger user input validation in nl_fib_input()
bpf: fix hashmap extra_elems logic
enic: update enic maintainers
net: bcmgenet: remove bcmgenet_internal_phy_setup()
ipv6: make sure to initialize sockc.tsflags before first use
fjes: Do not load fjes driver if extended socket device is not power on.
fjes: Do not load fjes driver if system does not have extended socket device.
net/mlx5e: Count LRO packets correctly
net/mlx5e: Count GSO packets correctly
net/mlx5: Increase number of max QPs in default profile
net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps
net/mlx5e: Use the proper UAPI values when offloading TC vlan actions
...
|
|
No caller currently checks the return value of
kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
freeing their device. A stale reference will remain in the io_bus,
getting at least used again, when the iobus gets teared down on
kvm_destroy_vm() - leading to use after free errors.
There is nothing the callers could do, except retrying over and over
again.
So let's simply remove the bus altogether, print an error and make
sure no one can access this broken bus again (returning -ENOMEM on any
attempt to access it).
Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU")
Cc: stable@vger.kernel.org # 3.4+
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
There isn't a bug here, but Smatch is not smart enough to know that
"nr_iovecs" can't be negative so it complains about underflows.
Really, it's slightly cleaner to make this parameter unsigned.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Add a new cmpxchg interface:
bool try_cmpxchg(u{8,16,32,64} *ptr, u{8,16,32,64} *val, u{8,16,32,64} new);
Where the boolean returns the result of the compare; and thus if the
exchange happened; and in case of failure, the new value of *ptr is
returned in *val.
This allows simplification/improvement of loops like:
for (;;) {
new = val $op $imm;
old = cmpxchg(ptr, val, new);
if (old == val)
break;
val = old;
}
into:
do {
} while (!try_cmpxchg(ptr, &val, val $op $imm));
while also generating better code (GCC6 and onwards).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
People reported that commit:
5680d8094ffa ("sched/clock: Provide better clock continuity")
broke "perf test tsc".
That commit added another offset to the reported clock value; so
take that into account when computing the provided offset values.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 5680d8094ffa ("sched/clock: Provide better clock continuity")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This flag was never used since it was introduced.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Make the function available for outside use and fortify it against NULL
kobject.
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When block device is closed, we call inode_detach_wb() in __blkdev_put()
which sets inode->i_wb to NULL. That is contrary to expectations that
inode->i_wb stays valid once set during the whole inode's lifetime and
leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because
inode_to_wb() returned NULL.
The reason why we called inode_detach_wb() is not valid anymore though.
BDI is guaranteed to stay along until we call bdi_put() from
bdev_evict_inode() so we can postpone calling inode_detach_wb() to that
moment.
Also add a warning to catch if someone uses inode_detach_wb() in a
dangerous way.
Reported-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently we wait for all cgwbs to get released in cgwb_bdi_destroy()
(called from bdi_unregister()). That is however unnecessary now when
cgwb->bdi is a proper refcounted reference (thus bdi cannot get
released before all cgwbs are released) and when cgwb_bdi_destroy()
shuts down writeback directly.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently we waited for all cgwbs to get freed in cgwb_bdi_destroy()
which also means that writeback has been shutdown on them. Since this
wait is going away, directly shutdown writeback on cgwbs from
cgwb_bdi_destroy() to avoid live writeback structures after
bdi_unregister() has finished. To make that safe with concurrent
shutdown from cgwb_release_workfn(), we also have to make sure
wb_shutdown() returns only after the bdi_writeback structure is really
shutdown.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
congested->bdi pointer is used only to be able to remove congested
structure from bdi->cgwb_congested_tree on structure release. Moreover
the pointer can become NULL when we unregister the bdi. Rename the field
to __bdi and add a comment to make it more explicit this is internal
stuff of memcg writeback code and people should not use the field as
such use will be likely race prone.
We do not bother with converting congested->bdi to a proper refcounted
reference. It will be slightly ugly to special-case bdi->wb.congested to
avoid effectively a cyclic reference of bdi to itself and the reference
gets cleared from bdi_unregister() making it impossible to reference
a freed bdi.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The way the schedutil governor uses the PELT metric causes it to
underestimate the CPU utilization in some cases.
That can be easily demonstrated by running kernel compilation on
a Sandy Bridge Intel processor, running turbostat in parallel with
it and looking at the values written to the MSR_IA32_PERF_CTL
register. Namely, the expected result would be that when all CPUs
were 100% busy, all of them would be requested to run in the maximum
P-state, but observation shows that this clearly isn't the case.
The CPUs run in the maximum P-state for a while and then are
requested to run slower and go back to the maximum P-state after
a while again. That causes the actual frequency of the processor to
visibly oscillate below the sustainable maximum in a jittery fashion
which clearly is not desirable.
That has been attributed to CPU utilization metric updates on task
migration that cause the total utilization value for the CPU to be
reduced by the utilization of the migrated task. If that happens,
the schedutil governor may see a CPU utilization reduction and will
attempt to reduce the CPU frequency accordingly right away. That
may be premature, though, for example if the system is generally
busy and there are other runnable tasks waiting to be run on that
CPU already.
This is unlikely to be an issue on systems where cpufreq policies are
shared between multiple CPUs, because in those cases the policy
utilization is computed as the maximum of the CPU utilization values
over the whole policy and if that turns out to be low, reducing the
frequency for the policy most likely is a good idea anyway. On
systems with one CPU per policy, however, it may affect performance
adversely and even lead to increased energy consumption in some cases.
On those systems it may be addressed by taking another utilization
metric into consideration, like whether or not the CPU whose
frequency is about to be reduced has been idle recently, because if
that's not the case, the CPU is likely to be busy in the near future
and its frequency should not be reduced.
To that end, use the counter of idle calls in the timekeeping code.
Namely, make the schedutil governor look at that counter for the
current CPU every time before its frequency is about to be reduced.
If the counter has not changed since the previous iteration of the
governor computations for that CPU, the CPU has been busy for all
that time and its frequency should not be decreased, so if the new
frequency would be lower than the one set previously, the governor
will skip the frequency update.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Joel Fernandes <joelaf@google.com>
|
|
This patch adds hash of maps support (hashmap->bpf_map).
BPF_MAP_TYPE_HASH_OF_MAPS is added.
A map-in-map contains a pointer to another map and lets call
this pointer 'inner_map_ptr'.
Notes on deleting inner_map_ptr from a hash map:
1. For BPF_F_NO_PREALLOC map-in-map, when deleting
an inner_map_ptr, the htab_elem itself will go through
a rcu grace period and the inner_map_ptr resides
in the htab_elem.
2. For pre-allocated htab_elem (!BPF_F_NO_PREALLOC),
when deleting an inner_map_ptr, the htab_elem may
get reused immediately. This situation is similar
to the existing prealloc-ated use cases.
However, the bpf_map_fd_put_ptr() calls bpf_map_put() which calls
inner_map->ops->map_free(inner_map) which will go
through a rcu grace period (i.e. all bpf_map's map_free
currently goes through a rcu grace period). Hence,
the inner_map_ptr is still safe for the rcu reader side.
This patch also includes BPF_MAP_TYPE_HASH_OF_MAPS to the
check_map_prealloc() in the verifier. preallocation is a
must for BPF_PROG_TYPE_PERF_EVENT. Hence, even we don't expect
heavy updates to map-in-map, enforcing BPF_F_NO_PREALLOC for map-in-map
is impossible without disallowing BPF_PROG_TYPE_PERF_EVENT from using
map-in-map first.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|