Age | Commit message (Collapse) | Author |
|
We now handle the open drain mode internally in the I2C GPIO
driver, but we will get warnings from the gpiolib that we
override the default mode of the line so it becomes open
drain.
We can fix all in-kernel users by simply passing the right
flag along in the descriptor table, and we already touched
all of these files in the series so let's just tidy it up.
Cc: Steven Miao <realmz6@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Wu, Aaron <Aaron.Wu@analog.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
The I2C GPIO bitbang driver currently emulates open drain
behaviour by implementing what the gpiolib already does:
not actively driving the line high, instead setting it to
input.
This makes no sense. Use the new facility in gpiolib to
request the lines enforced into open drain mode, and let
the open drain emulation already present in the gpiolib
kick in and handle this.
As a bonus: if the GPIO driver in the back-end actually
supports open drain in hardware using the .set_config()
callback, it will be utilized. That's correct: we never
used that hardware feature before, instead relying on
emulating open drain even if the GPIO controller could
actually handle this for us.
Users will sometimes get messages like this:
gpio-485 (?): enforced open drain please flag it properly
in DT/ACPI DSDT/board file
gpio-486 (?): enforced open drain please flag it properly
in DT/ACPI DSDT/board file
i2c-gpio gpio-i2c: using lines 485 (SDA) and 486 (SCL)
Which is completely proper: since the line is used as
open drain, it should actually be flagged properly with
e.g.
gpios = <&gpio0 5 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>,
<&gpio0 6 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>;
Or similar facilities in board file descriptor tables
or ACPI DSDT.
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Some busses, like I2C, strictly need to have the line handled
as open drain, i.e. not actively driven high. For this reason
the i2c-gpio.c bit-banged I2C driver is reimplementing open
drain handling outside of gpiolib.
This is not very optimal. Instead make it possible for a
consumer to explcitly express that the line must be handled
as open drain instead of allowing local hacks papering over
this issue.
The descriptor tables, whether DT, ACPI or board files, should
of course have flagged these lines as open drain. E.g.:
enum gpio_lookup_flags GPIO_OPEN_DRAIN for a board file, or
gpios = <&foo 42 GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN>; in a
device tree using <dt-bindings/gpio/gpio.h>
But more often than not, these descriptors are wrong. So
we need to make it possible for consumers to enforce this
open drain behaviour.
We now have two new enumerated GPIO descriptor config flags:
GPIOD_OUT_LOW_OPEN_DRAIN and GPIOD_OUT_HIGH_OPEN_DRAIN
that will set up the lined enforced as open drain as output
low or high, using open drain (if the driver supports it)
or using open drain emulation (setting the line as input
to drive it high) from the gpiolib core.
Cc: linux-gpio@vger.kernel.org
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
This converts the GPIO-based I2C-driver to using GPIO
descriptors instead of the old global numberspace-based
GPIO interface. We:
- Convert the driver to unconditionally grab two GPIOs
from the device by index 0 (SDA) and 1 (SCL) which
will work fine with device tree and descriptor tables.
The existing device trees will continue to work just
like before, but without any roundtrip through the
global numberspace.
- Brutally convert all boardfiles still passing global
GPIOs by registering descriptor tables associated with
the devices instead so this driver does not need to keep
supporting passing any GPIO numbers as platform data.
There is no stepwise approach as elegant as this, I
strongly prefer this big hammer over any antsteps for this
conversion. This way the old GPIO numbers go away and
NEVER COME BACK.
Special conversion for the different boards utilizing
I2C-GPIO:
- EP93xx (arch/arm/mach-ep93xx): pretty straight forward as
all boards were using the same two GPIO lines, just define
these two in a lookup table for "i2c-gpio" and register
these along with the device. None of them define any
other platform data so just pass NULL as platform data.
This platform selects GPIOLIB so all should be smooth.
The pins appear on a gpiochip for bank "G" as pins 1 (SDA)
and 0 (SCL).
- IXP4 (arch/arm/mach-ixp4): descriptor tables have to
be registered for each board separately. They all use
"IXP4XX_GPIO_CHIP" so it is pretty straight forward.
Most board define no other platform data than SCL/SDA
so they can drop the #include of <linux/i2c-gpio.h> and
assign NULL to platform data.
The "goramo_mlr" (Goramo Multilink Router) board is a bit
worrisome: it implements its own I2C bit-banging in the
board file, and optionally registers an I2C serial port,
but claims the same GPIO lines for itself in the board file.
This is not going to work: there will be competition for the
GPIO lines, so delete the optional extra I2C bus instead, no
I2C devices are registered on it anyway, there are just hints
that it may contain an EEPROM that may be accessed from
userspace. This needs to be fixed up properly by the serial
clock using I2C emulation so drop a note in the code.
- KS8695 board acs5k (arch/arm/mach-ks8695/board-acs5.c)
has some platform data in addition to the pins so it needs to
be kept around sans GPIO lines. Its GPIO chip is named
"KS8695" and the arch selects GPIOLIB.
- PXA boards (arch/arm/mach-pxa/*) use some of the platform
data so it needs to be preserved here. The viper board even
registers two GPIO I2Cs. The gpiochip is named "gpio-pxa" and
the arch selects GPIOLIB.
- SA1100 Simpad (arch/arm/mach-sa1100/simpad.c) defines a GPIO
I2C bus, and the arch selects GPIOLIB.
- Blackfin boards (arch/blackfin/bf533 etc) for these I assume
their I2C GPIOs refer to the local gpiochip defined in
arch/blackfin/kernel/bfin_gpio.c names "BFIN-GPIO".
The arch selects GPIOLIB. The boards get spiked with
IF_ENABLED(I2C_GPIO) but that is a side effect of it
being like that already (I would just have Kconfig select
I2C_GPIO and get rid of them all.) I also delete any
platform data set to 0 as it will get that value anyway
from static declartions of platform data.
- The MIPS selects GPIOLIB and the Alchemy machine is using
two local GPIO chips, one of them has a GPIO I2C. We need
to adjust the local offset from the global number space here.
The ATH79 has a proper GPIO driver in drivers/gpio/gpio-ath79.c
and AFAICT the chip is named "ath79-gpio" and the PB44
PCF857x expander spawns from this on GPIO 1 and 0. The latter
board only use the platform data to specify pins so it can be
cut altogether after this.
- The MFD Silicon Motion SM501 is a special case. It dynamically
spawns an I2C bus off the MFD using sm501_create_subdev().
We use an approach to dynamically create a machine descriptor
table and attach this to the "SM501-LOW" or "SM501-HIGH"
gpiochip. We use chip-local offsets to grab the right lines.
We can get rid of two local static inline helpers as part
of this refactoring.
Cc: Steven Miao <realmz6@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Heiko Schocher <hs@denx.de>
Acked-by: Wu, Aaron <Aaron.Wu@analog.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
It is possible to read all lines of a generic MMIO GPIO chip
with a single register read so support this if we are in
native endianness.
Add an especially quirky callback to read multiple lines for
the variants that require you to read values from the
output registers if and only if the line is set as output.
We managed to do that with a maximum of two register reads,
and just one read if the requested lines are all input or all
output.
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
And fix tcon leak in error path.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: David Disseldorp <ddiss@samba.org>
|
|
Fix kernel-doc build error. A symbol that ends with an underscore
character ('_') has special meaning in reST (reStructuredText), so add
a '*' to prevent this error and to indicate that there are several of
these values to choose from.
../sound/core/jack.c:312: ERROR: Unknown target name: "snd_jack_btn".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
We intended to test for failure here but accidentally tested for
success. It means that we don't set "*val" to true and it means that
if i2c_smbus_write_byte() does fail then we return success.
Fixes: e7895864b0d7 ("hwmon: (max6621) Add support for Maxim MAX6621 temperature sensor")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
The setting of newval to zero is redundant as the following if/else
stanzas will always update newval to a new value. Remove the
redundant setting, cleans up clang build warning:
drivers/hwmon/asc7621.c:582:2: warning: Value stored to 'newval' is
never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
This patch supports xgene-hwmon v2 which uses the non-cachable memory
as the PCC shared memory.
Signed-off-by: Hoan Tran <hotran@apm.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
A previous commit changed the argument list of gpio_fan_get_of_data(),
removing the "struct *dev" argument and retrieving it instead from the
gpio_fan_data structure. The "dev" entry of gpio_fan_data was then
dereferenced to access the of_node field, leading to a kernel panic
during the probe as the "dev" entry of the gpio_fan_data structure was
not filled yet.
Fix this by setting fan_data->dev before calling gpio_fan_get_of_data().
Fixes: 5859d8d30737 ("hwmon: (gpio-fan) Get rid of platform data struct")
Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
This converts the GPIO fan driver to use GPIO descriptors. This way
we avoid indirection since the gpiolib anyway just use descriptors
inside, and we also get rid of explicit polarity handling: the
descriptors internally knows if the line is active high or active
low.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[groeck: Line length]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
The "ctrl" and "num_ctrl" entries in the state container struct is
ambiguously named "ctrl" and "num_ctrl" overlapping with some hwmon
lingo and making it hard to understand. Since this array actually
contains the GPIO line numbers, from the Linux global GPIO numberspace,
used to control the different fan speeds. Rename these fields to
"gpios" (pluralis) and "num_gpios" so as to make it unambiguous.
Convert some instances of "unsigned" to "unsigned int" to keep
checkpatch happy.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
There is no point in storing the GPIO alarm settings in their
own struct so merge this into the main state container.
Convert the variables from "unsigned" to "unsigned int" to
make checkpatch happy.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
We are not passing the platform data struct into the driver from the
outside, there is no point of having it around separately so instead
of first populating the platform data struct and assigning the result
into the same variables in the state container (struct gpio_fan_data)
just assign the configuration from the device tree directly into the
state container members.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
We have no users of platform data, we made platform data driver-local,
so cut all #ifdefs for the platform data case, and depend on the
Kconfig CONFIG_OF_GPIO symbol.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
The driver is storing the struct platform_device *pdev pointer
but what it is really using and want to pass around is a
struct device *dev pointer.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
There is not a single user of the platform data header in
<linux/gpio-fan.h>. We can conclude that all current users are
probing from the device tree, so start simplifying the code by
pulling the header into the driver.
Convert "unsigned" to "unsigned int" in the process to make
checkpatch happy.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Create local struct device *dev and device_node *np pointers to
make the code easier to read.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
This moves the GPIO fan bindings to the hwmon bindings. The GPIO
fan is a hwmon class hardware, not related to GPIO other than being
a consumer of GPIOs.
Cc: devicetree@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Add device record for Maxim MAX6621 temperature sensor device.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
MAX6621 is a PECI-to-I2C translator provides an efficient, low-cost
solution for PECI-to-SMBus/I2C protocol conversion. It allows reading the
temperature from the PECI-compliant host directly from up to four
PECI-enabled CPUs.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
size
Don't populate const array watchdog_minors on the stack, instead make it
static. Makes the object code smaller by over 350 bytes:
Before:
text data bss dec hex filename
48019 38144 256 86419 15193 drivers/hwmon/w83793.o
After:
text data bss dec hex filename
47574 38232 256 86062 1502e drivers/hwmon/w83793.o
(gcc 6.3.0, x86-64)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
The previous value reduced the time required to determine
the fan value, however, it's also used as the final timeout
mechanism. The prevous value would work for any fan speed
greater than around 3k RPM. This increased value, lets the fan
speeds return quickly but will wait longer to handle speeds below 3k
RPM.
Testing: this value was determined through experimentation on the ast2400
on the Quanta-q71l. This configurations runs 8 fans attached to the
controller.
Signed-off-by: Patrick Venture <venture@google.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Add new device tree binding for max1619.
Signed-off-by: Alan Tull <atull@kernel.org>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Add new device tree bindings document for max1619 device
including a new compatible string.
Signed-off-by: Alan Tull <atull@kernel.org>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
A previous commit removed bit or'ing into to the integer status
so now status is now always zero. This means that the non-zero check on
status and the sht15_send_status call will never occur; it is deadcode.
Clean this up by removing the dead code.
Detected by: CoverityScan CID#1456835 ("Logically dead code")
Fixes: aa7ab80c578c ("hwmon: (sht15) Root out platform data")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
After finding out there are active users of this sensor I noticed:
- It has a single PXA27x board file using the platform data
- The platform data is only used to carry two GPIO pins, all other
fields are unused
- The driver does not use GPIO descriptors but the legacy GPIO
API
I saw we can swiftly fix this by:
- Killing off the platform data entirely
- Define a GPIO descriptor lookup table in the board file
- Use the standard devm_gpiod_get() to grab the GPIO descriptors
from either the device tree or the board file table.
This compiles, but needs testing.
Cc: arm@kernel.org
Cc: Marco Franchi <marco.franchi@nxp.com>
Cc: Davide Hug <d@videhug.ch>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Marco Franchi <marco.franchi@nxp.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Add support for handling temperature offset values for various AMD CPUs,
similar to the code used in the coretemp driver for Intel CPUs. This is
primarily for Ryzen CPUs (which has documented temperature offsets),
but the code is kept generic to simplify adding additional CPUs.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Add support for temperature sensors on Family 17h (Ryzen) processors.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Introduce a local data structure and determine the temperature read
function at probe time to reduce runtime complexity.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Function snprintf already cares for the terminating NUL at the end of
the string, the caller doesn't need to do it.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Andrea Merello <andrea.merello@gmail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
The pmbus core may call read/write word data functions with a page value
of -1, intending to perform the operation without setting the page.
However, the read/write word data functions accept only unsigned 8-bit
page numbers, and therefore cannot check for negative page number to
avoid setting the page. This results in setting the page number to 0xFF.
This may result in errors or undefined behavior of some devices
(specifically the ir35221, which allows the page to be set to 0xFF,
but some subsequent operations to read registers may fail).
Switch the pmbus_set_page page parameter to an integer and perform the
check for negative page there. Make read/write functions consistent in
accepting an integer page number parameter.
Signed-off-by: Edward A. James <eajames@us.ibm.com>
Fixes: cbcdec6202c9 ("hwmon: (pmbus): Access word data for STATUS_WORD")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
|
|
The timer-of API does not provide a function to undo what has been done by
the timer_of_init() function.
Add a timer_of_exit() function.
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
It is currently unclear how to set the VCPU affinity for a percpu_devid
interrupt , since the Linux irq_data structure describes the state for
multiple interrupts, one for each physical CPU on the system. Since
each such interrupt can be associated with different VCPUs or none at
all, associating a single VCPU state with such an interrupt does not
capture the necessary semantics.
The implementers of irq_set_affinity are the Intel and AMD IOMMUs, and
the ARM GIC irqchip. The Intel and AMD callers do not appear to use
percpu_devid interrupts, and the ARM GIC implementation only checks the
pointer against NULL vs. non-NULL.
Therefore, simply update the function documentation to explain the
expected use in the context of percpu_devid interrupts, allowing future
changes or additions to irqchip implementers to do the right thing.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvm@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lkml.kernel.org/r/1509093281-15225-13-git-send-email-cdall@linaro.org
|
|
Pull networking fixes from David Miller:
1) Fix route leak in xfrm_bundle_create().
2) In mac80211, validate user rate mask before configuring it. From
Johannes Berg.
3) Properly enforce memory limits in fair queueing code, from Toke
Hoiland-Jorgensen.
4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.
5) Fix TSO header allocation and management in mvpp2 driver, from Yan
Markman.
6) Don't take socket lock in BH handler in strparser code, from Tom
Herbert.
7) Don't show sockets from other namespaces in AF_UNIX code, from
Andrei Vagin.
8) Fix double free in error path of tap_open(), from Girish Moodalbail.
9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
and Alexander Duyck.
10) Fix DCB mode programming in stmmac driver, from Jose Abreu.
11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
Long.
12) Properly align SKB head before building SKB in tuntap, from Jason
Wang.
13) Avoid matching qdiscs with a zero handle during lookups, from Cong
Wang.
14) Fix various endianness bugs in sctp, from Xin Long.
15) Fix tc filter callback races and add selftests which trigger the
problem, from Cong Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
selftests: Introduce a new test case to tc testsuite
selftests: Introduce a new script to generate tc batch file
net_sched: fix call_rcu() race on act_sample module removal
net_sched: add rtnl assertion to tcf_exts_destroy()
net_sched: use tcf_queue_work() in tcindex filter
net_sched: use tcf_queue_work() in rsvp filter
net_sched: use tcf_queue_work() in route filter
net_sched: use tcf_queue_work() in u32 filter
net_sched: use tcf_queue_work() in matchall filter
net_sched: use tcf_queue_work() in fw filter
net_sched: use tcf_queue_work() in flower filter
net_sched: use tcf_queue_work() in flow filter
net_sched: use tcf_queue_work() in cgroup filter
net_sched: use tcf_queue_work() in bpf filter
net_sched: use tcf_queue_work() in basic filter
net_sched: introduce a workqueue for RCU callbacks of tc filter
sctp: fix some type cast warnings introduced since very beginning
sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
sctp: fix some type cast warnings introduced by transport rhashtable
sctp: fix some type cast warnings introduced by stream reconf
...
|
|
Cong Wang says:
====================
net_sched: fix races with RCU callbacks
Recently, the RCU callbacks used in TC filters and TC actions keep
drawing my attention, they introduce at least 4 race condition bugs:
1. A simple one fixed by Daniel:
commit c78e1746d3ad7d548bdf3fe491898cc453911a49
Author: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed May 20 17:13:33 2015 +0200
net: sched: fix call_rcu() race on classifier module unloads
2. A very nasty one fixed by me:
commit 1697c4bb5245649a23f06a144cc38c06715e1b65
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon Sep 11 16:33:32 2017 -0700
net_sched: carefully handle tcf_block_put()
3. Two more bugs found by Chris:
https://patchwork.ozlabs.org/patch/826696/
https://patchwork.ozlabs.org/patch/826695/
Usually RCU callbacks are simple, however for TC filters and actions,
they are complex because at least TC actions could be destroyed
together with the TC filter in one callback. And RCU callbacks are
invoked in BH context, without locking they are parallel too. All of
these contribute to the cause of these nasty bugs.
Alternatively, we could also:
a) Introduce a spinlock to serialize these RCU callbacks. But as I
said in commit 1697c4bb5245 ("net_sched: carefully handle
tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
Potentially we need to do a lot of work to make it possible (if not
impossible).
b) Just get rid of these RCU callbacks, because they are not
necessary at all, callers of these call_rcu() are all on slow paths
and holding RTNL lock, so blocking is allowed in their contexts.
However, David and Eric dislike adding synchronize_rcu() here.
As suggested by Paul, we could defer the work to a workqueue and
gain the permission of holding RTNL again without any performance
impact, however, in tcf_block_put() we could have a deadlock when
flushing workqueue while hodling RTNL lock, the trick here is to
defer the work itself in workqueue and make it queued after all
other works so that we keep the same ordering to avoid any
use-after-free. Please see the first patch for details.
Patch 1 introduces the infrastructure, patch 2~12 move each
tc filter to the new tc filter workqueue, patch 13 adds
an assertion to catch potential bugs like this, patch 14
closes another rcu callback race, patch 15 and patch 16 add
new test cases.
====================
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In this patchset, we fixed a tc bug. This patch adds the test case
that reproduces the bug. To run this test case, user should specify
an existing NIC device:
# sudo ./tdc.py -d enp4s0f0
This test case belongs to category "flower". If user doesn't specify
a NIC device, the test cases belong to "flower" will not be run.
In this test case, we create 1M filters and all filters share the same
action. When destroying all filters, kernel should not panic. It takes
about 18s to run it.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
# ./tdc_batch.py -h
usage: tdc_batch.py [-h] [-n NUMBER] [-o] [-s] [-p] device file
TC batch file generator
positional arguments:
device device name
file batch file name
optional arguments:
-h, --help show this help message and exit
-n NUMBER, --number NUMBER
how many lines in batch file
-o, --skip_sw skip_sw (offload), by default skip_hw
-s, --share_action all filters share the same action
-p, --prio all filters have different prio
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similar to commit c78e1746d3ad
("net: sched: fix call_rcu() race on classifier module unloads"),
we need to wait for flying RCU callback tcf_sample_cleanup_rcu().
Cc: Yotam Gigi <yotamg@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After previous patches, it is now safe to claim that
tcf_exts_destroy() is always called with RTNL lock.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|