summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-04-24mmc: sdhci-cadence: add HS400 enhanced strobe supportPiotr Sroka
Add support for HS400ES mode to Cadence SDHCI driver. Signed-off-by: Piotr Sroka <piotrs@cadence.com> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: tegra: Add Tegra186 supportThierry Reding
The SDHCI controller found on NVIDIA Tegra186 SoCs is very similar to the one on prior generations of Tegra and can be supported by the same driver. Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: tegra: Support module resetThierry Reding
The device tree binding for the SDHCI controller found on Tegra SoCs specifies that a reset control can be provided by the device tree. No code was ever added to support the module reset, which can cause the driver to try and access registers from a module that's in reset. On most Tegra SoC generations doing so would cause a hang. Note that it's unlikely to see this happen because on most platforms these resets will have been deasserted by the bootloader. However the portability can be improved by making sure the driver deasserts the reset before accessing any registers. Since resets are synchronous on Tegra SoCs, the platform driver needs to implement a custom ->remove() callback now to make sure the clock is disabled after the reset is asserted. Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: meson-gx: remove mmc host on device removalMichał Zegan
The mmc host was added in meson_mmc_probe, but never removed in meson_mmc_remove. Fix that by removing the host before deallocating other resources. Signed-off-by: Michał Zegan <webczat@webczatnet.pl> Tested-by: Michał Zegan <webczat@webczatnet.pl> Acked-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: host: tmio: fill in response from auto cmd12Wolfram Sang
After we received the dataend interrupt, R1 response register carries the value from the automatically generated stop command. Report that info back to the MMC block layer, so we will be notified in case of e.g. ECC errors which happened during the last transfer. Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: host: tmio: don't BUG on unsupported stop commandsWolfram Sang
Halting the kernel on an unsupported stop command seems overkill, report the error and say what we already did (due to autocmd12) instead. Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: host: tmio: fix minor typos in commentsWolfram Sang
Making sure we match the actual register names. Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: host: tmio: use defines for CTL_STOP_INTERNAL_ACTION valuesWolfram Sang
Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: tmio: ensure end of DMA and SD access are in syncWolfram Sang
The current code assumes that DMA is finished before SD access end is flagged. Thus, it schedules the 'dma_complete' tasklet in the SD card interrupt routine when DATAEND is set. The assumption is not safe, though. Even by mounting an SD card, it can be seen that sometimes DMA complete is first, sometimes DATAEND. It seems they are usually close enough timewise to not cause problems. However, a customer reported that with CMD53 sometimes things really break apart. As a result, the BSP has a patch which introduces flags for both events and makes sure both flags are set before scheduling the tasklet. The customer accepted the patch, yet it doesn't seem a proper upstream solution to me. This patch refactors the code to replace the tasklet with already existing and more lightweight mechanisms. First of all, we set the callback in a DMA descriptor to automatically get notified when DMA is done. In the callback, we then use a completion to make sure the SD access has already ended. Then, we proceed as before. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: sdhci-pxav2: add error handling of clk_prepare_enable()Alexey Khoroshilov
There is no check if clk_prepare_enable() succeed in sdhci_pxav2_probe(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: meson-gx: replace magic timeout numbers with constantsHeiner Kallweit
Replace timeout magic numbers with proper constants. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: meson-gx: remove member mrq from struct meson_hostHeiner Kallweit
Struct mmc_command includes a reference to the related mmc_request. Therefore we don't have to store mrq separately in struct meson_host. And we can remove some now unneeded WARN_ON's. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: meson-gx: improve initial configurationHeiner Kallweit
Config values which are not changed during runtime we can set in the probe function already. The block size setting is overwritten later in meson_mmc_start_cmd anyway if needed, so it doesn't harm if we remove this setting in meson_mmc_set_ios. In addition write config register only if configuration changed. Don't change the location of clock initialization as in an earlier version of the patch, this change causes a hang. This issue was reported and fix suggested by: Helmut Klein <hgkr.klein@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: meson-gx: remove unneeded devm_kstrdup in meson_mmc_clk_initHeiner Kallweit
CLK core does a deep copy of init.name, therefore it's fully ok to provide a local variable. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: meson-gx: fix error path in meson_mmc_clk_init / meson_mmc_probeHeiner Kallweit
The condition should be "if (ret)" as the disable/unprepare is supposed to be executed if the previous command fails. In addition adjust the error path in probe to properly deal with the case that cfg_div_clk can be registered successfully but enable/prepare fails. In this case we shouldn't call clk_disable_unprepare. Reported-by: Michał Zegan <webczat_200@poczta.onet.pl> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: meson-gx: remove member parent_mux from struct meson_hostHeiner Kallweit
Member mux_parent isn't used outside meson_mmc_clk_init. So remove it and replace it with a local variable in meson_mmc_clk_init. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: meson-gx: remove unneeded variable in meson_mmc_clk_initHeiner Kallweit
Because the DT requires a fixed number of mux parent clocks, variable mux_parent_count can be replaced with constant MUX_CLK_NUM_PARENTS, so remove it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: meson-gx: remove unused members irq, ocr_mask from struct meson_hostHeiner Kallweit
Member ocr_mask is never used and member irq we can replace with a local variable in meson_mmc_probe. So let's remove both members. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: meson-gx: make two functions return voidHeiner Kallweit
The return value of meson_mmc_request_done and meson_mmc_read_resp isn't used, so make both functions return void. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: meson-gx: simplify bounce buffer setting in meson_mmc_start_cmdHeiner Kallweit
Core ensures that there are no commands with cmd->data being set and nothing to transfer. And we don't have to reset bit CMD_CFG_DATA_NUM because cmd_cfg was zero-initialized and this bit isn't set. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: dw_mmc: improve dw_mci_reset a bitShawn Lin
Too much condition iteration makes the code less readable. Slightly improve it. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: dw_mmc: move mci_send_cmd forward to avoid declarationShawn Lin
No functional change intended. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: dw_mmc: remove declaration of dw_mci_card_busyShawn Lin
No need to declar it there, remove it. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: dw_mmc: move dw_mci_get_cd forward to avoid declarationShawn Lin
No functional change intended. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: dw_mmc: move dw_mci_ctrl_reset forward to avoid declarationShawn Lin
No functional change intended. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: dw_mmc: move dw_mci_reset forward to avoid declarationShawn Lin
No functional change intended. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24mmc: dw_mmc: improve the timeout polling codeShawn Lin
Just use the readl_poll_timeout{_atomic} to avold open coding them. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-24dm verity: switch to using asynchronous hash crypto APIGilad Ben-Yossef
Use of the synchronous digest API limits dm-verity to using pure CPU based algorithm providers and rules out the use of off CPU algorithm providers which are normally asynchronous by nature, potentially freeing CPU cycles. This can reduce performance per Watt in situations such as during boot time when a lot of concurrent file accesses are made to the protected volume. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> CC: Eric Biggers <ebiggers3@gmail.com> CC: Ondrej Mosnáček <omosnacek+linux-crypto@gmail.com> Tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24dm crypt: use WQ_HIGHPRI for the IO and crypt workqueuesTim Murray
Running dm-crypt with workqueues at the standard priority results in IO competing for CPU time with standard user apps, which can lead to pipeline bubbles and seriously degraded performance. Move to using WQ_HIGHPRI workqueues to protect against that. Signed-off-by: Tim Murray <timmurray@google.com> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24dm crypt: rewrite (wipe) key in crypto layer using random dataOndrej Kozina
The message "key wipe" used to wipe real key stored in crypto layer by rewriting it with zeroes. Since commit 28856a9 ("crypto: xts - consolidate sanity check for keys") this no longer works in FIPS mode for XTS. While running in FIPS mode the crypto key part has to differ from the tweak key. Fixes: 28856a9 ("crypto: xts - consolidate sanity check for keys") Cc: stable@vger.kernel.org Signed-off-by: Ondrej Kozina <okozina@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24dm mpath: requeue after a small delay if blk_get_request() failsBart Van Assche
If blk_get_request() returns ENODEV then multipath_clone_and_map() causes a request to be requeued immediately. This can cause a kworker thread to spend 100% of the CPU time of a single core in __blk_mq_run_hw_queue() and also can cause device removal to never finish. Avoid this by only requeuing after a delay if blk_get_request() fails. Additionally, reduce the requeue delay. Cc: stable@vger.kernel.org # 4.9+ Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24dm era: save spacemap metadata root after the pre-commitSomasundaram Krishnasamy
When committing era metadata to disk, it doesn't always save the latest spacemap metadata root in superblock. Due to this, metadata is getting corrupted sometimes when reopening the device. The correct order of update should be, pre-commit (shadows spacemap root), save the spacemap root (newly shadowed block) to in-core superblock and then the final commit. Cc: stable@vger.kernel.org Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24dm thin: fix a memory leak when passing discard bio downDennis Yang
dm-thin does not free the discard_parent bio after all chained sub bios finished. The following kmemleak report could be observed after pool with discard_passdown option processes discard bios in linux v4.11-rc7. To fix this, we drop the discard_parent bio reference when its endio (passdown_endio) called. unreferenced object 0xffff8803d6b29700 (size 256): comm "kworker/u8:0", pid 30349, jiffies 4379504020 (age 143002.776s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 01 00 00 00 00 00 00 f0 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81a5efd9>] kmemleak_alloc+0x49/0xa0 [<ffffffff8114ec34>] kmem_cache_alloc+0xb4/0x100 [<ffffffff8110eec0>] mempool_alloc_slab+0x10/0x20 [<ffffffff8110efa5>] mempool_alloc+0x55/0x150 [<ffffffff81374939>] bio_alloc_bioset+0xb9/0x260 [<ffffffffa018fd20>] process_prepared_discard_passdown_pt1+0x40/0x1c0 [dm_thin_pool] [<ffffffffa018b409>] break_up_discard_bio+0x1a9/0x200 [dm_thin_pool] [<ffffffffa018b484>] process_discard_cell_passdown+0x24/0x40 [dm_thin_pool] [<ffffffffa018b24d>] process_discard_bio+0xdd/0xf0 [dm_thin_pool] [<ffffffffa018ecf6>] do_worker+0xa76/0xd50 [dm_thin_pool] [<ffffffff81086239>] process_one_work+0x139/0x370 [<ffffffff810867b1>] worker_thread+0x61/0x450 [<ffffffff8108b316>] kthread+0xd6/0xf0 [<ffffffff81a6cd1f>] ret_from_fork+0x3f/0x70 [<ffffffffffffffff>] 0xffffffffffffffff Cc: stable@vger.kernel.org Signed-off-by: Dennis Yang <dennisyang@qnap.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24dm btree: fix for dm_btree_find_lowest_key()Vinothkumar Raja
dm_btree_find_lowest_key() is giving incorrect results. find_key() traverses the btree correctly for finding the highest key, but there is an error in the way it traverses the btree for retrieving the lowest key. dm_btree_find_lowest_key() fetches the first key of the rightmost block of the btree instead of fetching the first key from the leftmost block. Fix this by conditionally passing the correct parameter to value64() based on the @find_highest flag. Cc: stable@vger.kernel.org Signed-off-by: Erez Zadok <ezk@fsl.cs.sunysb.edu> Signed-off-by: Vinothkumar Raja <vinraja@cs.stonybrook.edu> Signed-off-by: Nidhi Panpalia <npanpalia@cs.stonybrook.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24Merge branch 'nfp-dma-adjust_head-fixes'David S. Miller
Jakub Kicinski says: ==================== nfp: DMA flags, adjust head and fixes This series takes advantage of Alex's DMA_ATTR_SKIP_CPU_SYNC to make XDP packet modifications "correct" from DMA API point of view. It also allows us to parse the metadata before we run XDP at no additional DMA sync cost. That way we can get rid of the metadata memcpy, and remove the last upstream user of bpf_prog->xdp_adjust_head. David's patch adds a way to read capabilities from the management firmware. There are also two net-next fixes. Patch 4 which fixes what seems to be a result of a botched rebase on my part. Patch 5 corrects locking when state of ethernet ports is being refreshed. v3: move the sync from alloc func to the actual give to hw func v2: sync rx buffers before giving them to the card (Alex) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24nfp: remove the refresh of all ports optimizationJakub Kicinski
The code refreshing the eth port state was trying to update state of all ports of the card. Unfortunately to safely walk the port list we would have to hold the port lock, which we can't due to lock ordering constraints against rtnl. Make the per-port sync refresh and async refresh of all ports completely separate routines. Fixes: 172f638c93dd ("nfp: add port state refresh") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24nfp: fix free list buffer size reportingJakub Kicinski
XDP headroom should not be included in free list buffer size. Fixes: 6fe0c3b43804 ("nfp: add support for xdp_adjust_head()") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24nfp: add NSP routine to get static informationDavid Brunecz
Retrieve identifying information from the NSP. For now it only contains versions of firmware subcomponents. Signed-off-by: David Brunecz <david.brunecz@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24nfp: parse metadata prepend before XDP runsJakub Kicinski
Calling memcpy to shift metadata out of the way for XDP to run seems like an overkill. The most common metadata contents are 8 bytes containing type and flow hash. Simply parse the metadata before we run XDP. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24nfp: make use of the DMA_ATTR_SKIP_CPU_SYNC attrJakub Kicinski
DMA unmap may destroy changes CPU made to the buffer. To make XDP run correctly on non-x86 platforms we should use the DMA_ATTR_SKIP_CPU_SYNC attribute. Thanks to using the attribute we can now push the sync operation to the common code path from XDP handler. A little bit of variable name reshuffling is required to bring the code back to readable state. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24dm ioctl: remove double parenthesesMatthias Kaehlcke
The extra pair of parantheses is not needed and causes clang to generate warnings about the DM_DEV_CREATE_CMD comparison in validate_params(). Also remove another double parentheses that doesn't cause a warning. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24Merge branch 'cls_flower-MPLS'David S. Miller
Benjamin LaHaise says: ==================== flower: add MPLS matching support This patch series adds support for parsing MPLS flows in the flow dissector and the flower classifier. Each of the MPLS TTL, BOS, TC and Label fields can be used for matching. v2: incorporate style feedback, move #defines to linux/include/mpls.h Note: this omits Jiri's request to remove tabs between the type and field names in struct declarations. This would be inconsistent with numerous other struct definitions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24cls_flower: add support for matching MPLS fields (v2)Benjamin LaHaise
Add support to the tc flower classifier to match based on fields in MPLS labels (TTL, Bottom of Stack, TC field, Label). Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Simon Horman <simon.horman@netronome.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Gao Feng <fgao@ikuai8.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24flow_dissector: add mpls support (v2)Benjamin LaHaise
Add support for parsing MPLS flows to the flow dissector in preparation for adding MPLS match support to cls_flower. Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Simon Horman <simon.horman@netronome.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Eric Dumazet <jhs@mojatatu.com> Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Gao Feng <fgao@ikuai8.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24Merge branch 'tcp-fastopen-middlebox-fixes'David S. Miller
Wei Wang says: ==================== net/tcp_fastopen: Fix for various TFO firewall issues Currently there are still some firewall issues in the middlebox which make the middlebox drop packets silently for TFO sockets. This kind of issue is hard to be detected by the end client. This patch series tries to detect such issues in the kernel and disable TFO temporarily. More details about the issues and the fixes are included in the following patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net/tcp_fastopen: Remove mss check in tcp_write_timeout()Wei Wang
Christoph Paasch from Apple found another firewall issue for TFO: After successful 3WHS using TFO, server and client starts to exchange data. Afterwards, a 10s idle time occurs on this connection. After that, firewall starts to drop every packet on this connection. The fix for this issue is to extend existing firewall blackhole detection logic in tcp_write_timeout() by removing the mss check. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net/tcp_fastopen: Add snmp counter for blackhole detectionWei Wang
This counter records the number of times the firewall blackhole issue is detected and active TFO is disabled. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net/tcp_fastopen: Disable active side TFO in certain scenariosWei Wang
Middlebox firewall issues can potentially cause server's data being blackholed after a successful 3WHS using TFO. Following are the related reports from Apple: https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf Slide 31 identifies an issue where the client ACK to the server's data sent during a TFO'd handshake is dropped. C ---> syn-data ---> S C <--- syn/ack ----- S C (accept & write) C <---- data ------- S C ----- ACK -> X S [retry and timeout] https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf Slide 5 shows a similar situation that the server's data gets dropped after 3WHS. C ---- syn-data ---> S C <--- syn/ack ----- S C ---- ack --------> S S (accept & write) C? X <- data ------ S [retry and timeout] This is the worst failure b/c the client can not detect such behavior to mitigate the situation (such as disabling TFO). Failing to proceed, the application (e.g., SSL library) may simply timeout and retry with TFO again, and the process repeats indefinitely. The proposed solution is to disable active TFO globally under the following circumstances: 1. client side TFO socket detects out of order FIN 2. client side TFO socket receives out of order RST We disable active side TFO globally for 1hr at first. Then if it happens again, we disable it for 2h, then 4h, 8h, ... And we reset the timeout to 1hr if a client side TFO sockets not opened on loopback has successfully received data segs from server. And we examine this condition during close(). The rational behind it is that when such firewall issue happens, application running on the client should eventually close the socket as it is not able to get the data it is expecting. Or application running on the server should close the socket as it is not able to receive any response from client. In both cases, out of order FIN or RST will get received on the client given that the firewall will not block them as no data are in those frames. And we want to disable active TFO globally as it helps if the middle box is very close to the client and most of the connections are likely to fail. Also, add a debug sysctl: tcp_fastopen_blackhole_detect_timeout_sec: the initial timeout to use when firewall blackhole issue happens. This can be set and read. When setting it to 0, it means to disable the active disable logic. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24Merge tag 'mlx5-updates-2017-04-22' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-04-22 Sparse and compiler warnings fixes from Stephen Hemminger. From Roi Dayan and Or Gerlitz, Add devlink and mlx5 support for controlling E-Switch encapsulation mode, this knob will enable HW support for applying encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net: add rcu locking when changing early demuxDavid Ahern
systemd-sysctl is triggering a suspicious RCU usage message when net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via a sysctl config file: [ 33.896184] =============================== [ 33.899558] [ ERR: suspicious RCU usage. ] [ 33.900624] 4.11.0-rc7+ #104 Not tainted [ 33.901698] ------------------------------- [ 33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious rcu_dereference_check() usage! [ 33.905724] other info that might help us debug this: [ 33.907656] rcu_scheduler_active = 2, debug_locks = 0 [ 33.909288] 1 lock held by systemd-sysctl/143: [ 33.910373] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff8123a370>] file_start_write+0x45/0x48 [ 33.912407] stack backtrace: [ 33.914018] CPU: 0 PID: 143 Comm: systemd-sysctl Not tainted 4.11.0-rc7+ #104 [ 33.915631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 33.917870] Call Trace: [ 33.918431] dump_stack+0x81/0xb6 [ 33.919241] lockdep_rcu_suspicious+0x10f/0x118 [ 33.920263] proc_configure_early_demux+0x65/0x10a [ 33.921391] proc_udp_early_demux+0x3a/0x41 add rcu locking to proc_configure_early_demux. Fixes: dddb64bcb3461 ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>