summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2018-08-09phylink: add helper for configuring 2500BaseX modesRussell King
Add a helper for MAC drivers to use in their validate callback to deal with 2500BaseX vs 1000BaseX modes, where the hardware supports both but it is not possible to automatically select between them. This helper defaults to 1000BaseX, as that is the 802.3 standard, and will allow users to select 2500BaseX either by forcing the speed if AN is disabled, or by changing the advertising mask if AN is enabled. Disabling AN is not recommended as it is only the speed that we're interested in controlling, not the duplex or pause mode parameters. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09signal: Don't restart fork when signals come in.Eric W. Biederman
Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn> report that a periodic signal received during fork can cause fork to continually restart preventing an application from making progress. The code was being overly pessimistic. Fork needs to guarantee that a signal sent to multiple processes is logically delivered before the fork and just to the forking process or logically delivered after the fork to both the forking process and it's newly spawned child. For signals like periodic timers that are always delivered to a single process fork can safely complete and let them appear to logically delivered after the fork(). While examining this issue I also discovered that fork today will miss signals delivered to multiple processes during the fork and handled by another thread. Similarly the current code will also miss blocked signals that are delivered to multiple process, as those signals will not appear pending during fork. Add a list of each thread that is currently forking, and keep on that list a signal set that records all of the signals sent to multiple processes. When fork completes initialize the new processes shared_pending signal set with it. The calculate_sigpending function will see those signals and set TIF_SIGPENDING causing the new task to take the slow path to userspace to handle those signals. Making it appear as if those signals were received immediately after the fork. It is not possible to send real time signals to multiple processes and exceptions don't go to multiple processes, which means that that are no signals sent to multiple processes that require siginfo. This means it is safe to not bother collecting siginfo on signals sent during fork. The sigaction of a child of fork is initially the same as the sigaction of the parent process. So a signal the parent ignores the child will also initially ignore. Therefore it is safe to ignore signals sent to multiple processes and ignored by the forking process. Signals sent to only a single process or only a single thread and delivered during fork are treated as if they are received after the fork, and generally not dealt with. They won't cause any problems. V2: Added removal from the multiprocess list on failure. V3: Use -ERESTARTNOINTR directly V4: - Don't queue both SIGCONT and SIGSTOP - Initialize signal_struct.multiprocess in init_task - Move setting of shared_pending to before the new task is visible to signals. This prevents signals from comming in before shared_pending.signal is set to delayed.signal and being lost. V5: - rework list add and delete to account for idle threads v6: - Use sigdelsetmask when removing stop signals Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447 Reported-by: Wen Yang <wen.yang99@zte.com.cn> and Reported-by: majiang <ma.jiang@zte.com.cn> Fixes: 4a2c7a7837da ("[PATCH] make fork() atomic wrt pgrp/session signals") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-09NFS handle COPY reply CB_OFFLOAD call raceOlga Kornievskaia
It's possible that server replies back with CB_OFFLOAD call and COPY reply at the same time such that client will process CB_OFFLOAD before reply to COPY. For that keep a list of pending callback stateids received and then before waiting on completion check the pending list. Cleanup any pending copies on the client shutdown. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-08-09NFS add support for asynchronous COPYOlga Kornievskaia
Change xdr to always send COPY asynchronously. Keep the list copies send in a list under a server structure. Once copy is sent, it waits on a completion structure that will be signalled by the callback thread that receives CB_OFFLOAD. If CB_OFFLOAD returned an error and even if it returned partial bytes, ignore them (as we can't commit without a verifier to match) and return an error. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-08-09NFS COPY xdr handle async replyOlga Kornievskaia
If server returns async reply, it must include a callback stateid, wr_callback_id in the write_response4. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-08-09NFS OFFLOAD_CANCEL xdrOlga Kornievskaia
Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-08-09ssb: Remove SSB_WARN_ON, SSB_BUG_ON and SSB_DEBUGMichael Büsch
Use the standard WARN_ON instead. If a small kernel is desired, WARN_ON can be disabled globally. Also remove SSB_DEBUG. Besides WARN_ON it only adds a tiny debug check. Include this check unconditionally. Signed-off-by: Michael Buesch <m@bues.ch> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-08-09blkcg: Introduce blkg_root_lookup()Bart Van Assche
This new function will be used in a later patch to verify whether a queue has been dissociated from the cgroup controller before being released. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alexandru Moise <00moses.alexander00@gmail.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-09block: Remove two superfluous #include directivesBart Van Assche
Commit 12f5b9314545 ("blk-mq: Remove generation seqeunce") removed the only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but did not remove the corresponding #include directives. Since these include directives are no longer needed, remove them. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Keith Busch <keith.busch@intel.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Hannes Reinecke <hare@suse.com>, Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-09Merge branch 'asoc-4.19' into asoc-nextMark Brown
2018-08-09Merge branch 'regmap-4.19' into regmap-nextMark Brown
2018-08-09Merge tag 'regmap-noinc-read' into regmap-4.19Mark Brown
regmap: Support non-incrementing registers Some devices have individual registers that don't autoincrement the register address during bulk reads but instead repeatedly read the same value, for example for monitoring GPIOs or ADCs. Add support for these.
2018-08-09regmap: Add regmap_noinc_read APICrestez Dan Leonard
The regmap API usually assumes that bulk read operations will read a range of registers but some I2C/SPI devices have certain registers for which a such a read operation will return data from an internal FIFO instead. Add an explicit API to support bulk read without range semantics. Some linux drivers use regmap_bulk_read or regmap_raw_read for such registers, for example mpu6050 or bmi150 from IIO. This only happens to work because when caching is disabled a single regmap read op will map to a single bus read op (as desired). This breaks if caching is enabled and reg+1 happens to be a cacheable register. Without regmap support refactoring a driver to enable regmap caching requires separate I2C and SPI paths. This is exactly what regmap is supposed to help avoid. Suggested-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Crestez Dan Leonard <leonard.crestez@intel.com> Signed-off-by: Stefan Popa <stefan.popa@analog.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-08-08net/mlx5: Remove unused mlx5_query_vport_admin_stateEli Cohen
mlx5_query_vport_admin_state() is not used anywhere. Remove it. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/mlx5: Rename modify/query_vport state related enumsEran Ben Elisha
Modify and query vport state commands share the same admin_state and op_mod values, rename the enums to fit them both. In addition, remove the esw prefix from the admin state enum as this also applied for vnic. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/mlx5: Use max_num_eqs for calculation of required MSIX vectorsDenis Drozdov
New firmware has defined new HCA capability field called "max_num_eqs", that is the number of available EQs after subtracting reserved FW EQs. Before this capability the FW reported the EQ number in "log_max_eqs", the reported value also contained FW reserved EQs, but the driver might be failing to load on 320 cpus systems due to the fact that FW reserved EQs were not available to the driver. Now the driver has to obtain max_num_eqs value from new FW to get real number of EQs available. Signed-off-by: Denis Drozdov <denisd@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08i2c: core: Parse SDA hold time from firmwareAndy Shevchenko
There are two drivers already using the SDA hold time setting. It might be more in the future, thus, make I2C core to parse the setting for us if provided by firmware. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-08-08netfilter: nfnetlink_osf: add missing enum in nfnetlink_osf uapi headerFernando Fernandez Mancera
xt_osf_window_size_options was originally part of include/uapi/linux/netfilter/xt_osf.h, restore it. Fixes: bfb15f2a95cb ("netfilter: extract Passive OS fingerprint infrastructure from xt_osf") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-08overflow.h: Add arithmetic shift helperJason Gunthorpe
Add shift_overflow() helper to assist driver authors in ensuring that shift operations don't cause overflows or other odd conditions. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> [kees: tweaked comments and commit log, dropped unneeded assignment] Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-08Merge branches 'arm/shmobile', 'arm/renesas', 'arm/msm', 'arm/smmu', ↵Joerg Roedel
'arm/omap', 'x86/amd', 'x86/vt-d' and 'core' into next
2018-08-08nvme.h: add support for ns write protect definitionsChaitanya Kulkarni
Add various definitions from NVMe 1.3 TP 4005. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-08nvme.h: fixup ANA group descriptor formatHannes Reinecke
ANA Phase 3 draft had the 'reserved' field in the group descriptor format set to '23:17' (so that the first namespace identifier started at byte 24), but that got move with the approved TP to '31:17' (so that the first namespace identifier started at byte 32). Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-08iommu: Remove the ->map_sg indirectionChristoph Hellwig
All iommu drivers use the default_iommu_map_sg implementation, and there is no good reason to ever override it. Just expose it as iommu_map_sg directly and remove the indirection, specially in our post-spectre world where indirect calls are horribly expensive. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-08-07net: phy: Add support for Broadcom Omega internal Combo GPHYArun Parameswaran
Add support for the Broadcom Omega SoC internal Combo Ethernet GPHY to the bcm7xxx phy driver. Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-08-07 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add cgroup local storage for BPF programs, which provides a fast accessible memory for storing various per-cgroup data like number of transmitted packets, etc, from Roman. 2) Support bpf_get_socket_cookie() BPF helper in several more program types that have a full socket available, from Andrey. 3) Significantly improve the performance of perf events which are reported from BPF offload. Also convert a couple of BPF AF_XDP samples overto use libbpf, both from Jakub. 4) seg6local LWT provides the End.DT6 action, which allows to decapsulate an outer IPv6 header containing a Segment Routing Header. Adds this action now to the seg6local BPF interface, from Mathieu. 5) Do not mark dst register as unbounded in MOV64 instruction when both src and dst register are the same, from Arthur. 6) Define u_smp_rmb() and u_smp_wmb() to their respective barrier instructions on arm64 for the AF_XDP sample code, from Brian. 7) Convert the tcp_client.py and tcp_server.py BPF selftest scripts over from Python 2 to Python 3, from Jeremy. 8) Enable BTF build flags to the BPF sample code Makefile, from Taeung. 9) Remove an unnecessary rcu_read_lock() in run_lwt_bpf(), from Taehee. 10) Several improvements to the README.rst from the BPF documentation to make it more consistent with RST format, from Tobin. 11) Replace all occurrences of strerror() by calls to strerror_r() in libbpf and fix a FORTIFY_SOURCE build error along with it, from Thomas. 12) Fix a bug in bpftool's get_btf() function to correctly propagate an error via PTR_ERR(), from Yue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08powerpc/64s: Fix page table fragment refcount race vs speculative referencesNicholas Piggin
The page table fragment allocator uses the main page refcount racily with respect to speculative references. A customer observed a BUG due to page table page refcount underflow in the fragment allocator. This can be caused by the fragment allocator set_page_count stomping on a speculative reference, and then the speculative failure handler decrements the new reference, and the underflow eventually pops when the page tables are freed. Fix this by using a dedicated field in the struct page for the page table fragment allocator. Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage") Cc: stable@vger.kernel.org # v3.10+ Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07cpu/hotplug: Fix SMT supported evaluationThomas Gleixner
Josh reported that the late SMT evaluation in cpu_smt_state_init() sets cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied on the kernel command line as it cannot differentiate between SMT disabled by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and makes the sysfs interface unusable. Rework this so that during bringup of the non boot CPUs the availability of SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a 'primary' thread then set the local cpu_smt_available marker and evaluate this explicitely right after the initial SMP bringup has finished. SMT evaulation on x86 is a trainwreck as the firmware has all the information _before_ booting the kernel, but there is no interface to query it. Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-06Merge branch 'x86/pti-urgent' into x86/ptiThomas Gleixner
Integrate the PTI Global bit fixes which conflict with the 32bit PTI support. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-06Revert "ata: ahci_platform: allow disabling of hotplug to save power"Tejun Heo
This reverts commit aece27a2f01be4bb7683790f69cd1bed3a0929a2. Causes boot failure on some devices. http://lore.kernel.org/r/CA+G9fYuKW_jCFZPqG4tz=QY9ROfHO38KiCp9XTA+KaDOFVtcqQ@mail.gmail.com Signed-off-by: Tejun Heo <tj@kernel.org>
2018-08-06KVM: x86: Add tlb remote flush callback in kvm_x86_ops.Tianyu Lan
This patch is to provide a way for platforms to register hv tlb remote flush callback and this helps to optimize operation of tlb flush among vcpus for nested virtualization case. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: Switch 'requests' to be 64-bit (explicitly)KarimAllah Ahmed
Switch 'requests' to be explicitly 64-bit and update BUILD_BUG_ON check to use the size of "requests" instead of the hard-coded '32'. That gives us a bit more room again for arch-specific requests as we already ran out of space for x86 due to the hard-coded check. The only exception here is ARM32 as it is still 32-bits. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06Merge tag 'v4.18-rc6' into HEADPaolo Bonzini
Pull bug fixes into the KVM development tree to avoid nasty conflicts.
2018-08-06PM / reboot: Eliminate race between reboot and suspendPingfan Liu
At present, "systemctl suspend" and "shutdown" can run in parrallel. A system can suspend after devices_shutdown(), and resume. Then the shutdown task goes on to power off. This causes many devices are not really shut off. Hence replacing reboot_mutex with system_transition_mutex (renamed from pm_mutex) to achieve the exclusion. The renaming of pm_mutex as system_transition_mutex can be better to reflect the purpose of the mutex. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-06Merge back cpufreq changes for 4.19.Rafael J. Wysocki
2018-08-05Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-4.19/block2Jens Axboe
Pull NVMe changes from Christoph: "This contains the support for TP4004, Asymmetric Namespace Access, which makes NVMe multipathing usable in practice." * 'nvme-4.19' of git://git.infradead.org/nvme: nvmet: use Retain Async Event bit to clear AEN nvmet: support configuring ANA groups nvmet: add minimal ANA support nvmet: track and limit the number of namespaces per subsystem nvmet: keep a port pointer in nvmet_ctrl nvme: add ANA support nvme: remove nvme_req_needs_failover nvme: simplify the API for getting log pages nvme.h: add ANA definitions nvme.h: add support for the log specific field Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-05Merge tag 'v4.18-rc6' into for-4.19/block2Jens Axboe
Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a merge conflict down the line. Signed-of-by: Jens Axboe <axboe@kernel.dk>
2018-08-05Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2018-08-05 Here's the main bluetooth-next pull request for the 4.19 kernel. - Added support for Bluetooth Advertising Extensions - Added vendor driver support to hci_h5 HCI driver - Added serdev support to hci_h5 driver - Added support for Qualcomm wcn3990 controller - Added support for RTL8723BS and RTL8723DS controllers - btusb: Added new ID for Realtek 8723DE - Several other smaller fixes & cleanups Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05ip: use rb trees for IP frag queue.Peter Oskolkov
Similar to TCP OOO RX queue, it makes sense to use rb trees to store IP fragments, so that OOO fragments are inserted faster. Tested: - a follow-up patch contains a rather comprehensive ip defrag self-test (functional) - ran neper `udp_stream -c -H <host> -F 100 -l 300 -T 20`: netstat --statistics Ip: 282078937 total packets received 0 forwarded 0 incoming packets discarded 946760 incoming packets delivered 18743456 requests sent out 101 fragments dropped after timeout 282077129 reassemblies required 944952 packets reassembled ok 262734239 packet reassembles failed (The numbers/stats above are somewhat better re: reassemblies vs a kernel without this patchset. More comprehensive performance testing TBD). Reported-by: Jann Horn <jannh@google.com> Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05net: modify skb_rbtree_purge to return the truesize of all purged skbs.Peter Oskolkov
Tested: see the next patch is the series. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05ptp_qoriq: support automatic configuration for ptp timerYangbo Lu
This patch is to support automatic configuration for ptp timer. If required ptp dts properties are not provided, driver could try to calculate a set of default configurations to initialize the ptp timer. This makes the driver work for many boards which don't have the required ptp dts properties in current kernel. Also the users could set dts properties by themselves according to their requirement. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree: 1) Support for transparent proxying for nf_tables, from Mate Eckl. 2) Patchset to add OS passive fingerprint recognition for nf_tables, from Fernando Fernandez. This takes common code from xt_osf and place it into the new nfnetlink_osf module for codebase sharing. 3) Lightweight tunneling support for nf_tables. 4) meta and lookup are likely going to be used in rulesets, make them direct calls. From Florian Westphal. A bunch of incremental updates: 5) use PTR_ERR_OR_ZERO() from nft_numgen, from YueHaibing. 6) Use kvmalloc_array() to allocate hashtables, from Li RongQing. 7) Explicit dependencies between nfnetlink_cttimeout and conntrack timeout extensions, from Harsha Sharma. 8) Simplify NLM_F_CREATE handling in nf_tables. 9) Removed unused variable in the get element command, from YueHaibing. 10) Expose bridge hook priorities through uapi, from Mate Eckl. And a few fixes for previous Netfilter batch for net-next: 11) Use per-netns mutex from flowtable event, from Florian Westphal. 12) Remove explicit dependency on iptables CT target from conntrack zones, from Florian. 13) Fix use-after-free in rmmod nf_conntrack path, also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Lots of overlapping changes, mostly trivial in nature. The mlxsw conflict was resolving using the example resolution at: https://github.com/jpirko/linux_mlxsw/blob/combined_queue/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05Merge 4.18-rc7 into master to pick up the KVM dependcyThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-04i2c: quirks: add zero length checksWolfram Sang
Some adapters do not support a message length of 0. Add this as a quirk so drivers don't have to open code it. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-08-03fork: Have new threads join on-going signal group stopsEric W. Biederman
There are only two signals that are delivered to every member of a signal group: SIGSTOP and SIGKILL. Signal delivery requires every signal appear to be delivered either before or after a clone syscall. SIGKILL terminates the clone so does not need to be considered. Which leaves only SIGSTOP that needs to be considered when creating new threads. Today in the event of a group stop TIF_SIGPENDING will get set and the fork will restart ensuring the fork syscall participates in the group stop. A fork (especially of a process with a lot of memory) is one of the most expensive system so we really only want to restart a fork when necessary. It is easy so check to see if a SIGSTOP is ongoing and have the new thread join it immediate after the clone completes. Making it appear the clone completed happened just before the SIGSTOP. The calculate_sigpending function will see the bits set in jobctl and set TIF_SIGPENDING to ensure the new task takes the slow path to userspace. V2: The call to task_join_group_stop was moved before the new task is added to the thread group list. This should not matter as sighand->siglock is held over both the addition of the threads, the call to task_join_group_stop and do_signal_stop. But the change is trivial and it is one less thing to worry about when reading the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-03fork: Skip setting TIF_SIGPENDING in ptrace_init_taskEric W. Biederman
The code in calculate_sigpending will now handle this so it is just redundant and possibly a little confusing to continue setting TIF_SIGPENDING in ptrace_init_task. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-03signal: Add calculate_sigpending()Eric W. Biederman
Add a function calculate_sigpending to test to see if any signals are pending for a new task immediately following fork. Signals have to happen either before or after fork. Today our practice is to push all of the signals to before the fork, but that has the downside that frequent or periodic signals can make fork take much much longer than normal or prevent fork from completing entirely. So we need move signals that we can after the fork to prevent that. This updates the code to set TIF_SIGPENDING on a new task if there are signals or other activities that have moved so that they appear to happen after the fork. As the code today restarts if it sees any such activity this won't immediately have an effect, as there will be no reason for it to set TIF_SIGPENDING immediately after the fork. Adding calculate_sigpending means the code in fork can safely be changed to not always restart if a signal is pending. The new calculate_sigpending function sets sigpending if there are pending bits in jobctl, pending signals, the freezer needs to freeze the new task or the live kernel patching framework need the new thread to take the slow path to userspace. I have verified that setting TIF_SIGPENDING does make a new process take the slow path to userspace before it executes it's first userspace instruction. I have looked at the callers of signal_wake_up and the code paths setting TIF_SIGPENDING and I don't see anything else that needs to be handled. The code probably doesn't need to set TIF_SIGPENDING for the kernel live patching as it uses a separate thread flag as well. But at this point it seems safer reuse the recalc_sigpending logic and get the kernel live patching folks to sort out their story later. V2: I have moved the test into schedule_tail where siglock can be grabbed and recalc_sigpending can be reused directly. Further as the last action of setting up a new task this guarantees that TIF_SIGPENDING will be properly set in the new process. The helper calculate_sigpending takes the siglock and uncontitionally sets TIF_SIGPENDING and let's recalc_sigpending clear TIF_SIGPENDING if it is unnecessary. This allows reusing the existing code and keeps maintenance of the conditions simple. Oleg Nesterov <oleg@redhat.com> suggested the movement and pointed out the need to take siglock if this code was going to be called while the new task is discoverable. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-03new helper: inode_fake_hash()Al Viro
open-coded in a quite a few places... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03new primitive: discard_new_inode()Al Viro
We don't want open-by-handle picking half-set-up in-core struct inode from e.g. mkdir() having failed halfway through. In other words, we don't want such inodes returned by iget_locked() on their way to extinction. However, we can't just have them unhashed - otherwise open-by-handle immediately *after* that would've ended up creating a new in-core inode over the on-disk one that is in process of being freed right under us. Solution: new flag (I_CREATING) set by insert_inode_locked() and removed by unlock_new_inode() and a new primitive (discard_new_inode()) to be used by such halfway-through-setup failure exits instead of unlock_new_inode() / iput() combinations. That primitive unlocks new inode, but leaves I_CREATING in place. iget_locked() treats finding an I_CREATING inode as failure (-ESTALE, once we sort out the error propagation). insert_inode_locked() treats the same as instant -EBUSY. ilookup() treats those as icache miss. [Fix by Dan Carpenter <dan.carpenter@oracle.com> folded in] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03netfilter: bridge: Expose nf_tables bridge hook priorities through uapiMáté Eckl
Netfilter exposes standard hook priorities in case of ipv4, ipv6 and arp but not in case of bridge. This patch exposes the hook priority values of the bridge family (which are different from the formerly mentioned) via uapi so that they can be used by user-space applications just like the others. Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>