summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-03-04sh_eth: Mitigate lost statistics updatesBen Hutchings
The statistics registers have write-clear behaviour, which means we will lose any increment between the read and write. Mitigate this by only clearing when we read a non-zero value, so we will never falsely report a total of zero. This also saves time as we only handle error statistics here and they won't often be incremented. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04sh_eth: Optionally log RX and TX status for each completed descriptorBen Hutchings
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04sh_eth: Implement ethtool register dump operationsBen Hutchings
There are many different sets of registers implemented by the different versions of this controller, and we can only expect this to get more complicated in future. Limit how much ethtool needs to know by including an explicit bitmap of which registers are included in the dump, allowing room for future growth in the number of possible registers. As I don't have datasheets for all of these, I've only included registers that are: - defined in all 5 register type arrays, or - used by the driver, or - documented in the datasheet I have Add one new capability flag so we can tell whether the RTRATE register is implemented. Delete the TSU_ADRL0 and TSU_ADR{H,L}31 definitions, as they weren't used and the address table is already assumed to be contiguous. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04sh_eth: WARN on access to a register not implemented in a particular chipBen Hutchings
Currently we may silently read/write a register at offset 0. Change this to WARN and then ignore the write or read-back all-ones. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04sh_eth: Implement multicast statistic based on the RFS8 status bitBen Hutchings
At least on the R8A7790, RFS8 reflects the RINT8 (multicast) MAC status flag. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04Merge tag 'dma-buf-for-4.0-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf Pull dma-buf fixes from Sumit Semwal: "Minor timeout & other fixes on reservation/fence" * tag 'dma-buf-for-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf: reservation: Remove shadowing local variable 'ret' dma-buf/fence: don't wait when specified timeout is zero reservation: wait only with non-zero timeout specified (v3)
2015-03-04Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Marcelo Tosatti: "KVM bug fixes, including a SVM interrupt injection regression fix, MIPS and ARM bug fixes" * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: MIPS: Enable after disabling interrupt KVM: MIPS: Fix trace event to save PC directly KVM: SVM: fix interrupt injection (apic->isr_count always 0) KVM: emulate: fix CMPXCHG8B on 32-bit hosts KVM: VMX: fix build without CONFIG_SMP arm/arm64: KVM: Add exit reaons to kvm_exit event tracing ARM: KVM: Fix size check in __coherent_cache_guest_page
2015-03-04netfilter: nf_tables: fix error handling of rule replacementPablo Neira Ayuso
In general, if a transaction object is added to the list successfully, we can rely on the abort path to undo what we've done. This allows us to simplify the error handling of the rule replacement path in nf_tables_newrule(). This implicitly fixes an unnecessary removal of the old rule, which needs to be left in place if we fail to replace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04netfilter: nf_tables: fix userdata length overflowPatrick McHardy
The NFT_USERDATA_MAXLEN is defined to 256, however we only have a u8 to store its size. Introduce a struct nft_userdata which contains a length field and indicate its presence using a single bit in the rule. The length field of struct nft_userdata is also a u8, however we don't store zero sized data, so the actual length is udata->len + 1. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04netfilter: nf_tables: check for overflow of rule dlen fieldPatrick McHardy
Check that the space required for the expressions doesn't exceed the size of the dlen field, which would lead to the iterators crashing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04netfilter: nf_tables: fix transaction race conditionPatrick McHardy
A race condition exists in the rule transaction code for rules that get added and removed within the same transaction. The new rule starts out as inactive in the current and active in the next generation and is inserted into the ruleset. When it is deleted, it is additionally set to inactive in the next generation as well. On commit the next generation is begun, then the actions are finalized. For the new rule this would mean clearing out the inactive bit for the previously current, now next generation. However nft_rule_clear() clears out the bits for *both* generations, activating the rule in the current generation, where it should be deactivated due to being deleted. The rule will thus be active until the deletion is finalized, removing the rule from the ruleset. Similarly, when aborting a transaction for the same case, the undo of insertion will remove it from the RCU protected rule list, the deletion will clear out all bits. However until the next RCU synchronization after all operations have been undone, the rule is active on CPUs which can still see the rule on the list. Generally, there may never be any modifications of the current generations' inactive bit since this defeats the entire purpose of atomicity. Change nft_rule_clear() to only touch the next generations bit to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04Merge tag 'arc-4.0-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - Fix for /proc/<pid>/maps "stack" vma annotation - sched stats not printing correct sleeping task PC - perf not reporting page faults * tag 'arc-4.0-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: Fix thread_saved_pc() ARC: Fix KSTK_ESP() ARC: perf: Enable generic software events ARC: Make arc_unwind_core accessible externally
2015-03-04ASoC: omap-pcm: Correct dma maskPeter Ujfalusi
DMA_BIT_MASK of 64 is not valid dma address mask for OMAPs, it should be set to 32. The 64 was introduced by commit (in 2009): a152ff24b978 ASoC: OMAP: Make DMA 64 aligned But the dma_mask and coherent_dma_mask can not be used to specify alignment. Fixes: a152ff24b978 (ASoC: OMAP: Make DMA 64 aligned) Reported-by: Grygorii Strashko <Grygorii.Strashko@linaro.org> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
2015-03-04Merge tag 'powerpc-4.0-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc fixes from Michael Ellerman: - Fix for dynticks. - Fix for smpboot bug. - Fix for IOMMU group refcounting. * tag 'powerpc-4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: powerpc/iommu: Remove IOMMU device references via bus notifier powerpc/smp: Wait until secondaries are active & online powerpc: Re-enable dynticks
2015-03-04seq_buf: Fix seq_buf_vprintf() truncationSteven Rostedt (Red Hat)
In seq_buf_vprintf(), vsnprintf() is used to copy the format into the buffer remaining in the seq_buf structure. The return of vsnprintf() is the amount of characters written to the buffer excluding the '\0', unless the line was truncated! If the line copied does not fit, it is truncated, and a '\0' is added to the end of the buffer. But in this case, '\0' is included in the length of the line written. To know if the buffer had overflowed, the return length will be the same as the length of the buffer passed in. The check in seq_buf_vprintf() only checked if the length returned from vsnprintf() would fit in the buffer, as the seq_buf_vprintf() is only to be an all or nothing command. It either writes all the string into the seq_buf, or none of it. If the string is truncated, the pointers inside the seq_buf must be reset to what they were when the function was called. This is not the case. On overflow, it copies only part of the string. The fix is to change the overflow check to see if the length returned from vsnprintf() is less than the length remaining in the seq_buf buffer, and not if it is less than or equal to as it currently does. Then seq_buf_vprintf() will know if the write from vsnpritnf() was truncated or not. Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-03-04ACPI / video: Propagate the error code for acpi_video_registerChris Wilson
Report the actual error code from acpi_bus_register_driver(), it may help future debugging (typically ENODEV as previously reported, but the unusual cases are where it may help most). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-03-04ACPI / video: Load the module even if ACPI is disabledChris Wilson
i915.ko depends upon the acpi/video.ko module and so refuses to load if ACPI is disabled at runtime if for example the BIOS is broken beyond repair. acpi/video provides an optional service for i915.ko and so we should just allow the modules to load, but do no nothing in order to let the machines boot correctly. Reported-by: Bill Augur <bill-auger@programmer.net> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@intel.com> Cc: All applicable <stable@vger.kernel.org> Acked-by: Aaron Lu <aaron.lu@intel.com> [ rjw: Fixed up the new comment in acpi_video_init() ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-03-04ath10k: disable multi-vif ps by defaultMichal Kazior
Not all firmware revisions have a proper multi-interface client powersaving implementation, e.g. qca6174 WLAN.RM.2.0-00073. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-03-04ath10k: workaround qca6174 sta powersave issueMichal Kazior
qca6184 WLAN.RM.2.0-00073 has a bug in sta powersave state machine and requires peer param to be poked to enable the powersave. Calling this unconditionally should be safe for other chips/firmwares. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-03-04PM / Domains: cleanup: rename gpd -> genpd in debugfs interfaceKevin Hilman
To keep consisitency with the rest of the file, use 'genpd' as the name of the 'struct generic_pm_domain' pointer instead of 'gpd'. This is just a rename, no functional changes. Signed-off-by: Kevin Hilman <khilman@linaro.org> Acked-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-03-04cpufreq: ppc: Add missing #include <asm/smp.h>Geert Uytterhoeven
If CONFIG_SMP=n, <linux/smp.h> does not include <asm/smp.h>, causing: drivers/cpufreq/ppc-corenet-cpufreq.c: In function 'corenet_cpufreq_cpu_init': drivers/cpufreq/ppc-corenet-cpufreq.c:173:3: error: implicit declaration of function 'get_hard_smp_processor_id' [-Werror=implicit-function-declaration] Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-03-04x86/PCI/ACPI: Relax ACPI resource descriptor checks to work around BIOS bugsJiang Liu
Some BIOSes report incorrect length for ACPI address space descriptors, so relax the checks to avoid regressions. This issue has appeared several times as: 3162b6f0c5e1 ("PNPACPI: truncate _CRS windows with _LEN > _MAX - _MIN + 1") d558b483d5a7 ("x86/PCI: truncate _CRS windows with _LEN > _MAX - _MIN + 1") f238b414a74a ("PNPACPI: compute Address Space length rather than using _LEN") 48728e077480 ("x86/PCI: compute Address Space length rather than using _LEN") Please refer to https://bugzilla.kernel.org/show_bug.cgi?id=94221 for more details and example malformed ACPI resource descriptors. Link: https://bugzilla.kernel.org/show_bug.cgi?id=94221 Fixes: 593669c2ac0f (x86/PCI/ACPI: Use common ACPI resource interfaces ...) Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Dave Airlie <airlied@redhat.com> Tested-by: Prakash Punnoor <prakash@punnoor.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-03-04x86/PCI/ACPI: Ignore resources consumed by host bridge itselfJiang Liu
When parsing resources for PCI host bridge, we should ignore resources consumed by host bridge itself and only report window resources available to child PCI busses. Fixes: 593669c2ac0f (x86/PCI/ACPI: Use common ACPI resource interfaces ...) Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-03-04dma: mmp-tdma: refine dma disable and dma-pos updateQiao Zhou
Below are the refinements. 1. Set DMA abort bit when disabling dma channel. This will clear the remaining data in dma FIFO, to fix channel-swap issue. 2. Read DMA HW pointer when updating DMA status. Previously dma position is calculated by adding one period size in dma interrupt. This is inaccurate/insufficient for some high-quality audio APP. Since interrupt bottom half handler has variable schedule delay, it causes big error when calculating sample delay. Read the actual HW pointer and feedback can improve the accuracy. 3. Do some minor code clean. Signed-off-by: Qiao Zhou <zhouqiao@marvell.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-03-04regulator: Fix regression due to NULL constraints checkTakashi Iwai
The commit [39f802d6b6d9: 'regulator: Build sysfs entries with static attribute groups'] converted the sysfs entry creation to static attribute groups, but this resulted in a regression due to the NULL check of rdev->constraints. At the point where the device is registered, rdev->constraints isn't set, so the attributes depending on it are missing. We may fix it by shuffling the code order in regulator_register(), but a quicker fix is to just remove this NULL check. rdev->constraints is in anyway always set to non-NULL in set_machine_constraints(), thus the check there is basically superfluous. Fixes: 39f802d6b6d9 ('regulator: Build sysfs entries with static attribute groups') Signed-off-by: Takashi Iwai <tiwai@suse.de> Reportded-by: Steve Twiss <stwiss.opensource@diasemi.com> Tested-by: Steve Twiss <stwiss.opensource@diasemi.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-03-04ath10k: fix wmm params per vdevMarek Puzyniak
During wmm tests changing wmm parameters did not change anything. This was because of mismatch in WMM params per vdev command. WMM params per vdev uses different command structure than wmm params per pdev command. Patch concerns qca6174. Signed-off-by: Marek Puzyniak <marek.puzyniak@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-03-04ath10k: workaround corrupted htt rx eventsMichal Kazior
qca6174 WLAN.RM.2.0-00073 firmware uses full rx reordering offload and delivers Rx via a new HTT event. The event however is incorrectly generated in firmware and becomes overly long (with trailing garbage). This was hitting defined CE buffer limit that was programmed to the device and caused device to crash upon busier Rx traffic. Increasing the CE buffer limit for HTT Rx pipe to 2KBytes seems to be enough to workaround this problem. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-03-04ath10k: delete unnecessary checks before the function call "release_firmware"Markus Elfring
The release_firmware() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-03-04KVM: s390: non-LPAR case obsolete during facilities mask initMichael Mueller
With patch "include guest facilities in kvm facility test" it is no longer necessary to have special handling for the non-LPAR case. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-03-04KVM: s390: include guest facilities in kvm facility testMichael Mueller
Most facility related decisions in KVM have to take into account: - the facilities offered by the underlying run container (LPAR/VM) - the facilities supported by the KVM code itself - the facilities requested by a guest VM This patch adds the KVM driver requested facilities to the test routine. It additionally renames struct s390_model_fac to kvm_s390_fac and its field names to be more meaningful. The semantics of the facilities stored in the KVM architecture structure is changed. The address arch.model.fac->list now points to the guest facility list and arch.model.fac->mask points to the KVM facility mask. This patch fixes the behaviour of KVM for some facilities for guests that ignore the guest visible facility bits, e.g. guests could use transactional memory intructions on hosts supporting them even if the chosen cpu model would not offer them. The userspace interface is not affected by this change. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-03-04KVM: s390: fix in memory copy of facility listsMichael Mueller
The facility lists were not fully copied. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-03-04KVM: s390/cpacf: Fix kernel bug under z/VMChristian Borntraeger
Under z/VM PQAP might trigger an operation exception if no crypto cards are defined via APVIRTUAL or APDEDICATED. [ 386.098666] Kernel BUG at 0000000000135c56 [verbose debug info unavailable] [ 386.098693] illegal operation: 0001 ilc:2 [#1] SMP [...] [ 386.098751] Krnl PSW : 0704c00180000000 0000000000135c56 (kvm_s390_apxa_installed+0x46/0x98) [...] [ 386.098804] [<000000000013627c>] kvm_arch_init_vm+0x29c/0x358 [ 386.098806] [<000000000012d008>] kvm_dev_ioctl+0xc0/0x460 [ 386.098809] [<00000000002c639a>] do_vfs_ioctl+0x332/0x508 [ 386.098811] [<00000000002c660e>] SyS_ioctl+0x9e/0xb0 [ 386.098814] [<000000000070476a>] system_call+0xd6/0x258 [ 386.098815] [<000003fffc7400a2>] 0x3fffc7400a2 Lets add an extable entry and provide a zeroed config in that case. Reported-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Tested-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
2015-03-04bfin_can: Merge header file from arch dependent locationAaron Wu
Header file was in arch dependent location arch/blackfin/include/asm/bfin_can.h, Now move and merge the useful contents of header file into driver code, note the original header file is reserved for full registers set access test by other code so it survives. Signed-off-by: Aaron Wu <Aaron.wu@analog.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-03-04bfin_can: introduce ioremap to comply to archs with MMUAaron Wu
Blackfin was built without MMU, old driver code access the IO space by physical address, introduce the ioremap approach to be compitable with the common style supporting MMU enabled arch. Signed-off-by: Aaron Wu <Aaron.wu@analog.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-03-04bfin_can: rewrite the blackfin style of read/write to common onesAaron Wu
Replace the blackfin arch dependent style of bfin_read/bfin_write with common readw/writew Signed-off-by: Aaron Wu <Aaron.wu@analog.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-03-04drm/i915: gen4: work around hang during hibernationImre Deak
Bjørn reported that his machine hang during hibernation and eventually bisected the problem to the following commit: commit da2bc1b9db3351addd293e5b82757efe1f77ed1d Author: Imre Deak <imre.deak@intel.com> Date: Thu Oct 23 19:23:26 2014 +0300 drm/i915: add poweroff_late handler The problem seems to be that after the kernel puts the device into D3 the BIOS still tries to access it, or otherwise assumes that it's in D0. This is clearly bogus, since ACPI mandates that devices are put into D3 by the OSPM if they are not wake-up sources. In the future we want to unify more of the driver's runtime and system suspend paths, for example by skipping all the system suspend/hibernation hooks if the device is runtime suspended already. Accordingly for all other platforms the goal is still to properly power down the device during hibernation. v2: - Another GEN4 Lenovo laptop had the same issue, while platforms from other vendors (including mobile and desktop, GEN4 and non-GEN4) seem to work fine. Based on this apply the workaround on all GEN4 Lenovo platforms. - add code comment about failing platforms (Ville) Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html Reported-and-bisected-by: Bjørn Mork <bjorn@mork.no> Cc: stable@vger.kernel.org # v3.19 Signed-off-by: Imre Deak <imre.deak@intel.com> Acked-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-03-04drm/i915: Check for driver readyness before handling an underrun interruptChris Wilson
When we takeover from the BIOS and install our interrupt handler, the BIOS may have left us a few surprises in the form of spontaneous interrupts. (This is especially likely on hardware like 965gm where display fifo underruns are continuous and the GMCH cannot filter that interrupt souce.) As we enable our IRQ early so that we can use it during hardware probing, our interrupt handler must be prepared to handle a few sources prior to being fully configured. As such, we need to add a simple is-ready check prior to dereferencing our KMS state for reporting underruns. Reported-by: Rob Clark <rclark@redhat.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1193972 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org [Jani: dropped the extra !] Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-03-04Merge branch 'basic-mpls-support'David S. Miller
Eric W. Biederman says: ==================== Basic MPLS support take 2 On top of my two pending neighbour table prep patches here is the mpls support refactored to use them, and edited to not drop routes when an interface goes down. Additionally the addition of RTA_LLGATEWAY has been replaced with the addtion of RTA_VIA. RTA_VIA being an attribute that includes the address family as well as the address of the next hop. MPLS is at it's heart simple and I have endeavoured to maintain that simplicity in my implemenation. This is an implementation of a RFC3032 forwarding engine, and basic MPLS egress logic. Which should make linux sufficient to be a mpls forwarding node or to be a LSA (Label Switched Router) as it says in all of the MPLS documents. The ingress support will follow but it deserves it's own discussion so I am pushing it separately. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04mpls: Multicast route table change notificationsEric W. Biederman
Unlike IPv4 this code notifies on all cases where mpls routes are added or removed and it never automatically removes routes. Avoiding both the userspace confusion that is caused by omitting route updates and the possibility of a flood of netlink traffic when an interface goes doew. For now reserved labels are handled automatically and userspace is not notified. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04mpls: Netlink commands to add, remove, and dump routesEric W. Biederman
This change adds two new netlink routing attributes: RTA_VIA and RTA_NEWDST. RTA_VIA specifies the specifies the next machine to send a packet to like RTA_GATEWAY. RTA_VIA differs from RTA_GATEWAY in that it includes the address family of the address of the next machine to send a packet to. Currently the MPLS code supports addresses in AF_INET, AF_INET6 and AF_PACKET. For AF_INET and AF_INET6 the destination mac address is acquired from the neighbour table. For AF_PACKET the destination mac_address is specified in the netlink configuration. I think raw destination mac address support with the family AF_PACKET will prove useful. There is MPLS-TP which is defined to operate on machines that do not support internet packets of any flavor. Further seem to be corner cases where it can be useful. At this point I don't care much either way. RTA_NEWDST specifies the destination address to forward the packet with. MPLS typically changes it's destination address at every hop. For a swap operation RTA_NEWDST is specified with a length of one label. For a push operation RTA_NEWDST is specified with two or more labels. For a pop operation RTA_NEWDST is not specified or equivalently an emtpy RTAN_NEWDST is specified. Those new netlink attributes are used to implement handling of rt-netlink RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the MPLS label table. rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message, verify no unhandled attributes or unhandled values are present and sets up the data structures for mpls_route_add and mpls_route_del. I did my best to match up with the existing conventions with the caveats that MPLS addresses are all destination-specific-addresses, and so don't properly have a scope. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04mpls: Functions for reading and wrinting mpls labels over netlinkEric W. Biederman
Reading and writing addresses in network byte order in netlink is traditional and I see no reason to change that. MPLS is interesting as effectively it has variabely length addresses (the MPLS label stack). To represent these variable length addresses in netlink I use a valid MPLS label stack (complete with stop bit). This achieves two things: a well defined existing format is used, and the data can be interpreted without looking at it's length. Not needed to look at the length to decode the variable length network representation allows existing userspace functions such as inet_ntop to be used without needed to change their prototype. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04mpls: Basic support for adding and removing routesEric W. Biederman
mpls_route_add and mpls_route_del implement the basic logic for adding and removing Next Hop Label Forwarding Entries from the MPLS input label map. The addition and subtraction is done in a way that is consistent with how the existing routing table in Linux are maintained. Thus all of the work to deal with NLM_F_APPEND, NLM_F_EXCL, NLM_F_REPLACE, and NLM_F_CREATE. Cases that are not clearly defined such as changing the interpretation of the mpls reserved labels is not allowed. Because it seems like the right thing to do adding an MPLS route without specifying an input label and allowing the kernel to pick a free label table entry is supported. The implementation is currently less than optimal but that can be changed. As I don't have anything else to test with only ethernet and the loopback device are the only two device types currently supported for forwarding MPLS over. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04mpls: Add a sysctl to control the size of the mpls label tableEric W. Biederman
This sysctl gives two benefits. By defaulting the table size to 0 mpls even when compiled in and enabled defaults to not forwarding any packets. This prevents unpleasant surprises for users. The other benefit is that as mpls labels are allocated locally a dense table a small dense label table may be used which saves memory and is extremely simple and efficient to implement. This sysctl allows userspace to choose the restrictions on the label table size userspace applications need to cope with. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04mpls: Basic routing supportEric W. Biederman
This change adds a new Kconfig option MPLS_ROUTING. The core of this change is the code to look at an mpls packet received from another machine. Look that packet up in a routing table and forward the packet on. Support of MPLS over ATM is not considered or attempted here. This implemntation follows RFC3032 and implements the MPLS shim header that can pass over essentially any network. What RFC3021 refers to as the as the Incoming Label Map (ILM) I call net->mpls.platform_label[]. What RFC3031 refers to as the Next Label Hop Forwarding Entry (NHLFE) I call mpls_route. Though calling it the label fordwarding information base (lfib) might also be valid. Further the implemntation forwards packets as described in RFC3032. There is no need and given the original motivation for MPLS a strong discincentive to have a flexible label forwarding path. In essence the logic is the topmost label is read, looked up, removed, and replaced by 0 or more new lables and the sent out the specified interface to it's next hop. Quite a few optional features are not implemented here. Among them are generation of ICMP errors when the TTL is exceeded or the packet is larger than the next hop MTU (those conditions are detected and the packets are dropped instead of generating an icmp error). The traffic class field is always set to 0. The implementation focuses on IP over MPLS and does not handle egress of other kinds of protocols. Instead of implementing coordination with the neighbour table and sorting out how to input next hops in a different address family (for which there is value). I was lazy and implemented a next hop mac address instead. The code is simpler and there are flavor of MPLS such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is appropriate so a next hop by mac address would need to be implemented at some point. Two new definitions AF_MPLS and PF_MPLS are exposed to userspace. Decoding the mpls header must be done by first byeswapping a 32bit bit endian word into the local cpu endian and then bit shifting to extract the pieces. There is no C bit-field that can represent a wire format mpls header on a little endian machine as the low bits of the 20bit label wind up in the wrong half of third byte. Therefore internally everything is deal with in cpu native byte order except when writing to and reading from a packet. For management simplicity if a label is configured to forward out an interface that is down the packet is dropped early. Similarly if an network interface is removed rt_dev is updated to NULL (so no reference is preserved) and any packets for that label are dropped. Keeping the label entries in the kernel allows the kernel label table to function as the definitive source of which labels are allocated and which are not. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04mpls: Refactor how the mpls module is builtEric W. Biederman
This refactoring is needed to allow more than just mpls gso support to be built into the mpls moddule. Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04Merge branch 'neigh-mpls-prep'David S. Miller
Eric W. Biederman says: ==================== Neighbour table prep for MPLS In preparation for using the IPv4 and IPv6 neighbour tables in my mpls code this patchset factors out ___neigh_lookup_noref from __ipv4_neigh_lookup_noref, __ipv6_lookup_noref and neigh_lookup. Allowing the lookup logic to be shared between the different implementations. At what appears to be no cost. (Aka the same assembly is generated for ip6_finish_output2 and ip_finish_output2). After that I add a simple function that takes an address family and an address consults the neighbour table and sends the packet to the appropriate location. The address family argument decoupls callers of neigh_xmit from the addresses families the packets are sent over. (Aka The ipv6 module can be loaded after mpls and a previously configured ipv6 next hop will start working). The refactoring in ___neigh_lookup_noref may be a bit overkill but it feels like the right thing to do. Especially since the same code is generated. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04neigh: Add helper function neigh_xmitEric W. Biederman
For MPLS I am building the code so that either the neighbour mac address can be specified or we can have a next hop in ipv4 or ipv6. The kind of next hop we have is indicated by the neighbour table pointer. A neighbour table pointer of NULL is a link layer address. A non-NULL neighbour table pointer indicates which neighbour table and thus which address family the next hop address is in that we need to look up. The code either sends a packet directly or looks up the appropriate neighbour table entry and sends the packet. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04neigh: Factor out ___neigh_lookup_norefEric W. Biederman
While looking at the mpls code I found myself writing yet another version of neigh_lookup_noref. We currently have __ipv4_lookup_noref and __ipv6_lookup_noref. So to make my work a little easier and to make it a smidge easier to verify/maintain the mpls code in the future I stopped and wrote ___neigh_lookup_noref. Then I rewote __ipv4_lookup_noref and __ipv6_lookup_noref in terms of this new function. I tested my new version by verifying that the same code is generated in ip_finish_output2 and ip6_finish_output2 where these functions are inlined. To get to ___neigh_lookup_noref I added a new neighbour cache table function key_eq. So that the static size of the key would be available. I also added __neigh_lookup_noref for people who want to to lookup a neighbour table entry quickly but don't know which neibhgour table they are going to look up. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04bridge: fix bridge netlink RCU usageJohannes Berg
When the STP timer fires, it can call br_ifinfo_notify(), which in turn ends up in the new br_get_link_af_size(). This function is annotated to be using RTNL locking, which clearly isn't the case here, and thus lockdep warns: =============================== [ INFO: suspicious RCU usage. ] 3.19.0+ #569 Not tainted ------------------------------- net/bridge/br_private.h:204 suspicious rcu_dereference_protected() usage! Fix this by doing RCU locking here. Fixes: b7853d73e39b ("bridge: add vlan info to bridge setlink and dellink notification messages") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03NFSv4.1: Clear the old state by our client id before establishing a new leaseTrond Myklebust
If the call to exchange-id returns with the EXCHGID4_FLAG_CONFIRMED_R flag set, then that means our lease was established by a previous mount instance. Ensure that we detect this situation, and that we clear the state held by that mount. Reported-by: Jorge Mora <Jorge.Mora@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>