linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2011-12-15	cfg80211: validate nl80211 station handling better	Johannes Berg
	The nl80211 station handling code is a bit messy and doesn't do a lot of validation. It seems like this could be an issue for drivers that don't use mac80211 to validate everything. As cfg80211 doesn't keep station state, move the validation of allowing supported_rates to change for TDLS only in station mode to mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-12-15	Move limit definitions outside CONFIG_INET	Glauber Costa
	They need to be available for other protocols as well, since they are used in sock.c openly Signed-off-by: Glauber Costa <glommer@parallels.com> CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-15	[S390] cputime: add sparse checking and cleanup	Martin Schwidefsky
	Make cputime_t and cputime64_t nocast to enable sparse checking to detect incorrect use of cputime. Drop the cputime macros for simple scalar operations. The conversion macros are still needed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-15	sched: Mark parent and real_parent as __rcu	Kees Cook
	The parent and real_parent pointers should be considered __rcu, since they should be held under either tasklist_lock or rcu_read_lock. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20111214223925.GA27578@www.outflux.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-15	Merge commit 'v3.2-rc5' into sched/core	Ingo Molnar
	Merge reason: Pick up the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-15	[SCSI] fcoe: fix fcoe in a DCB environment by adding DCB notifiers to set ↵	john fastabend
	skb priority Use DCB notifiers to set the skb priority to allow packets to be steered and tagged correctly over DCB enabled drivers that setup traffic classes. This allows queue_mapping() routines to be removed in these drivers that were previously inspecting the ethertype of every skb to mark FCoE/FIP frames. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-12-14	xen-balloon: convert sysdev_class to a regular subsystem	Kay Sievers
	After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-14	edac: convert sysdev_class to a regular subsystem	Kay Sievers
	After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Cc: Doug Thompson <dougthompson@xmission.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-14	driver-core: implement 'sysdev' functionality for regular devices and buses	Kay Sievers
	All sysdev classes and sysdev devices will converted to regular devices and buses to properly hook userspace into the event processing. There is no interesting difference between a 'sysdev' and 'device' which would justify to roll an entire own subsystem with different userspace export semantics. Userspace relies on events and generic sysfs subsystem infrastructure from sysdev devices, which are currently not properly available. Every converted sysdev class will create a regular device with the class name in /sys/devices/system and all registered devices will becom a children of theses devices. For compatibility reasons, the sysdev class-wide attributes are created at this parent device. (Do not copy that logic for anything new, subsystem- wide properties belong to the subsystem, not to some fake parent device created in /sys/devices.) Every sysdev driver is implemented as a simple subsystem interface now, and no longer called a driver. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-14	NFC: Initial LLCP support	Samuel Ortiz
	This patch is an initial implementation for the NFC Logical Link Control protocol. It's also known as NFC peer to peer mode. This is a basic implementation as it lacks SDP (services Discovery Protocol), frames aggregation support, and frame rejecion parsing. Follow up patches will implement those missing features. This code has been tested against a Nexus S phone implementing LLCP 1.0. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-12-14	NFC: Set and get DEP general bytes	Samuel Ortiz
	Without an API for setting and getting the local and remote general bytes, drivers won't be able to properly establish a DEP link. This API also allows them to propagate the remote general bytes they get from the DEP link establishment up to the LLCP layer. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-12-14	NFC: Add a DEP link control netlink command	Samuel Ortiz
	NFC-DEP (Data Exchange Protocol) is an NFC MAC layer. This command allows to enable and disable the DEP link on to which e.g. LLCP can run. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-12-14	NFC: Add tx skb allocation routine	Samuel Ortiz
	This is a factorization of the current rawsock tx skb allocation routine, as it will be used by the LLCP code. We also rename nfc_alloc_skb to nfc_alloc_recv_skb for consistency sake. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-12-14	kref: fix up the kfree build problems	Greg Kroah-Hartman
	It turns out that some memory allocators use kobjects, which use krefs, and kref.h was wanting to figure out the address of kfree(), which ended up in a loop. kfree was only being needed for a warning to tell the caller that they were doing something stupid. Now we just move that warning into the comments for the functions, which results in a bit more fun as everyone enjoys digging for people to mock at times of boredom. So, remove the dependancy of slab.h on kref.h, and fix up the other include file as well (we really only need bug.h and atomic.h, not types.h). Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oliver Neukum <oneukum@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-14	inet: remove rcu protection on tw_net	Eric Dumazet
	commit b099ce2602d806 (net: Batch inet_twsk_purge) added rcu protection on tw_net for no obvious reason. struct net are refcounted anyway since timewait sockets escape from rcu protected sections. tw_net stay valid for the whole timwait lifetime. This also removes a lot of sparse errors. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-14	drm/radeon/kms: add some new pci ids	Alex Deucher
	Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=43739 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-14	Merge branch 'rcu/next' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
2011-12-13	net: Remove unused neighbour layer ops.	David S. Miller
	It's simpler to just keep these things out until there is a real user of them, so we can see what the needs actually are, rather than keep these things around as useless overhead. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13	bcma: use static keyword for inline function declaration in bcma.h	Arend van Spriel
	Just scratching an itch here, but it makes more sense to use the static keyword if you think about how the compiler treats inline functions. Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com> Reviewed-by: Alwin Beukers <alwin@broadcom.com> Signed-off-by: Arend van Spriel <arend@broadcom.com> Signed-off-by: Franky Lin <frankyl@broadcom.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-12-13	bcma: add set/mask macros for 16-bit register access	Arend van Spriel
	The BCMA header only had definitions for 32-bit register access. Used those as a template for the 16-bit flavour. Also changed them to inline functions to be on the safe side. As offset parameter is used twice there would be a problem when used like this: bcma_set32(core, offset++, val); Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com> Reviewed-by: Alwin Beukers <alwin@broadcom.com> Signed-off-by: Arend van Spriel <arend@broadcom.com> Signed-off-by: Franky Lin <frankyl@broadcom.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-12-13	bcma: extract FEM info from SPROM	Rafał Miłecki
	Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-12-13	ssb: extract FEM info from SPROM	Rafał Miłecki
	Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-12-13	ieee80211: Introduce ieee80211_is_first_frag	Helmut Schaa
	Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-12-13	cfg80211: Fix race in bss timeout	Vasanthakumar Thiagarajan
	It is quite possible to run into a race in bss timeout where the drivers see the bss entry just before notifying cfg80211 of a roaming event but it got timed out by the time rdev->event_work got scehduled from cfg80211_wq. This would result in the following WARN-ON() along with the failure to notify the user space of the roaming. The other situation which is happening with ath6kl that runs into issue is when the driver reports roam to same AP event where the AP bss entry already got expired. To fix this, move cfg80211_get_bss() from __cfg80211_roamed() to cfg80211_roamed(). [158645.538384] WARNING: at net/wireless/sme.c:586 __cfg80211_roamed+0xc2/0x1b1() [158645.538810] Call Trace: [158645.538838] [<c1033527>] warn_slowpath_common+0x65/0x7a [158645.538917] [<c14cfacf>] ? __cfg80211_roamed+0xc2/0x1b1 [158645.538946] [<c103354b>] warn_slowpath_null+0xf/0x13 [158645.539055] [<c14cfacf>] __cfg80211_roamed+0xc2/0x1b1 [158645.539086] [<c14beb5b>] cfg80211_process_rdev_events+0x153/0x1cc [158645.539166] [<c14bd57b>] cfg80211_event_work+0x26/0x36 [158645.539195] [<c10482ae>] process_one_work+0x219/0x38b [158645.539273] [<c14bd555>] ? wiphy_new+0x419/0x419 [158645.539301] [<c10486cb>] worker_thread+0xf6/0x1bf [158645.539379] [<c10485d5>] ? rescuer_thread+0x1b5/0x1b5 [158645.539407] [<c104b3e2>] kthread+0x62/0x67 [158645.539484] [<c104b380>] ? __init_kthread_worker+0x42/0x42 [158645.539514] [<c151309a>] kernel_thread_helper+0x6/0xd Reported-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-12-13	mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet	Jack Morgenstein
	1. Added module parameters sr_iov and probe_vf for controlling enablement of SRIOV mode. 2. Increased default max num-qps, num-mpts and log_num_macs to accomodate SRIOV mode 3. Added port_type_array as a module parameter to allow driver startup with ports configured as desired. In SRIOV mode, only ETH is supported, and this array is ignored; otherwise, for the case where the FW supports both port types (ETH and IB), the port_type_array parameter is used. By default, the port_type_array is set to configure both ports as IB. 4. When running in sriov mode, the master needs to initialize the ICM eq table to hold the eq's for itself and also for all the slaves. 5. mlx4_set_port_mask() now invoked from mlx4_init_hca, instead of in mlx4_dev_cap. 6. Introduced sriov VF (slave) device startup/teardown logic (mainly procedures mlx4_init_slave, mlx4_slave_exit, mlx4_slave_cap, mlx4_slave_exit and flow modifications in __mlx4_init_one, mlx4_init_hca, and mlx4_setup_hca). VFs obtain their startup information from the PF (master) device via the comm channel. 7. In SRIOV mode (both PF and VF), MSI_X must be enabled, or the driver aborts loading the device. 8. Do not allow setting port type via sysfs when running in SRIOV mode. 9. mlx4_get_ownership: Currently, only one PF is supported by the driver. If the HCA is burned with FW which enables more than one PF, only one of the PFs is allowed to run. The first one up grabs a FW ownership semaphone -- all other PFs will find that semaphore taken, and the driver will not allow them to run. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Liran Liss <liranl@mellanox.co.il> Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13	mlx4_core: mtts resources units changed to offset	Marcel Apfelbaum
	In the previous implementation mtts are managed by: 1. order - log(mtt segments), 'mtt segment' groups several mtts together. 2. first_seg - segment location relative to mtt table. In the current implementation: 1. order - log(mtts) rather than segments 2. offset - mtt index in mtt table Note: The actual mtt allocation is made in segments but it is transparent to callers. Rational: The mtt resource holders are not interested on how the allocation of mtt is done, but rather on how they will use it. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13	mlx4: Ethernet port management modifications	Eugenia Emantayev
	The physical port is now common to the PF and VFs. The port resources and configuration is managed by the PF, VFs can only influence the MTU of the port, it is set as max among all functions, Each function allocates RX buffers of required size to meet it's MTU enforcement. Port management code was moved to mlx4_core, as the mlx4_en module is virtualization unaware Move handling qp functionality to mlx4_get_eth_qp/mlx4_put_eth_qp including reserve/release range and add/release unicast steering. Let mlx4_register/unregister_mac deal only with MAC (un)registration. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13	mlx4_core: Reduce number of PD bits to 17	Jack Morgenstein
	When SRIOV is enabled on the chip (at FW burning time), the HCA uses only 17 bits for the PD. The remaining 7 high-order bits are ignored. Change the allocator to return only 17 bits for the PD. The MSB 7 bits will be used to encode the slave number for consistency checking later on in the resource tracker. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13	mlx4_core: Add "native" argument to mlx4_cmd and its callers (where needed)	Jack Morgenstein
	For SRIOV, some Hypervisor commands can be executed directly (native = 1). Others should go through the command wrapper flow (for tracking resource usage, for example, or for changing some HCA configurations that slaves need to be notified of). This patch sets the groundwork for this capability -- adding the correct value of "native" in each case. Note that if SRIOV is not activated, this parameter has no effect. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13	mlx4: Extanding port_mask functionality	Jack Morgenstein
	Port mask now has additional state. Port can be set as "none". In this case neither the mlx4_en or mlx4_ib drivers take ownership of the port. In multifunction mode there is an option to set the vfs as single ported devices. (in single function mode, both physical ports belong to same function) Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13	mlx4_core: initial header-file changes for SRIOV support	Jack Morgenstein
	These changes will not affect module operation as yet. They are only to get some structs and enums in place for use by subsequent patches (making those smaller). Added here: * sriov state structs and inlines (mlx4_is_master/slave/mfunc) * comm-channel and vhcr support structures * enum values for new FW and comm-channel virtual commands (i.e., commands, passed via the comm channel to the PF-driver). * prototypes for many command wrapper functions (used by the PF context for processing FW commands passed to it by the VFs). * struct mlx4_eqe is moved from eq.c to mlx4.h (it will be used by other mlx4_core source files). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13	net: fix build error if CONFIG_CGROUPS=n	Eric Dumazet
	Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13	kref: Remove the memory barriers	Peter Zijlstra
	Commit 1b0b3b9980e ("kref: fix CPU ordering with respect to krefs") wrongly adds memory barriers to kref. It states: some atomic operations are only atomic, not ordered. Thus a CPU is allowed to reorder memory references to an object to before the reference is obtained. This fixes it. While true, it fails to show why this is a problem. I say it is not a problem because if there is a race with kref_put() such that we could end up referencing a free'd object without this memory barrier, we would still have that race with the memory barrier. The kref_put() in question could complete (and free the object) before the atomic_inc() and we'd still be up shit creek. The kref_init() case is even worse, if your object is published at this time you're so wrong the memory barrier won't make a difference what so ever. If its not published, the act of publishing should include the needed barriers/locks to make sure all writes prior to the act of publishing are complete such that others will only observe a complete object. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oliver Neukum <oneukum@suse.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-13	kref: Implement kref_put in terms of kref_sub	Peter Zijlstra
	Less lines of code is better. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-13	kref: Inline all functions	Peter Zijlstra
	These are tiny functions, there's no point in having them out-of-line. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-8eccvi2ur2fzgi00xdjlbf5z@git.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-12	linux/log2.h: Fix rounddown_pow_of_two(1)	Linus Torvalds
	Exactly like roundup_pow_of_two(1), the rounddown version was buggy for the case of a compile-time constant '1' argument. Probably because it originated from the same code, sharing history with the roundup version from before the bugfix (for that one, see commit 1a06a52ee1b0: "Fix roundup_pow_of_two(1)"). However, unlike the roundup version, the fix for rounddown is to just remove the broken special case entirely. It's simply not needed - the generic code 1UL << ilog2(n) does the right thing for the constant '1' argment too. The only reason roundup needed that special case was because rounding up does so by subtracting one from the argument (and then adding one to the result) causing the obvious problems with "ilog2(0)". But rounddown doesn't do any of that, since ilog2() naturally truncates (ie "rounds down") to the right rounded down value. And without the ilog2(0) case, there's no reason for the special case that had the wrong value. tl;dr: rounddown_pow_of_two(1) should be 1, not 0. Acked-by: Dmitry Torokhov <dtor@vmware.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-12	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: mmc: core: Fix deadlock when the CONFIG_MMC_UNSAFE_RESUME is not defined mmc: sdhci-s3c: Remove old and misprototyped suspend operations mmc: tmio: fix clock gating on platforms with a .set_pwr() method mmc: sh_mmcif: fix clock gating on platforms with a .down_pwr() method mmc: core: Fix typo at mmc_card_sleep mmc: core: Fix power_off_notify during suspend mmc: core: Fix setting power notify state variable for non-eMMC mmc: core: Add quirk for long data read time mmc: Add module.h include to sdhci-cns3xxx.c mmc: mxcmmc: fix falling back to PIO mmc: omap_hsmmc: DMA unmap only once in case of MMC error
2011-12-12	netem: add cell concept to simulate special MAC behavior	Hagen Paul Pfeifer
	This extension can be used to simulate special link layer characteristics. Simulate because packet data is not modified, only the calculation base is changed to delay a packet based on the original packet size and artificial cell information. packet_overhead can be used to simulate a link layer header compression scheme (e.g. set packet_overhead to -20) or with a positive packet_overhead value an additional MAC header can be simulated. It is also possible to "replace" the 14 byte Ethernet header with something else. cell_size and cell_overhead can be used to simulate link layer schemes, based on cells, like some TDMA schemes. Another application area are MAC schemes using a link layer fragmentation with a (small) header each. Cell size is the maximum amount of data bytes within one cell. Cell overhead is an additional variable to change the per-cell-overhead (e.g. 5 byte header per fragment). Example (5 kbit/s, 20 byte per packet overhead, cell-size 100 byte, per cell overhead 5 byte): tc qdisc add dev eth0 root netem rate 5kbit 20 100 5 Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12	tcp buffer limitation: per-cgroup limit	Glauber Costa
	This patch uses the "tcp.limit_in_bytes" field of the kmem_cgroup to effectively control the amount of kernel memory pinned by a cgroup. This value is ignored in the root cgroup, and in all others, caps the value specified by the admin in the net namespaces' view of tcp_sysctl_mem. If namespaces are being used, the admin is allowed to set a value bigger than cgroup's maximum, the same way it is allowed to set pretty much unlimited values in a real box. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12	per-netns ipv4 sysctl_tcp_mem	Glauber Costa
	This patch allows each namespace to independently set up its levels for tcp memory pressure thresholds. This patch alone does not buy much: we need to make this values per group of process somehow. This is achieved in the patches that follows in this patchset. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12	tcp memory pressure controls	Glauber Costa
	This patch introduces memory pressure controls for the tcp protocol. It uses the generic socket memory pressure code introduced in earlier patches, and fills in the necessary data in cg_proto struct. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12	socket: initial cgroup code.	Glauber Costa
	The goal of this work is to move the memory pressure tcp controls to a cgroup, instead of just relying on global conditions. To avoid excessive overhead in the network fast paths, the code that accounts allocated memory to a cgroup is hidden inside a static_branch(). This branch is patched out until the first non-root cgroup is created. So when nobody is using cgroups, even if it is mounted, no significant performance penalty should be seen. This patch handles the generic part of the code, and has nothing tcp-specific. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtsu.com> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12	foundations of per-cgroup memory pressure controlling.	Glauber Costa
	This patch replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to acessor macros. Those macros can either receive a socket argument, or a mem_cgroup argument, depending on the context they live in. Since we're only doing a macro wrapping here, no performance impact at all is expected in the case where we don't have cgroups disabled. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12	Merge branch 'master' of ↵	John W. Linville
	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
2011-12-11	net: use IS_ENABLED(CONFIG_IPV6)	Eric Dumazet
	Instead of testing defined(CONFIG_IPV6) \|\| defined(CONFIG_IPV6_MODULE) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-11	rcu: Augment rcu_batch_end tracing for idle and callback state	Paul E. McKenney
	The current rcu_batch_end event trace records only the name of the RCU flavor and the total number of callbacks that remain queued on the current CPU. This is insufficient for testing and tuning the new dyntick-idle RCU_FAST_NO_HZ code, so this commit adds idle state along with whether or not any of the callbacks that were ready to invoke at the beginning of rcu_do_batch() are still queued. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11	driver-core/cpu: Expose hotpluggability to the rest of the kernel	Josh Triplett
	When architectures register CPUs, they indicate whether the CPU allows hotplugging; notably, x86 and ARM don't allow hotplugging CPU 0. Userspace can easily query the hotpluggability of a CPU via sysfs; however, the kernel has no convenient way of accessing that property in an architecture-independent way. While the kernel can simply try it and see, some code needs to distinguish between "hotplug failed" and "hotplug has no hope of working on this CPU"; for example, rcutorture's CPU hotplug tests want to avoid drowning out real hotplug failures with expected failures. Expose this property via a new cpu_is_hotpluggable function, so that the rest of the kernel can access it in an architecture-independent way. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11	rcu: Permit dyntick-idle with callbacks pending	Paul E. McKenney
	The current implementation of RCU_FAST_NO_HZ prevents CPUs from entering dyntick-idle state if they have RCU callbacks pending. Unfortunately, this has the side-effect of often preventing them from entering this state, especially if at least one other CPU is not in dyntick-idle state. However, the resulting per-tick wakeup is wasteful in many cases: if the CPU has already fully responded to the current RCU grace period, there will be nothing for it to do until this grace period ends, which will frequently take several jiffies. This commit therefore permits a CPU that has done everything that the current grace period has asked of it (rcu_pending() == 0) even if it still as RCU callbacks pending. However, such a CPU posts a timer to wake it up several jiffies later (6 jiffies, based on experience with grace-period lengths). This wakeup is required to handle situations that can result in all CPUs being in dyntick-idle mode, thus failing to ever complete the current grace period. If a CPU wakes up before the timer goes off, then it cancels that timer, thus avoiding spurious wakeups. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11	rcu: Document same-context read-side constraints	Paul E. McKenney
	The intent is that a given RCU read-side critical section be confined to a single context. For example, it is illegal to invoke rcu_read_lock() in an exception handler and then invoke rcu_read_unlock() from the context of the task that received the exception. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11	rcu: Eliminate RCU_FAST_NO_HZ grace-period hang	Paul E. McKenney
	With the new implementation of RCU_FAST_NO_HZ, it was possible to hang RCU grace periods as follows: o CPU 0 attempts to go idle, cycles several times through the rcu_prepare_for_idle() loop, then goes dyntick-idle when RCU needs nothing more from it, while still having at least on RCU callback pending. o CPU 1 goes idle with no callbacks. Both CPUs can then stay in dyntick-idle mode indefinitely, preventing the RCU grace period from ever completing, possibly hanging the system. This commit therefore prevents CPUs that have RCU callbacks from entering dyntick-idle mode. This approach also eliminates the need for the end-of-grace-period IPIs used previously. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>