Age | Commit message (Collapse) | Author |
|
Build an array that finds hardware CPU number from logical CPU
number in firmware CPU discovery. Use that rather than setting
paca of other CPUs directly, to begin with. Subsequent patch will
not have pacas allocated at this point.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix SMP=n build by adding #ifdef in arch_match_cpu_phys_id()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Move this into the early setup code, and don't iterate over CPU masks.
We don't want to call into sysfs so early from setup, and a future patch
won't initialize CPU masks by the time this is called.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold in incremental fix from Nick for DSCR handling]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Split sparsemem initialisation from basic numa topology discovery.
Move the parsing earlier in boot, before pacas are allocated.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This will be used by powerpc to allocate per-cpu stacks and other
data structures node-local where possible.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop stray change to memblock_alloc_range() as noticed by akpm]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
slb_shadow structures are avoided for radix environment.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We no longer allocate lppacas in an array, so this patch removes the
1kB static alignment for the structure, and enforces the PAPR
alignment requirements at allocation time. We can not reduce the 1kB
allocation size however, due to existing KVM hypervisors.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Change the paca array into an array of pointers to pacas. Allocate
pacas individually.
This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front then freeing unused.
This is slightly more overhead (one additional indirection) for cross
CPU paca references, but those aren't too common.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The "lppaca" is a structure registered with the hypervisor. This is
unnecessary when running on non-virtualised platforms. One field from
the lppaca (pmcregs_in_use) is also used by the host, so move the host
part out into the paca (lppaca field is still updated in
guest mode).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix non-pseries build with some #ifdefs]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
In mpic_physmask() we loop over all CPUs up to 32, then get the hard
SMP processor id of that CPU.
Currently that's possibly walking off the end of the paca array, but
in a future patch we will change the paca array to be an array of
pointers, and in that case we will get a NULL for missing CPUs and
oops. eg:
Unable to handle kernel paging request for data at address 0x88888888888888b8
Faulting instruction address: 0xc00000000004e380
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP .mpic_set_affinity+0x60/0x1a0
LR .irq_do_set_affinity+0x48/0x100
Fix it by checking the CPU is possible, this also fixes the code if
there are gaps in the CPU numbering which probably never happens on
mpic systems but who knows.
Debugged-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
To make the test a bit clearer and to reduce object size a little.
Miscellanea:
o remove now unnecessary static const array
$ size ip_set_hash_mac.o*
text data bss dec hex filename
22822 4619 64 27505 6b71 ip_set_hash_mac.o.allyesconfig.new
22932 4683 64 27679 6c1f ip_set_hash_mac.o.allyesconfig.old
10443 1040 0 11483 2cdb ip_set_hash_mac.o.defconfig.new
10507 1040 0 11547 2d1b ip_set_hash_mac.o.defconfig.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
underflow/policy"
This reverts commit 0d7df906a0e78079a02108b06d32c3ef2238ad25.
Valdis Kletnieks reported that xtables is broken in linux-next since
0d7df906a0e78 ("netfilter: x_tables: ensure last rule in base chain
matches underflow/policy"), as kernel rejects the (well-formed) ruleset:
[ 64.402790] ip6_tables: last base chain position 1136 doesn't match underflow 1344 (hook 1)
mark_source_chains is not the correct place for such a check, as it
terminates evaluation of a chain once it sees an unconditional verdict
(following rules are known to be unreachable). It seems preferrable to
fix libiptc instead, so remove this check again.
Fixes: 0d7df906a0e78 ("netfilter: x_tables: ensure last rule in base chain matches underflow/policy")
Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
With commit e948bc8fbee0 (cpufreq: Cap the default transition delay
value to 10 ms) the cpufreq was not honouring the delay passed via
ACPI (PCCT). Due to which on ARM based platforms using CPPC the
cpufreq governor tries to change the frequency of CPUs faster than
expected.
This leads to continuous error messages like the following.
" ACPI CPPC: PCC check channel failed. Status=0 "
Earlier (without above commit) the default transition delay was
taken form the value passed from PCCT. Use the same value provided
by PCCT to set the transition_delay_us.
Fixes: e948bc8fbee0 (cpufreq: Cap the default transition delay value to 10 ms)
Signed-off-by: George Cherian <george.cherian@cavium.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Fix once per second (round_robin_time) memory leak of about 1 KB in
each acpi_pad kernel idling thread that is activated.
Found by testing with kmemleak.
Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This file is used both for setting the wakeup device without kernel
command line as well as for actually waking the system (when appropriate
swap header is in place).
To avoid confusion on incorrect logs in system log downgrade the
message to debug and make it clearer.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Currently the only way to specify a hibernate offset for a
swap file is on the kernel command line.
Add a new /sys/power/resume_offset that lets userspace
specify the offset and disk to use when initiating a hibernate
cycle.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Due to the way percpu counters are allocated and freed in blocks,
it is not safe to free counters individually. Currently all callers
do the right thing, but let's note this restriction.
Fixes: ae0ac0ed6fcf ("netfilter: x_tables: pack percpu counter allocations")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Merge assignment with return statement to directly return the value.
Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Replace opencoded implementation of nft_set_lookup_global() by call to
this function.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
To prepare shorter introduction of shorter function prefix.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Register conntrack hooks if the user adds NAT chains. Users get confused
with the existing behaviour since they will see no packets hitting this
chain until they add the first rule that refers to conntrack.
This patch adds new ->init() and ->free() indirections to chain types
that can be used by NAT chains to invoke the conntrack dependency.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
One module per supported filter chain family type takes too much memory
for very little code - too much modularization - place all chain filter
definitions in one single file.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use WARN_ON() instead since it should not happen that neither family
goes over NFPROTO_NUMPROTO nor there is already a chain of this type
already registered.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use nft_ prefix. By when I added chain types, I forgot to use the
nftables prefix. Rename enum nft_chain_type to enum nft_chain_types too,
otherwise there is an overlap.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Instead of unnecessary const declarations, use the generic functions to
save a little object space.
$ size net/bridge/netfilter/ebt_stp.o*
text data bss dec hex filename
1250 144 0 1394 572 net/bridge/netfilter/ebt_stp.o.new
1344 144 0 1488 5d0 net/bridge/netfilter/ebt_stp.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
If a page is already locked, attempting to dirty it leads to a deadlock
in lock_page(). This is what currently happens to ITER_BVEC pages when
a dio-enabled loop device is backed by ceph:
$ losetup --direct-io /dev/loop0 /mnt/cephfs/img
$ xfs_io -c 'pread 0 4k' /dev/loop0
Follow other file systems and only dirty ITER_IOVEC pages.
Cc: stable@kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This patch adds initial documentation for the Netfilter flowtable
infrastructure.
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch is part of a proposal to add a string filter to
ebtables, which would be similar to the string filter in
iptables. Like iptables, the ebtables filter uses the xt_string
module.
Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Currently ebtables assumes that the revision number of all match
modules is 0, which is an issue when trying to use existing
xtables matches with ebtables. The solution is to modify ebtables
to allow extensions to specify a revision number, similar to
iptables. This gets passed down to the kernel, which is then able
to find the match module correctly.
To main binary backwards compatibility, the size of the ebt_entry
structures is not changed, only the size of the name field is
decreased by 1 byte to make room for the revision field.
Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Improve the bindings example by adding an example of how to represent
two SPI NOR devices.
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Acked-by: Han Xu <han.xu@nxp.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
|
|
Currently on a imx6sx-sdb board, which has two SPI NOR chips connected
to QSPI2 the following output from /proc/mtd is seen:
dev: size erasesize name
mtd0: 01000000 00010000 "21e4000.qspi"
mtd1: 01000000 00010000 "21e4000.qspi"
Attempts to partition them on the kernel command line result in both
chips with identical (and identically named) partitions, which is
an inconvenient behavior.
Assign a different mtd->name for each mtd device to avoid this problem.
After this change the output from /proc/mtd becomes:
dev: size erasesize name
mtd0: 01000000 00010000 "21e4000.qspi-0"
mtd1: 01000000 00010000 "21e4000.qspi-1"
In order to keep mtdparts compatibility keep the mtd->name
unchanged when a single SPI NOR is present.
Reported-by: David Wolfe <david.wolfe@nxp.com>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Han Xu <han.xu@nxp.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix a DM multipath regression introduced in a v4.16-rc6 commit:
restore support for loading, and attaching, scsi_dh modules during
multipath table load. Otherwise some users may find themselves unable
to boot, as was reported today:
https://marc.info/?l=linux-scsi&m=152231276114962&w=2
- Fix a DM core ioctl permission check regression introduced in a
v4.16-rc5 commit.
* tag 'for-4.16/dm-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix dropped return code from dm_get_bdev_for_ioctl
dm mpath: fix support for loading scsi_dh modules during table load
|
|
Pull rdma fixes from Jason Gunthorpe:
"It has been fairly silent lately on our -rc front. Big queue of
patches on the mailing list going to for-next though.
Bug fixes:
- qedr driver bugfixes causing application hangs, wrong uapi errnos,
and a race condition
- three syzkaller found bugfixes in the ucma uapi
Regression fixes for things introduced in 4.16:
- Crash on error introduced in mlx5 UMR flow
- Crash on module unload/etc introduced by bad interaction of
restrack and mlx5 patches this cycle
- Typo in a two line syzkaller bugfix causing a bad regression
- Coverity report of nonsense code in hns driver"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/ucma: Introduce safer rdma_addr_size() variants
RDMA/hns: ensure for-loop actually iterates and free's buffers
RDMA/ucma: Check that device exists prior to accessing it
RDMA/ucma: Check that device is connected prior to access it
RDMA/rdma_cm: Fix use after free race with process_one_req
RDMA/qedr: Fix QP state initialization race
RDMA/qedr: Fix rc initialization on CNQ allocation failure
RDMA/qedr: fix QP's ack timeout configuration
RDMA/ucma: Correct option size check using optlen
RDMA/restrack: Move restrack_clean to be symmetrical to restrack_init
IB/mlx5: Don't clean uninitialized UMR resources
|
|
Pull MTD fixes from Boris Brezillon:
"Two fixes, one in the atmel NAND driver and another one in the
CFI/JEDEC code.
Summary:
- Fix a bug in Atmel ECC engine driver
- Fix a bug in the CFI/JEDEC driver"
* tag 'mtd/fixes-for-4.16' of git://git.infradead.org/linux-mtd:
mtd: jedec_probe: Fix crash in jedec_read_mfr()
mtd: nand: atmel: Fix get_sectorsize() function
|
|
Previously, mount -l would show data=<mode> even if the ext4 default
journaling mode was being used. Change this to be consistent with the
rest of the options.
Ext4 already did the right thing when the journaling mode being used
matched the one specified in the superblock's default mount options. The
reason it failed to do the right thing for the ext4 defaults is that,
when set, they were never included in sbi->s_def_mount_opt (unlike the
superblock's defaults, which were).
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Don't show init_itable=n in /proc/fs/ext4/<dev>/options when filesystem
is mounted with noinit_itable.
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Previously, /proc/fs/ext4/<dev>/options would only show binary options
if they were set (1 in the options bit mask). E.g. it would show "grpid"
if it was set, but it would not show "nogrpid" if grpid was not set.
This seems sensible, but when an option is absent from the file, it can
be hard for the unfamiliar to know what is being used. E.g. if there
isn't a (no)grpid entry, nogrpid is in effect. But if there isn't a
(no)auto_da_alloc entry, auto_da_alloc is in effect. If there isn't a
(minixdf|bsddf) entry, it turns out bsddf is in effect. It all depends
on how the option is implemented.
It's clearer to be explicit, so print the corresponding option
regardless of whether it means a 1 or a 0 in the bit mask.
Note that options which do not have an explicit disable option aren't
indicated as being disabled even with this change (e.g. dax).
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Replace kset with generic kobject provided by kobject_create_and_add(),
since the latter is sufficient.
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Make cleanup of ext4_feat kobject consistent with similar objects.
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
dm_get_bdev_for_ioctl()'s return of 0 or 1 must be the result from
prepare_ioctl (1 means the ioctl was issued to a partition, 0 means it
wasn't). Unfortunately commit 519049afea ("dm: use blkdev_get rather
than bdgrab when issuing pass-through ioctl") reused the variable 'r'
to store the return from blkdev_get() that follows prepare_ioctl()
-- whereby dropping prepare_ioctl()'s result on the floor.
This can lead to an ioctl or persistent reservation being issued to a
partition going unnoticed, which implies the extra permission check for
CAP_SYS_RAWIO is skipped.
Fix this by using a different variable to store blkdev_get()'s return.
Fixes: 519049afea ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl")
Reported-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
If some metadata block, such as an allocation bitmap, overlaps the
superblock, it's very likely that if the file system is mounted
read/write, the results will not be pretty. So disallow r/w mounts
for file systems corrupted in this particular way.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
The extended attribute code now uses the crc32c checksum for hashing
purposes, so we should just always always initialize it. We also want
to prevent NULL pointer dereferences if one of the metadata checksum
features is enabled after the file sytsem is originally mounted.
This issue has been assigned CVE-2018-1094.
https://bugzilla.kernel.org/show_bug.cgi?id=199183
https://bugzilla.redhat.com/show_bug.cgi?id=1560788
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
If the root directory has an i_links_count of zero, then when the file
system is mounted, then when ext4_fill_super() notices the problem and
tries to call iput() the root directory in the error return path,
ext4_evict_inode() will try to free the inode on disk, before all of
the file system structures are set up, and this will result in an OOPS
caused by a NULL pointer dereference.
This issue has been assigned CVE-2018-1092.
https://bugzilla.kernel.org/show_bug.cgi?id=199179
https://bugzilla.redhat.com/show_bug.cgi?id=1560777
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
Daniel Borkman says:
====================
pull-request: bpf 2018-03-29
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix nfp to properly check max insn count while emitting
instructions in the JIT which was wrongly comparing bytes
against number of instructions before, from Jakub.
2) Fix for bpftool to avoid usage of hex numbers in JSON
output since JSON doesn't accept hex numbers with 0x
prefix, also from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For the sysfs functions, the function names are embedded into their
error strings. If the function name later changes, the string may
not be updated accordingly. Update the strings to use __func__
to avoid this.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Smatch complains that flush_workqueue() dereferences the work queue
pointer but then we check if it's NULL on the next line when it's too
late. These NULL checks can be removed because the module won't load if
we can't allocate the work queues.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If the line has not been written to, we should not
try to recover any data from it, so check the state of the
chunks in the line before attempting to read smeta.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Implement 2.0 support in pblk. This includes the address formatting and
mapping paths, as well as the sysfs entries for them.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In preparation of pblk supporting 2.0, implement the get log report
chunk in pblk. Also, define the chunk states as given in the 2.0 spec.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In preparation for 2.0 support in pblk, rename variables referring to
the address format to addrf and reserve ppaf for the 1.2 path.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|