Age | Commit message (Collapse) | Author |
|
We never set any congested bits in the group writeback instances of it.
And for the simpler bdi-wide case a simple scalar field is all that
that is needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Just merge them into their only callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The LSM_AUDIT_DATA_* records for PATH, FILE, IOCTL_OP, DENTRY and INODE
are incomplete without the task context of the AUDIT Current Working
Directory record. Add it.
This record addition can't use audit_dummy_context to determine whether
or not to store the record information since the LSM_AUDIT_DATA_*
records are initiated by various LSMs independent of any audit rules.
context->in_syscall is used to determine if it was called in user
context like audit_getname.
Please see the upstream issue
https://github.com/linux-audit/audit-kernel/issues/96
Adapted from Vladis Dronov's v2 patch.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
When evaluating access control over kallsyms visibility, credentials at
open() time need to be used, not the "current" creds (though in BPF's
case, this has likely always been the same). Plumb access to associated
file->f_cred down through bpf_dump_raw_ok() and its callers now that
kallsysm_show_value() has been refactored to take struct cred.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump")
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
In order to perform future tests against the cred saved during open(),
switch kallsyms_show_value() to operate on a cred, and have all current
callers pass current_cred(). This makes it very obvious where callers
are checking the wrong credential in their "read" contexts. These will
be fixed in the coming patches.
Additionally switch return value to bool, since it is always used as a
direct permission check, not a 0-on-success, negative-on-error style
function return.
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
This implements the known parts of the Realtek 4 byte
tag protocol version 0xA, as found in the RTL8366RB
DSA switch.
It is designated as protocol version 0xA as a
different Realtek 4 byte tag format with protocol
version 0x9 is known to exist in the Realtek RTL8306
chips.
The tag and switch chip lacks public documentation, so
the tag format has been reverse-engineered from
packet dumps. As only ingress traffic has been available
for analysis an egress tag has not been possible to
develop (even using educated guesses about bit fields)
so this is as far as it gets. It is not known if the
switch even supports egress tagging.
Excessive attempts to figure out the egress tag format
was made. When nothing else worked, I just tried all bit
combinations with 0xannp where a is protocol and p is
port. I looped through all values several times trying
to get a response from ping, without any positive
result.
Using just these ingress tags however, the switch
functionality is vastly improved and the packets find
their way into the destination port without any
tricky VLAN configuration. On the D-Link DIR-685 the
LAN ports now come up and respond to ping without
any command line configuration so this is a real
improvement for users.
Egress packets need to be restricted to the proper
target ports using VLAN, which the RTL8366RB DSA
switch driver already sets up.
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch exposes new link modes using 100Gbps per lane, including 100G,
200G and 400G modes.
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Define 100G, 200G and 400G link modes using 100Gbps per lane
LR, ER and FR are defined as a single link mode because they are
using same technology and by design are fully interoperable.
EEPROM content indicates if the module is LR, ER, or FR, and the
user space ethtool decoder is planned to support decoding these
modes in the EEPROM.
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
CC: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As well as the ->media_changed method. All these are left over from
before the drivers were switched over to the check_events scheme.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
md is the last driver using the legacy media_changed method. Switch
it over to (not so) new ->clear_events approach, which also removes the
need for the ->revalidate_disk method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[axboe: remove unused 'bdops' variable in disk_clear_events()]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The IOMMU_SYS_CACHE_ONLY flag was never exposed via the DMA API and
has no in-tree users. Remove it.
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Isaac J. Manjarres" <isaacm@codeaurora.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Rob Clark <robdclark@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
More often than not, a failed VM-entry in an x86 production
environment is induced by a defective CPU. To help identify the bad
hardware, include the id of the last logical CPU to run a vCPU in the
information provided to userspace on a KVM exit for failed VM-entry or
for KVM internal errors not associated with emulation. The presence of
this additional information is indicated by a new capability,
KVM_CAP_LAST_CPU.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-5-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
type to bool
Unlike normal 'int' functions returning '0' on success, kvm_setup_async_pf()/
kvm_arch_setup_async_pf() return '1' when a job to handle page fault
asynchronously was scheduled and '0' otherwise. To avoid the confusion
change return type to 'bool'.
No functional change intended.
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200615121334.91300-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Support for rejecting packets from the prerouting chain, from
Laura Garcia Liebana.
2) Remove useless assignment in pipapo, from Stefano Brivio.
3) On demand hook registration in IPVS, from Julian Anastasov.
4) Expire IPVS connection from process context to not overload
timers, also from Julian.
5) Fallback to conntrack TCP tracker to handle connection reuse
in IPVS, from Julian Anastasov.
6) Several patches to support for chain bindings.
7) Expose enum nft_chain_flags through UAPI.
8) Reject unsupported chain flags from the netlink control plane.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that we have moved the PHY ethtool statistics to be dynamically
registered, we no longer need to inline those for ethtool. This used to
be done to avoid cross symbol referencing and allow ethtool to be
decoupled from PHYLIB entirely.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend ethtool_phy_ops to include the 3 function pointers necessary for
implementing PHY statistics. In a subsequent change we will uninline
those functions.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small, mostly device-specific fixes.
The significant one is the regression fix for USB-audio implicit
feedback devices due to the incorrect frame size calculation, which
landed in 5.8 and stable trees.
In addition, a few usual HD-audio and USB-audio quirks, Intel HDMI
fixes, ASoC fsl and rt5682 fixes, as well as the fix in
compress-offload partial drain operation"
* tag 'sound-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: compress: fix partial_drain completion state
ALSA: usb-audio: Add implicit feedback quirk for RTX6001
ALSA: usb-audio: add quirk for MacroSilicon MS2109
ALSA: hda/realtek: Enable headset mic of Acer Veriton N4660G with ALC269VC
ALSA: hda/realtek: Enable headset mic of Acer C20-820 with ALC269VC
ALSA: hda/realtek - Enable audio jacks of Acer vCopperbox with ALC269VC
ALSA: hda/realtek - Fix Lenovo Thinkpad X1 Carbon 7th quirk subdevice id
ALSA: hda/hdmi: improve debug traces for stream lookups
ALSA: hda/hdmi: fix failures at PCM open on Intel ICL and later
ALSA: opl3: fix infoleak in opl3
ALSA: usb-audio: Replace s/frame/packet/ where appropriate
ALSA: usb-audio: Fix packet size calculation
AsoC: amd: add missing snd- module prefix to the acp3x-rn driver kernel module
ALSA: hda - let hs_mic be picked ahead of hp_mic
ASoC: rt5682: fix the pop noise while OMTP type headset plugin
ASoC: fsl_mqs: Fix unchecked return value for clk_prepare_enable
ASoC: fsl_mqs: Don't check clock is NULL before calling clk API
|
|
I realize that we fairly recently raised it to 4.8, but the fact is, 4.9
is a much better minimum version to target.
We have a number of workarounds for actual bugs in pre-4.9 gcc versions
(including things like internal compiler errors on ARM), but we also
have some syntactic workarounds for lacking features.
In particular, raising the minimum to 4.9 means that we can now just
assume _Generic() exists, which is likely the much better replacement
for a lot of very convoluted built-time magic with conditionals on
sizeof and/or __builtin_choose_expr() with same_type() etc.
Using _Generic also means that you will need to have a very recent
version of 'sparse', but thats easy to build yourself, and much less of
a hassle than some old gcc version can be.
The latest (in a long string) of reasons for minimum compiler version
upgrades was commit 5435f73d5c4a ("efi/x86: Fix build with gcc 4").
Ard points out that RHEL 7 uses gcc-4.8, but the people who stay back on
old RHEL versions persumably also don't build their own kernels anyway.
And maybe they should cross-built or just have a little side affair with
a newer compiler?
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add support for inline encryption to fs/crypto/. With "inline
encryption", the block layer handles the decryption/encryption as part
of the bio, instead of the filesystem doing the crypto itself via
Linux's crypto API. This model is needed in order to take advantage of
the inline encryption hardware present on most modern mobile SoCs.
To use inline encryption, the filesystem needs to be mounted with
'-o inlinecrypt'. Blk-crypto will then be used instead of the traditional
filesystem-layer crypto whenever possible to encrypt the contents
of any encrypted files in that filesystem. Fscrypt still provides the key
and IV to use, and the actual ciphertext on-disk is still the same;
therefore it's testable using the existing fscrypt ciphertext verification
tests.
Note that since blk-crypto has a fallback to Linux's crypto API, and
also supports all the encryption modes currently supported by fscrypt,
this feature is usable and testable even without actual inline
encryption hardware.
Per-filesystem changes will be needed to set encryption contexts when
submitting bios and to implement the 'inlinecrypt' mount option. This
patch just adds the common code.
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20200702015607.1215430-3-satyat@google.com
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Introduce SB_INLINECRYPT, which is set by filesystems that wish to use
blk-crypto for file content en/decryption. This flag maps to the
'-o inlinecrypt' mount option which multiple filesystems will implement,
and code in fs/crypto/ needs to be able to check for this mount option
in a filesystem-independent way.
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20200702015607.1215430-2-satyat@google.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Bossart <pierre-louis.bossart@linux.intel.com>:
While experimenting and introducing errors in Baytrail topology files
until I got them right, I encountered multiple kernel oopses and
memory leaks. This is a first batch to harden the code, but we should
probably think of a tool to fuzz the topology...
Pierre-Louis Bossart (5):
ASoC: topology: fix kernel oops on route addition error
ASoC: topology: fix tlvs in error handling for widget_dmixer
ASoC: topology: use break on errors, not continue
ASoC: topology: factor kfree(se) in error handling
ASoC: topology: add more logs when topology load fails.
sound/soc/soc-topology.c | 97 ++++++++++++++++++++++++----------------
1 file changed, 58 insertions(+), 39 deletions(-)
base-commit: a5911ac5790acaf98c929b826b3f7b4a438f9759
--
2.25.1
|
|
Add support for NVM Express Zoned Namespaces (ZNS) Command Set defined
in NVM Express TP4053. Zoned namespaces are discovered based on their
Command Set Identifier reported in the namespaces Namespace
Identification Descriptor list. A successfully discovered Zoned
Namespace will be registered with the block layer as a host managed
zoned block device with Zone Append command support. A namespace that
does not support append is not supported by the driver.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Keith Busch <keith.busch@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The Commands Supported and Effects log page was extended with a CSI
field that enables the host to query the log page for each command set
supported. Retrieve this log page for each command set that an attached
namespace supports, and save a pointer to that log in the namespace head.
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <keith.busch@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Implements support for multiple I/O Command Sets. NVMe TP 4056
introduces a method to enumerate multiple command sets per namespace. If
the command set is exposed, this method for enumeration will be used
instead of the traditional method that uses the CC.CSS register command
set register for command set identification.
For namespaces where the Command Set Identifier is not supported or
recognized, the specific namespace will not be created.
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
In the zoned storage model, the sectors within a zone are typically all
writeable. With the introduction of the Zoned Namespace (ZNS) Command
Set in the NVM Express organization, the model was extended to have a
specific writeable capacity.
Extend the zone descriptor data structure with a zone capacity field to
indicate to the user how many sectors in a zone are writeable.
Introduce backward compatibility in the zone report ioctl by extending
the zone report header data structure with a flags field to indicate if
the capacity field is available.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Merge in 5.8-rc4 for-5.9/block to setup for-5.9/drivers, to provide
a clean base and making the life for the NVMe changes easier.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* tag 'v5.8-rc4': (732 commits)
Linux 5.8-rc4
x86/ldt: use "pr_info_once()" instead of open-coding it badly
MIPS: Do not use smp_processor_id() in preemptible code
MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
.gitignore: Do not track `defconfig` from `make savedefconfig`
io_uring: fix regression with always ignoring signals in io_cqring_wait()
x86/ldt: Disable 16-bit segments on Xen PV
x86/entry/32: Fix #MC and #DB wiring on x86_32
x86/entry/xen: Route #DB correctly on Xen PV
x86/entry, selftests: Further improve user entry sanity checks
x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
i2c: mlxcpld: check correct size of maximum RECV_LEN packet
i2c: add Kconfig help text for slave mode
i2c: slave-eeprom: update documentation
i2c: eg20t: Load module automatically if ID matches
i2c: designware: platdrv: Set class based on DMI
i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665
mm/page_alloc: fix documentation error
vmalloc: fix the owner argument for the new __vmalloc_node_range callers
mm/cma.c: use exact_nid true to fix possible per-numa cma leak
...
|
|
Add a helper to walk through all the DAIs and set dpcm_playback and
dpcm_capture flags based on the DAIs capabilities, and use this helper
to avoid setting these flags arbitrarily in generic cards.
The commit referenced in the Fixes tag did not introduce the
configuration issue but will prevent the card from probing when
detecting invalid configurations.
Fixes: b73287f0b0745 ('ASoC: soc-pcm: dpcm: fix playback/capture checks')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Link: https://lore.kernel.org/r/20200707210439.115300-2-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap into regmap-5.9
|
|
Convert the argument to the newer fwnode_handle instead a device tree
node. Fortunately, there are no users for now. So this is an easy
change.
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20200706175353.16404-2-michael@walle.cc
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add a bare tracepoint trace_sched_update_nr_running_tp which tracks
->nr_running CPU's rq. This is used to accurately trace this data and
provide a visualization of scheduler imbalances in, for example, the
form of a heat map. The tracepoint is accessed by loading an external
kernel module. An example module (forked from Qais' module and including
the pelt related tracepoints) can be found at:
https://github.com/auldp/tracepoints-helpers.git
A script to turn the trace-cmd report output into a heatmap plot can be
found at:
https://github.com/jirvoz/plot-nr-running
The tracepoints are added to add_nr_running() and sub_nr_running() which
are in kernel/sched/sched.h. In order to avoid CREATE_TRACE_POINTS in
the header a wrapper call is used and the trace/events/sched.h include
is moved before sched.h in kernel/sched/core.
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200629192303.GC120228@lorien.usersys.redhat.com
|
|
For some mysterious reason GCC-4.9 has a 64 byte section alignment for
structures, all other GCC versions (and Clang) tested (including 4.8
and 5.0) are fine with the 32 bytes alignment.
Getting this right is important for the new SCHED_DATA macro that
creates an explicitly ordered array of 'struct sched_class' in the
linker script and expect pointer arithmetic to work.
Fixes: c3a340f7e7ea ("sched: Have sched_class_highest define by vmlinux.lds.h")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200630144905.GX4817@hirez.programming.kicks-ass.net
|
|
|
|
A new kmem_cache method has replaced the kzalloc() to allocate the PMU
specific data. The task_ctx_size is not required anymore.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1593780569-62993-19-git-send-email-kan.liang@linux.intel.com
|
|
Currently, the PMU specific data task_ctx_data is allocated by the
function kzalloc() in the perf generic code. When there is no specific
alignment requirement for the task_ctx_data, the method works well for
now. However, there will be a problem once a specific alignment
requirement is introduced in future features, e.g., the Architecture LBR
XSAVE feature requires 64-byte alignment. If the specific alignment
requirement is not fulfilled, the XSAVE family of instructions will fail
to save/restore the xstate to/from the task_ctx_data.
The function kzalloc() itself only guarantees a natural alignment. A
new method to allocate the task_ctx_data has to be introduced, which
has to meet the requirements as below:
- must be a generic method can be used by different architectures,
because the allocation of the task_ctx_data is implemented in the
perf generic code;
- must be an alignment-guarantee method (The alignment requirement is
not changed after the boot);
- must be able to allocate/free a buffer (smaller than a page size)
dynamically;
- should not cause extra CPU overhead or space overhead.
Several options were considered as below:
- One option is to allocate a larger buffer for task_ctx_data. E.g.,
ptr = kmalloc(size + alignment, GFP_KERNEL);
ptr &= ~(alignment - 1);
This option causes space overhead.
- Another option is to allocate the task_ctx_data in the PMU specific
code. To do so, several function pointers have to be added. As a
result, both the generic structure and the PMU specific structure
will become bigger. Besides, extra function calls are added when
allocating/freeing the buffer. This option will increase both the
space overhead and CPU overhead.
- The third option is to use a kmem_cache to allocate a buffer for the
task_ctx_data. The kmem_cache can be created with a specific alignment
requirement by the PMU at boot time. A new pointer for kmem_cache has
to be added in the generic struct pmu, which would be used to
dynamically allocate a buffer for the task_ctx_data at run time.
Although the new pointer is added to the struct pmu, the existing
variable task_ctx_size is not required anymore. The size of the
generic structure is kept the same.
The third option which meets all the aforementioned requirements is used
to replace kzalloc() for the PMU specific data allocation. A later patch
will remove the kzalloc() method and the related variables.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1593780569-62993-17-git-send-email-kan.liang@linux.intel.com
|
|
The recent commit:
c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
moved these lines in ttwu():
p->sched_contributes_to_load = !!task_contributes_to_load(p);
p->state = TASK_WAKING;
up before:
smp_cond_load_acquire(&p->on_cpu, !VAL);
into the 'p->on_rq == 0' block, with the thinking that once we hit
schedule() the current task cannot change it's ->state anymore. And
while this is true, it is both incorrect and flawed.
It is incorrect in that we need at least an ACQUIRE on 'p->on_rq == 0'
to avoid weak hardware from re-ordering things for us. This can fairly
easily be achieved by relying on the control-dependency already in
place.
The second problem, which makes the flaw in the original argument, is
that while schedule() will not change prev->state, it will read it a
number of times (arguably too many times since it's marked volatile).
The previous condition 'p->on_cpu == 0' was sufficient because that
indicates schedule() has completed, and will no longer read
prev->state. So now the trick is to make this same true for the (much)
earlier 'prev->on_rq == 0' case.
Furthermore, in order to make the ordering stick, the 'prev->on_rq = 0'
assignment needs to he a RELEASE, but adding additional ordering to
schedule() is an unwelcome proposition at the best of times, doubly so
for mere accounting.
Luckily we can push the prev->state load up before rq->lock, with the
only caveat that we then have to re-read the state after. However, we
know that if it changed, we no longer have to worry about the blocking
path. This gives us the required ordering, if we block, we did the
prev->state load before an (effective) smp_mb() and the p->on_rq store
needs not change.
With this we end up with the effective ordering:
LOAD p->state LOAD-ACQUIRE p->on_rq == 0
MB
STORE p->on_rq, 0 STORE p->state, TASK_WAKING
which ensures the TASK_WAKING store happens after the prev->state
load, and all is well again.
Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: https://lkml.kernel.org/r/20200707102957.GN117543@hirez.programming.kicks-ass.net
|
|
So far setns() was missing time namespace support. This was partially due
to it simply not being implemented but also because vdso_join_timens()
could still fail which made switching to multiple namespaces atomically
problematic. This is now fixed so support CLONE_NEWTIME with setns()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20200706154912.3248030-4-christian.brauner@ubuntu.com
|
|
Fold it into the two callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This is the counterpart to __kernel_write, and skip the rw_verify_area
call compared to kernel_read.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
to show f2fs_fiemap()'s result as below:
f2fs_fiemap: dev = (251,0), ino = 7, lblock:0, pblock:1625292800, len:2097152, flags:0, ret:0
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
to show f2fs_bmap()'s result as below:
f2fs_bmap: dev = (251,0), ino = 7, lblock:0, pblock:396800
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Two new PCI ids added to ehl.
v2: added two additional PCI ids
BSpec: 29153
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200707204530.42289-1-jose.souza@intel.com
|
|
Sometimes it's handy to know when the socket gets freed. In
particular, we'd like to try to use a smarter allocation of
ports for bpf_bind and explore the possibility of limiting
the number of SOCK_DGRAM sockets the process can have.
Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers on
inet socket release. It triggers only for userspace sockets
(not in-kernel ones) and therefore has the same semantics as
the existing BPF_CGROUP_INET_SOCK_CREATE.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200706230128.4073544-2-sdf@google.com
|
|
Toshiaki pointed out that we now have two very similar functions to extract
the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
that the unbounded parsing loop makes it possible for maliciously crafted
packets to loop through potentially hundreds of tags.
Fix both of these issues by consolidating the two parsing functions and
limiting the VLAN tag parsing to a max depth of 8 tags. As part of this,
switch over __vlan_get_protocol() to use skb_header_pointer() instead of
pskb_may_pull(), to avoid the possible side effects of the latter and keep
the skb pointer 'const' through all the parsing functions.
v2:
- Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT)
Reported-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in the presence of VLANs")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to decouple ethtool from its PHY library dependency, define an
ethtool_phy_ops singleton which can be overriden by the PHY library when
it loads with an appropriate set of function pointers.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The packets from tunnel devices (eg bareudp) may have only
metadata in the dst pointer of skb. Hence a pointer check of
neigh_lookup is needed in dst_neigh_lookup_skb
Kernel crashes when packets from bareudp device is processed in
the kernel neighbour subsytem.
[ 133.384484] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 133.385240] #PF: supervisor instruction fetch in kernel mode
[ 133.385828] #PF: error_code(0x0010) - not-present page
[ 133.386603] PGD 0 P4D 0
[ 133.386875] Oops: 0010 [#1] SMP PTI
[ 133.387275] CPU: 0 PID: 5045 Comm: ping Tainted: G W 5.8.0-rc2+ #15
[ 133.388052] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 133.391076] RIP: 0010:0x0
[ 133.392401] Code: Bad RIP value.
[ 133.394029] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
[ 133.396656] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
[ 133.399018] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
[ 133.399685] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
[ 133.400350] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
[ 133.401010] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
[ 133.401667] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
[ 133.402412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 133.402948] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
[ 133.403611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 133.404270] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 133.404933] Call Trace:
[ 133.405169] <IRQ>
[ 133.405367] __neigh_update+0x5a4/0x8f0
[ 133.405734] arp_process+0x294/0x820
[ 133.406076] ? __netif_receive_skb_core+0x866/0xe70
[ 133.406557] arp_rcv+0x129/0x1c0
[ 133.406882] __netif_receive_skb_one_core+0x95/0xb0
[ 133.407340] process_backlog+0xa7/0x150
[ 133.407705] net_rx_action+0x2af/0x420
[ 133.408457] __do_softirq+0xda/0x2a8
[ 133.408813] asm_call_on_stack+0x12/0x20
[ 133.409290] </IRQ>
[ 133.409519] do_softirq_own_stack+0x39/0x50
[ 133.410036] do_softirq+0x50/0x60
[ 133.410401] __local_bh_enable_ip+0x50/0x60
[ 133.410871] ip_finish_output2+0x195/0x530
[ 133.411288] ip_output+0x72/0xf0
[ 133.411673] ? __ip_finish_output+0x1f0/0x1f0
[ 133.412122] ip_send_skb+0x15/0x40
[ 133.412471] raw_sendmsg+0x853/0xab0
[ 133.412855] ? insert_pfn+0xfe/0x270
[ 133.413827] ? vvar_fault+0xec/0x190
[ 133.414772] sock_sendmsg+0x57/0x80
[ 133.415685] __sys_sendto+0xdc/0x160
[ 133.416605] ? syscall_trace_enter+0x1d4/0x2b0
[ 133.417679] ? __audit_syscall_exit+0x1d9/0x280
[ 133.418753] ? __prepare_exit_to_usermode+0x5d/0x1a0
[ 133.419819] __x64_sys_sendto+0x24/0x30
[ 133.420848] do_syscall_64+0x4d/0x90
[ 133.421768] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 133.422833] RIP: 0033:0x7fe013689c03
[ 133.423749] Code: Bad RIP value.
[ 133.424624] RSP: 002b:00007ffc7288f418 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 133.425940] RAX: ffffffffffffffda RBX: 000056151fc63720 RCX: 00007fe013689c03
[ 133.427225] RDX: 0000000000000040 RSI: 000056151fc63720 RDI: 0000000000000003
[ 133.428481] RBP: 00007ffc72890b30 R08: 000056151fc60500 R09: 0000000000000010
[ 133.429757] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 133.431041] R13: 000056151fc636e0 R14: 000056151fc616bc R15: 0000000000000080
[ 133.432481] Modules linked in: mpls_iptunnel act_mirred act_tunnel_key cls_flower sch_ingress veth mpls_router ip_tunnel bareudp ip6_udp_tunnel udp_tunnel macsec udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc xt_MASQUERADE iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc ebtable_filter ebtables overlay ip6table_filter ip6_tables iptable_filter sunrpc ext4 mbcache jbd2 pcspkr i2c_piix4 virtio_balloon joydev ip_tables xfs libcrc32c ata_generic qxl pata_acpi drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ata_piix libata virtio_net net_failover virtio_console failover virtio_blk i2c_core virtio_pci virtio_ring serio_raw floppy virtio dm_mirror dm_region_hash dm_log dm_mod
[ 133.444045] CR2: 0000000000000000
[ 133.445082] ---[ end trace f4aeee1958fd1638 ]---
[ 133.446236] RIP: 0010:0x0
[ 133.447180] Code: Bad RIP value.
[ 133.448152] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
[ 133.449363] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
[ 133.450835] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
[ 133.452237] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
[ 133.453722] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
[ 133.455149] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
[ 133.456520] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
[ 133.458046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 133.459342] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
[ 133.460782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 133.462240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 133.463697] Kernel panic - not syncing: Fatal exception in interrupt
[ 133.465226] Kernel Offset: 0xfa00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 133.467025] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Fixes: aaa0c23cb901 ("Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The method struct pci_error_handlers.error_detected() is defined and
documented as taking an 'enum pci_channel_state' for the second argument,
but most drivers use 'pci_channel_state_t' instead.
This 'pci_channel_state_t' is not a typedef for the enum but a typedef for
a bitwise type in order to have better/stricter typechecking.
Consolidate everything by using 'pci_channel_state_t' in the method's
definition, in the related helpers and in the drivers.
Enforce use of 'pci_channel_state_t' by replacing 'enum pci_channel_state'
with an anonymous 'enum'.
Note: Currently, from a typechecking point of view this patch changes
nothing because only the constants defined by the enum are bitwise, not the
enum itself (sparse doesn't have the notion of 'bitwise enum'). This may
change in some not too far future, hence the patch.
[bhelgaas: squash in
https://lore.kernel.org/r/20200702162651.49526-3-luc.vanoostenryck@gmail.com
https://lore.kernel.org/r/20200702162651.49526-4-luc.vanoostenryck@gmail.com]
Link: https://lore.kernel.org/r/20200702162651.49526-2-luc.vanoostenryck@gmail.com
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
shouldn't trigger any filesystem I/O for the actual request or for
readahead. This allows to do tentative reads out of the page cache as
some filesystems allow, and to take the appropriate locks and retry the
reads only if the requested pages are not cached.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.
sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd->val is always zero. So it's safe to factor out the code
to make it more readable.
The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd->val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.
This bug seems to be introduced since the beginning, commit
d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229af
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.
Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reported-by: Daniël Sonck <dsonck92@gmail.com>
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Tested-by: Cameron Berkenpas <cam@neo-zeon.de>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Avoid the W=1 warning that symbol 'genphy_c45_driver' was not
declared. Should it be static?
Declare it on the phy header file.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Same as with other private fields, this moves the event interface reference
to the opaque IIO device object, to be invisible to drivers.
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|