Age | Commit message (Collapse) | Author |
|
|
|
It was observed that the kernel embeds the path in the x86 boot
artifacts.
From https://bugzilla.yoctoproject.org/show_bug.cgi?id=13458:
[
If you turn on the buildpaths QA test, or try a reproducible build, you
discover that the kernel image contains build paths.
$ strings bzImage-5.0.19-yocto-standard |grep tmp/
out of pgt_buf in
/data/poky-tmp/reproducible/tmp/work-shared/qemux86-64/kernel-source/arch/x86/boot/compressed/kaslr_64.c!?
But what's this in the top-level Makefile:
$ git grep prefix-map
Makefile:KBUILD_CFLAGS += $(call
cc-option,-fmacro-prefix-map=$(srctree)/=)
So the __FILE__ shouldn't be using the full path. However
arch/x86/boot/compressed/Makefile has this:
KBUILD_CFLAGS := -m$(BITS) -O2
So that clears KBUILD_FLAGS, removing the -fmacro-prefix-map option.
]
Other architectures do not clear the flags, but instead prune before
adding boot or specific options. There's no obvious reason why x86 isn't
doing the same thing (pruning vs clearing) and no build or boot issues
have been observed.
So we make x86 can do the same thing, and we no longer have embedded paths.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
This is an all-in change of the workqueue rework.
The worker_pool.lock is made to raw_spinlock_t. With this change we can
schedule workitems from preempt-disable sections and sections with disabled
interrupts. This change allows to remove all kthread_.* workarounds we used to
have.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
This is an all-one patch which contains the softirq rework.
The basic idea is to have local-lock within local_bh_disable() which is
used for synchronisation similar to mainline. With this synchronisation
we can't have two softirqs processed in parallel on the same CPU. This
allows to remove the extra synchronisation we had.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
The TIMER_IRQSAFE flag is required because the i915 cancels timer from
interrupt.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Add a depends statement on ATMEL_TCLIB to ensure that it is not built on !ATMEL
if COMPILE_TEST is enabled. The build will fail because the driver depends on
`atmel_tc_divisors'.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
The idle call back is invoked with disabled interrupts and requires
raw_spinlock_t locks to work.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
[ Upstream commit 978315462d3ea3cf6cfacd34c563ec1eb02a3aa5 ]
It is possible to ignore the validation for a certain lock by using:
lockdep_set_novalidate_class()
on it. Each invocation will assign a new name to the class it created
for created __lockdep_no_validate__. That means that once
lockdep_set_novalidate_class() has been used on two locks then
class->name won't match lock->name for the first lock triggering the
warning.
So ignore changed non-matching ->name pointer for the special
__lockdep_no_validate__ class.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20190517212234.32611-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
The backported stable commit
59c39840f5abf ("genirq: Prevent use-after-free and work list corruption")
added cancel_work_sync() on a work_struct element which is not available
in RT.
Replace cancel_work_sync() with kthread_cancel_work_sync() on RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
This is the 5.0.19 stable release
|
|
|
|
This is the 5.0.18 stable release
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Consider following race:
T0 T1 T2
wait_for_completion()
do_wait_for_common()
__prepare_to_swait()
schedule()
complete()
x->done++ (0 -> 1)
raw_spin_lock_irqsave()
swake_up_locked() wait_for_completion()
wake_up_process(T0)
list_del_init()
raw_spin_unlock_irqrestore()
raw_spin_lock_irq(&x->wait.lock)
raw_spin_lock_irq(&x->wait.lock) x->done != UINT_MAX, 1 -> 0
raw_spin_unlock_irq(&x->wait.lock)
return 1
while (!x->done && timeout),
continue loop, not enqueued
on &x->wait
Basically, the problem is that the original wait queues used in
completions did not remove the item from the queue in the wakeup
function, but swake_up_locked() does.
Fix it by adding the thread to the wait queue inside the do loop.
The design of swait detects if it is already in the list and doesn't
do the list add again.
Cc: stable-rt@vger.kernel.org
Fixes: a04ff6b4ec4ee7e ("completion: Use simple wait queues")
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[bigeasy: shorten commit message ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
[ Upstream commit 16e32c3cde7763ab875b9030b443ecbc8e352d8a ]
A recent change split iommu_dma_map_msi_msg() in two new functions. The
function was still implemented to avoid modifying all the callers at
once.
Now that all the callers have been reworked, iommu_dma_map_msi_msg() can
be removed.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
[ Upstream commit 73103975425786ebdb6c4d2868ecf26f391fb77e ]
The functions mbi_compose_m{b, s}i_msg may be called from non-preemptible
context. However, on RT, iommu_dma_map_msi_msg() requires to be called
from a preemptible context.
A recent patch split iommu_dma_map_msi_msg in two new functions:
one that should be called in preemptible context, the other does
not have any requirement.
The GICv3 MSI driver is reworked to avoid executing preemptible code in
non-preemptible context. This can be achieved by preparing the MSI
mapping when allocating the MSI interrupt.
Signed-off-by: Julien Grall <julien.grall@arm.com>
[maz: only call iommu_dma_prepare_msi once, fix commit log accordingly]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
[ Upstream commit 2cb3b16545495ee31dc9438f88232c2cfe44a41f ]
ls_scfg_msi_compose_msg() may be called from non-preemptible context.
However, on RT, iommu_dma_map_msi_msg() requires to be called from a
preemptible context.
A recent patch split iommu_dma_map_msi_msg() in two new functions:
one that should be called in preemptible context, the other does
not have any requirement.
The FreeScale SCFG MSI driver is reworked to avoid executing preemptible
code in non-preemptible context. This can be achieved by preparing the
MSI maping when allocating the MSI interrupt.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
[ Upstream commit 35ae7df21be098848722f96f0f33bf33467436a8 ]
its_irq_compose_msi_msg() may be called from non-preemptible context.
However, on RT, iommu_dma_map_msi_msg requires to be called from a
preemptible context.
A recent change split iommu_dma_map_msi_msg() in two new functions:
one that should be called in preemptible context, the other does
not have any requirement.
The GICv3 ITS driver is reworked to avoid executing preemptible code in
non-preemptible context. This can be achieved by preparing the MSI
mapping when allocating the MSI interrupt.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
[Upstream commit 737be74710f30e611ee871f7b4f47975d1c6f71a ]
gicv2m_compose_msi_msg() may be called from non-preemptible context.
However, on RT, iommu_dma_map_msi_msg() requires to be called from a
preemptible context.
A recent change split iommu_dma_map_msi_msg() in two new functions:
one that should be called in preemptible context, the other does
not have any requirement.
The GICv2m driver is reworked to avoid executing preemptible code in
non-preemptible context. This can be achieved by preparing the MSI
mapping when allocating the MSI interrupt.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
[ Upstream commit ece6e6f0218b7777e650bf93728130ae6f4feb7d]
On RT, iommu_dma_map_msi_msg() may be called from non-preemptible
context. This will lead to a splat with CONFIG_DEBUG_ATOMIC_SLEEP as
the function is using spin_lock (they can sleep on RT).
iommu_dma_map_msi_msg() is used to map the MSI page in the IOMMU PT
and update the MSI message with the IOVA.
Only the part to lookup for the MSI page requires to be called in
preemptible context. As the MSI page cannot change over the lifecycle
of the MSI interrupt, the lookup can be cached and re-used later on.
iomma_dma_map_msi_msg() is now split in two functions:
- iommu_dma_prepare_msi(): This function will prepare the mapping
in the IOMMU and store the cookie in the structure msi_desc. This
function should be called in preemptible context.
- iommu_dma_compose_msi_msg(): This function will update the MSI
message with the IOVA when the device is behind an IOMMU.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
[Upstream commit aaebdf8d68479f78d9f72b239684f70fbb0722c6]
When an MSI doorbell is located downstream of an IOMMU, it is required
to swizzle the physical address with an appropriately-mapped IOVA for any
device attached to one of our DMA ops domain.
At the moment, the allocation of the mapping may be done when composing
the message. However, the composing may be done in non-preemtible
context while the allocation requires to be called from preemptible
context.
A follow-up change will split the current logic in two functions
requiring to keep an IOMMU cookie per MSI.
A new field is introduced in msi_desc to store an IOMMU cookie. As the
cookie may not be required in some configuration, the field is protected
under a new config CONFIG_IRQ_MSI_IOMMU.
A pair of helpers has also been introduced to access the field.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
[
cherry pick commit d9c9ce34ed5c892323cbf5b4f9a4c498e036316a which
replaces commit 6c0731d83879c994cc2e2f8eab52e22231737e48 and
294e8631fbd27bc3727e7052cc4dc27ff7ff1320 (in this tree).
]
In the compacted form, XSAVES may save only the XMM+SSE state but skip
FP (x87 state).
This is denoted by header->xfeatures = 6. The fastpath
(copy_fpregs_to_sigframe()) does that but _also_ initialises the FP
state (cwd to 0x37f, mxcsr as we do, remaining fields to 0).
The slowpath (copy_xstate_to_user()) leaves most of the FP
state untouched. Only mxcsr and mxcsr_flags are set due to
xfeatures_mxcsr_quirk(). Now that XFEATURE_MASK_FP is set
unconditionally, see
04944b793e18 ("x86: xsave: set FP, SSE bits in the xsave header in the user sigcontext"),
on return from the signal, random garbage is loaded as the FP state.
Instead of utilizing copy_xstate_to_user(), fault-in the user memory
and retry the fast path. Ideally, the fast path succeeds on the second
attempt but may be retried again if the memory is swapped out due
to memory pressure. If the user memory can not be faulted-in then
get_user_pages() returns an error so we don't loop forever.
Fault in memory via get_user_pages_unlocked() so
copy_fpregs_to_sigframe() succeeds without a fault.
Fixes: 69277c98f5eef ("x86/fpu: Always store the registers in copy_fpstate_to_sigframe()")
Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190502171139.mqtegctsg35cir2e@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This is an icremental update from
[PATCH 00/12] clocksource: improve Atmel TCB timer driver
to
[PATCH v3 0/9] clocksource: improve Atmel TCB timer driver
as posted on Fri, 26 Apr 2019 23:47:17 +0200 by Alexandre Belloni.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Checking efi_enabled(EFI_BOOT) is not sufficient to ensure that
EFI runtime services are available, e.g. if efi=noruntime is used.
Without this, I get an oops on a PREEMPT_RT kernel where efi=noruntime is
the default.
Fixes: 399574c64eaf94e8 ("x86/ima: retry detecting secure boot mode")
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
The kmsg dumper can be called from any context, but the dumping
helpers were using a mutex to synchronize the iterator against
concurrent dumps.
Rather than trying to synchronize the iterator, use a local copy
of the iterator during the dump. Then no synchronization is
required.
Reported-by: Scott Wood <swood@redhat.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Rename rwsem_rt.h to rwsem-rt.h to remain consistent with rwsem-rt.c.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
copy_fpstate_to_sigframe()
Since commit:
eeec00d73be2e ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
there is no need to have FPU registers saved if
copy_fpregs_to_sigframe() fails, because we retry it after we resolved
the fault condition.
Saving the registers is not wrong but it is not necessary and it forces us
to load the FPU registers on the retry attempt.
Don't save the FPU registers if copy_fpstate_to_sigframe() fails.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason@zx2c4.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: jannh@google.com
Cc: kurt.kanzenbach@linutronix.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: riel@surriel.com
Cc: rkrcmar@redhat.com
Link: http://lkml.kernel.org/r/20190430083126.rilbb76yc27vrem5@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
In the compacted form, XSAVES may save only the XMM+SSE state but skip
FP (x87 state).
This is denoted by header->xfeatures = 6. The fastpath
(copy_fpregs_to_sigframe()) does that but _also_ initialises the FP
state (cwd to 0x37f, mxcsr as we do, remaining fields to 0).
The slowpath (copy_xstate_to_user()) leaves most of the FP
state untouched. Only mxcsr and mxcsr_flags are set due to
xfeatures_mxcsr_quirk(). Now that XFEATURE_MASK_FP is set
unconditionally, see
04944b793e18 ("x86: xsave: set FP, SSE bits in the xsave header in the user sigcontext"),
on return from the signal, random garbage is loaded as the FP state.
Instead of utilizing copy_xstate_to_user(), fault-in the user memory
and retry the fast path. Ideally, the fast path succeeds on the second
attempt but may be retried again if the memory is swapped out due
to memory pressure. If the user memory can not be faulted-in then
get_user_pages() returns an error so we don't loop forever.
Fault in memory via get_user_pages() so copy_fpregs_to_sigframe()
succeeds without a fault.
Fixes: 69277c98f5eef ("x86/fpu: Always store the registers in copy_fpstate_to_sigframe()")
Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason@zx2c4.com
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: rkrcmar@redhat.com
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190429163953.gqxgsc5okqxp4olv@linutronix.de
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
The locks (timeline->lock and rq->lock) need to be taken with disabled
interrupts. This is done in __retire_engine_request() by disabling the
interrupts independently of the locks itself.
While local_irq_disable()+spin_lock() equals spin_lock_irq() on vanilla
it does not on RT. Also, it is not obvious if there is a special reason
to why the interrupts are disabled independently of the lock.
Enable/disable interrupts as part of the locking instruction.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This is an icremental update from
[PATCH v7 0/7] clocksource: rework Atmel TCB timer driver
to
[PATCH 00/12] clocksource: improve Atmel TCB timer driver
as posted on Date: Wed, 3 Apr 2019 16:11:08 +0200.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This is an incremental change to update the patch series from v7 to v9
as posted on Date: Wed, 3 Apr 2019 18:41:29 +0200.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
|
|
commit 9dc20113988b9a75ea6b3abd68dc45e2d73ccdab upstream.
A fallthrough in switch/case was introduced in f627caf55b8e ("fbdev:
sm712fb: fix crashes and garbled display during DPMS modesetting"),
due to my copy-paste error, which would cause the memory clock frequency
for SM720 to be programmed to SM712.
Since it only reprograms the clock to a different frequency, it's only
a benign issue without visible side-effect, so it also evaded Sudip
Mukherjee's code review and regression tests. scripts/checkpatch.pl
also failed to discover the issue, possibly due to nested switch
statements.
This issue was found by Stephen Rothwell by building linux-next with
-Wimplicit-fallthrough.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: f627caf55b8e ("fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting")
Signed-off-by: Yifeng Li <tomli@tomli.me>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 50b045a8c0ccf44f76640ac3eea8d80ca53979a3 upstream.
One of the biggest issues we face right now with picking LRU map over
regular hash table is that a map walk out of user space, for example,
to just dump the existing entries or to remove certain ones, will
completely mess up LRU eviction heuristics and wrong entries such
as just created ones will get evicted instead. The reason for this
is that we mark an entry as "in use" via bpf_lru_node_set_ref() from
system call lookup side as well. Thus upon walk, all entries are
being marked, so information of actual least recently used ones
are "lost".
In case of Cilium where it can be used (besides others) as a BPF
based connection tracker, this current behavior causes disruption
upon control plane changes that need to walk the map from user space
to evict certain entries. Discussion result from bpfconf [0] was that
we should simply just remove marking from system call side as no
good use case could be found where it's actually needed there.
Therefore this patch removes marking for regular LRU and per-CPU
flavor. If there ever should be a need in future, the behavior could
be selected via map creation flag, but due to mentioned reason we
avoid this here.
[0] http://vger.kernel.org/bpfconf.html
Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
Fixes: 8f8449384ec3 ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c6110222c6f49ea68169f353565eb865488a8619 upstream.
Add a callback map_lookup_elem_sys_only() that map implementations
could use over map_lookup_elem() from system call side in case the
map implementation needs to handle the latter differently than from
the BPF data path. If map_lookup_elem_sys_only() is set, this will
be preferred pick for map lookups out of user space. This hook is
used in a follow-up fix for LRU map, but once development window
opens, we can convert other map types from map_lookup_elem() (here,
the one called upon BPF_MAP_LOOKUP_ELEM cmd is meant) over to use
the callback to simplify and clean up the latter.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e547ff3f803e779a3898f1f48447b29f43c54085 upstream.
For iptable module to load a bpf program from a pinned location, it
only retrieve a loaded program and cannot change the program content so
requiring a write permission for it might not be necessary.
Also when adding or removing an unrelated iptable rule, it might need to
flush and reload the xt_bpf related rules as well and triggers the inode
permission check. It might be better to remove the write premission
check for the inode so we won't need to grant write access to all the
processes that flush and restore iptables rules.
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0b777eee88d712256ba8232a9429edb17c4f9ceb upstream.
In commit 376991db4b64 ("driver core: Postpone DMA tear-down until after
devres release"), we changed the ordering of tearing down the device DMA
ops and releasing all the device's resources; this was because the DMA ops
should be maintained until we release the device's managed DMA memories.
However, we have seen another crash on an arm64 system when a
device driver probe fails:
hisi_sas_v3_hw 0000:74:02.0: Adding to iommu group 2
scsi host1: hisi_sas_v3_hw
BUG: Bad page state in process swapper/0 pfn:313f5
page:ffff7e0000c4fd40 count:1 mapcount:0
mapping:0000000000000000 index:0x0
flags: 0xfffe00000001000(reserved)
raw: 0fffe00000001000 ffff7e0000c4fd48 ffff7e0000c4fd48
0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff
0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x1000(reserved)
Modules linked in:
CPU: 49 PID: 1 Comm: swapper/0 Not tainted
5.1.0-rc1-43081-g22d97fd-dirty #1433
Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI
RC0 - V1.12.01 01/29/2019
Call trace:
dump_backtrace+0x0/0x118
show_stack+0x14/0x1c
dump_stack+0xa4/0xc8
bad_page+0xe4/0x13c
free_pages_check_bad+0x4c/0xc0
__free_pages_ok+0x30c/0x340
__free_pages+0x30/0x44
__dma_direct_free_pages+0x30/0x38
dma_direct_free+0x24/0x38
dma_free_attrs+0x9c/0xd8
dmam_release+0x20/0x28
release_nodes+0x17c/0x220
devres_release_all+0x34/0x54
really_probe+0xc4/0x2c8
driver_probe_device+0x58/0xfc
device_driver_attach+0x68/0x70
__driver_attach+0x94/0xdc
bus_for_each_dev+0x5c/0xb4
driver_attach+0x20/0x28
bus_add_driver+0x14c/0x200
driver_register+0x6c/0x124
__pci_register_driver+0x48/0x50
sas_v3_pci_driver_init+0x20/0x28
do_one_initcall+0x40/0x25c
kernel_init_freeable+0x2b8/0x3c0
kernel_init+0x10/0x100
ret_from_fork+0x10/0x18
Disabling lock debugging due to kernel taint
BUG: Bad page state in process swapper/0 pfn:313f6
page:ffff7e0000c4fd80 count:1 mapcount:0
mapping:0000000000000000 index:0x0
[ 89.322983] flags: 0xfffe00000001000(reserved)
raw: 0fffe00000001000 ffff7e0000c4fd88 ffff7e0000c4fd88
0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff
0000000000000000
The crash occurs for the same reason.
In this case, on the really_probe() failure path, we are still clearing
the DMA ops prior to releasing the device's managed memories.
This patch fixes this issue by reordering the DMA ops teardown and the
call to devres_release_all() on the failure path.
Reported-by: Xiang Chen <chenxiang66@hisilicon.com>
Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b2176a1dfb518d870ee073445d27055fea64dfb8 upstream.
The problem is that any 'uptodate' vs 'disks' check is not precise
in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the
device that might try to kick off writes and then skip the action.
Better to prevent the raid driver from taking unexpected action *and* keep
the system alive vs killing the machine with BUG_ON.
Note: fixed warning reported by kbuild test robot <lkp@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a25d8c327bb41742dbd59f8c545f59f3b9c39983 upstream.
This reverts commit 4f4fd7c5798bbdd5a03a60f6269cf1177fbd11ef.
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Nigel Croxon <ncroxon@redhat.com>
Cc: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6f55967ad9d9752813e36de6d5fdbd19741adfc7 ]
New race in x86_pmu_stop() was introduced by replacing the
atomic __test_and_clear_bit() of cpuc->active_mask by separate
test_bit() and __clear_bit() calls in the following commit:
3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
The race causes panic for PEBS events with enabled callchains:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:perf_prepare_sample+0x8c/0x530
Call Trace:
<NMI>
perf_event_output_forward+0x2a/0x80
__perf_event_overflow+0x51/0xe0
handle_pmi_common+0x19e/0x240
intel_pmu_handle_irq+0xad/0x170
perf_event_nmi_handler+0x2e/0x50
nmi_handle+0x69/0x110
default_do_nmi+0x3e/0x100
do_nmi+0x11a/0x180
end_repeat_nmi+0x16/0x1a
RIP: 0010:native_write_msr+0x6/0x20
...
</NMI>
intel_pmu_disable_event+0x98/0xf0
x86_pmu_stop+0x6e/0xb0
x86_pmu_del+0x46/0x140
event_sched_out.isra.97+0x7e/0x160
...
The event is configured to make samples from PEBS drain code,
but when it's disabled, we'll go through NMI path instead,
where data->callchain will not get allocated and we'll crash:
x86_pmu_stop
test_bit(hwc->idx, cpuc->active_mask)
intel_pmu_disable_event(event)
{
...
intel_pmu_pebs_disable(event);
...
EVENT OVERFLOW -> <NMI>
intel_pmu_handle_irq
handle_pmi_common
TEST PASSES -> test_bit(bit, cpuc->active_mask))
perf_event_overflow
perf_prepare_sample
{
...
if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
data->callchain = perf_callchain(event, regs);
CRASH -> size += data->callchain->nr;
}
</NMI>
...
x86_pmu_disable_event(event)
}
__clear_bit(hwc->idx, cpuc->active_mask);
Fixing this by disabling the event itself before setting
off the PEBS bit.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Arcari <darcari@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 35bb59c10a6d0578806dd500477dae9cb4be344e ]
Robert Walker reported a segmentation fault is observed when process
CoreSight trace data; this issue can be easily reproduced by the command
'perf report --itrace=i1000i' for decoding tracing data.
If neither the 'b' flag (synthesize branches events) nor 'l' flag
(synthesize last branch entries) are specified to option '--itrace',
cs_etm_queue::prev_packet will not been initialised. After merging the
code to support exception packets and sample flags, there introduced a
number of uses of cs_etm_queue::prev_packet without checking whether it
is valid, for these cases any accessing to uninitialised prev_packet
will cause crash.
As cs_etm_queue::prev_packet is used more widely now and it's already
hard to follow which functions have been called in a context where the
validity of cs_etm_queue::prev_packet has been checked, this patch
always allocates memory for cs_etm_queue::prev_packet.
Reported-by: Robert Walker <robert.walker@arm.com>
Suggested-by: Robert Walker <robert.walker@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Robert Walker <robert.walker@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 7100b12cf474 ("perf cs-etm: Generate branch sample for exception packet")
Fixes: 24fff5eb2b93 ("perf cs-etm: Avoid stale branch samples when flush packet")
Link: http://lkml.kernel.org/r/20190428083228.20246-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bf561d3c13423fc54daa19b5d49dc15fafdb7acc ]
While cross building perf to the ARC architecture on a fedora 30 host,
we were failing with:
CC /tmp/build/perf/bench/numa.o
bench/numa.c: In function ‘worker_thread’:
bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’?
getrusage(RUSAGE_THREAD, &rusage);
^~~~~~~~~~~~~
SIGEV_THREAD
bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in
[perfbuilder@60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1
arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
[perfbuilder@60d5802468f6 perf]$
Trying to reproduce a report by Vineet, I noticed that, with just
cross-built zlib and numactl libraries, I ended up with the above
failure.
So, since RUSAGE_THREAD is available as a define, check for that and
numactl libraries, I ended up with the above failure.
So, since RUSAGE_THREAD is available as a define in the system headers,
check if it is defined in the 'perf bench numa' sources and define it if
not.
Now it builds and I have to figure out if the problem reported by Vineet
only takes place if we have libelf or some other library available.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Link: https://lkml.kernel.org/n/tip-2wb4r1gir9xrevbpq7qp0amk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|