Age | Commit message (Collapse) | Author |
|
|
|
This is the 5.4.278 stable release
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmZuzHwACgkQONu9yGCS
# aT6M7xAAuCGja5JhZ7ciWjhdxIFxACHrSVm3l/i+bqNAXoDh2rJDKgOize1OTLGR
# R1Fq2qBEtfMsoKNa+3DyMCB8D1IsMUrhr8hdM56+AqDBbmaTsDmw+L6vzepwJKVV
# ksX2onT5tXdydH95IjqnM7vbadxnU9cLkHZIpckXkb2dDA2kjhMksWSfA+GJfRsY
# XReqFYfeQhLWv4c0OWZ//ssSmGWztSVaRCX22jemUJ5MYpQHt3TYQXVpFYkaQLYH
# FZHDxBFEr48Gt9bQtnBZ2fyurKX9NdzZF+WwNDuo0DShlEpJB99vYHQ/dIxzxl9R
# 7VOs3Cj7vvUBSIRPY9DhuhBm7tQrpCvf4w3qUBc0TD+IdPPIIf0ZqDsVUvZGM9JE
# T54rnXppxNKlbW8HmYqefc2FBgjrZn08NprrAVqIVhNwFyq+8/ADqdSTTKlD9z1i
# S/3Pur7ES2e15YsrjOrDJIpGl3klMrEBbz9XyEHwrTY2vrDjdqOF0NHYUaugnjSd
# B+qmUtZxwSjzRok5wgQaf/EbrW78e9q1lTj97QRJfBr0A7MUpDMuQCvmsMoe2HUA
# vANx7ty/ubNk/9PLTGyBHUw4/F58Bk5osibAKRmulvlO5g80FsJXEU9WWqQmIkzp
# 3df+cOL8ipqh0fg/PaUOBCoeVs2HqPHnewOlYRegN41FokQndsE=
# =XU0g
# -----END PGP SIGNATURE-----
# gpg: Signature made Sun 16 Jun 2024 07:29:00 AM EDT
# gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
|
|
commit 02b670c1f88e78f42a6c5aee155c7b26960ca054 upstream.
The syzbot-reported stack trace from hell in this discussion thread
actually has three nested page faults:
https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com
... and I think that's actually the important thing here:
- the first page fault is from user space, and triggers the vsyscall
emulation.
- the second page fault is from __do_sys_gettimeofday(), and that should
just have caused the exception that then sets the return value to
-EFAULT
- the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
preempt_schedule() -> trace_sched_switch(), which then causes a BPF
trace program to run, which does that bpf_probe_read_compat(), which
causes that page fault under pagefault_disable().
It's quite the nasty backtrace, and there's a lot going on.
The problem is literally the vsyscall emulation, which sets
current->thread.sig_on_uaccess_err = 1;
and that causes the fixup_exception() code to send the signal *despite* the
exception being caught.
And I think that is in fact completely bogus. It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent -
like for the BPF user mode trace gathering.
In other words, I think the whole "sig_on_uaccess_err" thing is entirely
broken, because it makes any nested page-faults do all the wrong things.
Now, arguably, I don't think anybody should enable vsyscall emulation any
more, but this test case clearly does.
I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state for
something that isn't actually per thread.
The x86 page fault code actually tried to deal with the "incorrect nesting"
by having that:
if (in_interrupt())
return;
which ignores the sig_on_uaccess_err case when it happens in interrupts,
but as shown by this example, these nested page faults do not need to be
about interrupts at all.
IOW, I think the only right thing is to remove that horrendously broken
code.
The attached patch looks like the ObviouslyCorrect(tm) thing to do.
NOTE! This broken code goes back to this commit in 2011:
4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults")
... and back then the reason was to get all the siginfo details right.
Honestly, I do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says:
This fixes issues with UML when vsyscall=emulate.
... and so my patch to remove this garbage will probably break UML in this
situation.
I do not believe that anybody should be running with vsyscall=emulate in
2024 in the first place, much less if you are doing things like UML. But
let's see if somebody screams.
Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[gpiccoli: Backport the patch due to differences in the trees. The main changes
between 5.4.y and 5.15.y are due to renaming the fixup function, by
commit 6456a2a69ee1 ("x86/fault: Rename no_context() to kernelmode_fixup_or_oops()"),
and on processor.h thread_struct due to commit cf122cfba5b1 ("kill uaccess_try()").
Following 2 commits cause divergence in the diffs too (in the removed lines):
cd072dab453a ("x86/fault: Add a helper function to sanitize error code")
d4ffd5df9d18 ("x86/fault: Fix wrong signal when vsyscall fails with pkey").]
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a6c11c0a5235fb144a65e0cb2ffd360ddc1f6c32 upstream.
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
interrupt affinity reconfiguration via procfs. Instead, the change is
deferred until the next instance of the interrupt being triggered on the
original CPU.
When the interrupt next triggers on the original CPU, the new affinity is
enforced within __irq_move_irq(). A vector is allocated from the new CPU,
but the old vector on the original CPU remains and is not immediately
reclaimed. Instead, apicd->move_in_progress is flagged, and the reclaiming
process is delayed until the next trigger of the interrupt on the new CPU.
Upon the subsequent triggering of the interrupt on the new CPU,
irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
remains online. Subsequently, the timer on the old CPU iterates over its
vector_cleanup list, reclaiming old vectors.
However, a rare scenario arises if the old CPU is outgoing before the
interrupt triggers again on the new CPU.
In that case irq_force_complete_move() is not invoked on the outgoing CPU
to reclaim the old apicd->prev_vector because the interrupt isn't currently
affine to the outgoing CPU, and irq_needs_fixup() returns false. Even
though __vector_schedule_cleanup() is later called on the new CPU, it
doesn't reclaim apicd->prev_vector; instead, it simply resets both
apicd->move_in_progress and apicd->prev_vector to 0.
As a result, the vector remains unreclaimed in vector_matrix, leading to a
CPU vector leak.
To address this issue, move the invocation of irq_force_complete_move()
before the irq_needs_fixup() call to reclaim apicd->prev_vector, if the
interrupt is currently or used to be affine to the outgoing CPU.
Additionally, reclaim the vector in __vector_schedule_cleanup() as well,
following a warning message, although theoretically it should never see
apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU.
Fixes: f0383c24b485 ("genirq/cpuhotplug: Add support for cleaning up move in progress")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240522220218.162423-1-dongli.zhang@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 66ee3636eddcc82ab82b539d08b85fb5ac1dff9b ]
It took me some time to understand the purpose of the tricky code at
the end of arch/x86/Kconfig.debug.
Without it, the following would be shown:
WARNING: unmet direct dependencies detected for FRAME_POINTER
because
81d387190039 ("x86/kconfig: Consolidate unwinders into multiple choice selection")
removed 'select ARCH_WANT_FRAME_POINTERS'.
The correct and more straightforward approach should have been to move
it where 'select FRAME_POINTER' is located.
Several architectures properly handle the conditional selection of
ARCH_WANT_FRAME_POINTERS. For example, 'config UNWINDER_FRAME_POINTER'
in arch/arm/Kconfig.debug.
Fixes: 81d387190039 ("x86/kconfig: Consolidate unwinders into multiple choice selection")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20240204122003.53795-1-masahiroy@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 59162e0c11d7257cde15f907d19fefe26da66692 ]
The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.
Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size
only i.e. (d64). That was based on Intel SDM Opcode Map. However that is
contradicted by the Instruction Set Reference section for PUSH in the
same manual.
Remove 64-bit operand size only annotation from opcode 0x68 PUSH
instruction.
Example:
$ cat pushw.s
.global _start
.text
_start:
pushw $0x1234
mov $0x1,%eax # system call number (sys_exit)
int $0x80
$ as -o pushw.o pushw.s
$ ld -s -o pushw pushw.o
$ objdump -d pushw | tail -4
0000000000401000 <.text>:
401000: 66 68 34 12 pushw $0x1234
401004: b8 01 00 00 00 mov $0x1,%eax
401009: cd 80 int $0x80
$ perf record -e intel_pt//u ./pushw
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]
Before:
$ perf script --insn-trace=disasm
Warning:
1 instruction trace errors
pushw 10349 [000] 10586.869237014: 401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %al, (%rax)
pushw 10349 [000] 10586.869237014: 401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %cl, %ch
pushw 10349 [000] 10586.869237014: 40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb $0x2e, (%rax)
instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction
After:
$ perf script --insn-trace=disasm
pushw 10349 [000] 10586.869237014: 401000 [unknown] (./pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401004 [unknown] (./pushw) movl $1, %eax
Fixes: eb13296cfaf6 ("x86: Instruction decoder API")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-3-adrian.hunter@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cba786af84a0f9716204e09f518ce3b7ada8555e ]
On x86, the ordinary, position dependent small and kernel code models
only support placement of the executable in 32-bit addressable memory,
due to the use of 32-bit signed immediates to generate references to
global variables. For the kernel, this implies that all global variables
must reside in the top 2 GiB of the kernel virtual address space, where
the implicit address bits 63:32 are equal to sign bit 31.
This means the kernel code model is not suitable for other bare metal
executables such as the kexec purgatory, which can be placed arbitrarily
in the physical address space, where its address may no longer be
representable as a sign extended 32-bit quantity. For this reason,
commit
e16c2983fba0 ("x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")
switched to the large code model, which uses 64-bit immediates for all
symbol references, including function calls, in order to avoid relying
on any assumptions regarding proximity of symbols in the final
executable.
The large code model is rarely used, clunky and the least likely to
operate in a similar fashion when comparing GCC and Clang, so it is best
avoided. This is especially true now that Clang 18 has started to emit
executable code in two separate sections (.text and .ltext), which
triggers an issue in the kexec loading code at runtime.
The SUSE bugzilla fixes tag points to gcc 13 having issues with the
large model too and that perhaps the large model should simply not be
used at all.
Instead, use the position independent small code model, which makes no
assumptions about placement but only about proximity, where all
referenced symbols must be within -/+ 2 GiB, i.e., in range for a
RIP-relative reference. Use hidden visibility to suppress the use of a
GOT, which carries absolute addresses that are not covered by static ELF
relocations, and is therefore incompatible with the kexec loader's
relocation logic.
[ bp: Massage commit message. ]
Fixes: e16c2983fba0 ("x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")
Fixes: https://bugzilla.suse.com/show_bug.cgi?id=1211853
Closes: https://github.com/ClangBuiltLinux/linux/issues/2016
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/all/20240417-x86-fix-kexec-with-llvm-18-v1-0-5383121e8fb7@kernel.org/
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 76e9762d66373354b45c33b60e9a53ef2a3c5ff2 ]
Commit:
aaa8736370db ("x86, relocs: Ignore relocations in .notes section")
... only started ignoring the .notes sections in print_absolute_relocs(),
but the same logic should also by applied in walk_relocs() to avoid
such relocations.
[ mingo: Fixed various typos in the changelog, removed extra curly braces from the code. ]
Fixes: aaa8736370db ("x86, relocs: Ignore relocations in .notes section")
Fixes: 5ead97c84fa7 ("xen: Core Xen implementation")
Fixes: da1a679cde9b ("Add /sys/kernel/notes")
Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240317150547.24910-1-weiguixiong@bytedance.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 455f9075f14484f358b3c1d6845b4a438de198a7 upstream.
When the BIOS configures the architectural TSC-adjust MSRs on secondary
sockets to correct a constant inter-chassis offset, after Linux brings the
cores online, the TSC sync check later resets the core-local MSR to 0,
triggering HPET fallback and leading to performance loss.
Fix this by unconditionally using the initial adjust values read from the
MSRs. Trusting the initial offsets in this architectural mechanism is a
better approach than special-casing workarounds for specific platforms.
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steffen Persvold <sp@numascale.com>
Reviewed-by: James Cleverdon <james.cleverdon.external@eviden.com>
Reviewed-by: Dimitri Sivanich <sivanich@hpe.com>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Link: https://lore.kernel.org/r/20240419085146.175665-1-daniel@quora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
This is the 5.4.275 stable release
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmYzoYAACgkQONu9yGCS
# aT5Gog/8DiBn13NgIqTvlQucOpiKcQrFS6ghuLjWUj+H+toaBPCoiAaSNzDgk3a6
# Vc7aW+/H6035Wpt6qmA9nr9VljS4JgHRlxPbp442zcXoaQabxXstt80s05EgC+ue
# MamCnirQwtkSaHYTQDHwumyr+J73b/EM6w0MHinmKJKbmLOpF6ejt87bBN8i1i4N
# 0/atD79jq6l3wOIwNpJI5hIdc/OwNKnCd0M1lQL60XQgN+4Yc5Gup/BazSnVzTHN
# PvkyqJOPSHcOcTkaDsS6t6RRJ5fQcHDihEPILwQuyqDbasyVULaNWxuZjVUeXbIS
# rcgrlPTsJB1KjtZuGSkSCmj3FrmBBhs9Fd2HBEGE2UfShJa1c4O/Hbmq3sHvijUQ
# i8Oaakkx0Z4XKiXf5djwG0QNm4vT8M+nBDNTxw+/pzWg2L0uzlpVh87YwODsrtT2
# /WHQ0B1MFXncCDebweXqlxTWKRCoqR183MjMedDJt25smCKp1RZZR7pJb9c065Mn
# bg2benOfKUr/+Us5oiQboV3XZkZiPc39V/E6J9NjYE62EPht6Gb/yoblFKp9t+/f
# 8MYPXqyAjGGXIzkqD7e1+YyOH6N/96EOZ2TrLhZiMb/3W6+dCjc1YeFtquNmaN9b
# pdYhzG3qlneflRNTz/30YKIyMQE5Zv+1DMR2mpBSD4hvSEYCmE4=
# =yaox
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 02 May 2024 10:21:52 AM EDT
# gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
|
|
This is the 5.4.274 stable release
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmYaZJMACgkQONu9yGCS
# aT4fYxAAwlCrdAhIzqVcOGR9fgr0spXmY9c85YpqOtBGQJO4OTr/wvbNpBx3/kqo
# 6Va+R1sebIIHMSz84R7wy0C63Kzm5FdpKto0cEOTOZUuUPLueFb1cVNuPFRvn2iF
# g0PrRepX1jxAHBGiI5eE6vxFzSfx5RfS5tU3AWbpQ/7j9ytIoowxfrb3hi5c3nS5
# 2OHhWtg9TvA5S+l+I4m42aJUeK70vJfFCWQNnufD6RJNT6aukke2AqgemzKof3R7
# UWDLRvcuCEx0oSQZvXtAJJQ6+4sL9DVRUsff1bS2BNDeI26G93AhbahX1FhsHFNq
# iJ4AMVRECwyjlsMjm9WhsU2p28EHa2Y4QA8oz2BdbUrmKSaSf7FkWsy0T4cs4TXe
# bWrqLMp63yjXiiEQFOWtgjFCq4M8jMBHRvnDM/3FcFyTvRXw8mDo2DH9WwLNmGxG
# 9gMJW36J/jN3hDB/aMzno8RqvnGBsZGioXsB1QO+vtLSedGknITZeX3g5PnqqA8N
# Ebb6VXR5kmX+Abm6odMFDjnBSEHVvGQap6jGvJqWuXV1hR8zsblBrvbeA+SEuoX7
# FX5E+5dGkc9EKdrQa9LpFnbHLCQQ4jmLTzzjJ0xxB3kLzCeKigsyoltLSlRE/0tt
# j8upZAXWBBDIBJHFwG7luoZA3rU42pLgWWAfjq/0Rerl3olYxEc=
# =HbFv
# -----END PGP SIGNATURE-----
# gpg: Signature made Sat 13 Apr 2024 06:55:15 AM EDT
# gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
|
|
[ Upstream commit 9543f6e26634537997b6e909c20911b7bf4876de ]
Fix cpuid_deps[] to list the correct dependencies for GFNI, VAES, and
VPCLMULQDQ. These features don't depend on AVX512, and there exist CPUs
that support these features but not AVX512. GFNI actually doesn't even
depend on AVX.
This prevents GFNI from being unnecessarily disabled if AVX is disabled
to mitigate the GDS vulnerability.
This also prevents all three features from being unnecessarily disabled
if AVX512VL (or its dependency AVX512F) were to be disabled, but it
looks like there isn't any case where this happens anyway.
Fixes: c128dbfa0f87 ("x86/cpufeatures: Enable new SSE/AVX/AVX512 CPU features")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20240417060434.47101-1-ebiggers@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 5ce344beaca688f4cdea07045e0b8f03dc537e74 upstream.
When done from a virtual machine, instructions that touch APIC memory
must be emulated. By convention, MMIO accesses are typically performed
via io.h helpers such as readl() or writeq() to simplify instruction
emulation/decoding (ex: in KVM hosts and SEV guests) [0].
Currently, native_apic_mem_read() does not follow this convention,
allowing the compiler to emit instructions other than the MOV
instruction generated by readl(). In particular, when the kernel is
compiled with clang and run as a SEV-ES or SEV-SNP guest, the compiler
would emit a TESTL instruction which is not supported by the SEV-ES
emulator, causing a boot failure in that environment. It is likely the
same problem would happen in a TDX guest as that uses the same
instruction emulator as SEV-ES.
To make sure all emulators can emulate APIC memory reads via MOV, use
the readl() function in native_apic_mem_read(). It is expected that any
emulator would support MOV in any addressing mode as it is the most
generic and is what is usually emitted currently.
The TESTL instruction is emitted when native_apic_mem_read() is inlined
into apic_mem_wait_icr_idle(). The emulator comes from
insn_decode_mmio() in arch/x86/lib/insn-eval.c. It's not worth it to
extend insn_decode_mmio() to support more instructions since, in theory,
the compiler could choose to output nearly any instruction for such
reads which would bloat the emulator beyond reason.
[0] https://lore.kernel.org/all/20220405232939.73860-12-kirill.shutemov@linux.intel.com/
[ bp: Massage commit message, fix typos. ]
Signed-off-by: Adam Dunlap <acdunlap@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Kevin Loughlin <kevinloughlin@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240318230927.2191933-1-acdunlap@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit abee7c494d8c41bb388839bccc47e06247f0d7de upstream.
When running in lazy TLB mode the currently active page tables might
be the ones of a previous process, e.g. when running a kernel thread.
This can be problematic in case kernel code is being modified via
text_poke() in a kernel thread, and on another processor exit_mmap()
is active for the process which was running on the first cpu before
the kernel thread.
As text_poke() is using a temporary address space and the former
address space (obtained via cpu_tlbstate.loaded_mm) is restored
afterwards, there is a race possible in case the cpu on which
exit_mmap() is running wants to make sure there are no stale
references to that address space on any cpu active (this e.g. is
required when running as a Xen PV guest, where this problem has been
observed and analyzed).
In order to avoid that, drop off TLB lazy mode before switching to the
temporary address space.
Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201009144225.12019-1-jgross@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 upstream.
PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.
For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>
int main(void)
{
struct io_uring_params p = {};
int ring_fd;
size_t size;
char *map;
ring_fd = io_uring_setup(1, &p);
if (ring_fd < 0) {
perror("io_uring_setup");
return 1;
}
size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
/* Map the submission queue ring MAP_PRIVATE */
map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
ring_fd, IORING_OFF_SQ_RING);
if (map == MAP_FAILED) {
perror("mmap");
return 1;
}
/* We have at least one page. Let's COW it. */
*map = 0;
pause();
return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:
# ./iouring &
# memhog 16G
# killall iouring
[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 301.567163] ? report_bug+0x171/0x1a0
[ 301.567466] ? handle_bug+0x3c/0x80
[ 301.567743] ? exc_invalid_op+0x17/0x70
[ 301.568038] ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363] ? untrack_pfn+0xf4/0x100
[ 301.568660] ? untrack_pfn+0x65/0x100
[ 301.568947] unmap_single_vma+0xa6/0xe0
[ 301.569247] unmap_vmas+0xb5/0x190
[ 301.569532] exit_mmap+0xec/0x340
[ 301.569801] __mmput+0x3e/0x130
[ 301.570051] do_exit+0x305/0xaf0
...
Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Wupeng Ma <mawupeng1@huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3ddf944b32f88741c303f0b21459dbb3872b8bc5 upstream.
Modifying a MCA bank's MCA_CTL bits which control which error types to
be reported is done over
/sys/devices/system/machinecheck/
├── machinecheck0
│ ├── bank0
│ ├── bank1
│ ├── bank10
│ ├── bank11
...
sysfs nodes by writing the new bit mask of events to enable.
When the write is accepted, the kernel deletes all current timers and
reinits all banks.
Doing that in parallel can lead to initializing a timer which is already
armed and in the timer wheel, i.e., in use already:
ODEBUG: init active (active state 0) object: ffff888063a28000 object
type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642
WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514
debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514
Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code
does.
Reported by: Yue Sun <samsun1006219@gmail.com>
Reported by: xingwei lee <xrivendell7@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%2Bx_-PqMv98gKig55=2vbzffRw@mail.gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c567f2948f57bdc03ed03403ae0234085f376b7d upstream.
This reverts commit d794734c9bbfe22f86686dc2909c25f5ffe1a572.
While the original change tries to fix a bug, it also unintentionally broke
existing systems, see the regressions reported at:
https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/
Since d794734c9bbf was also marked for -stable, let's back it out before
causing more damage.
Note that due to another upstream change the revert was not 100% automatic:
0a845e0f6348 mm/treewide: replace pud_large() with pud_leaf()
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Russ Anderson <rja@hpe.com>
Cc: Steve Wahl <steve.wahl@hpe.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/
Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7f274e609f3d5f45c22b1dd59053f6764458b492 upstream.
Add a new word for scattered features because all free bits among the
existing Linux-defined auxiliary flags have been exhausted.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/8380d2a0da469a1f0ad75b8954a79fb689599ff6.1711091584.git.sandipan.das@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fd470a8beed88440b160d690344fbae05a0b9b1b upstream.
Unlike Intel's Enhanced IBRS feature, AMD's Automatic IBRS does not
provide protection to processes running at CPL3/user mode, see section
"Extended Feature Enable Register (EFER)" in the APM v2 at
https://bugzilla.kernel.org/attachment.cgi?id=304652
Explicitly enable STIBP to protect against cross-thread CPL3
branch target injections on systems with Automatic IBRS enabled.
Also update the relevant documentation.
Fixes: e7862eda309e ("x86/cpu: Support AMD Automatic IBRS")
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230720194727.67022-1-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8afd1c7da2b0 ("x86/speculation: Change FILL_RETURN_BUFFER
to work with objtool") does not support intra-function call
stack validation, which causes kernel live patching to fail.
This commit adds support for this, and after testing, the kernel
live patching feature is restored to normal.
Fixes: 8afd1c7da2b0 ("x86/speculation: Change FILL_RETURN_BUFFER to work with objtool")
Cc: <stable@vger.kernel.org> # v5.4.250+
Signed-off-by: Rui Qi <qirui.001@bytedance.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5c84b051bd4e777cf37aaff983277e58c99618d5 ]
Update them to the correct revision numbers.
Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 1d30800c0c0ae1d086ffad2bdf0ba4403370f132 upstream.
Those mitigations are very talkative; use the printing helper which pays
attention to the buffer size.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220809153419.10182-1-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e7862eda309ecfccc36bb5558d937ed3ace07f3f upstream.
The AMD Zen4 core supports a new feature called Automatic IBRS.
It is a "set-and-forget" feature that means that, like Intel's Enhanced IBRS,
h/w manages its IBRS mitigation resources automatically across CPL transitions.
The feature is advertised by CPUID_Fn80000021_EAX bit 8 and is enabled by
setting MSR C000_0080 (EFER) bit 21.
Enable Automatic IBRS by default if the CPU feature is present. It typically
provides greater performance over the incumbent generic retpolines mitigation.
Reuse the SPECTRE_V2_EIBRS spectre_v2_mitigation enum. AMD Automatic IBRS and
Intel Enhanced IBRS have similar enablement. Add NO_EIBRS_PBRSB to
cpu_vuln_whitelist, since AMD Automatic IBRS isn't affected by PBRSB-eIBRS.
The kernel command line option spectre_v2=eibrs is used to select AMD Automatic
IBRS, if available.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-8-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
# Conflicts:
# arch/x86/kernel/cpu/mce/core.c
|
|
Linux 5.4.273
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmYDT24ACgkQ3qZv95d3
# LNwsuhAAj/Ar2XNkDzi3TSj7CFF7Qh7P2m8YA3XLrwTBkFBgBteJN4GF7/HON/pU
# 7drEcLWodiQf/7HYKUiseEiU4d6g5f818+HdFFMpWrlW0DxrfWG+WoX4NgWiXEtk
# 0NgaWTgUkuYHemhevv3ohUQ6AuqddUXwJ4kp+xEZiTvcHF2F4pIOGjKjYX2PGzsT
# jCeEDMhmFn4PBr88c1dmgpYUNBVJxv7FIiorELuKChgfAOfyWmcHD1ckA/6fCAkR
# L2oX/pEBxUU2ADB+0HmXFUPxIXqyMhQmxL0fe7n6VGIpjq2yH9u79tUAwqOhCMcD
# aQ8Lg9HJt9Z2HX7EryC1nxeLdD0IKI1OcG6lOkDoLMPYLpcTm3RJrbv9wRHdrsYI
# 3XJZVpMBi7Jp1M1y1jVhYGP4H85oKAxoHFcI1M9YRn0Q9pk42QCt1pw3CS/NWN4G
# JEz45CXJhi66EN/lKfr7d3bmhjGYnSweWUILPXNu+3YwM8FPgIoZV5LlNvi8dQg8
# PpbtYbZek3/5s/LSoHFvdlxv4UEmrLb+Rta55iJrIjOK7yG1gj64h2jA+6LnlBEL
# L7EB1Jy5HwdQCVlyabmOZA8QouVJkGLifQ1SQJnV3V5ftFgCvURXtGv/9aYugFFH
# 6J8Yzos7yMIY4RuEyZ7TLg29nU4/iQD4wfKC1TVpEu1WtRFXkKA=
# =/Slq
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 26 Mar 2024 06:42:54 PM EDT
# gpg: using RSA key E27E5D8A3403A2EF66873BBCDEA66FF797772CDC
# gpg: Can't check signature: No public key
|
|
Linux 5.4.272
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmX0nCUACgkQ3qZv95d3
# LNzTSg/8D8YrvES4AQlmO6GRMBb/06gCxskgyG88s2EfIFfrC70P6v8vcq5ZOWow
# mR/dxoIxz1rmzkW45zgc/64ihXAI3hpMAw5XM9kGIVD0/oH2PhFcZRb2ybouZX3V
# Ts6aepLw/jzlSmTZAIBfaMH7KHU1HPJpuQ0vwOYn1KZrysFi4xROKeYl4lWY2I6x
# 4fQSWe5y3z3PY0X/XBn2OPFqbGlD6FwWX5SX1M3kFfFg+aC5GPfvS+DyQy1+GClb
# 0zBPM4ph4c59laj2fFm5W2k2MWuWTYxZNW7PXvgkEt9dFqAhVr6XXvTdZa09u6o4
# jIrGOkwheVMFQ7VgH/VdjpIWnTrDscAYYdAnuihPBQTYDRpn01mHMUtVV+nvdgnD
# QMhwNRrv9MiaaFim2OmTs0AgCmYMw38D3+DrlwUQjqWqpojGW1AM6xGNeIUN9Hrw
# lcqy4jqkW08x+af9HZ9488S1csCzamwYLkj1UE0m2ZQb/vSNEtyFbLSd9sjcR8Lv
# g+fuVgvWIq6xuee00P/JhRgCASRjBrACV/uyzp0t/ka3R97qD04Gw+tZdd15oyF7
# HkK17SjhYGLFALZnVbTQVqqu2ISzXeplipUVxJbhB2xkcf0ZQh98lGFxGv/MbmaL
# 58Jyia4KAx1YRK5X9XAoQzQcLW7j3ErNWzMFKd1kJw3UqbOEQso=
# =tsbn
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 15 Mar 2024 03:06:13 PM EDT
# gpg: using RSA key E27E5D8A3403A2EF66873BBCDEA66FF797772CDC
# gpg: Can't check signature: No public key
|
|
This is the 5.4.271 stable release
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmXof14ACgkQONu9yGCS
# aT4K1Q//Wt8oz/83saKdDP4Uo51NWuyLHCLzBotCKDP1f5g4gTZbvZzzDsC+jCPB
# Tavkw3BHPLfIpvDt5WV8eKTGrkbmw/xIyLqUyDxZDD7hx8p7YJ3lMBcSfZvP/YGm
# 1vmiQDmTp6ZIfh5Fl9aNjYDkS/HxRVHlZMJx2ERI480Zm7FbtbCVUYK1gkHGHXYD
# aVW3X7UKsZVYvK2I4jI5IGg/zP0PTfH9XnFRTm5WRvOLoRYF8eFZ2NiBHACkZyoh
# UWYc5Kl0rdurSkPEo8qN1KPDagbxyGhCVU2v+tpSNC3eeSF3WkNCvuCD7zX4LRkJ
# 3FSxB7oVmgKn95/BREwkcegzDD9ZwTaXFuQVelWF4+ajdjCEMkxOuUoFajSI83IG
# LXHHIn4WYkcSkokgdaYGTP6CbdqSvcuLJCfS/ANwFvdsBSUylBGHbUuc+53jD754
# sbNBLZnWIiERSx6mUy3/jGwMfncDEroouMZVL3YkXkDuWYryZEDvIdP7jMsZ2u6d
# GM6EmGbi7aWsxdaPr+B638ph58eFkvTdcLyAwp+Sn5AORK2569Gg0Rubhl+xX54X
# y8Wa7+93ohh59yifQll2OqDLN+Wlj6Sg1smvitjdyiK4Od7Id0plJ7Gzg1Wj97C+
# vGkD33vfH++tvZdsNU6Ynw4oY2mQ7fBOqhCm5XavDFXDUMOZI2s=
# =XlXs
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 06 Mar 2024 09:36:14 AM EST
# gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
|
|
This is the 5.4.270 stable release
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmXhxnYACgkQONu9yGCS
# aT5+zA//YASY5iWEz26+3UpKxBwlHje+6BOqm5JOO420iIT75NcPe7RG4GXkQWtM
# xwS7VGUjFFFgq+vE5bq4L8WMkBc1djBhcPmavIx9qNb2QiN0y60WRGblHJRaSnOz
# q+y9PA0OrMMI9ctC7x2jI37cUK/6YCYrNsXRLyjJ8GF6Xsnfqe0ZJsAWKpjUaEOb
# tYUvDbKgpT2kL9YpKCJuk7Z1BnDA84fN3TAwrh659eEcxSqFlUxvVCvWrACWKVQz
# FCpULyqScpJLytG9PGQIxrMQ4GSaH5FAZ4KKPu4UsSkFOStiWIx0zm4DEm52+x3q
# OPWvGW8mavDo+C+LNt96B4lcvYn3dITvY5yvryTHXbJQCyvrP3dpJcMzlasdJmaS
# zR7NZEmQFKTijn76J5xbPePw2MG9edKMLAVukhj78ioWBGhwOXeSmIStUrpsd74r
# Xv4YCXgige3O7MDaZub3rXB9vgr+yj8U/f327z91WyrYzJpwsZVTFRWJcjJF1WHG
# 6V6SgtkX2vfoV6wbWtK7dq/nhyA7RSEESXq+g8mKOQbGcnRpuQam81tNJgzhEUPi
# gUq/Sj1+L8AOaKIhePAiVVqe9fT6UIxoLa1JCVI5JqXM2uyErOqUHOVcnzCDHjm/
# V1IVdbjJPJ6fUUYij6v0wJpAUMkM1nNqWurmMSxSPjBKZ2TUSJs=
# =7VkG
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 01 Mar 2024 07:13:42 AM EST
# gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
|
|
This is the 5.4.269 stable release
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmXYSHgACgkQONu9yGCS
# aT74ag/6AqWJBzK/2xvUCYjfBU5+4ApFWQt47Ly9MKFhFX7YBjQGXS6av1YFA9Kw
# i01R9SCpIv2eaDrM7/J0wvXGybemfvQ8VyNngG30QC/0jTc4ZAj0PEbtyHpUaz4F
# HWOFfAlHAYcLQWhmjhXitoGUfeyhchWnQZpn45mkT0i3DSAEFc5gsiMlO+jaM8No
# hOaAHEpGsd7zlH32NYpWrFI0i54HSCwlaHlQFJ7U+rbWyG935RdLjMAX+488R8oc
# KccOj+xb4zQyASdC7qdgPz02U7Tm3UB5s0lLrviDiBrYVxSe77vw2TBEeejF+7ZE
# oYqjsygRYmRbKuI55rxyxph7cT93ZctL48DZJ4fT4zVIT9kak3S/NtFs0Hyr3TkY
# N6ZlDnd10cj8QsnXXtTd9QgT7Ind+3KySv7sr4a/gLO/N39EYpztrMCc/lKfG/Bu
# MPDMXBrEtKkjMelcnISwac9QcOb/MAJaepCWtYgcXbEcaBP+/Or8OM0yZPOEk7SA
# 3CamE+ou0Ds/c6gnsBw6WDMTJd+sX6sw6+4cMEaWzaWiE12Ryc0gscCDJXjEAYzc
# +47PiPijNJ+iPjsos8ZaNnTQHALemgJ4cjolHivsEvAYU1s5cyKjVEgMB1MJN8ib
# y19D9L8T9BtG2ukWBxtIXMIt51VZ7B8fXodcYXbyqtV25JZj/k8=
# =cJfu
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 23 Feb 2024 02:25:44 AM EST
# gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
|
|
[ Upstream commit aaa8736370db1a78f0e8434344a484f9fd20be3b ]
When building with CONFIG_XEN_PV=y, .text symbols are emitted into
the .notes section so that Xen can find the "startup_xen" entry point.
This information is used prior to booting the kernel, so relocations
are not useful. In fact, performing relocations against the .notes
section means that the KASLR base is exposed since /sys/kernel/notes
is world-readable.
To avoid leaking the KASLR base without breaking unprivileged tools that
are expecting to read /sys/kernel/notes, skip performing relocations in
the .notes section. The values readable in .notes are then identical to
those found in System.map.
Reported-by: Guixiong Wei <guixiongwei@gmail.com>
Closes: https://lore.kernel.org/all/20240218073501.54555-1-guixiongwei@gmail.com/
Fixes: 5ead97c84fa7 ("xen: Core Xen implementation")
Fixes: da1a679cde9b ("Add /sys/kernel/notes")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3693bb4465e6e32a204a5b86d3ec7e6b9f7e67c2 ]
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401161119.iof6BQsf-lkp@intel.com/
Suggested-by: Markus Elfring <Markus.Elfring@web.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20240119094948.275390-1-chentao@kylinos.cn
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 386093c68ba3e8bcfe7f46deba901e0e80713c29 ]
There doesn't seem to be any reason for the rpath being set in
the binaries, at on systems that I tested on. On the other hand,
setting rpath is actually harming binaries in some cases, e.g.
if using nix-based compilation environments where /lib & /lib64
are not part of the actual environment.
Add a new Kconfig option (under EXPERT, for less user confusion)
that allows disabling the rpath additions.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Stable-dep-of: 846cfbeed09b ("um: Fix adding '-no-pie' for clang")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 6890cb1ace350b4386c8aee1343dc3b3ddd214da upstream.
MKTME repurposes the high bit of physical address to key id for encryption
key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same,
the valid bits in the MTRR mask register are based on the reduced number
of physical address bits.
detect_tme() in arch/x86/kernel/cpu/intel.c detects TME and subtracts
it from the total usable physical bits, but it is called too late.
Move the call to early_init_intel() so that it is called in setup_arch(),
before MTRRs are setup.
This fixes boot on TDX-enabled systems, which until now only worked with
"disable_mtrr_cleanup". Without the patch, the values written to the
MTRRs mask registers were 52-bit wide (e.g. 0x000fffff_80000800) and
the writes failed; with the patch, the values are 46-bit wide, which
matches the reduced MAXPHYADDR that is shown in /proc/cpuinfo.
Reported-by: Zixi Chen <zixchen@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240131230902.1867092-3-pbonzini%40redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d35652a5fc9944784f6f50a5c979518ff8dacf61 ]
Fei has reported that KASAN triggers during apply_alternatives() on
a 5-level paging machine:
BUG: KASAN: out-of-bounds in rcu_is_watching()
Read of size 4 at addr ff110003ee6419a0 by task swapper/0/0
...
__asan_load4()
rcu_is_watching()
trace_hardirqs_on()
text_poke_early()
apply_alternatives()
...
On machines with 5-level paging, cpu_feature_enabled(X86_FEATURE_LA57)
gets patched. It includes KASAN code, where KASAN_SHADOW_START depends on
__VIRTUAL_MASK_SHIFT, which is defined with cpu_feature_enabled().
KASAN gets confused when apply_alternatives() patches the
KASAN_SHADOW_START users. A test patch that makes KASAN_SHADOW_START
static, by replacing __VIRTUAL_MASK_SHIFT with 56, works around the issue.
Fix it for real by disabling KASAN while the kernel is patching alternatives.
[ mingo: updated the changelog ]
Fixes: 6657fca06e3f ("x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y")
Reported-by: Fei Yang <fei.yang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231012100424.1456-1-kirill.shutemov@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit d794734c9bbfe22f86686dc2909c25f5ffe1a572 upstream.
When ident_pud_init() uses only gbpages to create identity maps, large
ranges of addresses not actually requested can be included in the
resulting table; a 4K request will map a full GB. On UV systems, this
ends up including regions that will cause hardware to halt the system
if accessed (these are marked "reserved" by BIOS). Even processor
speculation into these regions is enough to trigger the system halt.
Only use gbpages when map creation requests include the full GB page
of space. Fall back to using smaller 2M pages when only portions of a
GB page are included in the request.
No attempt is made to coalesce mapping requests. If a request requires
a map entry at the 2M (pmd) level, subsequent mapping requests within
the same 1G region will also be at the pmd level, even if adjacent or
overlapping such requests could have been combined to map a full
gbpage. Existing usage starts with larger regions and then adds
smaller regions, so this should not have any great consequence.
[ dhansen: fix up comment formatting, simplifty changelog ]
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240126164841.170866-1-steve.wahl%40hpe.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f6a1892585cd19e63c4ef2334e26cd536d5b678d upstream.
The kernel built with MCRUSOE is unbootable on Transmeta Crusoe. It shows
the following error message:
This kernel requires an i686 CPU, but only detected an i586 CPU.
Unable to boot - please use a kernel appropriate for your CPU.
Remove MCRUSOE from the condition introduced in commit in Fixes, effectively
changing X86_MINIMUM_CPU_FAMILY back to 5 on that machine, which matches the
CPU family given by CPUID.
[ bp: Massage commit message. ]
Fixes: 25d76ac88821 ("x86/Kconfig: Explicitly enumerate i686-class CPUs in Kconfig")
Signed-off-by: Aleksander Mazur <deweloper@wp.pl>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20240123134309.1117782-1-deweloper@wp.pl
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9f3b130048bfa2e44a8cfb1b616f826d9d5d8188 ]
Memory errors don't happen very often, especially fatal ones. However,
in large-scale scenarios such as data centers, that probability
increases with the amount of machines present.
When a fatal machine check happens, mce_panic() is called based on the
severity grading of that error. The page containing the error is not
marked as poison.
However, when kexec is enabled, tools like makedumpfile understand when
pages are marked as poison and do not touch them so as not to cause
a fatal machine check exception again while dumping the previous
kernel's memory.
Therefore, mark the page containing the error as poisoned so that the
kexec'ed kernel can avoid accessing the page.
[ bp: Rewrite commit message and comment. ]
Co-developed-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 56062d60f117dccfb5281869e0ab61e090baf864 upstream.
Presently ia32 registers stored in ptregs are unconditionally cast to
unsigned int by the ia32 stub. They are then cast to long when passed to
__se_sys*, but will not be sign extended.
This takes the sign of the syscall argument into account in the ia32
stub. It still casts to unsigned int to avoid implementation specific
behavior. However then casts to int or unsigned int as necessary. So that
the following cast to long sign extends the value.
This fixes the io_pgetevents02 LTP test when compiled with -m32. Presently
the systemcall io_pgetevents_time64() unexpectedly accepts -1 for the
maximum number of events.
It doesn't appear other systemcalls with signed arguments are effected
because they all have compat variants defined and wired up.
Fixes: ebeb8c82ffaf ("syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240110130122.3836513-1-nik.borisov@suse.com
Link: https://lore.kernel.org/ltp/20210921130127.24131-1-rpalethorpe@suse.com/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The stable kernel version backport of the patch disabling XSAVES on AMD
Zen family 0x17 applied this change to the wrong function (init_amd_k6()),
one which isn't called for Zen CPUs.
Move the erratum to the init_amd_zn() function instead.
Add an explicit family 0x17 check to the erratum so nothing will break if
someone naively makes this kernel version call init_amd_zn() also for
family 0x19 in the future (as the current upstream code does).
Fixes: e40c1e9da1ec ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
This is the 5.4.268 stable release
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmWy4hYACgkQONu9yGCS
# aT7SVBAAyx1DlSyJWcqzpESH0+VfqyWHxXlKS6Ip5wT0/+t0gglIKkwU/O0FsRXw
# pLO24wL0+MuIzgfZZj7wieAOPlGLOonKAvvUHGEMlpfAzyKjmZuW93WLKQlA/Oec
# uaT2ooQevRQcgXzbuV1yN/CeCnhbtmiQdcwy6OU5QACfzguQYtDbNGpbVHJEyEIW
# khlr+tj1KgRMzh/Sx76RPg4C/hkZBHun3tPcE0lTg+5QZDSkUj5gEdhVOSG2qmSh
# Lj9zt/isY3v6Whixel9YoTLr9SukI7ZlKzMrH1kSbGtTW3uZqgqB+7wCi1tWoNE1
# Zwu9/kUe1dU1kfwYW8AA5OwupjBjADVnZZx1cKN3nQZG2J8bSKHwHmuZPx3DGhJ1
# sxlaQ0nGvcEbCKljlIqsHzx2U22YKk939mVz5Y+MZYT5uwWRHI+iH4yRW97putSP
# t8tb3uX69Gsl6B+gLu38Mr7kkwyY06xmMnc5dfNCPwh8SxLj3dG7Gft90CNq1JKT
# q2cwlMEcDZRlC08kwzD7pRehZ6hYLRlTOv8yhQsQefcfzrtsT18Cec5TI2k72NOe
# fbIY8us3Qsr8JVSYuObGqT8LmkX9pkmRozEXgENvwltijEsWULoO2Hs+Z/yD07z8
# RYqtxWxVxFVeHTkrXbbMUTZWhFx5LE+rtxCySpfeFkv0WgRRwa8=
# =vkKq
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 25 Jan 2024 05:35:02 PM EST
# gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
|
|
commit 1c6d984f523f67ecfad1083bb04c55d91977bb15 upstream.
kvm_guest_cpu_offline() tries to disable kvmclock regardless if it is
present in the VM. It leads to write to a MSR that doesn't exist on some
configurations, namely in TDX guest:
unchecked MSR access error: WRMSR to 0x12 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8110687c (kvmclock_disable+0x1c/0x30)
kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt
features.
Do not disable kvmclock if it was not enabled.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Message-Id: <20231205004510.27164-6-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a24d61c609813963aacc9f6ec8343f4fcaac7243 ]
tl;dr: The num_digits() function has a theoretical overflow issue.
But it doesn't affect any actual in-tree users. Fix it by using
a larger type for one of the local variables.
Long version:
There is an overflow in variable m in function num_digits when val
is >= 1410065408 which leads to the digit calculation loop to
iterate more times than required. This results in either more
digits being counted or in some cases (for example where val is
1932683193) the value of m eventually overflows to zero and the
while loop spins forever).
Currently the function num_digits is currently only being used for
small values of val in the SMP boot stage for digit counting on the
number of cpus and NUMA nodes, so the overflow is never encountered.
However it is useful to fix the overflow issue in case the function
is used for other purposes in the future. (The issue was discovered
while investigating the digit counting performance in various
kernel helper functions rather than any real-world use-case).
The simplest fix is to make m a long long, the overhead in
multiplication speed for a long long is very minor for small values
of val less than 10000 on modern processors. The alternative
fix is to replace the multiplication with a constant division
by 10 loop (this compiles down to an multiplication and shift)
without needing to make m a long long, but this is slightly slower
than the fix in this commit when measured on a range of x86
processors).
[ dhansen: subject and changelog tweaks ]
Fixes: 646e29a1789a ("x86: Improve the printout of the SMP bootup CPU table")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20231102174901.2590325-1-colin.i.king%40gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
This is the 5.4.266 stable release
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmWbzp8ACgkQONu9yGCS
# aT6Zug//b3T7xamNVN4IseLrH0FdOl0RkYwOhEl+cH79qn4PnNCuZuch7+RK/tT1
# LRp/8sSKYPP4H+MI3RU2f/KIYcZBgoLHzEkRfpgzxpxg41vvNQF96+Li9xAzhfuy
# 9nKBf+AbZFCUrTHs8je13SseMXOHzMQlZDjoJk5m5yCw7LWF0FPeghePK2tSctYY
# 8yXdvi3J36wOwpKGihO6RqYvvY2OV+NE2ky/U7Wseo7+1/GsJaMIjMDK/HMon/nv
# Y0HB1tmvljzp6rqMw4f1UBvKGEj9ataYOaJzwsmXLcBRAlFjKMSGx2A6/Ad8OGTP
# zhHcaXegLvCNSGBNGY9kaXe3eF04e1T58W3yFfz4tN8UvhKipLO5vbScOiDsLyKr
# 9oNcFZAYiKX3OvFVEhUS8LL1r+gXUu6wR2gUeR7a02ZQA+Bj56lIRLAmINcHmyFa
# Pgtbv1I+foU/kt4ckBxoe68B9kcIbWIfnm/l+Ioy96CENRnXDyuE/bts3dFqbb4a
# Hka9JphZ8PfFwe09ZOJ1AN2cbSr/eDo7UPMrI5RRQq4sBMSqFo2B+c4YVWbEVIM/
# xu4ZnLMa04wy6rMbGlkwtgDuyuZu2f22kWkEuYmya0BbrbeH4QVlAq34CEj1wti2
# 4tQkCPErjBfrgRZdI3Qx61Lskg944BV7EuxnwdffmB3mmXBr5V0=
# =5A3R
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 08 Jan 2024 05:29:51 AM EST
# gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
|
|
commit 3ea1704a92967834bf0e64ca1205db4680d04048 upstream.
text_poke_early() does:
local_irq_save(flags);
memcpy(addr, opcode, len);
local_irq_restore(flags);
sync_core();
That's not really correct because the synchronization should happen before
interrupts are re-enabled to ensure that a pending interrupt observes the
complete update of the opcodes.
It's not entirely clear whether the interrupt entry provides enough
serialization already, but moving the sync_core() invocation into interrupt
disabled region does no harm and is obviously correct.
Fixes: 6fffacb30349 ("x86/alternatives, jumplabel: Use text_poke_early() before mm_init()")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/ZT6narvE%2BLxX%2B7Be@windriver.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
This is the 5.4.264 stable release
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmV5528ACgkQONu9yGCS
# aT7+DxAAl/t4oGT1Di8mIhCfqsezTj/SQ6HAMaAFpKjGXdgTBX9QavwTHp35qLpv
# xZtibdl611oEDT2B3//cvJu9Qs8sJGlDWxpJF8fkF/NED2EHLRwUAOmhc4fyBCBq
# 01GfPRefU9G5E0Nw2g3o07dD0otKQzENh74iAzUr/cyju5TBgS4NC7CljiD6GXP8
# DtCcbz2UMmHF5icyasjw7WoCqKaWUn7KLqU3RyQfmDgnh3z0vSXKs17CFoNk1+R6
# EAkZJkrUsIDO3N6aRkJGvwtbEFDws4S7onpcthckXt6IF/OQ+h8LcHvyTpblVeWH
# qVBXmj1QZD+SwfP5qVtM+RwHHTE3GVJI3+MVTInu0vzgEz6lhNPILzWwso+AIhHM
# +SBZkx4/pO2LbrSgqn+NYWRcBTn4fXNURPd+zLXNJ1ZSnhamWm7kuFK8B36lz0s8
# CB5ngLld1wg6m2NVPneqAeEcD54msGd18iNZvnxfx3c28sb27RgLqLJQKdSrtFMH
# tZj0+3KGkYHzWQmt07iLmg1DPAmZSNzMW7MDJRHlqDCKS2gBKO/LKxWiri2WPh+o
# IvgrfGzEJsyYXi5+JcGC9+RkoPcGaous4VA2kWFJSjXiXPrJC4jknQm3mtdMiTf3
# V9DEOMvot9ELGI9RODaNsk59C2RhOuVimm2XWV9n88UcX38reAQ=
# =gOna
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 13 Dec 2023 12:18:39 PM EST
# gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
|
|
commit 9b8493dc43044376716d789d07699f17d538a7c4 upstream.
Commit in Fixes added an AMD-specific microcode callback. However, it
didn't check the CPU vendor the kernel runs on explicitly.
The only reason the Zenbleed check in it didn't run on other x86 vendors
hardware was pure coincidental luck:
if (!cpu_has_amd_erratum(c, amd_zenbleed))
return;
gives true on other vendors because they don't have those families and
models.
However, with the removal of the cpu_has_amd_erratum() in
05f5f73936fa ("x86/CPU/AMD: Drop now unused CPU erratum checking function")
that coincidental condition is gone, leading to the zenbleed check
getting executed on other vendors too.
Add the explicit vendor check for the whole callback as it should've
been done in the first place.
Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix")
Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231201184226.16749-1-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|