path: root/arch/x86/kernel/vmlinux.lds.S
Age         Commit message                                          Author
2014-11-04  x86-64: Use RIP-relative addressing for most per-CPU accesses  (Jan Beulich)
Observing that per-CPU data (in the SMP case) is reachable by exploiting 64-bit address wraparound (building on the default kernel load address being at 16 MB), the one-byte-shorter RIP-relative addressing form can be used for most per-CPU accesses. The one exception is the "stable" reads, where the use of the "P" operand modifier prevents the compiler from using RIP-relative addressing, but is unavoidable due to the use of the "p" constraint (side note: with gcc 4.9.x the intended effect of this isn't being achieved anymore, see gcc bug 63637). With the dependency on the minimum kernel load address, arbitrarily low values for CONFIG_PHYSICAL_START are no longer possible. A link-time assertion is added, pointing to the need to increase that value when it triggers. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/5458A1780200007800044A9D@mail.emea.novell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
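A hedged sketch of the kind of link-time guard described above; the 16 MB floor and the message text are illustrative, not the commit's actual assertion (the assignment of the ASSERT() to '.' is the old-binutils-compatible sink form, see the ASSERT() quirk entry of 2009-10-16 below):

    /* Illustrative: refuse to link if the load address is too low for
     * the RIP-relative wraparound trick to reach per-CPU data. */
    . = ASSERT(CONFIG_PHYSICAL_START >= 0x1000000,
               "CONFIG_PHYSICAL_START too low for RIP-relative per-CPU addressing");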
2014-03-18  x86, vdso: Zero-pad the VVAR page  (Andy Lutomirski)
By coincidence, the VVAR page is at the end of an ELF segment. As a result, if it ends up being a partial page, the kernel loader will leave garbage behind at the end of the vvar page. Zero-pad it to a full page to fix this issue. This has probably been broken since the VVAR page was introduced. On QEMU, if you dump the run-time contents of the VVAR page, you can find entertaining strings from seabios left behind. It's remotely possible that this is a security bug -- conceivably there's some BIOS out there that leaves something sensitive in the few K of memory that is exposed to userspace. Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-12-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
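A minimal sketch of the padding approach, with names modeled on (but not guaranteed to match) the kernel's vvar handling: advancing '.' to a full page inside the output section makes the linker emit zero fill, so the loader never maps stale file-tail bytes.

    . = ALIGN(PAGE_SIZE);
    __vvar_page = .;
    .vvar : AT(ADDR(.vvar) - LOAD_OFFSET) {
            *(.vvar)
            /* pad the section out to exactly one page of zeroes */
            . = __vvar_page + PAGE_SIZE;
    } :data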
2014-03-18  x86, vdso: Make vsyscall_gtod_data handling x86 generic  (Stefani Seibold)
This patch moves the vsyscall_gtod_data handling out of vsyscall_64.c into an additional file, vsyscall_gtod.c, to make the functionality available to the 32-bit x86 kernel. It also adds a new vsyscall_32.c, which sets up the VVAR page. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-2-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-17  x86: intel-mid: Add section for sfi device table  (David Cohen)
When Intel MID uses the SFI table to enumerate devices, it requires an extra device table with further information about how to probe such devices. This patch creates a section where the device table will stay if CONFIG_X86_INTEL_MID is selected. Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Link: http://lkml.kernel.org/r/1382049336-21316-12-git-send-email-david.a.cohen@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-03-11  x86: Drop always empty .text..page_aligned section  (Jan Beulich)
Commit e44b7b7 ("x86: move suspend wakeup code to C") didn't care to also eliminate the side effects that the earlier 4c49156 ("x86: make arch/x86/kernel/acpi/wakeup_32.S use a separate") had, thus leaving a now pointless, almost page-sized gap at the beginning of .text. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Machek <pavel@ucw.cz> Link: http://lkml.kernel.org/r/513DBAA402000078000C4896@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-08  x86, realmode: Move ACPI wakeup to unified realmode code  (Jarkko Sakkinen)
Migrated ACPI wakeup code to the real-mode blob. Code existing in .x86_trampoline can be completely removed. Static descriptor table in wakeup_asm.S is courtesy of H. Peter Anvin. Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Link: http://lkml.kernel.org/r/1336501366-28617-7-git-send-email-jarkko.sakkinen@intel.com Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <len.brown@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-10  x86-64: Rework vsyscall emulation and add vsyscall= parameter  (Andy Lutomirski)
There are three choices:

vsyscall=native: Vsyscalls are native code that issues the corresponding syscalls.

vsyscall=emulate (default): Vsyscalls are emulated by instruction fault traps, tested in the bad_area path. The actual contents of the vsyscall page are the same as in the vsyscall=native case, except that the page is marked NX. This way programs that make assumptions about what the code in the page does will not be confused when they read that code.

vsyscall=none: Trying to execute a vsyscall will segfault.

Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/8449fb3abf89851fd6b2260972666a6f82542284.1312988155.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-04  x86-64: Work around gold bug 13023  (Andy Lutomirski)
Gold has trouble assigning numbers to the location counter inside of an output section description. The bug was triggered by 9fd67b4ed0714ab718f1f9bd14c344af336a6df7, which consolidated all of the vsyscall sections into a single section. The workaround is IMO still nicer than the old way of doing it. This produces an apparently valid kernel image and passes my vdso tests on both GNU ld version 2.21.51.0.6-2.fc15 20110118 and GNU gold (version 2.21.51.0.6-2.fc15 20110118) 1.10 as distributed by Fedora 15. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/0b260cb806f1f9a25c00ce8377a5f035d57f557a.1312378163.git.luto@mit.edu Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
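A sketch of the failing pattern and the workaround, reconstructed from the commit description (symbol names are illustrative; VLOAD is assumed to be the script's virtual-to-load-address helper): gold mishandles bare numeric assignments to '.' inside an output section description, so offsets are instead anchored to a symbol captured at the top of the section.

    .vsyscall : AT(VLOAD(.vsyscall)) {
            __vsyscall_beginning_hack = .;  /* capture the section start */
            *(.vsyscall_0)

            /* instead of ". = 1024;", which trips gold: */
            . = __vsyscall_beginning_hack + 1024;
            *(.vsyscall_1)

            . = __vsyscall_beginning_hack + 4096;  /* pad the whole page */
    } :user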
2011-08-04  x86-64: Move the "user" vsyscall segment out of the data segment.  (Andy Lutomirski)
The kernel's loader doesn't seem to care, but gold complains. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/f0716870c297242a841b949953d80c0d87bf3d3f.1312378163.git.luto@mit.edu Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14  x86-64: Move vread_tsc and vread_hpet into the vDSO  (Andy Lutomirski)
The vsyscall page now consists entirely of trap instructions. Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/637648f303f2ef93af93bae25186e9a1bea093f5.1310639973.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-06-06  x86-64: Fill unused parts of the vsyscall page with 0xcc  (Andy Lutomirski)
Jumping to 0x00 might do something depending on the following bytes. Jumping to 0xcc is a trap. So fill the unused parts of the vsyscall page with 0xcc to make it useless for exploits to jump there. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/ed54bfcfbe50a9070d20ec1edbe0d149e22a4568.1307292171.git.luto@mit.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>
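A sketch of the mechanism (section and phdr names illustrative): the '=0xcc' after an output section sets its fill byte, so any gap the linker pads, alignment holes and the tail of the page alike, decodes as int3 and traps instead of executing whatever 0x00 happens to mean.

    .vsyscall_0 : AT(VLOAD(.vsyscall_0)) {
            *(.vsyscall_0)
    } :user =0xcc   /* pad with int3 (0xcc), which traps, not 0x00 */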
2011-06-06  x86-64: Remove vsyscall number 3 (venosys)  (Andy Lutomirski)
It just segfaults since April 2008 (a4928cff), so I'm pretty sure that nothing uses it. And having an empty section makes the linker script a bit fragile. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/4a4abcf47ecadc269f2391a313576fe6d06acef7.1307292171.git.luto@mit.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-05  x86-64: Give vvars their own page  (Andy Lutomirski)
Move vvars out of the vsyscall page into their own page and mark it NX. Without this patch, an attacker who can force a daemon to call some fixed address could wait until the time contains, say, 0xCD80, and then execute the current time. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/b1460f81dc4463d66ea3f2b5ce240f58d48effec.1307292171.git.luto@mit.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-26  Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: vdso: Remove unused variable
  x86-64: Optimize vDSO time()
  x86-64: Add time to vDSO
  x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO
  x86-64: Move vread_tsc into a new file with sensible options
  x86-64: Vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
  x86-64: Don't generate cmov in vread_tsc
  x86-64: Remove unnecessary barrier in vread_tsc
  x86-64: Clean up vdso/kernel shared variables
2011-05-24  Merge branch 'for-2.6.40' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu  (Linus Torvalds)
* 'for-2.6.40' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: Unify input section names
  percpu: Avoid extra NOP in percpu_cmpxchg16b_double
  percpu: Cast away printk format warning
  percpu: Always align percpu output section to PAGE_SIZE

Fix up fairly trivial conflict in arch/x86/include/asm/percpu.h as per Tejun
2011-05-24  x86-64: Clean up vdso/kernel shared variables  (Andy Lutomirski)
Variables that are shared between the vdso and the kernel are currently a bit of a mess. They are each defined with their own magic, they are accessed differently in the kernel, the vsyscall page, and the vdso, and one of them (vsyscall_clock) doesn't even really exist. This changes them all to use a common mechanism. All of them are declared in vvar.h with a fixed address (validated by the linker script). In the kernel (as before), they look like ordinary read-write variables. In the vsyscall page and the vdso, they are accessed through a new macro VVAR, which gives read-only access. The vdso is now loaded verbatim into memory without any fixups. As a side bonus, access from the vdso is faster because a level of indirection is removed. While we're at it, pack jiffies and vgetcpu_mode into the same cacheline. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Link: http://lkml.kernel.org/r/%3C7357882fbb51fa30491636a7b6528747301b7ee9.1306156808.git.luto%40mit.edu%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-22  x86, apic: Introduce .apicdrivers section to find the list of apic drivers  (Suresh Siddha)
This will pave the way for each apic driver to be self-contained and eliminate the need for apic_probe[]. The order in which apic drivers are listed in the .apicdrivers section is important, as it determines the apic probe order; this is enforced by the ordering of the apic driver files in the Makefile and by the macros apic_driver()/apic_drivers(). See the sketch below. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: steiner@sgi.com Cc: gorcunov@openvz.org Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/20110521005526.068775085@sbsiddha-MOBL3.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
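A sketch of the linker-script side (boundary symbol names are assumptions for illustration): drivers placed into .apicdrivers by the apic_driver()/apic_drivers() macros end up in one contiguous array, and the link order of the objects fixes the probe order the body describes.

    . = ALIGN(8);
    .apicdrivers : AT(ADDR(.apicdrivers) - LOAD_OFFSET) {
            __apicdrivers = .;      /* start of the driver array */
            *(.apicdrivers)
            __apicdrivers_end = .;  /* probe loop stops here */
    }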
2011-03-24  percpu: Always align percpu output section to PAGE_SIZE  (Tejun Heo)
The percpu allocator honors alignment requests up to PAGE_SIZE, and both the percpu addresses in the percpu address space and the translated kernel addresses should be aligned accordingly. The calculation of the former depends on the alignment of the percpu output section in the kernel image. The linker script macros PERCPU_VADDR() and PERCPU() are used to define this output section, and the latter takes an @align parameter. Several architectures use @align smaller than PAGE_SIZE, breaking percpu memory alignment. This patch removes the @align parameter from PERCPU(), renames it to PERCPU_SECTION() and makes it always align to PAGE_SIZE. While at it, add PCPU_SETUP_BUG_ON() checks such that alignment problems are reliably detected, and remove the percpu alignment comment recently added in workqueue.c, as the condition would trigger BUG way before reaching there. For um, this patch raises the alignment of the percpu area. As the area is in .init, there shouldn't be any noticeable difference. This problem was discovered by David Howells while debugging boot failure on mn10300. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Mike Frysinger <vapier@gentoo.org> Cc: uclinux-dist-devel@blackfin.uclinux.org Cc: David Howells <dhowells@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: user-mode-linux-devel@lists.sourceforge.net
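Sketched usage in an arch linker script, before and after this change; the exact argument lists are reconstructed from the description and should be treated as illustrative:

    /* before: per-arch alignment argument, easy to get wrong */
    PERCPU(INTERNODE_CACHE_BYTES, PAGE_SIZE)

    /* after: always PAGE_SIZE-aligned, only the cacheline size remains */
    PERCPU_SECTION(INTERNODE_CACHE_BYTES)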
2011-03-16  Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix binutils-2.21 symbol related build failures
  x86-64, trampoline: Remove unused variable
  x86, reboot: Fix the use of passed arguments in 32-bit BIOS reboot
  x86, reboot: Move the real-mode reboot code to an assembly file
  x86: Make the GDT_ENTRY() macro in <asm/segment.h> safe for assembly
  x86, trampoline: Use the unified trampoline setup for ACPI wakeup
  x86, trampoline: Common infrastructure for low memory trampolines

Fix up trivial conflicts in arch/x86/kernel/Makefile
2011-03-16  Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu  (Linus Torvalds)
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu, x86: Add arch-specific this_cpu_cmpxchg_double() support
  percpu: Generic support for this_cpu_cmpxchg_double()
  alpha: use L1_CACHE_BYTES for cacheline size in the linker script
  percpu: align percpu readmostly subsection to cacheline

Fix up trivial conflict in arch/x86/kernel/vmlinux.lds.S due to the percpu alignment having changed ("x86: Reduce back the alignment of the per-CPU data section")
2011-03-15  Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, binutils, xen: Fix another wrong size directive
  x86: Remove dead config option X86_CPU
  x86: Really print supported CPUs if PROCESSOR_SELECT=y
  x86: Fix a bogus unwind annotation in lib/semaphore_32.S
  um, x86-64: Fix UML build after adding CFI annotations to lib/rwsem_64.S
  x86: Remove unused bits from lib/thunk_*.S
  x86: Use {push,pop}_cfi in more places
  x86-64: Add CFI annotations to lib/rwsem_64.S
  x86, asm: Cleanup unnecssary macros in asm-offsets.c
  x86, system.h: Drop unused __SAVE/__RESTORE macros
  x86: Use bitmap library functions
  x86: Partly unify asm-offsets_{32,64}.c
  x86: Reduce back the alignment of the per-CPU data section
2011-03-08  x86: Separate out entry text section  (Jiri Olsa)
Put x86 entry code into a separate link section: .entry.text. Separating the entry text section seems to have performance benefits - caused by more efficient instruction cache usage. Running hackbench with perf stat --repeat showed that the change compresses the icache footprint. The icache load miss rate went down by about 15%:

  before patch: 19417627 L1-icache-load-misses ( +- 0.147% )
  after patch:  16490788 L1-icache-load-misses ( +- 0.180% )

The motivation of the patch was to fix a particular kprobes bug that relates to the entry text section; the performance advantage was discovered accidentally. Whole perf output follows:

- results for current tip tree:

  Performance counter stats for './hackbench/hackbench 10' (500 runs):

      19417627  L1-icache-load-misses           ( +- 0.147% )
    2676914223  instructions     # 0.497 IPC    ( +- 0.079% )
    5389516026  cycles                          ( +- 0.144% )

   0.206267711  seconds time elapsed            ( +- 0.138% )

- results for current tip tree with the patch applied:

  Performance counter stats for './hackbench/hackbench 10' (500 runs):

      16490788  L1-icache-load-misses           ( +- 0.180% )
    2717734941  instructions     # 0.502 IPC    ( +- 0.079% )
    5414756975  cycles                          ( +- 0.148% )

   0.206747566  seconds time elapsed            ( +- 0.137% )

Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: masami.hiramatsu.pt@hitachi.com Cc: ananth@in.ibm.com Cc: davem@davemloft.net Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
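A sketch of the section split in vmlinux.lds.S (the boundary symbols are as commonly used for such ranges; placement details are illustrative): the entry stubs are packed contiguously and bracketed so code such as kprobes can test whether an address falls inside the entry text.

    .text : AT(ADDR(.text) - LOAD_OFFSET) {
            /* ... regular text ... */
            __entry_text_start = .;
            *(.entry.text)          /* all entry stubs grouped together */
            __entry_text_end = .;
    } :text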
2011-02-17  x86, trampoline: Common infrastructure for low memory trampolines  (H. Peter Anvin)
Common infrastructure for low memory trampolines. This code installs the trampolines permanently in low memory very early. It also permits multiple pieces of code to be used for this purpose. This code also introduces a standard infrastructure for computing symbol addresses in the trampoline code. The only change to the actual SMP trampolines themselves is that the 64-bit trampoline has been made reusable -- the previous version would overwrite the code with a status variable; this moves the status variable to a separate location. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <4D5DFBE4.7090104@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Matthieu Castet <castet.matthieu@free.fr> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2011-02-10  x86: Reduce back the alignment of the per-CPU data section  (Jan Beulich)
This complements commit: 47f19a0814e8: percpu: Remove the multi-page alignment facility reverting one leftover of: fe8e0c25cad2: x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4D525CE60200007800030EE5@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Alexander van Heukelum <heukelum@fastmail.fm>
2011-01-25  percpu: align percpu readmostly subsection to cacheline  (Tejun Heo)
Currently percpu readmostly subsection may share cachelines with other percpu subsections which may result in unnecessary cacheline bounce and performance degradation. This patch adds @cacheline parameter to PERCPU() and PERCPU_VADDR() linker macros, makes each arch linker scripts specify its cacheline size and use it to align percpu subsections. This is based on Shaohua's x86 only patch. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Shaohua Li <shaohua.li@intel.com>
2011-01-19Revert "x86: Make relocatable kernel work with new binutils"Ingo Molnar
This reverts commit 86b1e8dd83cb ("x86: Make relocatable kernel work with new binutils"). Markus Trippelsdorf reported a boot failure caused by this patch. The real solution will likely involve an arch-generic way to define an overlaid pair of jiffies_64 and jiffies variables. Until that's done and tested on all architectures, revert this commit to solve the regression. Reported-and-bisected-by: Markus Trippelsdorf <markus@trippelsdorf.de> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: Shaohua Li <shaohua.li@intel.com> Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sam Ravnborg <sam@ravnborg.org> LKML-Reference: <4D36A759.60704@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-18  x86: Make relocatable kernel work with new binutils  (Shaohua Li)
The CONFIG_RELOCATABLE=y option is broken with new binutils, which will make the kernel panic at boot. According to Lu Hongjiu, the affected binutils are from 2.20.51.0.12 to 2.21.51.0.3, which were released since Oct 22 this year. At least Ubuntu 10.10 is using such binutils. See: http://sourceware.org/bugzilla/show_bug.cgi?id=12327 The reason for the boot panic is that we have 'jiffies = jiffies_64;' in vmlinux.lds.S. The jiffies symbol isn't in any section. In the kernel build, there is a warning saying jiffies is an absolute address and can't be relocated. At runtime, jiffies will have virtual address 0. Signed-off-by: Shaohua Li<shaohua.li@intel.com> Cc: Lu Hongjiu<hongjiu.lu@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Sam Ravnborg <sam@ravnborg.org> LKML-Reference: <1295312269.1949.725.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>
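The problematic line, plus a sketch of a section-relative placement; this is illustrative rather than the exact (later reverted) patch. Outside any output section the alias is emitted as an absolute (SHN_ABS) symbol, which the relocation pass never adjusts; an assignment inside a section keeps the symbol section-relative.

    /* broken with binutils 2.20.51.0.12..2.21.51.0.3: absolute symbol */
    jiffies = jiffies_64;

    /* sketch of a section-relative alternative */
    .data : AT(ADDR(.data) - LOAD_OFFSET) {
            jiffies = jiffies_64;
            *(.data)
    } :data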
2010-11-18  x86: Add NX protection for kernel data  (Matthieu Castet)
This patch expands the functionality of CONFIG_DEBUG_RODATA to set the main (static) kernel data area as NX. The following steps are taken to achieve this:

1. The linker script is adjusted so .text always starts and ends on a page boundary.
2. The linker script is adjusted so .rodata always starts and ends on a page boundary.
3. NX is set for all pages from _etext through _end in mark_rodata_ro.
4. free_init_pages() sets released memory NX in arch/x86/mm/init.c.
5. The BIOS rom is set to x when pcibios is used.

The results of patch application may be observed in the diff of kernel page table dumps:

pcibios:

  -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400
  ++ data_nx_pt_after.txt  2009-10-13 07:26:46.000000000 -0400
   0x00000000-0xc0000000     3G                 pmd
   ---[ Kernel Mapping ]---
  -0xc0000000-0xc0100000     1M  RW  GLB x   pte
  +0xc0000000-0xc00a0000   640K  RW  GLB NX  pte
  +0xc00a0000-0xc0100000   384K  RW  GLB x   pte
  -0xc0100000-0xc03d7000  2908K  ro  GLB x   pte
  +0xc0100000-0xc0318000  2144K  ro  GLB x   pte
  +0xc0318000-0xc03d7000   764K  ro  GLB NX  pte
  -0xc03d7000-0xc0600000  2212K  RW  GLB x   pte
  +0xc03d7000-0xc0600000  2212K  RW  GLB NX  pte
   0xc0600000-0xf7a00000   884M  RW  PSE GLB NX pmd
   0xf7a00000-0xf7bfe000  2040K  RW  GLB NX  pte
   0xf7bfe000-0xf7c00000     8K                 pte

No pcibios:

  -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400
  ++ data_nx_pt_after.txt  2009-10-13 07:26:46.000000000 -0400
   0x00000000-0xc0000000     3G                 pmd
   ---[ Kernel Mapping ]---
  -0xc0000000-0xc0100000     1M  RW  GLB x   pte
  +0xc0000000-0xc0100000     1M  RW  GLB NX  pte
  -0xc0100000-0xc03d7000  2908K  ro  GLB x   pte
  +0xc0100000-0xc0318000  2144K  ro  GLB x   pte
  +0xc0318000-0xc03d7000   764K  ro  GLB NX  pte
  -0xc03d7000-0xc0600000  2212K  RW  GLB x   pte
  +0xc03d7000-0xc0600000  2212K  RW  GLB NX  pte
   0xc0600000-0xf7a00000   884M  RW  PSE GLB NX pmd
   0xf7a00000-0xf7bfe000  2040K  RW  GLB NX  pte
   0xf7bfe000-0xf7c00000     8K                 pte

The patch has been originally developed for Linux 2.6.34-rc2 x86 by Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.

-v1: initial patch for 2.6.30
-v2: patch for 2.6.31-rc7
-v3: moved all code into arch/x86, adjusted credits
-v4: fixed ifdef, removed credits from CREDITS
-v5: fixed an address calculation bug in mark_nxdata_nx()
-v6: added acked-by and PT dump diff to commit log
-v7: minor adjustments for -tip
-v8: rework with the merge of "Set first MB as RW+NX"

Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr> Cc: Arjan van de Ven <arjan@infradead.org> Cc: James Morris <jmorris@namei.org> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F82E.60601@free.fr> [ minor cleanliness edits ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
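A sketch of the linker-script side of steps 1 and 2 (illustrative; the real script uses the kernel's section macros): ending .text and .rodata on page boundaries is what lets mark_rodata_ro flip NX per page without an attribute change landing mid-page.

    .text : AT(ADDR(.text) - LOAD_OFFSET) {
            *(.text)
            . = ALIGN(PAGE_SIZE);   /* step 1: .text ends on a page boundary */
            _etext = .;
    } :text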
2010-10-22  Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE
  x86: Move alloc_desk_mask variables inside ifdef
  x86-32: Align IRQ stacks properly
  x86: Remove CONFIG_4KSTACKS
  x86: Always use irq stacks

Fixed up trivial conflicts in include/linux/{irq.h, percpu-defs.h}
2010-09-07  x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE  (Alexander van Heukelum)
The irq stacks, located in the percpu area, need to be THREAD_SIZE aligned. Add the infrastructure to align percpu variables to larger-than-pagesize amounts within the percpu area, and use it to specify the alignment for the irq stacks. Also align the percpu area itself to THREAD_SIZE. This should make irq stacks work with 8K THREAD_SIZE. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Tejun Heo <tj@kernel.org> Cc: hch@lst.de LKML-Reference: <1283799222.15941.1393621887@webmail.messagingengine.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-31  x86, iommu: Fix IOMMU_INIT alignment rules  (Konrad Rzeszutek Wilk)
This boot crash was observed:

  DMA-API: preallocated 32768 debug entries
  DMA-API: debugging enabled by kernel config
  BUG: unable to handle kernel paging request at 19da8955
  IP: [<f4ffffff>] 0xf4ffffff
  *pde = 00000000

The crux of the failure was that even if we did not use any of the .iommu_table section, the linker would still insert it in the vmlinux file. This patch fixes that and also fixes the runtime crash where we would try to access the array. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> LKML-Reference: <1283191802-25086-1-git-send-email-konrad.wilk@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-27  x86, doc: Adding comments about .iommu_table and its neighbors.  (Konrad Rzeszutek Wilk)
Updating the linker section with comments about .iommu_table and some other ones that I know of. CC: Sam Ravnborg <sam@ravnborg.org> CC: H. Peter Anvin <hpa@zytor.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282933173-19960-1-git-send-email-konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-26  x86, iommu: Add IOMMU_INIT macros, .iommu_table section, and iommu_table_entry structure  (Konrad Rzeszutek Wilk)
This patch set adds a mechanism to "modularize" the IOMMUs we have on X86. Currently the count of IOMMUs is up to six and they have a complex relationship that requires careful execution order. 'pci_iommu_alloc' does that today, but most folks are unhappy with how it does it. This patch set addresses this and also paves a mechanism to jettison unused IOMMUs during run-time. For details that sparked this, please refer to: http://lkml.org/lkml/2010/8/2/282

The first solution that comes to mind is to convert wholesale the IOMMU detection routines to be called during the initcall time frame. Unfortunately that misses the dependency relationship that some of the IOMMUs have (for example: for AMD-Vi IOMMU to work, GART detection MUST run first, and before all of that SWIOTLB MUST run).

The second solution would be to introduce a registration call wherein the IOMMU would provide its detection/init routines as well as what MUST run before it. That would work, except that the 'pci_iommu_alloc' which would run through this list is called during mem_init. This means we don't have any memory allocator, and it is so early that we haven't yet started running through the initcall_t list.

This solution borrows concepts from the 2nd idea and from how MODULE_INIT works. A macro is provided that each IOMMU uses to define its detect function and early_init (before the memory allocator is active), as well as what other IOMMU MUST run before us. Since most IOMMUs depend on having SWIOTLB run first ("pci_swiotlb_detect"), a convenience macro to depend on that is also provided. This macro is similar in design to the MODULE_PARAM macro, wherein we set up a .iommu_table section and populate it with values that match a struct iommu_table_entry. During bootup we will sort through the array so that the IOMMUs that MUST run before us are the first elements in the array. And then we just iterate through them, calling the detection routine and, if appropriate, the init routines. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-2-git-send-email-konrad.wilk@oracle.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
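A sketch of the collecting section (boundary symbol names are assumptions for illustration): the iommu_table_entry records emitted by the IOMMU_INIT macros land in one array, which the early sort-and-detect loop described above then walks.

    . = ALIGN(8);
    .iommu_table : AT(ADDR(.iommu_table) - LOAD_OFFSET) {
            __iommu_table = .;
            *(.iommu_table)         /* one iommu_table_entry per IOMMU */
            __iommu_table_end = .;
    }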
2010-06-01  Merge branch 'for-35' of git://repo.or.cz/linux-kbuild  (Linus Torvalds)
* 'for-35' of git://repo.or.cz/linux-kbuild: (81 commits)
  kbuild: Revert part of e8d400a to resolve a conflict
  kbuild: Fix checking of scm-identifier variable
  gconfig: add support to show hidden options that have prompts
  menuconfig: add support to show hidden options which have prompts
  gconfig: remove show_debug option
  gconfig: remove dbg_print_ptype() and dbg_print_stype()
  kconfig: fix zconfdump()
  kconfig: some small fixes
  add random binaries to .gitignore
  kbuild: Include gen_initramfs_list.sh and the file list in the .d file
  kconfig: recalc symbol value before showing search results
  .gitignore: ignore *.lzo files
  headerdep: perlcritic warning
  scripts/Makefile.lib: Align the output of LZO
  kbuild: Generate modules.builtin in make modules_install
  Revert "kbuild: specify absolute paths for cscope"
  kbuild: Do not unnecessarily regenerate modules.builtin
  headers_install: use local file handles
  headers_check: fix perl warnings
  export_report: fix perl warnings
  ...
2010-03-29  x86: Make smp_locks end with page alignment  (Yinghai Lu)
Fix:

  ------------[ cut here ]------------
  WARNING: at arch/x86/mm/init.c:342 free_init_pages+0x4c/0xfa()
  free_init_pages: range [0x40daf000, 0x40db5c24] is not aligned
  Modules linked in:
  Pid: 0, comm: swapper Not tainted 2.6.34-rc2-tip-03946-g4f16b23-dirty #50
  Call Trace:
   [<40232e9f>] warn_slowpath_common+0x65/0x7c
   [<4021c9f0>] ? free_init_pages+0x4c/0xfa
   [<40881434>] ? _etext+0x0/0x24
   [<40232eea>] warn_slowpath_fmt+0x24/0x27
   [<4021c9f0>] free_init_pages+0x4c/0xfa
   [<40881434>] ? _etext+0x0/0x24
   [<40d3f4bd>] alternative_instructions+0xf6/0x100
   [<40d3fe4f>] check_bugs+0xbd/0xbf
   [<40d398a7>] start_kernel+0x2d5/0x2e4
   [<40d390ce>] i386_start_kernel+0xce/0xd5
  ---[ end trace 4eaa2a86a8e2da22 ]---

Comments in vmlinux.lds.S already said:

  /*
   * smp_locks might be freed after init
   * start/end must be page aligned
   */

Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1269830604-26214-2-git-send-email-yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
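The fix in linker-script terms, sketched consistently with the quoted comment: align before marking the end symbol, so both boundaries of the freeable region land on page boundaries.

    . = ALIGN(PAGE_SIZE);
    .smp_locks : AT(ADDR(.smp_locks) - LOAD_OFFSET) {
            __smp_locks = .;
            *(.smp_locks)
            . = ALIGN(PAGE_SIZE);   /* pad before marking the end */
            __smp_locks_end = .;
    }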
2010-03-03  Rename .text.page_aligned to .text..page_aligned.  (Denys Vlasenko)
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Michal Marek <mmarek@suse.cz>
2010-03-03  Rename .bss.page_aligned to .bss..page_aligned.  (Tim Abbott)
Signed-off-by: Tim Abbott <tabbott@ksplice.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Michal Marek <mmarek@suse.cz>
2010-01-05  Merge branch 'master' into percpu  (Tejun Heo)
Conflicts:
  arch/powerpc/platforms/pseries/hvCall.S
  include/linux/percpu.h
2009-12-14  x86: Regex support and known-movable symbols for relocs, fix _end  (H. Peter Anvin)
This adds a new category of symbols to the relocs program: symbols which are known to be relative, even though the linker emits them as absolute; this is the case for symbols that live in the linker script, which currently applies to _end. Unfortunately the previous workaround of putting _end in its own empty section was defeated by newer binutils, which remove empty sections completely. This patch also changes the symbol matching to use regular expressions instead of hardcoded C for specific patterns. This is a decidedly non-minimal patch: a modified version of the relocs program is used as part of the Syslinux build, and this is basically a backport to Linux of some of those changes; they have thus been well tested. Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <4AF86211.3070103@zytor.com> Acked-by: Michal Marek <mmarek@suse.cz> Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
2009-12-08  Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
  x86, mm: Correct the implementation of is_untracked_pat_range()
  x86/pat: Trivial: don't create debugfs for memtype if pat is disabled
  x86, mtrr: Fix sorting of mtrr after subtracting
  x86: Move find_smp_config() earlier and avoid bootmem usage
  x86, platform: Change is_untracked_pat_range() to bool; cleanup init
  x86: Change is_ISA_range() into an inline function
  x86, mm: is_untracked_pat_range() takes a normal semiclosed range
  x86, mm: Call is_untracked_pat_range() rather than is_ISA_range()
  x86: UV SGI: Don't track GRU space in PAT
  x86: SGI UV: Fix BAU initialization
  x86, numa: Use near(er) online node instead of roundrobin for NUMA
  x86, numa, bootmem: Only free bootmem on NUMA failure path
  x86: Change crash kernel to reserve via reserve_early()
  x86: Eliminate redundant/contradicting cache line size config options
  x86: When cleaning MTRRs, do not fold WP into UC
  x86: remove "extern" from function prototypes in <asm/proto.h>
  x86, mm: Report state of NX protections during boot
  x86, mm: Clean up and simplify NX enablement
  x86, pageattr: Make set_memory_(x|nx) aware of NX support
  x86, sleep: Always save the value of EFER
  ...

Fix up conflicts (added both iommu_shutdown and is_untracked_pat_range to 'struct x86_platform_ops') in
  arch/x86/include/asm/x86_init.h
  arch/x86/kernel/x86_init.c
2009-11-19  x86: Eliminate redundant/contradicting cache line size config options  (Jan Beulich)
Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT (with inconsistent defaults), just having the latter suffices as the former can be easily calculated from it. To be consistent, also change X86_INTERNODE_CACHE_BYTES to X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA to account for last level cache line size (which here matters more than L1 cache line size). Finally, make sure the default value for X86_L1_CACHE_SHIFT, when X86_GENERIC is selected, is being seen before that for the individual CPU model options (other than on x86-64, where GENERIC_CPU is part of the choice construct, X86_GENERIC is a separate option on ix86). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Ravikiran Thirumalai <kiran@scalex86.org> Acked-by: Nick Piggin <npiggin@suse.de> LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-29  percpu: remove per_cpu__ prefix.  (Rusty Russell)
Now that the return from alloc_percpu is compatible with the address of per-cpu vars, it makes sense to hand around the address of per-cpu variables. To make this sane, we remove the per_cpu__ prefix we created to stop people accidentally using these vars directly. Now we have sparse, we can use that (next patch). tj: * Updated to convert stuff which was missed by or added after the original patch. * Kill per_cpu_var() macro. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
2009-10-20  x86-64: add comment for RODATA large page retainment  (Suresh Siddha)
Add a comment explaining why RODATA is aligned to 2 MB. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-20  x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA  (Suresh Siddha)
CONFIG_DEBUG_RODATA chops the large pages spanning the boundaries of kernel text/rodata/data into small 4KB pages, as they are mapped with different attributes (text as RO, RODATA as RO and NX, etc). On x86_64, preserve the large page mappings for kernel text/rodata/data boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the RODATA section to be hugepage aligned and by giving the 2MB page boundaries the same RWX attributes. Extra memory pages padding the sections will be freed at the end of boot, and the kernel identity mappings will have different RWX permissions compared to the kernel text mappings. Kernel identity mappings to these physical pages will be mapped with smaller pages, but large page mappings are still retained for kernel text, rodata, and data mappings. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
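A sketch of the alignment (illustrative; the kernel wraps this in CONFIG_DEBUG_RODATA-conditional macros, and the symbol name is an assumption): hugepage-aligning the rodata boundary means the mappings on either side can each stay 2 MB pages even though their attributes differ.

    . = ALIGN(0x200000);            /* 2 MB: keep large-page mappings intact */
    __start_rodata_hpage_align = .;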
2009-10-16  x86: Document linker script ASSERT() quirk  (Ingo Molnar)
Older binutils breaks if ASSERT() is used without a sink for the output. For example 2.14.90.0.6 is known to be broken; the link fails with:

  LD      .tmp_vmlinux1
  ld:arch/x86/kernel/vmlinux.lds:678: parse error

Document this quirk in all three files that use it. See: http://marc.info/?l=linux-kbuild&m=124930110427870&w=2 See[2]: d2ba8b2 ("x86: Fix assert syntax in vmlinux.lds.S") Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Roland McGrath <roland@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Sam Ravnborg <sam@ravnborg.org> LKML-Reference: <4AD6523D.5030909@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
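The two forms side by side, using the image-size check as a representative example (treat the exact expression as illustrative):

    /* breaks old binutils (e.g. 2.14.90.0.6): no sink for the output */
    ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
           "kernel image bigger than KERNEL_IMAGE_SIZE")

    /* portable form: assign the ASSERT result to '.' as a sink */
    . = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
               "kernel image bigger than KERNEL_IMAGE_SIZE");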
2009-10-15Revert "x86: linker script syntax nits"Ingo Molnar
This reverts commit e9a63a4e559fbdc522072281d05e6b13c1022f4b. This breaks older binutils, where sink-less asserts are broken. See this commit for further details: d2ba8b2: x86: Fix assert syntax in vmlinux.lds.S Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4AD6523D.5030909@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14  x86: linker script syntax nits  (Roland McGrath)
The linker scripts grew some use of weirdly wrong linker script syntax. It happens to work, but it's not what the syntax is documented to be. Clean it up to use the official syntax. Signed-off-by: Roland McGrath <roland@redhat.com> CC: Ian Lance Taylor <iant@google.com>
2009-09-25  Merge branch 'x86/asm' into x86/urgent  (Ingo Molnar)
Merge reason: The linker script cleanups are ready for upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-20  x86: Correct segment permission flags in 64-bit linker script  (Jan Beulich)
While these don't get actively used (afaict), it still doesn't hurt for them to properly reflect how the respective segments will get mapped/accessed. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4AA0E95F0200007800013707@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
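For context, a sketch of what the 64-bit script's segment declarations look like (flag values illustrative): FLAGS() sets the ELF p_flags bits, where 4 = read, 2 = write, 1 = execute, so each program header advertises how its segment is meant to be mapped.

    PHDRS {
            text   PT_LOAD FLAGS(5);   /* R-E */
            data   PT_LOAD FLAGS(6);   /* RW- */
            user   PT_LOAD FLAGS(5);   /* R-E */
            percpu PT_LOAD FLAGS(6);   /* RW- */
            init   PT_LOAD FLAGS(7);   /* RWE */
            note   PT_NOTE FLAGS(0);   /* none */
    }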
2009-09-18  x86: Cleanup linker script using new linker script macros.  (Tim Abbott)
Signed-off-by: Tim Abbott <tabbott@ksplice.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>