aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86
AgeCommit message (Collapse)Author
2009-12-07Merge branch 'for-next' into for-linusJiri Kosina
Conflicts: kernel/irq/chip.c
2009-12-07x86 insn: Delete empty or incomplete inat-tables.cMasami Hiramatsu
Delete empty or incomplete inat-tables.c if gen-insn-attr-x86.awk failed, because it causes a build error if user tries to build kernel next time. Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091207170033.19230.37688.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-07x86: Fix bogus warning in apic_noop.apic_write()Thomas Gleixner
apic_noop is used to provide dummy apic functions. It's installed when the CPU has no APIC or when the APIC is disabled on the kernel command line. The apic_noop implementation of apic_write() warns when the CPU has an APIC or when the APIC is not disabled. That's bogus. The warning should only happen when the CPU has an APIC _AND_ the APIC is not disabled. apic_noop.apic_read() has the correct check. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: <stable@kernel.org> # in <= .32 this typo resides in native_apic_write_dummy() LKML-Reference: <alpine.LFD.2.00.0912071255420.3089@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-07Merge branch 'linus' into x86/urgentIngo Molnar
Merge reason: we want to queue up a dependent fix. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-07x86: Compile insn.c and inat.c only for KPROBESOGAWA Hirofumi
At least, insn.c and inat.c is needed for kprobe for now. So, this compile those only if KPROBES is enabled. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <878wdg8icq.fsf@devron.myhome.or.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-06x86, perf probe: Fix warning in test_get_len()Jean Delvare
Fix the following warning: arch/x86/tools/test_get_len.c: In function "main": arch/x86/tools/test_get_len.c:116: warning: unused variable "c" Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-06x86: Fix typo in arch/x86/mm/kmmio.cShaun Patterson
Signed-off-by: Shaun Patterson <shaunpatterson@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: pq@iki.fi LKML-Reference: <1260027694.10074.170.camel@linux-4lgc.site> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-06x86: Fixup wrong irq frame link in stacktracesFrederic Weisbecker
When we enter in irq, two things can happen to preserve the link to the previous frame pointer: - If we were in an irq already, we don't switch to the irq stack as we are inside. We just need to save the previous frame pointer and to link the new one to the previous. - Otherwise we need another level of indirection. We enter the irq with the previous stack. We save the previous bp inside and make bp pointing to its saved address. Then we switch to the irq stack and push bp another time but to the new stack. This makes two levels to dereference instead of one. In the second case, the current stacktrace code omits the second level and loses the frame pointer accuracy. The stack that follows will then be considered as unreliable. Handling that makes the perf callchain happier. Before: 43.94% [k] _raw_read_lock | --- _read_lock | |--60.53%-- send_sigio | __kill_fasync | kill_fasync | evdev_pass_event | evdev_event | input_pass_event | input_handle_event | input_event | synaptics_process_byte | psmouse_handle_byte | psmouse_interrupt | serio_interrupt | i8042_interrupt | handle_IRQ_event | handle_edge_irq | handle_irq | __irqentry_text_start | ret_from_intr | | | |--30.43%-- __select | | | |--17.39%-- 0x454f15 | | | |--13.04%-- __read | | | |--13.04%-- vread_hpet | | | |--13.04%-- _xcb_lock_io | | | --13.04%-- 0x7f630878ce8 After: 50.00% [k] _raw_read_lock | --- _read_lock | |--98.97%-- send_sigio | __kill_fasync | kill_fasync | evdev_pass_event | evdev_event | input_pass_event | input_handle_event | input_event | | | |--96.88%-- synaptics_process_byte | | psmouse_handle_byte | | psmouse_interrupt | | serio_interrupt | | i8042_interrupt | | handle_IRQ_event | | handle_edge_irq | | handle_irq | | __irqentry_text_start | | ret_from_intr | | | | | |--39.78%-- __const_udelay | | | | | | | |--91.89%-- ath5k_hw_register_timeout | | | | ath5k_hw_noise_floor_calibration | | | | ath5k_hw_reset | | | | ath5k_reset | | | | ath5k_config | | | | ieee80211_hw_config | | | | | | | | | |--88.24%-- ieee80211_scan_work | | | | | worker_thread | | | | | kthread | | | | | child_rip | | | | | | | | | --11.76%-- ieee80211_scan_completed | | | | ieee80211_scan_work | | | | worker_thread | | | | kthread | | | | child_rip | | | | | | | --8.11%-- ath5k_hw_noise_floor_calibration | | | ath5k_hw_reset | | | ath5k_reset | | | ath5k_config Note: This does not only affect perf events but also x86-64 stacktraces. They were considered as unreliable once we quit the irq stack frame. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
2009-12-06x86: Fixup wrong debug exception frame link in stacktracesFrederic Weisbecker
While dumping a stacktrace, the end of the exception stack won't link the frame pointer to the previous stack. The interrupted stack will then be considered as unreliable and ignored by perf, as the frame pointer is unreliable itself. This happens because we overwrite the frame pointer that links to the interrupted frame with the address of the exception stack. This is done in order to reserve space inside. But rbp has been chosen here only because it is not a scratch register, so that the address of the exception stack remains in rbp after calling do_debug(), we can then release the exception stack space without the need to retrieve its address again. But we can pick another non-scratch register to do that, so that we preserve the link to the interrupted stack frame in the stacktraces. Just randomly choose r12. Every registers are saved just before and restored just after calling do_debug(). And r12 is not used in the middle, which makes it a perfect candidate. Example: perf record -g -a -c 1 -f -e mem:$(tasklist_lock_addr):rw Before: 44.18% [k] _raw_read_lock | | --- |--6.31%-- waitid | |--4.26%-- writev | |--3.63%-- __select | |--3.15%-- __waitpid | | | |--28.57%-- 0x8b52e00000139f | | | |--28.57%-- 0x8b52e0000013c6 | | | |--14.29%-- 0x7fde786dc000 | | | |--14.29%-- 0x62696c2f7273752f | | | --14.29%-- 0x1ea9df800000000 | |--3.00%-- __poll After: 43.94% [k] _raw_read_lock | --- _read_lock | |--60.53%-- send_sigio | __kill_fasync | kill_fasync | evdev_pass_event | evdev_event | input_pass_event | input_handle_event | input_event | synaptics_process_byte | psmouse_handle_byte | psmouse_interrupt | serio_interrupt | i8042_interrupt | handle_IRQ_event | handle_edge_irq | handle_irq | __irqentry_text_start | ret_from_intr | | | |--30.43%-- __select | | | |--17.39%-- 0x454f15 | | | |--13.04%-- __read | | | |--13.04%-- vread_hpet | | | |--13.04%-- _xcb_lock_io | | | --13.04%-- 0x7f630878ce87 Note: it does not only affect perf events but also other stacktraces in x86-64. They were considered as unreliable once we quit the debug stack frame. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
2009-12-06x86/perf: Exclude the debug stack from the callchainsFrederic Weisbecker
Dumping the callchains from breakpoint events with perf gives strange results: 3.75% perf [kernel] [k] _raw_read_unlock | --- _raw_read_unlock perf_callchain perf_prepare_sample __perf_event_overflow perf_swevent_overflow perf_swevent_add perf_bp_event hw_breakpoint_exceptions_notify notifier_call_chain __atomic_notifier_call_chain atomic_notifier_call_chain notify_die do_debug debug munmap We are infected with all the debug stack. Like the nmi stack, the debug stack is undesired as it is part of the profiling path, not helpful for the user. Ignore it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
2009-12-06hw-breakpoints: Use overflow handler instead of the event callbackFrederic Weisbecker
struct perf_event::event callback was called when a breakpoint triggers. But this is a rather opaque callback, pretty tied-only to the breakpoint API and not really integrated into perf as it triggers even when we don't overflow. We prefer to use overflow_handler() as it fits into the perf events rules, being called only when we overflow. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
2009-12-06hw-breakpoints: Drop callback and task parameters from modify helperFrederic Weisbecker
Drop the callback and task parameters from modify_user_hw_breakpoint(). For now we have no user that need to modify a breakpoint to the point of changing its handler or its task context. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
2009-12-05Merge branch 'x86-debug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Limit number of per cpu TSC sync messages x86: dumpstack, 64-bit: Disable preemption when walking the IRQ/exception stacks x86: dumpstack: Clean up the x86_stack_ids[][] initalization and other details x86, cpu: mv display_cacheinfo -> cpu_detect_cache_sizes x86: Suppress stack overrun message for init_task x86: Fix cpu_devs[] initialization in early_cpu_init() x86: Remove CPU cache size output for non-Intel too x86: Minimise printk spew from per-vendor init code x86: Remove the CPU cache size printk's cpumask: Avoid cpumask_t in arch/x86/kernel/apic/nmi.c x86: Make sure we also print a Code: line for show_regs()
2009-12-05Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, msr, cpumask: Use struct cpumask rather than the deprecated cpumask_t x86, cpuid: Simplify the code in cpuid_open x86, cpuid: Remove the bkl from cpuid_open() x86, msr: Remove the bkl from msr_open() x86: AMD Geode LX optimizations x86, msr: Unify rdmsr_on_cpus/wrmsr_on_cpus
2009-12-05Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix a section mismatch in arch/x86/kernel/setup.c x86: Fixup last users of irq_chip->typename x86: Remove BKL from apm_32 x86: Remove BKL from microcode x86: use kernel_stack_pointer() in kprobes.c x86: use kernel_stack_pointer() in kgdb.c x86: use kernel_stack_pointer() in dumpstack.c x86: use kernel_stack_pointer() in process_32.c
2009-12-05Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h x86/alternatives: Check replacementlen <= instrlen at build time x86, 64-bit: Set data segments to null after switching to 64-bit mode x86: Clean up the loadsegment() macro x86: Optimize loadsegment() x86: Add missing might_fault() checks to copy_{to,from}_user() x86-64: __copy_from_user_inatomic() adjustments x86: Remove unused thread_return label from switch_to() x86, 64-bit: Fix bstep_iret jump x86: Don't use the strict copy checks when branch profiling is in use x86, 64-bit: Move K8 B step iret fixup to fault entry asm x86: Generate cmpxchg build failures x86: Add a Kconfig option to turn the copy_from_user warnings into errors x86: Turn the copy_from_user check into an (optional) compile time warning x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()
2009-12-05Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits) x86, apic: Enable lapic nmi watchdog on AMD Family 11h x86: Remove unnecessary mdelay() from cpu_disable_common() x86, ioapic: Document another case when level irq is seen as an edge x86, ioapic: Fix the EOI register detection mechanism x86, io-apic: Move the effort of clearing remoteIRR explicitly before migrating the irq x86: SGI UV: Map low MMR ranges x86: apic: Print out SRAT table APIC id in hex x86: Re-get cfg_new in case reuse/move irq_desc x86: apic: Remove not needed #ifdef x86: io-apic: IO-APIC MMIO should not fail on resource insertion x86: Remove asm/apicnum.h x86: apic: Do not use stacked physid_mask_t x86, apic: Get rid of apicid_to_cpu_present assign on 64-bit x86, ioapic: Use snrpintf while set names for IO-APIC resourses x86, apic: Use PAGE_SIZE instead of numbers x86: Remove local_irq_enable()/local_irq_disable() in fixup_irqs() x86: Use EOI register in io-apic on intel platforms x86: Force irq complete move during cpu offline x86: Remove move_cleanup_count from irq_cfg x86, intr-remap: Avoid irq_chip mask/unmask in fixup_irqs() for intr-remapping ...
2009-12-05Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (470 commits) x86: Fix comments of register/stack access functions perf tools: Replace %m with %a in sscanf hw-breakpoints: Keep track of user disabled breakpoints tracing/syscalls: Make syscall events print callbacks static tracing: Add DEFINE_EVENT(), DEFINE_SINGLE_EVENT() support to docbook perf: Don't free perf_mmap_data until work has been done perf_event: Fix compile error perf tools: Fix _GNU_SOURCE macro related strndup() build error trace_syscalls: Remove unused syscall_name_to_nr() trace_syscalls: Simplify syscall profile trace_syscalls: Remove duplicate init_enter_##sname() trace_syscalls: Add syscall_nr field to struct syscall_metadata trace_syscalls: Remove enter_id exit_id trace_syscalls: Set event_enter_##sname->data to its metadata trace_syscalls: Remove unused event_syscall_enter and event_syscall_exit perf_event: Initialize data.period in perf_swevent_hrtimer() perf probe: Simplify event naming perf probe: Add --list option for listing current probe events perf probe: Add argv_split() from lib/argv_split.c perf probe: Move probe event utility functions to probe-event.c ...
2009-12-05Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller
Conflicts: drivers/net/pcmcia/fmvj18x_cs.c drivers/net/pcmcia/nmclan_cs.c drivers/net/pcmcia/xirc2ps_cs.c drivers/net/wireless/ray_cs.c
2009-12-05Merge branch 'tracing-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (40 commits) tracing: Separate raw syscall from syscall tracer ring-buffer-benchmark: Add parameters to set produce/consumer priorities tracing, function tracer: Clean up strstrip() usage ring-buffer benchmark: Run producer/consumer threads at nice +19 tracing: Remove the stale include/trace/power.h tracing: Only print objcopy version warning once from recordmcount tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used ring-buffer: Move access to commit_page up into function used tracing: do not disable interrupts for trace_clock_local ring-buffer: Add multiple iterations between benchmark timestamps kprobes: Sanitize struct kretprobe_instance allocations tracing: Fix to use __always_unused attribute compiler: Introduce __always_unused tracing: Exit with error if a weak function is used in recordmcount.pl tracing: Move conditional into update_funcs() in recordmcount.pl tracing: Add regex for weak functions in recordmcount.pl tracing: Move mcount section search to front of loop in recordmcount.pl tracing: Fix objcopy revision check in recordmcount.pl tracing: Check absolute path of input file in recordmcount.pl tracing: Correct the check for number of arguments in recordmcount.pl ...
2009-12-05Merge branch 'core-iommu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits) x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree x86/amd-iommu: Remove amd_iommu_pd_table x86/amd-iommu: Move reset_iommu_command_buffer out of locked code x86/amd-iommu: Cleanup DTE flushing code x86/amd-iommu: Introduce iommu_flush_device() function x86/amd-iommu: Cleanup attach/detach_device code x86/amd-iommu: Keep devices per domain in a list x86/amd-iommu: Add device bind reference counting x86/amd-iommu: Use dev->arch->iommu to store iommu related information x86/amd-iommu: Remove support for domain sharing x86/amd-iommu: Rearrange dma_ops related functions x86/amd-iommu: Move some pte allocation functions in the right section x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc x86/amd-iommu: Use get_device_id and check_device where appropriate x86/amd-iommu: Move find_protection_domain to helper functions x86/amd-iommu: Simplify get_device_resources() x86/amd-iommu: Let domain_for_device handle aliases x86/amd-iommu: Remove iommu specific handling from dma_ops path x86/amd-iommu: Remove iommu parameter from __(un)map_single x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs ...
2009-12-05x86: Convert BUG() to use unreachable()David Daney
Use the new unreachable() macro instead of for(;;);. When allyesconfig is built with a GCC-4.5 snapshot on i686 the size of the text segment is reduced by 3987 bytes (from 6827019 to 6823032). Signed-off-by: David Daney <ddaney@caviumnetworks.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: x86@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-04x86: ASUS P4S800 reboot=bios quirkLeann Ogasawara
Bug reporter noted their system with an ASUS P4S800 motherboard would hang when rebooting unless reboot=b was specified. Their dmidecode didn't contain descriptive System Information for Manufacturer or Product Name, so I used their Base Board Information to create a reboot quirk patch. The bug reporter confirmed this patch resolves the reboot hang. Handle 0x0001, DMI type 1, 25 bytes System Information Manufacturer: System Manufacturer Product Name: System Name Version: System Version Serial Number: SYS-1234567890 UUID: E0BFCD8B-7948-D911-A953-E486B4EEB67F Wake-up Type: Power Switch Handle 0x0002, DMI type 2, 8 bytes Base Board Information Manufacturer: ASUSTeK Computer INC. Product Name: P4S800 Version: REV 1.xx Serial Number: xxxxxxxxxxx BugLink: http://bugs.launchpad.net/bugs/366682 ASUS P4S800 will hang when rebooting unless reboot=b is specified. Add a quirk to reboot through the bios. Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com> LKML-Reference: <1259972107.4629.275.camel@emiko> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@kernel.org>
2009-12-04PCI: add pci_request_acsChris Wright
Commit ae21ee65e8bc228416bbcc8a1da01c56a847a60c "PCI: acs p2p upsteram forwarding enabling" doesn't actually enable ACS. Add a function to pci core to allow an IOMMU to request that ACS be enabled. The existing mechanism of using iommu_found() in the pci core to know when ACS should be enabled doesn't actually work due to initialization order; iommu has only been detected not initialized. Have Intel and AMD IOMMUs request ACS, and Xen does as well during early init of dom0. Cc: Allen Kay <allen.m.kay@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04x86/PCI: claim SR-IOV BARs in pcibios_allocate_resourceYinghai Lu
This allows us to use the BIOS SR-IOV allocations rather than assigning our own later on. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04tree-wide: fix misspelling of "definition" in commentsAdam Buchbinder
"Definition" is misspelled "defintion" in several comments; this patch fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04tree-wide: fix assorted typos all over the placeAndré Goddard Rosa
That is "success", "unknown", "through", "performance", "[re|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-03xen: call clock resume notifier on all CPUsIan Campbell
tick_resume() is never called on secondary processors. Presumably this is because they are offlined for suspend on native and so this is normally taken care of in the CPU onlining path. Under Xen we keep all CPUs online over a suspend. This patch papers over the issue for me but I will investigate a more generic, less hacky, way of doing to the same. tick_suspend is also only called on the boot CPU which I presume should be fixed too. Signed-off-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2009-12-03xen: use iret for return from 64b kernel to 32b usermodeJeremy Fitzhardinge
If Xen wants to return to a 32b usermode with sysret it must use the right form. When using VCGF_in_syscall to trigger this, it looks at the code segment and does a 32b sysret if it is FLAT_USER_CS32. However, this is different from __USER32_CS, so it fails to return properly if we use the normal Linux segment. So avoid the whole mess by dropping VCGF_in_syscall and simply use plain iret to return to usermode. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Jan Beulich <jbeulich@novell.com> Cc: Stable Kernel <stable@kernel.org>
2009-12-03xen: register runstate info for boot CPU earlyJeremy Fitzhardinge
printk timestamping uses sched_clock, which in turn relies on runstate info under Xen. So make sure we set it up before any printks can be called. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>
2009-12-03xen: register runstate on secondary CPUsIan Campbell
The commit "xen: re-register runstate area earlier on resume" caused us to never try and setup the runstate area for secondary CPUs. Ensure that we do this... Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>
2009-12-03xen: register timer interrupt with IRQF_TIMERIan Campbell
Otherwise the timer is disabled by dpm_suspend_noirq() which in turn prevents correct operation of stop_machine on multi-processor systems and breaks suspend. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>
2009-12-03xen: correctly restore pfn_to_mfn_list_list after resumeIan Campbell
pvops kernels >= 2.6.30 can currently only be saved and restored once. The second attempt to save results in: ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000 ERROR Internal error: Failed to map/save the p2m frame list I finally narrowed it down to: commit cdaead6b4e657f960d6d6f9f380e7dfeedc6a09b Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Date: Fri Feb 27 15:34:59 2009 -0800 xen: split construction of p2m mfn tables from registration Build the p2m_mfn_list_list early with the rest of the p2m table, but register it later when the real shared_info structure is in place. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> The unforeseen side-effect of this change was to cause the mfn list list to not be rebuilt on resume. Prior to this change it would have been rebuilt via xen_post_suspend() -> xen_setup_shared_info() -> xen_setup_mfn_list_list(). Fix by explicitly calling xen_build_mfn_list_list() from xen_post_suspend(). Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>
2009-12-03xen: restore runstate_info even if !have_vcpu_info_placementJeremy Fitzhardinge
Even if have_vcpu_info_placement is not set, we still need to set up the runstate area on each resumed vcpu. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>
2009-12-03xen: re-register runstate area earlier on resume.Ian Campbell
This is necessary to ensure the runstate area is available to xen_sched_clock before any calls to printk which will require it in order to provide a timestamp. I chose to pull the xen_setup_runstate_info out of xen_time_init into the caller in order to maintain parity with calling xen_setup_runstate_info separately from calling xen_time_resume. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>
2009-12-03Merge branch 'perf/probes' into perf/coreIngo Molnar
Merge reason: add these fixes to 'perf probe'. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03Merge branch 'perf/mce' into perf/coreIngo Molnar
Merge reason: It's ready for v2.6.33. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03x86, apic: Enable lapic nmi watchdog on AMD Family 11hMikael Pettersson
The x86 lapic nmi watchdog does not recognize AMD Family 11h, resulting in: NMI watchdog: CPU not supported As far as I can see from available documentation (the BKDM), family 11h looks identical to family 10h as far as the PMU is concerned. Extending the check to accept family 11h results in: Testing NMI watchdog ... OK. I've been running with this change on a Turion X2 Ultra ZM-82 laptop for a couple of weeks now without problems. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: <stable@kernel.org> LKML-Reference: <19223.53436.931768.278021@pilspetsen.it.uu.se> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03Merge branch 'master' into for-2.6.33Jens Axboe
2009-12-03x86/reboot: Add pci_dev_put in reboot_fixup_32.c for consistencyXiaotian Feng
pci_get_device will increase the ref count of found device. Although we're going to reset soon, we should use pci_dev_put to decrease the ref count for consistency. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1259838400-23833-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the ↵Darrick J. Wong
PCI tree On a multi-node x3950M2 system, there's a slight oddity in the PCI device tree for all secondary nodes: 30:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) \-33:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01) \-34:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04) ...as compared to the primary node: 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) \-01:00.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02) 03:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01) \-04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04) In both nodes, the LSI RAID controller hangs off a CalIOC2 device, but on the secondary nodes, the BIOS hides the VGA device and substitutes the device tree ending with the disk controller. It would seem that Calgary devices don't necessarily appear at the top of the PCI tree, which means that the current code to find the Calgary IOMMU that goes with a particular device is buggy. Rather than walk all the way to the top of the PCI device tree and try to match bus number with Calgary descriptor, the code needs to examine each parent of the particular device; if it encounters a Calgary with a matching bus number, simply use that. Otherwise, we BUG() when the bus number of the Calgary doesn't match the bus number of whatever's at the top of the device tree. Extra note: This patch appears to work correctly for the x3950 that came before the x3950 M2. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Jon D. Mason <jdmason@kudzu.us> Cc: Corinna Schultz <coschult@us.ibm.com> Cc: <stable@kernel.org> LKML-Reference: <20091202230556.GG10295@tux1.beaverton.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03KVM: VMX: Fix comparison of guest efer with stale host valueAvi Kivity
update_transition_efer() masks out some efer bits when deciding whether to switch the msr during guest entry; for example, NX is emulated using the mmu so we don't need to disable it, and LMA/LME are handled by the hardware. However, with shared msrs, the comparison is made against a stale value; at the time of the guest switch we may be running with another guest's efer. Fix by deferring the mask/compare to the actual point of guest entry. Noted by Marcelo. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Drop user return notifier when disabling virtualization on a cpuAvi Kivity
This way, we don't leave a dangling notifier on cpu hotunplug or module unload. In particular, module unload leaves the notifier pointing into freed memory. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03KVM: VMX: Disable unrestricted guest when EPT disabledSheng Yang
Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest supported processors. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03KVM: x86 emulator: limit instructions to 15 bytesAvi Kivity
While we are never normally passed an instruction that exceeds 15 bytes, smp games can cause us to attempt to interpret one, which will cause large latencies in non-preempt hosts. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: x86: Add KVM_GET/SET_VCPU_EVENTSJan Kiszka
This new IOCTL exports all yet user-invisible states related to exceptions, interrupts, and NMIs. Together with appropriate user space changes, this fixes sporadic problems of vmsave/restore, live migration and system reset. [avi: future-proof abi by adding a flags field] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: VMX: Report unexpected simultaneous exceptions as internal errorsAvi Kivity
These happen when we trap an exception when another exception is being delivered; we only expect these with MCEs and page faults. If something unexpected happens, things probably went south and we're better off reporting an internal error and freezing. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: Allow internal errors reported to userspace to carry extra dataAvi Kivity
Usually userspace will freeze the guest so we can inspect it, but some internal state is not available. Add extra data to internal error reporting so we can expose it to the debugger. Extra data is specific to the suberror. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUGJan Kiszka
Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from KVM_GUESTDBG_ENABLE, their are actually orthogonal. At this chance, avoid triggering the WARN_ON in kvm_queue_exception if there is already an exception pending and reject such invalid requests. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapicMarcelo Tosatti
Otherwise kvm might attempt to dereference a NULL pointer. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>