summaryrefslogtreecommitdiffstats
path: root/fs/proc/task_mmu.c
AgeCommit message (Collapse)Author
2010-05-11revert "procfs: provide stack information for threads" and its fixup commitsRobin Holt
Originally, commit d899bf7b ("procfs: provide stack information for threads") attempted to introduce a new feature for showing where the threadstack was located and how many pages are being utilized by the stack. Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was applied to fix the NO_MMU case. Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") was applied to fix a bug in ia32 executables being loaded. Commit 9ebd4eba7 ("procfs: fix /proc/<pid>/stat stack pointer for kernel threads") was applied to fix a bug which had kernel threads printing a userland stack address. Commit 1306d603f ('proc: partially revert "procfs: provide stack information for threads"') was then applied to revert the stack pages being used to solve a significant performance regression. This patch nearly undoes the effect of all these patches. The reason for reverting these is it provides an unusable value in field 28. For x86_64, a fork will result in the task->stack_start value being updated to the current user top of stack and not the stack start address. This unpredictability of the stack_start value makes it worthless. That includes the intended use of showing how much stack space a thread has. Other architectures will get different values. As an example, ia64 gets 0. The do_fork() and copy_process() functions appear to treat the stack_start and stack_size parameters as architecture specific. I only partially reverted c44972f1 ("procfs: disable per-task stack usage on NOMMU") . If I had completely reverted it, I would have had to change mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is configured. Since I could not test the builds without significant effort, I decided to not change mm/Makefile. I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") . I left the KSTK_ESP() change in place as that seemed worthwhile. Signed-off-by: Robin Holt <holt@sgi.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07pagemap: fix pfn calculation for hugepageNaoya Horiguchi
When we look into pagemap using page-types with option -p, the value of pfn for hugepages looks wrong (see below.) This is because pte was evaluated only once for one vma although it should be updated for each hugepage. This patch fixes it. $ page-types -p 3277 -Nl -b huge voffset offset len flags 7f21e8a00 11e400 1 ___U___________H_G________________ 7f21e8a01 11e401 1ff ________________TG________________ ^^^ 7f21e8c00 11e400 1 ___U___________H_G________________ 7f21e8c01 11e401 1ff ________________TG________________ ^^^ One hugepage contains 1 head page and 511 tail pages in x86_64 and each two lines represent each hugepage. Voffset and offset mean virtual address and physical address in the page unit, respectively. The different hugepages should not have the same offset value. With this patch applied: $ page-types -p 3386 -Nl -b huge voffset offset len flags 7fec7a600 112c00 1 ___UD__________H_G________________ 7fec7a601 112c01 1ff ________________TG________________ ^^^ 7fec7a800 113200 1 ___UD__________H_G________________ 7fec7a801 113201 1ff ________________TG________________ ^^^ OK More info: - This patch modifies walk_page_range()'s hugepage walker. But the change only affects pagemap_read(), which is the only caller of hugepage callback. - Without this patch, hugetlb_entry() callback is called per vma, that doesn't match the natural expectation from its name. - With this patch, hugetlb_entry() is called per hugepte entry and the callback can become much simpler. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-06proc: copy_to_user() returns unsignedDan Carpenter
copy_to_user() returns the number of bytes left to be copied. This was a typo from: d82ef020cf31 "proc: pagemap: Hold mmap_sem during page walk". Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-05Merge branch 'master' into export-slabhTejun Heo
2010-04-04proc: pagemap: Hold mmap_sem during page walkKAMEZAWA Hiroyuki
In initial design, walk_page_range() was designed just for walking page table and it didn't require mmap_sem. Now, find_vma() etc.. are used in walk_page_range() and we need mmap_sem around it. This patch adds mmap_sem around walk_page_range(). Because /proc/<pid>/pagemap's callback routine use put_user(), we have to get rid of it to do sane fix. Changelog: 2010/Apr/2 - fixed start_vaddr and end overflow Changelog: 2010/Apr/1 - fixed start_vaddr calculation - removed unnecessary cast. - removed unnecessary change in smaps. - use GFP_TEMPORARY instead of GFP_KERNEL Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Matt Mackall <mpm@selenic.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: San Mehat <san@google.com> Cc: Brian Swetland <swetland@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> [ Fixed kmalloc failure return code as per Matt ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-06mm: count swap usageKAMEZAWA Hiroyuki
A frequent questions from users about memory management is what numbers of swap ents are user for processes. And this information will give some hints to oom-killer. Besides we can count the number of swapents per a process by scanning /proc/<pid>/smaps, this is very slow and not good for usual process information handler which works like 'ps' or 'top'. (ps or top is now enough slow..) This patch adds a counter of swapents to mm_counter and update is at each swap events. Information is exported via /proc/<pid>/status file as [kamezawa@bluextal memory]$ cat /proc/self/status Name: cat State: R (running) Tgid: 2910 Pid: 2910 PPid: 2823 TracerPid: 0 Uid: 500 500 500 500 Gid: 500 500 500 500 FDSize: 256 Groups: 500 VmPeak: 82696 kB VmSize: 82696 kB VmLck: 0 kB VmHWM: 432 kB VmRSS: 432 kB VmData: 172 kB VmStk: 84 kB VmExe: 48 kB VmLib: 1568 kB VmPTE: 40 kB VmSwap: 0 kB <=============== this. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06mm: clean up mm_counterKAMEZAWA Hiroyuki
Presently, per-mm statistics counter is defined by macro in sched.h This patch modifies it to - defined in mm.h as inlinf functions - use array instead of macro's name creation. This patch is for reducing patch size in future patch to modify implementation of per-mm counter. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-11smaps: fix wrong rss countMinchan Kim
A long time ago we regarded zero page as file_rss and vm_normal_page doesn't return NULL. But now, we reinstated ZERO_PAGE and vm_normal_page's implementation can return NULL in case of zero page. Also we don't count it with file_rss any more. Then, RSS and PSS can't be matched. For consistency, Let's ignore zero page in smaps_pte_range. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Acked-by: Matt Mackall <mpm@selenic.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15mm hugetlb: add hugepage support to pagemapNaoya Horiguchi
This patch enables extraction of the pfn of a hugepage from /proc/pid/pagemap in an architecture independent manner. Details ------- My test program (leak_pagemap) works as follows: - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,) - read()/write() something on it, - call page-types with option -p, - munmap() and unlink() the file on hugetlbfs Without my patches ------------------ $ ./leak_pagemap flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 1 0 __________________________________ 0x0000000000000804 1 0 __R________M______________________ referenced,mmap 0x000000000000086c 81 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap 0x0000000000005808 5 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked 0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 101 0 The output of page-types don't show any hugepage. With my patches --------------- $ ./leak_pagemap flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 1 0 __________________________________ 0x0000000000030000 51100 199 ________________TG________________ compound_tail,huge 0x0000000000028018 100 0 ___UD__________H_G________________ uptodate,dirty,compound_head,huge 0x0000000000000804 1 0 __R________M______________________ referenced,mmap 0x000000000000080c 1 0 __RU_______M______________________ referenced,uptodate,mmap 0x000000000000086c 80 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap 0x0000000000005808 4 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked 0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 51300 200 The output of page-types shows 51200 pages contributing to hugepages, containing 100 head pages and 51100 tail pages as expected. [akpm@linux-foundation.org: build fix] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23procfs: provide stack information for threadsStefani Seibold
A patch to give a better overview of the userland application stack usage, especially for embedded linux. Currently you are only able to dump the main process/thread stack usage which is showed in /proc/pid/status by the "VmStk" Value. But you get no information about the consumed stack memory of the the threads. There is an enhancement in the /proc/<pid>/{task/*,}/*maps and which marks the vm mapping where the thread stack pointer reside with "[thread stack xxxxxxxx]". xxxxxxxx is the maximum size of stack. This is a value information, because libpthread doesn't set the start of the stack to the top of the mapped area, depending of the pthread usage. A sample output of /proc/<pid>/task/<tid>/maps looks like: 08048000-08049000 r-xp 00000000 03:00 8312 /opt/z 08049000-0804a000 rw-p 00001000 03:00 8312 /opt/z 0804a000-0806b000 rw-p 00000000 00:00 0 [heap] a7d12000-a7d13000 ---p 00000000 00:00 0 a7d13000-a7f13000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4] a7f13000-a7f14000 ---p 00000000 00:00 0 a7f14000-a7f36000 rw-p 00000000 00:00 0 a7f36000-a8069000 r-xp 00000000 03:00 4222 /lib/libc.so.6 a8069000-a806b000 r--p 00133000 03:00 4222 /lib/libc.so.6 a806b000-a806c000 rw-p 00135000 03:00 4222 /lib/libc.so.6 a806c000-a806f000 rw-p 00000000 00:00 0 a806f000-a8083000 r-xp 00000000 03:00 14462 /lib/libpthread.so.0 a8083000-a8084000 r--p 00013000 03:00 14462 /lib/libpthread.so.0 a8084000-a8085000 rw-p 00014000 03:00 14462 /lib/libpthread.so.0 a8085000-a8088000 rw-p 00000000 00:00 0 a8088000-a80a4000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2 a80a4000-a80a5000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2 a80a5000-a80a6000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2 afaf5000-afb0a000 rw-p 00000000 00:00 0 [stack] ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso] Also there is a new entry "stack usage" in /proc/<pid>/{task/*,}/status which will you give the current stack usage in kb. A sample output of /proc/self/status looks like: Name: cat State: R (running) Tgid: 507 Pid: 507 . . . CapBnd: fffffffffffffeff voluntary_ctxt_switches: 0 nonvoluntary_ctxt_switches: 0 Stack usage: 12 kB I also fixed stack base address in /proc/<pid>/{task/*,}/stat to the base address of the associated thread stack and not the one of the main process. This makes more sense. [akpm@linux-foundation.org: fs/proc/array.c now needs walk_page_range()] Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23fs/proc/task_mmu.c v1: fix clear_refs_write() input sanity checkVincent Li
Andrew Morton pointed out similar string hacking and obfuscated check for zero-length input at the end of the function, David Rientjes suggested to use strict_strtol to replace simple_strtol, this patch cover above suggestions, add removing of leading and trailing whitespace from user input. It does not change function behavious. Signed-off-by: Vincent Li <macli@brc.ubc.ca> Acked-by: David Rientjes <rientjes@google.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Amerigo Wang <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22pagemap clear_refs: modify to specify anon or mapped vma clearingMoussa A. Ba
The patch makes the clear_refs more versatile in adding the option to select anonymous pages or file backed pages for clearing. This addition has a measurable impact on user space application performance as it decreases the number of pagewalks in scenarios where one is only interested in a specific type of page (anonymous or file mapped). The patch adds anonymous and file backed filters to the clear_refs interface. echo 1 > /proc/PID/clear_refs resets the bits on all pages echo 2 > /proc/PID/clear_refs resets the bits on anonymous pages only echo 3 > /proc/PID/clear_refs resets the bits on file backed pages only Any other value is ignored Signed-off-by: Moussa A. Ba <moussa.a.ba@gmail.com> Signed-off-by: Jared E. Hulbert <jaredeh@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-10mm_for_maps: shift down_read(mmap_sem) to the callerOleg Nesterov
mm_for_maps() takes ->mmap_sem after security checks, this looks strange and obfuscates the locking rules. Move this lock to its single caller, m_start(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-05-02pagemap: require aligned-length, non-null reads of /proc/pid/pagemapVitaly Mayatskikh
The intention of commit aae8679b0ebcaa92f99c1c3cb0cd651594a43915 ("pagemap: fix bug in add_to_pagemap, require aligned-length reads of /proc/pid/pagemap") was to force reads of /proc/pid/pagemap to be a multiple of 8 bytes, but now it allows to read 0 bytes, which actually puts some data to user's buffer. According to POSIX, if count is zero, read() should return zero and has no other results. Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> Cc: Thomas Tuttle <ttuttle@google.com> Acked-by: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07/proc/pid/maps: don't show pgoff of pure ANON VMAsKAMEZAWA Hiroyuki
Recently, it's argued that what proc/pid/maps shows is ugly when a 32bit binary runs on 64bit host. /proc/pid/maps outputs vma's pgoff member but vma->pgoff is of no use information is the vma is for ANON. With this patch, /proc/pid/maps shows just 0 if no file backing store. [akpm@linux-foundation.org: coding-style fixes] [kamezawa.hiroyu@jp.fujitsu.com: coding-style fixes] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mike Waychison <mikew@google.com> Reported-by: Ying Han <yinghan@google.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-31proc: fix sparse warnings in pagemap_read()Milind Arun Choudhary
fs/proc/task_mmu.c:696:12: warning: cast removes address space of expression fs/proc/task_mmu.c:696:9: warning: incorrect type in assignment (different address spaces) fs/proc/task_mmu.c:696:9: expected unsigned long long [noderef] [usertype] <asn:1>*out fs/proc/task_mmu.c:696:9: got unsigned long long [usertype] *<noident> fs/proc/task_mmu.c:697:12: warning: cast removes address space of expression fs/proc/task_mmu.c:697:9: warning: incorrect type in assignment (different address spaces) fs/proc/task_mmu.c:697:9: expected unsigned long long [noderef] [usertype] <asn:1>*end fs/proc/task_mmu.c:697:9: got unsigned long long [usertype] *<noident> fs/proc/task_mmu.c:723:12: warning: cast removes address space of expression fs/proc/task_mmu.c:723:26: error: subtraction of different types can't work (different address spaces) fs/proc/task_mmu.c:725:24: error: subtraction of different types can't work (different address spaces) Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-06mm: report the MMU pagesize in /proc/pid/smapsMel Gorman
The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the kernel to back a VMA. This matches the size used by the MMU in the majority of cases. However, one counter-example occurs on PPC64 kernels whereby a kernel using 64K as a base pagesize may still use 4K pages for the MMU on older processor. To distinguish, this patch reports MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: report the pagesize backing a VMA in /proc/pid/smapsMel Gorman
It is useful to verify a hugepage-aware application is using the expected pagesizes for its memory regions. This patch creates an entry called KernelPageSize in /proc/pid/smaps that is the size of page used by the kernel to back a VMA. The entry is not called PageSize as it is possible the MMU uses a different size. This extension should not break any sensible parser that skips lines containing unrecognised information. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10pagemap: fix 32-bit pagemap regressionMatt Mackall
The large pages fix from bcf8039ed45 broke 32-bit pagemap by pulling the pagemap entry code out into a function with the wrong return type. Pagemap entries are 64 bits on all systems and unsigned long is only 32 bits on 32-bit systems. Signed-off-by: Matt Mackall <mpm@selenic.com> Reported-by: Doug Graham <dgraham@nortel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@kernel.org> [2.6.26.x, 2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23proc: fix vma display mismatch between /proc/pid/{maps,smaps}Joe Korty
Commit 4752c369789250eafcd7813e11c8fb689235b0d2 aka "maps4: simplify interdependence of maps and smaps" broke /proc/pid/smaps, causing it to display some vmas twice and other vmas not at all. For example: grep .- /proc/1/smaps >/tmp/smaps; diff /proc/1/maps /tmp/smaps 1 25d24 2 < 7fd7e23aa000-7fd7e23ac000 rw-p 7fd7e23aa000 00:00 0 3 28a28 4 > ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] The bug has something to do with setting m->version before all the seq_printf's have been performed. show_map was doing this correctly, but show_smap was doing this in the middle of its seq_printf sequence. This patch arranges things so that the setting of m->version in show_smap is also done at the end of its seq_printf sequence. Testing: in addition to the above grep test, for each process I summed up the 'Rss' fields of /proc/pid/smaps and compared that to the 'VmRSS' field of /proc/pid/status. All matched except for Xorg (which has a /dev/mem mapping which Rss accounts for but VmRSS does not). This result gives us some confidence that neither /proc/pid/maps nor /proc/pid/smaps are any longer skipping or double-counting vmas. Signed-off-by: Joe Korty <joe.korty@ccur.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-10proc: remove kernel.maps_protectAlexey Dobriyan
After commit 831830b5a2b5d413407adf380ef62fe17d6fcbf2 aka "restrict reading from /proc/<pid>/maps to those who share ->mm or can ptrace" sysctl stopped being relevant because commit moved security checks from ->show time to ->start time (mm_for_maps()). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Kees Cook <kees.cook@canonical.com>
2008-08-20/proc/self/maps doesn't display the real file offsetClement Calmels
This addresses http://bugzilla.kernel.org/show_bug.cgi?id=11318 In function show_map (file: fs/proc/task_mmu.c), if vma->vm_pgoff > 2^20 than (vma->vm_pgoff << PAGE_SIZE) is greater than 2^32 (with PAGE_SIZE equal to 4096 (i.e. 2^12). The next seq_printf use an unsigned long for the conversion of (vma->vm_pgoff << PAGE_SIZE), as a result the offset value displayed in /proc/self/maps is truncated if the page offset is greater than 2^20. A test that shows this issue: #define _GNU_SOURCE #include <sys/types.h> #include <sys/stat.h> #include <sys/mman.h> #include <stdlib.h> #include <stdio.h> #include <fcntl.h> #include <unistd.h> #include <string.h> #define PAGE_SIZE (getpagesize()) #if __i386__ # define U64_STR "%llx" #elif __x86_64 # define U64_STR "%lx" #else # error "Architecture Unsupported" #endif int main(int argc, char *argv[]) { int fd; char *addr; off64_t offset = 0x10000000; char *filename = "/dev/zero"; fd = open(filename, O_RDONLY); if (fd < 0) { perror("open"); return 1; } offset *= 0x10; printf("offset = " U64_STR "\n", offset); addr = (char*)mmap64(NULL, PAGE_SIZE, PROT_READ, MAP_PRIVATE, fd, offset); if ((void*)addr == MAP_FAILED) { perror("mmap64"); return 1; } { FILE *fmaps; char *line = NULL; size_t len = 0; ssize_t read; size_t filename_len = strlen(filename); fmaps = fopen("/proc/self/maps", "r"); if (!fmaps) { perror("fopen"); return 1; } while ((read = getline(&line, &len, fmaps)) != -1) { if ((read > filename_len + 1) && (strncmp(&line[read - filename_len - 1], filename, filename_len) == 0)) printf("%s", line); } if (line) free(line); fclose(fmaps); } close(fd); return 0; } [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Clement Calmels <cboulte@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22proc: fix /proc/*/pagemap some moreAlexey Dobriyan
struct pagemap_walk was placed on stack, some hooks are initialized, the rest (->pgd_entry, ->pud_entry, ->pte_entry) are valid but junk. Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Vegard Nossum" <vegard.nossum@gmail.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-14Security: split proc ptrace checking into read vs. attachStephen Smalley
Enable security modules to distinguish reading of process state via proc from full ptrace access by renaming ptrace_may_attach to ptrace_may_access and adding a mode argument indicating whether only read access or full attach access is requested. This allows security modules to permit access to reading process state without granting full ptrace access. The base DAC/capability checking remains unchanged. Read access to /proc/pid/mem continues to apply a full ptrace attach check since check_mem_permission() already requires the current task to already be ptracing the target. The other ptrace checks within proc for elements like environ, maps, and fds are changed to pass the read mode instead of attach. In the SELinux case, we model such reading of process state as a reading of a proc file labeled with the target process' label. This enables SELinux policy to permit such reading of process state without permitting control or manipulation of the target process, as there are a number of cases where programs probe for such information via proc but do not need to be able to control the target (e.g. procps, lsof, PolicyKit, ConsoleKit). At present we have to choose between allowing full ptrace in policy (more permissive than required/desired) or breaking functionality (or in some cases just silencing the denials via dontaudit rules but this can hide genuine attacks). This version of the patch incorporates comments from Casey Schaufler (change/replace existing ptrace_may_attach interface, pass access mode), and Chris Wright (provide greater consistency in the checking). Note that like their predecessors __ptrace_may_attach and ptrace_may_attach, the __ptrace_may_access and ptrace_may_access interfaces use different return value conventions from each other (0 or -errno vs. 1 or 0). I retained this difference to avoid any changes to the caller logic but made the difference clearer by changing the latter interface to return a bool rather than an int and by adding a comment about it to ptrace.h for any future callers. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: James Morris <jmorris@namei.org>
2008-07-05Fix pagemap_read() use of struct mm_walkAndrew Morton
Fix some issues in pagemap_read noted by Alexey: - initialize pagemap_walk.mm to "mm" , so the code starts working as advertised - initialize ->private to "&pm" so it wouldn't immediately oops in pagemap_pte_hole() - unstatic struct pagemap_walk, so two threads won't fsckup each other (including those started by root, including flipping ->mm when you don't have permissions) - pagemap_read() contains two calls to ptrace_may_attach(), second one looks unneeded. - avoid possible kmalloc(0) and integer wraparound. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Personally, I'd just remove the functionality entirely - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-05Fix clear_refs_write() use of struct mm_walkAndrew Morton
Don't use a static entry, so as to prevent races during concurrent use of this function. Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-12pagemap: fix large pages in pagemapDave Hansen
We were walking right into huge page areas in the pagemap walker, and calling the pmds pmd_bad() and clearing them. That leaked huge pages. Bad. This patch at least works around that for now. It ignores huge pages in the pagemap walker for the time being, and won't leak those pages. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-12pagemap: pass mm into pagewalkersDave Hansen
We need this at least for huge page detection for now, because powerpc needs the vm_area_struct to be able to determine whether a virtual address is referring to a huge page (its pmd_huge() doesn't work). It might also come in handy for some of the other users. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-06pagemap: fix bug in add_to_pagemap, require aligned-length reads of ↵Thomas Tuttle
/proc/pid/pagemap Fix a bug in add_to_pagemap. Previously, since pm->out was a char *, put_user was only copying 1 byte of every PFN, resulting in the top 7 bytes of each PFN not being copied. By requiring that reads be a multiple of 8 bytes, I can make pm->out and pm->end u64*s instead of char*s, which makes put_user work properly, and also simplifies the logic in add_to_pagemap a bit. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Thomas Tuttle <ttuttle@google.com> Cc: Matt Mackall <mpm@selenic.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-08fs/proc/task_mmu.c: remove duplicated include filesHuang Weiyi
Removed duplicated include files <linux/ptrace.h> and <linux/seq_file.h> in fs/proc/task_mmu.c. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29procfs task exe symlinkMatt Helsley
The kernel implements readlink of /proc/pid/exe by getting the file from the first executable VMA. Then the path to the file is reconstructed and reported as the result. Because of the VMA walk the code is slightly different on nommu systems. This patch avoids separate /proc/pid/exe code on nommu systems. Instead of walking the VMAs to find the first executable file-backed VMA we store a reference to the exec'd file in the mm_struct. That reference would prevent the filesystem holding the executable file from being unmounted even after unmapping the VMAs. So we track the number of VM_EXECUTABLE VMAs and drop the new reference when the last one is unmapped. This avoids pinning the mounted filesystem. [akpm@linux-foundation.org: improve comments] [yamamoto@valinux.co.jp: fix dup_mmap] Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com> Cc:"Eric W. Biederman" <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28smaps: account swap entriesPeter Zijlstra
Show the amount of swap for each vma. This can be used to see where all the swap goes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Matt Mackall <mpm@selenic.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28make swap_pte_to_pagemap_entry() staticAdrian Bunk
Make the needlessly global swap_pte_to_pagemap_entry() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-22Change pagemap output format to allow for future reporting of huge pagesHans Rosenfeld
Change pagemap output format to allow for future reporting of huge pages. (Format comment and minor cleanups: mpm@selenic.com) Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-13pagemap: proper read error handlingMarcelo Tosatti
Fix pagemap_read() error handling by releasing acquired resources and checking for get_user_pages() partial failure. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23/proc/pid/pagemap: fix PM_SPECIAL macroHans Rosenfeld
There seems to be a bug in the PM_SPECIAL macro for /proc/pid/pagemap. I think masking out those other bits makes more sense then setting all those mask bits. Signed-off-by: Hans Rosenfeld <Hans.Rosenfeld@amd.com> Acked-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14d_path: Make seq_path() use a struct path argumentJan Blunck
seq_path() is always called with a dentry and a vfsmount from a struct path. Make seq_path() take it directly as an argument. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14d_path: Make proc_get_link() use a struct path argumentJan Blunck
proc_get_link() is always called with a dentry and a vfsmount from a struct path. Make proc_get_link() take it directly as an argument. Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08procfs: constify function pointer tablesJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Mike Frysinger <vapier@gentoo.org> Acked-By: David Howells <dhowells@redhat.com> Acked-by: Bryan Wu <bryan.wu@analog.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: seqfile convert proc_pid_status to properly handle pid namespacesEric W. Biederman
Currently we possibly lookup the pid in the wrong pid namespace. So seq_file convert proc_pid_status which ensures the proper pid namespaces is passed in. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: another build fix] [akpm@linux-foundation.org: s390 build fix] [akpm@linux-foundation.org: fix task_name() output] [akpm@linux-foundation.org: fix nommu build] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrew Morgan <morgan@kernel.org> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05maps4: make page monitoring /proc file optionalMatt Mackall
Make /proc/ page monitoring configurable This puts the following files under an embedded config option: /proc/pid/clear_refs /proc/pid/smaps /proc/pid/pagemap /proc/kpagecount /proc/kpageflags [akpm@linux-foundation.org: Kconfig fix] Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05maps4: add /proc/pid/pagemap interfaceMatt Mackall
This interface provides a mapping for each page in an address space to its physical page frame number, allowing precise determination of what pages are mapped and what pages are shared between processes. New in this version: - headers gone again (as recommended by Dave Hansen and Alan Cox) - 64-bit entries (as per discussion with Andi Kleen) - swap pte information exported (from Dave Hansen) - page walker callback for holes (from Dave Hansen) - direct put_user I/O (as suggested by Rusty Russell) This patch folds in cleanups and swap PTE support from Dave Hansen <haveblue@us.ibm.com>. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05maps4: regroup task_mmu by interfaceMatt Mackall
Reorder source so that all the code and data for each interface is together. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05maps4: move clear_refs code to task_mmu.cMatt Mackall
This puts all the clear_refs code where it belongs and probably lets things compile on MMU-less systems as well. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05maps4: simplify interdependence of maps and smapsMatt Mackall
This pulls the shared map display code out of show_map and puts it in show_smap where it belongs. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05maps4: use pagewalker in clear_refs and smapsMatt Mackall
Use the generic pagewalker for smaps and clear_refs Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05maps4: add proportional set size accounting in smapsFengguang Wu
The "proportional set size" (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it. So if a process has 1000 pages all to itself, and 1000 shared with one other process, its PSS will be 1500. - lwn.net: "ELC: How much memory are applications really using?" The PSS proposed by Matt Mackall is a very nice metic for measuring an process's memory footprint. So collect and export it via /proc/<pid>/smaps. Matt Mackall's pagemap/kpagemap and John Berthels's exmap can also do the job. They are comprehensive tools. But for PSS, let's do it in the simple way. Cc: John Berthels <jjberthels@gmail.com> Cc: Bernardo Innocenti <bernie@codewiz.org> Cc: Padraig Brady <P@draigBrady.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Hugh Dickins <hugh@veritas.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-02restrict reading from /proc/<pid>/maps to those who share ->mm or can ptrace pidAl Viro
Contents of /proc/*/maps is sensitive and may become sensitive after open() (e.g. if target originally shares our ->mm and later does exec on suid-root binary). Check at read() (actually, ->start() of iterator) time that mm_struct we'd grabbed and locked is - still the ->mm of target - equal to reader's ->mm or the target is ptracable by reader. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08proc: maps protectionKees Cook
The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive information about the memory location and usage of processes. Issues: - maps should not be world-readable, especially if programs expect any kind of ASLR protection from local attackers. - maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc check the maps when %n is in a *printf call, and a setuid(getuid()) process wouldn't be able to read its own maps file. (For reference see http://lkml.org/lkml/2006/1/22/150) - a system-wide toggle is needed to allow prior behavior in the case of non-root applications that depend on access to the maps contents. This change implements a check using "ptrace_may_attach" before allowing access to read the maps contents. To control this protection, the new knob /proc/sys/kernel/maps_protect has been added, with corresponding updates to the procfs documentation. [akpm@linux-foundation.org: build fixes] [akpm@linux-foundation.org: New sysctl numbers are old hat] Signed-off-by: Kees Cook <kees@outflux.net> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>