path: root/mm
Age        Commit message        Author
2024-04-15  Merge tag 'v5.15.155' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.155 stable release
2024-04-15  Merge tag 'v5.15.154' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.154 stable release
2024-04-13  x86/mm/pat: fix VM_PAT handling in COW mappings  (David Hildenbrand)
commit 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 upstream.

PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios.  Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.

Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().

In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.

To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.

We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode.  We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.

For now, let's keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.

Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():

<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>

int main(void)
{
        struct io_uring_params p = {};
        int ring_fd;
        size_t size;
        char *map;

        ring_fd = io_uring_setup(1, &p);
        if (ring_fd < 0) {
                perror("io_uring_setup");
                return 1;
        }
        size = p.sq_off.array + p.sq_entries * sizeof(unsigned);

        /* Map the submission queue ring MAP_PRIVATE */
        map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                   ring_fd, IORING_OFF_SQ_RING);
        if (map == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* We have at least one page. Let's COW it. */
        *map = 0;
        pause();
        return 0;
}
<--- C reproducer --->

On a system with 16 GiB RAM and swap configured:
 # ./iouring &
 # memhog 16G
 # killall iouring

[  301.552930] ------------[ cut here ]------------
[  301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[  301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[  301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[  301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[  301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[  301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[  301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[  301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[  301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[  301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[  301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[  301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[  301.564186] FS:  0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[  301.564773] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[  301.565725] PKRU: 55555554
[  301.565944] Call Trace:
[  301.566148]  <TASK>
[  301.566325]  ? untrack_pfn+0xf4/0x100
[  301.566618]  ? __warn+0x81/0x130
[  301.566876]  ? untrack_pfn+0xf4/0x100
[  301.567163]  ? report_bug+0x171/0x1a0
[  301.567466]  ? handle_bug+0x3c/0x80
[  301.567743]  ? exc_invalid_op+0x17/0x70
[  301.568038]  ? asm_exc_invalid_op+0x1a/0x20
[  301.568363]  ? untrack_pfn+0xf4/0x100
[  301.568660]  ? untrack_pfn+0x65/0x100
[  301.568947]  unmap_single_vma+0xa6/0xe0
[  301.569247]  unmap_vmas+0xb5/0x190
[  301.569532]  exit_mmap+0xec/0x340
[  301.569801]  __mmput+0x3e/0x130
[  301.570051]  do_exit+0x305/0xaf0
...

Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Wupeng Ma <mawupeng1@huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations  (Vlastimil Babka)
commit 803de9000f334b771afacb6ff3e78622916668b0 upstream.

Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO.  Such a
combination can happen in a suspend/resume context where a GFP_KERNEL
allocation can have __GFP_IO masked out via gfp_allowed_mask.

Quoting Sven:

1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
   with __GFP_RETRY_MAYFAIL set.

2. page alloc's __alloc_pages_slowpath tries to get a page from the
   freelist.  This fails because there is nothing free of that costly
   order.

3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
   which bails out because a zone is ready to be compacted; it pretends
   to have made a single page of progress.

4. page alloc tries to compact, but this always bails out early because
   __GFP_IO is not set (it's not passed by the snd allocator, and even
   if it were, we are suspending so the __GFP_IO flag would be cleared
   anyway).

5. page alloc believes reclaim progress was made (because of the
   pretense in item 3) and so it checks whether it should retry
   compaction.  The compaction retry logic thinks it should try again,
   because:
    a) reclaim is needed because of the early bail-out in item 4
    b) a zonelist is suitable for compaction

6. goto 2.  indefinite stall.

(end quote)

The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries.

There are however other places that wrongly assume that compaction can
happen while we lack __GFP_IO.

To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to
use it.

Also use the new helper in:

- compaction_ready(), which will make reclaim not bail out in step 3, so
  there's at least one attempt to actually reclaim, even if chances are
  small for a costly order

- in_reclaim_compaction(), which will make should_continue_reclaim()
  return false and we don't over-reclaim unnecessarily

- in __alloc_pages_slowpath() to set a local variable can_compact, which
  is then used to avoid retrying reclaim/compaction for costly
  allocations (step 5) if we can't compact, and also to skip the early
  compaction attempt that we do in some cases

A sketch of the helper appears after the tags below.

Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sven van Ashbrook <svenva@chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
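A minimal sketch of the new helper as described above (its exact placement
in include/linux/gfp.h is an assumption):

	/*
	 * Compaction may need to perform I/O (e.g. writeback during page
	 * migration), so it only makes sense when the allocation context
	 * allows __GFP_IO.
	 */
	static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
	{
		return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
	}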
2024-04-10  mm/migrate: set swap entry values of THP tail pages properly  (Zi Yan)
The tail pages in a THP can have swap entry information stored in their
private field.  When migrating to a new page, all tail pages of the new
page need to update ->private to avoid future data corruption.

This fix is stable-only, since after commit 07e09c483cbe ("mm/huge_memory:
work on folio->swap instead of page->private when splitting folio"),
subpages of a swapcached THP no longer require this maintenance.

Adding THPs to the swapcache was introduced in commit 38d8b4e6bdc87 ("mm,
THP, swap: delay splitting THP during swap out"), where each subpage of a
THP added to the swapcache had its own swapcache entry and required the
->private field to point to the correct swapcache entry.  Later, when THP
migration functionality was implemented in commit 616b8371539a6 ("mm:
thp: enable thp migration in generic path"), it initially did not handle
the subpages of swapcached THPs, failing to update their ->private fields
or replace the subpage pointers in the swapcache.  Subsequently, commit
e71769ae5260 ("mm: enable thp migration for shmem thp") addressed the
swapcache update aspect.  This patch fixes the update of subpage
->private fields.

Closes: https://lore.kernel.org/linux-mm/1707814102-22682-1-git-send-email-quic_charante@quicinc.com/
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
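A sketch of the kind of update described, in pre-folio 5.15 terms (the
placement inside the migration copy path and the exact loop bounds are
assumptions for illustration):

	/*
	 * Copy swapcache entries for every subpage, not just the head:
	 * each tail page's ->private must point at its own swap entry.
	 */
	if (PageSwapCache(page)) {
		int i, nr = compound_nr(page);

		SetPageSwapCache(newpage);
		for (i = 0; i < nr; i++)
			set_page_private(newpage + i, page_private(page + i));
	}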
2024-04-10  memtest: use {READ,WRITE}_ONCE in memory scanning  (Qiang Zhang)
[ Upstream commit 82634d7e24271698e50a3ec811e5f50de790a65f ]

memtest failed to find bad memory when compiled with clang.  So use
{WRITE,READ}_ONCE to access memory to avoid compiler over-optimization.

Link: https://lkml.kernel.org/r/20240312080422.691222-1-qiang4.zhang@intel.com
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
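A simplified sketch of the pattern (not the exact mm/memtest.c code): the
accessor macros force real loads and stores so the compiler cannot merge
or elide them when it proves nothing else touches the memory.

	static void __init memtest_scan(u64 pattern, u64 *start, u64 *end)
	{
		u64 *p;

		/* Fill the range with the test pattern. */
		for (p = start; p < end; p++)
			WRITE_ONCE(*p, pattern);

		/* Read it back; a mismatch indicates bad memory. */
		for (p = start; p < end; p++) {
			if (READ_ONCE(*p) == pattern)
				continue;
			/* ... report and reserve the bad range ... */
		}
	}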
2024-04-10  mm: swap: fix race between free_swap_and_cache() and swapoff()  (Ryan Roberts)
[ Upstream commit 82b1c07a0af603e3c47b906c8e991dc96f01688e ]

There was previously a theoretical window where swapoff() could run and
teardown a swap_info_struct while a call to free_swap_and_cache() was
running in another thread.  This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map.

This is a theoretical problem and I haven't been able to provoke it from
a test case.  But there has been agreement based on code review that this
is possible (see link below).

Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff().  There was an extra check in _swap_info_get() to confirm that
the swap entry was not free.  This isn't present in get_swap_device()
because it doesn't make sense in general due to the race between getting
the reference and swapoff.  So I've added an equivalent check directly in
free_swap_and_cache().

Details of how to provoke one possible issue (thanks to David Hildenbrand
for deriving this):

--8<-----

__swap_entry_free() might be the last user and result in
"count == SWAP_HAS_CACHE".

swapoff->try_to_unuse() will stop as soon as si->inuse_pages==0.

So the question is: could someone reclaim the folio and turn
si->inuse_pages==0, before we completed swap_page_trans_huge_swapped().

Imagine the following: 2 MiB folio in the swapcache.  Only 2 subpages are
still referenced by swap entries.

Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.

Process 1 quits.  Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]

Process 2 quits.  Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE

Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
__try_to_reclaim_swap().

__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
...
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);

What stops swapoff from succeeding after process 2 reclaimed the swap
cache but before process 1 finished its call to
swap_page_trans_huge_swapped()?

--8<-----

Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com
Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch")
Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redhat.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
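A simplified sketch of the fixed pattern; names follow the description
above, and the real free_swap_and_cache() has more cases:

	int free_swap_and_cache(swp_entry_t entry)
	{
		struct swap_info_struct *si;
		unsigned char count;

		si = get_swap_device(entry);	/* pins the device, stalling swapoff() */
		if (!si)
			return 1;

		/*
		 * get_swap_device() cannot verify the entry is still in
		 * use, so re-check the swap map while the device is pinned.
		 */
		if (WARN_ON(data_race(!si->swap_map[swp_offset(entry)]))) {
			put_swap_device(si);
			return 1;
		}

		count = __swap_entry_free(si, entry);
		if (count == SWAP_HAS_CACHE &&
		    !swap_page_trans_huge_swapped(si, entry))
			__try_to_reclaim_swap(si, swp_offset(entry),
					      TTRS_UNMAPPED | TTRS_FULL);

		put_swap_device(si);
		return 1;
	}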
2024-04-10  swap: comments get_swap_device() with usage rule  (Huang Ying)
[ Upstream commit a95722a047724ef75567381976a36f0e44230bd9 ]

The general rule to use a swap entry is as follows.

When we get a swap entry, if there aren't some other ways to prevent
swapoff, such as the folio in swap cache is locked, page table lock is
held, etc., the swap entry may become invalid because of swapoff.  Then,
we need to enclose all swap related functions with get_swap_device() and
put_swap_device(), unless the swap functions call get/put_swap_device()
by themselves.

Add the rule as comments of get_swap_device().

Link: https://lkml.kernel.org/r/20230529061355.125791-6-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Chris Li (Google) <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 82b1c07a0af6 ("mm: swap: fix race between free_swap_and_cache() and swapoff()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
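A minimal sketch of the documented usage rule: pin the swap device while
working on a swap entry that nothing else protects from swapoff.

	struct swap_info_struct *si;

	si = get_swap_device(entry);
	if (!si)
		return;		/* device gone or entry stale */

	/* ... call swap functions on 'entry' safely here ... */

	put_swap_device(si);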
2024-03-03  Merge tag 'v5.15.150' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.150 stable release
2024-03-01  userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb  (Lokesh Gidra)
commit 67695f18d55924b2013534ef3bdc363bc9e14605 upstream.

In mfill_atomic_hugetlb(), mmap_changing isn't being checked again if we
drop mmap_lock and reacquire it.  When the lock is not held,
mmap_changing could have been incremented.  This is also inconsistent
with the behavior in mfill_atomic().

Link: https://lkml.kernel.org/r/20240117223729.1444522-1-lokeshgidra@google.com
Fixes: df2cc96e77011 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races")
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
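A sketch of the recheck after retaking the lock (simplified from the
description; the retry label and variable names are illustrative):

	retry:
		mmap_read_lock(dst_mm);

		/*
		 * If memory mappings are changing because of a
		 * non-cooperative operation running in parallel, bail
		 * out, since the pages we would fill could be stale.
		 */
		err = -EAGAIN;
		if (mmap_changing && atomic_read(mmap_changing))
			goto out_unlock;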
2024-02-26  Merge tag 'v5.15.149' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.149 stable release
2024-02-23  mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again  (Zach O'Keefe)
commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78 upstream.

(struct dirty_throttle_control *)->thresh is an unsigned long, but is
passed as the u32 divisor argument to div_u64().  On architectures where
unsigned long is 64 bits, the argument will be implicitly truncated.

Use div64_u64() instead of div_u64() so that the value used in the "is
this a safe division" check is the same as the divisor.

Also, remove the redundant cast of the numerator to u64, as that should
happen implicitly.

This would be difficult to exploit in the memcg domain, given the
ratio-based arithmetic domain_dirty_limits() uses, but is much easier in
the global writeback domain with a BDI_CAP_STRICTLIMIT-backing device,
using e.g. vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32)

Link: https://lkml.kernel.org/r/20240118181954.1415197-1-zokeefe@google.com
Fixes: f6789593d5ce ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Maxim Patlasov <MPatlasov@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
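A sketch of the truncation problem; the variable names are illustrative,
not the exact wb_dirty_limits() code:

	u64 numerator = ...;
	unsigned long thresh = 1UL << 32;	/* non-zero, but (u32)thresh == 0 */

	/* old: div_u64() takes a u32 divisor, silently truncating thresh */
	x = div_u64(numerator, thresh);

	/* fixed: div64_u64() keeps the full 64-bit divisor, so the
	 * divide-by-zero check sees the same value that is divided by */
	x = div64_u64(numerator, thresh);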
2024-02-23  mm/sparsemem: fix race in accessing memory_section->usage  (Charan Teja Kalla)
[ Upstream commit 5ec8e8ea8b7783fab150cf86404fc38cb4db8800 ]

The below race is observed on a PFN which falls into the device memory
region with a system memory configuration where PFNs are laid out as
[ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL].  Since the normal zone start and
end pfns contain the device memory PFNs as well, the compaction triggered
will try the device memory PFNs too, though they end up as a NOP (because
pfn_to_online_page() returns NULL for ZONE_DEVICE memory sections).  When,
from another core, the section mappings are being removed for the
ZONE_DEVICE region that the PFN in question belongs to, on which
compaction is currently operating, the kernel crashes with
CONFIG_SPARSEMEM_VMEMMAP enabled.  The crash logs can be seen at [1].

compact_zone()			memunmap_pages
-------------			---------------
__pageblock_pfn_to_page
   ......
(a) pfn_valid():
     valid_section()  // return true
				(b) __remove_pages()->
				    sparse_remove_section()->
				    section_deactivate():
				    [Free the array ms->usage and set
				     ms->usage = NULL]
pfn_section_valid()
[Access ms->usage which is NULL]

NOTE: From the above it can be said that the race is reduced to between
the pfn_valid()/pfn_section_valid() and the section deactivation with
SPARSEMEM_VMEMMAP enabled.

The commit b943f045a9af ("mm/sparse: fix kernel crash with
pfn_section_valid check") tried to address the same problem by clearing
SECTION_HAS_MEM_MAP with the expectation that valid_section() returns
false, so ms->usage is not accessed.

Fix this issue by the below steps:

a) Clear SECTION_HAS_MEM_MAP before freeing ->usage.

b) An RCU-protected read side critical section will either return NULL
   when SECTION_HAS_MEM_MAP is cleared or can successfully access
   ->usage.

c) Free ->usage with kfree_rcu() and set ms->usage = NULL.  No attempt
   will be made to access ->usage after this, as SECTION_HAS_MEM_MAP is
   cleared and thus valid_section() returns false.

Thanks to David/Pavan for their inputs on this patch.

[1] https://lore.kernel.org/linux-mm/994410bb-89aa-d987-1f50-f514903c55aa@quicinc.com/

On a Snapdragon SoC, with the mentioned memory configuration of PFNs as
[ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL], we are able to see a bunch of
issues daily while testing on a device farm.

For this particular issue below is the log.  Though the below log is not
directly pointing to the pfn_section_valid(){ ms->usage; } access, when
we loaded this dump on a T32 Lauterbach tool, it is pointing there.

[  540.578056] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  540.578068] Mem abort info:
[  540.578070]   ESR = 0x0000000096000005
[  540.578073]   EC = 0x25: DABT (current EL), IL = 32 bits
[  540.578077]   SET = 0, FnV = 0
[  540.578080]   EA = 0, S1PTW = 0
[  540.578082]   FSC = 0x05: level 1 translation fault
[  540.578085] Data abort info:
[  540.578086]   ISV = 0, ISS = 0x00000005
[  540.578088]   CM = 0, WnR = 0
[  540.579431] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[  540.579436] pc : __pageblock_pfn_to_page+0x6c/0x14c
[  540.579454] lr : compact_zone+0x994/0x1058
[  540.579460] sp : ffffffc03579b510
[  540.579463] x29: ffffffc03579b510 x28: 0000000000235800 x27: 000000000000000c
[  540.579470] x26: 0000000000235c00 x25: 0000000000000068 x24: ffffffc03579b640
[  540.579477] x23: 0000000000000001 x22: ffffffc03579b660 x21: 0000000000000000
[  540.579483] x20: 0000000000235bff x19: ffffffdebf7e3940 x18: ffffffdebf66d140
[  540.579489] x17: 00000000739ba063 x16: 00000000739ba063 x15: 00000000009f4bff
[  540.579495] x14: 0000008000000000 x13: 0000000000000000 x12: 0000000000000001
[  540.579501] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffff897d2cd440
[  540.579507] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffffffc03579b5b4
[  540.579512] x5 : 0000000000027f25 x4 : ffffffc03579b5b8 x3 : 0000000000000001
[  540.579518] x2 : ffffffdebf7e3940 x1 : 0000000000235c00 x0 : 0000000000235800
[  540.579524] Call trace:
[  540.579527]  __pageblock_pfn_to_page+0x6c/0x14c
[  540.579533]  compact_zone+0x994/0x1058
[  540.579536]  try_to_compact_pages+0x128/0x378
[  540.579540]  __alloc_pages_direct_compact+0x80/0x2b0
[  540.579544]  __alloc_pages_slowpath+0x5c0/0xe10
[  540.579547]  __alloc_pages+0x250/0x2d0
[  540.579550]  __iommu_dma_alloc_noncontiguous+0x13c/0x3fc
[  540.579561]  iommu_dma_alloc+0xa0/0x320
[  540.579565]  dma_alloc_attrs+0xd4/0x108

[quic_charante@quicinc.com: use kfree_rcu() in place of synchronize_rcu(), per David]
Link: https://lkml.kernel.org/r/1698403778-20938-1-git-send-email-quic_charante@quicinc.com
Link: https://lkml.kernel.org/r/1697202267-23600-1-git-send-email-quic_charante@quicinc.com
Fixes: f46edbd1b151 ("mm/sparsemem: add helpers track active portions of a section at boot")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
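A minimal sketch of the read/write pairing described in steps (a)-(c);
this is simplified, and the rcu head in mem_section_usage is an
assumption of the sketch:

	/* Reader: sees either a valid ->usage or NULL, never freed memory. */
	static inline int pfn_section_valid(struct mem_section *ms,
					    unsigned long pfn)
	{
		int idx = subsection_map_index(pfn);
		struct mem_section_usage *usage = READ_ONCE(ms->usage);

		return usage ? test_bit(idx, usage->subsection_map) : 0;
	}

	/* Writer (section_deactivate(), simplified): invalidate the
	 * section first, then free ->usage only after a grace period. */
	ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
	usage = ms->usage;
	ms->usage = NULL;
	kfree_rcu(usage, rcu);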
2024-01-18  Merge tag 'v5.15.147' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.147 stable release
2024-01-15  mm: fix unmap_mapping_range high bits shift bug  (Jiajun Xie)
commit 9eab0421fa94a3dde0d1f7e36ab3294fc306c99d upstream.

The bug happens when the highest bit of holebegin is 1; suppose holebegin
is 0x8000000111111000.  After the shift, hba would be 0xfff8000000111111,
and vma_interval_tree_foreach would then fail the lookup or produce the
wrong result.

Erroneous call sequence, e.g.:

 - mmap(..., offset=0x8000000111111000)
   |- syscall(mmap, ... unsigned long, off):
      |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);

Here pgoff is correctly shifted to 0x8000000111111, but passing
0x8000000111111000 as holebegin to unmap would then cause a terrible
result, as shown below:

 - unmap_mapping_range(..., loff_t const holebegin)
   |- pgoff_t hba = holebegin >> PAGE_SHIFT;
      /* hba = 0xfff8000000111111 unexpectedly */

The issue happens in heterogeneous computing, where the device (e.g. gpu)
and host share the same virtual address space.

A simple workflow pattern which hits the issue is:

        /* host */
    1. userspace first mmaps a file-backed VA range with a specified
       offset, e.g. (offset=0x800..., mmap returns: va_a)
    2. write some data to the corresponding sys page,
       e.g. (va_a = 0xAABB)

        /* device */
    3. gpu workload touches VA, triggers a gpu fault and notifies the
       host.

        /* host */
    4. received gpu fault notification, then it will:
        4.1 unmap host pages and also take care of the cpu tlb
            (use unmap_mapping_range with offset=0x800...)
        4.2 migrate the sys page to the device
        4.3 set up the device page table and resolve the device fault.

        /* device */
    5. gpu workload continued, it accessed va_a and got 0xAABB.
    6. gpu workload continued, it wrote 0xBBCC to va_a.

        /* host */
    7. userspace accesses va_a; as expected, it will:
        7.1 trigger a cpu vm fault.
        7.2 driver handles the fault to migrate the gpu local page back
            to the host.
    8. userspace then could correctly get 0xBBCC from va_a
    9. done

But in step 4.1, if we hit the bug this patch mentions, then userspace
would never trigger the cpu fault, and would still get the old value:
0xAABB.

Making holebegin unsigned first fixes the bug.

Link: https://lkml.kernel.org/r/20231220052839.26970-1-jiajun.xie.sh@gmail.com
Signed-off-by: Jiajun Xie <jiajun.xie.sh@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
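A sketch of the signedness problem: loff_t is a signed 64-bit type, so on
the usual ABIs the right shift sign-extends when the top bit is set.

	loff_t holebegin = (loff_t)0x8000000111111000ULL;	/* top bit set */

	/* old: arithmetic shift of a negative value (on common compilers) */
	pgoff_t bad  = holebegin >> PAGE_SHIFT;		  /* 0xfff8000000111111 */

	/* fixed: convert to unsigned first, then shift */
	pgoff_t good = (unsigned long long)holebegin >> PAGE_SHIFT;
							  /* 0x8000000111111 */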
2024-01-15  mm/memory-failure: check the mapcount of the precise page  (Matthew Wilcox (Oracle))
[ Upstream commit c79c5a0a00a9457718056b588f312baadf44e471 ]

A process may map only some of the pages in a folio, and might be missed
if it maps the poisoned page but not the head page.  Or it might be
unnecessarily hit if it maps the head page, but not the poisoned page.

Link: https://lkml.kernel.org/r/20231218135837.3310403-3-willy@infradead.org
Fixes: 7af446a841a2 ("HWPOISON, hugetlb: enable error handling path for hugepage")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
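A one-line sketch of the kind of change described; the surrounding
function in mm/memory-failure.c is assumed, with 'p' the precise poisoned
page and 'hpage' its head page:

	/* Test the precise page, not the head page: only the mapcount of
	 * the poisoned subpage tells us whether unmapping is needed. */
	if (!page_mapped(p))	/* was: !page_mapped(hpage) */
		return true;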
2024-01-11  Merge tag 'v5.15.146' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.146 stable release
2024-01-05  mm/filemap: avoid buffered read/write race to read inconsistent data  (Baokun Li)
commit e2c27b803bb664748e090d99042ac128b3f88d92 upstream.

The following concurrency may cause the data read to be inconsistent with
the data on disk:

cpu1                                  cpu2
------------------------------------|------------------------------------
// Buffered write 2048 from 0
ext4_buffered_write_iter
 generic_perform_write
  copy_page_from_iter_atomic
  ext4_da_write_end
   ext4_da_do_write_end
    block_write_end
     __block_commit_write
      folio_mark_uptodate
       smp_wmb()
       set_bit(PG_uptodate, folio_flags)
    i_size_write // 2048
   unlock_page(page)

                                     // Buffered read 4096 from 0
                                     ext4_file_read_iter
                                      generic_file_read_iter
                                       filemap_read
                                        filemap_get_pages
                                         filemap_get_read_batch
                                         folio_test_uptodate(folio)
                                          ret = test_bit(PG_uptodate, folio_flags)
                                          if (ret)
                                           smp_rmb();
                                         // Ensure that the data in page
                                         // 0-2048 is up-to-date.

// New buffered write 2048 from 2048
ext4_buffered_write_iter
 generic_perform_write
  copy_page_from_iter_atomic
  ext4_da_write_end
   ext4_da_do_write_end
    block_write_end
     __block_commit_write
      folio_mark_uptodate
       smp_wmb()
       set_bit(PG_uptodate, folio_flags)
    i_size_write // 4096
   unlock_page(page)

                                        isize = i_size_read(inode) // 4096
                                        // Read the latest isize 4096, but
                                        // without smp_rmb(), there may be
                                        // Load-Load disorder resulting in
                                        // the data in the 2048-4096 range
                                        // in the page not being up-to-date.
                                        copy_page_to_iter
                                        // copyout 4096

In the concurrency above, we read the updated i_size, but there is no
read barrier to ensure that the data in the page is the same as the
i_size at this point, so we may copy the unsynchronized page out.  Hence
adding the missing read memory barrier to fix this.

This is a Load-Load reordering issue, which only occurs on some weak
mem-ordering architectures (e.g. ARM64, ALPHA), but not on strong
mem-ordering architectures (e.g. X86).  And theoretically the problem
doesn't only happen on ext4: filesystems that call filemap_read() but
don't hold the inode lock (e.g. btrfs, f2fs, ubifs ...) will have this
problem, while filesystems with the inode lock (e.g. xfs, nfs) won't.

Link: https://lkml.kernel.org/r/20231213062324.739009-1-libaokun1@huawei.com
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: yangerkun <yangerkun@huawei.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
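A sketch of the fix as described: pair a read barrier with the writer's
smp_wmb() before trusting page data past the previously seen i_size (the
exact placement inside filemap_read() is simplified here):

	isize = i_size_read(inode);
	if (unlikely(iocb->ki_pos >= isize))
		goto put_folios;
	/*
	 * Pairs with the smp_wmb() in the write path that orders page
	 * contents before the PG_uptodate/i_size updates: don't use
	 * possibly stale page data for the newly visible size.
	 */
	smp_rmb();
	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);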
2023-12-27  Merge tag 'v5.15.145' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.145 stable release
2023-12-27  Merge tag 'v5.15.144' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.144 stable release
2023-12-23  kasan: disable kasan_non_canonical_hook() for HW tags  (Arnd Bergmann)
commit 17c17567fe510857b18fe01b7a88027600e76ac6 upstream.

On arm64, building with CONFIG_KASAN_HW_TAGS now causes a compile-time
error:

mm/kasan/report.c: In function 'kasan_non_canonical_hook':
mm/kasan/report.c:637:20: error: 'KASAN_SHADOW_OFFSET' undeclared (first use in this function)
  637 |         if (addr < KASAN_SHADOW_OFFSET)
      |                    ^~~~~~~~~~~~~~~~~~~
mm/kasan/report.c:637:20: note: each undeclared identifier is reported only once for each function it appears in
mm/kasan/report.c:640:77: error: expected expression before ';' token
  640 |         orig_addr = (addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;

This was caused by removing the dependency on CONFIG_KASAN_INLINE that
used to prevent this from happening.  Use the more specific dependency
on KASAN_SW_TAGS || KASAN_GENERIC to only ignore the function for hwasan
mode.

Link: https://lkml.kernel.org/r/20231016200925.984439-1-arnd@kernel.org
Fixes: 12ec6a919b0f ("kasan: print the original fault addr when access invalid shadow")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Haibo Li <haibo.li@mediatek.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
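A sketch of the guard described above: only the software KASAN modes
define a shadow offset, so the hook is compiled out for HW tags (the
function body is abbreviated):

	#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
	void kasan_non_canonical_hook(unsigned long addr)
	{
		unsigned long orig_addr;

		if (addr < KASAN_SHADOW_OFFSET)
			return;
		/* Invert kasan_mem_to_shadow() to recover the probable
		 * original access address. */
		orig_addr = (addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
		/* ... print the probable faulting access ... */
	}
	#endif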
2023-12-20  memblock: allow to specify flags with memblock_add_node()  (David Hildenbrand)
[ Upstream commit 952eea9b01e4bbb7011329f1b7240844e61e5128 ]

We want to specify flags when hotplugging memory.  Let's prepare to pass
flags to memblock_add_node() by adjusting all existing users.

Note that when hotplugging memory the system is already up and running
and we might have concurrent memblock users: for example, while we're
hotplugging memory, kexec_file code might search for suitable memory
regions to place kexec images.  It's important to add the memory directly
to memblock via a single call with the right flags, instead of adding the
memory first and applying flags later: otherwise, concurrent memblock
users might temporarily stumble over memblocks with wrong flags, which
will be important in a follow-up patch that introduces a new flag to
properly handle add_memory_driver_managed().

Link: https://lkml.kernel.org/r/20211004093605.5830-4-david@redhat.com
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shahab Vahedi <shahab@synopsys.com> [arch/arc]
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jianyong Wu <Jianyong.Wu@arm.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stable-dep-of: c7206e7bd214 ("MIPS: Loongson64: Handle more memory types passed from firmware")
Signed-off-by: Sasha Levin <sashal@kernel.org>
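A sketch of the adjusted interface; existing callers pass MEMBLOCK_NONE
to keep their old behavior:

	/* New signature: flags are applied atomically with the add. */
	int memblock_add_node(phys_addr_t base, phys_addr_t size,
			      int nid, enum memblock_flags flags);

	/* before: memblock_add_node(base, size, nid); */
	memblock_add_node(base, size, nid, MEMBLOCK_NONE);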
2023-12-20  mm/memory_hotplug: handle memblock_add_node() failures in add_memory_resource()  (David Hildenbrand)
[ Upstream commit 53d38316ab2017a7c0d733765b521700aa357ec9 ]

Patch series "mm/memory_hotplug: full support for
add_memory_driver_managed() with CONFIG_ARCH_KEEP_MEMBLOCK", v2.

Architectures that require CONFIG_ARCH_KEEP_MEMBLOCK=y, such as arm64,
don't cleanly support add_memory_driver_managed() yet.  Most prominently,
kexec_file can still end up placing kexec images on such driver-managed
memory, resulting in undesired behavior, for example, having kexec images
located on memory not part of the firmware-provided memory map.

Teaching kexec to not place images on driver-managed memory is especially
relevant for virtio-mem.  Details can be found in commit 7b7b27214bba
("mm/memory_hotplug: introduce add_memory_driver_managed()").

Extend memblock with a new flag and set it from memory hotplug code when
applicable.  This is required to fully support virtio-mem on arm64,
making also kexec_file behave like on x86-64.

This patch (of 2):

If memblock_add_node() fails, we're most probably running out of memory.
While this is unlikely to happen, it can happen and having memory added
without a memblock can be problematic for architectures that use memblock
to detect valid memory.  Let's fail in a nice way instead of silently
ignoring the error.

Link: https://lkml.kernel.org/r/20211004093605.5830-1-david@redhat.com
Link: https://lkml.kernel.org/r/20211004093605.5830-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Jianyong Wu <Jianyong.Wu@arm.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Shahab Vahedi <shahab@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stable-dep-of: c7206e7bd214 ("MIPS: Loongson64: Handle more memory types passed from firmware")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-18  Merge tag 'v5.15.143' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.143 stable release
2023-12-13  mm: fix oops when filemap_map_pmd() without prealloc_pte  (Hugh Dickins)
commit 9aa1345d66b8132745ffb99b348b1492088da9e2 upstream.

syzbot reports an oops in lockdep's __lock_acquire(), called from
__pte_offset_map_lock() called from filemap_map_pages(); or when I run
the repro, the oops comes in pmd_install(), called from filemap_map_pmd()
called from filemap_map_pages(), just before the __pte_offset_map_lock().

The problem is that filemap_map_pmd() has been assuming that when it
finds pmd_none(), a page table has already been prepared in prealloc_pte;
and indeed do_fault_around() has been careful to preallocate one there,
when it finds pmd_none(): but what if *pmd became none in between?

My 6.6 mods in mm/khugepaged.c, avoiding mmap_lock for write, have made
it easy for *pmd to be cleared while servicing a page fault; but even
before those, a huge *pmd might be zapped while a fault is serviced.

The difference in symptomatic stack traces comes from the "memory model"
in use: pmd_install() uses pmd_populate() uses page_to_pfn(): in some
models that is strict, and will oops on the NULL prealloc_pte; in other
models, it will construct a bogus value to be populated into *pmd, then
__pte_offset_map_lock() oopses when trying to access the split ptlock
pointer (or some other symptom in the normal case of ptlock embedded not
pointer).

Link: https://lore.kernel.org/linux-mm/20231115065506.19780-1-jose.pekkarinen@foxhound.fi/
Link: https://lkml.kernel.org/r/6ed0c50c-78ef-0719-b3c5-60c0c010431c@google.com
Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-and-tested-by: syzbot+89edd67979b52675ddec@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/0000000000005e44550608a0806c@google.com/
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: José Pekkarinen <jose.pekkarinen@foxhound.fi>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org> [5.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
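A sketch of the fix as described: only install a page table when one was
actually preallocated (simplified from the filemap_map_pmd() context):

	/* *pmd may have become none after do_fault_around() preallocated,
	 * or the preallocation may never have happened for this path:
	 * never hand pmd_install() a NULL prealloc_pte. */
	if (pmd_none(*vmf->pmd) && vmf->prealloc_pte)
		pmd_install(mm, vmf->pmd, &vmf->prealloc_pte);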
2023-12-03  Merge tag 'v5.15.140' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.140 stable release
2023-12-03  Merge tag 'v5.15.139' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.139 stable release
2023-11-28  mm: kmem: drop __GFP_NOFAIL when allocating objcg vectors  (Roman Gushchin)
commit 24948e3b7b12e0031a6edb4f49bbb9fb2ad1e4e9 upstream.

Objcg vectors attached to slab pages to store slab object ownership
information are allocated using gfp flags for the original slab
allocation.  Depending on slab page order and the size of slab objects,
an objcg vector can take several pages.

If the original allocation was done with the __GFP_NOFAIL flag, it
triggered a warning in the page allocation code.  Indeed, order > 1 pages
should not be allocated with the __GFP_NOFAIL flag.

Fix this by simply dropping the __GFP_NOFAIL flag when allocating the
objcg vector.  It effectively allows to skip the accounting of a single
slab object under a heavy memory pressure.

An alternative would be to implement the mechanism to fallback to order-0
allocations for accounting metadata, which is also not perfect because it
will increase performance penalty and memory footprint of the kernel
memory accounting under memory pressure.

Link: https://lkml.kernel.org/r/ZUp8ZFGxwmCx4ZFr@P9FQF9L96D.corp.robot.car
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reported-by: Christoph Lameter <cl@linux.com>
Closes: https://lkml.kernel.org/r/6b42243e-f197-600a-5d22-56bd728a5ad8@gentwo.org
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
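A one-line sketch of the change; the surrounding allocation in the objcg
vector setup is simplified and the mask name follows the kernel's
existing OBJCGS_CLEAR_MASK convention:

	/* The vector may be a high-order allocation: never force it to
	 * succeed, just skip accounting under heavy memory pressure. */
	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *),
			   gfp & ~(OBJCGS_CLEAR_MASK | __GFP_NOFAIL), node);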
2023-11-28  mm/memory_hotplug: use pfn math in place of direct struct page manipulation  (Zi Yan)
commit 1640a0ef80f6d572725f5b0330038c18e98ea168 upstream.

When dealing with hugetlb pages, manipulating struct page pointers
directly can get to a wrong struct page, since struct page is not
guaranteed to be contiguous on SPARSEMEM without VMEMMAP.  Use pfn
calculation to handle it properly.

Without the fix, a wrong number of pages might be skipped.  Since skip
cannot be negative, scan_movable_page() will end early and might miss a
movable page with -ENOENT.  This might fail offline_pages().  No bug is
reported.  The fix comes from code inspection.

Link: https://lkml.kernel.org/r/20230913201248.452081-4-zi.yan@sent.com
Fixes: eeb0efd071d8 ("mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
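A sketch of the idea (simplified from the scan_movable_pages() context):
advance by pfn arithmetic, since 'page + n' may walk off the section's
memmap on SPARSEMEM without VMEMMAP.

	struct page *head = compound_head(page);

	/* before: page pointer arithmetic, wrong across section
	 * boundaries; after: compute the skip in pfn space. */
	pfn = page_to_pfn(head) + compound_nr(head) - 1;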
2023-11-28  mm/cma: use nth_page() in place of direct struct page manipulation  (Zi Yan)
commit 2e7cfe5cd5b6b0b98abf57a3074885979e187c1c upstream.

Patch series "Use nth_page() in place of direct struct page
manipulation", v3.

On SPARSEMEM without VMEMMAP, struct page is not guaranteed to be
contiguous, since each memory section's memmap might be allocated
independently.  hugetlb pages can go beyond a memory section size, thus
direct struct page manipulation on hugetlb pages/subpages might give
wrong struct page.  Kernel provides nth_page() to do the manipulation
properly.  Use that whenever code can see hugetlb pages.

This patch (of 5):

When dealing with hugetlb pages, manipulating struct page pointers
directly can get to wrong struct page, since struct page is not
guaranteed to be contiguous on SPARSEMEM without VMEMMAP.  Use nth_page()
to handle it properly.

Without the fix, page_kasan_tag_reset() could reset wrong page tags,
causing a wrong kasan result.  No related bug is reported.  The fix comes
from code inspection.

Link: https://lkml.kernel.org/r/20230913201248.452081-1-zi.yan@sent.com
Link: https://lkml.kernel.org/r/20230913201248.452081-2-zi.yan@sent.com
Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
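A sketch of the substitution (simplified from the mm/cma.c loop):
nth_page() resolves through the pfn, so it stays correct across section
boundaries where plain pointer arithmetic does not.

	for (i = 0; i < count; i++)
		page_kasan_tag_reset(nth_page(page, i));
		/* instead of: page_kasan_tag_reset(page + i); */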
2023-11-20  vfs: fix readahead(2) on block devices  (Reuben Hawkins)
[ Upstream commit 7116c0af4b8414b2f19fdb366eea213cbd9d91c2 ]

Readahead was factored to call generic_fadvise.  That refactor added an
S_ISREG restriction which broke readahead on block devices.

In addition to S_ISREG, this change checks S_ISBLK to fix block device
readahead.  There is no change in behavior with any file type besides
block devices in this change.

Fixes: 3d8f7615319b ("vfs: implement readahead(2) using POSIX_FADV_WILLNEED")
Signed-off-by: Reuben Hawkins <reubenhwk@gmail.com>
Link: https://lore.kernel.org/r/20231003015704.2415-1-reubenhwk@gmail.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
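A sketch of the described check (simplified from the readahead(2) entry
path): allow regular files and block devices, reject everything else.

	/* readahead only makes sense for objects backed by a page cache
	 * that supports it: regular files and block devices. */
	if (!S_ISREG(file_inode(file)->i_mode) &&
	    !S_ISBLK(file_inode(file)->i_mode))
		return -EINVAL;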
2023-11-17  Merge tag 'v5.15.138' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.138 stable release
2023-11-08  kasan: print the original fault addr when access invalid shadow  (Haibo Li)
commit babddbfb7d7d70ae7f10fedd75a45d8ad75fdddf upstream.

When the checked address is illegal, the corresponding shadow address
from kasan_mem_to_shadow may have no mapping in the mmu table.  Accessing
such a shadow address causes a kernel oops.

Here is a sample oops on arm64 (VA 39bit) with KASAN_SW_TAGS and
KASAN_OUTLINE on:

[ffffffb80aaaaaaa] pgd=000000005d3ce003, p4d=000000005d3ce003, pud=000000005d3ce003, pmd=0000000000000000
Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 100 Comm: sh Not tainted 6.6.0-rc1-dirty #43
Hardware name: linux,dummy-virt (DT)
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __hwasan_load8_noabort+0x5c/0x90
lr : do_ib_ob+0xf4/0x110

ffffffb80aaaaaaa is the shadow address for efffff80aaaaaaaa.

The problem is reading an invalid shadow in kasan_check_range.  The
generic kasan also has a similar oops.  It only reports the shadow
address which causes the oops, but not the original address.

Commit 2f004eea0fc8 ("x86/kasan: Print original address on #GP")
introduced kasan_non_canonical_hook but limited it to KASAN_INLINE.
This patch extends it to KASAN_OUTLINE mode.

Link: https://lkml.kernel.org/r/20231009073748.159228-1-haibo.li@mediatek.com
Fixes: 2f004eea0fc8 ("x86/kasan: Print original address on #GP")
Signed-off-by: Haibo Li <haibo.li@mediatek.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Haibo Li <haibo.li@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08  mm/migrate: fix do_pages_move for compat pointers  (Gregory Price)
commit 229e2253766c7cdfe024f1fe280020cc4711087c upstream. do_pages_move() does not correctly handle compat pointers for the page list. Add an in_compat_syscall() check and the appropriate get_user() fetch when iterating the page list. This makes the syscall in compat mode (32-bit userspace, 64-bit kernel) work the same way as the native 32-bit syscall again, restoring the behavior before my broken commit 5b1b561ba73c ("mm: simplify compat_sys_move_pages"). More specifically, my patch moved the parsing of the 'pages' array from the main entry point into do_pages_stat(), which left the syscall working correctly for the 'stat' operation (nodes = NULL), while the 'move' operation (nodes != NULL) is now missing the conversion and interprets 'pages' as an array of 64-bit pointers instead of the intended 32-bit userspace pointers. It is possible that nobody noticed this bug because the few applications that actually call move_pages are unlikely to run in compat mode because of their large memory requirements, but this clearly fixes a user-visible regression and should have been caught by LTP. Link: https://lkml.kernel.org/r/20231003144857.752952-1-gregory.price@memverge.com Fixes: 5b1b561ba73c ("mm: simplify compat_sys_move_pages") Signed-off-by: Gregory Price <gregory.price@memverge.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Co-developed-by: Arnd Bergmann <arnd@arndb.de> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
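The shape of the fix is the usual compat-pointer fetch: in compat mode each slot of the 'pages' array is a 32-bit userspace pointer that must be widened via compat_ptr(). A sketch of the per-entry fetch inside the do_pages_move() loop, per the commit above (simplified):

<--- C sketch --->
	const void __user *p;

	if (in_compat_syscall()) {
		compat_uptr_t cp;

		/* each array slot is only 4 bytes wide for a compat task */
		if (get_user(cp, (compat_uptr_t __user *)pages + i))
			return -EFAULT;
		p = compat_ptr(cp);
	} else {
		if (get_user(p, pages + i))
			return -EFAULT;
	}
<--- C sketch --->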
2023-11-08  mm/page_alloc: correct start page when guard page debug is enabled  (Kemeng Shi)
commit 61e21cf2d2c3cc5e60e8d0a62a77e250fccda62c upstream. When guard page debug is enabled and set_page_guard returns success, we fail to advance 'page' to the start of the next split range, so we unexpectedly split in a page range that does not contain the target page. Move the start page update before set_page_guard to fix this. Because we split at the wrong target page, the split pages cannot merge back to the original order when the target page is put back, and the split pages other than the target page are unusable. To be specific, consider a target page that is the third page in a buddy page of order 2:

| buddy-2 | Page | Target | Page |

After breaking down to the target page, we only set the first page to Guard because of the bug:

| Guard | Page | Target | Page |

When we try put_page_back_buddy with the target page, the buddy of the target is neither guard nor buddy, so the original order-2 page cannot be reconstructed:

| Guard | Page | buddy-0 | Page |

All pages except the target page are neither in the free list nor usable. Link: https://lkml.kernel.org/r/20230927094401.68205-1-shikemeng@huaweicloud.com Fixes: 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
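In other words, the split loop must advance its cursor to the next half before deciding whether the current buddy becomes a guard page, not after. A hedged sketch of the fixed loop, modeled on break_down_buddy_pages() in mm/page_alloc.c (details elided; the exact 5.15 code may differ slightly):

<--- C sketch --->
	while (high > low) {
		high--;
		size >>= 1;

		if (target >= &page[size]) {
			next_page = page + size;	/* target is in the upper half */
			current_buddy = page;
		} else {
			next_page = page;		/* target is in the lower half */
			current_buddy = page + size;
		}
		page = next_page;	/* the fix: advance BEFORE the guard check */

		if (set_page_guard(zone, current_buddy, high, migratetype))
			continue;	/* guard debug consumed this half */

		add_to_free_list(current_buddy, zone, high, migratetype);
		set_buddy_order(current_buddy, high);
	}
<--- C sketch --->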
2023-09-25  Merge tag 'v5.15.132' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.132 stable release # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmUJdpAACgkQONu9yGCS # aT7LKA//TbjfOj1RRRPE916bAbXiUwXoDDOFaAnUj8+QRLAxDB6g8U2uAMRdPwrE # ChCFkRfma3u1hUloRp4w+IVxDNpyeTYDkK7VK5P0GSX+CUJj8ZtVCGMIYcyzdK18 # UHff2rCQVhkfzfXPxUVYws2JEtFqxeO1VsNJEVFLhMJ1NHePLyrMFyAQNLrLlk8K # mxHjjpNdImSdgh8agAgioUaq+RvrWt2X0CTL8NC3HAU4PwMuDjTiB2YFD3PcQloS # Pszqw1oenTQG9PwuwtnWJyn2U0RkD+IkEXj99ED/ocs73aHOmQ31jjcDXcz3gNJ5 # dZVktqD7y1tAQlivvsiwgumeJWxBQ9u5bEf1i8bAYfjelT6TyNuhk+JDWGRBYetd # fOddhoNHw7KFvB8RKNSW/R+gt6RaeQZB8JN+9qF6vlit/uSP3wC0klKV56gKhXY9 # DMQ9j/FCLHrxOo5vgvMu5LTXJOyn/hgdQ9kYVT7Yz4Y2JDuFR6pE4xzuVsxIhnyX # TIzp8ywsAKDl2d2OZCzp5S9YXxkVDBj0xJIxFSjyq9JPW9iVh18AEsIgkvwBjh/P # 5okd3AIw+zU45dHDDsnePslFxl90La5cACuwEJzGsGuDYomdiUeqSCkB/5zcAWTn # nra2BuxEI/DVHOifygJ4rZA9IBxIUoPrAbIPHR1Knjll+lfSVGY= # =blAW # -----END PGP SIGNATURE----- # gpg: Signature made Tue 19 Sep 2023 06:23:12 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2023-09-19  mm/vmalloc: add a safer version of find_vm_area() for debug  (Joel Fernandes (Google))
commit 0818e739b5c061b0251c30152380600fb9b84c0c upstream. It is unsafe to dump vmalloc area information from some contexts. Add a safer trylock version of the same function that does a best-effort vmap area lookup, and use it from vmalloc_dump_obj(). [applied test robot feedback on unused function fix.] [applied Uladzislau feedback on locking.] Link: https://lkml.kernel.org/r/20230904180806.1002832-1-joel@joelfernandes.org Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory") Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Zqiang <qiang.zhang1211@gmail.com> Cc: <stable@vger.kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
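A condensed sketch of what the trylock-driven path looks like in vmalloc_dump_obj(), per the commit above (field names as in 5.15 mm/vmalloc.c; error paths trimmed):

<--- C sketch --->
bool vmalloc_dump_obj(void *object)
{
	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
	struct vmap_area *va;
	unsigned int nr_pages;
	unsigned long addr;
	const void *caller;

	/* Best effort only: never spin, we may be in a bad context. */
	if (!spin_trylock(&vmap_area_lock))
		return false;

	va = __find_vmap_area((unsigned long)objp);
	if (!va || !va->vm) {
		spin_unlock(&vmap_area_lock);
		return false;
	}

	/* Copy what we need, then drop the lock before printing. */
	nr_pages = va->vm->nr_pages;
	addr = (unsigned long)va->vm->addr;
	caller = va->vm->caller;
	spin_unlock(&vmap_area_lock);

	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
		nr_pages, addr, caller);
	return true;
}
<--- C sketch --->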
2023-09-19  rcu: dump vmalloc memory info safely  (Zqiang)
commit c83ad36a18c02c0f51280b50272327807916987f upstream. Currently, a double invocation of call_rcu() dumps memory info for the rcu_head object. If the object was not allocated from the slab allocator, vmalloc_dump_obj() is invoked, which needs to hold the vmap_area_lock spinlock. Since call_rcu() can be invoked in interrupt context, this opens up spinlock deadlock scenarios. On a Preempt-RT kernel, the rcutorture test also triggers the following lockdep warning:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
3 locks held by swapper/0/1:
#0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
#1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
#2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
irq event stamp: 565512
hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
Preemption disabled at:
[<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xb0
dump_stack+0x14/0x20
__might_resched+0x1aa/0x280
? __pfx_rcu_torture_err_cb+0x10/0x10
rt_spin_lock+0x53/0x130
? find_vmap_area+0x1f/0x70
find_vmap_area+0x1f/0x70
vmalloc_dump_obj+0x20/0x60
mem_dump_obj+0x22/0x90
__call_rcu_common+0x5bf/0x940
? debug_smp_processor_id+0x1b/0x30
call_rcu_hurry+0x14/0x20
rcu_torture_init+0x1f82/0x2370
? __pfx_rcu_torture_leak_cb+0x10/0x10
? __pfx_rcu_torture_leak_cb+0x10/0x10
? __pfx_rcu_torture_init+0x10/0x10
do_one_initcall+0x6c/0x300
? debug_smp_processor_id+0x1b/0x30
kernel_init_freeable+0x2b9/0x540
? __pfx_kernel_init+0x10/0x10
kernel_init+0x1f/0x150
ret_from_fork+0x40/0x50
? __pfx_kernel_init+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>

The previous patch fixes this by using the deadlock-safe best-effort version of find_vm_area. However, in case of failure, print the fact that the pointer was a vmalloc pointer so that we print at least something. Link: https://lkml.kernel.org/r/20230904180806.1002832-2-joel@joelfernandes.org Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory") Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
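The caller-side half keeps mem_dump_obj() informative even when the trylock fails: if the pointer is in the vmalloc range but no details could be fetched, say so instead of printing nothing. A sketch of the mm/util.c shape (the message text here is illustrative, not the exact upstream string):

<--- C sketch --->
	if (is_vmalloc_addr(object)) {
		if (!vmalloc_dump_obj(object))
			pr_cont(" vmalloc allocated object\n");	/* illustrative message */
		return;
	}
<--- C sketch --->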
2023-09-19  net-memcg: Fix scope of sockmem pressure indicators  (Abel Wu)
[ Upstream commit ac8a52962164a50e693fa021d3564d7745b83a7f ] There are currently two indicators of socket memory pressure in struct mem_cgroup: socket_pressure and tcpmem_pressure, indicating memory reclaim pressure in memcg->memory and ->tcpmem respectively. In legacy mode (cgroup v1), socket memory is charged into ->tcpmem, which is independent of ->memory, so socket_pressure has nothing to do with socket pressure at all. Taking socket_pressure into consideration in legacy mode could even make things worse, as pressure in ->memory could then lead to premature reclamation/throttling of sockets. In the default mode (cgroup v2), socket memory is charged into ->memory, and ->tcpmem/->tcpmem_pressure are simply not used. So socket_pressure and tcpmem_pressure are only meaningful in default and legacy mode respectively for indicating socket memory pressure. This patch fixes the pieces of code that make mixed use of both. Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure") Signed-off-by: Abel Wu <wuyun.abel@bytedance.com> Acked-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
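Concretely, a consumer of these indicators should pick exactly one of them depending on the cgroup mode. A hedged sketch of mem_cgroup_under_socket_pressure() after the fix, modeled on the upstream change (field types as in 5.15, where socket_pressure is a jiffies deadline):

<--- C sketch --->
static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
{
	/* cgroup v1: socket memory lives in ->tcpmem, so use only its flag */
	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
		return !!memcg->tcpmem_pressure;

	/* cgroup v2: socket memory lives in ->memory, use the vmpressure window */
	do {
		if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
			return true;
	} while ((memcg = parent_mem_cgroup(memcg)));

	return false;
}
<--- C sketch --->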
2023-09-19  tmpfs: verify {g,u}id mount options correctly  (Christian Brauner)
[ Upstream commit 0200679fc7953177941e41c2a4241d0b6c2c5de8 ] A while ago we received the following report: "The other outstanding issue I noticed comes from the fact that fsconfig syscalls may occur in a different userns than that which called fsopen. That means that resolving the uid/gid via current_user_ns() can save a kuid that isn't mapped in the associated namespace when the filesystem is finally mounted. This means that it is possible for an unprivileged user to create files owned by any group in a tmpfs mount (since we can set the SUID bit on the tmpfs directory), or a tmpfs that is owned by any user, including the root group/user." The contract for {g,u}id mount options, and for {g,u}id values set from userspace in general, has always been that they are translated according to the caller's idmapping. In that respect, tmpfs has been doing the correct thing. But since tmpfs is mountable in unprivileged contexts, it is also necessary to verify that the resulting {k,g}uid is representable in the namespace of the superblock to avoid bugs such as the above. The new mount API's cross-namespace delegation abilities are already widely used. After having talked to a bunch of userspace developers, this is the most faithful solution with minimal regression risk. I know of one user - systemd - that makes use of the new mount API in this way, and it doesn't set unresolvable {g,u}ids. So the regression risk is minimal. Link: https://lore.kernel.org/lkml/CALxfFW4BXhEwxR0Q5LSkg-8Vb4r2MONKCcUCVioehXQKr35eHg@mail.gmail.com Fixes: f32356261d44 ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API") Reviewed-by: "Seth Forshee (DigitalOcean)" <sforshee@kernel.org> Reported-by: Seth Jenkins <sethjenkins@google.com> Message-Id: <20230801-vfs-fs_context-uidgid-v1-1-daf46a050bbf@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
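The shape of the check is small: translate with the caller's idmapping as before, then additionally require that the result is mapped in the fs_context's namespace before accepting the option. A sketch of the uid case in shmem_parse_one(), per the commit above (the gid case is symmetric):

<--- C sketch --->
	case Opt_uid:
		kuid = make_kuid(current_user_ns(), result.uint_32);
		if (!uid_valid(kuid))
			goto bad_value;

		/*
		 * The requested uid must be representable in the namespace
		 * of the superblock that will be mounted; otherwise an
		 * unprivileged mounter could smuggle in files owned by an
		 * unmapped id.
		 */
		if (!kuid_has_mapping(fc->user_ns, kuid))
			goto bad_value;

		ctx->uid = kuid;
		break;
<--- C sketch --->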
2023-09-04  Merge tag 'v5.15.129' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.129 stable release # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmTvT7YACgkQONu9yGCS # aT65pQ/+Ko+cSATQarGOACdQ6dEs1JwPnHkfnvroQedYVo/U2kmtjMCb84BNM7GQ # u68xlJAG/SEhF5775iE5OT72dK/UdyNZtydGvNReJrdJ1wL7MAtaDzLEqr4W5RB8 # buRtcVofW4bi+kvAhGWnU7D0qT/ZdHYX64GKfAqf5/fR/p3ZVxB66VPI9VX5LKTa # dE85Y7rLEApDlDTm/frhAiUijXYS27tJKuyqyWG/gRlCiZA6jeoaU90l5iU7jkPz # LWwcRQ35PMNfpxSyOZdCx11a/zlHR+mw0l5+wxF+u6toWLH+kckWwWMoIKr/Il0H # DZNPYMDn27uNKG+DHndY28/bwj7mAHgAtuTmHpdGfJa4/9zJRN6KDJQrtL+7O9wF # eZTzMV7RpEZzVTyueUhOExaAy1zlap2QW5f8MLOQFGMrxD8oQYEprMkp2jcv4rst # cVO2ayALcyrzWc9FdYwq4XpqL9rdG0LxqM6YQpCz77vBWlSpOseyyqfBX79r6eKk # S2oNFYLgHBJDjOrsKoaHOmpej42/x4B2yktCmwzQdkNCtyBs/lqWfD+xr3yyiy2J # h2p1+kzEpghHdIjqEG0sMW2NMSimScsCZzGoqYKLJ/LZRfC2BaFgkqxWpZtzku8K # Ch8ITZSxEHHKCH+sF/LofhT4nJEGwUt0vIIZcHvtDxaFb32yris= # =ruaG # -----END PGP SIGNATURE----- # gpg: Signature made Wed 30 Aug 2023 10:18:30 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2023-08-30  mm: memory-failure: fix unexpected return value in soft_offline_page()  (Miaohe Lin)
[ Upstream commit e2c1ab070fdc81010ec44634838d24fce9ff9e53 ] When page_handle_poison() fails to handle the hugepage or free page in the retry path, soft_offline_page() returns 0 while -EBUSY is expected in this case. Consequently, the user will think soft_offline_page() succeeded when it in fact failed, and so will not try again later. Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
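The fix is confined to the retry bookkeeping: when page_handle_poison() fails and the single retry has already been used, report -EBUSY instead of falling through with 0. A hedged sketch of the tail of soft_offline_page(), simplified from the upstream diff:

<--- C sketch --->
	if (!page_handle_poison(page, true, false)) {
		if (try_again) {
			try_again = false;
			flags &= ~MF_COUNT_INCREASED;
			goto retry;	/* one more attempt */
		}
		ret = -EBUSY;	/* the fix: don't report success */
	}
	return ret;
<--- C sketch --->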
2023-08-30  mm: memory-failure: kill soft_offline_free_page()  (Kefeng Wang)
[ Upstream commit 7adb45887c8af88985c335b53d253654e9d2dd16 ] Open-code page_handle_poison() into soft_offline_page() and kill the now-unneeded soft_offline_free_page(). Link: https://lkml.kernel.org/r/20220819033402.156519-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: e2c1ab070fdc ("mm: memory-failure: fix unexpected return value in soft_offline_page()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30  mm: add a call to flush_cache_vmap() in vmap_pfn()  (Alexandre Ghiti)
commit a50420c79731fc5cf27ad43719c1091e842a2606 upstream. flush_cache_vmap() must be called after new vmalloc mappings are installed in the page table, in order to allow architectures to make sure the new mapping is visible. Otherwise it could lead to a panic, since on some architectures (like powerpc) the page table walker could see the wrong pte value and trigger a spurious page fault that cannot be resolved (see commit f1cb8f9beba8 ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags")). But actually the patch is aimed at riscv: the riscv specification allows the caching of invalid entries in the TLB, and since we recently removed the vmalloc page fault handling, we now need to emit a TLB shootdown whenever a new vmalloc mapping is emitted (https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosinc.com/). That's a temporary solution; there are ways to avoid that :) Link: https://lkml.kernel.org/r/20230809164633.1556126-1-alexghiti@rivosinc.com Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function") Reported-by: Dylan Jhong <dylan@andestech.com> Closes: https://lore.kernel.org/linux-riscv/ZMytNY2J8iyjbPPy@atctrx.andestech.com/ Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Dylan Jhong <dylan@andestech.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
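A condensed sketch of vmap_pfn() with the added flush, per the commit above (the apply_to_page_range() plumbing is abbreviated from mainline):

<--- C sketch --->
void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot)
{
	struct vmap_pfn_data data = { .pfns = pfns, .prot = pgprot_nx(prot) };
	struct vm_struct *area;

	area = get_vm_area_caller(count * PAGE_SIZE, VM_IOREMAP,
			__builtin_return_address(0));
	if (!area)
		return NULL;
	if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
			count * PAGE_SIZE, vmap_pfn_apply, &data)) {
		free_vm_area(area);
		return NULL;
	}

	/* The fix: make the new mapping visible before handing it out. */
	flush_cache_vmap((unsigned long)area->addr,
			 (unsigned long)area->addr + count * PAGE_SIZE);

	return area->addr;
}
<--- C sketch --->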
2023-07-26  Merge tag 'v5.15.121' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.121 stable release # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmS9E3sACgkQONu9yGCS # aT573w//WOG9AJV335EzZh9gjjxGg6nbJZZzLAbvC7XgY+ECVPfajaDAMMP7nGM5 # s2CQsbVFp22Rm73r8DVZmGJqAZzATSO1b09yUQZKcN/wWYqQYVzqqSp+vJfTY2zQ # TcNsd2+8AFAZm2e5GBS3HFCBc+I3VqOVyuDBNvf5T+EGZNoJ8mNhVBkpnWudAPwc # ALapsdAov0iv7Rv2pMcroSIKGk/VhERsbzEUV4xRvPH2UqmVVMASrZwWM4DKs1+t # GuePKKloR60Tm+e6ZvVCjdXJlLcgRd4+o9RY9TCdKonsa9xuv0l3p5FwpjhdGIVc # tw6LtiafMhQ2WWybvusMnSaNGhLJuPg2FM95PfUtarg0COflljrssz0mW+zGJpwP # P1f5iKUFZmGQ3Is9ddrO4JoQQokgxDQ/ojhmoRv3tjnG+gMPadQi8wBuasipVt3u # ho2Y7+U6wKfFOPIcUFS4qPMrjCvw28OdAIG6aF5vdd3SIiKejFuO/rMBCx9tHuLJ # x6wPua6Xtmt0WSFlt8J8mcvnQOSj8gK4EIDfJlcXvNsmA0oHvOv+uEHSuxfxZaIq # c23EPGjG+YXmPBJPmdV8WFzVaQc3xYgsg3gMVj99Zx0yTmT2viZmsJkWI/Uz/IKc # G1HEJbuNniuyL4l4nZnNDr44UbA/i38IRy8+Ol8ATToGvDLSUvs= # =XbDI # -----END PGP SIGNATURE----- # gpg: Signature made Sun 23 Jul 2023 07:48:11 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2023-07-23  mm/damon/ops-common: atomically test and clear young on ptes and pmds  (Ryan Roberts)
commit c11d34fa139e4b0fb4249a30f37b178353533fa1 upstream. It is racy to non-atomically read a pte, then clear the young bit, then write it back, as this could discard dirty information. Further, it is bad practice to directly set a pte entry within a table. Instead, clearing young must go through the arch-provided helper, ptep_test_and_clear_young(), to ensure it is modified atomically and to give the arch code visibility and allow it to check (and potentially modify) the operation. Link: https://lkml.kernel.org/r/20230602092949.545577-3-ryan.roberts@arm.com Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
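The before/after shape of the pte case, as a hedged sketch against the generic helpers (DAMON's mkold path operates on a mapped pte here; the pmd variant is analogous):

<--- C sketch --->
	int young;

	/* Before (racy): a concurrent hardware update of the dirty bit
	 * between the read and the write can be silently discarded.
	 */
	if (pte_young(*pte))
		*pte = pte_mkold(*pte);

	/* After (fixed): the arch helper clears young atomically and
	 * returns the old value, so nothing concurrent is lost.
	 */
	young = ptep_test_and_clear_young(vma, addr, pte);
<--- C sketch --->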
2023-07-23  shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs  (Roberto Sassu)
commit 36ce9d76b0a93bae799e27e4f5ac35478c676592 upstream. As the ramfs-based tmpfs uses ramfs_init_fs_context() for the init_fs_context method, which allocates fc->s_fs_info, use ramfs_kill_sb() to free it and avoid a memory leak. Link: https://lkml.kernel.org/r/20230607161523.2876433-1-roberto.sassu@huaweicloud.com Fixes: c3b1b1cbf002 ("ramfs: add support for "mode=" mount option") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
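For reference, ramfs_kill_sb() is exactly the pairing needed: it frees the per-superblock info that ramfs_init_fs_context() allocated before tearing the superblock down. A condensed sketch of the relevant pieces (the !CONFIG_SHMEM, ramfs-backed tmpfs type; other struct fields omitted):

<--- C sketch --->
void ramfs_kill_sb(struct super_block *sb)
{
	kfree(sb->s_fs_info);	/* fs_info allocated in ramfs_init_fs_context() */
	kill_litter_super(sb);
}

static struct file_system_type shmem_fs_type = {
	.name		 = "tmpfs",
	.init_fs_context = ramfs_init_fs_context,
	.kill_sb	 = ramfs_kill_sb,  /* was kill_litter_super: leaked s_fs_info */
	.fs_flags	 = FS_USERNS_MOUNT,
};
<--- C sketch --->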
2023-07-06  Merge tag 'v5.15.120' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.120 stable release # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmSlp3wACgkQONu9yGCS # aT4HzRAAzu6rQcCd28wOwWwn2wziG3UjMpDc9YCYxAGI7LOAry8JvdNSP12PSgAE # MzB8VGAEwzNVjmbt9wbMtEJDt+cgw6TS1+ObQM21Yc4dVXjZOOHhN/idiKKlS9H+ # MVKg37Nln566G/HdBeJA0ORcwTDxLuhvdVRyGzbEPJBl4PSnm3wz4/oo2y+j9v1i # t/w6nk6Sy5Ku3IZkIZss/dGzCh/R5RgW33XKsJ4dryl2rvlohvMQL/2T1ztys6++ # MGw2yboU5lt1aHY3GeR4X4OHXQ3Ii8NbH2ouWDP+Le97br2CjDwlkRwaOOffayyh # bYkBRGj8ovCksDTFnNiWx2KRIUO4htZ9PykqiKpAvv1Ali8DKsVezsvYAObw4hpu # iQtwAtG4bGcM3gm/qMafaR97Icz8BgYFCVcZx9WdA1mKmTHobSvqVBeFknmMgxed # YgnrmsUE2vBa2tVptoWb0fezhQYPd9DBZsMoodec97hYBE2miRCJxcsvyJ4rThh7 # h/MI13eJGVKiIEBGCIevo1T0vPu8ciSTCTc4lWJ3fvD6GAbBcaTdYdN2FZiIR/9g # AGV115873anDjjMX/FcD65urMIZOU69GR5iPA/XMOyfAzF8qr33reZO6rD6d3VnN # K7PwgTuUAWXDu5hGcwO47cRAbyNyeiOgv8ssjzj6JRFY9mIi9HE= # =eK94 # -----END PGP SIGNATURE----- # gpg: Signature made Wed 05 Jul 2023 01:25:16 PM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2023-07-05  mm, hwpoison: when copy-on-write hits poison, take page offline  (Jane Chu)
From: Tony Luck <tony.luck@intel.com> commit d302c2398ba269e788a4f37ae57c07a7fcabaa42 upstream. We cannot call memory_failure() directly from the fault handler because mmap_lock (and others) are held. It is important, but not urgent, to mark the source page as h/w poisoned and unmap it from other tasks. Use memory_failure_queue() to request a call to memory_failure() for the page with the error. Also provide a stub version for CONFIG_MEMORY_FAILURE=n. Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20221021200120.175753-3-tony.luck@intel.com Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Due to missing commits e591ef7d96d6e ("mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage") 5033091de814a ("mm/hwpoison: introduce per-memory_block hwpoison counter") The impact of e591ef7d96d6e is its introduction of an additional flag in __get_huge_page_for_hwpoison() that serves as an indication a hwpoisoned hugetlb page should have its migratable bit cleared. The impact of 5033091de814a is contextual. Resolve by ignoring both missing commits. - jane] Signed-off-by: Jane Chu <jane.chu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
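The deferral is a single call at the point where the copy reports poison. A hedged sketch of the copy step in the wp_page_copy() path after this patch (helper name per this series; the locking constraint is as described above):

<--- C sketch --->
	if (copy_user_highpage_mc(dst, src, addr, vma)) {
		/*
		 * memory_failure() can't run here: mmap_lock is held.
		 * Queue it so a worker takes the poisoned page offline.
		 */
		memory_failure_queue(page_to_pfn(src), 0);
		return -EHWPOISON;
	}
<--- C sketch --->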
2023-07-05  mm, hwpoison: try to recover from copy-on write faults  (Tony Luck)
commit a873dfe1032a132bf89f9e19a6ac44f5a0b78754 upstream. Patch series "Copy-on-write poison recovery", v3. Part 1 deals with the process that triggered the copy-on-write fault with a store to a shared read-only page. That process is sent a SIGBUS with the usual machine check decoration to specify the virtual address of the lost page, together with the scope. Part 2 sets up to asynchronously take the page with the uncorrected error offline to prevent additional machine check faults. H/t to Miaohe Lin <linmiaohe@huawei.com> and Shuai Xue <xueshuai@linux.alibaba.com> for pointing me to the existing function to queue a call to memory_failure(). On x86 there is some duplicate reporting (because the error is also signalled by the memory controller as well as by the core that triggered the machine check). This patch (of 2): If the kernel is copying a page as the result of a copy-on-write fault and runs into an uncorrectable error, Linux will crash because it does not have recovery code for this case where poison is consumed by the kernel. It is easy to set up a test case. Just inject an error into a private page, fork(2), and have the child process write to the page. I wrapped that neatly into a test at: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git Just enable ACPI error injection and run: # ./einj_mem-uc -f copy-on-write Add a new copy_user_highpage_mc() function that uses copy_mc_to_kernel() on architectures where that is available (currently x86 and powerpc). When an error is detected during the page copy, return VM_FAULT_HWPOISON to the caller of wp_page_copy(). This propagates up the call stack. Both x86 and powerpc have code in their fault handlers to deal with this fault code by sending a SIGBUS to the application. Note that this patch avoids a system crash and signals the process that triggered the copy-on-write action. It does not take any action for the memory error that is still in the shared page. To handle that, a call to memory_failure() is needed. But this cannot be done from wp_page_copy() because it holds the mmap_lock. Perhaps the architecture fault handlers can deal with this loose end in a subsequent patch? On Intel/x86 this loose end will often be handled automatically, because the memory controller provides an additional notification of the h/w poison in memory; the handler for this will call memory_failure(). This isn't a 100% solution. If there are multiple errors, not all may be logged in this way.
Cc: <stable@vger.kernel.org> [tony.luck@intel.com: add call to kmsan_unpoison_memory(), per Miaohe Lin] Link: https://lkml.kernel.org/r/20221031201029.102123-2-tony.luck@intel.com Link: https://lkml.kernel.org/r/20221021200120.175753-1-tony.luck@intel.com Link: https://lkml.kernel.org/r/20221021200120.175753-2-tony.luck@intel.com Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Due to missing commits c89357e27f20d ("mm: support GUP-triggered unsharing of anonymous pages") 662ce1dc9caf4 ("delayacct: track delays from write-protect copy") b073d7f8aee4e ("mm: kmsan: maintain KMSAN metadata for page operations") The impact of c89357e27f20d is a name change from cow_user_page() to __wp_page_copy_user(). The impact of 662ce1dc9caf4 is the introduction of delayacct tracking for write-protect copies. The impact of b073d7f8aee4e is the introduction of a KMSAN feature. None of these commits establishes a meaningful dependency, hence resolve by ignoring them. - jane] Signed-off-by: Jane Chu <jane.chu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
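A sketch of the helper itself, using the name from the commit message above (mainline's equivalent is spelled copy_mc_user_highpage()); on architectures without copy_mc_to_kernel() it degrades to a plain copy. The kmsan_unpoison_memory() call noted in the backport remark is omitted for brevity:

<--- C sketch --->
#ifdef copy_mc_to_kernel
/* Returns 0 on success, nonzero if poison was consumed while copying. */
static inline int copy_user_highpage_mc(struct page *to, struct page *from,
					unsigned long vaddr,
					struct vm_area_struct *vma)
{
	unsigned long ret;
	char *vfrom = kmap_local_page(from);
	char *vto = kmap_local_page(to);

	ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
	kunmap_local(vto);
	kunmap_local(vfrom);

	return ret;
}
#else
static inline int copy_user_highpage_mc(struct page *to, struct page *from,
					unsigned long vaddr,
					struct vm_area_struct *vma)
{
	copy_user_highpage(to, from, vaddr, vma);
	return 0;
}
#endif
<--- C sketch --->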