path: root/mm
Age  Commit message  Author
2024-05-13  Merge tag 'v5.10.215' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.215 stable release
2024-05-13  Merge tag 'v5.10.213' into v5.10/standard/base  (Bruce Ashfield)
Linux 5.10.213
2024-05-13  Merge tag 'v5.10.211' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.211 stable release
2024-04-13  x86/mm/pat: fix VM_PAT handling in COW mappings  (David Hildenbrand)
commit 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 upstream.

PAT handling won't do the right thing in COW mappings: the first PTE (or, in fact, all PTEs) can be replaced during write faults to point at anon folios. Reliably recovering the correct PFN and cachemode using follow_phys() from PTEs will not work in COW mappings.

Using follow_phys(), we might just get the address+protection of the anon folio (which is very wrong), or fail on swap/nonswap entries, failing follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and track_pfn_copy(), not properly calling free_pfn_range(). In free_pfn_range(), we either wouldn't call memtype_free() or would call it with the wrong range, possibly leaking memory.

To fix that, let's update follow_phys() to refuse returning anon folios, and fall back to using the stored PFN inside vma->vm_pgoff for COW mappings if we run into that.

We will now properly handle untrack_pfn() with COW mappings, where we don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if the first page was replaced by an anon folio, though: we'd have to store the cachemode in the VMA to make this work, likely growing the VMA size. For now, let's keep it simple and let track_pfn_copy() just fail in that case: it would have failed in the past with swap/nonswap entries already, and it would have done the wrong thing with anon folios.

Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():

<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>

int main(void)
{
        struct io_uring_params p = {};
        int ring_fd;
        size_t size;
        char *map;

        ring_fd = io_uring_setup(1, &p);
        if (ring_fd < 0) {
                perror("io_uring_setup");
                return 1;
        }
        size = p.sq_off.array + p.sq_entries * sizeof(unsigned);

        /* Map the submission queue ring MAP_PRIVATE */
        map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                   ring_fd, IORING_OFF_SQ_RING);
        if (map == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* We have at least one page. Let's COW it. */
        *map = 0;
        pause();
        return 0;
}
<--- C reproducer --->

On a system with 16 GiB RAM and swap configured:
 # ./iouring &
 # memhog 16G
 # killall iouring

[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS:  0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148]  <TASK>
[ 301.566325]  ? untrack_pfn+0xf4/0x100
[ 301.566618]  ? __warn+0x81/0x130
[ 301.566876]  ? untrack_pfn+0xf4/0x100
[ 301.567163]  ? report_bug+0x171/0x1a0
[ 301.567466]  ? handle_bug+0x3c/0x80
[ 301.567743]  ? exc_invalid_op+0x17/0x70
[ 301.568038]  ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363]  ? untrack_pfn+0xf4/0x100
[ 301.568660]  ? untrack_pfn+0x65/0x100
[ 301.568947]  unmap_single_vma+0xa6/0xe0
[ 301.569247]  unmap_vmas+0xb5/0x190
[ 301.569532]  exit_mmap+0xec/0x340
[ 301.569801]  __mmput+0x3e/0x130
[ 301.570051]  do_exit+0x305/0xaf0
...

Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Wupeng Ma <mawupeng1@huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13  mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations  (Vlastimil Babka)
commit 803de9000f334b771afacb6ff3e78622916668b0 upstream.

Sven reports an infinite loop in __alloc_pages_slowpath() for costly order __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such a combination can happen in a suspend/resume context where a GFP_KERNEL allocation can have __GFP_IO masked out via gfp_allowed_mask.

Quoting Sven:

1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER) with __GFP_RETRY_MAYFAIL set.

2. page alloc's __alloc_pages_slowpath tries to get a page from the freelist. This fails because there is nothing free of that costly order.

3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim, which bails out because a zone is ready to be compacted; it pretends to have made a single page of progress.

4. page alloc tries to compact, but this always bails out early because __GFP_IO is not set (it's not passed by the snd allocator, and even if it were, we are suspending so the __GFP_IO flag would be cleared anyway).

5. page alloc believes reclaim progress was made (because of the pretense in item 3) and so it checks whether it should retry compaction. The compaction retry logic thinks it should try again, because:
    a) reclaim is needed because of the early bail-out in item 4
    b) a zonelist is suitable for compaction

6. goto 2. indefinite stall.

(end quote)

The immediate root cause is that the COMPACT_SKIPPED returned from __alloc_pages_direct_compact() (step 4) due to the lack of __GFP_IO is mistaken for an indication of a lack of order-0 pages, and step 5 evaluates that in should_compact_retry() as a reason to retry, before incrementing and limiting the number of retries. There are however other places that wrongly assume that compaction can happen while we lack __GFP_IO.

To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO evaluation and switch the open-coded test in try_to_compact_pages() to use it. Also use the new helper in:

- compaction_ready(), which will make reclaim not bail out in step 3, so there's at least one attempt to actually reclaim, even if chances are small for a costly order

- in_reclaim_compaction(), which will make should_continue_reclaim() return false and we don't over-reclaim unnecessarily

- __alloc_pages_slowpath(), to set a local variable can_compact, which is then used to avoid retrying reclaim/compaction for costly allocations (step 5) if we can't compact, and also to skip the early compaction attempt that we do in some cases

Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sven van Ashbrook <svenva@chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
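For reference, a sketch of the new helper (its shape follows the description above: compaction is only allowed when __GFP_IO is set; comment wording is mine):

    /* Compaction requires __GFP_IO, since the migration it does may need I/O. */
    static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
    {
            return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
    }

Callers like __alloc_pages_slowpath() then cache the result in a local can_compact flag and skip the compaction retry path when it is false.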
2024-04-13  mm/migrate: set swap entry values of THP tail pages properly.  (Zi Yan)
The tail pages in a THP can have swap entry information stored in their private field. When migrating to a new page, all tail pages of the new page need to update ->private to avoid future data corruption. This fix is stable-only, since after commit 07e09c483cbe ("mm/huge_memory: work on folio->swap instead of page->private when splitting folio"), subpages of a swapcached THP no longer require this maintenance.

Adding THPs to the swapcache was introduced in commit 38d8b4e6bdc87 ("mm, THP, swap: delay splitting THP during swap out"), where each subpage of a THP added to the swapcache had its own swapcache entry and required the ->private field to point to the correct swapcache entry. Later, when THP migration functionality was implemented in commit 616b8371539a6 ("mm: thp: enable thp migration in generic path"), it initially did not handle the subpages of swapcached THPs, failing to update their ->private fields or replace the subpage pointers in the swapcache. Subsequently, commit e71769ae5260 ("mm: enable thp migration for shmem thp") addressed the swapcache update aspect. This patch fixes the update of subpage ->private fields.

Closes: https://lore.kernel.org/linux-mm/1707814102-22682-1-git-send-email-quic_charante@quicinc.com/
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
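A hedged sketch of the shape of this fix in migrate_page_move_mapping(): the xarray walk over the tail slots already existed, and the set_page_private() copy is the new part (exact guard conditions may differ in the stable patch):

    if (PageTransHuge(page)) {
            int i;

            for (i = 1; i < HPAGE_PMD_NR; i++) {
                    /* Tail pages of a swapcached THP carry their own swap
                     * entry in ->private; copy it to the new tail page. */
                    set_page_private(newpage + i, page_private(page + i));
                    xas_next(&xas);
                    xas_store(&xas, newpage + i);
            }
    }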
2024-04-13  mm/memory-failure: fix an incorrect use of tail pages  (Liu Shixin)
When backporting commit c79c5a0a00a9 to 5.10-stable, a mistaken change slipped in: the head page, not the tail page, should be passed to try_to_unmap(), otherwise unmapping fails as follows:

    Memory failure: 0x121c10: failed to unmap page (mapcount=1)
    Memory failure: 0x121c10: recovery action for unmapping failed page: Ignored

Fixes: 70168fdc743b ("mm/memory-failure: check the mapcount of the precise page")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
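In other words, a minimal sketch of the corrected call in the memory-failure unmap path (assuming 5.10's bool-returning try_to_unmap(); the surrounding error handling is illustrative):

    struct page *hpage = compound_head(p);

    /* The mapcount is checked on the precise page p, but the unmap
     * must operate on the whole compound page. */
    if (!try_to_unmap(hpage, ttu))
            pr_err("Memory failure: %#lx: failed to unmap page (mapcount=%d)\n",
                   pfn, page_mapcount(p));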
2024-04-13  memtest: use {READ,WRITE}_ONCE in memory scanning  (Qiang Zhang)
[ Upstream commit 82634d7e24271698e50a3ec811e5f50de790a65f ]

memtest failed to find bad memory when compiled with clang. Use {WRITE,READ}_ONCE to access memory so the compiler cannot optimize the test accesses away.

Link: https://lkml.kernel.org/r/20240312080422.691222-1-qiang4.zhang@intel.com
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
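A condensed sketch of the pattern in mm/memtest.c after this change (surrounding bookkeeping elided):

    u64 *p;

    for (p = start; p < end; p++)
            WRITE_ONCE(*p, pattern);          /* was: *p = pattern; */

    for (p = start; p < end; p++, start_phys_aligned += incr) {
            if (READ_ONCE(*p) == pattern)     /* was: if (*p == pattern) */
                    continue;
            reserve_bad_mem(pattern, start_phys_aligned,
                            start_phys_aligned + incr);
    }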
2024-04-13  mm: swap: fix race between free_swap_and_cache() and swapoff()  (Ryan Roberts)
[ Upstream commit 82b1c07a0af603e3c47b906c8e991dc96f01688e ]

There was previously a theoretical window where swapoff() could run and tear down a swap_info_struct while a call to free_swap_and_cache() was running in another thread. This could cause, amongst other bad possibilities, swap_page_trans_huge_swapped() (called by free_swap_and_cache()) to access the freed memory for swap_map.

This is a theoretical problem and I haven't been able to provoke it from a test case. But there has been agreement based on code review that this is possible (see link below).

Fix it by using get_swap_device()/put_swap_device(), which will stall swapoff(). There was an extra check in _swap_info_get() to confirm that the swap entry was not free. This isn't present in get_swap_device() because it doesn't make sense in general due to the race between getting the reference and swapoff. So I've added an equivalent check directly in free_swap_and_cache().

Details of how to provoke one possible issue (thanks to David Hildenbrand for deriving this):

--8<-----

__swap_entry_free() might be the last user and result in "count == SWAP_HAS_CACHE".

swapoff->try_to_unuse() will stop as soon as si->inuse_pages==0.

So the question is: could someone reclaim the folio and turn si->inuse_pages==0, before we completed swap_page_trans_huge_swapped().

Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are still referenced by swap entries.

Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.

Process 1 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]

Process 2 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE

Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls __try_to_reclaim_swap().

__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
...
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);

What stops swapoff from succeeding after process 2 reclaimed the swap cache but before process 1 finished its call to swap_page_trans_huge_swapped()?

--8<-----

Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com
Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch")
Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redhat.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
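Roughly, the post-fix function looks like this (a sketch based on the description above; the helper names are those used by 5.10's swapfile.c, but treat the exact flags as approximate):

    int free_swap_and_cache(swp_entry_t entry)
    {
            struct swap_info_struct *p;
            unsigned char count;

            if (non_swap_entry(entry))
                    return 1;

            p = get_swap_device(entry);       /* stalls swapoff() */
            if (p) {
                    /* equivalent of the check _swap_info_get() used to do */
                    if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) {
                            put_swap_device(p);
                            return 0;
                    }
                    count = __swap_entry_free(p, entry);
                    if (count == SWAP_HAS_CACHE &&
                        !swap_page_trans_huge_swapped(p, entry))
                            __try_to_reclaim_swap(p, swp_offset(entry),
                                                  TTRS_UNMAPPED | TTRS_FULL);
                    put_swap_device(p);
            }
            return p != NULL;
    }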
2024-03-15  mm/hugetlb: change hugetlb_reserve_pages() to type bool  (Mike Kravetz)
[ Upstream commit 33b8f84a4ee78491a8f4f9e4c5520c9da4a10983 ] While reviewing a bug in hugetlb_reserve_pages, it was noticed that all callers ignore the return value. Any failure is considered an ENOMEM error by the callers. Change the function to be of type bool. The function will return true if the reservation was successful, false otherwise. Callers currently assume a zero return code indicates success. Change the callers to look for true to indicate success. No functional change, only code cleanup. Link: https://lkml.kernel.org/r/20201221192542.15732-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: e656c7a9e596 ("mm: hugetlb pages should not be reserved by shmat() if SHM_NORESERVE") Signed-off-by: Sasha Levin <sashal@kernel.org>
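The caller-side change looks like this (hedged sketch; from/to are hugetlbfs page indices, and the caller shown is illustrative):

    /* before: a zero return code meant success */
    if (hugetlb_reserve_pages(inode, from, to, vma, vm_flags))
            goto out_err;

    /* after: true means success */
    if (!hugetlb_reserve_pages(inode, from, to, vma, vm_flags))
            goto out_err;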
2024-03-01  userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb  (Lokesh Gidra)
commit 67695f18d55924b2013534ef3bdc363bc9e14605 upstream. In mfill_atomic_hugetlb(), mmap_changing isn't being checked again if we drop mmap_lock and reacquire it. When the lock is not held, mmap_changing could have been incremented. This is also inconsistent with the behavior in mfill_atomic(). Link: https://lkml.kernel.org/r/20240117223729.1444522-1-lokeshgidra@google.com Fixes: df2cc96e77011 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races") Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nicolas Geoffray <ngeoffray@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
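A heavily hedged sketch of the retry path after the fix (in 5.10 mmap_changing is passed as a bool pointer, while newer kernels use an atomic_t; the shape is the same either way):

    retry:
            mmap_read_lock(dst_mm);

            /* The lock was dropped to copy the page from userspace;
             * re-check mmap_changing the same way mfill_atomic() does. */
            err = -EAGAIN;
            if (mmap_changing && READ_ONCE(*mmap_changing))
                    goto out_unlock;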
2024-02-26  Merge tag 'v5.10.210' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.210 stable release
2024-02-23  mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again  (Zach O'Keefe)
commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78 upstream.

(struct dirty_throttle_control *)->thresh is an unsigned long, but is passed as the u32 divisor argument to div_u64(). On architectures where unsigned long is 64 bits, the argument will be implicitly truncated.

Use div64_u64() instead of div_u64() so that the value used in the "is this a safe division" check is the same as the divisor. Also, remove the redundant cast of the numerator to u64, as that happens implicitly.

This would be difficult to exploit in the memcg domain, given the ratio-based arithmetic domain_dirty_limits() uses, but is much easier in the global writeback domain with a BDI_CAP_STRICTLIMIT backing device, using e.g. vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32).

Link: https://lkml.kernel.org/r/20240118181954.1415197-1-zokeefe@google.com
Fixes: f6789593d5ce ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Maxim Patlasov <MPatlasov@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
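The failure mode is easy to see in isolation (illustrative snippet, not the kernel's code):

    u64 x, dirty = 123456;
    unsigned long thresh = 1UL << 32;   /* e.g. dtc->thresh */

    /* div_u64() takes a u32 divisor: 1UL << 32 truncates to 0, so the
     * value the "safe division" check sees is not the real divisor. */
    x = div_u64(dirty * 100, thresh);

    /* div64_u64() takes the full 64-bit divisor. */
    x = div64_u64(dirty * 100, thresh);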
2024-02-23  mm/sparsemem: fix race in accessing memory_section->usage  (Charan Teja Kalla)
[ Upstream commit 5ec8e8ea8b7783fab150cf86404fc38cb4db8800 ]

The race below is observed on a PFN which falls into a device memory region, with a system memory configuration where the PFNs are laid out as [ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL]. Since the normal zone start and end PFNs contain the device memory PFNs as well, compaction will try the device memory PFNs too, though those attempts end up as a NOP (because pfn_to_online_page() returns NULL for ZONE_DEVICE memory sections). When, from another core, the section mappings are being removed for the ZONE_DEVICE region that the PFN under compaction belongs to, the kernel crashes with CONFIG_SPARSEMEM_VMEMMAP enabled. The crash logs can be seen at [1].

compact_zone()                  memunmap_pages
-------------                   ---------------
__pageblock_pfn_to_page
......
(a) pfn_valid():
    valid_section()  // returns true
                                (b) __remove_pages()->
                                    sparse_remove_section()->
                                    section_deactivate():
                                    [Free the array ms->usage and set
                                     ms->usage = NULL]
pfn_section_valid()
[Access ms->usage which is NULL]

NOTE: From the above it can be said that the race is reduced to one between pfn_valid()/pfn_section_valid() and the section deactivation with SPARSEMEM_VMEMMAP enabled.

Commit b943f045a9af ("mm/sparse: fix kernel crash with pfn_section_valid check") tried to address the same problem by clearing SECTION_HAS_MEM_MAP, with the expectation that valid_section() would return false and thus ms->usage would not be accessed.

Fix this issue with the steps below:

a) Clear SECTION_HAS_MEM_MAP before freeing ->usage.

b) An RCU-protected read-side critical section will either return NULL when SECTION_HAS_MEM_MAP is cleared, or can successfully access ->usage.

c) Free ->usage with kfree_rcu() and set ms->usage = NULL. No attempt will be made to access ->usage after this, as SECTION_HAS_MEM_MAP is cleared and thus valid_section() returns false.

Thanks to David/Pavan for their inputs on this patch.

[1] https://lore.kernel.org/linux-mm/994410bb-89aa-d987-1f50-f514903c55aa@quicinc.com/

On a Snapdragon SoC, with the mentioned memory configuration of PFNs as [ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL], we see a bunch of issues daily while testing on a device farm. For this particular issue the log is below. Though the log does not directly point to pfn_section_valid(){ ms->usage; }, when we loaded this dump in the T32 Lauterbach tool, it pointed there.

[  540.578056] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  540.578068] Mem abort info:
[  540.578070]   ESR = 0x0000000096000005
[  540.578073]   EC = 0x25: DABT (current EL), IL = 32 bits
[  540.578077]   SET = 0, FnV = 0
[  540.578080]   EA = 0, S1PTW = 0
[  540.578082]   FSC = 0x05: level 1 translation fault
[  540.578085] Data abort info:
[  540.578086]   ISV = 0, ISS = 0x00000005
[  540.578088]   CM = 0, WnR = 0
[  540.579431] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[  540.579436] pc : __pageblock_pfn_to_page+0x6c/0x14c
[  540.579454] lr : compact_zone+0x994/0x1058
[  540.579460] sp : ffffffc03579b510
[  540.579463] x29: ffffffc03579b510 x28: 0000000000235800 x27: 000000000000000c
[  540.579470] x26: 0000000000235c00 x25: 0000000000000068 x24: ffffffc03579b640
[  540.579477] x23: 0000000000000001 x22: ffffffc03579b660 x21: 0000000000000000
[  540.579483] x20: 0000000000235bff x19: ffffffdebf7e3940 x18: ffffffdebf66d140
[  540.579489] x17: 00000000739ba063 x16: 00000000739ba063 x15: 00000000009f4bff
[  540.579495] x14: 0000008000000000 x13: 0000000000000000 x12: 0000000000000001
[  540.579501] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffff897d2cd440
[  540.579507] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffffffc03579b5b4
[  540.579512] x5 : 0000000000027f25 x4 : ffffffc03579b5b8 x3 : 0000000000000001
[  540.579518] x2 : ffffffdebf7e3940 x1 : 0000000000235c00 x0 : 0000000000235800
[  540.579524] Call trace:
[  540.579527]  __pageblock_pfn_to_page+0x6c/0x14c
[  540.579533]  compact_zone+0x994/0x1058
[  540.579536]  try_to_compact_pages+0x128/0x378
[  540.579540]  __alloc_pages_direct_compact+0x80/0x2b0
[  540.579544]  __alloc_pages_slowpath+0x5c0/0xe10
[  540.579547]  __alloc_pages+0x250/0x2d0
[  540.579550]  __iommu_dma_alloc_noncontiguous+0x13c/0x3fc
[  540.579561]  iommu_dma_alloc+0xa0/0x320
[  540.579565]  dma_alloc_attrs+0xd4/0x108

[quic_charante@quicinc.com: use kfree_rcu() in place of synchronize_rcu(), per David]
Link: https://lkml.kernel.org/r/1698403778-20938-1-git-send-email-quic_charante@quicinc.com
Link: https://lkml.kernel.org/r/1697202267-23600-1-git-send-email-quic_charante@quicinc.com
Fixes: f46edbd1b151 ("mm/sparsemem: add helpers track active portions of a section at boot")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
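The read side after the fix, per steps (a)-(c) above (a sketch of the upstream helper as I understand it):

    static inline int pfn_section_valid(struct mem_section *ms,
                                        unsigned long pfn)
    {
            int idx = subsection_map_index(pfn);
            struct mem_section_usage *usage = READ_ONCE(ms->usage);

            /* Either observe NULL (section being deactivated) or a usage
             * map kept alive by RCU until kfree_rcu() runs. */
            return usage ? test_bit(idx, usage->subsection_map) : 0;
    }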
2024-02-23  mm: vmalloc: introduce array allocation functions  (Paolo Bonzini)
commit a8749a35c39903120ec421ef2525acc8e0daa55c upstream.

Linux has dozens of occurrences of vmalloc(array_size()) and vzalloc(array_size()). Allow the code to be simplified by providing vmalloc_array and vcalloc, as well as the underscored variants that let the caller specify the GFP flags.

Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alexander Ofitserov <oficerovas@altlinux.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
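The new helpers are thin overflow-checked wrappers; a sketch matching the description above (exact file placement is an assumption):

    void *__vmalloc_array(size_t n, size_t size, gfp_t flags)
    {
            size_t bytes;

            /* refuse n * size if the multiplication would overflow */
            if (unlikely(check_mul_overflow(n, size, &bytes)))
                    return NULL;
            return __vmalloc(bytes, flags);
    }

    void *vmalloc_array(size_t n, size_t size)
    {
            return __vmalloc_array(n, size, GFP_KERNEL);
    }

    void *vcalloc(size_t n, size_t size)
    {
            return __vmalloc_array(n, size, GFP_KERNEL | __GFP_ZERO);
    }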
2024-02-20  Merge tag 'v5.10.208' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.208 stable release
2024-01-15  mm: fix unmap_mapping_range high bits shift bug  (Jiajun Xie)
commit 9eab0421fa94a3dde0d1f7e36ab3294fc306c99d upstream.

The bug happens when the highest bit of holebegin is 1. Suppose holebegin is 0x8000000111111000; after the shift, hba would be 0xfff8000000111111, and vma_interval_tree_foreach would then fail to find the vma or produce the wrong result.

Erroneous call sequence, e.g.:

- mmap(..., offset=0x8000000111111000)
  |- syscall(mmap, ..., unsigned long, off):
     |- ksys_mmap_pgoff(..., off >> PAGE_SHIFT);

Here pgoff is correctly shifted to 0x8000000111111, but passing 0x8000000111111000 as holebegin to unmap then causes a terrible result, as shown below:

- unmap_mapping_range(..., loff_t const holebegin)
  |- pgoff_t hba = holebegin >> PAGE_SHIFT;
     /* hba = 0xfff8000000111111 unexpectedly */

The issue arises in heterogeneous computing, where the device (e.g. a gpu) and the host share the same virtual address space. A simple workflow pattern which hits the issue is:

        /* host */
    1. userspace first mmaps a file-backed VA range with a specified offset.
                        e.g. (offset=0x800..., mmap returns: va_a)
    2. write some data to the corresponding sys page
                        e.g. (va_a = 0xAABB)

        /* device */
    3. gpu workload touches VA, triggers a gpu fault and notifies the host.

        /* host */
    4. on receiving the gpu fault notification, the host will:
        4.1 unmap host pages and also take care of the cpu tlb
            (use unmap_mapping_range with offset=0x800...)
        4.2 migrate the sys page to the device
        4.3 set up the device page table and resolve the device fault.

        /* device */
    5. gpu workload continued, it accessed va_a and got 0xAABB.
    6. gpu workload continued, it wrote 0xBBCC to va_a.

        /* host */
    7. userspace accesses va_a; as expected, it will:
        7.1 trigger a cpu vm fault.
        7.2 the driver handles the fault and migrates the gpu local page back to the host.
    8. userspace can then correctly get 0xBBCC from va_a
    9. done

But in step 4.1, if we hit the bug this patch mentions, userspace will never trigger the cpu fault and will still see the old value: 0xAABB.

Making holebegin unsigned first fixes the bug.

Link: https://lkml.kernel.org/r/20231220052839.26970-1-jiajun.xie.sh@gmail.com
Signed-off-by: Jiajun Xie <jiajun.xie.sh@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
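The fix itself amounts to shifting as unsigned (hedged sketch; pgoff_t is an unsigned long, so the shift becomes logical instead of arithmetic and the sign bit is no longer smeared across the high bits):

    void unmap_mapping_range(struct address_space *mapping,
                    loff_t const holebegin, loff_t const holelen,
                    int even_cows)
    {
            pgoff_t hba = (pgoff_t)(holebegin) >> PAGE_SHIFT;
            pgoff_t hlen = ((pgoff_t)(holelen) + PAGE_SIZE - 1) >> PAGE_SHIFT;
            /* remainder of the function unchanged */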
2024-01-15  mm/memory-failure: check the mapcount of the precise page  (Matthew Wilcox (Oracle))
[ Upstream commit c79c5a0a00a9457718056b588f312baadf44e471 ]

A process may map only some of the pages in a folio. It might be missed if it maps the poisoned page but not the head page, or it might be unnecessarily hit if it maps the head page but not the poisoned page.

Link: https://lkml.kernel.org/r/20231218135837.3310403-3-willy@infradead.org
Fixes: 7af446a841a2 ("HWPOISON, hugetlb: enable error handling path for hugepage")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-03  Merge tag 'v5.10.202' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.202 stable release
2023-12-03  Merge tag 'v5.10.201' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.201 stable release
2023-11-28  mm: kmem: drop __GFP_NOFAIL when allocating objcg vectors  (Roman Gushchin)
commit 24948e3b7b12e0031a6edb4f49bbb9fb2ad1e4e9 upstream.

Objcg vectors attached to slab pages to store slab object ownership information are allocated using the gfp flags of the original slab allocation. Depending on the slab page order and the size of the slab objects, an objcg vector can take several pages.

If the original allocation was done with the __GFP_NOFAIL flag, it triggered a warning in the page allocation code. Indeed, order > 1 pages should not be allocated with the __GFP_NOFAIL flag.

Fix this by simply dropping the __GFP_NOFAIL flag when allocating the objcg vector. It effectively allows skipping the accounting of a single slab object under heavy memory pressure.

An alternative would be to implement a mechanism to fall back to order-0 allocations for accounting metadata, which is also not perfect because it would increase the performance penalty and memory footprint of kernel memory accounting under memory pressure.

Link: https://lkml.kernel.org/r/ZUp8ZFGxwmCx4ZFr@P9FQF9L96D.corp.robot.car
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reported-by: Christoph Lameter <cl@linux.com>
Closes: https://lkml.kernel.org/r/6b42243e-f197-600a-5d22-56bd728a5ad8@gentwo.org
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
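A sketch of the allocation site after the fix (5.10's helper as I recall it; treat the function body details as approximate):

    int memcg_alloc_page_obj_cgroups(struct page *page,
                                     struct kmem_cache *s, gfp_t gfp)
    {
            unsigned int objects = objs_per_slab_page(s, page);
            void *vec;

            /* order > 1 pages must not use __GFP_NOFAIL: drop it and
             * accept a missed accounting under heavy pressure instead */
            gfp &= ~__GFP_NOFAIL;
            vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
                               page_to_nid(page));
            if (!vec)
                    return -ENOMEM;
            /* attach vec to the page and return 0 (elided) */
            return 0;
    }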
2023-11-28  mm/memory_hotplug: use pfn math in place of direct struct page manipulation  (Zi Yan)
commit 1640a0ef80f6d572725f5b0330038c18e98ea168 upstream.

When dealing with hugetlb pages, manipulating struct page pointers directly can land on the wrong struct page, since struct page is not guaranteed to be contiguous on SPARSEMEM without VMEMMAP. Use pfn calculation to handle it properly.

Without the fix, a wrong number of pages might be skipped. Since skip cannot be negative, scan_movable_pages() will end early and might miss a movable page with -ENOENT. This might fail offline_pages(). No bug is reported; the fix comes from code inspection.

Link: https://lkml.kernel.org/r/20230913201248.452081-4-zi.yan@sent.com
Fixes: eeb0efd071d8 ("mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28  mm/cma: use nth_page() in place of direct struct page manipulation  (Zi Yan)
commit 2e7cfe5cd5b6b0b98abf57a3074885979e187c1c upstream.

Patch series "Use nth_page() in place of direct struct page manipulation", v3.

On SPARSEMEM without VMEMMAP, struct page is not guaranteed to be contiguous, since each memory section's memmap might be allocated independently. hugetlb pages can span more than a memory section, so direct struct page manipulation on hugetlb pages/subpages might land on the wrong struct page. The kernel provides nth_page() to do the manipulation properly; use it wherever the code can see hugetlb pages.

This patch (of 5):

When dealing with hugetlb pages, manipulating struct page pointers directly can land on the wrong struct page, since struct page is not guaranteed to be contiguous on SPARSEMEM without VMEMMAP. Use nth_page() to handle it properly.

Without the fix, page_kasan_tag_reset() could reset the tags of the wrong pages, causing a wrong kasan result. No related bug is reported; the fix comes from code inspection.

Link: https://lkml.kernel.org/r/20230913201248.452081-1-zi.yan@sent.com
Link: https://lkml.kernel.org/r/20230913201248.452081-2-zi.yan@sent.com
Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
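The cma_alloc() hunk is a one-line pattern change (sketch):

    /* page + i can step across a memmap discontinuity on SPARSEMEM
     * without VMEMMAP; nth_page() recomputes through the pfn instead. */
    for (i = 0; i < count; i++)
            page_kasan_tag_reset(nth_page(page, i));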
2023-11-20  vfs: fix readahead(2) on block devices  (Reuben Hawkins)
[ Upstream commit 7116c0af4b8414b2f19fdb366eea213cbd9d91c2 ]

Readahead was refactored to call generic_fadvise. That refactor added an S_ISREG restriction which broke readahead on block devices. In addition to S_ISREG, this change checks S_ISBLK to fix block device readahead. Behavior is unchanged for all other file types.

Fixes: 3d8f7615319b ("vfs: implement readahead(2) using POSIX_FADV_WILLNEED")
Signed-off-by: Reuben Hawkins <reubenhwk@gmail.com>
Link: https://lore.kernel.org/r/20231003015704.2415-1-reubenhwk@gmail.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
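A hedged sketch of the check in ksys_readahead() after the change:

    ret = -EINVAL;
    if (!f.file->f_mapping || !f.file->f_mapping->a_ops ||
        (!S_ISREG(file_inode(f.file)->i_mode) &&
         !S_ISBLK(file_inode(f.file)->i_mode)))
            goto out;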
2023-11-17  Merge tag 'v5.10.200' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.200 stable release
2023-11-17  Merge tag 'v5.10.199' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.199 stable release
2023-11-08  kasan: print the original fault addr when access invalid shadow  (Haibo Li)
commit babddbfb7d7d70ae7f10fedd75a45d8ad75fdddf upstream.

When the checked address is illegal, the corresponding shadow address from kasan_mem_to_shadow may have no mapping in the MMU table. Accessing such a shadow address causes a kernel oops. Here is a sample oops on arm64 (39-bit VA) with KASAN_SW_TAGS and KASAN_OUTLINE on:

[ffffffb80aaaaaaa] pgd=000000005d3ce003, p4d=000000005d3ce003, pud=000000005d3ce003, pmd=0000000000000000
Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 100 Comm: sh Not tainted 6.6.0-rc1-dirty #43
Hardware name: linux,dummy-virt (DT)
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __hwasan_load8_noabort+0x5c/0x90
lr : do_ib_ob+0xf4/0x110

ffffffb80aaaaaaa is the shadow address for efffff80aaaaaaaa. The problem is reading an invalid shadow in kasan_check_range. The generic kasan has a similar oops; it only reports the shadow address that causes the oops, not the original address.

Commit 2f004eea0fc8 ("x86/kasan: Print original address on #GP") introduced kasan_non_canonical_hook but limited it to KASAN_INLINE. This patch extends it to KASAN_OUTLINE mode.

Link: https://lkml.kernel.org/r/20231009073748.159228-1-haibo.li@mediatek.com
Fixes: 2f004eea0fc8 ("x86/kasan: Print original address on #GP")
Signed-off-by: Haibo Li <haibo.li@mediatek.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Haibo Li <haibo.li@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08  mm/page_alloc: correct start page when guard page debug is enabled  (Kemeng Shi)
commit 61e21cf2d2c3cc5e60e8d0a62a77e250fccda62c upstream.

When guard page debug is enabled and set_page_guard returns success, we fail to advance page to point to the start of the next split range, and so we will split unexpectedly in a page range without the target page. Move the start page update before set_page_guard to fix this.

As we split the wrong target page, the split pages cannot merge back to the original order when the target page is put back, and the split pages other than the target page are unusable. To be specific:

Consider target page is the third page in a buddy page of order 2.

| buddy-2 | Page | Target | Page |

After breaking down to the target page, because of the bug we will only set the first page to Guard.

| Guard | Page | Target | Page |

When we try put_page_back_buddy with the target page, the buddy page of the target is neither guard nor buddy, so it is not possible to reconstruct the original page of order 2.

| Guard | Page | buddy-0 | Page |

All pages except the target page are not in the free list and are not usable.

Link: https://lkml.kernel.org/r/20230927094401.68205-1-shikemeng@huaweicloud.com
Fixes: 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
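Roughly, the corrected loop in break_down_buddy_pages() (a hedged sketch; helper names follow 5.10's mm/page_alloc.c, but treat the exact body as approximate):

    while (high > low) {
            high--;
            size >>= 1;

            if (target >= &page[size]) {
                    next_page = page + size;
                    current_buddy = page;
            } else {
                    next_page = page;
                    current_buddy = page + size;
            }

            /* Advance before the guard check, so a successful
             * set_page_guard() no longer skips the update. */
            page = next_page;

            if (set_page_guard(zone, current_buddy, high, migratetype))
                    continue;

            add_to_free_list(current_buddy, zone, high, migratetype);
            set_buddy_order(current_buddy, high);
    }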
2023-10-25  mm/memory_hotplug: rate limit page migration warnings  (Liam Mark)
commit 786dee864804f8e851cf0f258df2ccbb4ee03d80 upstream.

When offlining memory the system can attempt to migrate a lot of pages; if there are problems with migration this can flood the logs. Printing all the data hogs the CPU and causes some RT threads to run for a long time, which may have some bad consequences. Rate limit the page migration warnings in order to avoid this.

Link: https://lkml.kernel.org/r/20210505140542.24935-1-georgi.djakov@linaro.org
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <pjy@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
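The standard ratelimit pattern is used (sketch; the message text here is illustrative):

    static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
                                  DEFAULT_RATELIMIT_BURST);

    if (__ratelimit(&migrate_rs)) {
            pr_warn("migrating pfn %lx failed ret:%d\n", pfn, ret);
            dump_page(page, "migration failure");
    }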
2023-10-11  Merge tag 'v5.10.198' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.198 stable release
2023-10-10  media: vb2: frame_vector.c: replace WARN_ONCE with a comment  (Hans Verkuil)
[ Upstream commit 735de5caf79e06cc9fb96b1b4f4974674ae3e917 ]

The WARN_ONCE was also issued in cases that had nothing to do with VM_IO (e.g. if the start address was just a random value and uaccess fails with -EFAULT). There are no reports of WARN_ONCE being issued for actual VM_IO cases, so just drop it and instead add a note to the comment before the function.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reported-by: Yikebaer Aizezi <yikebaer61@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-26  Merge tag 'v5.10.197' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.197 stable release
2023-09-26  Merge tag 'v5.10.195' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.195 stable release
2023-09-23  mm/filemap: fix infinite loop in generic_file_buffered_read()  (Kent Overstreet)
commit 3644e2d2dda78e21edd8f5415b6d7ab03f5f54f3 upstream. If iter->count is 0 and iocb->ki_pos is page aligned, this causes nr_pages to be 0. Then in generic_file_buffered_read_get_pages() find_get_pages_contig() returns 0 - because we asked for 0 pages, so we call generic_file_buffered_read_no_cached_page() which attempts to add a page to the page cache, which fails with -EEXIST, and then we loop. Oops... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Reported-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
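The fix is an early return before nr_pages can be derived from a zero-byte request (hedged sketch of the added lines near the top of generic_file_buffered_read()):

    /* Bail out before a zero-byte read can turn into nr_pages == 0
     * and an -EEXIST add-to-page-cache loop. */
    if (unlikely(!iov_iter_count(iter)))
            return 0;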
2023-09-19  tmpfs: verify {g,u}id mount options correctly  (Christian Brauner)
[ Upstream commit 0200679fc7953177941e41c2a4241d0b6c2c5de8 ]

A while ago we received the following report:

"The other outstanding issue I noticed comes from the fact that fsconfig syscalls may occur in a different userns than that which called fsopen. That means that resolving the uid/gid via current_user_ns() can save a kuid that isn't mapped in the associated namespace when the filesystem is finally mounted. This means that it is possible for an unprivileged user to create files owned by any group in a tmpfs mount (since we can set the SUID bit on the tmpfs directory), or a tmpfs that is owned by any user, including the root group/user."

The contract for {g,u}id mount options and {g,u}id values in general set from userspace has always been that they are translated according to the caller's idmapping. In so far, tmpfs has been doing the correct thing. But since tmpfs is mountable in unprivileged contexts it is also necessary to verify that the resulting {k,g}uid is representable in the namespace of the superblock to avoid bugs such as the above.

The new mount api's cross-namespace delegation abilities are already widely used. After having talked to a bunch of userspace, this is the most faithful solution with minimal regression risk. I know of one user - systemd - that makes use of the new mount api in this way, and they don't set unresolvable {g,u}ids. So the regression risk is minimal.

Link: https://lore.kernel.org/lkml/CALxfFW4BXhEwxR0Q5LSkg-8Vb4r2MONKCcUCVioehXQKr35eHg@mail.gmail.com
Fixes: f32356261d44 ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API")
Reviewed-by: "Seth Forshee (DigitalOcean)" <sforshee@kernel.org>
Reported-by: Seth Jenkins <sethjenkins@google.com>
Message-Id: <20230801-vfs-fs_context-uidgid-v1-1-daf46a050bbf@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
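Sketch of the option parsing after the fix (in shmem_parse_one(); the kuid_has_mapping() check against fc->user_ns is the new part, and the gid case is symmetric):

    case Opt_uid:
            kuid = make_kuid(current_user_ns(), result.uint_32);
            if (!uid_valid(kuid))
                    goto bad_value;

            /* The uid must be representable in the namespace the
             * superblock will end up mounted in. */
            if (!kuid_has_mapping(fc->user_ns, kuid))
                    goto bad_value;

            ctx->uid = kuid;
            break;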
2023-09-04  Merge tag 'v5.10.193' into v5.10/standard/base  (Bruce Ashfield)
This is the 5.10.193 stable release
2023-08-30  mm,hwpoison: fix printing of page flags  (Oscar Salvador)
commit 6696d2a6f38c0beedf03c381edfc392ecf7631b4 upstream. Format %pG expects a lower case 'p' in order to print the flags. Fix it. Link: https://lkml.kernel.org/r/20210108085202.4506-1-osalvador@suse.de Fixes: 8295d535e2aa ("mm,hwpoison: refactor get_any_page") Signed-off-by: Oscar Salvador <osalvador@suse.de> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30  mm: memory-failure: fix unexpected return value in soft_offline_page()  (Miaohe Lin)
[ Upstream commit e2c1ab070fdc81010ec44634838d24fce9ff9e53 ] When page_handle_poison() fails to handle the hugepage or free page in retry path, soft_offline_page() will return 0 while -EBUSY is expected in this case. Consequently the user will think soft_offline_page succeeds while it in fact failed. So the user will not try again later in this case. Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30  mm: memory-failure: kill soft_offline_free_page()  (Kefeng Wang)
[ Upstream commit 7adb45887c8af88985c335b53d253654e9d2dd16 ] Open-code the page_handle_poison() into soft_offline_page() and kill unneeded soft_offline_free_page(). Link: https://lkml.kernel.org/r/20220819033402.156519-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: e2c1ab070fdc ("mm: memory-failure: fix unexpected return value in soft_offline_page()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30mm: fix page reference leak in soft_offline_page()Dan Williams
[ Upstream commit dad4e5b390866ca902653df0daa864ae4b8d4147 ] The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference taken by the madvise() path needs to be dropped when pfn_to_online_page() fails. Note that the direct sysfs path to soft_offline_page() does not perform a get_user_pages() lookup. When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn, the kernel hangs at dax-device shutdown due to a leaked reference. Link: https://lkml.kernel.org/r/161058501210.1840162.8108917599181157327.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: e2c1ab070fdc ("mm: memory-failure: fix unexpected return value in soft_offline_page()") Signed-off-by: Sasha Levin <sashal@kernel.org>
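A sketch of the fixed early-exit path (approximated; the madvise() path passes MF_COUNT_INCREASED to signal that a get_user_pages() reference is held):
<--- illustrative sketch --->
	/* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
	page = pfn_to_online_page(pfn);
	if (!page) {
		/* drop the reference taken by the madvise()/GUP path */
		if (flags & MF_COUNT_INCREASED)
			put_page(pfn_to_page(pfn));
		return -EIO;
	}
<--- illustrative sketch --->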
2023-08-30mm,hwpoison: refactor get_any_pageOscar Salvador
[ Upstream commit 8295d535e2aa198bdf65a4045d622df38955ffe2 ] Patch series "HWPoison: Refactor get page interface", v2. This patch (of 3): When we want to grab a refcount via get_any_page, we call __get_any_page, which calls get_hwpoison_page to get the actual refcount. get_any_page() is only there because we have a sort of retry mechanism in case the page we met is unknown to us or in case we raced with an allocation. Also, __get_any_page() prints some messages about the page type in case the page was a free page or the page type was unknown, but if anything, we only need to print a message in case the page type was unknown, as that is reporting an error down the chain. Let us merge get_any_page() and __get_any_page(), and let the message be printed in soft_offline_page(). While we are at it, we can also remove the 'pfn' parameter, as it is no longer used. Link: https://lkml.kernel.org/r/20201204102558.31607-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20201204102558.31607-2-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Qian Cai <qcai@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: e2c1ab070fdc ("mm: memory-failure: fix unexpected return value in soft_offline_page()") Signed-off-by: Sasha Levin <sashal@kernel.org>
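A condensed sketch of the merged retry logic (an approximation of the result of the refactor, not the verbatim patch):
<--- illustrative sketch --->
static int get_any_page(struct page *page)
{
	int ret = 0, pass = 0;

try_again:
	if (!get_hwpoison_page(page)) {
		if (page_count(page)) {
			/* raced with an allocation: retry */
			if (pass++ < 3)
				goto try_again;
			ret = -EBUSY;
		} else if (!PageHuge(page) && !is_free_buddy_page(page)) {
			/* raced with put_page(): retry */
			if (pass++ < 3)
				goto try_again;
			ret = -EIO;
		}
		/* else: a free page; ret stays 0 */
	} else if (PageHuge(page) || PageLRU(page) || __PageMovable(page)) {
		ret = 1;	/* got a reference to a handleable page */
	} else {
		/* unknown page type: the caller now prints the message */
		put_page(page);
		ret = -EIO;
	}
	return ret;
}
<--- illustrative sketch --->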
2023-08-30mm: add a call to flush_cache_vmap() in vmap_pfn()Alexandre Ghiti
commit a50420c79731fc5cf27ad43719c1091e842a2606 upstream. flush_cache_vmap() must be called after new vmalloc mappings are installed in the page table, in order to allow architectures to make the new mapping visible. Omitting it could lead to a panic, since on some architectures (like powerpc) the page table walker could see the wrong pte value and trigger a spurious page fault that cannot be resolved (see commit f1cb8f9beba8 ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags")). But actually the patch is aiming at riscv: the riscv specification allows the caching of invalid entries in the TLB, and since we recently removed the vmalloc page fault handling, we now need to emit a TLB shootdown whenever a new vmalloc mapping is established (https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosinc.com/). That's a temporary solution; there are ways to avoid that :) Link: https://lkml.kernel.org/r/20230809164633.1556126-1-alexghiti@rivosinc.com Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function") Reported-by: Dylan Jhong <dylan@andestech.com> Closes: https://lore.kernel.org/linux-riscv/ZMytNY2J8iyjbPPy@atctrx.andestech.com/ Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Dylan Jhong <dylan@andestech.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
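The shape of the fix, sketched against the 5.10 vmap_pfn() (helpers such as vmap_pfn_apply() are as in mm/vmalloc.c; treat the exact lines as an approximation):
<--- illustrative sketch --->
void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot)
{
	struct vmap_pfn_data data = { .pfns = pfns, .prot = pgprot_nx(prot) };
	struct vm_struct *area;

	area = get_vm_area_caller(count * PAGE_SIZE, VM_IOREMAP,
			__builtin_return_address(0));
	if (!area)
		return NULL;
	if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
			count * PAGE_SIZE, vmap_pfn_apply, &data)) {
		free_vm_area(area);
		return NULL;
	}
	/* new: make the fresh mapping visible before anyone uses it */
	flush_cache_vmap((unsigned long)area->addr,
			(unsigned long)area->addr + count * PAGE_SIZE);
	return area->addr;
}
<--- illustrative sketch --->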
2023-07-27Merge tag 'v5.10.188' into v5.10/standard/baseBruce Ashfield
This is the 5.10.188 stable release
2023-07-27shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfsRoberto Sassu
commit 36ce9d76b0a93bae799e27e4f5ac35478c676592 upstream. As the ramfs-based tmpfs uses ramfs_init_fs_context() for the init_fs_context method, which allocates fc->s_fs_info, use ramfs_kill_sb() to free it and avoid a memory leak. Link: https://lkml.kernel.org/r/20230607161523.2876433-1-roberto.sassu@huaweicloud.com Fixes: c3b1b1cbf002 ("ramfs: add support for "mode=" mount option") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
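A sketch of the resulting teardown (approximated; the CONFIG_SHMEM=n tmpfs file_system_type simply points its .kill_sb at ramfs_kill_sb() instead of kill_litter_super()):
<--- illustrative sketch --->
void ramfs_kill_sb(struct super_block *sb)
{
	kfree(sb->s_fs_info);	/* allocated by ramfs_init_fs_context() */
	kill_litter_super(sb);
}
<--- illustrative sketch --->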
2023-07-03Merge tag 'v5.10.186' into v5.10/standard/baseBruce Ashfield
This is the 5.10.186 stable release
2023-06-28memfd: check for non-NULL file_seals in memfd_create() syscallRoberto Sassu
[ Upstream commit 935d44acf621aa0688fef8312dec3e5940f38f4e ] Ensure that file_seals is non-NULL before using it in the memfd_create() syscall. One situation in which memfd_file_seals_ptr() could return a NULL pointer is when CONFIG_SHMEM=n, oopsing the kernel. Link: https://lkml.kernel.org/r/20230607132427.2867435-1-roberto.sassu@huaweicloud.com Fixes: 47b9012ecdc7 ("shmem: add sealing support to hugetlb-backed memfd") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Cc: Marc-André Lureau <marcandre.lureau@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
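The fix, sketched (memfd_file_seals_ptr() returns NULL for files that are neither shmem- nor hugetlb-backed, e.g. with CONFIG_SHMEM=n where tmpfs falls back to ramfs):
<--- illustrative sketch --->
	if (flags & MFD_ALLOW_SEALING) {
		file_seals = memfd_file_seals_ptr(file);
		if (file_seals)	/* new: may be NULL when CONFIG_SHMEM=n */
			*file_seals &= ~F_SEAL_SEAL;
	}
<--- illustrative sketch --->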
2023-06-22Merge tag 'v5.10.185' into v5.10/standard/baseBruce Ashfield
This is the 5.10.185 stable release
2023-06-21mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory blockDavid Hildenbrand
commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream. virtio-mem soon wants to use offline_and_remove_memory() on memory that exceeds a single Linux memory block (memory_block_size_bytes()). Let's remove that restriction. Let's remember the old state and try to restore it if anything goes wrong. While re-onlining can, in general, fail, it's highly unlikely to happen (usually only when a notifier fails to allocate memory, and these are rather rare). This will be used by virtio-mem to offline+remove memory ranges that are bigger than a single memory block - for example, with a device block size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory block size of 128 MiB. While we could compress the state into 2 bits, using 8 bits is much easier. This handling is similar to, but different from, acpi_scan_try_to_offline(): a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG optimization is still relevant - it should only apply to ZONE_NORMAL (where we have no guarantees). If relevant, we can always add it. b) acpi_scan_try_to_offline() simply onlines all memory in case something goes wrong. It doesn't restore the previous online type. Let's do that, so we won't overwrite what, e.g., user space configured. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
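A hedged sketch of the remember-and-rollback idea (condensed from the patch description; the exact helper shape is approximated):
<--- illustrative sketch --->
/* walk_memory_blocks() callback: record the online type, then offline */
static int try_offline_memory_block(struct memory_block *mem, void *arg)
{
	uint8_t online_type = MMOP_ONLINE_KERNEL;
	uint8_t **online_types = arg;
	struct page *page;

	/* sense the current online type via the block's zone */
	page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr));
	if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
		online_type = MMOP_ONLINE_MOVABLE;

	if (!device_offline(&mem->dev))
		**online_types = online_type;	/* remembered for rollback */
	(*online_types)++;
	return 0;	/* keep walking even if this block stayed online */
}
<--- illustrative sketch --->
On failure, a second walk re-onlines each block using its recorded type instead of blindly onlining everything.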
2023-06-05Merge tag 'v5.10.181' into v5.10/standard/baseBruce Ashfield
This is the 5.10.181 stable release
2023-05-30writeback, cgroup: remove extra percpu_ref_exit()Greg Thelen
5.10 stable commit 2b00b2a0e642 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs") is a backport of upstream 6.3 commit 1ba1199ec574. In the 5.10 stable backport, percpu_ref_exit() is called twice: first in cgwb_release_workfn() and then in cgwb_free_rcu(). The second call is benign, as percpu_ref_exit() internally detects that there's nothing to do. This fixes a non-upstream issue that only applies to 5.10.y. Fixes: 2b00b2a0e642 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs") Signed-off-by: Greg Thelen <gthelen@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
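The shape of the fix (approximated): the duplicate call is dropped from cgwb_release_workfn(), leaving percpu_ref_exit() only in the RCU free path:
<--- illustrative sketch --->
static void cgwb_free_rcu(struct rcu_head *rcu_head)
{
	struct bdi_writeback *wb = container_of(rcu_head,
			struct bdi_writeback, rcu);

	/* the one and only exit of wb->refcnt happens here */
	percpu_ref_exit(&wb->refcnt);
	kfree(wb);
}
<--- illustrative sketch --->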