summaryrefslogtreecommitdiffstats
path: root/fs/f2fs
AgeCommit message (Collapse)Author
2021-09-18f2fs: should put a page beyond EOF when preparing a writeJaegeuk Kim
[ Upstream commit 9605f75cf36e0bcc0f4ada07b5be712d30107607 ] The prepare_compress_overwrite() gets/locks a page to prepare a read, and calls f2fs_read_multi_pages() which checks EOF first. If there's any page beyond EOF, we unlock the page and set cc->rpages[i] = NULL, which we can't put the page anymore. This makes page leak, so let's fix by putting that page. Fixes: a949dc5f2c5c ("f2fs: compress: fix race condition of overwrite vs truncate") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: deallocate compressed pages when error happensJaegeuk Kim
[ Upstream commit 827f02842e40ea2e00f401e8f4cb1bccf3b8cd86 ] In f2fs_write_multi_pages(), f2fs_compress_pages() allocates pages for compression work in cc->cpages[]. Then, f2fs_write_compressed_pages() initiates bio submission. But, if there's any error before submitting the IOs like early f2fs_cp_error(), previously it didn't free cpages by f2fs_compress_free_page(). Let's fix memory leak by putting that just before deallocating cc->cpages. Fixes: 4c8ff7095bef ("f2fs: support data compression") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: fix to unmap pages from userspace process in punch_hole()Chao Yu
[ Upstream commit c8dc3047c48540183744f959412d44b08c5435e1 ] We need to unmap pages from userspace process before removing pagecache in punch_hole() like we did in f2fs_setattr(). Similar change: commit 5e44f8c374dc ("ext4: hole-punch use truncate_pagecache_range") Fixes: fbfa2cc58d53 ("f2fs: add file operations") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()Chao Yu
[ Upstream commit adf9ea89c719c1d23794e363f631e376b3ff8cbc ] In below path, it will return ENOENT if filesystem is shutdown: - f2fs_map_blocks - f2fs_get_dnode_of_data - f2fs_get_node_page - __get_node_page - read_node_page - is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) return -ENOENT - force return value from ENOENT to 0 It should be fine for read case, since it indicates a hole condition, and caller could use .m_next_pgofs to skip the hole and continue the lookup. However it may cause confusing for write case, since leaving a hole there, and said nothing was wrong doesn't help. There is at least one case from dax_iomap_actor() will complain that, so fix this in prior to supporting dax in f2fs. xfstest generic/388 reports below warning: ubuntu godown: xfstests-induced forced shutdown of /mnt/scratch_f2fs: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 485833 at fs/dax.c:1127 dax_iomap_actor+0x339/0x370 Call Trace: iomap_apply+0x1c4/0x7b0 ? dax_iomap_rw+0x1c0/0x1c0 dax_iomap_rw+0xad/0x1c0 ? dax_iomap_rw+0x1c0/0x1c0 f2fs_file_write_iter+0x5ab/0x970 [f2fs] do_iter_readv_writev+0x273/0x2e0 do_iter_write+0xab/0x1f0 vfs_iter_write+0x21/0x40 iter_file_splice_write+0x287/0x540 do_splice+0x37c/0xa60 __x64_sys_splice+0x15f/0x3a0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae ubuntu godown: xfstests-induced forced shutdown of /mnt/scratch_f2fs: ------------[ cut here ]------------ RIP: 0010:dax_iomap_pte_fault.isra.0+0x72e/0x14a0 Call Trace: dax_iomap_fault+0x44/0x70 f2fs_dax_huge_fault+0x155/0x400 [f2fs] f2fs_dax_fault+0x18/0x30 [f2fs] __do_fault+0x4e/0x120 do_fault+0x3cf/0x7a0 __handle_mm_fault+0xa8c/0xf20 ? find_held_lock+0x39/0xd0 handle_mm_fault+0x1b6/0x480 do_user_addr_fault+0x320/0xcd0 ? rcu_read_lock_sched_held+0x67/0xc0 exc_page_fault+0x77/0x3f0 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 Fixes: 83a3bfdb5a8a ("f2fs: indicate shutdown f2fs to allow unmount successfully") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: fix to account missing .skipped_gc_rwsemChao Yu
[ Upstream commit ad126ebddecbf696e0cf214ff56c7b170fa9f0f7 ] There is a missing place we forgot to account .skipped_gc_rwsem, fix it. Fixes: 6f8d4455060d ("f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: fix to keep compatibility of fault injection interfaceChao Yu
[ Upstream commit b96d9b3b09f0427b289332c6f6bfbf747a19b654 ] The value of FAULT_* macros and its description in f2fs.rst became inconsistent, fix this to keep compatibility of fault injection interface. Fixes: 67883ade7a98 ("f2fs: remove FAULT_ALLOC_BIO") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: reduce the scope of setting fsck tag when de->name_len is zeroYangtao Li
[ Upstream commit d4bf15a7ce172d186d400d606adf4f34a59130d6 ] I recently found a case where de->name_len is 0 in f2fs_fill_dentries() easily reproduced, and finally set the fsck flag. Thread A Thread B - f2fs_readdir - f2fs_read_inline_dir - ctx->pos = d.max - f2fs_add_dentry - f2fs_add_inline_entry - do_convert_inline_dir - f2fs_add_regular_entry - f2fs_readdir - f2fs_fill_dentries - set_sbi_flag(sbi, SBI_NEED_FSCK) Process A opens the folder, and has been reading without closing it. During this period, Process B created a file under the folder (occupying multiple f2fs_dir_entry, exceeding the d.max of the inline dir). After creation, process A uses the d.max of inline dir to read it again, and it will read that de->name_len is 0. And Chao pointed out that w/o inline conversion, the race condition still can happen as below: dir_entry1: A dir_entry2: B dir_entry3: C free slot: _ ctx->pos: ^ Thread A is traversing directory, ctx-pos moves to below position after readdir() by thread A: AAAABBBB___ ^ Then thread B delete dir_entry2, and create dir_entry3. Thread A calls readdir() to lookup dirents starting from middle of new dirent slots as below: AAAACCCCCC_ ^ In these scenarios, the file system is not damaged, and it's hard to avoid it. But we can bypass tagging FSCK flag if: a) bit_pos (:= ctx->pos % d->max) is non-zero and b) before bit_pos moves to first valid dir_entry. Fixes: ddf06b753a85 ("f2fs: fix to trigger fsck if dirent.name_len is zero") Signed-off-by: Yangtao Li <frank.li@vivo.com> [Chao: clean up description] Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: fix wrong checkpoint_changed value in f2fs_remount()Chao Yu
[ Upstream commit 277afbde6ca2b38729683fc17c031b4bc942068d ] In f2fs_remount(), return value of test_opt() is an unsigned int type variable, however when we compare it to a bool type variable, it cause wrong result, fix it. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: turn back remapped address in compressed page endioDaeho Jeong
[ Upstream commit 4931e0c93e124357308893a3e5e224cbeeabc721 ] Turned back the remmaped sector address to the address in the partition, when ending io, for compress cache to work properly. Fixes: 6ce19aff0b8c ("f2fs: compress: add compress_inode to cache compressed blocks") Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Hyeong Jun Kim <hj514.kim@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: do not submit NEW_ADDR to read node blockJaegeuk Kim
[ Upstream commit b7ec2061737f12c33e45beeb967d17f31abc1ada ] After the below patch, give cp is errored, we drop dirty node pages. This can give NEW_ADDR to read node pages. Don't do WARN_ON() which gives generic/475 failure. Fixes: 28607bf3aa6f ("f2fs: drop dirty node pages when cp is in error status") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: quota: fix potential deadlockChao Yu
[ Upstream commit 9de71ede81e6d1a111fdd868b2d78d459fa77f80 ] xfstest generic/587 reports a deadlock issue as below: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc1 #69 Not tainted ------------------------------------------------------ repquota/8606 is trying to acquire lock: ffff888022ac9320 (&sb->s_type->i_mutex_key#18){+.+.}-{3:3}, at: f2fs_quota_sync+0x207/0x300 [f2fs] but task is already holding lock: ffff8880084bcde8 (&sbi->quota_sem){.+.+}-{3:3}, at: f2fs_quota_sync+0x59/0x300 [f2fs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sbi->quota_sem){.+.+}-{3:3}: __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_read+0x3b/0x2a0 f2fs_quota_sync+0x59/0x300 [f2fs] f2fs_quota_on+0x48/0x100 [f2fs] do_quotactl+0x5e3/0xb30 __x64_sys_quotactl+0x23a/0x4e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (&sbi->cp_rwsem){++++}-{3:3}: __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_read+0x3b/0x2a0 f2fs_unlink+0x353/0x670 [f2fs] vfs_unlink+0x1c7/0x380 do_unlinkat+0x413/0x4b0 __x64_sys_unlinkat+0x50/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&sb->s_type->i_mutex_key#18){+.+.}-{3:3}: check_prev_add+0xdc/0xb30 validate_chain+0xa67/0xb20 __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_write+0x39/0xc0 f2fs_quota_sync+0x207/0x300 [f2fs] do_quotactl+0xaff/0xb30 __x64_sys_quotactl+0x23a/0x4e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: &sb->s_type->i_mutex_key#18 --> &sbi->cp_rwsem --> &sbi->quota_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sbi->quota_sem); lock(&sbi->cp_rwsem); lock(&sbi->quota_sem); lock(&sb->s_type->i_mutex_key#18); *** DEADLOCK *** 3 locks held by repquota/8606: #0: ffff88801efac0e0 (&type->s_umount_key#53){++++}-{3:3}, at: user_get_super+0xd9/0x190 #1: ffff8880084bc380 (&sbi->cp_rwsem){++++}-{3:3}, at: f2fs_quota_sync+0x3e/0x300 [f2fs] #2: ffff8880084bcde8 (&sbi->quota_sem){.+.+}-{3:3}, at: f2fs_quota_sync+0x59/0x300 [f2fs] stack backtrace: CPU: 6 PID: 8606 Comm: repquota Not tainted 5.14.0-rc1 #69 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 Call Trace: dump_stack_lvl+0xce/0x134 dump_stack+0x17/0x20 print_circular_bug.isra.0.cold+0x239/0x253 check_noncircular+0x1be/0x1f0 check_prev_add+0xdc/0xb30 validate_chain+0xa67/0xb20 __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_write+0x39/0xc0 f2fs_quota_sync+0x207/0x300 [f2fs] do_quotactl+0xaff/0xb30 __x64_sys_quotactl+0x23a/0x4e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f883b0b4efe The root cause is ABBA deadlock of inode lock and cp_rwsem, reorder locks in f2fs_quota_sync() as below to fix this issue: - lock inode - lock cp_rwsem - lock quota_sem Fixes: db6ec53b7e03 ("f2fs: add a rw_sem to cover quota flag changes") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: compress: fix to set zstd compress level correctlyChao Yu
[ Upstream commit 01f6afd0f3ccaa2d5f7fb87e7bd910dc17eef06b ] As 5kft reported in [1]: set_compress_context() should set compress level into .i_compress_flag for zstd as well as lz4hc, otherwise, zstd compressor will still use default zstd compress level during compression, fix it. [1] https://lore.kernel.org/linux-f2fs-devel/8e29f52b-6b0d-45ec-9520-e63eb254287a@www.fastmail.com/T/#u Fixes: 3fde13f817e2 ("f2fs: compress: support compress level") Reported-by: 5kft <5kft@5kft.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18f2fs: fix to do sanity check for sb/cp fields correctlyChao Yu
commit 65ddf6564843890a58ee3b18bb46ce67d96333fb upstream. This patch fixes below problems of sb/cp sanity check: - in sanity_check_raw_superi(), it missed to consider log header blocks while cp_payload check. - in f2fs_sanity_check_ckpt(), it missed to check nat_bits_blocks. Cc: <stable@kernel.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-18f2fs: let's keep writing IOs on SBI_NEED_FSCKJaegeuk Kim
commit 1ffc8f5f7751f91fe6af527d426a723231b741a6 upstream. SBI_NEED_FSCK is an indicator that fsck.f2fs needs to be triggered, so it is not fully critical to stop any IO writes. So, let's allow to write data instead of reporting EIO forever given SBI_NEED_FSCK, but do keep OPU. Fixes: 955772787667 ("f2fs: drop inplace IO if fs status is abnormal") Cc: <stable@kernel.org> # v5.13+ Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-15f2fs: guarantee to write dirty data when enabling checkpoint backJaegeuk Kim
commit dddd3d65293a52c2c3850c19b1e5115712e534d8 upstream. We must flush all the dirty data when enabling checkpoint back. Let's guarantee that first by adding a retry logic on sync_inodes_sb(). In addition to that, this patch adds to flush data in fsync when checkpoint is disabled, which can mitigate the sync_inodes_sb() failures in advance. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-03f2fs: report correct st_size for encrypted symlinksEric Biggers
commit 461b43a8f92e68e96c4424b31e15f2b35f1bbfa9 upstream. The stat() family of syscalls report the wrong size for encrypted symlinks, which has caused breakage in several userspace programs. Fix this by calling fscrypt_symlink_getattr() after f2fs_getattr() for encrypted symlinks. This function computes the correct size by reading and decrypting the symlink target (if it's not already cached). For more details, see the commit which added fscrypt_symlink_getattr(). Fixes: cbaf042a3cc6 ("f2fs crypto: add symlink encryption") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210702065350.209646-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-06f2fs: drop dirty node pages when cp is in error statusJaegeuk Kim
Otherwise, writeback is going to fall in a loop to flush dirty inode forever before getting SBI_CLOSING. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-07-05f2fs: initialize page->private when using for our internal useJaegeuk Kim
We need to guarantee it's initially zero. Otherwise, it'll hurt entire flag operations. Fixes: b763f3bedc2d ("f2fs: restructure f2fs page.private layout") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-07-01f2fs: compress: add nocompress extensions supportFengnan Chang
When we create a directory with enable compression, all file write into directory will try to compress.But sometimes we may know, new file cannot meet compression ratio requirements. We need a nocompress extension to skip those files to avoid unnecessary compress page test. After add nocompress_extension, the priority should be: dir_flag < comp_extention,nocompress_extension < comp_file_flag, no_comp_file_flag. Priority in between FS_COMPR_FL, FS_NOCOMP_FS, extensions: * compress_extension=so; nocompress_extension=zip; chattr +c dir; touch dir/foo.so; touch dir/bar.zip; touch dir/baz.txt; then foo.so and baz.txt should be compresse, bar.zip should be non-compressed. chattr +c dir/bar.zip can enable compress on bar.zip. * compress_extension=so; nocompress_extension=zip; chattr -c dir; touch dir/foo.so; touch dir/bar.zip; touch dir/baz.txt; then foo.so should be compresse, bar.zip and baz.txt should be non-compressed. chattr+c dir/bar.zip; chattr+c dir/baz.txt; can enable compress on bar.zip and baz.txt. Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-28f2fs: remove false alarm on iget failure during GCJaegeuk Kim
This patch removes setting SBI_NEED_FSCK when GC gets an error on f2fs_iget, since f2fs_iget can give ENOMEM and others by race condition. If we set this critical fsck flag, we'll get EIO during fsync via the below code path. In f2fs_inplace_write_data(), if (is_sbi_flag_set(sbi, SBI_NEED_FSCK) || f2fs_cp_error(sbi)) { err = -EIO; goto drop_bio; } Fixes: 9557727876674 ("f2fs: drop inplace IO if fs status is abnormal") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-28f2fs: enable extent cache for compression files in read-onlyDaeho Jeong
Let's allow extent cache for RO partition. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: introduce f2fs_casefolded_name slab cacheChao Yu
Add a slab cache: "f2fs_casefolded_name" for memory allocation of casefold name. Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: swap: support migrating swapfile in aligned write modeChao Yu
This patch supports to migrate swapfile in aligned write mode during swapon in order to keep swapfile being aligned to section as much as possible, then pinned swapfile will locates fully filled section which may not affected by GC. However, for the case that swapfile's size is not aligned to section size, it will still leave last extent in file's tail as unaligned due to its size is smaller than section size, like case #2. case #1 xfs_io -f /mnt/f2fs/file -c "pwrite 0 4M" -c "fsync" Before swapon: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..3047]: 1123352..1126399 3048 0x1000 1: [3048..7143]: 237568..241663 4096 0x1000 2: [7144..8191]: 245760..246807 1048 0x1001 After swapon: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..8191]: 249856..258047 8192 0x1001 Kmsg: F2FS-fs (zram0): Swapfile (2) is not align to section: 1) creat(), 2) ioctl(F2FS_IOC_SET_PIN_FILE), 3) fallocate(2097152 * n) case #2 xfs_io -f /mnt/f2fs/file -c "pwrite 0 3M" -c "fsync" Before swapon: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..3047]: 246808..249855 3048 0x1000 1: [3048..6143]: 237568..240663 3096 0x1001 After swapon: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..4095]: 258048..262143 4096 0x1000 1: [4096..6143]: 238616..240663 2048 0x1001 Kmsg: F2FS-fs (zram0): Swapfile: last extent is not aligned to section F2FS-fs (zram0): Swapfile (2) is not align to section: 1) creat(), 2) ioctl(F2FS_IOC_SET_PIN_FILE), 3) fallocate(2097152 * n) Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: swap: remove dead codesChao Yu
After commit af4b6b8edf6a ("f2fs: introduce check_swap_activate_fast()"), we will never run into original logic of check_swap_activate() before f2fs supports non 4k-sized page, so let's delete those dead codes. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: compress: add compress_inode to cache compressed blocksChao Yu
Support to use address space of inner inode to cache compressed block, in order to improve cache hit ratio of random read. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: clean up /sys/fs/f2fs/<disk>/featuresJaegeuk Kim
Let's create /sys/fs/f2fs/<disk>/feature_list/ to meet sysfs rule. Note that there are three feature list entries: 1) /sys/fs/f2fs/features : shows runtime features supported by in-kernel f2fs along with Kconfig. - ref. F2FS_FEATURE_RO_ATTR() 2) /sys/fs/f2fs/$s_id/features <deprecated> : shows on-disk features enabled by mkfs.f2fs, used for old kernels. This won't add new feature anymore, and thus, users should check entries in 3) instead of this 2). 3) /sys/fs/f2fs/$s_id/feature_list : shows on-disk features enabled by mkfs.f2fs per instance, which follows sysfs entry rule where each entry should expose single value. This list covers old feature list provided by 2) and beyond. Therefore, please add new on-disk feature in this list only. - ref. F2FS_SB_FEATURE_RO_ATTR() Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: add pin_file in feature listJaegeuk Kim
This patch adds missing pin_file feature supported by kernel. Fixes: f5a53edcf01e ("f2fs: support aligned pinned file") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: Advertise encrypted casefolding in sysfsDaniel Rosenberg
Older kernels don't support encryption with casefolding. This adds the sysfs entry encrypted_casefold to show support for those combined features. Support for this feature was originally added by commit 7ad08a58bf67 ("f2fs: Handle casefolding with Encryption") Fixes: 7ad08a58bf67 ("f2fs: Handle casefolding with Encryption") Cc: stable@vger.kernel.org # v5.11+ Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: Show casefolding support only when supportedDaniel Rosenberg
The casefolding feature is only supported when CONFIG_UNICODE is set. This modifies the feature list f2fs presents under sysfs accordingly. Fixes: 5aba54302a46 ("f2fs: include charset encoding information in the superblock") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: support RO featureJaegeuk Kim
Given RO feature in superblock, we don't need to check provisioning/reserve spaces and SSA area. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: logging neateningJoe Perches
Update the logging uses that have unnecessary newlines as the f2fs_printk function and so its f2fs_<level> macro callers already adds one. This allows searching single line logging entries with an easier grep and also avoids unnecessary blank lines in the logging. Miscellanea: o Coalesce formats o Align to open parenthesis Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: introduce FI_COMPRESS_RELEASED instead of using IMMUTABLE bitJaegeuk Kim
Once we release compressed blocks, we used to set IMMUTABLE bit. But it turned out it disallows every fs operations which we don't need for compression. Let's just prevent writing data only. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23f2fs: compress: remove unneeded preallocationChao Yu
We will reserve iblocks for compression saved, so during compressed cluster overwrite, we don't need to preallocate blocks for later write. In addition, it adds a bug_on to detect wrong reserved iblock number in __f2fs_cluster_blocks(). Bug fix in the original patch by Jaegeuk: If we released compressed blocks having an immutable bit, we can see less number of compressed block addresses. Let's fix wrong BUG_ON. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-26f2fs: atgc: export entries for better tunability via sysfsChao Yu
This patch export below sysfs entries for better ATGC tunability. /sys/fs/f2fs/<disk>/atgc_candidate_ratio /sys/fs/f2fs/<disk>/atgc_candidate_count /sys/fs/f2fs/<disk>/atgc_age_weight /sys/fs/f2fs/<disk>/atgc_age_threshold Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-26f2fs: compress: fix to disallow temp extensionChao Yu
This patch restricts to configure compress extension as format of: [filename + '.' + extension] rather than: [filename + '.' + extension + (optional: '.' + temp extension)] in order to avoid to enable compression incorrectly: 1. compress_extension=so 2. touch file.soa 3. touch file.so.tmp Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-26f2fs: let's allow compression for mmap filesJaegeuk Kim
This patch allows to compress mmap files. E.g., for so files. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-26f2fs: add MODULE_SOFTDEP to ensure crc32 is included in the initramfsChao Yu
As marcosfrm reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213089 Initramfs generators rely on "pre" softdeps (and "depends") to include additional required modules. F2FS does not declare "pre: crc32" softdep. Then every generator (dracut, mkinitcpio...) has to maintain a hardcoded list for this purpose. Hence let's use MODULE_SOFTDEP("pre: crc32") in f2fs code. Fixes: 43b6573bac95 ("f2fs: use cryptoapi crc32 functions") Reported-by: marcosfrm <marcosfrm@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-26f2fs: return success if there is no work to doTom Rix
Static analysis reports this problem file.c:3206:2: warning: Undefined or garbage value returned to caller return err; ^~~~~~~~~~ err is only set if there is some work to do. Because the loop returns immediately on an error, if all the work was done, a 0 would be returned. Instead of checking the unlikely case that there was no work to do, change the return of err to 0. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: compress: clean up parameter of __f2fs_cluster_blocks()Chao Yu
Previously, in order to reuse __f2fs_cluster_blocks(), f2fs_is_compressed_cluster() assigned a compress_ctx type variable, which is used to pass few parameters (cc.inode, cc.cluster_size, cc.cluster_idx), it's wasteful to allocate such large space in stack. Let's clean up parameters of __f2fs_cluster_blocks() to avoid that. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: compress: remove unneeded f2fs_put_dnode()Chao Yu
If we don't initialize dn.inode_page for f2fs_get_block(), f2fs_get_block() will call f2fs_put_dnode() itself, so let's remove unneeded f2fs_put_dnode() in f2fs_vm_page_mkwrite(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: atgc: fix to set default age thresholdChao Yu
Default age threshold value is missed to set, fix it. Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Reported-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: Prevent swap file in LFS modeShin'ichiro Kawasaki
The kernel writes to swap files on f2fs directly without the assistance of the filesystem. This direct write by kernel can be non-sequential even when the f2fs is in LFS mode. Such non-sequential write conflicts with the LFS semantics. Especially when f2fs is set up on zoned block devices, the non-sequential write causes unaligned write command errors. To avoid the non-sequential writes to swap files, prevent swap file activation when the filesystem is in LFS mode. Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.10+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: fix to avoid racing on fsync_entry_slab by multi filesystem instancesChao Yu
As syzbot reported, there is an use-after-free issue during f2fs recovery: Use-after-free write at 0xffff88823bc16040 (in kfence-#10): kmem_cache_destroy+0x1f/0x120 mm/slab_common.c:486 f2fs_recover_fsync_data+0x75b0/0x8380 fs/f2fs/recovery.c:869 f2fs_fill_super+0x9393/0xa420 fs/f2fs/super.c:3945 mount_bdev+0x26c/0x3a0 fs/super.c:1367 legacy_get_tree+0xea/0x180 fs/fs_context.c:592 vfs_get_tree+0x86/0x270 fs/super.c:1497 do_new_mount fs/namespace.c:2905 [inline] path_mount+0x196f/0x2be0 fs/namespace.c:3235 do_mount fs/namespace.c:3248 [inline] __do_sys_mount fs/namespace.c:3456 [inline] __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433 do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae The root cause is multi f2fs filesystem instances can race on accessing global fsync_entry_slab pointer, result in use-after-free issue of slab cache, fixes to init/destroy this slab cache only once during module init/destroy procedure to avoid this issue. Reported-by: syzbot+9d90dad32dd9727ed084@syzkaller.appspotmail.com Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: restructure f2fs page.private layoutChao Yu
Restruct f2fs page private layout for below reasons: There are some cases that f2fs wants to set a flag in a page to indicate a specified status of page: a) page is in transaction list for atomic write b) page contains dummy data for aligned write c) page is migrating for GC d) page contains inline data for inline inode flush e) page belongs to merkle tree, and is verified for fsverity f) page is dirty and has filesystem/inode reference count for writeback g) page is temporary and has decompress io context reference for compression There are existed places in page structure we can use to store f2fs private status/data: - page.flags: PG_checked, PG_private - page.private However it was a mess when we using them, which may cause potential confliction: page.private PG_private PG_checked page._refcount (+1 at most) a) -1 set +1 b) -2 set c), d), e) set f) 0 set +1 g) pointer set The other problem is page.flags has no free slot, if we can avoid set zero to page.private and set PG_private flag, then we use non-zero value to indicate PG_private status, so that we may have chance to reclaim PG_private slot for other usage. [1] The other concern is f2fs has bad scalability in aspect of indicating more page status. So in this patch, let's restructure f2fs' page.private as below to solve above issues: Layout A: lowest bit should be 1 | bit0 = 1 | bit1 | bit2 | ... | bit MAX | private data .... | bit 0 PAGE_PRIVATE_NOT_POINTER bit 1 PAGE_PRIVATE_ATOMIC_WRITE bit 2 PAGE_PRIVATE_DUMMY_WRITE bit 3 PAGE_PRIVATE_ONGOING_MIGRATION bit 4 PAGE_PRIVATE_INLINE_INODE bit 5 PAGE_PRIVATE_REF_RESOURCE bit 6- f2fs private data Layout B: lowest bit should be 0 page.private is a wrapped pointer. After the change: page.private PG_private PG_checked page._refcount (+1 at most) a) 11 set +1 b) 101 set +1 c) 1001 set +1 d) 10001 set +1 e) set f) 100001 set +1 g) pointer set +1 [1] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org/T/#u Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: add cp_error check in f2fs_write_compressed_pagesChao Yu
This patch adds cp_error check in f2fs_write_compressed_pages() like we did in f2fs_write_single_data_page() Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: compress: rename __cluster_may_compressChao Yu
This patch renames __cluster_may_compress() to cluster_has_invalid_data() for better readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-12f2fs: return EINVAL for hole cases in swap fileJaegeuk Kim
This tries to fix xfstests/generic/495. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-11f2fs: avoid swapon failure by giving a warning firstJaegeuk Kim
The final solution can be migrating blocks to form a section-aligned file internally. Meanwhile, let's ask users to do that when preparing the swap file initially like: 1) create() 2) ioctl(F2FS_IOC_SET_PIN_FILE) 3) fallocate() Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 36e4d95891ed ("f2fs: check if swapfile is section-alligned") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-11f2fs: compress: fix to assign cc.cluster_idx correctlyChao Yu
In f2fs_destroy_compress_ctx(), after f2fs_destroy_compress_ctx(), cc.cluster_idx will be cleared w/ NULL_CLUSTER, f2fs_cluster_blocks() may check wrong cluster metadata, fix it. Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-11f2fs: compress: fix race condition of overwrite vs truncateChao Yu
pos_fsstress testcase complains a panic as belew: ------------[ cut here ]------------ kernel BUG at fs/f2fs/compress.c:1082! invalid opcode: 0000 [#1] SMP PTI CPU: 4 PID: 2753477 Comm: kworker/u16:2 Tainted: G OE 5.12.0-rc1-custom #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Workqueue: writeback wb_workfn (flush-252:16) RIP: 0010:prepare_compress_overwrite+0x4c0/0x760 [f2fs] Call Trace: f2fs_prepare_compress_overwrite+0x5f/0x80 [f2fs] f2fs_write_cache_pages+0x468/0x8a0 [f2fs] f2fs_write_data_pages+0x2a4/0x2f0 [f2fs] do_writepages+0x38/0xc0 __writeback_single_inode+0x44/0x2a0 writeback_sb_inodes+0x223/0x4d0 __writeback_inodes_wb+0x56/0xf0 wb_writeback+0x1dd/0x290 wb_workfn+0x309/0x500 process_one_work+0x220/0x3c0 worker_thread+0x53/0x420 kthread+0x12f/0x150 ret_from_fork+0x22/0x30 The root cause is truncate() may race with overwrite as below, so that one reference count left in page can not guarantee the page attaching in mapping tree all the time, after truncation, later find_lock_page() may return NULL pointer. - prepare_compress_overwrite - f2fs_pagecache_get_page - unlock_page - f2fs_setattr - truncate_setsize - truncate_inode_page - delete_from_page_cache - find_lock_page Fix this by avoiding referencing updated page. Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>