Skip to content

Commit d6fd0ae

Browse files
osandovkdave
authored andcommitted
Btrfs: fix missing delayed iputs on unmount
There's a race between close_ctree() and cleaner_kthread(). close_ctree() sets btrfs_fs_closing(), and the cleaner stops when it sees it set, but this is racy; the cleaner might have already checked the bit and could be cleaning stuff. In particular, if it deletes unused block groups, it will create delayed iputs for the free space cache inodes. As of "btrfs: don't run delayed_iputs in commit", we're no longer running delayed iputs after a commit. Therefore, if the cleaner creates more delayed iputs after delayed iputs are run in btrfs_commit_super(), we will leak inodes on unmount and get a busy inode crash from the VFS. Fix it by parking the cleaner before we actually close anything. Then, any remaining delayed iputs will always be handled in btrfs_commit_super(). This also ensures that the commit in close_ctree() is really the last commit, so we can get rid of the commit in cleaner_kthread(). The fstest/generic/475 followed by 476 can trigger a crash that manifests as a slab corruption caused by accessing the freed kthread structure by a wake up function. Sample trace: [ 5657.077612] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc [ 5657.079432] PGD 1c57a067 P4D 1c57a067 PUD da10067 PMD 0 [ 5657.080661] Oops: 0000 [#1] PREEMPT SMP [ 5657.081592] CPU: 1 PID: 5157 Comm: fsstress Tainted: G W 4.19.0-rc8-default+ torvalds#323 [ 5657.083703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014 [ 5657.086577] RIP: 0010:shrink_page_list+0x2f9/0xe90 [ 5657.091937] RSP: 0018:ffffb5c745c8f728 EFLAGS: 00010287 [ 5657.092953] RAX: 0000000000000074 RBX: ffffb5c745c8f830 RCX: 0000000000000000 [ 5657.094590] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9a8747fdf3d0 [ 5657.095987] RBP: ffffb5c745c8f9e0 R08: 0000000000000000 R09: 0000000000000000 [ 5657.097159] R10: ffff9a8747fdf5e8 R11: 0000000000000000 R12: ffffb5c745c8f788 [ 5657.098513] R13: ffff9a877f6ff2c0 R14: ffff9a877f6ff2c8 R15: dead000000000200 [ 5657.099689] FS: 00007f948d853b80(0000) GS:ffff9a877d600000(0000) knlGS:0000000000000000 [ 5657.101032] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5657.101953] CR2: 00000000000000cc CR3: 00000000684bd000 CR4: 00000000000006e0 [ 5657.103159] Call Trace: [ 5657.103776] shrink_inactive_list+0x194/0x410 [ 5657.104671] shrink_node_memcg.constprop.84+0x39a/0x6a0 [ 5657.105750] shrink_node+0x62/0x1c0 [ 5657.106529] try_to_free_pages+0x1a4/0x500 [ 5657.107408] __alloc_pages_slowpath+0x2c9/0xb20 [ 5657.108418] __alloc_pages_nodemask+0x268/0x2b0 [ 5657.109348] kmalloc_large_node+0x37/0x90 [ 5657.110205] __kmalloc_node+0x236/0x310 [ 5657.111014] kvmalloc_node+0x3e/0x70 Fixes: 30928e9 ("btrfs: don't run delayed_iputs in commit") Signed-off-by: Omar Sandoval <[email protected]> Reviewed-by: David Sterba <[email protected]> [ add trace ] Signed-off-by: David Sterba <[email protected]>
1 parent ac765f8 commit d6fd0ae

File tree

1 file changed

+15
-36
lines changed

1 file changed

+15
-36
lines changed

fs/btrfs/disk-io.c

+15-36
Original file line numberDiff line numberDiff line change
@@ -1664,9 +1664,8 @@ static int cleaner_kthread(void *arg)
16641664
struct btrfs_root *root = arg;
16651665
struct btrfs_fs_info *fs_info = root->fs_info;
16661666
int again;
1667-
struct btrfs_trans_handle *trans;
16681667

1669-
do {
1668+
while (1) {
16701669
again = 0;
16711670

16721671
/* Make the cleaner go to sleep early. */
@@ -1715,42 +1714,16 @@ static int cleaner_kthread(void *arg)
17151714
*/
17161715
btrfs_delete_unused_bgs(fs_info);
17171716
sleep:
1717+
if (kthread_should_park())
1718+
kthread_parkme();
1719+
if (kthread_should_stop())
1720+
return 0;
17181721
if (!again) {
17191722
set_current_state(TASK_INTERRUPTIBLE);
1720-
if (!kthread_should_stop())
1721-
schedule();
1723+
schedule();
17221724
__set_current_state(TASK_RUNNING);
17231725
}
1724-
} while (!kthread_should_stop());
1725-
1726-
/*
1727-
* Transaction kthread is stopped before us and wakes us up.
1728-
* However we might have started a new transaction and COWed some
1729-
* tree blocks when deleting unused block groups for example. So
1730-
* make sure we commit the transaction we started to have a clean
1731-
* shutdown when evicting the btree inode - if it has dirty pages
1732-
* when we do the final iput() on it, eviction will trigger a
1733-
* writeback for it which will fail with null pointer dereferences
1734-
* since work queues and other resources were already released and
1735-
* destroyed by the time the iput/eviction/writeback is made.
1736-
*/
1737-
trans = btrfs_attach_transaction(root);
1738-
if (IS_ERR(trans)) {
1739-
if (PTR_ERR(trans) != -ENOENT)
1740-
btrfs_err(fs_info,
1741-
"cleaner transaction attach returned %ld",
1742-
PTR_ERR(trans));
1743-
} else {
1744-
int ret;
1745-
1746-
ret = btrfs_commit_transaction(trans);
1747-
if (ret)
1748-
btrfs_err(fs_info,
1749-
"cleaner open transaction commit returned %d",
1750-
ret);
17511726
}
1752-
1753-
return 0;
17541727
}
17551728

17561729
static int transaction_kthread(void *arg)
@@ -3931,6 +3904,13 @@ void close_ctree(struct btrfs_fs_info *fs_info)
39313904
int ret;
39323905

39333906
set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
3907+
/*
3908+
* We don't want the cleaner to start new transactions, add more delayed
3909+
* iputs, etc. while we're closing. We can't use kthread_stop() yet
3910+
* because that frees the task_struct, and the transaction kthread might
3911+
* still try to wake up the cleaner.
3912+
*/
3913+
kthread_park(fs_info->cleaner_kthread);
39343914

39353915
/* wait for the qgroup rescan worker to stop */
39363916
btrfs_qgroup_wait_for_completion(fs_info, false);
@@ -3958,9 +3938,8 @@ void close_ctree(struct btrfs_fs_info *fs_info)
39583938

39593939
if (!sb_rdonly(fs_info->sb)) {
39603940
/*
3961-
* If the cleaner thread is stopped and there are
3962-
* block groups queued for removal, the deletion will be
3963-
* skipped when we quit the cleaner thread.
3941+
* The cleaner kthread is stopped, so do one final pass over
3942+
* unused block groups.
39643943
*/
39653944
btrfs_delete_unused_bgs(fs_info);
39663945

0 commit comments

Comments
 (0)