Restoring a snapshot causes snapshot delete to block until drop_caches is called #736

ticpu · 2024-08-27T02:13:36Z

Details

Setup

Kernel: 6.11, bcachefs-testing branch, commit fcd6549.
bcachefs is mounted at /mnt/bcachefs.
/opt is a bind mount of /mnt/bcachefs/opt.
Daily snapshots of opt are taken in /mnt/bcachefs/snapshots/opt.

Rollback operation:

While JetBrains Toolbox was updating all applications, my computer crashed, and I wanted to start again from a clean state.

1. Confirm with fuser -vm /opt that no application is using /opt; nothing is configured to use /mnt/bcachefs/opt directly.
2. umount /opt
3. cd /mnt/bcachefs
4. mv opt opt.dead
5. bcachefs subvolume snapshot snapshots/opt/@GMT-2024.08.26-05.00.18/ opt
6. ls opt: all files are present.
7. mount --bind /mnt/bcachefs/opt /opt
8. Restart services.
9. bcachefs subvolume del ./opt.dead
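For anyone trying to reproduce this, the rollback steps above can be collected into a small script. This is a minimal Python sketch, assuming the same layout as in the report: the paths and snapshot name are copied from it, the `cd` is replaced by absolute paths, and the `--execute` flag and helper names are hypothetical. It prints the commands by default and only runs them when asked. The `ls opt` check and the service restart stay manual.

```python
import shlex
import subprocess
import sys

# Hypothetical defaults mirroring the layout described in the report.
FS_ROOT = "/mnt/bcachefs"
SNAPSHOT = "snapshots/opt/@GMT-2024.08.26-05.00.18/"


def rollback_commands(fs_root: str, snapshot: str) -> list[list[str]]:
    """Rollback steps from the report as argv lists (absolute paths instead of cd)."""
    return [
        ["umount", "/opt"],
        ["mv", f"{fs_root}/opt", f"{fs_root}/opt.dead"],
        ["bcachefs", "subvolume", "snapshot", f"{fs_root}/{snapshot}", f"{fs_root}/opt"],
        ["mount", "--bind", f"{fs_root}/opt", "/opt"],
        # Deleting the old subvolume is the step that blocked in this report;
        # "del" is kept exactly as typed in the report.
        ["bcachefs", "subvolume", "del", f"{fs_root}/opt.dead"],
    ]


def opt_in_use() -> bool:
    """fuser exits with status 0 when at least one process accesses the mount."""
    return subprocess.run(["fuser", "-m", "/opt"], capture_output=True).returncode == 0


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    if not dry_run and opt_in_use():
        sys.exit("/opt is still in use; stop those processes first")
    for argv in cmds:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run unless the hypothetical --execute flag is given.
    run(rollback_commands(FS_ROOT, SNAPSHOT), dry_run="--execute" not in sys.argv)
```

Running it without arguments only prints the five commands, which makes it easy to compare against the manual steps before letting it touch the filesystem.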

Problem

1. Everything starts as expected; I/O is happening on all devices.
2. I restart the updates from JetBrains Toolbox.
3. Repeating stack traces appear in dmesg: stack1.txt
4. Unmounting the filesystem fixes the issue, and the deletion ends with:
bch2_delete_dead_snapshots: error deleting keys from dying snapshots erofs_trans_commit
bch2_delete_dead_snapshots: error erofs_trans_commit
shutdown complete, journal seq 34783919.

Reproducing the problem with more debug information.

1. Repeat the rollback, then steps 1 and 2 of Problem above.
2. A new debug message appears repeatedly: bch2_evict_subvolume_inodes() waited 10 seconds for inode 671283974:6768 to go away: ref 1 state 65536
3. echo w shows the same stack for the blocked delete: stack2.txt
4. After waiting about half an hour, I ran echo 3 > /proc/sys/vm/drop_caches.
5. The repeating log stops, and multiple gigabytes of discard operations start on both NVMe drives.
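To check from a saved dmesg capture whether the warning keeps naming the same inode until drop_caches is run, a minimal Python sketch can count the messages per inode. The regular expression is written against the single message quoted above and is an assumption about how other occurrences are formatted. The inode identifier is kept as the raw "a:b" string because the report does not say what its two fields mean. The script name in the usage comment is hypothetical.

```python
import re
import sys
from collections import Counter

# Pattern written against the one message quoted in the report; other kernel
# builds may word it differently.
EVICT_RE = re.compile(
    r"bch2_evict_subvolume_inodes\(\) waited (?P<secs>\d+) seconds "
    r"for inode (?P<inode>\d+:\d+) to go away: "
    r"ref (?P<ref>\d+) state (?P<state>\d+)"
)


def count_stuck_inodes(lines) -> Counter:
    """Count how often each inode identifier appears in the eviction warning."""
    counts: Counter = Counter()
    for line in lines:
        m = EVICT_RE.search(line)
        if m:
            counts[m.group("inode")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): dmesg > dmesg.txt; python3 stuck_inodes.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for inode, n in count_stuck_inodes(f).most_common():
            print(f"{inode}: {n} warnings")
```

A single inode with a steadily growing count would match what was observed here: the subvolume delete waiting on one inode that only goes away once the caches are dropped.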
