feat(library): Add incremental library scanning #280
wilywyrm wants to merge 3 commits into tranxuanthang:main
Conversation
Learning Rust or Vue just for the sake of making a feature request shows how dedicated you are to this project, plus the improvements you made are insane lol
Haha no, you're giving me too much credit! I develop software sometimes and I was feeling lazy, so I prompted Claude a few times into making these changes. These days, I mostly use LRCGET to manually adjust auto-generated lyric timings from a tool I built around whisper/stable-ts before I upload them, so I ran into needing to rescan a lot.
Update: Just applied your PR to my LRCGET dev folder, and DAMN the improvement is crazy. Those weren't counted in the incremental library database, and will always be tagged as new files, which makes it scan them every time I open the program. I would love to help but I don't know anything about sqlite lol, so I'll be awaiting your updates!
Ah yeah, good point, thanks. I did see this happen sometimes but left it in so I knew the scan was still able to pick up "new" files. We should probably store the mtime of files that failed to be processed for reasons like missing lyrics, but not for network-related issues.
I didn't check too closely but I think I've addressed the known unsupported files case in the latest changes. Test it if you're feeling brave.
Just applied the update and it's much faster now; you really nailed it. Previously, I had 67 items that failed to process (intentionally), and because they weren't added to the DB, they would get tagged as "new" items and be rescanned every time, even though nothing in the content/metadata had changed. Much appreciated, now it's truly ready to be merged.
Would love to see this merged 😁
Background
Users currently must manually trigger a full rebuild of the music library in order to get any changes picked up by LRCGET. Since LRCGET must retrieve each entire file in order to scan its metadata, this makes updates for any sizable music library very slow, especially if the files must be retrieved from slower storage across the network.
As most files will not change in between scans in normal usage, LRCGET should support some recordkeeping to detect when tracks/lyric files have changed in order to avoid unnecessary lookup.
Implementation
At its heart, this change adds modification time (mtime) fields for media files, txt, and lrc files to the sqlite database's tracks table, which are then compared to the mtimes on disk to determine if the whole file needs to be retrieved and processed again as part of a scan.
If the mtimes haven't changed, we don't retrieve the full file. If the file does not exist in sqlite, we try to process it. If we fail to process it because of missing metadata that is required to add it to the tracks table, we instead add a record of the file to the new failed_files table, along with the corresponding mtime. Because missing metadata errors would require modification of files on disk before they could possibly be reprocessed successfully, we skip failed files whose mtimes have not changed on disk during later rescans.
In order to make this operation cancellable through the UI, I added an Arc scan_cancel_flag which is shared between the webview and the backend. All write operations in the scan are wrapped in a transaction which can be dropped before it is committed, and the logic checks for the scan_cancel_flag at multiple points during the scan.
Additions:
get_track() (in case lyrics on disk are newer than db)
As a piece of anecdata, my music library of ~5500 songs/145GB sits on a NAS and a full scan would take 1926s (32 minutes). An incremental scan of the library completes in 32s, which is mostly spent waiting on the metadata retrieval calls from disk. 🎉
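For readers who want the skip/process rule from the Implementation section in concrete form, here is a rough sketch. It is not the code in this PR: StoredMtimes, ScanAction, decide, and disk_mtime are hypothetical names, the two HashMaps stand in for the tracks and failed_files tables, and storing mtimes as whole seconds since the Unix epoch is an assumption.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// What the scanner should do with one file found on disk.
#[derive(Debug, PartialEq)]
enum ScanAction {
    /// The mtime matches a stored record (a track or a known failure),
    /// so the full file does not need to be read again.
    Skip,
    /// The file is unknown or its mtime changed, so read and process it.
    Process,
}

/// Hypothetical in-memory stand-in for the mtimes kept in sqlite,
/// keyed by file path. Values are seconds since the Unix epoch (assumed).
struct StoredMtimes {
    tracks: HashMap<String, i64>,
    failed_files: HashMap<String, i64>,
}

/// Reads a file's modification time from disk without reading its contents.
fn disk_mtime(path: &Path) -> std::io::Result<i64> {
    let modified = fs::metadata(path)?.modified()?;
    // Timestamps before the epoch are treated as 0 in this sketch.
    let secs = modified
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() as i64)
        .unwrap_or(0);
    Ok(secs)
}

/// Applies the rule described above: skip only when a stored mtime
/// (from either table) equals the mtime currently on disk.
fn decide(stored: &StoredMtimes, path: &str, mtime_on_disk: i64) -> ScanAction {
    let known = stored
        .tracks
        .get(path)
        .or_else(|| stored.failed_files.get(path));
    match known {
        Some(&mtime) if mtime == mtime_on_disk => ScanAction::Skip,
        _ => ScanAction::Process,
    }
}
```

In this sketch a previously failed file is retried only after its mtime changes, which matches the reasoning that a missing-metadata failure cannot resolve itself without the file being edited on disk.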
(Disclaimer: I'm not very familiar with Rust or Vue, so I leaned on LLM tooling very hard for the code changes in this PR. I still did do a good bit of manual testing and this description was written manually.)
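The cancellation approach described in the Implementation section can be illustrated roughly as follows. This is a simplified sketch, not the PR's code: StagedWrites is a hypothetical stand-in for a sqlite transaction (writes only become visible on commit and are discarded if it is dropped first), scan is a hypothetical function, and representing the flag as an Arc<AtomicBool> is an assumption about how scan_cancel_flag is typed.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a database transaction: rows are staged in
/// `pending` and only reach `committed` when `commit` is called.
struct StagedWrites<'a> {
    committed: &'a mut Vec<String>,
    pending: Vec<String>,
}

impl<'a> StagedWrites<'a> {
    fn new(committed: &'a mut Vec<String>) -> Self {
        StagedWrites { committed, pending: Vec::new() }
    }

    fn write(&mut self, row: String) {
        self.pending.push(row);
    }

    /// Consumes the transaction and makes the staged rows visible.
    fn commit(self) {
        let StagedWrites { committed, pending } = self;
        committed.extend(pending);
    }
}

/// Scans `paths`, checking the shared cancel flag before each file and once
/// more before committing. Returns true if the scan was committed.
fn scan(paths: &[&str], db: &mut Vec<String>, cancel: &Arc<AtomicBool>) -> bool {
    let mut tx = StagedWrites::new(db);
    for path in paths {
        if cancel.load(Ordering::Relaxed) {
            // Returning here drops `tx`, so nothing staged is written.
            return false;
        }
        tx.write(path.to_string());
    }
    if cancel.load(Ordering::Relaxed) {
        return false;
    }
    tx.commit();
    true
}
```

In a real app the UI side would hold a clone of the same Arc and call store(true, Ordering::Relaxed) on it when the user cancels; the scan then exits at its next check without committing any partial writes.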