-
Notifications
You must be signed in to change notification settings - Fork 89
[NSFS | NC | GLACIER] Add support for tape reclaim #9241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[NSFS | NC | GLACIER] Add support for tape reclaim #9241
Conversation
WalkthroughAdds end-to-end Glacier tape reclaim: new config flags, CLI reclaim action, manage_nsfs reclaim handler, Glacier reclaim flow and cluster lock, TapeCloud reclaim execution, NamespaceFS reclaim WAL logging, and native DMAPI xattr exposure for tape UID. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client as App/Client
participant NFS as NamespaceFS
participant FS as Filesystem (stat/xattrs)
participant WAL as Reclaim WAL
Client->>NFS: finalize upload / delete
alt DMAPI reclaim enabled
NFS->>FS: stat & read xattrs (incl. IBMUID)
NFS->>WAL: append(full_path, logical_size, xattrs)
WAL-->>NFS: ack
end
NFS-->>Client: operation complete
sequenceDiagram
autonumber
actor Op as Operator/Timer
participant CLI as manage_nsfs
participant MNG as manage_nsfs_glacier
participant G as Glacier.perform("RECLAIM")
participant Lock as ClusterLock
participant TC as TapeCloudGlacier
participant Utils as TapeCloudUtils
Op->>CLI: run --glacier reclaim
CLI->>MNG: manage_glacier_operations(RECLAIM)
MNG->>G: perform(fs_context, "RECLAIM")
G->>Lock: acquire GLACIER_RECLAIM_CLUSTER_LOCK
alt interval & free-space checks pass
G->>TC: reclaim(fs_context, log_file, failure_recorder)
TC->>Utils: reclaim(WAL entry file)
Utils-->>TC: returns (logs errors)
TC-->>G: processed entries
G->>G: update reclaim timestamp file
else skipped
G-->>MNG: noop
end
G->>Lock: release
MNG-->>CLI: done
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (1)
src/sdk/namespace_fs.js (1)
2128-2140: Pass the stat to avoid duplicate filesystem calls.In both deletion paths, the code calls
append_to_reclaim_walwithout passing thestatparameter, which causes the method to fetch the stat again from the filesystem. However, the stat is already available in the lifecycle deletion path from the verification step.Optimize by passing the stat when available:
At line 2128, in the lifecycle deletion path:
try { files = await this._open_files(fs_context, { src_path: file_path, delete_version: true }); + const stat = await files.delete_version.src_file.stat(fs_context); await this._verify_lifecycle_filter_and_unlink(fs_context, params, file_path, files.delete_version); - await this.append_to_reclaim_wal(fs_context, file_path); + await this.append_to_reclaim_wal(fs_context, file_path, stat); } catch (err) {Note: The non-lifecycle path at lines 2135-2140 doesn't have easy access to the stat, so it's acceptable to let the method fetch it.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
config.js(3 hunks)src/cmd/manage_nsfs.js(1 hunks)src/manage_nsfs/manage_nsfs_constants.js(2 hunks)src/manage_nsfs/manage_nsfs_glacier.js(2 hunks)src/sdk/glacier.js(7 hunks)src/sdk/glacier_tapecloud.js(4 hunks)src/sdk/namespace_fs.js(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/cmd/manage_nsfs.js (1)
src/manage_nsfs/manage_nsfs_constants.js (1)
GLACIER_ACTIONS(25-30)
src/manage_nsfs/manage_nsfs_glacier.js (1)
src/util/native_fs_utils.js (2)
fs_context(549-554)config(10-10)
src/sdk/glacier.js (3)
src/cmd/manage_nsfs.js (3)
fs_context(73-73)type(57-57)native_fs_utils(23-23)src/manage_nsfs/manage_nsfs_glacier.js (5)
fs_context(13-13)fs_context(28-28)fs_context(42-42)fs_context(62-62)native_fs_utils(9-9)src/util/native_fs_utils.js (1)
fs_context(549-554)
src/sdk/glacier_tapecloud.js (3)
src/util/native_fs_utils.js (1)
fs_context(549-554)src/manage_nsfs/manage_nsfs_glacier.js (4)
fs_context(13-13)fs_context(28-28)fs_context(42-42)fs_context(62-62)src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (3)
fs_context(191-191)fs_context(303-303)fs_context(439-439)
src/sdk/namespace_fs.js (2)
src/manage_nsfs/manage_nsfs_glacier.js (7)
config(6-6)nb_native(7-7)nb_native(85-85)fs_context(13-13)fs_context(28-28)fs_context(42-42)fs_context(62-62)src/native/fs/fs_napi.cpp (2)
stat(2029-2033)stat(2030-2030)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Noobaa Image
- GitHub Check: run-jest-unit-tests
- GitHub Check: run-package-lock-validation
🔇 Additional comments (6)
src/manage_nsfs/manage_nsfs_constants.js (1)
29-29: LGTM!The RECLAIM action is correctly added to GLACIER_ACTIONS and properly configured in VALID_OPTIONS_GLACIER following the same pattern as existing glacier actions.
Also applies to: 76-76
src/cmd/manage_nsfs.js (1)
882-884: LGTM!The RECLAIM action handler is correctly integrated following the same pattern as other glacier operations (MIGRATE, RESTORE, EXPIRY).
config.js (1)
931-931: LGTM!The new configuration flag is appropriately placed in the GLACIER configuration section with a conservative default of
false, requiring explicit opt-in for tape reclaim functionality.src/sdk/glacier.js (1)
24-24: LGTM!The reclaim infrastructure is correctly integrated:
- New constants follow naming conventions (RECLAIM_TIMESTAMP_FILE, RECLAIM_WAL_NAME, GLACIER_RECLAIM_CLUSTER_LOCK)
- Base
reclaim()method properly throws 'Unimplemented' for subclass implementationperform()correctly handles "RECLAIM" type with single-phase processing (no staging)Note: The reclaim flow intentionally differs from MIGRATE/RESTORE by using direct log processing without a staging phase, which is appropriate for cleanup operations.
Also applies to: 89-89, 101-101, 197-209, 229-229, 296-299
src/sdk/namespace_fs.js (2)
3771-3780: LGTM - reclaim_wal getter follows established patterns.The static getter for
reclaim_walis correctly implemented and follows the same pattern asmigrate_walandrestore_wal.
3854-3855: LGTM - static field declaration is correct.The static field declaration for
_reclaim_walis consistent with other WAL declarations in the class.
fa98cef to
1217008
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/sdk/namespace_fs.js (2)
1408-1409: Logic inconsistency: stat destination before file is moved there, but then log wrong path.This code has a critical flaw flagged in the previous review that remains unaddressed:
- You stat
file_path(the destination) before the uploaded file has been moved there fromupload_path- Errors are silently suppressed with
.catch(_.noop), potentially hiding real issues- At line 1460, you log
upload_pathto the reclaim WAL, notfile_path, which is inconsistent with the stat you just performedIf the intent is to detect tape reclaim on the existing destination file (before overwrite), then:
file_pathmight not exist yet (ENOENT for new uploads)- You should log
file_path(the file being replaced), notupload_path- ENOENT should be handled explicitly
Apply this diff to fix the logic:
-const file_path_stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM && - await nb_native().fs.stat(fs_context, file_path).catch(_.noop); +let file_path_stat; +if (config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM && !same_inode && !part_upload) { + try { + file_path_stat = await nb_native().fs.stat(fs_context, file_path); + } catch (err) { + if (err.code !== 'ENOENT') { + dbg.error('_finish_upload: failed to stat destination for reclaim check', file_path, err); + } + // Destination doesn't exist yet (new file), no reclaim needed + } +}And update line 1460 to use
file_path:if (file_path_stat && file_path_stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]) { - await this.append_to_reclaim_wal(fs_context, upload_path, file_path_stat); + await this.append_to_reclaim_wal(fs_context, file_path, file_path_stat); }
3725-3745: Add error handling to prevent reclaim logging failures from breaking main operations.The method lacks error handling for the stat operation and WAL append, which could cause unhandled promise rejections. While debug console.log statements mentioned in the previous review appear to have been removed, the error handling issue remains critical.
Apply this diff:
async append_to_reclaim_wal(fs_context, file_path, stat) { if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return; - if (!stat) { - stat = await nb_native().fs.stat(fs_context, file_path); + try { + if (!stat) { + stat = await nb_native().fs.stat(fs_context, file_path); + } + + const data = JSON.stringify({ + full_path: file_path, + logical_size: stat.size, + ea: stat.xattr, + }); + await NamespaceFS.reclaim_wal.append(data); + } catch (err) { + // Log error but don't fail the operation + dbg.error('append_to_reclaim_wal: failed to log reclaim entry', file_path, err); } - - const data = JSON.stringify({ - full_path: file_path, - logical_size: stat.size, - ea: stat.xattr, - }); - await NamespaceFS.reclaim_wal.append(data); }This ensures that reclaim logging failures don't break the main operation flow (upload, delete, etc.).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
.editorconfig(0 hunks)config.js(3 hunks)src/cmd/manage_nsfs.js(1 hunks)src/manage_nsfs/manage_nsfs_constants.js(2 hunks)src/manage_nsfs/manage_nsfs_glacier.js(2 hunks)src/native/fs/fs_napi.cpp(2 hunks)src/sdk/glacier.js(7 hunks)src/sdk/glacier_tapecloud.js(4 hunks)src/sdk/namespace_fs.js(6 hunks)
💤 Files with no reviewable changes (1)
- .editorconfig
🚧 Files skipped from review as they are similar to previous changes (2)
- config.js
- src/manage_nsfs/manage_nsfs_constants.js
🧰 Additional context used
🧬 Code graph analysis (2)
src/manage_nsfs/manage_nsfs_glacier.js (1)
src/util/native_fs_utils.js (2)
fs_context(549-554)config(10-10)
src/cmd/manage_nsfs.js (1)
src/manage_nsfs/manage_nsfs_constants.js (1)
GLACIER_ACTIONS(25-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Noobaa Image
- GitHub Check: run-package-lock-validation
- GitHub Check: run-jest-unit-tests
🔇 Additional comments (3)
src/sdk/namespace_fs.js (3)
2128-2128: LGTM: Reclaim logging after lifecycle deletion.The call to
append_to_reclaim_walhere correctly logs files deleted via lifecycle policies. The error handling appropriately catches ENOENT in case the file was already deleted.
2135-2140: LGTM: Reclaim logging after regular deletion.The call to
append_to_reclaim_walhere correctly logs files after successful deletion in non-lifecycle flows. The try-catch properly handles expected errors (ENOENT, EISDIR) while allowing unexpected errors to propagate.
3769-3778: LGTM: Proper lazy initialization for reclaim WAL logger.The static getter follows the established pattern used by
migrate_walandrestore_wal, with appropriate configuration for shared locking and polling interval.
| static async reclaim(file) { | ||
| try { | ||
| dbg.log1("Starting reclaim for file", file); | ||
| const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true }); | ||
| dbg.log4("reclaim finished with:", out); | ||
| dbg.log1("Finished reclaim for file", file); | ||
| } catch (error) { | ||
| dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error); | ||
| } | ||
|
|
||
| return true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Propagate reclaim failures so WAL entries aren’t lost.
TapeCloudUtils.reclaim() always resolves to true, even when exec() fails. That tells PersistentLogger.process() the reclaim batch succeeded, so the WAL file is deleted and the entries are never retried. Please return false (or rethrow) on failure so the WAL stays on disk for the next run.
static async reclaim(file) {
try {
dbg.log1("Starting reclaim for file", file);
const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true });
dbg.log4("reclaim finished with:", out);
dbg.log1("Finished reclaim for file", file);
- } catch (error) {
- dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
+ return true;
+ } catch (error) {
+ dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
+ return false;
}
-
- return true;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| static async reclaim(file) { | |
| try { | |
| dbg.log1("Starting reclaim for file", file); | |
| const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true }); | |
| dbg.log4("reclaim finished with:", out); | |
| dbg.log1("Finished reclaim for file", file); | |
| } catch (error) { | |
| dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error); | |
| } | |
| return true; | |
| } | |
| static async reclaim(file) { | |
| try { | |
| dbg.log1("Starting reclaim for file", file); | |
| const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true }); | |
| dbg.log4("reclaim finished with:", out); | |
| dbg.log1("Finished reclaim for file", file); | |
| return true; | |
| } catch (error) { | |
| dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error); | |
| return false; | |
| } | |
| } |
🤖 Prompt for AI Agents
In src/sdk/glacier_tapecloud.js around lines 196-207 the async reclaim() always
returns true even when exec() throws, causing PersistentLogger to treat failures
as successes and delete WAL entries; change the catch to return false (or
rethrow) so failures propagate — keep the existing error logging, and ensure the
function only returns true on successful exec and returns false inside the catch
block so the WAL is retained for retry.
1217008 to
0cd41c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
config.js(3 hunks)src/cmd/manage_nsfs.js(1 hunks)src/manage_nsfs/manage_nsfs_constants.js(2 hunks)src/manage_nsfs/manage_nsfs_glacier.js(2 hunks)src/native/fs/fs_napi.cpp(2 hunks)src/sdk/glacier.js(7 hunks)src/sdk/glacier_tapecloud.js(4 hunks)src/sdk/namespace_fs.js(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- config.js
- src/cmd/manage_nsfs.js
- src/manage_nsfs/manage_nsfs_glacier.js
- src/sdk/glacier_tapecloud.js
- src/sdk/namespace_fs.js
🧰 Additional context used
🧬 Code graph analysis (1)
src/sdk/glacier.js (4)
src/cmd/manage_nsfs.js (3)
fs_context(73-73)type(57-57)native_fs_utils(23-23)src/manage_nsfs/manage_nsfs_glacier.js (5)
fs_context(13-13)fs_context(28-28)fs_context(42-42)fs_context(62-62)native_fs_utils(9-9)src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (3)
fs_context(191-191)fs_context(303-303)fs_context(439-439)src/util/native_fs_utils.js (1)
fs_context(549-554)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Noobaa Image
- GitHub Check: run-jest-unit-tests
- GitHub Check: run-package-lock-validation
🔇 Additional comments (9)
src/manage_nsfs/manage_nsfs_constants.js (2)
25-30: LGTM!The addition of the RECLAIM action follows the existing pattern and is consistent with other Glacier actions.
72-77: LGTM!The reclaim option is correctly configured with the same pattern as other Glacier operations.
src/native/fs/fs_napi.cpp (2)
53-57: LGTM!The IBMUID DMAPI extended attribute macros follow the established pattern for other DMAPI xattrs (IBMObj, IBMPMig, IBMTPS) and are correctly constructed.
256-261: LGTM!The GPFS_DMAPI_XATTR_TAPE_UID is correctly added to the GPFS_DMAPI_XATTRS vector, ensuring it will be retrieved alongside other DMAPI attributes.
src/sdk/glacier.js (5)
24-24: LGTM!The RECLAIM_TIMESTAMP_FILE constant follows the established naming pattern for other timestamp files.
89-89: LGTM!The RECLAIM_WAL_NAME constant follows the naming pattern of other WAL constants.
101-101: LGTM!The GLACIER_RECLAIM_CLUSTER_LOCK constant follows the established naming pattern for cluster locks.
229-229: LGTM!The perform method signature is correctly updated to include "RECLAIM" in the type union.
296-299: LGTM!The RECLAIM handling correctly uses a cluster lock and processes the reclaim WAL logs. The simpler pattern (without staging) is appropriate for cleanup operations.
0cd41c1 to
d747791
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/sdk/glacier.js (2)
75-84: Tighten IBMUID JSDoc phrasing.Redundant wording; simplify for clarity.
Apply:
- /** - * GPFS_DMAPI_XATTR_TAPE_UID xattr contains UID which contains the unique ID of the UID + /** + * GPFS_DMAPI_XATTR_TAPE_UID xattr contains the unique ID of the file
197-209: Fix typos in reclaim stub.
- JSDoc: “inindexed” → “unindexed”.
- Error text: “Unimplementented” → “Unimplemented”.
Apply:
- /** - * reclaim cleans up inindexed items in the underlying + /** + * reclaim cleans up unindexed items in the underlying @@ - async reclaim(fs_context, log_file, failure_recorder) { - throw new Error('Unimplementented'); + async reclaim(fs_context, log_file, failure_recorder) { + throw new Error('Unimplemented'); }src/sdk/namespace_fs.js (1)
3730-3750: Make append_to_reclaim_wal non-failing and filter entries.This must never break uploads/deletes. Wrap in try/catch, and log only when DMAPI tape xattr is present. Optionally, trim xattrs (privacy).
Apply:
- async append_to_reclaim_wal(fs_context, file_path, stat) { - if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return; - - if (!stat) { - stat = await nb_native().fs.stat(fs_context, file_path); - } - - const data = JSON.stringify({ - full_path: file_path, - logical_size: stat.size, - ea: stat.xattr, - }); - await NamespaceFS.reclaim_wal.append(data); - } + async append_to_reclaim_wal(fs_context, file_path, stat) { + if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return; + try { + if (!stat) { + stat = await nb_native().fs.stat(fs_context, file_path); + } + if (!stat?.xattr?.[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]) return; + const data = JSON.stringify({ + full_path: file_path, + logical_size: stat.size, + // keep only DMAPI-related EAs to reduce noise/sensitivity + ea: { + [Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR], + [Glacier.GPFS_DMAPI_XATTR_TAPE_PREMIG]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_PREMIG], + [Glacier.GPFS_DMAPI_XATTR_TAPE_TPS]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_TPS], + [Glacier.GPFS_DMAPI_XATTR_TAPE_UID]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_UID], + }, + }); + await NamespaceFS.reclaim_wal.append(data); + } catch (err) { + dbg.warn('append_to_reclaim_wal: failed to log reclaim entry', file_path, err); + } + }
🧹 Nitpick comments (1)
src/sdk/namespace_fs.js (1)
2136-2145: Deletion path: WAL logging should gate on DMAPI indicator and avoid noisy warns.
- Avoid logging all deletes; check DMAPI indicator before appending.
- Wrap stat + append in try/catch; don’t warn on ENOENT.
Apply:
- try { - const stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM && - await nb_native().fs.stat(fs_context, file_path).catch(dbg.warn.bind(this)); - await nb_native().fs.unlink(fs_context, file_path); - if (stat) { - await this.append_to_reclaim_wal(fs_context, file_path, stat); - } - } catch (err) { + try { + let stat; + if (config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) { + try { + stat = await nb_native().fs.stat(fs_context, file_path); + } catch (e) { + if (!['ENOENT', 'EISDIR'].includes(e.code)) dbg.warn('delete_object stat failed', file_path, e); + } + } + await nb_native().fs.unlink(fs_context, file_path); + if (stat?.xattr?.[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]) { + await this.append_to_reclaim_wal(fs_context, file_path, stat).catch(dbg.warn.bind(this)); + } + } catch (err) { if (err.code !== 'ENOENT' && err.code !== 'EISDIR') throw err; }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
config.js(3 hunks)src/cmd/manage_nsfs.js(1 hunks)src/manage_nsfs/manage_nsfs_constants.js(2 hunks)src/manage_nsfs/manage_nsfs_glacier.js(2 hunks)src/native/fs/fs_napi.cpp(2 hunks)src/sdk/glacier.js(7 hunks)src/sdk/glacier_tapecloud.js(4 hunks)src/sdk/namespace_fs.js(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- src/manage_nsfs/manage_nsfs_glacier.js
- src/sdk/glacier_tapecloud.js
- src/native/fs/fs_napi.cpp
- config.js
🧰 Additional context used
🧬 Code graph analysis (2)
src/cmd/manage_nsfs.js (1)
src/manage_nsfs/manage_nsfs_constants.js (1)
GLACIER_ACTIONS(25-30)
src/sdk/glacier.js (1)
src/util/native_fs_utils.js (1)
fs_context(549-554)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-package-lock-validation
- GitHub Check: Build Noobaa Image
- GitHub Check: run-jest-unit-tests
🔇 Additional comments (6)
src/cmd/manage_nsfs.js (1)
882-884: Add RECLAIM action handling — looks good.Switch case correctly wires to manage_nsfs_glacier.process_reclaim().
Optionally verify manage_nsfs_glacier exports process_reclaim and CLI help includes 'reclaim'.
src/manage_nsfs/manage_nsfs_constants.js (1)
29-30: RECLAIM action and options registered — good coverage.GLACIER_ACTIONS and VALID_OPTIONS_GLACIER updated consistently.
Also applies to: 76-77
src/sdk/glacier.js (1)
24-25: RECLAIM constants and perform() branch — LGTM.New timestamp/WAL/lock constants and perform() path align with existing patterns.
Also applies to: 89-90, 101-102, 296-300
src/sdk/namespace_fs.js (3)
3774-3783: Expose reclaim_wal — LGTM.Consistent with migrate/restore WALs; SHARED writer lock matches usage.
3811-3813: Lifecycle deletion WAL append — OK, will be safe after making append non-throwing.No further action if append_to_reclaim_wal is wrapped.
Ensure reclaim processing tolerates mixed entries with trimmed xattrs as proposed.
3860-3862: Static _reclaim_wal init — LGTM.
Signed-off-by: Utkarsh Srivastava <[email protected]> add all kind of tests for lifecycle Signed-off-by: Utkarsh Srivastava <[email protected]>
d747791 to
5b6ed9f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (6)
src/sdk/glacier_tapecloud.js (2)
471-484: Unused parameter: failure_recorder in reclaimfailure_recorder is accepted but never used. Either wire a failure recorder path (if partial failures will be tracked) or drop the param for clarity. The former is preferable if retry semantics are needed.
196-207: Don’t return success on reclaim failures — keep WAL for retryTapeCloudUtils.reclaim() always returns true, even when exec() throws. This will delete WAL files and lose entries.
Apply this fix to propagate failure:
- static async reclaim(file) { + static async reclaim(file) { try { dbg.log1("Starting reclaim for file", file); const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true }); dbg.log4("reclaim finished with:", out); dbg.log1("Finished reclaim for file", file); - } catch (error) { - dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error); - } - - return true; + return true; + } catch (error) { + dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error); + return false; + } }src/sdk/glacier.js (2)
75-84: Tidy IBMUID JSDoc phrasingClarify wording.
- /** - * GPFS_DMAPI_XATTR_TAPE_UID xattr contains UID which contains the unique ID of the UID + /** + * GPFS_DMAPI_XATTR_TAPE_UID xattr contains the unique ID of the file
197-209: Fix JSDoc typo in reclaim description"inindexed" → "unindexed".
- /** - * reclaim cleans up inindexed items in the underlying + /** + * reclaim cleans up unindexed items in the underlyingsrc/sdk/namespace_fs.js (2)
1410-1411: Consider explicit ENOENT handling instead of silently swallowing all errors.The
.catch(_.noop)suppresses all stat errors (EPERM, EIO, etc.), not just ENOENT when the destination doesn't exist yet. While this might not break functionality, it could hide unexpected issues.Consider applying the suggested fix from the past review comment:
-const file_path_stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM && - await nb_native().fs.stat(fs_context, file_path).catch(_.noop); +let file_path_stat; +if (config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) { + try { + file_path_stat = await nb_native().fs.stat(fs_context, file_path); + } catch (err) { + if (err.code !== 'ENOENT') { + dbg.warn('append_to_reclaim_wal: failed to stat destination', file_path, err); + } + // Destination doesn't exist yet or stat failed, no reclaim needed + } +}
3733-3753: Add error handling to prevent reclaim logging failures from breaking operations.The past review comment requested wrapping this method's body in try-catch so failures don't bubble up and break the main operation flow. This has not been implemented yet.
As suggested in the previous review, apply this error handling:
async append_to_reclaim_wal(fs_context, file_path, stat) { if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return; - if (!stat) { - stat = await nb_native().fs.stat(fs_context, file_path); - } - - const data = JSON.stringify({ - full_path: file_path, - logical_size: stat.size, - ea: stat.xattr, - }); - await NamespaceFS.reclaim_wal.append(data); + try { + if (!stat) { + stat = await nb_native().fs.stat(fs_context, file_path); + } + + const data = JSON.stringify({ + full_path: file_path, + logical_size: stat.size, + ea: stat.xattr, + }); + await NamespaceFS.reclaim_wal.append(data); + } catch (err) { + // Log error but don't fail the operation + dbg.error('append_to_reclaim_wal: failed to log reclaim entry', file_path, err); + } }Based on learnings
🧹 Nitpick comments (1)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (1)
171-179: Close reclaim WAL in teardown to avoid FD leakNamespaceFS._reclaim_wal is opened but not closed in mocha.after(). Add close similar to migrate/restore to avoid dangling handles.
mocha.after(async function() { - await Promise.all([ - fs.rm(ns_src_bucket_path, { recursive: true, force: true }), - fs.rm(config.NSFS_GLACIER_LOGS_DIR, { recursive: true, force: true }), - ]); + try { + await NamespaceFS._reclaim_wal?.close(); + } finally { + await Promise.all([ + fs.rm(ns_src_bucket_path, { recursive: true, force: true }), + fs.rm(config.NSFS_GLACIER_LOGS_DIR, { recursive: true, force: true }), + ]); + } });
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
config.js(3 hunks)src/cmd/manage_nsfs.js(1 hunks)src/manage_nsfs/manage_nsfs_constants.js(2 hunks)src/manage_nsfs/manage_nsfs_glacier.js(2 hunks)src/native/fs/fs_napi.cpp(2 hunks)src/sdk/glacier.js(7 hunks)src/sdk/glacier_tapecloud.js(4 hunks)src/sdk/namespace_fs.js(7 hunks)src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/manage_nsfs/manage_nsfs_constants.js
- config.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*
⚙️ CodeRabbit configuration file
src/test/**/*.*: Ensure that the PR includes tests for the changes.
Files:
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
🧬 Code graph analysis (5)
src/cmd/manage_nsfs.js (1)
src/manage_nsfs/manage_nsfs_constants.js (1)
GLACIER_ACTIONS(25-30)
src/manage_nsfs/manage_nsfs_glacier.js (3)
src/cmd/manage_nsfs.js (3)
fs_context(73-73)native_fs_utils(23-23)config(18-18)src/manage_nsfs/manage_nsfs_cli_utils.js (2)
fs_context(125-125)native_fs_utils(7-7)src/util/native_fs_utils.js (2)
fs_context(549-554)config(10-10)
src/sdk/namespace_fs.js (2)
src/manage_nsfs/manage_nsfs_glacier.js (7)
config(6-6)nb_native(7-7)nb_native(85-85)fs_context(13-13)fs_context(28-28)fs_context(42-42)fs_context(62-62)src/native/fs/fs_napi.cpp (2)
stat(2032-2036)stat(2033-2033)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (3)
src/manage_nsfs/manage_nsfs_glacier.js (8)
require(5-5)require(8-8)require(10-10)backend(14-14)backend(29-29)backend(43-43)backend(63-63)config(6-6)src/sdk/glacier.js (4)
require(7-7)require(10-10)config(9-9)s3_utils(6-6)src/sdk/namespace_fs.js (8)
require(26-26)require(29-29)require(30-30)lifecycle_utils(27-27)config(13-13)s3_utils(15-15)buffer_utils(18-18)crypto(14-14)
src/sdk/glacier.js (3)
src/cmd/manage_nsfs.js (3)
fs_context(73-73)type(57-57)native_fs_utils(23-23)src/manage_nsfs/manage_nsfs_glacier.js (5)
fs_context(13-13)fs_context(28-28)fs_context(42-42)fs_context(62-62)native_fs_utils(9-9)src/util/native_fs_utils.js (1)
fs_context(549-554)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Noobaa Image
- GitHub Check: run-package-lock-validation
- GitHub Check: run-jest-unit-tests
🔇 Additional comments (12)
src/native/fs/fs_napi.cpp (3)
260-260: LGTM! Addition to GPFS_DMAPI_XATTRS vector is correct.The new xattr is properly added to the
GPFS_DMAPI_XATTRSvector, which is used byget_fd_gpfs_xattr()(line 482) to retrieve GPFS-specific extended attributes whenuse_dmapiis enabled. This change is backward compatible and follows the established pattern.
53-260: Note: PR description and tests need attention.While the code changes are correct and minimal, please note:
- The PR description contains only placeholder text with no actual explanation of the changes
- The PR checklist shows "Tests added" is unchecked
Given that this adds support for a new DMAPI xattr for Glacier tape reclaim, please:
- Update the PR description to explain the purpose of the IBMUID xattr
- Add tests to verify the xattr is correctly retrieved when
use_dmapiis enabled- Document the expected format/content of the IBMUID xattr value
53-57: LGTM! IBMUID attribute name verified and implementation is consistent.The new macros follow the exact same naming convention and pattern as the existing DMAPI xattrs. Verification across the codebase confirms "IBMUID" is the correct GPFS DMAPI extended attribute name for tape UID—it's already used in
src/sdk/glacier.jswith documentation confirming it indicates file migration/premigration status. The C++ implementation properly mirrors the JavaScript constant and adds the xattr to the retrieval vector using the established pattern.src/cmd/manage_nsfs.js (1)
882-884: RECLAIM action wiring LGTMDispatch to manage_nsfs_glacier.process_reclaim() with break is correct and consistent with other actions.
src/manage_nsfs/manage_nsfs_glacier.js (2)
61-73: process_reclaim logic looks correctUses NSFS_GLACIER_RECLAIM_INTERVAL and timestamp gating; updates timestamp post-run.
145-145: Export added correctlyprocess_reclaim exported alongside other flows.
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (1)
540-596: Reclaim test coverage is solidValidates single/multi delete, overwrite, and lifecycle-led delete entries in reclaim WAL. No issues noted.
Also applies to: 598-657, 659-685
src/sdk/glacier.js (1)
296-300: perform('RECLAIM') path LGTMCluster lock + processing RECLAIM WAL mirrors migrate/restore patterns.
src/sdk/namespace_fs.js (4)
1459-1462: LGTM: Reclaim logging for tape-resident files.The logic correctly checks for the tape indicator xattr and logs the destination file for reclaim before it's overwritten. The path consistency issue mentioned in past reviews has been addressed.
3777-3786: LGTM: Standard lazy initialization pattern.The reclaim WAL getter follows the same pattern as
migrate_walandrestore_wal, with proper lazy initialization and consistent configuration.
3814-3816: LGTM: Lifecycle deletion reclaim logging with error handling.The call to
append_to_reclaim_walis properly wrapped with.catch(dbg.warn.bind(this))to ensure logging failures don't break the lifecycle deletion flow.
3863-3864: LGTM: Static field declaration.Properly declares the static field for the reclaim WAL instance.
| try { | ||
| const stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM && | ||
| await nb_native().fs.stat(fs_context, file_path).catch(dbg.warn.bind(this)); | ||
| await nb_native().fs.unlink(fs_context, file_path); | ||
| if (stat) { | ||
| await this.append_to_reclaim_wal(fs_context, file_path, stat); | ||
| } | ||
| } catch (err) { | ||
| if (err.code !== 'ENOENT' && err.code !== 'EISDIR') throw err; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix error handling in non-lifecycle deletion reclaim logging.
The current error handling has several issues:
.catch(dbg.warn.bind(this))on the stat call (line 2137) returns undefined on error, silently suppressing stat failures while still attempting unlink- If
append_to_reclaim_walthrows (no try-catch around it), it will fail the entire deletion operation - The outer try-catch only handles ENOENT/EISDIR from unlink, not from reclaim logging
Apply this pattern to ensure reclaim logging failures don't break deletions:
- try {
- const stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM &&
- await nb_native().fs.stat(fs_context, file_path).catch(dbg.warn.bind(this));
- await nb_native().fs.unlink(fs_context, file_path);
- if (stat) {
- await this.append_to_reclaim_wal(fs_context, file_path, stat);
- }
- } catch (err) {
- if (err.code !== 'ENOENT' && err.code !== 'EISDIR') throw err;
- }
+ try {
+ let stat;
+ if (config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) {
+ try {
+ stat = await nb_native().fs.stat(fs_context, file_path);
+ } catch (err) {
+ dbg.warn('_delete_single_object: failed to stat for reclaim', file_path, err);
+ }
+ }
+ await nb_native().fs.unlink(fs_context, file_path);
+ if (stat) {
+ await this.append_to_reclaim_wal(fs_context, file_path, stat).catch(err => {
+ dbg.warn('_delete_single_object: failed to log reclaim', file_path, err);
+ });
+ }
+ } catch (err) {
+ if (err.code !== 'ENOENT' && err.code !== 'EISDIR') throw err;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try { | |
| const stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM && | |
| await nb_native().fs.stat(fs_context, file_path).catch(dbg.warn.bind(this)); | |
| await nb_native().fs.unlink(fs_context, file_path); | |
| if (stat) { | |
| await this.append_to_reclaim_wal(fs_context, file_path, stat); | |
| } | |
| } catch (err) { | |
| if (err.code !== 'ENOENT' && err.code !== 'EISDIR') throw err; | |
| } | |
| try { | |
| let stat; | |
| if (config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) { | |
| try { | |
| stat = await nb_native().fs.stat(fs_context, file_path); | |
| } catch (err) { | |
| dbg.warn('_delete_single_object: failed to stat for reclaim', file_path, err); | |
| } | |
| } | |
| await nb_native().fs.unlink(fs_context, file_path); | |
| if (stat) { | |
| await this.append_to_reclaim_wal(fs_context, file_path, stat).catch(err => { | |
| dbg.warn('_delete_single_object: failed to log reclaim', file_path, err); | |
| }); | |
| } | |
| } catch (err) { | |
| if (err.code !== 'ENOENT' && err.code !== 'EISDIR') throw err; | |
| } |
Describe the Problem
Tape keeps on accumulating deleted data and needs a way to reclaim the space. IBM Diamondback has this feature but isn't exposed in NooBaa. This PR adds support for the same feature.
Explain the Changes
This PR adds support for tape reclaim. It adds a new command under
noobaa-cli glaciercommands. Tape reclaim feature just like restore and migrate relies on journaling but it is disabled by default and must be enabled byNSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIMtotrue.Entries are added to the reclaim log/journal under 3 circumstances:
Testing Instructions:
./node_modules/.bin/mocha src/test/unit_tests/nsfs/test_nsfs_glacier_backend.jsSummary by CodeRabbit
New Features
Improvements
Tests
Style