Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hardlink support #1954

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Conversation

ChengyuZhu6
Copy link
Contributor

Propose the implementation of a hardlink feature in the caching mechanism to optimize memory usage, improve performance and save disk space.

Fixes: #1953

Depends-on: #1948

@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 3 times, most recently from ff5ecf3 to 3e921b6 Compare January 24, 2025 02:44
Comment on lines +313 to +316
if linkPath, exists := dc.hlManager.GetLink(key); exists {
if _, err := os.Stat(linkPath); err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just doing wipfile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure I fully understand your point. Are you asking, 'Why do we need to check if a file with the same content has been hardlinked?' If that’s the case, let me explain: If a file has the same content, we wouldn’t need to create a new cache file; instead, we could simply hardlink to the existing one and use it. This approach would be beneficial for performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful if I can try to refactor the CreateLink function to use the wipfile function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I have misunderstood anything, please feel free to correct me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Got your point! I'll refactor my code.

cache/cache.go Outdated
Comment on lines 233 to 234
if linkPath, exists := dc.hlManager.GetLink(key); exists {
if r, err := os.Open(linkPath); err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the performance benefit comparing to just doing os.Open(dc.cachePath(key))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not related to the os.Open function. However, I think it's a good point, and I can reuse the dc.cachePath(key) function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got your point. I'll refactor my code.


## Conclusion

Enabling hardlinking in `stargz-snapshotter` can significantly optimize storage and improve performance. By following the steps outlined in this guide, you can leverage this feature to enhance your containerized environments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numbers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to specific numbers regarding performance or storage savings?

@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 2 times, most recently from b879c95 to 457b45b Compare February 5, 2025 11:17
@ChengyuZhu6 ChengyuZhu6 requested a review from ktock February 5, 2025 11:18
@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 2 times, most recently from 8829e66 to 92702af Compare February 10, 2025 06:20
@AkihiroSuda
Copy link
Member

Needs rebase

- Adds the EnableHardlink configuration option
- Adds the HardlinkCapability interface
- Updates the directoryCache struct to support hardlinks
- Adds logging for hardlink configuration
- Updates the layer package to pass through hardlink configuration
- Concurrent access testing

Fixes: containerd#1953

Signed-off-by: ChengyuZhu6 <[email protected]>
1. Implements the core HardlinkManager type with methods for:
   - Creating hardlinks with retry logic and atomic operations
   - Getting existing hardlinks
   - Cleaning up expired hardlinks
   - Persisting and restoring hardlink information
2. Adds a global singleton instance of HardlinkManager
3. Implements periodic cleanup of expired hardlinks
4. Adds proper error handling and logging
5. Uses atomic operations for file operations to ensure consistency

Signed-off-by: ChengyuZhu6 <[email protected]>
- Initializes the hardlink manager in the directory cache
- Updates `Get()` to try hardlinks first before other cache methods
- Updates `Add()` to check for existing hardlinks and create new ones
- Implements the HardlinkCapability interface methods
- Adds proper error handling and logging throughout
- Introduce a new function `wrapMemoryWriter` help function to handle the memory caching logic

Signed-off-by: ChengyuZhu6 <[email protected]>
1. Added batched persistence with a 5-second interval
2. Made persistence asynchronous using a background worker
3. Removed pretty-printing of JSON to reduce file size
4. Added proper cleanup on manager close
5. Reduced logging verbosity
6. Added dirty flag to track changes needing persistence

Signed-off-by: ChengyuZhu6 <[email protected]>
1. Adds comprehensive tests for the hardlink manager functionality:
   - CreateLink tests
   - GetLink tests
   - Cleanup tests
   - Persistence/Restoration tests
   - Concurrent operation tests
   - Disabled functionality tests
2. Adds tests for disabled hardlink functionality
3. Verifies proper error handling and edge cases
4. Tests the integration with the directory cache

The tests ensure that:
   - Hardlinks are created correctly and point to the same inode
   - Invalid links are handled gracefully
   - Cleanup works as expected
   - Link information is properly persisted and restored
   - The system works correctly under concurrent access
   - The cache works properly when hardlinks are disabled

Signed-off-by: ChengyuZhu6 <[email protected]>
This commit introduces a comprehensive guide on how to enable and use
the hardlink feature in stargz-snapshotter. The documentation includes:

- An overview of the hardlink feature and its benefits.
- Step-by-step instructions for enabling hardlinking in the configuration.
- Details on how hardlinking works within the cache system.
- Monitoring, maintenance, and troubleshooting tips.
- A conclusion highlighting the performance and storage optimization benefits.

This guide aims to help users optimize their storage solutions by leveraging
hardlinks in containerized environments.

Signed-off-by: ChengyuZhu6 <[email protected]>
@ChengyuZhu6
Copy link
Contributor Author

Needs rebase

Done.

- Move common path and file creation logic to utils.go
- Add `BuildCachePath` function for consistent cache path generation
- Add `WipFile` function for temporary file creation with directory handling
- Update cache and hardlink managers to use the new utility functions

Signed-off-by: ChengyuZhu6 <[email protected]>
@ChengyuZhu6
Copy link
Contributor Author

Just retrigger ci, no code change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Hardlink Feature for Cache Optimization and Data Deduplication
3 participants