-
Notifications
You must be signed in to change notification settings - Fork 38
issue-1751: [Filestore] Report correct file size by GetAttr and ListNodes when using WriteBackCache #4745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Note This is an automated comment that will be appended during run. 🔴 linux-x86_64-relwithdebinfo: some tests FAILED for commit 1a7ef25.
🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 1a7ef25.
|
8e3a16d to
6990335
Compare
|
ToDo: add a test to the suite |
|
Note This is an automated comment that will be appended during run. 🔴 linux-x86_64-relwithdebinfo: some tests FAILED for commit 6990335.
🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 6990335.
|
6990335 to
7aee24e
Compare
| callback = std::move(callback), | ||
| request = std::move(request)](const auto& future) mutable | ||
| { | ||
| future.GetValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it TFuture<void> and not something like TFuture<NProto::TError>?
WriteData can return a fatal error - both formally (the API allows it) and in real life scenarios (e.g. we're out of space logically - ENOSPC)
WriteBackCache seems to retry failed writes indefinitely disregarding ErrorKind (whether the error is retriable or not)
it doesn't look correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a concern to the main issue:
#1751 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, let's first fix this problem and only then proceed to spread WriteBackCache class usage across our fuse layer code. I see no point in adding usages of the current seemingly incorrect interface only to fix them later. Doing it the other way around looks more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a discussion about it then.
Having a non-retriable error inside Flush means that we have lost client data that is not acceptable.
On the other side, retrying after a non-retriable error will result in infinite loop without progress.
Anyways, this is to be resolved in a separate PR as it is unrelated to the GetAttr issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll approve this one but the next thing that should be done regarding the write-back-cache task should be a fix for this problem.
Having a non-retriable error inside Flush means that we have lost client data that is not acceptable.
Returning a 0 error code from fsync(fd) or sync() and then dropping the data is not acceptable. Or dropping the data written by a direct write. But if we return an error from fsync/sync, then it's fine as long as we don't drop the content of the write-back-cache for the problematic node-id.
7aee24e to
30046dd
Compare
| return false; | ||
| } | ||
|
|
||
| const ui64 adjustedSize = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
А в этом месте не надо тоже править атрибуты?
https://github.com/ydb-platform/nbs/blob/main/cloud/filestore/libs/vfs_fuse/fs_impl_list.cpp#L269
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Нет, не надо - они далее игнорируются
nbs/contrib/libs/virtiofsd/fuse_lowlevel.h
Lines 1454 to 1455 in 1cde045
| * From the 'stbuf' argument the st_ino field and bits 12-15 of the | |
| * st_mode field are used. The other fields are ignored. |
P.S. Было бы неплохо их кэшировать и затем отдавать в GetAttr, но это не относится к этой задаче.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UPD: При использовании дефайна FUSE_VIRTIO вызывается fuse_add_direntry_plus - да, туда атрибуты передаются.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Задаётся ли у нас флаг FUSE_VIRTIO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Задаётся
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s a tricky situation: without a snapshot commit, different ListDirs requests of the same handle may use different commitIds for the data. As a result, directory entries from different batches are not consistent with each other.
POSIX doesn’t require us to reflect the actual file sizes after the first readdir (after opendir or rewinddir). So in an ideal solution, we would flush the WBC, take a single commitId, and then perform all directory listings (until rewinddir or closedir) using that commitId, without worrying about the WBC.
In our current reality, a single WBC flush before the first listing is probably sufficient. After that, it doesn’t really make sense to try to maintain “actual” sizes from the WBC.
Correct me if I am wrong @debnatkh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But for simple scenarios like 'ls' it should be fine (updating size from WBC) and observed result will be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of flushing the cache before each directory listing because it will significantly increase directory listing time. Since WriteBackCache knows nothing about directories, it will have to flush everything. Also, POSIX doesn't require us to return size at all so we are not restricted and may return whatever we want.
Virtiofsd may cache the returned attributes and reuse it so we need to report the most recent attributes.
But there is a corner case:
- User writtes some data.
- User requests ListNodes
- The data is flushed and removed from cache.
- ListNodes request returns old file size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree about current limitations with flushing cache before every listing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you expect performance degradation in listing due to often calls to AdjustNodeSize ?
#1751
The issue resolves the problem that
GetAttrreturns node size without taking into account cached data.Problem indicated by strace:
What happened:
Profile logs:
Proposed solution: