
e673 (Collaborator) commented Nov 28, 2025

#1751

This PR fixes the problem of GetAttr returning the node size without taking data cached in WriteBackCache into account.

$ strace -f -tttT -o strace-git-clone-error.txt git clone https://github.com/martinetd/UDR
Cloning into 'UDR'...
remote: Enumerating objects: 826, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 826 (delta 0), reused 1 (delta 0), pack-reused 823 (from 1)
Receiving objects: 100% (826/826), 467.76 KiB | 426.00 KiB/s, done.
fatal: premature end of pack file, 69 bytes missing
fatal: fetch-pack: invalid index-pack output

Problem indicated by strace:

<... pread64 resumed>"", 69, 475369) = 0 <0.001746>

What happened:

  1. The user opened the file for writing and wrote 478966 bytes.
  2. The write-back cache flushed 475147 bytes.
  3. The user opened the file for reading; CreateHandle reported size 475147.
  4. The write-back cache flushed the remaining bytes.
  5. The user tried to read the file beyond offset 475147 and received 0 bytes.
  6. The user closed the file opened for writing.
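The timeline above can be replayed with a toy C++ model. This is a sketch, not the filestore code: `TToyBackend`, `TToyFileCache`, `KernelRead` and `ReplayTimeline` are hypothetical names, `uint64_t` stands in for `ui64`, and `KernelRead` assumes (as the strace output suggests) that the kernel stops reading at the size it received from CreateHandle/GetAttr.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical backend view of a single file: it only sees flushed bytes.
struct TToyBackend {
    uint64_t Size = 0;
};

// Hypothetical write-back cache for the same file.
struct TToyFileCache {
    uint64_t WrittenSize = 0;  // bytes written by the user so far
    uint64_t FlushedSize = 0;  // prefix already persisted to the backend

    void Write(uint64_t bytes) { WrittenSize += bytes; }

    // Persist up to `bytes` more bytes to the backend.
    void Flush(TToyBackend& backend, uint64_t bytes) {
        FlushedSize = std::min(WrittenSize, FlushedSize + bytes);
        backend.Size = std::max(backend.Size, FlushedSize);
    }
};

// Assumption: the kernel trusts the size it was given, so a read starting
// at or beyond that size returns 0 bytes without reaching the daemon.
uint64_t KernelRead(uint64_t reportedSize, uint64_t offset, uint64_t len) {
    if (offset >= reportedSize) {
        return 0;
    }
    return std::min(len, reportedSize - offset);
}

// Replays steps 1-5 and returns how many bytes the pread64 from strace gets.
// adjustSize == false: the size comes from the backend only (the bug).
// adjustSize == true: the size also accounts for unflushed cached data.
uint64_t ReplayTimeline(bool adjustSize) {
    TToyBackend backend;
    TToyFileCache cache;
    cache.Write(478966);           // 1. user writes 478966 bytes
    cache.Flush(backend, 475147);  // 2. cache flushes 475147 bytes
    // 3. Size reported to the kernel when the file is opened for reading.
    const uint64_t reported =
        adjustSize ? std::max(backend.Size, cache.WrittenSize) : backend.Size;
    cache.Flush(backend, 3819);    // 4. remaining bytes are flushed
    return KernelRead(reported, 475369, 69);  // 5. pread64(..., 69, 475369)
}
```

`ReplayTimeline(false)` returns 0, matching the strace line, while `ReplayTimeline(true)` returns all 69 requested bytes.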

Profile logs:

CreateHandle	0.000960s	S_OK	{parent_node_id=2488515, node_name=, flags=1, mode=0, node_id=2488515, handle=29247412031098704, size=475147}
CreateHandle	0.000704s	S_OK	{parent_node_id=2488515, node_name=, flags=1, mode=0, node_id=2488515, handle=70244355433295593, size=475147}
GetNodeAttr	0.000383s	S_OK	{parent_node_id=2488515, node_name=, flags=0, mode=292, node_id=2488515, handle=29247412031098704, size=475147}
...
WriteData	0.001408s	S_OK	[{node_id=2488515, handle=38285405765139321, offset=475147, bytes=3819}]
...
ReadData	0.000573s	S_OK	[{node_id=2488515, handle=70244355433295593, offset=471040, bytes=8192, actual_bytes=7926}]
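The numbers in the strace and profile output are consistent with the timeline. The sketch below only checks that arithmetic at compile time; every constant is copied from the output above, and the constant names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the strace and profile log output.
constexpr uint64_t kUserWrittenBytes = 478966;  // step 1 of the timeline
constexpr uint64_t kReportedSize = 475147;      // CreateHandle/GetNodeAttr size=
constexpr uint64_t kLateWriteOffset = 475147;   // WriteData offset=
constexpr uint64_t kLateWriteBytes = 3819;      // WriteData bytes=
constexpr uint64_t kPreadOffset = 475369;       // failing pread64 offset
constexpr uint64_t kReadDataOffset = 471040;    // ReadData offset=
constexpr uint64_t kReadDataActual = 7926;      // ReadData actual_bytes=

// The late WriteData starts exactly at the reported size and supplies
// the bytes missing from it.
static_assert(kLateWriteOffset == kReportedSize, "write starts at reported size");
static_assert(kReportedSize + kLateWriteBytes == kUserWrittenBytes,
              "late write covers the missing tail");

// The failing pread64 starts past the reported size.
static_assert(kPreadOffset > kReportedSize, "pread starts beyond reported size");

// The ReadData that did reach the daemon ends exactly at the full size.
static_assert(kReadDataOffset + kReadDataActual == kUserWrittenBytes,
              "ReadData ends at the full file size");
```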

Proposed solution:

  • Adjust the size returned by GetAttr using the WriteBackCache.
  • Flush the cache before processing SetAttr.
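A minimal sketch of the two bullets above, under stated assumptions: the cache is modeled as a list of pending (offset, length) writes per node, and the adjusted size is the larger of the backend size and the furthest end of any pending write. `TToyWriteBackCache`, `FlushNode` and `SetSize` are hypothetical; the method name `AdjustNodeSize` is borrowed from the review thread, but its signature here is an assumption. The flush in `SetSize` presumably ensures no stale cached write is left to extend the size that SetAttr has just set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical pending write kept in the cache until it is flushed.
struct TPendingWrite {
    uint64_t Offset = 0;
    uint64_t Length = 0;
};

// Hypothetical write-back cache; not the real filestore class.
class TToyWriteBackCache {
public:
    void Write(uint64_t nodeId, uint64_t offset, uint64_t length) {
        Pending[nodeId].push_back({offset, length});
    }

    // Size to report from GetAttr: the backend size, extended by any
    // unflushed write that ends beyond it.
    uint64_t AdjustNodeSize(uint64_t nodeId, uint64_t backendSize) const {
        uint64_t size = backendSize;
        auto it = Pending.find(nodeId);
        if (it != Pending.end()) {
            for (const auto& write : it->second) {
                size = std::max(size, write.Offset + write.Length);
            }
        }
        return size;
    }

    // Model of flushing one node: pending writes reach the backend
    // (only their effect on the size is modeled) and leave the cache.
    void FlushNode(uint64_t nodeId, uint64_t& backendSize) {
        backendSize = AdjustNodeSize(nodeId, backendSize);
        Pending.erase(nodeId);
    }

private:
    std::map<uint64_t, std::vector<TPendingWrite>> Pending;
};

// SetAttr that changes the size: flush the node first, then apply the new
// size, so no cached write is left to extend the file afterwards.
void SetSize(TToyWriteBackCache& cache, uint64_t nodeId,
             uint64_t& backendSize, uint64_t newSize)
{
    cache.FlushNode(nodeId, backendSize);
    backendSize = newSize;
}
```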

e673 requested a review from SvartMetal November 28, 2025 17:49
github-actions bot (Contributor) commented Nov 28, 2025

Note

This is an automated comment that will be appended during run.

🔴 linux-x86_64-relwithdebinfo: some tests FAILED for commit 1a7ef25.

TESTS | PASSED | ERRORS | FAILED | FAILED BUILD | SKIPPED | MUTED?
9636 | 9634 | 0 | 1 | 0 | 1 | 0

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 1a7ef25.

TESTS | PASSED | ERRORS | FAILED | FAILED BUILD | SKIPPED | MUTED?
45 | 45 | 0 | 0 | 0 | 0 | 0

e673 marked this pull request as ready for review December 2, 2025 11:47
e673 changed the title from "issue-1751: [Filestore] Report correct file size when using WriteBackCache" to "issue-1751: [Filestore] Report correct file size by GetAttr when using WriteBackCache" Dec 2, 2025
e673 force-pushed the users/nasonov/issue-1751-filesize branch from 8e3a16d to 6990335 December 2, 2025 12:56
e673 (Collaborator, Author) commented Dec 2, 2025

TODO: add a test to the test suite.

github-actions bot (Contributor) commented Dec 2, 2025

🔴 linux-x86_64-relwithdebinfo: some tests FAILED for commit 6990335.

TESTS | PASSED | ERRORS | FAILED | FAILED BUILD | SKIPPED | MUTED?
9580 | 9578 | 0 | 0 | 1 | 1 | 0

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 6990335.

TESTS | PASSED | ERRORS | FAILED | FAILED BUILD | SKIPPED | MUTED?
43 | 43 | 0 | 0 | 0 | 0 | 0

SvartMetal requested a review from neihar December 2, 2025 21:56
e673 force-pushed the users/nasonov/issue-1751-filesize branch from 6990335 to 7aee24e December 3, 2025 16:22
github-actions bot (Contributor) commented Dec 3, 2025

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 7aee24e.

TESTS | PASSED | ERRORS | FAILED | FAILED BUILD | SKIPPED | MUTED?
9644 | 9643 | 0 | 0 | 0 | 1 | 0

e673 requested a review from qkrorlqr December 3, 2025 19:14
callback = std::move(callback),
request = std::move(request)](const auto& future) mutable
{
future.GetValue();
Collaborator

Why is it TFuture<void> and not something like TFuture<NProto::TError>?

WriteData can return a fatal error, both formally (the API allows it) and in real-life scenarios (e.g. when we are logically out of space: ENOSPC).

WriteBackCache seems to retry failed writes indefinitely, disregarding ErrorKind (whether the error is retriable or not).

That doesn't look correct.
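The reviewer's point can be sketched synchronously in C++: instead of a result-less TFuture<void>, the flush step reports an error, and only retriable errors are retried. `EErrorKind`, `TToyError` and `FlushWithRetries` are hypothetical stand-ins, not NProto::TError or the real WriteBackCache code, and the `maxAttempts` bound exists only to keep the sketch finite.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical error kinds; not the real NProto::TError / ErrorKind.
enum class EErrorKind { Success, Retriable, Fatal };

struct TToyError {
    EErrorKind Kind = EErrorKind::Success;
    std::string Message;
};

// Flush one cached write, retrying only retriable errors. A fatal error
// (e.g. ENOSPC) is returned to the caller instead of being retried forever.
TToyError FlushWithRetries(
    const std::function<TToyError()>& writeData,
    int maxAttempts)
{
    TToyError error;
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        error = writeData();
        if (error.Kind != EErrorKind::Retriable) {
            return error;  // success or fatal: stop retrying
        }
    }
    return error;  // still retriable after maxAttempts
}
```

Whatever the caller then does with a fatal error (report it, keep the data) is a separate decision; the sketch only shows that the error is no longer swallowed.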

Collaborator Author

I've added a concern to the main issue:
#1751 (comment)

Collaborator

No, let's first fix this problem and only then spread WriteBackCache usage across our FUSE layer code. I see no point in adding usages of the current, seemingly incorrect interface only to fix them later. Doing it the other way around looks more efficient.

Collaborator Author

Then we need to discuss it.
A non-retriable error inside Flush means that we have lost client data, which is not acceptable.
On the other hand, retrying after a non-retriable error results in an infinite loop without progress.

Anyway, this should be resolved in a separate PR, as it is unrelated to the GetAttr issue.

Collaborator

I'll approve this one, but the next write-back-cache task should be a fix for this problem.

> Having a non-retriable error inside Flush means that we have lost client data that is not acceptable.

Returning 0 (success) from fsync(fd) or sync() and then dropping the data is not acceptable; neither is dropping data written by a direct write. Returning an error from fsync/sync is fine, however, as long as we don't drop the write-back cache contents for the problematic node id.
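A toy C++ model of the semantics described above, assuming a per-node cache and an errno-style result: a failed flush is reported to the fsync caller while the node's data stays cached, and the data is dropped only after a successful flush. `TToyCache`, `Fsync` and the `backendError` parameter (which simulates the WriteData result) are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-node cache; a node's data stays cached until a flush
// for that node succeeds.
class TToyCache {
public:
    void Write(uint64_t nodeId, const std::string& data) {
        Data[nodeId] += data;
    }

    bool HasData(uint64_t nodeId) const {
        return Data.count(nodeId) != 0;
    }

    // Returns 0 on success or an errno value (e.g. ENOSPC) on failure.
    // `backendError` simulates the result of WriteData for this node.
    int Fsync(uint64_t nodeId, int backendError) {
        auto it = Data.find(nodeId);
        if (it == Data.end()) {
            return 0;  // nothing to flush
        }
        if (backendError != 0) {
            // Report the error and keep the data: dropping it here
            // would silently lose client data.
            return backendError;
        }
        Data.erase(it);  // persisted, safe to drop
        return 0;
    }

private:
    std::map<uint64_t, std::string> Data;
};
```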

qkrorlqr previously approved these changes Dec 4, 2025
yegorskii previously approved these changes Dec 8, 2025
e673 dismissed stale reviews from yegorskii and qkrorlqr via 30046dd December 8, 2025 13:09
e673 force-pushed the users/nasonov/issue-1751-filesize branch from 7aee24e to 30046dd December 8, 2025 13:09
e673 requested review from neihar, qkrorlqr and yegorskii December 8, 2025 13:09
neihar previously approved these changes Dec 8, 2025
return false;
}

const ui64 adjustedSize =
Collaborator

Shouldn't the attributes be adjusted here as well?

https://github.com/ydb-platform/nbs/blob/main/cloud/filestore/libs/vfs_fuse/fs_impl_list.cpp#L269

Collaborator Author

No, there's no need: they are ignored later on.

* From the 'stbuf' argument the st_ino field and bits 12-15 of the
* st_mode field are used. The other fields are ignored.

P.S. It would be nice to cache them and then return them from GetAttr, but that is out of scope for this task.

Collaborator Author

UPD: When the FUSE_VIRTIO define is set, fuse_add_direntry_plus is called, and yes, the attributes are passed to it.

Collaborator Author

Q: Do we set the FUSE_VIRTIO flag?

Collaborator

Yes, we do.

Collaborator

It’s a tricky situation: without a snapshot commit, different ListDirs requests of the same handle may use different commitIds for the data. As a result, directory entries from different batches are not consistent with each other.

POSIX doesn’t require us to reflect the actual file sizes after the first readdir (after opendir or rewinddir). So in an ideal solution, we would flush the WBC, take a single commitId, and then perform all directory listings (until rewinddir or closedir) using that commitId, without worrying about the WBC.

In our current reality, a single WBC flush before the first listing is probably sufficient. After that, it doesn’t really make sense to try to maintain “actual” sizes from the WBC.

Correct me if I am wrong @debnatkh
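The "ideal solution" described above could look roughly like this C++ sketch: a directory handle flushes the write-back cache once before its first listing, pins a single commitId, and reuses it for every batch until rewinddir. `TToyDirHandle`, `flushAll` and `currentCommitId` are hypothetical; how a real ListNodes request would accept a commitId is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical directory handle that pins one commit id for all listings
// between opendir/rewinddir and the next rewinddir/closedir.
class TToyDirHandle {
public:
    TToyDirHandle(std::function<void()> flushAll,
                  std::function<uint64_t()> currentCommitId)
        : FlushAll(std::move(flushAll))
        , CurrentCommitId(std::move(currentCommitId))
    {}

    // Returns the commit id to use for the next listing batch.
    uint64_t CommitIdForNextBatch() {
        if (!PinnedCommitId) {
            FlushAll();                          // flush the WBC once
            PinnedCommitId = CurrentCommitId();  // then take one snapshot
        }
        return *PinnedCommitId;
    }

    // rewinddir: the next batch flushes again and takes a fresh snapshot.
    void Rewind() { PinnedCommitId.reset(); }

private:
    std::function<void()> FlushAll;
    std::function<uint64_t()> CurrentCommitId;
    std::optional<uint64_t> PinnedCommitId;
};
```

All batches of one pass see the same snapshot, so entries from different batches stay consistent, at the cost of one full flush per pass.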

neihar (Collaborator) commented Dec 8, 2025

But for simple scenarios like 'ls', updating the size from the WBC should be fine, and the observed result will be better.

Collaborator Author

I don't like the idea of flushing the cache before each directory listing because it would significantly increase listing time. Since WriteBackCache knows nothing about directories, it would have to flush everything. Also, POSIX doesn't require us to return the size at all, so we are not restricted and may return whatever we want.

Virtiofsd may cache the returned attributes and reuse them, so we need to report the most recent attributes.
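A C++ sketch of patching sizes in a listing before the attributes reach the kernel, assuming the cache can expose, per node, the end offset of its furthest unflushed write. `TToyEntry`, `TPendingEnds` and `AdjustListedSizes` are hypothetical names, not the actual fs_impl_list.cpp code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical directory entry as returned by a ListNodes-like call.
struct TToyEntry {
    std::string Name;
    uint64_t NodeId = 0;
    uint64_t Size = 0;  // size as seen by the backend
};

// Hypothetical summary of the write-back cache: for each node, the end
// offset of its furthest unflushed write.
using TPendingEnds = std::unordered_map<uint64_t, uint64_t>;

// Patch each entry's size before the attributes are handed to the kernel,
// since they may be cached and reused there.
void AdjustListedSizes(std::vector<TToyEntry>& entries,
                       const TPendingEnds& pendingEnds)
{
    for (auto& entry : entries) {
        auto it = pendingEnds.find(entry.NodeId);
        if (it != pendingEnds.end()) {
            entry.Size = std::max(entry.Size, it->second);
        }
    }
}
```

With such a per-node summary, the extra cost per listed entry is a single hash lookup; whether the real cache keeps a summary like this is an assumption.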

But there is a corner case:

  1. The user writes some data.
  2. The user requests ListNodes.
  3. The data is flushed and removed from the cache.
  4. The ListNodes request returns the old file size.
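The corner case can be replayed deterministically in C++, assuming the backend computed the listing before the flush landed and the size adjustment ran after the cache entry had been evicted. `TToyState`, `Adjust` and `ReplayCornerCase` are hypothetical names, and the sizes are arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical single-file model: backend size plus an optional pending
// end offset held in the write-back cache.
struct TToyState {
    uint64_t BackendSize = 0;
    std::optional<uint64_t> PendingEnd;
};

// Adjust a listed size using whatever the cache holds right now.
uint64_t Adjust(const TToyState& s, uint64_t listedSize) {
    return s.PendingEnd ? std::max(listedSize, *s.PendingEnd) : listedSize;
}

// Replays the corner case: the backend answers ListNodes before the flush
// lands, but the response is adjusted after the cache entry is removed.
uint64_t ReplayCornerCase() {
    TToyState s{100, std::nullopt};
    s.PendingEnd = 150;                     // 1. user writes up to offset 150
    const uint64_t listed = s.BackendSize;  // 2. ListNodes sees size 100
    s.BackendSize = 150;                    // 3. flush completes...
    s.PendingEnd.reset();                   //    ...and the entry leaves the cache
    return Adjust(s, listed);               // 4. nothing left to adjust with
}
```

`ReplayCornerCase()` returns 100, the stale size, even though the file is 150 bytes long by the time the response is built.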

neihar (Collaborator) commented Dec 9, 2025

I agree about the current limitations of flushing the cache before every listing.

Collaborator

Do you expect listing performance to degrade due to frequent calls to AdjustNodeSize?

e673 requested a review from debnatkh December 8, 2025 13:42
drbasic previously approved these changes Dec 8, 2025
github-actions bot (Contributor) commented Dec 8, 2025

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 30046dd.

TESTS | PASSED | ERRORS | FAILED | FAILED BUILD | SKIPPED | MUTED?
9685 | 9684 | 0 | 0 | 0 | 1 | 0

e673 changed the title from "issue-1751: [Filestore] Report correct file size by GetAttr when using WriteBackCache" to "issue-1751: [Filestore] Report correct file size by GetAttr and ListNodes when using WriteBackCache" Dec 8, 2025
e673 requested review from drbasic and neihar December 8, 2025 15:22
github-actions bot (Contributor) commented Dec 8, 2025

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 3519555.

TESTS | PASSED | ERRORS | FAILED | FAILED BUILD | SKIPPED | MUTED?
9685 | 9684 | 0 | 0 | 0 | 1 | 0

neihar self-requested a review December 9, 2025 12:21

7 participants