-
Notifications
You must be signed in to change notification settings - Fork 12
vvc_deblock.c: fix RANDCLIP #281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: deblock_asm_20250223
Are you sure you want to change the base?
Conversation
Previously RANDCLIP(x, diff) was computing the difference x - diff and then clipping it between (0, max_pixel_val + rnd() % 2 * diff). This means we're not really generating a random value in the range. Instead compute (x - diff) + rnd() % 2 * diff. This returns a value such that abs(value - x) < diff. This greatly improves the generation of strong deblocking data.
@@ -67,8 +67,8 @@ static const uint32_t pixel_mask[3] = {0xffffffff, 0x03ff03ff, 0x0fff0fff}; | |||
else \ | |||
*(uint16_t*)(&x) = z; \ | |||
} while (0) | |||
#define RANDCLIP(x, diff) av_clip(GET(x) - (diff), 0, \ | |||
(1 << (bit_depth)) - 1) + rnd() % FFMAX(2 * (diff), 1) | |||
#define RANDCLIP(x, diff) av_clip(GET(x) - (diff) + rnd() % FFMAX(2 * (diff), 1), \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Stone,
Do we need to update HEVC as well?
If so, could you submit the HEVC patch for upstream review first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep sounds good!
@nuomi2021 been looking into further improvements to the luma generation, it seems fairly non-trivial. One of the main issues is occasionally Where The current code does actually try to compensate for it, since (d0 << 1) < beta_2 == d0 < (beta_2 >> 1) which is beta_3. It becomes difficult to solve both constraints while also satisfying (d0 + d1 < beta). I attempted to put it into a computer algebra solver (wxMaxima) but it's quite messy. |
If it's difficult, perhaps we should approach it as it is.
Which one do you prefer? |
AVX2 sounds good, we need to modify the C side to expose multiple blocks right? I'm trying to learn more about video decoding overall some more exposure to the c would be good. |
👍
Yes, we need to set up parameters for a single line within a CTU. SSE can process 16 bytes at a time, AVX2 can handle 32 bytes, and AVX-512 can manage 64 bytes per operation.
You can start from https://www.amazon.com/Coding-Video-Practical-Guide-Beyond/dp/1118711785 :) |
Previously RANDCLIP(x, diff) was computing
x - diff
and then clipping it between (0, max_pixel_val + rnd() % 2 * diff). This means we're not really generating a random value in the range.Instead compute (x - diff) + rnd() % 2 * diff. This returns a value such that abs(value - x) < diff.
This greatly improves the generation of strong deblocking data.