Skip to content

vvc_deblock.c: fix RANDCLIP #281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: deblock_asm_20250223
Choose a base branch
from

Conversation

stone-d-chen
Copy link
Collaborator

Previously RANDCLIP(x, diff) was computing x - diff and then clipping it between (0, max_pixel_val + rnd() % 2 * diff). This means we're not really generating a random value in the range.

Instead compute (x - diff) + rnd() % 2 * diff. This returns a value such that abs(value - x) < diff.

This greatly improves the generation of strong deblocking data.

Previously RANDCLIP(x, diff) was computing the difference x - diff and then clipping it between (0, max_pixel_val + rnd() % 2 * diff). This means we're not really generating a random value in the range.

Instead compute (x - diff) + rnd() % 2 * diff. This returns  a value such that abs(value - x) < diff.

This greatly improves the generation of strong deblocking data.
@@ -67,8 +67,8 @@ static const uint32_t pixel_mask[3] = {0xffffffff, 0x03ff03ff, 0x0fff0fff};
else \
*(uint16_t*)(&x) = z; \
} while (0)
#define RANDCLIP(x, diff) av_clip(GET(x) - (diff), 0, \
(1 << (bit_depth)) - 1) + rnd() % FFMAX(2 * (diff), 1)
#define RANDCLIP(x, diff) av_clip(GET(x) - (diff) + rnd() % FFMAX(2 * (diff), 1), \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Stone,
Do we need to update HEVC as well?
If so, could you submit the HEVC patch for upstream review first?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep sounds good!

@stone-d-chen
Copy link
Collaborator Author

@nuomi2021 been looking into further improvements to the luma generation, it seems fairly non-trivial.

One of the main issues is occasionally (d0 << 1) < beta_2 condition fails in the filter template.

Where d0 = abs(p2 - 2 * P1 + P0) + abs(Q2 - 2 * Q1 + Q0)

The current code does actually try to compensate for it, since (d0 << 1) < beta_2 == d0 < (beta_2 >> 1) which is beta_3.

It becomes difficult to solve both constraints while also satisfying (d0 + d1 < beta).

I attempted to put it into a computer algebra solver (wxMaxima) but it's quite messy.

@nuomi2021
Copy link
Member

If it's difficult, perhaps we should approach it as it is.
I'll rebase the code to the latest version to fix the fuzz issue.
Then, we can work together on two tasks:

  1. Enabling a larger filter for luma—this is the last missing part.
  2. Enabling AVX2—this will further improve performance.

Which one do you prefer?

@stone-d-chen
Copy link
Collaborator Author

If it's difficult, perhaps we should approach it as it is. I'll rebase the code to the latest version to fix the fuzz issue. Then, we can work together on two tasks:

  1. Enabling a larger filter for luma—this is the last missing part.
  2. Enabling AVX2—this will further improve performance.

Which one do you prefer?

AVX2 sounds good, we need to modify the C side to expose multiple blocks right? I'm trying to learn more about video decoding overall some more exposure to the c would be good.

@nuomi2021
Copy link
Member

If it's difficult, perhaps we should approach it as it is. I'll rebase the code to the latest version to fix the fuzz issue. Then, we can work together on two tasks:

  1. Enabling a larger filter for luma—this is the last missing part.
  2. Enabling AVX2—this will further improve performance.

Which one do you prefer?

AVX2 sounds good,

👍

we need to modify the C side to expose multiple blocks right?

Yes, we need to set up parameters for a single line within a CTU. SSE can process 16 bytes at a time, AVX2 can handle 32 bytes, and AVX-512 can manage 64 bytes per operation.

I'm trying to learn more about video decoding overall some more exposure to the c would be good.

You can start from https://www.amazon.com/Coding-Video-Practical-Guide-Beyond/dp/1118711785 :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants