fix: improved quantization error#368

Open

ddeadguyy wants to merge 1 commit intonfrechette:fix/improve-quantization-errorfrom

ddeadguyy:fix/improve-quantization-error

Contributor

ddeadguyy commented Jun 3, 2021

Feel free to link this failed attempt to #352
I doubt it can compete with status quo unless clip ranges are manipulated further, but once you start doing that, you're just introducing floating point error somewhere else.


          fix: improved quantization error

8e6f28f

commit-lint bot commented Jun 3, 2021

Bug Fixes

improved quantization error (8e6f28f)

Contributors

Commit-Lint commands

You can trigger Commit-Lint actions by commenting on this PR:

@Commit-Lint merge patch will merge dependabot PR on "patch" versions (X.X.Y - Y change)
@Commit-Lint merge minor will merge dependabot PR on "minor" versions (X.Y.Y - Y change)
@Commit-Lint merge major will merge dependabot PR on "major" versions (Y.Y.Y - Y change)
@Commit-Lint merge disable will desactivate merge dependabot PR
@Commit-Lint review will approve dependabot PR
@Commit-Lint stop review will stop approve dependabot PR

nfrechette reviewed

View reviewed changes

includes/acl/compression/impl/normalize_streams.h


		#ifdef ACL_PRECISION_BOOST

		const float max_range_value_flt = float((1 << k_segment_range_reduction_num_bits_per_component) - 2);

Owner

nfrechette Jun 4, 2021

Note: now 254 instead of 255

Owner

nfrechette Jun 4, 2021

With this change, does it mean we can never have a 255 value?

nfrechette reviewed

View reviewed changes

includes/acl/compression/impl/normalize_streams.h

+              			const rtm::vector4f half = rtm::vector_set(0.5F);
+              			const rtm::vector4f half_neg = rtm::vector_set(-0.5F);
+              			const rtm::vector4f half_range_value = rtm::vector_set(float((1 << (k_segment_range_reduction_num_bits_per_component - 1)) - 1));

Owner

nfrechette Jun 4, 2021

Note: half_range_value = 127

nfrechette reviewed

View reviewed changes

includes/acl/compression/impl/normalize_streams.h

+              				// To get the best accuracy, we pick the values closest to the true minimum that is slightly lower, and true maximum that is slightly higher.
+              				// This is to ensure that we encompass the lowest and highest values even after quantization.
+              				const rtm::vector4f range_min = range.get_min();
+              				const rtm::vector4f quantized_min0 = rtm::vector_clamp(rtm::vector_add(rtm::vector_floor(rtm::vector_mul_add(range_min, max_range_value, half)), half_range_value), zero, max_range_value);

Owner

nfrechette Jun 4, 2021 •

edited

Loading

Note: range_min is already normalized in the [-0.5, 0.5] range here, we convert it back into [0.0, 254.0] range after symmetric rounding

nfrechette reviewed

View reviewed changes

includes/acl/compression/impl/normalize_streams.h


		#ifdef ACL_PRECISION_BOOST

		const rtm::vector4f padded_range_min0 = rtm::vector_mul_add(quantized_min0, inv_max_range_value, half_neg);

Owner

nfrechette Jun 4, 2021

Note: convert back into [0.0, 1.0] range and offset to [-0.5, 0.5] range

nfrechette reviewed

View reviewed changes

includes/acl/compression/impl/normalize_streams.h

+              				// Finally, we pad the range further so the midpoint has minimal quantization error.
+              				const rtm::vector4f quantized_extent = rtm::vector_sub(quantized_max, quantized_min);
+              				union MaskVector

Owner

nfrechette Jun 4, 2021

I'll create a task for myself to add and/or/xor support to masks in RTM so we can clean this up later

nfrechette reviewed

View reviewed changes

includes/acl/compression/impl/quantize_track_impl.h

+              				const float max_value_ = rtm::scalar_safe_to_float(1 << num_bits);
+              				limit = rtm::vector_set(max_value_ - 1.0F);
+              				mid_compress = rtm::vector_set(0.5F * max_value_);
+              				mid_decompress = rtm::vector_set((0.5F * max_value_) - 0.5F);

Owner

nfrechette Jun 4, 2021

Note: for 8 bits, limit = 255, mid_compress = 128, mid_decompress = 127.5

nfrechette reviewed

View reviewed changes

includes/acl/compression/impl/quantize_track_impl.h

+              #endif
               				max_value = rtm::vector_set(max_value_);
               				inv_max_value = rtm::vector_set(1.0F / max_value_);

Owner

nfrechette Jun 4, 2021

Note: for 8 bits, max_value = 256, inv_max_value = 1/256

nfrechette reviewed

View reviewed changes

includes/acl/compression/impl/quantize_track_impl.h


		ACL_ASSERT(vector_all_greater_equal(value, rtm::vector_set(-0.5F)) && vector_all_less_equal(value, rtm::vector_set(0.5F)), "Expected normalized signed input value: %f, %f, %f, %f", (float)vector_get_x(value), (float)vector_get_y(value), (float)vector_get_z(value), (float)vector_get_w(value));

		const vector4f packed_value = vector_min(vector_add(vector_floor(vector_mul(value, scales.max_value)), scales.mid_compress), scales.limit);

Owner

nfrechette Jun 4, 2021

Note: If our input value is -0.5 and we use 8 bits, we have:
packed_value = min(floor(-0.5 * 256) + 128, 255)
packed_value = min(floor(-128) + 128, 255)
packed_value = min(0, 255)

If our input is 0.5, we have:
packed_value = min(floor(0.5 * 256) + 128, 255)
packed_value = min(floor(128) + 128, 255)
packed_value = min(256, 255)

If our input is [0.496, 0.5], we have:
packed_value = min(floor(0.496 * 256) + 128, 255)
packed_value = min(floor(127) + 128, 255)
packed_value = min(255, 255)

It works similarly for [-0.5, -0.496]

nfrechette reviewed

View reviewed changes

includes/acl/compression/impl/quantize_track_impl.h

+              			ACL_ASSERT(vector_all_greater_equal(value, rtm::vector_set(-0.5F)) && vector_all_less_equal(value, rtm::vector_set(0.5F)), "Expected normalized signed input value: %f, %f, %f, %f", (float)vector_get_x(value), (float)vector_get_y(value), (float)vector_get_z(value), (float)vector_get_w(value));
+              			const vector4f packed_value = vector_min(vector_add(vector_floor(vector_mul(value, scales.max_value)), scales.mid_compress), scales.limit);
+              			const vector4f decayed_value = vector_mul(vector_sub(packed_value, scales.mid_decompress), scales.inv_max_value);

Owner

nfrechette Jun 4, 2021

Note: for packed_value = 0, we have:
decayed_value = (0 - 127.5) / 256 = -0.498

For packed_value = 255:
decayed_value = (255 - 127.5) / 256 = 0.498

nfrechette reviewed

View reviewed changes

includes/acl/decompression/impl/transform_animated_track_cache.h


		#ifdef ACL_PRECISION_BOOST

		rotation_as_vec = _mm_mul_ps(_mm_sub_ps(xyzf, _mm_set_ps1(32767.5F)), _mm_set_ps1(1.0F / 65535.0F));

Owner

nfrechette Jun 4, 2021

For 8 bit quantization we use 1/254, but for 16 bit we use 1/65535? Why the off by 1 for 8 bits but not 16?

nfrechette reviewed

View reviewed changes

includes/acl/math/scalar_packing.h

+              		const float max_value = rtm::scalar_safe_to_float(num_values);
+              		const float inv_max_value = 1.0F / max_value;
+              		const float mid_value = (0.5F * max_value) - 0.5F;
+              		return (input - mid_value) * inv_max_value;

Owner

nfrechette Jun 4, 2021

Here for 8 bits we'd use 256 and for 16 bits 65536 which differs from the code earlier. Why?

nfrechette reviewed

View reviewed changes

includes/acl/math/scalar_packing.h


		#ifdef ACL_PRECISION_BOOST

		: max_value(1.0F / float(1 << num_bits_))

Owner

nfrechette Jun 4, 2021

Similarly we'd use 256 and 65536 for 8/16 bits respectively which differs.

nfrechette reviewed

View reviewed changes

includes/acl/math/vector4_packing.h


		#ifdef ACL_PRECISION_BOOST

		return _mm_mul_ps(_mm_sub_ps(value, _mm_set_ps1(32767.5F)), _mm_set_ps1(1.0F / 65535.0F));

Owner

nfrechette Jun 4, 2021

Here using 65535 for 16 bits

nfrechette reviewed

View reviewed changes

includes/acl/math/vector4_packing.h


		#ifdef ACL_PRECISION_BOOST

		return _mm_mul_ps(value, _mm_set_ps1(1.0F / 254.0F));

Owner

nfrechette Jun 4, 2021

Here 254 for 8 bits

nfrechette reviewed

View reviewed changes

Owner

nfrechette left a comment

Looks like great stuff!

I'm just unclear as to why sometimes we use 256, 255, and 254 sometimes to decompress a value. Can you elaborate a bit?

Thanks!

Contributor Author

ddeadguyy commented Jun 4, 2021

Yeah, I can clear that up.

Most quantization follows the (start, end) -> (-0.5 + max_error, 0.5 - max_error) style. I removed quantization that ACL doesn't use as an added bonus, and to conserve effort.

The vector3_u48 family of functions is used to compress constant channels within segments. I opted to preserve the (start, end) -> (-0.5, 0.5) style there, which is what the precise_endpoints naming convention signifies. Now that I think about it, perhaps the new style would work better. I was more fixated on the bone ranges.

The vector3_u24 family of functions compresses bone ranges within segments. This is the trickiest part. I wanted to preserve endpoints as well as I could, knowing that the quantization within segments introduces error there. I also wanted to compress the midpoint of ranges instead of their mins, to take maximum advantage of the signed normalized range(-0.5, 0.5). In order for this to work for any subrange, the requirements get a bit stricter. (start, middle, end) -> (-0.5, 0.0, 0.5) or (0.0, 0.5, 1.0). I sacrificed a value in the bitmask to achieve this. A similar result would be achieved by using a sign bit instead, but IMO this is simpler. Naming convention is precise_endpoints_midpoint. Stepping through its unit test should help explain the process.

In any case, it didn't work as well as I'd hoped, so if you take a closer look, hopefully you find a bug.

FWIW I considered eliminating segment-level quantization(raw constants and ranges) to see what would happen, but the toughest obstacle was the assumption that segment constants and segment ranges are always the same size.

Owner

nfrechette commented Jun 9, 2021

Awesome, thanks!
I'll keep the PR around for now and when I get to validate all this, I'll take a look on my end to see what I find.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet