@alexisuri alexisuri commented Dec 8, 2025

  • Normalize missing CTV entries to their corresponding OS group:

TV_OS, AppleTV -> tvos
Vizio -> smartcast
Chrome OS and chromecast -> chromeos

  • For non-normalized OS values, remove spaces and convert to lowercase to consolidate variants (e.g. Unknown OS and UnknownOS -> unknownos)
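
A minimal sketch of the two rules above, assuming a plain map lookup in place of the generated Ragel matcher. The names `ctvAliases` and `normalizeOS` are hypothetical and do not come from this PR, and the exact-match lookup is an assumption about how raw values are compared.

```go
package main

import (
	"fmt"
	"strings"
)

// ctvAliases is a hypothetical lookup table for the CTV mappings listed above.
// The real matcher is a Ragel FSM; a map is used here only for illustration.
var ctvAliases = map[string]string{
	"TV_OS":      "tvos",
	"AppleTV":    "tvos",
	"Vizio":      "smartcast",
	"Chrome OS":  "chromeos",
	"chromecast": "chromeos",
}

// normalizeOS returns the OS group for a known alias. Any other value takes
// the fallback path: spaces removed, then lowercased.
func normalizeOS(raw string) string {
	if group, ok := ctvAliases[raw]; ok {
		return group
	}
	return strings.ToLower(strings.ReplaceAll(raw, " ", ""))
}

func main() {
	for _, raw := range []string{"AppleTV", "Vizio", "Unknown OS", "UnknownOS"} {
		fmt.Printf("%q -> %q\n", raw, normalizeOS(raw))
	}
}
```

Both spellings of the unknown value collapse to the same key, which is the consolidation the fallback is meant to provide.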

The strings.ToLower and strings.ReplaceAll operations add a small overhead (~2 allocations per call) compared to Ragel's zero-allocation FSM matching. However, this only affects the fallback path for non-normalized OS values. Since non-normalized values are < 5% of traffic, the impact is minimal.
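
One way to check the allocation estimate is `testing.AllocsPerRun` from the standard library. The sketch below is an assumption about how one might measure it, not part of this PR; `fallback`, `allocsFor`, and `sink` are hypothetical names, and the exact counts can vary by Go version.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps the result alive so the compiler cannot discard the call.
var sink string

// fallback mirrors the fallback normalization described above.
func fallback(raw string) string {
	return strings.ToLower(strings.ReplaceAll(raw, " ", ""))
}

// allocsFor reports the average number of heap allocations per call.
func allocsFor(raw string) float64 {
	return testing.AllocsPerRun(1000, func() { sink = fallback(raw) })
}

func main() {
	// A value that needs both rewrites versus one that is already clean.
	fmt.Println("needs rewriting:", allocsFor("Unknown OS"))
	fmt.Println("already clean:  ", allocsFor("unknownos"))
}
```

Values that are already lowercase and contain no spaces should be cheaper, since both `strings` functions can return their input unchanged in that case.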

For bid opportunities, these would be the non-normalized values (excluding those mentioned above, such as TV_OS):

SELECT
  os_raw,
  COUNT(*) AS count
FROM `martin-test-datalab.remerge.bid_opportunities`
WHERE DATE(ts) = '2025-12-11'
  AND device_type IN ('SetTopBox', 'TV')
  AND os IS NULL
  AND os_raw IS NOT NULL
GROUP BY
  os_raw
ORDER BY
  count DESC;
[Screenshot 2025-12-12 at 11:45:44]

For bids, the null values are very few:
[Screenshot 2025-12-12 at 11:58:17]

UA-904

@alexisuri alexisuri requested a review from a team as a code owner December 8, 2025 17:35
@remerge-hal remerge-hal added the UA label Dec 8, 2025
@remerge-hal

This comment ensures that the correct Slack channel is notified after the team/project label UA has been added to this pull request.

See this comment for details.

@desi-belokonska desi-belokonska left a comment

After struggling with make gen for a while and the subsequent formatting (which removes valid goto statements somehow that I had to manually re-add 😬) I think I can get it running mostly.

If you want to strip spaces and lowercase the rest, I believe you can use this ragel:

%%{
	machine normalize_fallback;
	write data;
}%%

// NormalizeFallback converts a string to lowercase and removes spaces in a single pass
// with minimal allocations, using a pre-allocated destination buffer.
func NormalizeFallback(dst []byte, data string) []byte {
	cs, p, pe := 0, 0, len(data)
	dst = dst[:0]
	var c byte

	%%{
		action emit_lower {
			c = data[p]
			if c >= 'A' && c <= 'Z' {
				c = c + ('a' - 'A')
			}
			dst = append(dst, c)
		}

		action emit {
			dst = append(dst, data[p])
		}

		main := (
			space |
			upper @emit_lower |
			(any - space - upper) @emit
		)*;

		write init;
		write exec;
	}%%

	return dst
}

but you still have to wrap it in string(), which also results in an allocation (1 vs 2, but still), so idk if it's worth it 🤷‍♀️ If CTV becomes big and we have a lot more unnormalized values in bids we can revisit this, but for now I think it should be okay 👍
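
For comparison, here is a hand-written Go loop with the same intended behaviour as the Ragel machine above, as a rough sketch only. The lowercase name `normalizeFallback` is hypothetical, and the whitespace check assumes Ragel's `space` class covers tab through carriage return plus the space character.

```go
package main

import "fmt"

// normalizeFallback lowercases ASCII letters and drops whitespace in one pass,
// appending into dst so the caller can reuse the buffer between calls.
func normalizeFallback(dst []byte, data string) []byte {
	dst = dst[:0]
	for i := 0; i < len(data); i++ {
		c := data[i]
		switch {
		case c == ' ' || (c >= '\t' && c <= '\r'):
			// Whitespace: skip the byte entirely.
			continue
		case c >= 'A' && c <= 'Z':
			c += 'a' - 'A'
		}
		dst = append(dst, c)
	}
	return dst
}

func main() {
	buf := make([]byte, 0, 64) // reused across calls, so the pass itself need not allocate

	// The string conversion copies the bytes; this is the remaining allocation.
	os := string(normalizeFallback(buf, "Unknown OS"))
	fmt.Println(os)
}
```

The buffer reuse removes the intermediate copies, but a `string` result still needs its own copy of the bytes, which is the one allocation mentioned above.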

@alexisuri alexisuri merged commit 9c5e6eb into master Jan 5, 2026
6 checks passed
@alexisuri alexisuri deleted the UA-904-os-normalization-followup branch January 5, 2026 10:27