Updates to excluded benchmark list by KartikP · Pull Request #485 · brain-score/brain-score.web

KartikP · 2025-12-09T21:02:57Z

This PR adds the following to #482:

* Update the global score based on the new default benchmarks
* Re-rank models based on the new global score
* Introduce color-utils.js, which replicates the representative color calculation from mv.sql in the frontend
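The first two items can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the data shape (`scores` as a map of benchmark id to number or null), the function names, and the choice of a plain mean for the global score are all assumptions, and the aggregation in mv.sql may differ.

```javascript
// Hypothetical sketch: recompute a global score after excluding benchmarks,
// then re-rank. Assumes the global score is a plain mean of the remaining scores.
function recomputeGlobalScores(models, excludedBenchmarks) {
  const excluded = new Set(excludedBenchmarks);
  return models.map((model) => {
    // Keep only numeric scores for benchmarks that are still part of the defaults.
    const kept = Object.entries(model.scores)
      .filter(([id, value]) => !excluded.has(id) && typeof value === "number")
      .map(([, value]) => value);
    const globalScore = kept.length > 0
      ? kept.reduce((sum, v) => sum + v, 0) / kept.length
      : null;
    return { ...model, globalScore };
  });
}

function rankByGlobalScore(models) {
  // Sort descending by global score; models without a score go last.
  const sorted = [...models].sort((a, b) => {
    if (a.globalScore === null) return b.globalScore === null ? 0 : 1;
    if (b.globalScore === null) return -1;
    return b.globalScore - a.globalScore;
  });
  let previousScore;
  let previousRank = 0;
  return sorted.map((model, index) => {
    // Tied scores share a rank (1, 1, 3, ...).
    const rank = index > 0 && model.globalScore === previousScore
      ? previousRank
      : index + 1;
    previousScore = model.globalScore;
    previousRank = rank;
    return { ...model, rank };
  });
}
```

Calling `rankByGlobalScore(recomputeGlobalScores(models, excluded))` on initial load and after any change to the excluded list would keep the rank column consistent with the displayed global score.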

All of this is necessary because much of the leaderboard presentation is computed in mv.sql but then modified in the backend when the default benchmarks are defined via excluded_benchmark_list. For example, because representative color is normalized per benchmark, parent benchmark colors will be off after an exclusion (e.g., two models can have the same score of 0.45 but different colors if the removed benchmark was impactful).
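To make the color issue concrete, here is a minimal sketch of per-benchmark normalization done in the frontend. It assumes a simple min-max scale over the visible models and a linear white-to-green ramp. Both are assumptions for illustration, as are the function names; the actual calculation in mv.sql and color-utils.js may use a different scale.

```javascript
// Hypothetical sketch: normalize one benchmark column across models (min-max),
// so colors reflect the scores currently shown rather than precomputed ones.
function normalizePerBenchmark(scores) {
  const values = scores.filter((s) => typeof s === "number");
  if (values.length === 0) return scores.map(() => null);
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min;
  return scores.map((s) => {
    if (typeof s !== "number") return null; // failed or missing score
    return range === 0 ? 1 : (s - min) / range;
  });
}

// Map a normalized value in [0, 1] to an RGB string by linear interpolation.
// The endpoint colors are placeholders, not the leaderboard's palette.
function toColor(t, low = [255, 255, 255], high = [0, 128, 0]) {
  if (t === null) return "rgb(200, 200, 200)"; // neutral color for missing values
  const channel = (i) => Math.round(low[i] + (high[i] - low[i]) * t);
  return `rgb(${channel(0)}, ${channel(1)}, ${channel(2)})`;
}
```

With this scheme, a score of 0.5 in a column of `[0.25, 0.5, 0.75]` normalizes to 0.5, while the same 0.5 in a column of `[0.5, 0.75, 1.0]` normalizes to 0. Identical scores only get identical colors if normalization runs on the values that are actually displayed, which is why the colors need to be recomputed after the parent scores change.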

Shifting this computation from the DB to the frontend is messy but, unfortunately, the best way to go about this. Future work should clean up mv.sql to remove the re-computed defaults.

Screenshot of localhost (left) and staging (right) with same settings to check scores.

* Wayback Slider Working Version
* Fixed Calendar and Unit Test Added
* Fixed Unit test
* Headless_True
* Remove Sleep Time
* Small change for date box
* Wayback Filter Added to Export
* Addressed code-review changes
* Try testing without ranks
* clean up
* reset slider handle after reset button press
* Move wayback timestamp slider to first column and adjust input box width
* Disable start_timestamp. Make it conditional so we can re-enable if necessary
* Fix input box overflow in col container
* re-introduce whitespace
* fix blank lines
* Update tests after web_test update
* Merge timestamp fields from kp/add-timestamp-to-scores into mv.sql

  Adds start_timestamp and end_timestamp fields to materialized views:
  - Added to mv_base_scores and mv_final_benchmark_context
  - Added to final_agg_scores table structure
  - Included in INSERT statements for leaf and parent scores
  - For parent nodes, uses MIN(start_timestamp) and MAX(end_timestamp) from children
  - Added to final model/benchmark context SELECT statements
  - Included in score_json JSON aggregation
* upper bound calendar input with checks
* parseURLFilters() and setRangerValues() correctly
* Update benchmark counts
* use color-utils from #485 to rescale colors after wayback
* move depth calculation outside of model row loop
* fix color and aggregation calculation
* Optimization improvement:
  1. Cache wayback filtering results: build Set of hidden benchmarks once, reuse for O(1) lookups instead of iterating grid nodes
  2. Cache root parent lookups: build Map of benchmark -> root parent once, reuse for color recalculation instead of traversing hierarchy for each benchmark
* fix export to exclude hidden benchmark and sort by fscore
* Only exclude benchmarks hidden by wayback filtering, not where all models failed
  - Problem: Wayback filtering hid benchmarks when all values were X. However, when a model property filter produced results where all visible models had failed, it also hid the benchmark column and disrupted aggregation.
  - Solution: Columns are now hidden only when wayback filtering is active and all values in a column are X. When wayback filtering is inactive, columns are not hidden based on X values alone.
  - Update wayback test to sort by score instead of rank.

---------

Co-authored-by: Kartik Pradeepan <[email protected]>
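The two caches described in the optimization item above might look something like this. It is a sketch under assumptions: `parents` is a hypothetical map of benchmark id to parent id (null for roots), the hierarchy is assumed to be a tree with no cycles, and `isHidden` stands in for whatever predicate the wayback filter applies.

```javascript
// Hypothetical sketch: build a Map of benchmark -> root parent once,
// so later color recalculation does not walk the hierarchy per benchmark.
function buildRootParentMap(parents) {
  const rootOf = new Map();
  const findRoot = (id) => {
    if (rootOf.has(id)) return rootOf.get(id); // already resolved
    const parent = parents[id];
    // A benchmark with no parent is its own root.
    const root = parent == null ? id : findRoot(parent);
    rootOf.set(id, root);
    return root;
  };
  Object.keys(parents).forEach(findRoot);
  return rootOf;
}

// Build the Set of hidden benchmarks once; membership checks are then O(1).
function buildHiddenSet(benchmarkIds, isHidden) {
  return new Set(benchmarkIds.filter(isHidden));
}
```

Both structures would need to be rebuilt whenever the wayback timestamp or the benchmark tree changes; between rebuilds, `rootOf.get(id)` and `hidden.has(id)` replace repeated traversal of grid nodes.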

KartikP added 6 commits December 9, 2025 11:04

fixed aggregation and blue column

4eb5568

recompute rank on initial load

ccb09fb

remove color function

756780b

remove more repetition

82f616c

fix rank and sort

df4c56e

implement color scale on frontend

451fe0f

KartikP closed this Dec 10, 2025

KartikP reopened this Dec 10, 2025

KartikP mentioned this pull request Dec 17, 2025

Wayback Slider Working Version #465

Merged

KartikP added a commit that referenced this pull request Dec 17, 2025

use color-utils from #485 to rescale colors after wayback

6fdadd6

KartikP wants to merge 6 commits into exluded_benchmark_list from kp/excluded_benchmark_list_fixed_agg
