Add Kaggle Benchmarks SDK notebook with belief update task and temporal control results #2

Open
RCSharm07 wants to merge 3 commits into arjunvad123:main from master1223347:Kaggle/benchmark_Creation
@RCSharm07

SDK-compatible notebook for the Learning track submission. Runs
belief_update episodes in canonical and shuffled-control conditions, evaluates
frontier models (Gemini 2.5 Flash, GPT-5.4 Mini) on them, and computes temporal
sensitivity gaps. Includes retry logic for API flakiness and multi-model comparison output.
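
For reviewers, here is a minimal sketch of the two mechanisms described above. It is not the notebook's actual code and uses no Kaggle Benchmarks SDK calls. The helper names `with_retries` and `temporal_sensitivity_gap` are hypothetical. The gap definition (mean canonical score minus mean shuffled-control score) is an assumption about what "temporal sensitivity gap" means here.

```python
import time
from statistics import mean


def with_retries(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call() and retry on any exception, doubling the delay each time.

    Hypothetical helper: the notebook's real retry policy may differ.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


def temporal_sensitivity_gap(canonical_scores, shuffled_scores):
    """Assumed definition: mean canonical score minus mean shuffled score."""
    return mean(canonical_scores) - mean(shuffled_scores)
```

Under this assumed definition, a gap near zero would suggest the model scores about the same whether or not the episode order is preserved, while a large positive gap would suggest it relies on temporal order. The `sleep` parameter is injectable so the backoff can be exercised in tests without waiting.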


2 participants