"starting_prompt": "Create a new Cloud SQL instance named 'my-fake-db' in project 'astana-evaluation'. Use PostgreSQL 17, and set the password to 'password123'. Also use the 'Development' edition preset.",
6
+
"conversation_plan": "The user wants to create a database. All required parameters are in the starting prompt. The agent should call create_instance and report the success message back.",
7
+
"expected_trajectory": [
8
+
"create_instance"
9
+
],
10
+
"env": {
11
+
"GOOGLE_CLOUD_PROJECT": "astana-evaluation"
12
+
},
13
+
"kind": "tools",
14
+
"max_turns": 3
15
+
},
16
+
{
17
+
"id": "fake-csql-get-instance-failure",
18
+
"starting_prompt": "Get the details for the Cloud SQL instance named 'missing-db' in project 'astana-evaluation'.",
19
+
"conversation_plan": "The user wants to get instance details. The agent should call get_instance, which is hardcoded to fail with an error 'Instance not found or permission denied'. The agent should explain that the instance could not be found based on the error.",
"starting_prompt": "list all Cloud SQL instances in project astana-evaluation",
+    "conversation_plan": "Ask the agent to list instances in project astana-evaluation. Once all instances are listed, if an instance named nl2code exists, get its state and validate that it is RUNNABLE.",
+    "expected_trajectory": [
+      "list_instances",
+      "get_instance"
+    ],
+    "env": {
+      "GOOGLE_CLOUD_PROJECT": "astana-evaluation"
+    },
+    "kind": "tools",
+    "max_turns": 3
+  },
+  {
+    "id": "csql-create-ambiguous-multiturn-01",
+    "starting_prompt": "I need a database.",
+    "conversation_plan": "The user starts with a vague request. You want to CREATE a NEW Cloud SQL instance named 'my-pg-app'. If the agent offers to create one, say YES. When asked for details, provide 'my-pg-app' as the instance name and 'user_data' as the database name. Never claim to have an existing instance. The goal is for the agent to eventually create the database 'user_data' inside 'my-pg-app' in the astana-evaluation project.",
+    "expected_trajectory": [
+      "list_instances",
+      "create_instance",
+      "create_database"
+    ],
+    "env": {
+      "GOOGLE_CLOUD_PROJECT": "astana-evaluation"
+    },
+    "kind": "tools",
+    "max_turns": 6
+  },
+  {
+    "id": "csql-instance-not-found-failure",
+    "starting_prompt": "Update the instance 'non-existent-db-123' to have 8 cores.",
+    "conversation_plan": "The user asks to interact with an instance named 'non-existent-db-123', which does not exist in the astana-evaluation project. The agent should try to get the instance details or update it directly, fail to find it, and inform the user. The user will then ask to list instances to find the correct name.",