Closed
5 changes: 4 additions & 1 deletion packages/browseros-agent/apps/server/src/agent/prompt.ts
@@ -138,6 +138,7 @@ You control a Chromium browser. Key tool categories:
- \`get_dom\` / \`search_dom\` → raw HTML (use for precise CSS/XPath queries)
- \`take_screenshot\` → visual capture (use for verification or saving)
- \`evaluate_script\` → run JS on the page (use for dynamic data extraction)
- \`browser_run_code\` → run custom async page code when predefined tools are insufficient
- \`get_console_logs\` → browser console output (use for debugging)

**Interaction** — act on page elements:
@@ -152,6 +153,7 @@ You control a Chromium browser. Key tool categories:
**Navigation**:
- \`navigate_page\` → go to URL, back, forward, reload
- \`new_page\` → open new tab (only when user explicitly asks)
- \`wait_for\` → wait for text/selectors to appear or disappear, or pause briefly
- \`close_page\` → close a tab

**Bookmarks**: \`get_bookmarks\`, \`create_bookmark\`, \`remove_bookmark\`, \`update_bookmark\`, \`move_bookmark\`, \`search_bookmarks\`
@@ -315,6 +317,7 @@ function getToolSelection(
| Looking for specific links | \`get_page_links\` |
| Need exact HTML or CSS selectors | \`get_dom\` or \`search_dom\` |
| Need runtime data (JS variables, computed values) | \`evaluate_script\` |
| Need custom async page logic | \`browser_run_code\` |
| Something isn't working, need to debug | \`get_console_logs\` |
| Need visual proof or to save an image | \`take_screenshot\` or \`save_screenshot\` |

@@ -417,7 +420,7 @@ function getErrorRecovery(
## Error Recovery

### Browser interaction errors
- Element not found → \`scroll(page, "down")\`, \`wait_for(page, text)\`, then \`take_snapshot(page)\` to re-fetch elements
- Element not found → \`scroll(page, "down")\`, \`wait_for(page, { text })\`, then \`take_snapshot(page)\` to re-fetch elements
- Click/fill failed → \`scroll(page, "down", element)\` into view, retry once
- Page didn't load → check URL, try \`navigate_page\` with reload
- After 2 failed attempts → describe the blocking issue, request guidance
66 changes: 63 additions & 3 deletions packages/browseros-agent/apps/server/src/browser/browser.ts
@@ -657,13 +657,19 @@ export class Browser {

async waitFor(
page: number,
opts: { text?: string; selector?: string; timeout: number },
opts: {
text?: string
textGone?: string
selector?: string
selectorGone?: string
timeout: number
},
): Promise<boolean> {
const session = await this.resolveSession(page)
const deadline = Date.now() + opts.timeout
const interval = 500

while (Date.now() < deadline) {
while (Date.now() <= deadline) {
if (opts.text) {
const result = await session.Runtime.evaluate({
expression: `document.body?.innerText?.includes(${JSON.stringify(opts.text)}) ?? false`,
@@ -672,6 +678,14 @@ export class Browser {
if (result.result?.value === true) return true
}

if (opts.textGone) {
const result = await session.Runtime.evaluate({
expression: `!(document.body?.innerText?.includes(${JSON.stringify(opts.textGone)}) ?? false)`,
returnByValue: true,
})
if (result.result?.value === true) return true
}

if (opts.selector) {
const result = await session.Runtime.evaluate({
expression: `!!document.querySelector(${JSON.stringify(opts.selector)})`,
@@ -680,7 +694,17 @@ export class Browser {
if (result.result?.value === true) return true
}

await new Promise((r) => setTimeout(r, interval))
if (opts.selectorGone) {
const result = await session.Runtime.evaluate({
expression: `!document.querySelector(${JSON.stringify(opts.selectorGone)})`,
returnByValue: true,
})
if (result.result?.value === true) return true
}
Comment on lines +697 to +703
P2: **`selectorGone` returns `true` immediately on a page where the selector never existed**

The expression `!document.querySelector(selector)` returns `true` whenever the selector is absent, including on blank pages or pages that never had the element. On a freshly navigated page that hasn't yet injected a loading spinner, `selectorGone: ".spinner"` would resolve to `true` on the very first poll rather than waiting for the element to appear and then disappear. If the intent is to confirm that a previously-visible element has been removed, callers relying on `selectorGone` as "wait until element appears then disappears" will get a misleading success on pages that never rendered the element at all.
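
One possible way to get "appear, then disappear" semantics, sketched under assumptions: split the wait into two phases that share a single deadline. `pollUntil` and `waitForAppearThenGone` are hypothetical helpers that do not exist in this PR, and the `exists` probe stands in for the `Runtime.evaluate` check on `document.querySelector`.

```typescript
// Hypothetical helpers for illustration only; not part of this PR.
type Probe = () => Promise<boolean>

// Poll `probe` until it returns true or the deadline passes.
// The sleep is clamped so the loop never overshoots the deadline.
async function pollUntil(
  probe: Probe,
  deadline: number,
  interval = 500,
): Promise<boolean> {
  while (true) {
    if (await probe()) return true
    const remaining = deadline - Date.now()
    if (remaining <= 0) return false
    await new Promise((r) => setTimeout(r, Math.min(interval, remaining)))
  }
}

// Phase 1: wait for the element to exist. Phase 2: wait for it to be removed.
// Both phases share one deadline, so the total wait stays within `timeout`.
async function waitForAppearThenGone(
  exists: Probe,
  timeout: number,
  interval = 500,
): Promise<boolean> {
  const deadline = Date.now() + timeout
  const appeared = await pollUntil(exists, deadline, interval)
  if (!appeared) return false
  return pollUntil(async () => !(await exists()), deadline, interval)
}
```

With this shape, a page that never renders the element produces a timeout (`false`) instead of an immediate success. Whether that is the desired contract for `selectorGone` is a design decision for the PR author; documenting the current "absent counts as gone" behavior would be the lighter alternative.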


const remaining = deadline - Date.now()
if (remaining <= 0) break
await new Promise((r) => setTimeout(r, Math.min(interval, remaining)))
}

return false
@@ -941,6 +965,42 @@ export class Browser {
}
}

async runCode(
page: number,
code: string,
args?: Record<string, unknown>,
): Promise<{
value?: unknown
error?: string
description?: string
}> {
const session = await this.resolveSession(page)
const expression = `(
async (args) => {
${code}
}
)(${JSON.stringify(args ?? {})})`

const result = await session.Runtime.evaluate({
expression,
returnByValue: true,
awaitPromise: true,
})

if (result.exceptionDetails) {
return {
error:
result.exceptionDetails.exception?.description ??
result.exceptionDetails.text,
}
}

return {
value: result.result?.value,
description: result.result?.description,
}
}
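
To make the wrapping step in `runCode` concrete, here is a minimal sketch of the string it builds. `buildExpression` is an illustrative name only (the PR inlines the template), and the whitespace inside the template is an assumption, since the indentation is not visible in this view.

```typescript
// Illustrative only: mirrors the template used by runCode in this diff.
// The caller's code becomes the body of an async arrow function, and the
// JSON-serialized args object is passed as its single argument.
function buildExpression(
  code: string,
  args?: Record<string, unknown>,
): string {
  return `(
async (args) => {
${code}
}
)(${JSON.stringify(args ?? {})})`
}

// Example: the snippet can `return` a value and read from `args`.
const expression = buildExpression('return args.a + args.b', { a: 1, b: 2 })
console.log(expression)
```

Because `args` goes through `JSON.stringify`, only JSON-serializable values survive the trip into the page; functions and `undefined` properties are dropped.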

async getDom(page: number, opts?: { selector?: string }): Promise<string> {
const session = await this.resolveSession(page)
const doc = await session.DOM.getDocument({ depth: 0 })
78 changes: 65 additions & 13 deletions packages/browseros-agent/apps/server/src/tools/navigation.ts
@@ -299,13 +299,29 @@ export const close_page = defineNavigationTool({
export const wait_for = defineNavigationTool({
name: 'wait_for',
description:
'Wait for text or a CSS selector to appear on the page. Polls periodically up to a timeout.',
'Wait for text or a CSS selector to appear or disappear on the page, or wait for a fixed time. Polls periodically up to a timeout.',
input: z.object({
page: pageParam,
text: z.string().optional().describe('Text to wait for on the page'),
textGone: z
.string()
.optional()
.describe('Text to wait for to disappear from the page'),
selector: z.string().optional().describe('CSS selector to wait for'),
selectorGone: z
.string()
.optional()
.describe('CSS selector to wait for to disappear from the page'),
time: z
.number()
.min(0)
.max(120000)
.optional()
.describe('Fixed wait time in milliseconds'),
timeout: z
.number()
.min(0)
.max(120000)
.default(10000)
.describe('Maximum wait time in milliseconds'),
}),
@@ -316,40 +332,76 @@ export const wait_for = defineNavigationTool({
timeout: z.number(),
}),
handler: async (args, ctx, response) => {
if (!args.text && !args.selector) {
response.error('Provide either text or selector to wait for.')
const timeout = args.timeout ?? 10_000
const target =
args.text !== undefined
? `text "${args.text}"`
: args.textGone !== undefined
? `text "${args.textGone}" to disappear`
: args.selector !== undefined
? `selector "${args.selector}"`
: args.selectorGone !== undefined
? `selector "${args.selectorGone}" to disappear`
: args.time !== undefined
? `${args.time}ms`
: ''
Comment on lines +336 to +347
P2: **Misleading `target` when multiple conditions are provided**

When multiple conditions are supplied (e.g., `textGone: "Loading"` and `selectorGone: ".spinner"`), `target` is resolved via a priority chain that always picks the first non-undefined field, even if a lower-priority field triggered the actual match. If `.spinner` disappears first but `textGone` is the higher-priority field, the response says `target: 'text "Loading" to disappear'`, which misreports what was actually matched. The success message sent to the agent is equally misleading ("Found text 'Loading' to disappear on page" when it was the selector that disappeared).
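
A sketch of one way to report the condition that actually matched, under stated assumptions: the polling loop returns the label of the first satisfied condition instead of a bare boolean. `Condition` and `firstMatch` are hypothetical names, and each `probe` stands in for the corresponding `Runtime.evaluate` check.

```typescript
// Hypothetical sketch; `Condition` and `firstMatch` are not part of this PR.
interface Condition {
  // Human-readable description, e.g. 'selector ".spinner" to disappear'.
  label: string
  probe: () => Promise<boolean>
}

// Poll every condition in order; resolve with the label of the first one
// that is satisfied, or null once the timeout elapses.
async function firstMatch(
  conditions: Condition[],
  timeout: number,
  interval = 500,
): Promise<string | null> {
  const deadline = Date.now() + timeout
  while (true) {
    for (const c of conditions) {
      if (await c.probe()) return c.label
    }
    const remaining = deadline - Date.now()
    if (remaining <= 0) return null
    await new Promise((r) => setTimeout(r, Math.min(interval, remaining)))
  }
}
```

The handler could then use the returned label as `target` on success and fall back to a combined description of all requested conditions on timeout. This changes the return type of the wait, so it would need a matching update wherever the boolean result is consumed.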



if (
!args.text &&
!args.textGone &&
!args.selector &&
!args.selectorGone &&
args.time === undefined
) {
response.error(
'Provide text, textGone, selector, selectorGone, or time to wait for.',
)
return
}

if (
args.time !== undefined &&
!args.text &&
!args.textGone &&
!args.selector &&
!args.selectorGone
) {
Comment on lines +349 to +368
P1: When `time` is provided alongside any of `text`, `textGone`, `selector`, or `selectorGone`, the fixed-wait branch is skipped and `time` is silently discarded; only the timeout-based wait runs. There is no validation or feedback to the caller. An LLM agent that passes `{ time: 2000, text: "hello" }` (intending a minimum-pause) will have the 2 000 ms wait entirely ignored and the tool will proceed straight to the `waitFor` loop.

Suggested change:

```suggestion
    if (
      !args.text &&
      !args.textGone &&
      !args.selector &&
      !args.selectorGone &&
      args.time === undefined
    ) {
      response.error(
        'Provide text, textGone, selector, selectorGone, or time to wait for.',
      )
      return
    }

    if (
      args.time !== undefined &&
      (args.text || args.textGone || args.selector || args.selectorGone)
    ) {
      response.error(
        'time cannot be combined with text, textGone, selector, or selectorGone. Use time alone for a fixed wait.',
      )
      return
    }

    if (
      args.time !== undefined &&
      !args.text &&
      !args.textGone &&
      !args.selector &&
      !args.selectorGone
    ) {
```

await new Promise((resolve) => setTimeout(resolve, args.time))
response.text(`Waited ${args.time}ms.`)
response.data({
page: args.page,
found: true,
target,
timeout: args.time,
})
return
}

const found = await ctx.browser.waitFor(args.page, {
text: args.text,
textGone: args.textGone,
selector: args.selector,
timeout: args.timeout,
selectorGone: args.selectorGone,
timeout,
})

if (found) {
const target = args.text
? `text "${args.text}"`
: `selector "${args.selector}"`
response.text(`Found ${target} on page.`)
response.data({
page: args.page,
found,
target,
timeout: args.timeout,
timeout,
})
response.includeSnapshot(args.page)
} else {
const target = args.text
? `text "${args.text}"`
: `selector "${args.selector}"`
response.data({
page: args.page,
found,
target,
timeout: args.timeout,
timeout,
})
response.error(`Timed out after ${args.timeout}ms waiting for ${target}.`)
response.error(`Timed out after ${timeout}ms waiting for ${target}.`)
}
},
})
9 changes: 5 additions & 4 deletions packages/browseros-agent/apps/server/src/tools/registry.ts
@@ -43,12 +43,12 @@ import {
new_hidden_page,
new_page,
show_page,
// biome-ignore lint/correctness/noUnusedImports: temporarily disabled
wait_for,
} from './navigation'
import { suggest_app_connection, suggest_schedule } from './nudges'
import { download_file, save_pdf, save_screenshot } from './page-actions'
import {
browser_run_code,
evaluate_script,
get_page_content,
get_page_links,
@@ -73,7 +73,7 @@ import {
} from './windows'

export const registry = createRegistry([
// Navigation (8)
// Navigation (9)
get_active_page,
list_pages,
navigate_page,
@@ -82,9 +82,9 @@ export const registry = createRegistry([
show_page,
move_page,
close_page,
// wait_for, // temporarily disabled
wait_for,

// Observation (9)
// Observation (10)
take_snapshot,
take_enhanced_snapshot,
get_page_content,
@@ -93,6 +93,7 @@ export const registry = createRegistry([
search_dom,
take_screenshot,
evaluate_script,
browser_run_code,
get_console_logs,

// Input (17)
49 changes: 49 additions & 0 deletions packages/browseros-agent/apps/server/src/tools/snapshot.ts
@@ -246,3 +246,52 @@ export const evaluate_script = defineScriptTool({
})
},
})

export const browser_run_code = defineScriptTool({
name: 'browser_run_code',
description:
'Execute async custom JavaScript code in the page context. The code runs as an async function body with a serializable args object available and may use return to produce a result.',
input: z.object({
page: pageParam,
code: z
.string()
.describe(
'JavaScript function body to run in the page context. Use return to provide output.',
),
args: z
.record(z.unknown())
.optional()
.describe('Serializable arguments available to the code as args'),
}),
output: z.object({
text: z.string(),
value: z.unknown().optional(),
description: z.string().optional(),
}),
handler: async (args, ctx, response) => {
const result = await ctx.browser.runCode(args.page, args.code, args.args)

if (result.error) {
response.error(`Code error: ${result.error}`)
return
}

const val = result.value
let text: string
if (val === undefined) {
text = result.description ?? 'undefined'
response.text(text)
} else if (typeof val === 'string') {
text = val
response.text(text)
} else {
text = JSON.stringify(val, null, 2)
response.text(text)
}
response.data({
text,
value: result.value,
description: result.description,
})
},
})
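
The text-formatting branches in the handler above can be read as a small pure function. The sketch below extracts them for illustration; `formatResult` is a hypothetical name, and the PR keeps this logic inline.

```typescript
// Illustrative extraction of the handler's branching; not part of this PR.
function formatResult(result: {
  value?: unknown
  description?: string
}): string {
  const val = result.value
  // No serializable value: fall back to the CDP description, then 'undefined'.
  if (val === undefined) return result.description ?? 'undefined'
  // Strings are passed through without JSON quoting.
  if (typeof val === 'string') return val
  // Everything else (numbers, booleans, null, objects, arrays) is pretty-printed.
  return JSON.stringify(val, null, 2)
}
```

Pulling the logic into a helper like this would also remove the three repeated `response.text(text)` calls, since the handler could call it once after computing `text`.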
Original file line number Diff line number Diff line change
@@ -20,6 +20,7 @@ const VERB_OVERRIDES: Record<string, string> = {
get_active_page: 'Got active tab',
move_page: 'Moved tab',
group_tabs: 'Grouped tabs',
wait_for: 'Waited for page state',

// Page reading
take_snapshot: 'Captured page snapshot',
@@ -47,6 +48,7 @@ const VERB_OVERRIDES: Record<string, string> = {

// Console / scripts
evaluate_script: 'Ran script',
browser_run_code: 'Ran custom page code',
get_console_logs: 'Read console logs',

// History / bookmarks
@@ -292,12 +294,9 @@ function canonicalName(rawName: string): string {
function humanizeToolName(rawName: string): string {
const stripped = canonicalName(rawName)
const words = stripped.split(/[_-]/).filter((w) => w.length > 0)
if (words.length === 0) return rawName
const first = words[0]!
return [
first.charAt(0).toUpperCase() + first.slice(1),
...words.slice(1),
].join(' ')
const [first, ...rest] = words
if (!first) return rawName
return [first.charAt(0).toUpperCase() + first.slice(1), ...rest].join(' ')
}
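
For reference, a standalone sketch of how the refactored `humanizeToolName` behaves. The body of `canonicalName` is not shown in this hunk, so an identity stand-in is assumed here; the real function may strip prefixes before the split.

```typescript
// Assumption: `canonicalName` is replaced by an identity stand-in, because
// its real body is outside this hunk.
const canonicalName = (rawName: string): string => rawName

function humanizeToolName(rawName: string): string {
  const stripped = canonicalName(rawName)
  // Split on underscores and hyphens, dropping empty segments.
  const words = stripped.split(/[_-]/).filter((w) => w.length > 0)
  const [first, ...rest] = words
  // No usable words (e.g. '' or '___'): return the input unchanged.
  if (!first) return rawName
  return [first.charAt(0).toUpperCase() + first.slice(1), ...rest].join(' ')
}

console.log(humanizeToolName('get_console_logs'))
```

The destructuring form avoids the non-null assertion (`words[0]!`) used by the previous version while keeping the same fallback for empty input.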

/**