Merged
Changes from 16 commits
Commits
24 commits
d7508b9
feat(web-integration): implement file upload functionality with tests…
quanru Dec 31, 2025
c64451f
feat(docs): add file upload functionality to agent with detailed para…
quanru Dec 31, 2025
0606767
Update apps/site/docs/zh/api.mdx
quanru Jan 4, 2026
553024b
Update packages/web-integration/src/puppeteer/base-page.ts
quanru Jan 4, 2026
2281813
Update packages/web-integration/src/chrome-extension/page.ts
quanru Jan 4, 2026
17887a4
Update packages/web-integration/src/puppeteer/base-page.ts
quanru Jan 4, 2026
422767b
Update packages/core/src/agent/agent.ts
quanru Jan 4, 2026
71b224c
Update apps/site/docs/en/api.mdx
quanru Jan 4, 2026
27525cb
fix(tests): refactor file upload tests to use relative path for test …
quanru Jan 4, 2026
c10fde7
feat(web-integration): enhance file upload functionality in aiTap and…
quanru Jan 5, 2026
7d2c09b
Update packages/web-integration/tests/ai/web/playwright/file-upload.s…
quanru Jan 5, 2026
47b20c9
fix(tests): improve file upload tests with error handling and agent c…
quanru Jan 5, 2026
57895a1
fix(base-page): optimize file handling by importing fs and path modul…
quanru Jan 5, 2026
717c058
feat(core): enhance aiTap to handle file chooser setup and cleanup
quanru Jan 6, 2026
8df55c2
refactor(base-page): unify file chooser handling with wrapper pattern
quanru Jan 6, 2026
50f0cb9
feat(agent): implement file chooser capability with unified handling
quanru Jan 6, 2026
00b5178
refactor(core): simplify actionTapParamSchema by removing file upload…
quanru Jan 6, 2026
022f115
feat(agent): implement file upload handling with support for Puppetee…
quanru Jan 7, 2026
2ea8890
refactor(core): file uploader (#1735)
yuyutaotao Jan 8, 2026
585cb84
Revert "refactor(core): file uploader (#1735)"
yuyutaotao Jan 8, 2026
53e9f76
fix(core): file upload codes (#1736)
yuyutaotao Jan 8, 2026
00133e4
chore(core): merge main
yuyutaotao Jan 8, 2026
819da54
fix(core): enhance file chooser handling and add related tests
quanru Jan 8, 2026
2764a2c
docs(core): update document of aiAct
yuyutaotao Jan 8, 2026
7 changes: 6 additions & 1 deletion apps/site/docs/en/api.mdx
Original file line number Diff line number Diff line change
@@ -176,7 +176,7 @@ Related Documentation:

### `agent.aiTap()`

-Tap something.
+Tap something. Also supports file upload when tapping triggers a file chooser.

- Type

@@ -191,6 +191,7 @@ function aiTap(locate: string | Object, options?: Object): Promise<void>;
- `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element, which can improve accuracy. False by default. With newer models (e.g. Qwen3 / Doubao 1.6 / Gemini 3), the gain is less obvious.
- `xpath?: string` - The xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
- `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.
- `files?: string | string[]` - File path(s) to upload when tap triggers a file chooser. Can be a single file path or an array of paths. Only available in web pages (Playwright, Puppeteer).

- Return Value:

@@ -205,6 +206,10 @@ await agent.aiTap('The login button at the top of the page');
await agent.aiTap('The login button at the top of the page', {
deepThink: true,
});

// File upload: tap the upload button and select files
await agent.aiTap('Choose file button', { files: ['./document.pdf'] });
await agent.aiTap('Upload images', { files: ['./image1.jpg', './image2.png'] });
```

### `agent.aiHover()`
7 changes: 6 additions & 1 deletion apps/site/docs/zh/api.mdx
@@ -176,7 +176,7 @@ await agent.aiAct('Post a Weibo with the content "Hello World"');

### `agent.aiTap()`

-Tap an element.
+Tap an element. Also supports file upload: when the tap triggers a file chooser, the files are uploaded automatically.

- Type

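@@ -190,6 +190,7 @@ function aiTap(locate: string | Object, options?: Object): Promise<void>;
- `deepThink?: boolean` - Whether to enable deep thinking. If true, Midscene calls the AI model twice to locate the element precisely, which improves accuracy. Defaults to false. With newer models (e.g. Qwen3 / Doubao 1.6 / Gemini 3), the gain is no longer significant.
- `xpath?: string` - The xpath of the target element for this operation. If provided, Midscene first uses this xpath to find the element, then falls back to the cache and the AI model in turn. Empty by default.
- `cacheable?: boolean` - Whether the result of this API call may be cached when the [caching feature](./caching.mdx) is enabled. Defaults to true.
- `files?: string | string[]` - File path(s) to upload when the tap triggers a file chooser. Can be a single file path or an array of paths. Only available on web pages (Playwright, Puppeteer).
- Return value: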

- `Promise<void>`
@@ -201,6 +202,10 @@ await agent.aiTap('The login button at the top of the page');

// Use deepThink to locate the element precisely
await agent.aiTap('The login button at the top of the page', { deepThink: true });

// File upload: tap the upload button and select files
await agent.aiTap('Choose file button', { files: ['./document.pdf'] });
await agent.aiTap('Upload images', { files: ['./image1.jpg', './image2.png'] });
```

### `agent.aiHover()`
27 changes: 21 additions & 6 deletions packages/core/src/agent/agent.ts
@@ -63,8 +63,7 @@ import {
import { imageInfoOfBase64, resizeImgBase64 } from '@midscene/shared/img';
import { getDebug } from '@midscene/shared/logger';
import { assert } from '@midscene/shared/utils';
-import { defineActionAssert } from '../device';
-// import type { AndroidDeviceInputOpt } from '../device';
+import { defineActionAssert, hasFileChooserCapability } from '../device';
import { TaskCache } from './task-cache';
import { TaskExecutionError, TaskExecutor, locatePlanForLocate } from './tasks';
import { locateParamStr, paramStr, taskTitleStr, typeStr } from './ui-utils';
@@ -588,14 +587,30 @@ export class Agent<
return output;
}

-  async aiTap(locatePrompt: TUserPrompt, opt?: LocateOption) {
+  async aiTap(
+    locatePrompt: TUserPrompt,
+    opt?: LocateOption & { files?: string | string[] },
+  ) {
     assert(locatePrompt, 'missing locate prompt for tap');

     const detailedLocateParam = buildDetailedLocateParam(locatePrompt, opt);
+    const fileChooserInterface = hasFileChooserCapability(this.interface)
+      ? this.interface
+      : null;

-    return this.callActionInActionSpace('Tap', {
-      locate: detailedLocateParam,
-    });
+    if (opt?.files && fileChooserInterface) {
+      await fileChooserInterface.setFileChooserHandler(opt.files);
+    }
+
+    try {
+      return await this.callActionInActionSpace('Tap', {
+        locate: detailedLocateParam,
+      });
+    } finally {
+      if (opt?.files && fileChooserInterface) {
+        await fileChooserInterface.clearFileChooserHandler();
+      }
+    }
   }

async aiRightClick(locatePrompt: TUserPrompt, opt?: LocateOption) {
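Review note: the new `aiTap` body follows an arm / act / always-disarm shape. The sketch below isolates that shape so the cleanup guarantee is easy to see. `RecordingPage` and `tapWithFiles` are hypothetical names invented for this note; only the two handler method names come from the diff.

```typescript
// Hypothetical stand-in for a page that supports file choosers.
// Only the two method names are taken from the diff.
interface FileChooserCapable {
  setFileChooserHandler(files: string | string[]): Promise<void>;
  clearFileChooserHandler(): Promise<void>;
}

class RecordingPage implements FileChooserCapable {
  log: string[] = [];

  async setFileChooserHandler(files: string | string[]): Promise<void> {
    const list = Array.isArray(files) ? files : [files];
    this.log.push(`set:${list.join(',')}`);
  }

  async clearFileChooserHandler(): Promise<void> {
    this.log.push('clear');
  }
}

// Same control flow as the aiTap change: the handler is installed before
// the tap and removed in `finally`, so a failing tap cannot leave it armed.
async function tapWithFiles(
  page: FileChooserCapable,
  files: string | string[] | undefined,
  tap: () => Promise<void>,
): Promise<void> {
  if (files) {
    await page.setFileChooserHandler(files);
  }
  try {
    await tap();
  } finally {
    if (files) {
      await page.clearFileChooserHandler();
    }
  }
}
```

When `files` is omitted, neither handler method is called, which matches the `opt?.files` checks in the diff.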
28 changes: 27 additions & 1 deletion packages/core/src/device/index.ts
@@ -11,6 +11,24 @@ import { _keyDefinitions } from '@midscene/shared/us-keyboard-layout';
import { z } from 'zod';
import type { ElementCacheFeature, Rect, Size, UIContext } from '../types';

export interface FileChooserCapable {
setFileChooserHandler(files: string | string[]): Promise<void>;
clearFileChooserHandler(): Promise<void>;
}

export function hasFileChooserCapability(
obj: unknown,
): obj is FileChooserCapable {
return (
typeof obj === 'object' &&
obj !== null &&
'setFileChooserHandler' in obj &&
typeof (obj as FileChooserCapable).setFileChooserHandler === 'function' &&
'clearFileChooserHandler' in obj &&
typeof (obj as FileChooserCapable).clearFileChooserHandler === 'function'
);
}

export abstract class AbstractInterface {
abstract interfaceType: string;

@@ -75,18 +93,26 @@ export const defineAction = <
// Tap
export const actionTapParamSchema = z.object({
locate: getMidsceneLocationSchema().describe('The element to be tapped'),
files: z
.union([z.string(), z.array(z.string())])
.optional()
.describe(
'Optional file path(s) to upload when tap triggers a file chooser',
),
});
// Override the inferred type to use LocateResultElement for the runtime locate field
export type ActionTapParam = {
locate: LocateResultElement;
files?: string | string[];
};

export const defineActionTap = (
call: (param: ActionTapParam) => Promise<void>,
): DeviceAction<ActionTapParam> => {
return defineAction<typeof actionTapParamSchema, ActionTapParam>({
name: 'Tap',
-description: 'Tap the element',
+description:
+'Tap the element. If files are provided, handles file upload after tap triggers a file chooser.',
interfaceAlias: 'aiTap',
paramSchema: actionTapParamSchema,
call,
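Review note: `hasFileChooserCapability` in this file is a structural type guard. It checks for the two methods rather than for a class or an `interfaceType` string. The sketch below copies the guard from the diff and runs it against a few hypothetical objects (`webLike`, `androidLike`, `halfBaked` are invented for illustration) to show what passes and what does not.

```typescript
interface FileChooserCapable {
  setFileChooserHandler(files: string | string[]): Promise<void>;
  clearFileChooserHandler(): Promise<void>;
}

// Guard body as it appears in the diff.
function hasFileChooserCapability(obj: unknown): obj is FileChooserCapable {
  return (
    typeof obj === 'object' &&
    obj !== null &&
    'setFileChooserHandler' in obj &&
    typeof (obj as FileChooserCapable).setFileChooserHandler === 'function' &&
    'clearFileChooserHandler' in obj &&
    typeof (obj as FileChooserCapable).clearFileChooserHandler === 'function'
  );
}

// Hypothetical objects, for illustration only.
const webLike = {
  async setFileChooserHandler(_files: string | string[]): Promise<void> {},
  async clearFileChooserHandler(): Promise<void> {},
};
const androidLike = { interfaceType: 'android' };
const halfBaked = {
  setFileChooserHandler: 'not a function',
  async clearFileChooserHandler(): Promise<void> {},
};

const results = {
  webLike: hasFileChooserCapability(webLike),
  androidLike: hasFileChooserCapability(androidLike),
  halfBaked: hasFileChooserCapability(halfBaked),
  nullValue: hasFileChooserCapability(null),
};
```

One consequence of the structural check: any interface that defines both methods passes the guard, even if its `setFileChooserHandler` only throws.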
1 change: 1 addition & 0 deletions packages/core/src/yaml.ts
@@ -9,6 +9,7 @@ export interface LocateOption {
cacheable?: boolean; // user can set this param to false to disable the cache for a single agent api
xpath?: string; // only available in web
uiContext?: UIContext;
files?: string | string[]; // file path(s) to upload when tapping triggers a file chooser
}

export interface ServiceExtractOption {
10 changes: 10 additions & 0 deletions packages/web-integration/src/chrome-extension/page.ts
@@ -865,4 +865,14 @@ export default class ChromeExtensionProxyPage implements AbstractInterface {
this.latestMouseX = to.x;
this.latestMouseY = to.y;
}

async setFileChooserHandler(files: string | string[]): Promise<void> {
throw new Error(
'File upload is not supported in Chrome Extension mode. Use Playwright or Puppeteer instead.',
);
}

async clearFileChooserHandler(): Promise<void> {
// No-op for Chrome Extension
}
}
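Review note: because the extension page defines both handler methods, it appears to pass the structural capability guard, so an `aiTap` call with `files` in extension mode would presumably reject before any tap is attempted. The sketch below reproduces that ordering with hypothetical stand-ins (`ExtensionLikePage`, `tapWithFiles`, and the shortened error message are invented for this note, not Midscene code).

```typescript
// Hypothetical stub shaped like the extension page in the diff.
class ExtensionLikePage {
  cleared = false;

  async setFileChooserHandler(_files: string | string[]): Promise<void> {
    throw new Error('File upload is not supported in Chrome Extension mode.');
  }

  async clearFileChooserHandler(): Promise<void> {
    this.cleared = true;
  }
}

// Same ordering as the aiTap change: the handler is set outside the
// try block, so a rejection there skips both the tap and the cleanup.
async function tapWithFiles(
  page: ExtensionLikePage,
  files: string | string[],
  tap: () => Promise<void>,
): Promise<void> {
  await page.setFileChooserHandler(files);
  try {
    await tap();
  } finally {
    await page.clearFileChooserHandler();
  }
}

// Runs the flow and reports what happened instead of throwing.
async function probe(): Promise<{ tapped: boolean; cleared: boolean; message: string }> {
  const page = new ExtensionLikePage();
  let tapped = false;
  let message = '';
  try {
    await tapWithFiles(page, './document.pdf', async () => {
      tapped = true;
    });
  } catch (err) {
    message = err instanceof Error ? err.message : String(err);
  }
  return { tapped, cleared: page.cleared, message };
}
```

Failing loudly here seems preferable to silently tapping without uploading, since the caller explicitly asked for files.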
91 changes: 78 additions & 13 deletions packages/web-integration/src/puppeteer/base-page.ts
@@ -1,3 +1,5 @@
import { existsSync } from 'node:fs';
import { resolve } from 'node:path';
import { type WebPageAgentOpt, WebPageContextParser } from '@/web-element';
import type {
DeviceAction,
@@ -29,7 +31,10 @@ import {
getExtraReturnLogic,
} from '@midscene/shared/node';
import { assert } from '@midscene/shared/utils';
-import type { Page as PlaywrightPage } from 'playwright';
+import type {
+FileChooser as PlaywrightFileChooser,
+Page as PlaywrightPage,
+} from 'playwright';
import type { Page as PuppeteerPage } from 'puppeteer';
import {
type KeyInput,
@@ -39,6 +44,16 @@ import {

export const debugPage = getDebug('web:page');

function normalizeFilePaths(files: string | string[]): string[] {
return (Array.isArray(files) ? files : [files]).map((file) => {
const absolutePath = resolve(file);
if (!existsSync(absolutePath)) {
throw new Error(`File not found: ${file}`);
}
return absolutePath;
});
}

type WebElementCacheFeature = ElementCacheFeature & {
xpaths?: string[];
};
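Review note: `normalizeFilePaths` does two things, it coerces the `string | string[]` input to an array and it fails fast on missing files. The sketch below copies the helper from the diff and exercises it against a temporary file. The temp directory, the file names, and the `errorMessage` variable are made up for the example.

```typescript
import { existsSync, mkdtempSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { isAbsolute, join, resolve } from 'node:path';

// Helper body as it appears in the diff.
function normalizeFilePaths(files: string | string[]): string[] {
  return (Array.isArray(files) ? files : [files]).map((file) => {
    const absolutePath = resolve(file);
    if (!existsSync(absolutePath)) {
      throw new Error(`File not found: ${file}`);
    }
    return absolutePath;
  });
}

// Throwaway fixture so the example does not depend on the repo layout.
const dir = mkdtempSync(join(tmpdir(), 'upload-sketch-'));
const sample = join(dir, 'sample.txt');
writeFileSync(sample, 'hello');

const single = normalizeFilePaths(sample); // a string becomes a 1-element array
const many = normalizeFilePaths([sample, sample]); // arrays pass through

// A missing file is reported using the path exactly as the caller wrote it.
const missing = join(dir, 'missing.txt');
let errorMessage = '';
try {
  normalizeFilePaths(missing);
} catch (err) {
  errorMessage = err instanceof Error ? err.message : String(err);
}
```

Note that `resolve` interprets relative paths against the current working directory of the process, so a path like `./document.pdf` depends on where the test runner was started, not on the location of the test file.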
@@ -361,26 +376,30 @@ export class Page<
const { button = 'left', count = 1 } = options || {};
debugPage(`mouse click ${x}, ${y}, ${button}, ${count}`);

-        if (count === 2 && this.interfaceType === 'playwright') {
-          await (this.underlyingPage as PlaywrightPage).mouse.dblclick(x, y, {
-            button,
-          });
-        } else {
-          if (this.interfaceType === 'puppeteer') {
+        const doClick = async () => {
+          if (count === 2 && this.interfaceType === 'playwright') {
+            await (this.underlyingPage as PlaywrightPage).mouse.dblclick(x, y, {
+              button,
+            });
+          } else if (this.interfaceType === 'puppeteer') {
+            const page = this.underlyingPage as PuppeteerPage;
             if (button === 'left' && count === 1) {
-              await (this.underlyingPage as PuppeteerPage).mouse.click(x, y);
+              await page.mouse.click(x, y);
             } else {
-              await (this.underlyingPage as PuppeteerPage).mouse.click(x, y, {
-                button,
-                count,
-              });
+              await page.mouse.click(x, y, { button, count });
             }
           } else if (this.interfaceType === 'playwright') {
-            (this.underlyingPage as PlaywrightPage).mouse.click(x, y, {
+            await (this.underlyingPage as PlaywrightPage).mouse.click(x, y, {
               button,
               clickCount: count,
             });
           }
+        };
+
+        if (this.fileChooserClickWrapper) {
+          await this.fileChooserClickWrapper(doClick);
+        } else {
+          await doClick();
         }
},
wheel: async (deltaX: number, deltaY: number) => {
@@ -686,6 +705,52 @@ export class Page<
await page.mouse.up({ button: 'left' });
}
}

private fileChooserClickWrapper:
| ((clickFn: () => Promise<void>) => Promise<void>)
| null = null;
private fileChooserCleanup: (() => void) | null = null;

/**
* Set up a file chooser handler that will automatically accept files
* when a file chooser dialog is opened.
*/
async setFileChooserHandler(files: string | string[]): Promise<void> {
const normalizedFiles = normalizeFilePaths(files);

if (this.interfaceType === 'puppeteer') {
const page = this.underlyingPage as PuppeteerPage;
this.fileChooserClickWrapper = async (clickFn) => {
const [fileChooser] = await Promise.all([
page.waitForFileChooser(),
clickFn(),
]);
await fileChooser.accept(normalizedFiles);
};
this.fileChooserCleanup = () => {
this.fileChooserClickWrapper = null;
};
} else if (this.interfaceType === 'playwright') {
const page = this.underlyingPage as PlaywrightPage;
const handler = async (fileChooser: PlaywrightFileChooser) => {
await fileChooser.setFiles(normalizedFiles);
};
page.on('filechooser', handler);
this.fileChooserClickWrapper = (clickFn) => clickFn();
this.fileChooserCleanup = () => {
page.off('filechooser', handler);
this.fileChooserClickWrapper = null;
};
}
}

async clearFileChooserHandler(): Promise<void> {
if (this.fileChooserCleanup) {
this.fileChooserCleanup();
this.fileChooserCleanup = null;
}
this.fileChooserClickWrapper = null;
}
}

export function forceClosePopup(
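Review note: the Puppeteer branch of `setFileChooserHandler` wraps the click in `Promise.all` so that the wait for the chooser is registered before the click fires. The sketch below shows why that ordering matters, using a hypothetical in-memory `FakePage` (not Puppeteer's API) whose click only notifies a waiter that already exists.

```typescript
// Hypothetical fakes for illustration; these are not Puppeteer types.
interface FakeChooser {
  accept(paths: string[]): Promise<void>;
}

class FakePage {
  accepted: string[] = [];
  private notifyWaiter: ((chooser: FakeChooser) => void) | null = null;

  // Registers a waiter; resolves only when a later click opens a chooser.
  waitForFileChooser(): Promise<FakeChooser> {
    return new Promise((resolveChooser) => {
      this.notifyWaiter = resolveChooser;
    });
  }

  // Simulates a click that opens a chooser. If nobody is waiting yet,
  // the event is lost, which is the race the wrapper avoids.
  async click(): Promise<void> {
    const chooser: FakeChooser = {
      accept: async (paths) => {
        this.accepted = paths;
      },
    };
    this.notifyWaiter?.(chooser);
  }
}

// Same shape as the wrapper in the diff: array elements are evaluated
// left to right, so the waiter exists before the click is issued.
async function clickAndUpload(page: FakePage, files: string[]): Promise<void> {
  const [chooser] = await Promise.all([page.waitForFileChooser(), page.click()]);
  await chooser.accept(files);
}
```

The Playwright branch in the diff takes a different route and registers a `filechooser` event listener instead, so its click wrapper simply calls the click function. My understanding is that Puppeteer's `waitForFileChooser` rejects on timeout when no chooser appears, so passing `files` while tapping a non-upload element would presumably fail rather than hang; that behavior is worth confirming against the Puppeteer documentation.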
2 changes: 2 additions & 0 deletions packages/web-integration/src/web-page.ts
@@ -429,6 +429,8 @@ export const commonWebActionsForWebPage = <T extends AbstractWebPage>(
defineActionTap(async (param) => {
const element = param.locate;
assert(element, 'Element not found, cannot tap');

// Pure tap action - file handling is done at Page layer via setFileChooserHandler
await page.mouse.click(element.center[0], element.center[1], {
button: 'left',
});
57 changes: 57 additions & 0 deletions packages/web-integration/tests/ai/fixtures/file-upload.html
@@ -0,0 +1,57 @@
<!DOCTYPE html>
<html>
<head>
<title>File Upload Test</title>
<style>
body { font-family: Arial, sans-serif; padding: 20px; }
.upload-area { border: 2px dashed #ccc; padding: 20px; margin: 20px 0; text-align: center; }
.upload-btn { background: #007bff; color: white; padding: 10px 20px; border: none; cursor: pointer; }
.file-list { margin-top: 20px; }
.file-item { padding: 5px; background: #f0f0f0; margin: 5px 0; }
</style>
</head>
<body>
<h1>File Upload Test Page</h1>

<div class="upload-area">
<input type="file" id="file-input" multiple style="display: none;">
<button class="upload-btn" onclick="document.getElementById('file-input').click()">Choose Files</button>
<p>Supports multiple file upload</p>
</div>

<div class="upload-area">
<input type="file" id="single-file-input" style="display: none;">
<button class="upload-btn" onclick="document.getElementById('single-file-input').click()">Choose Single File</button>
<p>Supports single file upload only</p>
</div>

<div class="file-list" id="file-list">
<h3>Selected Files:</h3>
<div id="selected-files"></div>
</div>

<script>
document.getElementById('file-input').addEventListener('change', function(e) {
const files = Array.from(e.target.files);
displayFiles(files, 'multiple');
});

document.getElementById('single-file-input').addEventListener('change', function(e) {
const files = Array.from(e.target.files);
displayFiles(files, 'single');
});

function displayFiles(files, type) {
const container = document.getElementById('selected-files');
container.innerHTML = '';

files.forEach(file => {
const div = document.createElement('div');
div.className = 'file-item';
div.textContent = `${file.name} (${file.size} bytes) - ${type}`;
container.appendChild(div);
});
}
</script>
</body>
</html>
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Relative path test
1 change: 1 addition & 0 deletions packages/web-integration/tests/ai/fixtures/test-file-1.txt
@@ -0,0 +1 @@
Test file 1 content
1 change: 1 addition & 0 deletions packages/web-integration/tests/ai/fixtures/test-file-2.txt
@@ -0,0 +1 @@
Test file 2 content
1 change: 1 addition & 0 deletions packages/web-integration/tests/ai/fixtures/test-file.txt
@@ -0,0 +1 @@
This is a test file for upload