Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
d7508b9
feat(web-integration): implement file upload functionality with tests…
quanru Dec 31, 2025
c64451f
feat(docs): add file upload functionality to agent with detailed para…
quanru Dec 31, 2025
0606767
Update apps/site/docs/zh/api.mdx
quanru Jan 4, 2026
553024b
Update packages/web-integration/src/puppeteer/base-page.ts
quanru Jan 4, 2026
2281813
Update packages/web-integration/src/chrome-extension/page.ts
quanru Jan 4, 2026
17887a4
Update packages/web-integration/src/puppeteer/base-page.ts
quanru Jan 4, 2026
422767b
Update packages/core/src/agent/agent.ts
quanru Jan 4, 2026
71b224c
Update apps/site/docs/en/api.mdx
quanru Jan 4, 2026
27525cb
fix(tests): refactor file upload tests to use relative path for test …
quanru Jan 4, 2026
c10fde7
feat(web-integration): enhance file upload functionality in aiTap and…
quanru Jan 5, 2026
7d2c09b
Update packages/web-integration/tests/ai/web/playwright/file-upload.s…
quanru Jan 5, 2026
47b20c9
fix(tests): improve file upload tests with error handling and agent c…
quanru Jan 5, 2026
57895a1
fix(base-page): optimize file handling by importing fs and path modul…
quanru Jan 5, 2026
717c058
feat(core): enhance aiTap to handle file chooser setup and cleanup
quanru Jan 6, 2026
8df55c2
refactor(base-page): unify file chooser handling with wrapper pattern
quanru Jan 6, 2026
50f0cb9
feat(agent): implement file chooser capability with unified handling
quanru Jan 6, 2026
00b5178
refactor(core): simplify actionTapParamSchema by removing file upload…
quanru Jan 6, 2026
022f115
feat(agent): implement file upload handling with support for Puppetee…
quanru Jan 7, 2026
2ea8890
refactor(core): file uploader (#1735)
yuyutaotao Jan 8, 2026
585cb84
Revert "refactor(core): file uploader (#1735)"
yuyutaotao Jan 8, 2026
53e9f76
fix(core): file upload codes (#1736)
yuyutaotao Jan 8, 2026
00133e4
chore(core): merge main
yuyutaotao Jan 8, 2026
819da54
fix(core): enhance file chooser handling and add related tests
quanru Jan 8, 2026
2764a2c
docs(core): update document of aiAct
yuyutaotao Jan 8, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 51 additions & 1 deletion apps/site/docs/en/api.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,7 @@ Below are the main APIs available for the various Agents in Midscene.
In Midscene, you can choose to use either auto planning or instant action.

- `agent.ai()` is for Auto Planning: Midscene will automatically plan the steps and execute them. It's more smart and looks like more fashionable style for AI agents. But it may be slower and heavily rely on the quality of the AI model.
- `agent.aiTap()`, `agent.aiHover()`, `agent.aiInput()`, `agent.aiKeyboardPress()`, `agent.aiScroll()`, `agent.aiDoubleClick()`, `agent.aiRightClick()` are for Instant Action: Midscene will directly perform the specified action, while the AI model is responsible for basic tasks such as locating elements. It's faster and more reliable if you are certain about the action you want to perform.
- `agent.aiTap()`, `agent.aiHover()`, `agent.aiInput()`, `agent.aiKeyboardPress()`, `agent.aiScroll()`, `agent.aiUploadFile()`, `agent.aiDoubleClick()`, `agent.aiRightClick()` are for Instant Action: Midscene will directly perform the specified action, while the AI model is responsible for basic tasks such as locating elements. It's faster and more reliable if you are certain about the action you want to perform.

:::

Expand Down Expand Up @@ -350,6 +350,56 @@ await agent.aiScroll(
);
```

### `agent.aiUploadFile()`

> Only available in web pages (Playwright, Puppeteer). Not available in Android, iOS, and Chrome Extension mode.

Upload files to a file input element or upload button on the page.

- Type

```typescript
function aiUploadFile(
locate: string | Object,
files: string | string[],
options?: Object,
): Promise<void>;
```

- Parameters:

- `locate: string | Object` - A natural language description of the upload button or file input element, or [prompting with images](#prompting-with-images).
- `files: string | string[]` - File path(s) to upload. Can be a single file path (string) or multiple file paths (string array). Supports both absolute and relative paths.
- `options?: Object` - Optional, a configuration object containing:
- `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element, which can improve accuracy. False by default. With newer models (e.g. Qwen3 / Doubao 1.6 / Gemini 3), the gain is less obvious.
- `xpath?: string` - The xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
- `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.

- Return Value:

- Returns a `Promise<void>`

- Examples:

```typescript
// Upload a single file
await agent.aiUploadFile('Choose file button', './document.pdf');

// Upload multiple files
await agent.aiUploadFile('Upload images', ['./image1.jpg', './image2.png']);

// Use absolute path
await agent.aiUploadFile('Choose file', '/absolute/path/to/file.txt');
```

:::warning Notes

- Chrome Extension mode does not support file upload. Use Playwright or Puppeteer for file upload functionality.
- File paths must point to existing files, otherwise a "File not found" error will be thrown.
- This feature can also be invoked through `agent.aiAct()` or `agent.ai()` auto-planning.

:::

### `agent.aiDoubleClick()`

Double-click on an element.
Expand Down
52 changes: 51 additions & 1 deletion apps/site/docs/zh/api.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -111,7 +111,7 @@ const agent = new PuppeteerAgent(page, {
在 Midscene 中,你可以选择使用自动规划(Auto Planning)或即时操作(Instant Action)。

- `agent.ai()` 是自动规划(Auto Planning):Midscene 会自动规划操作步骤并执行。它更智能,更像流行的 AI Agent 风格,但可能较慢,且效果依赖于 AI 模型的质量。
- `agent.aiTap()`, `agent.aiHover()`, `agent.aiInput()`, `agent.aiKeyboardPress()`, `agent.aiScroll()`, `agent.aiDoubleClick()`, `agent.aiRightClick()` 是即时操作(Instant Action):Midscene 会直接执行指定的操作,而 AI 模型只负责底层任务,如定位元素等。这种接口形式更快、更可靠。当你完全确定自己想要执行的操作时,推荐使用这种接口形式。
- `agent.aiTap()`, `agent.aiHover()`, `agent.aiInput()`, `agent.aiKeyboardPress()`, `agent.aiScroll()`, `agent.aiUploadFile()`, `agent.aiDoubleClick()`, `agent.aiRightClick()` 是即时操作(Instant Action):Midscene 会直接执行指定的操作,而 AI 模型只负责底层任务,如定位元素等。这种接口形式更快、更可靠。当你完全确定自己想要执行的操作时,推荐使用这种接口形式。

:::

Expand Down Expand Up @@ -344,6 +344,56 @@ await agent.aiScroll(
);
```

### `agent.aiUploadFile()`

> 仅在 web 页面中可用(Playwright、Puppeteer),在 Android、iOS 和 Chrome Extension 模式下不可用

上传文件到页面的文件输入框或上传按钮。

- 类型

```typescript
function aiUploadFile(
locate: string | Object,
files: string | string[],
options?: Object,
): Promise<void>;
```

- 参数:

- `locate: string | Object` - 用自然语言描述的上传按钮或文件输入框的位置,或[使用图片作为提示词](#使用图片作为提示词)。
- `files: string | string[]` - 要上传的文件路径。可以是单个文件路径(字符串)或多个文件路径(字符串数组)。支持绝对路径和相对路径。
- `options?: Object` - 可选,一个配置对象,包含:
- `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素,从而提升准确性。默认值为 false。对于新一代模型(如 Qwen3 / Doubao 1.6 / Gemini 3),带来的收益不再明显。
- `xpath?: string` - 目标元素的 xpath 路径,用于执行当前操作。如果提供了这个 xpath,Midscene 会优先使用该 xpath 来找到元素,然后依次使用缓存和 AI 模型。默认值为空
- `cacheable?: boolean` - 当启用 [缓存功能](./caching.mdx) 时,是否允许缓存当前 API 调用结果。默认值为 true

- 返回值:

- `Promise<void>`

- 示例:

```typescript
// 上传单个文件
await agent.aiUploadFile('选择文件按钮', './document.pdf');

// 上传多个文件
await agent.aiUploadFile('上传图片', ['./image1.jpg', './image2.png']);

// 使用绝对路径
await agent.aiUploadFile('选择文件', '/absolute/path/to/file.txt');
```

:::warning 注意

- Chrome Extension 模式不支持文件上传功能。如需使用文件上传,请改用 Playwright 或 Puppeteer。
- 文件路径必须指向实际存在的文件,否则会抛出 "File not found" 错误。
- 此功能也可以通过 `agent.aiAct()` 或 `agent.ai()` 自动规划调用。

:::

### `agent.aiDoubleClick()`

双击某个元素。
Expand Down
28 changes: 28 additions & 0 deletions packages/core/src/agent/agent.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1472,6 +1472,34 @@ export class Agent<
return null;
}

async aiUploadFile(
locatePrompt: TUserPrompt,
files: string | string[],
opt?: LocateOption,
): Promise<any> {
assert(locatePrompt, 'missing locate prompt for upload file');
assert(files, 'missing files for upload');

const detailedLocateParam = buildDetailedLocateParam(locatePrompt, opt);

// Check if the interface supports file upload
if (
'uploadFile' in this.interface &&
typeof this.interface.uploadFile === 'function'
) {
// Use the interface's uploadFile method with a click action
await (this.interface as any).uploadFile(files, async () => {
await this.callActionInActionSpace('Tap', {
locate: detailedLocateParam,
});
});

return { success: true };
} else {
throw new Error('File upload is not supported by the current interface');
}
}

/**
* Manually flush cache to file
* @param options - Optional configuration
Expand Down
32 changes: 32 additions & 0 deletions packages/core/src/device/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -449,6 +449,38 @@ export const defineActionAssert = (): DeviceAction<ActionAssertParam> => {
});
};

// UploadFile
export const actionUploadFileParamSchema = z.object({
locate: getMidsceneLocationSchema().describe(
'The file upload button or input element to click',
),
files: z
.union([z.string(), z.array(z.string())])
.describe(
'File path(s) to upload. Can be a single path or array of paths for multiple files',
),
});
export type ActionUploadFileParam = {
locate: LocateResultElement;
files: string | string[];
};

export const defineActionUploadFile = (
call: (param: ActionUploadFileParam) => Promise<void>,
): DeviceAction<ActionUploadFileParam> => {
return defineAction<
typeof actionUploadFileParamSchema,
ActionUploadFileParam
>({
name: 'UploadFile',
description:
'Upload file(s) by clicking the upload button/input and selecting files',
interfaceAlias: 'aiUploadFile',
paramSchema: actionUploadFileParamSchema,
call,
});
};

export type { DeviceAction } from '../types';
export type {
AndroidDeviceOpt,
Expand Down
9 changes: 9 additions & 0 deletions packages/web-integration/src/chrome-extension/page.ts
Original file line number Diff line number Diff line change
Expand Up @@ -865,4 +865,13 @@ export default class ChromeExtensionProxyPage implements AbstractInterface {
this.latestMouseX = to.x;
this.latestMouseY = to.y;
}

async uploadFile(
files: string | string[],
clickAction: () => Promise<void>,
): Promise<void> {
throw new Error(
'File upload is not supported in Chrome Extension mode. Please use Playwright or Puppeteer for file upload functionality.',
);
}
}
16 changes: 16 additions & 0 deletions packages/web-integration/src/playwright/ai-fixture.ts
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,7 @@ export const PlaywrightAiFixture = (options?: {
| 'aiString'
| 'aiBoolean'
| 'aiAsk'
| 'aiUploadFile'
| 'runYaml'
| 'setAIActionContext'
| 'evaluateJavaScript'
Expand Down Expand Up @@ -507,6 +508,18 @@ export const PlaywrightAiFixture = (options?: {
aiActionType: 'aiAsk',
});
},
aiUploadFile: async (
{ page }: { page: OriginPlaywrightPage },
use: any,
testInfo: TestInfo,
) => {
await generateAiFunction({
page,
testInfo,
use,
aiActionType: 'aiUploadFile',
});
},
runYaml: async (
{ page }: { page: OriginPlaywrightPage },
use: any,
Expand Down Expand Up @@ -650,6 +663,9 @@ export type PlayWrightAiFixtureType = {
aiAsk: (
...args: Parameters<PageAgent['aiAsk']>
) => ReturnType<PageAgent['aiAsk']>;
aiUploadFile: (
...args: Parameters<PageAgent['aiUploadFile']>
) => ReturnType<PageAgent['aiUploadFile']>;
runYaml: (
...args: Parameters<PageAgent['runYaml']>
) => ReturnType<PageAgent['runYaml']>;
Expand Down
36 changes: 36 additions & 0 deletions packages/web-integration/src/puppeteer/base-page.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ import type {
Point,
Rect,
Size,
TUserPrompt,
UIContext,
} from '@midscene/core';
import {
Expand Down Expand Up @@ -686,6 +687,41 @@ export class Page<
await page.mouse.up({ button: 'left' });
}
}

async uploadFile(
files: string | string[],
clickAction: () => Promise<void>,
): Promise<void> {
const { resolve } = await import('node:path');
const { existsSync } = await import('node:fs');

// 路径标准化处理
const normalizedFiles = (Array.isArray(files) ? files : [files]).map(
(file) => {
const absolutePath = resolve(file);
if (!existsSync(absolutePath)) {
throw new Error(`File not found: ${file}`);
}
return absolutePath;
},
);

if (this.interfaceType === 'puppeteer') {
const page = this.underlyingPage as PuppeteerPage;
const [fileChooser] = await Promise.all([
page.waitForFileChooser(),
clickAction(),
]);
await fileChooser.accept(normalizedFiles);
} else if (this.interfaceType === 'playwright') {
const page = this.underlyingPage as PlaywrightPage;
const [fileChooser] = await Promise.all([
page.waitForEvent('filechooser'),
clickAction(),
]);
await fileChooser.setFiles(normalizedFiles);
}
}
}

export function forceClosePopup(
Expand Down
17 changes: 17 additions & 0 deletions packages/web-integration/src/web-page.ts
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ import {
defineActionScroll,
defineActionSwipe,
defineActionTap,
defineActionUploadFile,
} from '@midscene/core/device';

import { sleep } from '@midscene/core/utils';
Expand Down Expand Up @@ -626,6 +627,22 @@ export const commonWebActionsForWebPage = <T extends AbstractWebPage>(
await page.clearInput(element as unknown as ElementInfo);
}),

...('uploadFile' in page && typeof page.uploadFile === 'function'
? [
defineActionUploadFile(async (param) => {
const element = param.locate;
assert(element, 'Element not found, cannot upload file');

await (page as any).uploadFile(param.files, async () => {
// Click the upload button to trigger file chooser
await page.mouse.click(element.center[0], element.center[1], {
button: 'left',
});
});
}),
]
: []),

defineAction({
name: 'Navigate',
description:
Expand Down
57 changes: 57 additions & 0 deletions packages/web-integration/tests/ai/fixtures/file-upload.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
<!DOCTYPE html>
<html>
<head>
<title>File Upload Test</title>
<style>
body { font-family: Arial, sans-serif; padding: 20px; }
.upload-area { border: 2px dashed #ccc; padding: 20px; margin: 20px 0; text-align: center; }
.upload-btn { background: #007bff; color: white; padding: 10px 20px; border: none; cursor: pointer; }
.file-list { margin-top: 20px; }
.file-item { padding: 5px; background: #f0f0f0; margin: 5px 0; }
</style>
</head>
<body>
<h1>File Upload Test Page</h1>

<div class="upload-area">
<input type="file" id="file-input" multiple style="display: none;">
<button class="upload-btn" onclick="document.getElementById('file-input').click()">Choose Files</button>
<p>Supports multiple file upload</p>
</div>

<div class="upload-area">
<input type="file" id="single-file-input" style="display: none;">
<button class="upload-btn" onclick="document.getElementById('single-file-input').click()">Choose Single File</button>
<p>Supports single file upload only</p>
</div>

<div class="file-list" id="file-list">
<h3>Selected Files:</h3>
<div id="selected-files"></div>
</div>

<script>
document.getElementById('file-input').addEventListener('change', function(e) {
const files = Array.from(e.target.files);
displayFiles(files, 'multiple');
});

document.getElementById('single-file-input').addEventListener('change', function(e) {
const files = Array.from(e.target.files);
displayFiles(files, 'single');
});

function displayFiles(files, type) {
const container = document.getElementById('selected-files');
container.innerHTML = '';

files.forEach(file => {
const div = document.createElement('div');
div.className = 'file-item';
div.textContent = `${file.name} (${file.size} bytes) - ${type}`;
container.appendChild(div);
});
}
</script>
</body>
</html>
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Relative path test
Loading