web-infra-dev · quanru · Jan 8, 2026 · Dec 31, 2025 · Dec 31, 2025 · Jan 4, 2026
diff --git a/apps/site/docs/en/api.mdx b/apps/site/docs/en/api.mdx
@@ -109,7 +109,7 @@ Below are the main APIs available for the various Agents in Midscene.
 In Midscene, you can choose to use either auto planning or instant action.
 
 - `agent.ai()` is for Auto Planning: Midscene will automatically plan the steps and execute them. It's more smart and looks like more fashionable style for AI agents. But it may be slower and heavily rely on the quality of the AI model.
-- `agent.aiTap()`, `agent.aiHover()`, `agent.aiInput()`, `agent.aiKeyboardPress()`, `agent.aiScroll()`, `agent.aiDoubleClick()`, `agent.aiRightClick()` are for Instant Action: Midscene will directly perform the specified action, while the AI model is responsible for basic tasks such as locating elements. It's faster and more reliable if you are certain about the action you want to perform.
+- `agent.aiTap()`, `agent.aiHover()`, `agent.aiInput()`, `agent.aiKeyboardPress()`, `agent.aiScroll()`, `agent.aiUploadFile()`, `agent.aiDoubleClick()`, `agent.aiRightClick()` are for Instant Action: Midscene will directly perform the specified action, while the AI model is responsible for basic tasks such as locating elements. It's faster and more reliable if you are certain about the action you want to perform.
 
 :::
 
@@ -350,6 +350,56 @@ await agent.aiScroll(
 );
 ```
 
+### `agent.aiUploadFile()`
+
+> Only available in web pages (Playwright, Puppeteer). Not available in Android, iOS, and Chrome Extension mode.
+
+Upload files to a file input element or upload button on the page.
+
+- Type
+
+```typescript
+function aiUploadFile(
+  locate: string | Object,
+  files: string | string[],
+  options?: Object,
+): Promise<void>;
+```
+
+- Parameters:
+
+  - `locate: string | Object` - A natural language description of the upload button or file input element, or [prompting with images](#prompting-with-images).
+  - `files: string | string[]` - File path(s) to upload. Can be a single file path (string) or multiple file paths (string array). Supports both absolute and relative paths.
+  - `options?: Object` - Optional, a configuration object containing:
+    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element, which can improve accuracy. False by default. With newer models (e.g. Qwen3 / Doubao 1.6 / Gemini 3), the gain is less obvious.
+    - `xpath?: string` - The xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
+    - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.
+
+- Return Value:
+
+  - Returns a `Promise<void>`
+
+- Examples:
+
+```typescript
+// Upload a single file
+await agent.aiUploadFile('Choose file button', './document.pdf');
+
+// Upload multiple files
+await agent.aiUploadFile('Upload images', ['./image1.jpg', './image2.png']);
+
+// Use absolute path
+await agent.aiUploadFile('Choose file', '/absolute/path/to/file.txt');
+```
+
+:::warning Notes
+
+- Chrome Extension mode does not support file upload. Use Playwright or Puppeteer for file upload functionality.
+- File paths must point to existing files, otherwise a "File not found" error will be thrown.
+- This feature can also be invoked through `agent.aiAct()` or `agent.ai()` auto-planning.
+
+:::
+
 ### `agent.aiDoubleClick()`
 
 Double-click on an element.

diff --git a/apps/site/docs/zh/api.mdx b/apps/site/docs/zh/api.mdx
@@ -111,7 +111,7 @@ const agent = new PuppeteerAgent(page, {
 在 Midscene 中，你可以选择使用自动规划（Auto Planning）或即时操作（Instant Action）。
 
 - `agent.ai()` 是自动规划（Auto Planning）：Midscene 会自动规划操作步骤并执行。它更智能，更像流行的 AI Agent 风格，但可能较慢，且效果依赖于 AI 模型的质量。
-- `agent.aiTap()`, `agent.aiHover()`, `agent.aiInput()`, `agent.aiKeyboardPress()`, `agent.aiScroll()`, `agent.aiDoubleClick()`, `agent.aiRightClick()` 是即时操作（Instant Action）：Midscene 会直接执行指定的操作，而 AI 模型只负责底层任务，如定位元素等。这种接口形式更快、更可靠。当你完全确定自己想要执行的操作时，推荐使用这种接口形式。
+- `agent.aiTap()`, `agent.aiHover()`, `agent.aiInput()`, `agent.aiKeyboardPress()`, `agent.aiScroll()`, `agent.aiUploadFile()`, `agent.aiDoubleClick()`, `agent.aiRightClick()` 是即时操作（Instant Action）：Midscene 会直接执行指定的操作，而 AI 模型只负责底层任务，如定位元素等。这种接口形式更快、更可靠。当你完全确定自己想要执行的操作时，推荐使用这种接口形式。
 
 :::
 
@@ -344,6 +344,56 @@ await agent.aiScroll(
 );
 ```
 
+### `agent.aiUploadFile()`
+
+> 仅在 web 页面中可用（Playwright、Puppeteer），在 Android、iOS 和 Chrome Extension 模式下不可用
+
+上传文件到页面的文件输入框或上传按钮。
+
+- 类型
+
+```typescript
+function aiUploadFile(
+  locate: string | Object,
+  files: string | string[],
+  options?: Object,
+): Promise<void>;
+```
+
+- 参数：
+
+  - `locate: string | Object` - 用自然语言描述的上传按钮或文件输入框的位置，或[使用图片作为提示词](#使用图片作为提示词)。
+  - `files: string | string[]` - 要上传的文件路径。可以是单个文件路径（字符串）或多个文件路径（字符串数组）。支持绝对路径和相对路径。
+  - `options?: Object` - 可选，一个配置对象，包含：
+    - `deepThink?: boolean` - 是否开启深度思考。如果为 true，Midscene 会调用 AI 模型两次以精确定位元素，从而提升准确性。默认值为 false。对于新一代模型（如 Qwen3 / Doubao 1.6 / Gemini 3），带来的收益不再明显。
+    - `xpath?: string` - 目标元素的 xpath 路径，用于执行当前操作。如果提供了这个 xpath，Midscene 会优先使用该 xpath 来找到元素，然后依次使用缓存和 AI 模型。默认值为空
+    - `cacheable?: boolean` - 当启用 [缓存功能](./caching.mdx) 时，是否允许缓存当前 API 调用结果。默认值为 true
+
+- 返回值：
+
+  - `Promise<void>`
+
+- 示例：
+
+```typescript
+// 上传单个文件
+await agent.aiUploadFile('选择文件按钮', './document.pdf');
+
+// 上传多个文件
+await agent.aiUploadFile('上传图片', ['./image1.jpg', './image2.png']);
+
+// 使用绝对路径
+await agent.aiUploadFile('选择文件', '/absolute/path/to/file.txt');
+```
+
+:::warning 注意
+
+- Chrome Extension 模式不支持文件上传功能。如需使用文件上传，请改用 Playwright 或 Puppeteer。
+- 文件路径必须指向实际存在的文件，否则会抛出 "File not found" 错误。
+- 此功能也可以通过 `agent.aiAct()` 或 `agent.ai()` 自动规划调用。
+
+:::
+
 ### `agent.aiDoubleClick()`
 
 双击某个元素。

diff --git a/packages/core/src/agent/agent.ts b/packages/core/src/agent/agent.ts
@@ -1472,6 +1472,34 @@ export class Agent<
     return null;
   }
 
+  async aiUploadFile(
+    locatePrompt: TUserPrompt,
+    files: string | string[],
+    opt?: LocateOption,
+  ): Promise<any> {
+    assert(locatePrompt, 'missing locate prompt for upload file');
+    assert(files, 'missing files for upload');
+
+    const detailedLocateParam = buildDetailedLocateParam(locatePrompt, opt);
+
+    // Check if the interface supports file upload
+    if (
+      'uploadFile' in this.interface &&
+      typeof this.interface.uploadFile === 'function'
+    ) {
+      // Use the interface's uploadFile method with a click action
+      await (this.interface as any).uploadFile(files, async () => {
+        await this.callActionInActionSpace('Tap', {
+          locate: detailedLocateParam,
+        });
+      });
+
+      return { success: true };
+    } else {
+      throw new Error('File upload is not supported by the current interface');
+    }
+  }
+
   /**
    * Manually flush cache to file
    * @param options - Optional configuration

diff --git a/packages/core/src/device/index.ts b/packages/core/src/device/index.ts
@@ -449,6 +449,38 @@ export const defineActionAssert = (): DeviceAction<ActionAssertParam> => {
   });
 };
 
+// UploadFile
+export const actionUploadFileParamSchema = z.object({
+  locate: getMidsceneLocationSchema().describe(
+    'The file upload button or input element to click',
+  ),
+  files: z
+    .union([z.string(), z.array(z.string())])
+    .describe(
+      'File path(s) to upload. Can be a single path or array of paths for multiple files',
+    ),
+});
+export type ActionUploadFileParam = {
+  locate: LocateResultElement;
+  files: string | string[];
+};
+
+export const defineActionUploadFile = (
+  call: (param: ActionUploadFileParam) => Promise<void>,
+): DeviceAction<ActionUploadFileParam> => {
+  return defineAction<
+    typeof actionUploadFileParamSchema,
+    ActionUploadFileParam
+  >({
+    name: 'UploadFile',
+    description:
+      'Upload file(s) by clicking the upload button/input and selecting files',
+    interfaceAlias: 'aiUploadFile',
+    paramSchema: actionUploadFileParamSchema,
+    call,
+  });
+};
+
 export type { DeviceAction } from '../types';
 export type {
   AndroidDeviceOpt,

diff --git a/packages/web-integration/src/chrome-extension/page.ts b/packages/web-integration/src/chrome-extension/page.ts
@@ -865,4 +865,13 @@ export default class ChromeExtensionProxyPage implements AbstractInterface {
     this.latestMouseX = to.x;
     this.latestMouseY = to.y;
   }
+
+  async uploadFile(
+    files: string | string[],
+    clickAction: () => Promise<void>,
+  ): Promise<void> {
+    throw new Error(
+      'File upload is not supported in Chrome Extension mode. Please use Playwright or Puppeteer for file upload functionality.',
+    );
+  }
 }
diff --git a/packages/web-integration/src/playwright/ai-fixture.ts b/packages/web-integration/src/playwright/ai-fixture.ts
@@ -148,6 +148,7 @@ export const PlaywrightAiFixture = (options?: {
       | 'aiString'
       | 'aiBoolean'
       | 'aiAsk'
+      | 'aiUploadFile'
       | 'runYaml'
       | 'setAIActionContext'
       | 'evaluateJavaScript'
@@ -507,6 +508,18 @@ export const PlaywrightAiFixture = (options?: {
         aiActionType: 'aiAsk',
       });
     },
+    aiUploadFile: async (
+      { page }: { page: OriginPlaywrightPage },
+      use: any,
+      testInfo: TestInfo,
+    ) => {
+      await generateAiFunction({
+        page,
+        testInfo,
+        use,
+        aiActionType: 'aiUploadFile',
+      });
+    },
     runYaml: async (
       { page }: { page: OriginPlaywrightPage },
       use: any,
@@ -650,6 +663,9 @@ export type PlayWrightAiFixtureType = {
   aiAsk: (
     ...args: Parameters<PageAgent['aiAsk']>
   ) => ReturnType<PageAgent['aiAsk']>;
+  aiUploadFile: (
+    ...args: Parameters<PageAgent['aiUploadFile']>
+  ) => ReturnType<PageAgent['aiUploadFile']>;
   runYaml: (
     ...args: Parameters<PageAgent['runYaml']>
   ) => ReturnType<PageAgent['runYaml']>;

diff --git a/packages/web-integration/src/puppeteer/base-page.ts b/packages/web-integration/src/puppeteer/base-page.ts
@@ -6,6 +6,7 @@ import type {
   Point,
   Rect,
   Size,
+  TUserPrompt,
   UIContext,
 } from '@midscene/core';
 import {
@@ -686,6 +687,41 @@ export class Page<
       await page.mouse.up({ button: 'left' });
     }
   }
+
+  async uploadFile(
+    files: string | string[],
+    clickAction: () => Promise<void>,
+  ): Promise<void> {
+    const { resolve } = await import('node:path');
+    const { existsSync } = await import('node:fs');
+
+    // 路径标准化处理
+    const normalizedFiles = (Array.isArray(files) ? files : [files]).map(
+      (file) => {
+        const absolutePath = resolve(file);
+        if (!existsSync(absolutePath)) {
+          throw new Error(`File not found: ${file}`);
+        }
+        return absolutePath;
+      },
+    );
+
+    if (this.interfaceType === 'puppeteer') {
+      const page = this.underlyingPage as PuppeteerPage;
+      const [fileChooser] = await Promise.all([
+        page.waitForFileChooser(),
+        clickAction(),
+      ]);
+      await fileChooser.accept(normalizedFiles);
+    } else if (this.interfaceType === 'playwright') {
+      const page = this.underlyingPage as PlaywrightPage;
+      const [fileChooser] = await Promise.all([
+        page.waitForEvent('filechooser'),
+        clickAction(),
+      ]);
+      await fileChooser.setFiles(normalizedFiles);
+    }
+  }
 }
 
 export function forceClosePopup(

diff --git a/packages/web-integration/src/web-page.ts b/packages/web-integration/src/web-page.ts
@@ -16,6 +16,7 @@ import {
   defineActionScroll,
   defineActionSwipe,
   defineActionTap,
+  defineActionUploadFile,
 } from '@midscene/core/device';
 
 import { sleep } from '@midscene/core/utils';
@@ -626,6 +627,22 @@ export const commonWebActionsForWebPage = <T extends AbstractWebPage>(
     await page.clearInput(element as unknown as ElementInfo);
   }),
 
+  ...('uploadFile' in page && typeof page.uploadFile === 'function'
+    ? [
+        defineActionUploadFile(async (param) => {
+          const element = param.locate;
+          assert(element, 'Element not found, cannot upload file');
+
+          await (page as any).uploadFile(param.files, async () => {
+            // Click the upload button to trigger file chooser
+            await page.mouse.click(element.center[0], element.center[1], {
+              button: 'left',
+            });
+          });
+        }),
+      ]
+    : []),
+
   defineAction({
     name: 'Navigate',
     description:

diff --git a/packages/web-integration/tests/ai/fixtures/file-upload.html b/packages/web-integration/tests/ai/fixtures/file-upload.html
@@ -0,0 +1,57 @@
+<!DOCTYPE html>
+<html>
+<head>
+    <title>File Upload Test</title>
+    <style>
+        body { font-family: Arial, sans-serif; padding: 20px; }
+        .upload-area { border: 2px dashed #ccc; padding: 20px; margin: 20px 0; text-align: center; }
+        .upload-btn { background: #007bff; color: white; padding: 10px 20px; border: none; cursor: pointer; }
+        .file-list { margin-top: 20px; }
+        .file-item { padding: 5px; background: #f0f0f0; margin: 5px 0; }
+    </style>
+</head>
+<body>
+    <h1>File Upload Test Page</h1>
+
+    <div class="upload-area">
+        <input type="file" id="file-input" multiple style="display: none;">
+        <button class="upload-btn" onclick="document.getElementById('file-input').click()">Choose Files</button>
+        <p>Supports multiple file upload</p>
+    </div>
+
+    <div class="upload-area">
+        <input type="file" id="single-file-input" style="display: none;">
+        <button class="upload-btn" onclick="document.getElementById('single-file-input').click()">Choose Single File</button>
+        <p>Supports single file upload only</p>
+    </div>
+
+    <div class="file-list" id="file-list">
+        <h3>Selected Files:</h3>
+        <div id="selected-files"></div>
+    </div>
+
+    <script>
+        document.getElementById('file-input').addEventListener('change', function(e) {
+            const files = Array.from(e.target.files);
+            displayFiles(files, 'multiple');
+        });
+
+        document.getElementById('single-file-input').addEventListener('change', function(e) {
+            const files = Array.from(e.target.files);
+            displayFiles(files, 'single');
+        });
+
+        function displayFiles(files, type) {
+            const container = document.getElementById('selected-files');
+            container.innerHTML = '';
+
+            files.forEach(file => {
+                const div = document.createElement('div');
+                div.className = 'file-item';
+                div.textContent = `${file.name} (${file.size} bytes) - ${type}`;
+                container.appendChild(div);
+            });
+        }
+    </script>
+</body>
+</html>
diff --git a/packages/web-integration/tests/ai/fixtures/relative-test.txt b/packages/web-integration/tests/ai/fixtures/relative-test.txt
@@ -0,0 +1 @@
+Relative path test