We present MAI-UI, a family of GUI agent foundation models spanning the full spectrum of sizes, including 2B, 8B, 32B, and 235B-A22B variants. Our core contributions include:
- 🔧 Agent–user interaction and MCP augmentation: enabling the agent to interact with the user and use MCP tools to complete tasks.
- ☁️ Device–cloud collaboration system: dynamically selecting on-device or cloud execution based on task execution state and data sensitivity.
- 📈 Dynamic RL Scaling: large-scale reinforcement learning that scales parallel environments (up to 512) and context length (up to 50).
- 🏆 State-of-the-Art Performance: MAI-UI establishes new benchmark SOTA results across GUI grounding and navigation tasks.
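The first contribution above implies an action space that mixes plain UI operations with `ask_user` and `mcp_call` actions. A minimal sketch of how such an action space could be represented (all class and function names here are illustrative assumptions, not the released MAI-UI API):

```python
from dataclasses import dataclass, field

# Illustrative action types -- names are assumptions, not MAI-UI's released API.

@dataclass
class ClickAction:
    """A plain on-screen UI operation."""
    x: int
    y: int

@dataclass
class AskUserAction:
    """Pause the task and ask the user for missing information."""
    question: str

@dataclass
class MCPCallAction:
    """Invoke an external MCP tool (e.g. a maps or calendar server)."""
    tool: str
    arguments: dict = field(default_factory=dict)

def describe(action) -> str:
    """Render an action as a short string for logging."""
    if isinstance(action, AskUserAction):
        return f"ask_user({action.question!r})"
    if isinstance(action, MCPCallAction):
        return f"mcp_call({action.tool}, {action.arguments})"
    return f"click({action.x}, {action.y})"
```

The point of the sketch is only that user queries and MCP tool calls sit alongside ordinary UI actions in one dispatchable action space.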
Overview of MAI-UI performance
- [2026-03-20] 📄 Blog Posts: Our Grounding and Navigation Blog Posts are available now!
- [2026-01-15] 🥇 New Record on AndroidWorld: MAI-UI-235B takes #1 on the AndroidWorld Leaderboard for pure-vision, end-to-end models with a 76.7% success rate.
- [2026-01-13] 🥇 MAI-UI Sweeps ScreenSpot-Pro: MAI-UI (32B, 8B, 2B) now ranks #1 in all size categories on the ScreenSpot-Pro leaderboard. We achieved record scores of 67.9%, 65.7%, and 57.4% respectively—notably reaching these benchmarks without any zoom-in tricks.
- [2026-01-04] 🤝 We're Hiring! We're actively looking for Research Scientists, Engineers, and Interns to work on foundational GUI agents and their applications. Interested candidates please send your resume to: yue.w@alibaba-inc.com
- [2025-12-29] 🏆 New Leaderboard Record: MAI-UI achieves a 41.7% success rate on the MobileWorld benchmark, setting a new record for end-to-end model performance!
- [2025-12-29] 📄 Technical Report & Website: Our technical report is now available on arXiv, and the official project website is live.
- [2025-12-29] 🤗 Model Release: We are excited to release the weights for MAI-UI-8B and MAI-UI-2B on Hugging Face.
The agent triggers ask_user to request the information it needs to complete the task.
User instruction: Go shopping at Hema (Freshippo): buy one pack of marbled beef rolls, one baby cabbage, one pack of enoki mushrooms, and any soy product. Also, check the to-dos in my calendar for anything my wife needs from Hema, and let me confirm whether to buy those too.
The agent uses mcp_call to invoke AMap tools for route planning.
User instruction: I'm at the Alibaba Yungu Campus. I need to withdraw cash at a China Merchants Bank branch first, then go to Chengxi Intime City. Plan a bus/metro route for me: choose a China Merchants Bank branch within 4 km that takes the shortest time, keep the total travel time of the two legs under 2 hours, and save the plan in Notes for me to read later, titled "Afternoon Itinerary", with the details of the two legs as the content.
Cross-app collaboration to complete the task.
User instruction: I need to make an urgent business trip to Shanghai. Check 12306 for the earliest train from Hangzhou West to Shanghai Hongqiao that still has second-class seats, share the arrival time with everyone in the DingTalk "Frontier Technology Seminar" group, then move my meeting with Shuifan to the same time tomorrow, @ him in the group, politely explain that the meeting was rescheduled because of the sudden trip, and ask whether he is free tomorrow.
Device–cloud collaboration on simple tasks: no cloud model invocation is needed.
User instruction: Search Fliggy for round-trip flights from Hangzhou to Sanya, departing December 25 and returning December 28.
Device–cloud collaboration on complex tasks: the cloud model is invoked when the task is beyond the device model's capabilities.
User instruction: Buy me a ticket on Taopiaopiao for Zootopia 2 on the afternoon of the 25th. Choose the cinema in Qinchengli, pick a middle seat, add a single-person combo of cola and popcorn, and stop at the final order page.
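The two demos above contrast a task the device model finishes alone with one that escalates to the cloud. A toy sketch of such a routing decision, assuming a scalar complexity score and a sensitivity flag (both signals and the threshold are invented for illustration; the actual system decides dynamically from task execution state and data sensitivity):

```python
def route(task_complexity: float, contains_sensitive_data: bool) -> str:
    """Toy device-cloud router (illustrative only).

    Sensitive data (payments, personal info) stays on device; otherwise
    tasks beyond the device model's capability escalate to the cloud.
    The 0.7 threshold is an arbitrary placeholder.
    """
    if contains_sensitive_data:
        return "device"
    return "cloud" if task_complexity > 0.7 else "device"
```

For example, a simple flight search would stay on device, while a multi-step ticket purchase with seat and combo selection would be routed to the cloud model.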
```shell
git clone https://github.com/Tongyi-MAI/MAI-UI.git
cd MAI-UI
```

Download the model from HuggingFace and deploy the API service using vLLM:
HuggingFace model path:
Deploy the model using vLLM:
```shell
# Install vLLM (vllm==0.11.0 and transformers>=4.57.0 are required)
pip install vllm==0.11.0

# Start the vLLM API server (replace <huggingface_model_path> with your local model path or HuggingFace model ID)
python -m vllm.entrypoints.openai.api_server \
    --model <huggingface_model_path> \
    --served-model-name MAI-UI-8B \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 1 \
    --trust-remote-code
```

💡 Tips:
- IMPORTANT: you must use `vllm==0.11.0`
- Adjust `--tensor-parallel-size` based on your GPU count for multi-GPU inference
- The model will be served at `http://localhost:8000/v1`
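Because vLLM exposes an OpenAI-compatible endpoint, you can sanity-check the deployment with a plain HTTP request. A stdlib-only sketch (the payload fields follow the OpenAI chat-completions schema; `model` must match `--served-model-name` above):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str, model: str = "MAI-UI-8B") -> urllib.request.Request:
    """Build an OpenAI-compatible chat request for the local vLLM server."""
    payload = {
        "model": model,  # must match --served-model-name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Sending the request requires the server from the previous step to be running:
# with urllib.request.urlopen(build_request("Hello")) as r:
#     print(json.loads(r.read())["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client (such as the official `openai` Python package pointed at the base URL above) works the same way.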
```shell
pip install -r requirements.txt
```

We provide two notebooks in the `cookbook/` directory:
The `grounding.ipynb` notebook demonstrates how to use the MAI Grounding Agent to locate UI elements:
```shell
cd cookbook
jupyter notebook grounding.ipynb
```

Before running, update the API endpoint in the notebook:
```python
agent = MAIGroundingAgent(
    llm_base_url="http://localhost:8000/v1",  # Update to your vLLM server address
    model_name="MAI-UI-8B",  # Use the served model name
    runtime_conf={
        "history_n": 3,
        "temperature": 0.0,
        "top_k": -1,
        "top_p": 1.0,
        "max_tokens": 2048,
    },
)
```

The `run_agent.ipynb` notebook demonstrates the full UI navigation agent:
```shell
cd cookbook
jupyter notebook run_agent.ipynb
```

Similarly, update the API endpoint configuration:
```python
agent = MAIUINaivigationAgent(
    llm_base_url="http://localhost:8000/v1",  # Update to your vLLM server address
    model_name="MAI-UI-8B",  # Use the served model name
    runtime_conf={
        "history_n": 3,
        "temperature": 0.0,
        "top_k": -1,
        "top_p": 1.0,
        "max_tokens": 2048,
    },
)
```

If you find this project useful for your research, please consider citing our works:
```bibtex
@article{zhou2025mai,
  title={MAI-UI Technical Report: Real-World Centric Foundation GUI Agents},
  author={Zhou, Hanzhang and Zhang, Xu and Tong, Panrong and Zhang, Jianan and Chen, Liangyu and Kong, Quyu and Cai, Chenglin and Liu, Chen and Wang, Yue and Zhou, Jingren and others},
  journal={arXiv preprint arXiv:2512.22047},
  year={2025}
}

@article{kong2025mobileworld,
  title={MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments},
  author={Kong, Quyu and Zhang, Xu and Yang, Zhenyu and Gao, Nolan and Liu, Chen and Tong, Panrong and Cai, Chenglin and Zhou, Hanzhang and Zhang, Jianan and Chen, Liangyu and others},
  journal={arXiv preprint arXiv:2512.19432},
  year={2025}
}

@article{chen2025ui,
  title={UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning},
  author={Chen, Liangyu and Zhou, Hanzhang and Cai, Chenglin and Zhang, Jianan and Tong, Panrong and Kong, Quyu and Zhang, Xu and Liu, Chen and Liu, Yuqi and Wang, Wenxuan and others},
  journal={arXiv preprint arXiv:2510.20286},
  year={2025}
}
```

For questions and support, please contact:
- Hanzhang Zhou: hanzhang.zhou@alibaba-inc.com
- Xu Zhang: hanguang.zx@alibaba-inc.com
- Yue Wang: yue.w@alibaba-inc.com
MAI-UI Mobile is a foundation GUI agent developed by Alibaba Cloud and licensed under the Apache License (Version 2.0).
This product contains various third-party components under other open source licenses. See the NOTICE file for more information.