[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Updated May 29, 2025 - Python
An open-source, end-to-end, VLM-based GUI agent
Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
Official implementation of "GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents"
Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning"
Official repository for InfiGUI-G1. We introduce Adaptive Exploration Policy Optimization (AEPO) to overcome semantic alignment bottlenecks in GUI agents through efficient, guided exploration.
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent hierarchically across multiple platforms, including Windows, Linux, macOS, iOS, Android, and Web.
Official website for the TuriX Computer-use-Agent
🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games"
Source code of the paper "V-Droid: Advancing Mobile GUI Agent Through Generative Verifiers"
This is a quick test of Chinese Scripting Language powered by AI. You can use it to open any text file. No illegal use is allowed! Free for commercial use and academic use.
A think-with-image GUI visual grounding model.