bin123apple/GUI_Spotlight

GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding

Paper | Hugging Face Model | Hugging Face Dataset

Introduction

GUI-Spotlight is a think-with-image GUI visual grounding model. At each step, it first calls a cropping tool to crop the image according to its own prediction, and then returns an exact coordinate location.
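
The crop-then-locate idea can be illustrated with a small sketch. This is not the repository's implementation: the names `Box`, `crop_box_around`, and `to_full_image`, and the fixed crop size, are assumptions made for illustration. It only shows how a crop window might be placed around a predicted point and how a coordinate found inside the crop maps back to the full screenshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel rectangle in full-image coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def crop_box_around(point, image_size, crop_size):
    """Return a crop window centered on `point`, shifted to stay inside the image."""
    x, y = point
    img_w, img_h = image_size
    # Never ask for a crop larger than the image itself.
    crop_w = min(crop_size[0], img_w)
    crop_h = min(crop_size[1], img_h)
    # Center the window on the point, then clamp it to the image bounds.
    left = min(max(x - crop_w // 2, 0), img_w - crop_w)
    top = min(max(y - crop_h // 2, 0), img_h - crop_h)
    return Box(left, top, left + crop_w, top + crop_h)


def to_full_image(point_in_crop, box):
    """Map a coordinate predicted inside the crop back to the full image."""
    return (box.left + point_in_crop[0], box.top + point_in_crop[1])


# A rough prediction near the bottom-right corner of a 1920x1080 screenshot.
box = crop_box_around((1900, 1000), (1920, 1080), (400, 300))
# A refined prediction made inside the crop, expressed in full-image pixels.
refined = to_full_image((10, 20), box)
print(box, refined)
```

Clamping the window keeps the crop at a constant size near the screen edges, so a refined coordinate always maps back with a simple offset.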

Setup

cd GUI_Spotlight
conda create --name spotlight python=3.12
conda activate spotlight
conda install -c conda-forge uv
uv pip install -e .

Evaluation

ScreenSpot-Pro

python screenspot_pro_evaluation.py

OSWorld-G (you need to download the dataset yourself)

python osworld_g_evaluation.py \
  --model Bin12345/GUI-Spotlight \
  --dataset_json OSWorld-G_refined.json \
  --images_dir OSWorld-G/benchmark/images \
  --batch_size 1

UI-Vision (you need to download the dataset yourself)

python uivision_evaluation.py \
  --model Bin12345/GUI-Spotlight \
  --dataset_json uivision/annotations \
  --images_dir ui-vision/images \
  --batch_size 1

Single Sample Inference

python inference.py --prompt "Your prompt" --image_path "Image Path" --model "The name of the model"
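
When the prompt contains spaces or quotes, it can be easier to launch the script from Python with an argument list rather than a shell string. The sketch below uses only the three flags shown above; the helper name `build_inference_command`, the example prompt, and the file name `screenshot.png` are hypothetical, and it assumes the script is run from the repository root.

```python
import shlex
import sys


def build_inference_command(prompt, image_path, model):
    """Build an argument list for inference.py (hypothetical helper)."""
    return [
        sys.executable, "inference.py",
        "--prompt", prompt,
        "--image_path", image_path,
        "--model", model,
    ]


cmd = build_inference_command(
    "Click the Save button",   # example prompt, an assumption
    "screenshot.png",          # example image path, an assumption
    "Bin12345/GUI-Spotlight",
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would then run the script without any shell quoting concerns.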
