GUI_Spotlight is a think-with-image GUI visual grounding model. At each step, it first calls a tool to crop the image according to its own prediction, narrowing the view to the relevant region, and then returns an exact coordinate location.
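The crop-then-locate idea can be illustrated with a small coordinate-bookkeeping sketch. This is not the model's actual tool interface, which is not described here; the `Box` type, the helper names, and the use of fractional (0 to 1) crop coordinates are all assumptions made for illustration. It only shows how a point predicted inside a nested crop can be mapped back to full-image pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region in full-image pixel coordinates (hypothetical helper)."""
    left: float
    top: float
    right: float
    bottom: float


def crop_within(parent: Box, rel: Box) -> Box:
    """Return the sub-region of `parent` described by `rel`.

    `rel` holds fractions in [0, 1] of the parent's width and height
    (an assumed convention, not necessarily the model's).
    """
    width = parent.right - parent.left
    height = parent.bottom - parent.top
    return Box(
        parent.left + rel.left * width,
        parent.top + rel.top * height,
        parent.left + rel.right * width,
        parent.top + rel.bottom * height,
    )


def to_full_image(view: Box, x_rel: float, y_rel: float) -> tuple[float, float]:
    """Map a point given as fractions of the current view to full-image pixels."""
    return (
        view.left + x_rel * (view.right - view.left),
        view.top + y_rel * (view.bottom - view.top),
    )


# Hypothetical two-step trace on a 1920x1080 screenshot.
screen = Box(0, 0, 1920, 1080)
view = crop_within(screen, Box(0.5, 0.0, 1.0, 0.5))  # step 1: top-right quadrant
view = crop_within(view, Box(0.5, 0.5, 1.0, 1.0))    # step 2: bottom-right of that crop
point = to_full_image(view, 0.5, 0.5)                # final answer: centre of the last crop
print(point)  # (1680.0, 405.0)
```

Tracking the current view as a box in full-image coordinates means each crop only needs to be composed with the previous one, so the final point never has to be rescaled more than once.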
Installation
cd GUI_Spotlight
conda create --name spotlight python=3.12
conda activate spotlight
conda install -c conda-forge uv
uv pip install -e .
ScreenSpot-Pro
python screenspot_pro_evaluation.py
OSWorld-G (you need to download the dataset yourself)
python osworld_g_evaluation.py \
--model Bin12345/GUI-Spotlight \
--dataset_json OSWorld-G_refined.json \
--images_dir OSWorld-G/benchmark/images \
--batch_size 1
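Because the dataset has to be downloaded manually, it can help to check the two paths before starting an evaluation run. The following is a minimal sketch, not part of the repository: `check_eval_inputs` is a hypothetical helper, and it only verifies that the annotation file is readable JSON and that the images directory exists and is non-empty. It makes no assumption about the annotation schema.

```python
import json
from pathlib import Path


def check_eval_inputs(dataset_json: str, images_dir: str) -> list[str]:
    """Return a list of problems found; an empty list means the paths look usable."""
    problems = []
    json_path = Path(dataset_json)
    img_dir = Path(images_dir)

    if not json_path.is_file():
        problems.append(f"annotation file not found: {json_path}")
    else:
        try:
            # Only checks that the file parses; the schema is not inspected.
            with json_path.open(encoding="utf-8") as f:
                json.load(f)
        except json.JSONDecodeError as exc:
            problems.append(f"annotation file is not valid JSON: {exc}")

    if not img_dir.is_dir():
        problems.append(f"images directory not found: {img_dir}")
    elif not any(img_dir.iterdir()):
        problems.append(f"images directory is empty: {img_dir}")

    return problems


if __name__ == "__main__":
    # Paths taken from the OSWorld-G command above; adjust to your download location.
    for problem in check_eval_inputs(
        "OSWorld-G_refined.json", "OSWorld-G/benchmark/images"
    ):
        print(problem)
```

The same check applies to any evaluation script that takes `--dataset_json` and `--images_dir`, provided the annotation argument points to a single JSON file.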
UI-Vision (you need to download the dataset yourself)
python uivision_evaluation.py \
--model Bin12345/GUI-Spotlight \
--dataset_json uivision/annotations \
--images_dir ui-vision/images \
--batch_size 1
Inference
python inference.py --prompt "<your prompt>" --image_path "<image path>" --model "<model name>"