Guang Yin2, Yitong Li4, Yixuan Wang1, Dale McConachie2, Paarth Shah2, Kunimatsu Hashimoto2, Huan Zhang3, Katherine Liu2, Yunzhu Li1
1Columbia University,
2Toyota Research Institute,
3University of Illinois Urbana-Champaign,
4Tsinghua University
Teaser video: teaser.mp4
This codebase consists of the following components:
./dataset # generated data and corresponding checkpoints
./diffusion_policy_code # core code
./ref_lib # pre-selected DINO reference features
In "diffusion_policy_code", the core components are the following sub-directories:
./d3fields_dev # perception modules
./general_dp # diffusion policy
./eval_env # policy evaluation environment
./sapien_env # data generation environment
For data generation, the main function is in
./diffusion_policy_code/sapien_env/sapien_env/teleop/hang_mug.py
or
./diffusion_policy_code/sapien_env/sapien_env/teleop/pack_battery.py
For policy training, the main function is in
./diffusion_policy_code/general_dp/train.py
For policy evaluation, the main function is in
./diffusion_policy_code/general_dp/eval.py
We use the Anaconda distribution for installation:
conda env create -f environment.yml
conda activate code_diffuser
cd diffusion_policy_code
pip install -e d3fields_dev/
pip install -e general_dp/
pip install -e robomimic/
pip install -e sapien_env/
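You can optionally sanity-check the installation afterwards (a minimal check; it assumes robomimic and sapien are importable in the environment created above):

python -c "import robomimic, sapien; print('install OK')"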
We use SAPIEN to build the simulation environments. To generate data with the heuristic policy, use the following commands:
python diffusion_policy_code/sapien_env/sapien_env/teleop/pack_battery.py \
    --start_idx [start_idx] \
    --end_idx [end_idx] \
    --dataset_dir [dataset_dir] \
    --resolution [resolution] \
    --view [view]
and
python diffusion_policy_code/sapien_env/sapien_env/teleop/hang_mug.py \
    --start_idx [start_idx] \
    --end_idx [end_idx] \
    --dataset_dir [dataset_dir] \
    --resolution [resolution] \
    --view [view]
Note that
[resolution] in ["low", "middle", "high"]
[view] in ["default", "improved"]
Running the above commands generates data for the pack-battery and hang-mug tasks, respectively, in your chosen "dataset_dir".
Among the arguments, end_idx - start_idx is the total number of episodes, while "resolution" and "view" set the camera parameters.
Note that, given the training load, low resolution is more suitable for the RGB-based diffusion policy, and the "improved" view is designed specifically for the pack-battery task to improve baseline performance.
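For example, to generate 50 low-resolution hang-mug episodes with the default view (the episode count and dataset path here are illustrative):

python diffusion_policy_code/sapien_env/sapien_env/teleop/hang_mug.py \
    --start_idx 0 \
    --end_idx 50 \
    --dataset_dir ./dataset/rgb_attn_mug \
    --resolution low \
    --view default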
For our typical settings, we provide bash scripts to generate large-scale data:
bash data_collect_[OBS]_[ATTN]_[TASK].sh
where
OBS in ["pcd", "rgb"]
ATTN in ["attn", "none"]
TASK in ["battery", "mug"]
You can find all typical combinations in the main directory.
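For example, to collect point-cloud data with attention for the hang-mug task (assuming this combination is among the provided scripts):

bash data_collect_pcd_attn_mug.sh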
To run training, a training config needs to be specified.
A typical training command looks like:
python diffusion_policy_code/general_dp/train.py \
--config-dir=diffusion_policy_code/general_dp/config \
--config-name=hang_mug_act_rgb_attn.yaml \
training.seed=42 \
training.device=cuda:0 \
hydra.run.dir='data/outputs/${now:%Y.%m.%d}/${now:%H.%M.%S}_${name}_${task_name}'
You can find all existing configs in
diffusion_policy_code/general_dp/config
The filenames of the training configs follow the structure
[TASK]_[POLICY]_[OBS]_[ATTN].yaml
where
TASK in ["pack_battery", "hang_mug"]
POLICY in ["dp", "ACT"]
OBS in ["pcd", "rgb"]
ATTN in ["attn", "none"]
All feasible training combinations can be found in the main directory:
train_[TASK]_[POLICY]_[OBS]_[ATTN].sh
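For example (assuming the scripts use the same lowercase spelling as the config names):

bash train_hang_mug_act_rgb_attn.sh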
To run these training scripts on a local machine, make sure the following paths are correct, even though relative paths are used:
_target_: diffusion_policy.workspace.train_act_workspace.TrainACTWorkspace
is_real: false
robot_name: panda
data_root: diffusion_policy_code/general_dp # path for "general_dp"
trial_name: hang_mug_act_rgb_attn
dataset_dir: ./dataset/rgb_attn_mug
output_dir: ${dataset_dir}/${trial_name}
......
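If these relative paths do not resolve from your working directory, you can also override them on the command line via Hydra (a sketch, assuming data_root and dataset_dir are top-level config keys as shown above):

python diffusion_policy_code/general_dp/train.py \
    --config-dir=diffusion_policy_code/general_dp/config \
    --config-name=hang_mug_act_rgb_attn.yaml \
    data_root=$PWD/diffusion_policy_code/general_dp \
    dataset_dir=$PWD/dataset/rgb_attn_mug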
The observation settings can be found under
d3fields_2d:
shape:
- 4 # C
- 1600 # N
type: rgb
info:
init_name: mug
tgt_name: branch
......
or
d3fields:
shape:
- 4 # C
- 1600 # N
type: spatial
info:
init_name: mug
tgt_name: branch
task_name: hang_mug
......
for RGB and PCD attention, respectively.
To run an existing policy in the simulator, use the following command:
python diffusion_policy_code/general_dp/eval.py --checkpoint [ckpt_path] -o [eval_result_path]
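For example (the checkpoint path below is illustrative; point it at a checkpoint produced by your training run, e.g. under the hydra.run.dir used above, whose exact layout may differ):

python diffusion_policy_code/general_dp/eval.py \
    --checkpoint data/outputs/2024.01.01/12.00.00_act_hang_mug/checkpoints/latest.ckpt \
    -o eval_results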
This command writes its results to the directory given by "eval_result_path", with the following structure:
eval_results/
├── instruction_logs/ # results of attention grounding (if attn is not None)
│ ├── attn_eval.json # success rate of attention grounding (not whole policy)
│ ├── log_0.txt # generated code
│ ├── log_1.txt
│ └── ... (more log_[idx].txt)
├── media/ # videos for all evaluation episodes (if attn is not None)
│ ├── test_0.mp4
│ ├── test_1.mp4
│ └── ... (more test_[idx].mp4)
└── env_attn.json # success rate of whole policy
TODO:
- Upload full code for experiments in simulation
- Upload running scripts for data generation and training
- Release code for real-world deployment
- Add a standalone example for the 3D attention map
- Add benchmark generation
- Add dataset and checkpoint downloading
This repository is built upon the following repositories. Thanks for their great work!