Macro-Action-Based Multi-Agent Reinforcement Learning

Introduction

Adapted from gym-cooking.

Robots need to learn to cooperate with each other to prepare a certain dish according to the recipe and deliver it to the 'star' counter cell as soon as possible. The challenge is that the recipe is unknown to the robots: they have to learn the correct procedure of picking up raw vegetables, chopping them, and merging them on a plate before delivering.

Installation

  • To install all the dependencies:
pip install -U "ray[data,train,tune,serve]"
pip install pandas
pip install numpy
pip install matplotlib
pip install gymnasium
pip install pygame
pip install scipy
pip install tensorboard
pip install dm_tree
pip install torch
pip install pillow
pip install lz4
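
As a quick sanity check that the dependencies installed correctly, the main packages can be imported in one go (this one-liner is just a convenience, not part of the repo):

python -c "import ray, gymnasium, pygame, numpy, pandas, scipy, matplotlib, torch, PIL, lz4, tree; print('dependencies OK')"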

Code structure

  • play.py: a toy script for manually playing the environment.

  • train_rllib.py: train a MARL model with RLlib.

  • run_trained.py: test the trained model.

  • Agents.py: the agents in the environment.

  • ./environment/Overcooked.py: the main environment file.

  • ./environment/items.py: all the entities in the map (agent/food/plate/knife/delivery counter).

  • ./environment/render: resources and code for rendering.

Manual control

python play.py

Add the settings you want to play with, e.g.:

python play.py --task 6 --grid_dim 7 7 --map_type A

Enter the index of the action (primitive/macro) for each agent; the index of each action is listed in the file.
E.g., when playing Overcooked_MA_V1, entering 1, 2, 3 makes agent 1 go get the tomato, agent 2 go get the lettuce, and agent 3 go get the onion.

Environment

(Figure: layouts of Map A, Map B, and Map C.)

Parameters

grid_dim (int, int): grid-world size of the map
map_type (str): layout of the map
task (int): the recipe the agents cook
obs_radius (int): observation radius of the agents
mode (str): type of the observation
debug (bool): whether to print debug information and render
  • grid_dim
[5, 5]: the size of the map is 5x5
[7, 7]: the size of the map is 7x7
[9, 9]: the size of the map is 9x9
  • map_type
A: map A
B: map B
C: map C
  • obs_radius
0: agents observe the full map
1 / 2 / 3: agents observe the cells within the given radius
  • task
TASKLIST = ["tomato salad", "lettuce salad", "onion salad", "lettuce-tomato salad", "onion-tomato salad", "lettuce-onion salad", "lettuce-onion-tomato salad"]

task :
0 : tomato salad
1 : lettuce salad
2 : onion salad
3 : lettuce-tomato salad
4 : onion-tomato salad
5 : lettuce-onion salad
6 : lettuce-onion-tomato salad
(Figure: lettuce-tomato salad and lettuce-onion-tomato salad.)
  • mode
vector: the observation is returned as a flat vector
image: the observation is returned as an RGB array
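
To see how these parameters fit together, here is a rough sketch of constructing the environment from ./environment/Overcooked.py. The class name, constructor keywords, and step/reset signatures are assumptions for illustration only; play.py shows the actual usage.

# Hypothetical usage: argument names mirror the parameter list above.
from environment.Overcooked import Overcooked_multi   # assumed class name; see play.py

env = Overcooked_multi(
    grid_dim=[7, 7],   # 7x7 map
    map_type="A",      # layout A
    task=6,            # lettuce-onion-tomato salad
    obs_radius=2,      # 0 would mean the full map is observed
    mode="vector",     # or "image" for RGB observations
    debug=False,
)

obs = env.reset()
obs, rewards, done, info = env.step([4, 4, 4])   # one action index per agent (indices assumed)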

Observation

  • vector
obs = [tomato.x, tomato.y, tomato.status, lettuce.x, lettuce.y, lettuce.status, onion.x, onion.y, onion.status, plate-1.x, plate-1.y, plate-2.x, plate-2.y, knife-1.x, knife-1.y, knife-2.x, knife-2.y, delivery.x, delivery.y, agent1.x, agent1.y, agent2.x, agent2.y, (agent3.x, agent3.y), onehotTask]  

Agents only observe the positions and status of entities within obs_radius; entities outside the radius are masked as 0 in the corresponding dimensions (see the unpacking sketch after this list).
  • image
# Each grid cell renders to 80x80 pixels.
if obs_radius > 0:
    height = width = 80 * (obs_radius * 2 + 1)
else:
    height, width = 80 * grid_dim[0], 80 * grid_dim[1]   # assuming grid_dim is [rows, cols]
obs_size = [height, width, 3]
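
For the vector mode, a small helper along these lines can unpack the flat observation into named fields. It assumes the two-agent layout documented above; with a third agent the agent block is two entries longer and the one-hot task shifts accordingly.

# Sketch only: index slices follow the documented vector layout.
def unpack_obs(obs):
    veggies = {name: {"x": obs[3*i], "y": obs[3*i + 1], "status": obs[3*i + 2]}
               for i, name in enumerate(["tomato", "lettuce", "onion"])}
    plates   = [(obs[9], obs[10]), (obs[11], obs[12])]
    knives   = [(obs[13], obs[14]), (obs[15], obs[16])]
    delivery = (obs[17], obs[18])
    agents   = [(obs[19], obs[20]), (obs[21], obs[22])]
    one_hot_task = obs[23:]          # one entry per recipe in TASKLIST
    return veggies, plates, knives, delivery, agents, one_hot_task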

Action

  • Primitive-action
    right, down, left, up, stay
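
The names can be kept in a simple lookup; the actual index order is defined in play.py and the environment, so the ordering below is only illustrative:

# Illustrative order only; check play.py for the real action indices.
PRIMITIVE_ACTIONS = ["right", "down", "left", "up", "stay"]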

Reward

  • +10 for chopping a correct vegetable into pieces
  • +200 terminal reward for delivering the correct dish
  • −5 for delivering any wrong dish
  • −0.1 for every timestep
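
The Extension section below mentions a rewardList in which these values live; a sketch of such a mapping could look as follows (the key names are assumptions, only the values are taken from the list above):

# Hypothetical key names; values match the reward description above.
rewardList = {
    "subtask finished": 10,     # chopping a correct vegetable
    "correct delivery": 200,    # terminal reward for the correct dish
    "wrong delivery": -5,       # delivering a wrong dish
    "step penalty": -0.1,       # per-timestep cost
}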

Termination

Env terminates when the correct dish is delivered.

Extension

The reward values can be changed in rewardList. Users can add a new map with a different layout by adding it in overcooked_V1.py. A new map may change the positions of entities or delete entities, but adding new entities is not supported.

LLM Agent

To use the LLM-based agent:

  1. Install the LiteLLM package:
pip install litellm
  2. Set your API key (see other providers at https://docs.litellm.ai/docs/):
export OPENAI_API_KEY=your_api_key_here
export ANTHROPIC_API_KEY=your_api_key_here
export DEEPSEEK_API_KEY=your_api_key_here
  3. Co-play with AI agents, as shown in the Examples section below.
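
Under the hood, an LLM agent built on LiteLLM reduces to a completion call like the one below. The prompt text and the way the reply is mapped back to an action index are specific to this repo's agent code, so treat this as a minimal sketch rather than the actual implementation:

import litellm

# Minimal sketch: send a text observation, ask for a macro-action index.
response = litellm.completion(
    model="openai/gpt-4.1",
    messages=[{
        "role": "user",
        "content": "You control agent 1 in Overcooked. Observation: ... "
                   "Reply with a single macro-action index.",
    }],
)
print(response.choices[0].message.content)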

Agent Types

  • human: Interactive human player (default)
  • llm: LLM-based agent using text observations
  • multimodal: LLM-based agent using visual observations
  • random: Random action agent
  • stationary: Stationary agent (no movement)

Human Player Types

  • interactive: Real human input (default)
  • stationary: Stationary human (no movement)
  • random: Random action human
  • llm: LLM-based human player
  • multimodal: Multimodal LLM-based human player

Examples

# Play with LLM agent as teammate
python play.py --agent llm --llm_model openai/gpt-4.1

# LLM vs stationary human
python play.py --agent llm --human stationary --llm_model openai/gpt-4.1

# Multimodal agent vs random human
python play.py --agent multimodal --human random --llm_model openai/gpt-4.1

# LLM vs LLM
python play.py --agent llm --human llm --llm_model openai/gpt-4.1

# Multimodal vs Multimodal
python play.py --agent multimodal --human multimodal --llm_model openai/gpt-4.1

Agent Comparison

Compare different agent configurations:

# Compare LLM vs stationary and multimodal vs random
python compare_agents.py --configs llm_vs_stationary multimodal_vs_random --llm_model openai/gpt-4.1 --trials 3

# Compare multimodal vs multimodal
python compare_agents.py --configs multimodal_vs_multimodal --llm_model openai/gpt-4.1 --trials 5 --task 6

# Test different horizon lengths
python compare_agents.py --configs llm_vs_stationary --horizon_lengths 3 5 7 --trials 3

Citations

If you are using MacroMARL in your research, please cite the corresponding papers listed below:

@InProceedings{xiao_neurips_2022,
  author = "Xiao, Yuchen and Wei, Tan and Amato, Christopher",
  title = "Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning",
  booktitle = "Proceedings of the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS)",
  year = "2022"
}
