Macro-Action-Based Multi-Agent Reinforcement Learning

Introduction

Adapted from gym-cooking.

Robots need to learn to cooperate with each other to prepare a certain dish according to the recipe and deliver it to the 'star' counter cell as soon as possible. The challenge is that the recipe is unknown to the robots: they have to learn the correct procedure of picking up raw vegetables, chopping them, and merging them on a plate before delivering.

Installation

  • To install all the dependencies:
pip install -U "ray[data,train,tune,serve]"
pip install pandas
pip install numpy
pip install matplotlib
pip install gymnasium
pip install pygame
pip install scipy
pip install tensorboard
pip install dm_tree
pip install torch
pip install pillow
pip install lz4
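
As a quick sanity check that the dependencies installed correctly, the main packages can be imported in one go (this one-liner is just a convenience, not part of the repo):

python -c "import ray, gymnasium, pygame, numpy, pandas, scipy, matplotlib, torch, PIL, lz4, tree; print('dependencies OK')"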

Code structure

  • play.py: a toy script for manually playing the environment.

  • train_rllib.py: train a MARL model with RLlib.

  • run_trained.py: test the trained model.

  • Agents.py: the agents in the environment.

  • ./environment/Overcooked.py: the main environment file.

  • ./environment/items.py: all the entities in the map (agent/food/plate/knife/delivery counter).

  • ./environment/render: resources and code for rendering.

Manual control

python play.py

Add the settings you want to play with, e.g.:

python play.py --task 6 --grid_dim 7 7 --map_type A

Enter the index of the action (primitive/macro) for each agent; the index of each action is listed in the file.
E.g., when playing Overcooked_MA_V1, entering 1, 2, 3 makes agent 1 go get the tomato, agent 2 go get the lettuce, and agent 3 go get the onion.

Environment

(Figure: layouts of Map A, Map B, and Map C.)

Parameters

grid_dim (int, int): grid-world size of the map
map_type (str): layout of the map
task (int): the recipe the agents cook
obs_radius (int): observation radius of the agents
mode (str): type of the observation
debug (bool): whether to print debug information and render
  • grid_dim
[5, 5]: the size of the map is 5x5
[7, 7]: the size of the map is 7x7
[9, 9]: the size of the map is 9x9
  • map_type
A: map A
B: map B
C: map C
  • obs_radius
0: agents observe the full map
1 / 2 / 3: agents observe the cells within the given radius
  • task
TASKLIST = ["tomato salad", "lettuce salad", "onion salad", "lettuce-tomato salad", "onion-tomato salad", "lettuce-onion salad", "lettuce-onion-tomato salad"]

task :
0 : tomato salad
1 : lettuce salad
2 : onion salad
3 : lettuce-tomato salad
4 : onion-tomato salad
5 : lettuce-onion salad
6 : lettuce-onion-tomato salad
(Figure: lettuce-tomato salad and lettuce-onion-tomato salad.)
  • mode
vector: the observation is returned as a flat vector
image: the observation is returned as an RGB array
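
To see how these parameters fit together, here is a rough sketch of constructing the environment from ./environment/Overcooked.py. The class name, constructor keywords, and step/reset signatures are assumptions for illustration only; play.py shows the actual usage.

# Hypothetical usage: argument names mirror the parameter list above.
from environment.Overcooked import Overcooked_multi   # assumed class name; see play.py

env = Overcooked_multi(
    grid_dim=[7, 7],   # 7x7 map
    map_type="A",      # layout A
    task=6,            # lettuce-onion-tomato salad
    obs_radius=2,      # 0 would mean the full map is observed
    mode="vector",     # or "image" for RGB observations
    debug=False,
)

obs = env.reset()
obs, rewards, done, info = env.step([4, 4, 4])   # one action index per agent (indices assumed)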

Observation

  • vector
obs = [tomato.x, tomato.y, tomato.status, lettuce.x, lettuce.y, lettuce.status, onion.x, onion.y, onion.status, plate-1.x, plate-1.y, plate-2.x, plate-2.y, knife-1.x, knife-1.y, knife-2.x, knife-2.y, delivery.x, delivery.y, agent1.x, agent1.y, agent2.x, agent2.y, (agent3.x, agent3.y), onehotTask]  

Agents only observe the positions and status of entities within obs_radius; entities outside the radius are masked as 0 in the corresponding dimensions (see the unpacking sketch after this list).
  • image
# Each grid cell renders to 80x80 pixels.
if obs_radius > 0:
    height = width = 80 * (obs_radius * 2 + 1)
else:
    height, width = 80 * grid_dim[0], 80 * grid_dim[1]   # assuming grid_dim is [rows, cols]
obs_size = [height, width, 3]
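
For the vector mode, a small helper along these lines can unpack the flat observation into named fields. It assumes the two-agent layout documented above; with a third agent the agent block is two entries longer and the one-hot task shifts accordingly.

# Sketch only: index slices follow the documented vector layout.
def unpack_obs(obs):
    veggies = {name: {"x": obs[3*i], "y": obs[3*i + 1], "status": obs[3*i + 2]}
               for i, name in enumerate(["tomato", "lettuce", "onion"])}
    plates   = [(obs[9], obs[10]), (obs[11], obs[12])]
    knives   = [(obs[13], obs[14]), (obs[15], obs[16])]
    delivery = (obs[17], obs[18])
    agents   = [(obs[19], obs[20]), (obs[21], obs[22])]
    one_hot_task = obs[23:]          # one entry per recipe in TASKLIST
    return veggies, plates, knives, delivery, agents, one_hot_task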

Action

  • Primitive-action
    right, down, left, up, stay
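
The names can be kept in a simple lookup; the actual index order is defined in play.py and the environment, so the ordering below is only illustrative:

# Illustrative order only; check play.py for the real action indices.
PRIMITIVE_ACTIONS = ["right", "down", "left", "up", "stay"]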

Reward

  • +10 for chopping a correct vegetable into pieces
  • +200 terminal reward for delivering the correct dish
  • −5 for delivering any wrong dish
  • −0.1 for every timestep
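
The Extension section below mentions a rewardList in which these values live; a sketch of such a mapping could look as follows (the key names are assumptions, only the values are taken from the list above):

# Hypothetical key names; values match the reward description above.
rewardList = {
    "subtask finished": 10,     # chopping a correct vegetable
    "correct delivery": 200,    # terminal reward for the correct dish
    "wrong delivery": -5,       # delivering a wrong dish
    "step penalty": -0.1,       # per-timestep cost
}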

Termination

Env terminates when the correct dish is delivered.

Extension

The reward values can be changed in rewardList. Users can add a new map with a different layout by adding it in overcooked_V1.py. A new map may change the positions of entities or delete entities, but adding new entities is not supported.

LLM Agent

To use the LLM-based agent:

  1. Install the LiteLLM package:
pip install litellm
  2. Set your API key (see other providers at https://docs.litellm.ai/docs/):
export OPENAI_API_KEY=your_api_key_here
export ANTHROPIC_API_KEY=your_api_key_here
export DEEPSEEK_API_KEY=your_api_key_here
  3. Co-play with AI agents, as shown in the Examples section below.
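
Under the hood, an LLM agent built on LiteLLM reduces to a completion call like the one below. The prompt text and the way the reply is mapped back to an action index are specific to this repo's agent code, so treat this as a minimal sketch rather than the actual implementation:

import litellm

# Minimal sketch: send a text observation, ask for a macro-action index.
response = litellm.completion(
    model="openai/gpt-4.1",
    messages=[{
        "role": "user",
        "content": "You control agent 1 in Overcooked. Observation: ... "
                   "Reply with a single macro-action index.",
    }],
)
print(response.choices[0].message.content)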

Agent Types

  • human: Interactive human player (default)
  • llm: LLM-based agent using text observations
  • multimodal: LLM-based agent using visual observations
  • random: Random action agent
  • stationary: Stationary agent (no movement)

Human Player Types

  • interactive: Real human input (default)
  • stationary: Stationary human (no movement)
  • random: Random action human
  • llm: LLM-based human player
  • multimodal: Multimodal LLM-based human player

Examples

# Play with LLM agent as teammate
python play.py --agent llm --llm_model openai/gpt-4.1

# LLM vs stationary human
python play.py --agent llm --human stationary --llm_model openai/gpt-4.1

# Multimodal agent vs random human
python play.py --agent multimodal --human random --llm_model openai/gpt-4.1

# LLM vs LLM
python play.py --agent llm --human llm --llm_model openai/gpt-4.1

# Multimodal vs Multimodal
python play.py --agent multimodal --human multimodal --llm_model openai/gpt-4.1

Agent Comparison

Compare different agent configurations:

# Compare LLM vs stationary and multimodal vs random
python compare_agents.py --configs llm_vs_stationary multimodal_vs_random --llm_model openai/gpt-4.1 --trials 3

# Compare multimodal vs multimodal
python compare_agents.py --configs multimodal_vs_multimodal --llm_model openai/gpt-4.1 --trials 5 --task 6

# Test different horizon lengths
python compare_agents.py --configs llm_vs_stationary --horizon_lengths 3 5 7 --trials 3

Citations

If you are using MacroMARL in your research, please cite the corresponding papers listed below:

@InProceedings{xiao_neurips_2022,
  author = "Xiao, Yuchen and Wei, Tan and Amato, Christopher",
  title = "Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning",
  booktitle = "Proceedings of the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS)",
  year = "2022"
}
