Multi-agent Planning using Visual Language Models

ECAI 2024

Task: take the phone from the bed to the nightstand (trial_T20190909_100946_496614)

Ground truth:
STEP 1: go to the bed
STEP 2: pick up the phone
STEP 3: take the phone to the nightstand
STEP 4: stand the phone up on the stand

Single Agent with table:
Turn around and walk to the bed.
Pick up the CellPhone from the bed.
Turn around and walk to the nightstand.
Place the CellPhone on the nightstand.

Multi Agent with table:
Turn and walk to the bed.
Take the cellphone from the bed.
Turn and walk to the side table.
Place the cellphone on the side table.

Single Agent with image:
1. Walk to the bed, which is positioned in the center of the room.
2. Pick up the phone from the bed.
3. Turn towards the nightstand, which is located to the right of the bed.
4. Walk to the nightstand next to the bed.
5. Place the phone on the nightstand.

Multi Agent with image:
1. Walk to the bed, located in the center of the room.
2. Search for the phone on the bed.
3. Pick up the phone from the bed.
4. Turn left to face the nightstand, which is next to the bed.
5. Walk to the nightstand, located to the left of the bed.
6. Place the phone on the nightstand.
Architecture