ComboBench Evaluates LLMs’ Ability to Translate 262 Virtual Reality Actions into Device Manipulations

Researchers have introduced ComboBench, a benchmark that assesses how well large language models translate natural-language instructions into the precise sequences of device manipulations needed to perform actions in virtual reality games. The results show that while some models demonstrate strong planning skills, they still lag behind human players in understanding complex game interactions and in spatial reasoning.