Voice2Action: Language Models as Agent for Efficient Real-Time Interaction in Virtual Reality

AI-generated keywords: Voice2Action, Large Language Models, Virtual Reality, Hierarchical Analysis, Real-time Interaction

AI-generated Key Points

  • The Voice2Action framework aims to address challenges of deploying Large Language Models (LLMs) in virtual reality (VR) environments.
  • LLMs are trained to follow natural language instructions from only a few examples and can be prompted to act as task-driven autonomous agents.
  • Online interactions and complexity of manipulation categories in 3D environments have made it difficult to deploy LLMs in VR.
  • The Voice2Action framework hierarchically analyzes voice signals and textual commands through action and entity extraction.
  • It divides execution tasks into canonical interaction subsets in real-time and prevents errors through environment feedback.
  • Voice2Action enables more efficient and accurate performance compared to approaches without optimizations.
  • Experiments in an urban engineering VR environment with synthetic instruction data showed that Voice2Action outperformed approaches without optimizations.
  • This work highlights the potential of using LLMs as agents for efficient real-time interaction in VR.

Authors: Yang Su

License: CC BY 4.0

Abstract: Large Language Models (LLMs) are trained and aligned to follow natural language instructions with only a handful of examples, and they are prompted as task-driven autonomous agents to adapt to various sources of execution environments. However, deploying agent LLMs in virtual reality (VR) has been challenging due to the lack of efficiency in online interactions and the complex manipulation categories in 3D environments. In this work, we propose Voice2Action, a framework that hierarchically analyzes customized voice signals and textual commands through action and entity extraction and divides the execution tasks into canonical interaction subsets in real-time with error prevention from environment feedback. Experiment results in an urban engineering VR environment with synthetic instruction data show that Voice2Action can perform more efficiently and accurately than approaches without optimizations.

Submitted to arXiv on 29 Sep. 2023


Results of the summarizing process for the arXiv paper: 2310.00092v1

The Voice2Action framework, proposed by Yang Su from Cornell Tech, aims to address the challenges of deploying Large Language Models (LLMs) in virtual reality (VR) environments. LLMs are trained to follow natural language instructions with only a few examples and can be prompted to act as task-driven autonomous agents that adapt to different execution environments. However, the lack of efficiency in online interactions and the complexity of manipulation categories in 3D environments have made it difficult to deploy LLMs in VR. The Voice2Action framework hierarchically analyzes customized voice signals and textual commands through action and entity extraction. It then divides the execution tasks into canonical interaction subsets in real time, using feedback from the environment to prevent errors. The goal is more efficient and accurate performance than approaches without optimizations. To evaluate Voice2Action, experiments were conducted in an urban engineering VR environment using synthetic instruction data. The results showed that Voice2Action outperformed approaches without optimizations. This work highlights the potential of using LLMs as agents for efficient real-time interaction in VR.
Created on 09 Jan. 2024
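
The pipeline described in the summary (action and entity extraction, mapping onto canonical interaction subsets, and error prevention from environment feedback) can be illustrated with a minimal sketch. This is not the paper's implementation: the canonical action names, the synonym table, the stopword list, and the `Scene` class are all hypothetical, and simple keyword matching stands in for the LLM calls. The input is assumed to be text already transcribed from voice.

```python
from dataclasses import dataclass

# Hypothetical canonical interaction subsets; the paper's actual
# categories are not listed in this summary.
CANONICAL_ACTIONS = {
    "select": {"select", "pick", "choose"},
    "move": {"move", "shift", "drag"},
    "delete": {"delete", "remove"},
}
ALL_SYNONYMS = set().union(*CANONICAL_ACTIONS.values())
STOPWORDS = {"please", "the", "a", "an"}


@dataclass
class Command:
    action: str
    entity: str


class Scene:
    """Toy stand-in for the VR environment: a set of named objects."""

    def __init__(self, objects):
        self.objects = set(objects)

    def execute(self, cmd):
        # Environment feedback: refuse commands that target unknown
        # entities instead of executing them.
        if cmd.entity not in self.objects:
            return False, f"unknown entity: {cmd.entity}"
        if cmd.action == "delete":
            self.objects.discard(cmd.entity)
        return True, f"{cmd.action} {cmd.entity}"


def extract_action(text):
    """Map the first matching verb onto a canonical action, or None."""
    for word in text.lower().split():
        for canonical, synonyms in CANONICAL_ACTIONS.items():
            if word in synonyms:
                return canonical
    return None


def extract_entity(text):
    """Treat the words left after dropping verbs and stopwords as the entity."""
    words = [
        w for w in text.lower().split()
        if w not in STOPWORDS and w not in ALL_SYNONYMS
    ]
    return " ".join(words) or None


def run(text, scene):
    """Extract action and entity, then let the scene accept or reject."""
    action = extract_action(text)
    if action is None:
        return False, "no canonical action found"
    entity = extract_entity(text)
    if entity is None:
        return False, "no entity found"
    return scene.execute(Command(action, entity))
```

For example, with `scene = Scene(["bridge", "road"])`, calling `run("please remove the bridge", scene)` deletes the bridge, and repeating the same command is rejected by the scene because the entity no longer exists.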

