People often need guidance to complete tasks with specific requirements or sophisticated steps, such as preparing a meal or assembling furniture. Traditional guidance typically relies on unstructured paper instructions that force people to switch between reading instructions and performing actions, resulting in a fragmented user experience. Recent Mixed Reality (MR) systems alleviate this problem by providing spatialized navigation, but they require a manual authoring step and therefore cannot be easily adapted to general tasks. We propose MRPilot, an MR system powered by Large Language Models (LLMs) and Computer Vision techniques that offers responsive navigation for general tasks without pre-authoring. MRPilot consists of three modules: a Navigation Builder Module that uses LLMs to generate structured instructions, an Object Anchor Module that exploits Computer Vision techniques to anchor physical objects to virtual proxies, and an Action Recommendation Module that provides responsive navigation based on users' interactions with physical objects. MRPilot bridges the gap between virtual instructions and physical interactions for general tasks, providing contextual and responsive navigation. We conducted a user study comparing MRPilot with a baseline MR system that also exploited LLMs; the results confirmed the effectiveness of MRPilot.
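As a rough sketch of how the three modules described above could fit together, the Python below wires a Navigation Builder, an Object Anchor, and an Action Recommendation component into one pipeline. The class and method names, the structured step format, the LLM client interface (`llm_client.complete`), and the detector API (`detector.detect`) are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One structured instruction step, e.g. 'Attach leg A to the tabletop'."""
    description: str
    required_objects: list[str]
    done: bool = False


class NavigationBuilder:
    """Turns a free-form task description into structured steps via an LLM."""

    def __init__(self, llm_client):
        self.llm = llm_client  # any text-completion style client (assumption)

    def build(self, task: str) -> list[Step]:
        prompt = (
            "Break the following task into short, ordered steps. "
            "Answer one step per line as '<description> | <object>, <object>, ...'.\n"
            f"Task: {task}"
        )
        response = self.llm.complete(prompt)  # hypothetical client method
        return self._parse(response)

    def _parse(self, response: str) -> list[Step]:
        # Parses the illustrative '<description> | <objects>' line format.
        steps = []
        for line in response.splitlines():
            if "|" not in line:
                continue
            desc, objs = line.split("|", 1)
            objects = [o.strip() for o in objs.split(",") if o.strip()]
            steps.append(Step(desc.strip(), objects))
        return steps


class ObjectAnchor:
    """Maps detected physical objects to virtual proxies in the MR scene."""

    def __init__(self, detector):
        self.detector = detector  # e.g. an off-the-shelf object detector (assumption)
        self.anchors: dict[str, tuple[float, float, float]] = {}

    def update(self, frame) -> None:
        # detector.detect is a hypothetical API yielding (label, 3D position) pairs.
        for label, position in self.detector.detect(frame):
            self.anchors[label] = position  # pose of the virtual proxy


class ActionRecommender:
    """Recommends the next step based on the user's interactions with objects."""

    def recommend(self, steps: list[Step], touched_objects: set[str]):
        # Surface the first unfinished step whose objects the user is handling;
        # otherwise fall back to the first unfinished step in order.
        for step in steps:
            if not step.done and touched_objects & set(step.required_objects):
                return step
        return next((s for s in steps if not s.done), None)
```

In this sketch, the Navigation Builder runs once per task, the Object Anchor updates continuously from the camera feed, and the Action Recommender is queried whenever the user's interactions with anchored objects change; this split is an assumption consistent with, but not specified by, the abstract.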