EmbodiedScan
EmbodiedScan is a multi-modal 3D perception suite for embodied AI. Its dataset supports 3D visual grounding and scene interaction tasks across diverse environments. It comprises over 5k scans, 1M ego-centric RGB-D views, and 160k 3D boxes annotated with object categories, linking scene perception with language interaction. The annotations also include dense semantic occupancy, and part of the category vocabulary aligns with LVIS. The accompanying baseline framework, Embodied Perceptron, can process an arbitrary number of multi-modal inputs. It is evaluated both on the suite's benchmarks (fundamental 3D perception and language-grounded tasks) and in real-world settings.