MiMo-V2-Omni
MiMo-V2-Omni is a unified omni-modal foundation model that integrates perception with agentic action: it understands images, video, and audio in real time and autonomously executes complex tasks in both digital and physical environments.