Hi there, I’m Muyi Bao (包沐亦)

I am currently an M.S. student in Electrical and Computer Engineering at Carnegie Mellon University. My research interests lie in Embodied AI and multimodal foundation models for robotics. I am particularly interested in building vision-language agents that can perceive, reason, and act in embodied environments. At CMU, I am working with Dr. Ji Zhang and Dr. Wenshan Wang on Embodied AI, with a focus on Vision-and-Language Navigation using vision-language models. Before joining CMU, I received my B.Eng. degree in Computer Science and Technology from Xi’an Jiaotong-Liverpool University in 2025. During my undergraduate studies, I focused on computer vision and was fortunate to work with Prof. Guangliang Cheng, Prof. Wei Wang, and Prof. Ming Xu.

My resume can be found here (updated on 2025.08.10), and my email is muyib@andrew.cmu.edu.

I am actively looking for Ph.D. opportunities starting in Fall 2027, with research interests in Embodied AI and multimodal foundation models for robotics.

News

  • [Now] I am working on Vision-and-Language Navigation, where I fine-tune vision-language models to predict navigable goal pixels for embodied navigation, whereas most recent works predict actions.
  • [Feb. 2026] Our survey paper, Vision Mamba in Remote Sensing, was accepted by Remote Sensing.
  • [Aug. 2025] I joined Carnegie Mellon University as an M.S. student in Electrical and Computer Engineering.
  • [Jul. 2025] FTCFormer was accepted by ECAI 2025, the European Conference on Artificial Intelligence.
  • [Jun. 2025] I received my B.Eng. degree in Computer Science and Technology from Xi'an Jiaotong-Liverpool University.
  • [Feb. 2025] My first paper, AlexCapsNet, was accepted by IEEE Access.
  • [Dec. 2024] A paper on performance analysis of rendering optimization in game engines was accepted by UIC 2024, the IEEE International Conference on Ubiquitous Intelligence and Computing.
Research Projects
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook

Authors: Muyi Bao, Shuchang Lyu, Zhaoyang Xu, Huiyu Zhou, Jinchang Ren, Shiming Xiang, Xiangtai Li, Guangliang Cheng

Brief Description: The first survey of Vision Mamba in Remote Sensing

Venue: Remote Sensing, Feb 2026

Repository | Paper
FTCFormer: Fuzzy Token Clustering Transformer for Image Classification

Authors: Muyi Bao, Changyu Zeng, Yifan Wang, Zhengni Yang, Zimu Wang, Guangliang Cheng, Jun Qi, Wei Wang

Brief Description: A clustering-based downsampling method intended to replace grid-based methods such as max pooling.

Venue: ECAI 2025, Jul 2025

Repository | Paper
ASP-VMUNet: Atrous Shifted Parallel Vision Mamba U-Net for Skin Lesion Segmentation

Authors: Muyi Bao, Shuchang Lyu, Zhaoyang Xu, Qi Zhao, Changyu Zeng, Wenpei Bai, Guangliang Cheng

Brief Description: A hybrid Mamba/CNN model for the skin lesion segmentation task.

Venue: arXiv, Mar 2025

Repository | Paper
Comparative Performance Analysis of Rendering Optimization Methods in Unity Tuanjie Engine, Unity Global and Unreal Engine

Authors: Muyi Bao, Zeren Tao, Xiaohan Wang, Jiashuo Liu, Qilei Sun

Brief Description: A comparative performance study of Level of Detail (Unity Global), Virtual Geometry (Tuanjie Engine) and Nanite (Unreal Engine)

Venue: UIC 2024, Dec 2024

Repository | Paper
AlexCapsNet: An Integrated Architecture for Image Classification with Background Noise

Authors: Muyi Bao, Ming Xu, Nanlin Jin

Brief Description: A CapsNet-based model for image classification.

Venue: IEEE Access, Feb 2025

Repository | Paper