Hi there, I’m Muyi Bao (包沐亦)
I am currently an M.S. student in Electrical and Computer Engineering at Carnegie Mellon University. My research interests lie in Embodied AI and multimodal foundation models for robotics. I am particularly interested in building vision-language agents that can perceive, reason, and act in embodied environments. At CMU, I am working with Dr. Ji Zhang and Dr. Wenshan Wang on Embodied AI, with a focus on Vision-and-Language Navigation using vision-language models. Before joining CMU, I received my B.Eng. degree in Computer Science and Technology from Xi’an Jiaotong-Liverpool University in 2025. During my undergraduate studies, I focused on computer vision and was fortunate to work with Prof. Guangliang Cheng, Prof. Wei Wang, and Prof. Ming Xu.
My resume can be found here (updated on 2025.08.10), and my email is muyib@andrew.cmu.edu.
I am actively looking for Ph.D. opportunities starting in Fall 2027, with research interests in Embodied AI and multimodal foundation models for robotics.
News
- [Now] I am working on Vision-and-Language Navigation, where I fine-tune vision-language models to predict navigable goal pixels for embodied navigation (whereas most recent work predicts actions).
- [Feb. 2026] Our survey paper, Vision Mamba in Remote Sensing, was accepted by Remote Sensing.
- [Aug. 2025] I joined Carnegie Mellon University as an M.S. student in Electrical and Computer Engineering.
- [Jul. 2025] FTCFormer was accepted by ECAI 2025, the European Conference on Artificial Intelligence.
- [Jun. 2025] I received my B.Eng. degree in Computer Science and Technology from Xi'an Jiaotong-Liverpool University.
- [Feb. 2025] My first paper, AlexCapsNet, was accepted by IEEE Access.
- [Dec. 2024] One paper on the performance analysis of rendering optimization in game engines was accepted by UIC 2024, the IEEE International Conference on Ubiquitous Intelligence and Computing.
Research Projects

| Project | Authors | Brief Description | Venue | Links |
|---|---|---|---|---|
| Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook | Muyi Bao, Shuchang Lyu, Zhaoyang Xu, Huiyu Zhou, Jinchang Ren, Shiming Xiang, Xiangtai Li, Guangliang Cheng | The first survey of Vision Mamba in remote sensing. | Remote Sensing, Feb 2026 | Repository, Paper |
| FTCFormer: Fuzzy Token Clustering Transformer for Image Classification | Muyi Bao, Changyu Zeng, Yifan Wang, Zhengni Yang, Zimu Wang, Guangliang Cheng, Jun Qi, Wei Wang | A clustering-based downsampling method that replaces grid-based methods such as max pooling. | ECAI 2025, Jul 2025 | Repository, Paper |
| ASP-VMUNet: Atrous Shifted Parallel Vision Mamba U-Net for Skin Lesion Segmentation | Muyi Bao, Shuchang Lyu, Zhaoyang Xu, Qi Zhao, Changyu Zeng, Wenpei Bai, Guangliang Cheng | A hybrid Mamba/CNN model for skin lesion segmentation. | arXiv, Mar 2025 | Repository, Paper |
| Comparative Performance Analysis of Rendering Optimization Methods in Unity Tuanjie Engine, Unity Global and Unreal Engine | Muyi Bao, Zeren Tao, Xiaohan Wang, Jiashuo Liu, Qilei Sun | A comparative performance study of Level of Detail (Unity Global), Virtual Geometry (Tuanjie Engine), and Nanite (Unreal Engine). | UIC 2024, Dec 2024 | Repository, Paper |
| AlexCapsNet: An Integrated Architecture for Image Classification with Background Noise | Muyi Bao, Ming Xu, Nanlin Jin | A CapsNet-based model for image classification. | IEEE Access, Feb 2025 | Repository, Paper |





