I am currently a researcher at Shanghai AI Laboratory. My research interests include Embodied AI, Computer Vision, Robotic Manipulation, and Autonomous Driving.

I received my PhD from the Robotics Institute, Shanghai Jiao Tong University, supervised by Prof. Honghai Liu, and obtained my bachelor’s degree from Central South University. I am currently working with Jiangmiao Pang on Embodied AI. Previously, I worked with Prof. Hongyang Li and Prof. Yu Qiao at Shanghai AI Laboratory.

I am an interdisciplinary lifelong learner, with an academic journey that has evolved from bio-mechatronics, computer vision, and autonomous driving to embodied AI. I am currently dedicated to advancing Embodied AGI, with an emphasis on generalizable robotic manipulation.

📝 Selected Publications

🤖 Embodied AI (* indicates equal contribution)

RSS 2024

Learning Manipulation by Predicting Interaction

Jia Zeng$^\ast$, Qingwen Bu$^\ast$, Bangjun Wang$^\ast$, Wenke Xia$^\ast$, Li Chen, Hao Dong, H. Song, D. Wang, D. Hu, P. Luo, H. Cui, B. Zhao, X. Li, Y. Qiao, Hongyang Li

  • We propose a representation learning framework for robotic manipulation that learns Manipulation by Predicting Interaction (MPI).
  • RSS 2024 | Project Page | Code
NeurIPS 2024

Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation

Qingwen Bu$^\ast$, Jia Zeng$^\ast$, Chen Li$^\ast$, Yanchao Yang, Guyue Zhou, Junchi Yan, Ping Luo, Heming Cui, D. Hu, Yi Ma, Hongyang Li

  • We propose CLOVER, which uses a text-conditioned video diffusion model to generate visual plans as reference inputs, then uses these sub-goals to guide a feedback-driven policy that generates actions with an error measurement strategy.
  • NeurIPS 2024 | Code
ICLR 2025

Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation

Yang Tian$^\ast$, Sizhe Yang$^\ast$, Jia Zeng, Ping Wang, Dahua Lin, Hao Dong, Jiangmiao Pang

  • We propose an end-to-end model, Seer, which uses an inverse dynamics model based on the robot’s predicted visual states to forecast actions. We term this architecture the Predictive Inverse Dynamics Model (PIDM).

  • ICLR 2025 | Project Page | Code

RSS 2025

Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation

Sizhe Yang$^\ast$, Wenye Yu$^\ast$, Jia Zeng, Jun Lv, Kerui Ren, Cewu Lu, Dahua Lin, Jiangmiao Pang

  • RoboSplat is a framework that leverages 3D Gaussian Splatting (3DGS) to generate novel demonstrations for RGB-based policy learning. It can generate diverse and visually realistic data across six types of generalization (object poses, object types, camera views, scene appearance, lighting conditions, and embodiments).

  • RSS 2025 | Project Page | Code

RSS 2025

HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit

Qingwei Ben$^\ast$, Feiyu Jia$^\ast$, Jia Zeng, Junting Dong, Dahua Lin, Jiangmiao Pang

  • We propose HOMIE, a novel humanoid teleoperation cockpit that integrates a humanoid loco-manipulation policy and a low-cost exoskeleton-based hardware system.

  • RSS 2025 | Project Page | Code

RSS 2025

Gripper Keypose and Object Pointflow as Interfaces for Bimanual Robotic Manipulation

Yuyin Yang$^\ast$, Zetao Cai$^\ast$, Yang Tian, Jia Zeng, Jiangmiao Pang

  • PPI integrates the prediction of target gripper poses and object pointflow with continuous action estimation, providing enhanced spatial localization and the flexibility to handle movement restrictions.

  • RSS 2025 | Project Page | Code

🚗 Autonomous Driving

CVPR 2023

Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection

Jia Zeng, Li Chen, Hanming Deng, Lewei Lu, Junchi Yan, Yu Qiao, Hongyang Li

  • We apply knowledge distillation to camera-only 3D object detection and investigate how to distill focal knowledge when the 3D object detector teacher is imperfect.
  • CVPR 2023
Preprint

Geometric-aware Pretraining for Vision-centric 3D Object Detection

Linyan Huang, Huijie Wang, Jia Zeng, et al.

  • We propose a geometric-aware pretraining method called GAPretrain, which distills geometric-rich information from the LiDAR modality into camera-based 3D object detectors.
  • arXiv
IEEE T-PAMI 2023

Delving Into the Devils of Bird’s-Eye-View Perception: A Review, Evaluation and Recipe

Hongyang Li$^\ast$, Chonghao Sima$^\ast$, Jifeng Dai$^\ast$, Wenhai Wang$^\ast$, Lewei Lu$^\ast$, Huijie Wang$^\ast$, Jia Zeng$^\ast$, Zhiqi Li$^\ast$, et al.

  • We conduct a thorough review of recent work on Bird’s-Eye-View (BEV) perception and provide a practical recipe based on our analysis of the BEV design pipeline.
  • IEEE T-PAMI | GitHub
SCIENTIA SINICA Informationis

Open-sourced data ecosystem in autonomous driving: the present and future

Hongyang Li$^\ast$, Yang Li$^\ast$, Huijie Wang$^\ast$, Jia Zeng$^\ast$, Huilin Xu, et al.

  • We undertake an exhaustive analysis and discussion of the characteristics and data scales that future third-generation autonomous driving datasets should possess.
  • SCIENTIA SINICA Informationis | arXiv

💪🏻 Bio-Mechatronics & Human-Machine Interaction

🧑‍💻 Career Experience


Shanghai AI Laboratory, Researcher

2023.09 - (present).

  • Embodied foundation model and generalizable robotic manipulation.

Shanghai AI Laboratory, Research Intern

2022.04 - 2023.06, Supervisor: Prof. Hongyang Li.

  • Bird’s-Eye-View perception and knowledge distillation for 3D object detection.

🎓 Education


Robotics Institute, Shanghai Jiao Tong University

2017.09 - 2023.08, Supervisor: Prof. Honghai Liu.

  • Bio-signal-based human motion recognition and human-machine interaction.

Central South University

2013.09 - 2017.06.

  • Mechatronics engineering
  • Image processing

💼 Service

  • Reviewer for CVPR, ECCV, NeurIPS, ICLR, ICML, etc.
  • Member of CAAI-Embodied AI Committee.