Xi CHEN

I am a third-year (starting from 2022) Ph.D. student at the University of Hong Kong, supervised by Prof. Hengshuang Zhao. Before joining HKU, I worked as a senior algorithm engineer at Alibaba Group. I received my Master's degree from Zhejiang University in 2020, together with a double Master's diploma from Ecole Centrale Mediterranee (France). Before that, I received my B.Eng. from Zhejiang University in 2017.

My research interests lie in deep learning and computer vision. I have published multiple works on image/video perception and open-world multi-modal learning. I am currently interested in AIGC, MLLM, and reinforcement learning.

I have interned or collaborated with many companies, including Adobe, Meta, Megvii, SenseTime, Hikvision, Ufoto, AISegment, LIS (France), etc.

I co-supervise interns with several industrial groups. Contact me by Gmail (xichen.csai AT gmail) if you are looking for an internship or would like to collaborate with me.

CV  /  Google Scholar  /  Github  /  Zhihu

News
  • [Mar. 2025] Four papers accepted by CVPR2025!
  • [Oct. 2024] Two papers accepted by NeurIPS2024!
  • [Jul. 2024] Four papers accepted by ECCV2024, see you in Milan!
  • [Jun. 2024] We release MimicBrush for imitative image editing.
  • [Dec. 2023] We release LivePhoto for image-to-video generation.
  • [Dec. 2023] The code of AnyDoor is released here; we will continue to improve it.
  • [Jul. 2023] We release AnyDoor for zero-shot image composition & customization.
  • [Jul. 2023] OPSNet is accepted by ICCV2023.
  • [Mar. 2023] We release the manuscripts of OPSNet and ScribbleSeg.
  • [Mar. 2023] One co-authored paper accepted to CVPR2023.
  • [Feb. 2022] I joined HKU as a Ph.D. student.
Selected Publications

Google Scholar

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao
CVPR, 2025
pdf / page

Foundational multi-modal generative model developed in collaboration with Adobe. UniReal is a universal framework for multiple image generation and editing tasks. We leverage a video model to handle image tasks by treating different numbers of input/output images as frames. We also seek universal supervision from video data, thus generating realistic results that reflect an understanding of world dynamics.

Zero-shot Image Editing with Reference Imitation
Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao
NeurIPS, 2024
pdf / page / code / media (AK) / media (Gradio)

GitHub 1.1k stars. MimicBrush conducts imitative editing by discovering the semantic correspondence between the source and reference images. It supports interesting and practical applications for local region composition and texture transfer.

LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao
ECCV, 2024
pdf / page / code / media (AK)

We present LivePhoto, a real image animation method with text control. Unlike previous works, LivePhoto truly listens to the text instructions and preserves the object ID well.

AnyDoor: Zero-shot Object-level Image Customization
Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao
CVPR, 2024
pdf / page / code / media [AK] / [量子位] / [机器之心]

Selected as one of the most influential papers of CVPR 2024. GitHub 4.2k stars. This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations in a harmonious way.

Open-vocabulary Panoptic Segmentation with Embedding Modulation
Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao
ICCV, 2023
pdf / page

We present an omnipotent and efficient framework for open-vocabulary panoptic segmentation, which shows great performance in both closed- and open-vocabulary settings with limited training data.

FocalClick: Towards Practical Interactive Image Segmentation
Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao
CVPR, 2022
pdf / code

FocalClick is a simple and effective solution for interactive segmentation. It greatly reduces the computation for various models by focusing on target local regions.

Conditional Diffusion for Interactive Segmentation
Xi Chen, Zhiyan Zhao, Feiwu Yu, Yilei Zhang, Manni Duan
ICCV, 2021
pdf / code

We view interactive segmentation as a diffusion procedure and design feature- and pixel-level diffusion modules for more consistent predictions.

State-Aware Tracker for Real-Time Video Object Segmentation
Xi Chen, Zuoxin Li, Ye Yuan, Gang Yu, Jianxin Shen, Donglian Qi
CVPR, 2020
pdf / code

We propose a novel pipeline called State-Aware Tracker (SAT), which produces accurate segmentation results at real-time speed.

Projects & Resources
Siamese Fully Convolutional Object Tracking
Weizhao Wang, Xinyu Chen, Xi Chen, Yinda Xu, Zeyu Wang
pdf / code

Second-place solution for the VOT2019 real-time track. SiamFCOT serves as a strong pipeline for real-time single object tracking.

Academic Service

Reviewer / Program Committee Member

  • CVPR (2021, 2022, 2023, 2024, 2025)
  • ICCV (2021, 2023)
  • ECCV (2022, 2024)
  • SIGGRAPH / SIGGRAPH Asia (2024, 2025)
  • NeurIPS (2023, 2024)
  • ICLR (2023)
  • AAAI (2022, 2023)
  • ACM MM (2024)
  • Journals: IJCV, PR, TCSVT, etc.

Design and source code from Jon Barron's website