Xi CHEN

I am a third-year Ph.D. student (enrolled since 2022) at the University of Hong Kong, supervised by Prof. Hengshuang Zhao. Before joining HKU, I worked as a senior algorithm engineer at Alibaba Group. I received my Master's degree from Zhejiang University in 2020, and I also hold a double Master's diploma from Ecole Centrale Mediterranee (France). Before that, I received my B.Eng. from Zhejiang University in 2017.

My research interests lie in deep learning and computer vision. I have published multiple research works on image/video perception and open-world multi-modal learning. I now focus on generative AI.

I have interned or collaborated with companies including Adobe, Meta, Megvii, SenseTime, Hikvision, Ufoto, AISegment, and LIS (France), among others.

Reach out to me via Gmail (chauncey0620) for discussions or opportunities.

CV  /  Google Scholar  /  GitHub  /  Zhihu

News
  • [Oct. 2024] Two papers accepted by NeurIPS 2024!
  • [Jul. 2024] Four papers accepted by ECCV 2024. See you in Milan!
  • [Jun. 2024] We release MimicBrush for imitative image editing.
  • [Dec. 2023] We release LivePhoto for image-to-video generation.
  • [Dec. 2023] The code of AnyDoor is released here; we will continue to make it stronger.
  • [Jul. 2023] We release AnyDoor for zero-shot image composition & customization.
  • [Jul. 2023] OPSNet is accepted by ICCV 2023.
  • [Mar. 2023] We release the manuscripts of OPSNet and ScribbleSeg.
  • [Mar. 2023] One co-authored paper accepted to CVPR 2023.
  • [Feb. 2022] I joined HKU as a Ph.D. student.
Selected Publications

Google Scholar

(*: Equal contribution)

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao
Preprint, 2024
pdf/ page

UniReal is a universal framework for multiple image generation and editing tasks. We leverage a video model to handle image tasks by treating different numbers of input/output images as frames. We also seek universal supervision from video data, thus generating realistic results that reflect an understanding of world dynamics.

Zero-shot Image Editing with Reference Imitation
Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao
NeurIPS, 2024
pdf/ page/ code/ media (AK)/ media (Gradio)

MimicBrush performs imitative editing by discovering the semantic correspondence between the source and reference images. It supports interesting and practical applications such as local region composition and texture transfer.

LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao
ECCV, 2024
pdf/ page/ code/ media (AK)

We present LivePhoto, a real image animation method with text control. Unlike previous works, LivePhoto truly listens to the text instructions and preserves the object ID well.

AnyDoor: Zero-shot Object-level Image Customization
Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao
CVPR, 2024
pdf/ page/ code/ media[AK]/ [量子位]/ [机器之心]

This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations in a harmonious way.

Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan, Shuai Xiao
ECCV, 2024
pdf/ page/ media[极市平台]

Wear-Any-Way proposes a novel setting for virtual try-on. Besides showing robust performance in different scenarios, it lets users manipulate the wearing style by clicking and dragging.

FlashFace: Human Image Personalization with High-fidelity Identity Preservation
Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo
arXiv, 2024
pdf/ page/ code/ media[AK]

FlashFace is a practical tool with which users can easily personalize their own photos on the fly by providing one or a few reference face images and a text prompt.

Open-vocabulary Panoptic Segmentation with Embedding Modulation
Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao
ICCV, 2023
pdf/ page

We present an omnipotent and efficient framework for open-vocabulary panoptic segmentation, which performs strongly in both closed- and open-vocabulary settings with limited training data.

FocalClick: Towards Practical Interactive Image Segmentation
Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao
CVPR, 2022
pdf / code

FocalClick is a simple and effective solution for interactive segmentation. It greatly reduces computation for various models by focusing on target local regions.

Conditional Diffusion for Interactive Segmentation
Xi Chen, Zhiyan Zhao, Feiwu Yu, Yilei Zhang, Manni Duan
ICCV, 2021
pdf / code

We view interactive segmentation as a diffusion procedure and design feature- and pixel-level diffusion modules for more consistent predictions.

State-Aware Tracker for Real-Time Video Object Segmentation
Xi Chen, Zuoxin Li, Ye Yuan, Gang Yu, Jianxin Shen, Donglian Qi
CVPR, 2020
pdf / code

We propose a novel pipeline called State-Aware Tracker (SAT), which produces accurate segmentation results at real-time speed.

Projects & Resources
Siamese Fully Convolutional Object Tracking
Weizhao Wang, Xinyu Chen, Xi Chen, Yinda Xu, Zeyu Wang
pdf / code

Second-place solution for the VOT2019 real-time track. SiamFCOT serves as a strong pipeline for real-time single object tracking.

Academic Service

Reviewer / Program Committee Member

  • CVPR (2021, 2022, 2023, 2024)
  • ICCV (2021, 2023)
  • ECCV (2022, 2024)
  • NeurIPS (2023, 2024)
  • ICLR (2023)
  • AAAI (2022, 2023)
  • ACM MM (2024)
  • Journals: IJCV, PR, TCSVT, etc.

Design and source code from Jon Barron's website