Xi CHEN (陈汐)


I am a staff research scientist at ByteDance Seed (San Jose, CA), working on multi-modal generation. I received my Ph.D. from the University of Hong Kong, supervised by Prof. Hengshuang Zhao. Before that, I received my Bachelor's and Master's degrees from Zhejiang University, as well as a dual Master's degree from École Centrale Méditerranée.

During my studies, I interned and collaborated with Google DeepMind (Nano-Banana Team), Adobe Research (Dr. Zhe Lin's Team), Meta, Alibaba Tongyi Lab (Wan Team), Megvii (Detection Team), SenseTime, Hikvision, Ufoto, AISegment, LIS (France), and others. Thanks!

I am always looking for strong research interns and collaborators to push the frontier of multimodal generation, LLMs, VLMs, RL, embodied AI, and beyond. Reach me at xichen.csai AT gmail for internships or collaborations.

Google Scholar / GitHub

News

[Apr. 2026] Three papers accepted by ICLR2026, and four papers accepted by CVPR2026 (1 Oral)!
[Sep. 2025] Six papers accepted by NeurIPS2025 (1 Oral)!
[Apr. 2025] Four papers accepted by Siggraph2025!
[Mar. 2025] Four papers accepted by CVPR2025 (two Highlights)!
[Oct. 2024] Two papers accepted by NeurIPS2024!
[Jul. 2024] Four papers accepted by ECCV2024, see you in Milano!
[Jun. 2024] We release MimicBrush for imitative image editing.
[Dec. 2023] We release LivePhoto for image-to-video generation.
[Dec. 2023] The code of AnyDoor is released here; we will continue to make it stronger.
[Jul. 2023] We release AnyDoor for zero-shot image composition & customization.
[Jul. 2023] OPSNet is accepted by ICCV2023.
[Mar. 2023] We release the manuscripts of OPSNet and ScribbleSeg.
[Mar. 2023] One co-authored paper accepted to CVPR2023.
[Mar. 2022] One paper accepted to CVPR2022.
[Feb. 2022] I joined HKU as a PhD student.
[Jun. 2021] One paper accepted to ICCV2021.
[Jun. 2020] I graduated from Zhejiang University and joined Alibaba Group.
[Mar. 2020] One paper accepted to CVPR2020.


First Author Publications

Google Scholar

	MiCo: Multi-image Contrast for Reinforcement Visual Reasoning Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao NeurIPS, 2025 pdf / media (AK) / code (todo) MiCo is a self-supervised reinforcement learning framework for multi-image visual reasoning. It derives supervision by comparing an image, its augmented view, and a similar image, and uses this contrast to encourage chain-of-thought (CoT) reasoning through a method we call "Augmented-GRPO."
	UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao CVPR, 2025 (Highlight) pdf / page / code (data construction) A foundational multi-modal generative model developed in collaboration with Adobe. UniReal is a universal framework for multiple image generation and editing tasks. We leverage a video model to handle image tasks by treating varying numbers of input/output images as frames. We also seek universal supervision from video data, yielding realistic results that reflect real-world dynamics.
	Zero-shot Image Editing with Reference Imitation Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao NeurIPS, 2024 pdf / page / code / media (AK) / media (Gradio) MimicBrush performs imitative editing by discovering the semantic correspondence between the source and reference images. It supports practical applications such as local region composition and texture transfer.
	LivePhoto: Real Image Animation with Text-guided Motion Control Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao ECCV, 2024 pdf / page / code / media (AK) We present LivePhoto, a real image animation method with text control. Unlike previous works, LivePhoto truly follows the text instructions and faithfully preserves object identity.
	AnyDoor: Zero-shot Object-level Image Customization Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao CVPR, 2024 / TPAMI, 2025 (Extension) pdf / page / code / media [AK] / [量子位] / [机器之心] Selected as one of the most influential papers of CVPR 2024. GitHub Trending No.1. This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations in a harmonious way.
	Open-vocabulary Panoptic Segmentation with Embedding Modulation Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao ICCV, 2023 pdf / page We present a versatile and efficient framework for open-vocabulary panoptic segmentation, which performs strongly in both closed- and open-vocabulary settings with limited training data.
	FocalClick: Towards Practical Interactive Image Segmentation Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao CVPR, 2022 / TPAMI, 2026 (Extension) pdf / code FocalClick is a simple and effective solution for interactive segmentation. It substantially reduces computation for various models by focusing on local regions around the target.
	Conditional Diffusion for Interactive Segmentation Xi Chen, Zhiyan Zhao, Feiwu Yu, Yilei Zhang, Manni Duan ICCV, 2021 pdf / code We view interactive segmentation as a diffusion procedure and design feature- and pixel-level diffusion modules for more consistent predictions.
	State-Aware Tracker for Real-Time Video Object Segmentation Xi Chen, Zuoxin Li, Ye Yuan, Gang Yu, Jianxin Shen, Donglian Qi CVPR, 2020 pdf / code We propose a novel pipeline called State-Aware Tracker (SAT), which produces accurate segmentation results at real-time speed.

Mentored Projects

	PICABench: How Far Are We from Physically Realistic Image Editing? Yuandong Pu, Le Zhuo, Songhao Han, Jinbo Xing, Kaiwen Zhu, Shuo Cao, Bin Fu, Si Liu, Hongsheng Li, Yu Qiao, Wenlong Zhang, Xi Chen, Yihao Liu ICLR, 2026 pdf / page / code We evaluate physical realism across eight sub-dimensions for common editing operations (add, remove, attribute change, etc.) and find that even state-of-the-art models still struggle to handle physics well.
	PlayerOne: Egocentric World Simulator Yuanpeng Tu, Hao Luo, Xi Chen, Xiang Bai, Fan Wang, Hengshuang Zhao NeurIPS (Oral), 2025 pdf / page We develop an egocentric world model. Users can use their own actions to explore and interact with the virtual world.
	OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions Yuanhao Cai, He Zhang, Xi Chen, Jinbo Xing, Yiwei Hu, Yuqian Zhou, Kai Zhang, Zhifei Zhang, Soo Ye Kim, Tianyu Wang, Yulun Zhang, Xiaokang Yang, Zhe Lin, Alan Yuille NeurIPS, 2025 pdf / page / code We conduct a systematic exploration of customized video generation, including a data construction pipeline, a unified model, and a comprehensive benchmark.

Academic Service

Reviewer / Program Committee Member

Conferences: CVPR, ICCV, ECCV, Siggraph / Siggraph Asia, NeurIPS, ICLR, AAAI, ACM MM, etc.
Journals: TPAMI, IJCV, PR, TCSVT, etc.

Organizer / Program Chair