Sirui Xie

Ph.D. in Computer Science
Research Scientist, Google DeepMind
Email: srxie [at] ucla [dot] edu

Google Scholar | LinkedIn | Twitter | GitHub

I am a Research Scientist at Google DeepMind. I received my Ph.D. in Computer Science from UCLA, advised by Prof. Ying Nian Wu, Prof. Demetri Terzopoulos, and Prof. Song-Chun Zhu. Previously, I conducted research at Meta FAIR, Amazon AWS AI, and SenseTime Research. I obtained my Bachelor's degree from The Hong Kong University of Science and Technology (HKUST).

I am broadly interested in fundamental problems in Machine Learning and Artificial Intelligence, including Generative Modeling, Sequential Decision-Making, and Representation Learning. My Ph.D. thesis centers on the statistical and representational structures of latent-variable top-down models, as well as the associated inference and learning algorithms across various data modalities.

News

Publications / Preprints (Selected | All)

* indicates equal contribution.
 
thesis
Abstractions, the latent variables underlying our observations, are fundamental to human intelligence. Despite successes in modeling data distributions, Generative AI (GenAI) systems still lack robust principles for unsupervised learning of latent abstractions. This thesis investigates generative modeling of these latent variables to address GenAI systems' bottlenecks in alignment, efficiency, and consistency in representation, inference, and decision-making.
emd
To minimize the cost of sampling diffusion models, we propose EM Distillation (EMD), a Maximum Likelihood method that distills diffusion models to 1-step generators. EMD is inspired by Expectation-Maximization, where generators are updated using samples from the joint distribution of the diffusion teacher and inferred generator latents. We develop a reparametrized sampling scheme and a noise cancellation technique to stabilize the distillation process. EMD interpolates between mode-seeking and mode-covering KL, excelling in image generation tasks.
lpt
Decision-making via sequence modeling can be viewed as return-conditioned autoregressive behavior cloning. Because such models are unaware of their own future behaviors, they were thought to be susceptible to drifting errors. Decision Transformer alleviates this issue by additionally predicting the return-to-go labels. We propose an unsupervised solution, where a latent variable is first inferred from a target return and then guides the policy throughout the episode, functioning as a plan. Our model discovers improved decisions from suboptimal trajectories.
nmdp
Aoyang Qin*, Feng Gao, Qing Li, Song-Chun Zhu, Sirui Xie*
When imitating non-Markovian decisions, behavior cloning may be a preferable option over inverse reinforcement learning, which relies on a Markovian Bellman operator. We introduce a maximum likelihood estimation (MLE) framework that extends behavior cloning to state-only sequences. To gain insight into the acquired decision-making mechanism, we also derive the particular structure of its value function, establishing connections with (non-Markovian) soft Q-learning and soft policy iteration.
pictionary
Humans have a long history of communicating concepts with drawings. We model and simulate a transition from sequential sketch-drawing to a pictographic sign system via two neural agents playing a visual communication game. The evolved sketches show intrinsic structures, including iconicity, symbolicity, and semanticity. Co-adapted agents also show familiarity with conventions as they switch between abstract and iconic drawings to communicate seen and unseen concepts.
coat
Statistical independence and permutation invariance are two parallel assumptions for inducing object-centric representations, but they fail to account for the fact that certain spaces can only accommodate one object. We consider compositionality as the consistency of representational transformations when the same set of objects is added to different scenes. We design a geometric equivariance test and show that existing models seem to lack an understanding of the absence and the unique identity of an object.
drc
Disentangling a static visual scene into distinct representations of foreground and background is challenging due to the lack of independence and symmetry between these two components. Inspired by the Julesz ensemble, we propose a latent energy-based generative model, where a pixel reassignment in the background generator equalizes different texture instances. This model effectively captures the regularities in background regions, resolving spurious correlations in the representations. The learned disentanglement generalizes to images from previously unseen classes.
snas
Previous modeling of Neural Architecture Search (NAS) as a Markov Decision Process ignores its deterministic state transitions and fully delayed rewards. Such over-modeling may incur exponentially delayed convergence. We reformulate NAS as a stochastic optimization over a differentiable Markov Chain. SNAS learns operation parameters and architecture distributions in the same round of gradient updates.

Education

  • 2019.09 - 2024.09, University of California, Los Angeles
    PhD in Computer Science
  • 2012.09 - 2016.06, The Hong Kong University of Science and Technology
    BEng in Computer Science, First-Class Honors

Selected Awards

  • Graduate Research Assistantship, UCLA, 2019 - 2024
  • Full Scholarship, HKUST, 2012 - 2016

Professional Service

  • Conference Reviewer: NeurIPS, ICML, ICLR, AISTATS, AAAI, IJCAI, CVPR, ICCV, ECCV, ICRA
  • Journal Reviewer: IEEE T-PAMI, IEEE RA-L