Research
I'm interested in making sense of data and improving its use in generative AI, especially large language models (LLMs). Data powers modern AI; to extract the most from it, I think about the following questions:
- How can we quantify the value of data in a principled way, and use this understanding to guide better data selection and filtering?
- When data is user-contributed and privacy-sensitive, how can we fully leverage it without compromising privacy?
- As web-scale corpora plateau, can synthetic data close the access gap, and under what guarantees?
To address these questions, I work at the intersection of data attribution,
(differential) privacy, and synthetic data. Below are selected works that best reflect my research focus & style. For the full list of publications, see my
Google Scholar.
Selected Research
-
ACTG-ARL: Differentially Private Conditional Text Generation with RL-Boosted Control
Yuzheng Hu, Ryan McKenna, Da Yu, Shanshan Wu, Han Zhao, Zheng Xu, Peter Kairouz
Preprint 2025
-
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Yuzheng Hu*, Fan Wu*, Haotian Ye, David Forsyth, James Zou, Nan Jiang, Jiaqi W. Ma, Han Zhao
The 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025, Oral)
-
Empirical Privacy Variance
Yuzheng Hu*, Fan Wu*, Ruicheng Xian, Yuhang Liu, Lydia Zakynthinou, Pritish Kamath, Chiyuan Zhang, David Forsyth
The 42nd International Conference on Machine Learning (ICML 2025)
-
Most Influential Subset Selection: Challenges, Promises, and Beyond
Yuzheng Hu, Pingbang Hu, Han Zhao, Jiaqi W. Ma
The 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
-
SoK: Privacy-Preserving Data Synthesis
Yuzheng Hu*, Fan Wu*, Qinbin Li, Yunhui Long, Gonzalo Munilla Garrido, Chang Ge, Bolin Ding, David Forsyth, Bo Li, Dawn Song
The 45th IEEE Symposium on Security and Privacy (S&P 2024)
-
Towards Understanding the Data Dependency of Mixup-style Training
Muthu Chidambaram, Xiang Wang, Yuzheng Hu, Chenwei Wu, Rong Ge
The 10th International Conference on Learning Representations (ICLR 2022, Spotlight)
I am also a contributor to Humanity's Last Exam (HLE), a benchmark of expert-level questions across various domains designed to test the limits of LLMs, which has been widely adopted by industry labs such as OpenAI, Google DeepMind, and xAI. Specifically, I authored 4 of the 2,500 questions in the benchmark, one of which was selected as a prize-winning entry (top 550 worldwide).
About
Welcome to my homepage. I am a machine learning researcher, currently in my final year as a PhD student in the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign. I have worked on a variety of problems in machine learning, but lately I have focused mostly on the study of data.
During my PhD, I have spent time at
Google Research (current),
Simons Institute (2024),
Jane Street (2024),
and Alibaba (2022).
Before that, I obtained my bachelor's degree in mathematics from Peking University in 2021. I am grateful to Fangcheng Fu and Bin Cui, who offered me an initial glimpse into research, as well as Liwei Wang and Rong Ge, who showed me what world-class research looks like, and what it takes to get there.
I was born and raised in Guangzhou, completed high school and college in Beijing, and have since lived in the Bay Area (Berkeley and San Jose), Chicago, Las Vegas, New Jersey, the New York metropolitan area (NYC and Long Island), and the Seattle metropolitan area (Bellevue and Redmond).
Misc
Humanities & Arts
I have a profound interest in economics, history, and administrative systems. These are important lenses through which I learn about humanity and society.
As a complement, I also enjoy chatting with people from different backgrounds and classes. My life experiences give me the opportunity to do so, and I have benefited tremendously from a lot of these conversations.
I played the piano for over 10 years before stopping after moving to Beijing. Nevertheless, I've developed a deep appreciation for music and art more broadly. For classical music, my favorites are Beethoven, Chopin, and Rachmaninoff. For (Mandarin) pop music, I am a big fan of Eason Chan (陳奕迅) and JJ Lin (林俊杰). Additionally, I enjoy artistic works from Korea, especially films and their OSTs. If you haven't already, I highly recommend checking out the following:
Joint Security Area (공동경비구역 JSA, 2000),
My Sassy Girl (엽기적인 그녀, 2001),
The Classic (클래식, 2003),
and Squid Game (오징어 게임, 2021).
Sports
As for many others, soccer and basketball have long been integral parts of my life. On the soccer field, I am competent across the entire central axis, from ST through CM to CB. In basketball, I play point guard. I am a longtime fan of Barcelona and the Spurs. I got my first pair of Tim Duncan sneakers in 2005, which is when I started watching the NBA. I'm also reasonably good at badminton, golf, ping-pong, and tennis.
Travel & Culinary Adventures
In my free time, I enjoy traveling. I have been to over 25 countries around the world. My favorite cities include
Lucerne,
Paris,
Prague, San Diego, and Seattle. I also frequently visit the national parks in the US.
I'm very picky about food, but Japanese cuisine has always been my favorite. The philosophy of preserving the original flavor of ingredients resonates strongly with me. Over the years I've explored many forms, including but not limited to
Kaiseki (懐石料理),
Omakase (お任せ),
Sukiyaki (すき焼き),
Tempura (天婦羅),
Teppanyaki (鉄板焼き),
Yakiniku (焼肉),
and Yakitori (焼き鳥).
Omakase has been my go-to choice whenever I arrive in a new city. These experiences have made me quite knowledgeable about beef and fish. I also learned a lot from the channel 日本美食日报 (グルメリア日誌, @jpmeishi).
Fun Fact about my Name
My Chinese name is 宇征 (Yǔ Zhēng). "宇" means "universe", and "征" can mean "to conquer" or "to explore". While I like to think of it as "exploring the universe", my friends often tease that it's more about "conquering the universe"! This notion is the inspiration behind the website's aesthetic.