Yuzheng Hu's Homepage

Research

I develop the science of data—a perspective that views data as an optimizable, expandable and predictable component of modern AI systems, rather than as i.i.d. samples in statistical learning theory or a static input for model training in deep learning practice. To this end, I think about the following questions:

How can we quantify the value of data in a principled way, and use this understanding to guide better data selection and filtering?
When data is user-contributed and privacy-sensitive, how can we fully leverage it without compromising privacy?
As web-scale corpora plateau, can synthetic data close the access gap, and under what guarantees?

More specifically, I work on data attribution, synthetic data, and privacy. Below are selected works that best reflect my research focus & style. For the full list of publications, see my Google Scholar.

Selected Research

A Unified Theory of Random Projection for Influence Functions
Pingbang Hu*, Yuzheng Hu*, Jiaqi W. Ma*, Han Zhao*
Preprint 2026
ACTG-ARL: Differentially Private Conditional Text Generation with RL-Boosted Control
Yuzheng Hu, Ryan McKenna, Da Yu, Shanshan Wu, Han Zhao, Zheng Xu, Peter Kairouz
The 43rd International Conference on Machine Learning (ICML 2026)
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Yuzheng Hu*, Fan Wu*, Haotian Ye, David Forsyth, James Zou, Nan Jiang, Jiaqi W. Ma, Han Zhao
The 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025, Oral)
Empirical Privacy Variance
Yuzheng Hu*, Fan Wu*, Ruicheng Xian, Yuhang Liu, Lydia Zakynthinou, Pritish Kamath, Chiyuan Zhang, David Forsyth
The 42nd International Conference on Machine Learning (ICML 2025)
Most Influential Subset Selection: Challenges, Promises, and Beyond
Yuzheng Hu, Pingbang Hu, Han Zhao, Jiaqi W. Ma
The 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
SoK: Privacy-Preserving Data Synthesis
Yuzheng Hu*, Fan Wu*, Qinbin Li, Yunhui Long, Gonzalo Munilla Garrido, Chang Ge, Bolin Ding, David Forsyth, Bo Li, Dawn Song
The 45th IEEE Symposium on Security and Privacy (S&P 2024)
Towards Understanding the Data Dependency of Mixup-style Training
Muthu Chidambaram, Xiang Wang, Yuzheng Hu, Chenwei Wu, Rong Ge
The 10th International Conference on Learning Representations (ICLR 2022, Spotlight)

I am also a contributor to Humanity's Last Exam (HLE), a benchmark of expert-level academic questions designed to probe the limits of large language models. HLE has been widely adopted by leading industry labs and was published in Nature (2026). I authored 4 of the 2,500 questions in the benchmark, including one prize-winning entry (top 550 worldwide).

About

I am a research scientist at Google Research NYC.

I received my PhD in computer science from UIUC, where my thesis focused on understanding and addressing data problems in modern AI systems. During my PhD, I spent time at the Simons Institute (2026, 2024), Google Research (2025), Jane Street (2024), and Alibaba (2022). Before that, I received my bachelor's degree in math from Peking University in 2021. I am grateful to Fangcheng Fu and Bin Cui, who offered me an initial glimpse into research, as well as Liwei Wang and Rong Ge, who showed me what world-class research looks like, and what it takes to get there.

I was born and raised in Guangzhou, completed high school and college in Beijing, and have since lived in the Bay Area (Berkeley and San Jose), Chicago, Las Vegas, New Jersey (Jersey City and Morganville), the New York metropolitan area (NYC and Long Island), and the Seattle metropolitan area (Bellevue and Redmond).

Misc

Humanities & Arts

I have a profound interest in economics, history, and administrative systems. These are important lenses through which I learn about humanity and society. As a complement, I also enjoy chatting with people from different backgrounds and classes. My life experiences have given me the opportunity to do so, and I have benefited tremendously from many of these conversations.

I played the piano for over 10 years before stopping after moving to Beijing. Nevertheless, I've developed a deep appreciation for music and art more broadly. For classical music, my favorites are Beethoven, Chopin, and Rachmaninoff. For (Mandarin) pop music, I am a big fan of Eason Chan (陳奕迅) and JJ Lin (林俊杰). Additionally, I enjoy artistic works from Korea, especially films and their OSTs. If you haven't already, I highly recommend checking out the following: Joint Security Area (공동경비구역 JSA, 2000), My Sassy Girl (엽기적인 그녀, 2001), The Classic (클래식, 2003), and Squid Game (오징어 게임, 2021).

Sports

As for many others, soccer and basketball have long been integral parts of my life. On the soccer field, I am competent across the entire central axis, from ST through CM to CB. In basketball, I play point guard. I am a longtime fan of Barcelona and the Spurs. I got my first pair of Tim Duncan sneakers in 2005, which is when I started watching the NBA. I'm also reasonably good at badminton, golf, ping-pong, and tennis.

Travel & Culinary Adventures

In my free time, I enjoy traveling. I have been to over 25 countries around the world. My favorite cities include Lucerne, Paris, Prague, San Diego, and Seattle. I also frequently visit the national parks in the US.

I'm very picky about food, but Japanese cuisine has always been my favorite. The philosophy of preserving the original flavor of ingredients resonates strongly with me. Over the years I've explored many forms, including but not limited to Kaiseki (懐石料理), Oden (おでん), Omakase (お任せ), Sukiyaki (すき焼き), Tempura (天婦羅), Teppanyaki (鉄板焼き), Yakiniku (焼肉), and Yakitori (焼き鳥). Omakase has been my go-to choice whenever I arrive in a new city. These experiences have made me quite knowledgeable about beef and fish. I also learned a lot from the channel 日本美食日报 (グルメリア日誌, @jpmeishi).

Fun Fact about my Name

My Chinese name is 宇征 (Yǔ Zhēng). "宇" means "universe", and "征" can mean "to conquer" or "to explore". While I like to think of it as "exploring the universe", my friends often tease that it's more about "conquering the universe"! This notion is the inspiration behind the website's aesthetic.