Kai Zhen | Sr. Applied Scientist @ Amazon AGI

About Me

I’m a Senior Applied Scientist at Amazon AGI, working on large-language-model (LLM) training that blends speech and audio toward more natural, interactive intelligence. Earlier in my time at Amazon, I worked on efficient speech-processing models for Alexa devices. I’ve led research in neural efficiency, developing sub-8-bit quantization-aware training and sparsification methods (including structured 2:4 sparsity) that improved model size, latency, and accuracy in production systems. My work has been applied in Echo products used by many customers.

I’ve published papers in several AI and speech venues such as ACL, EMNLP, Interspeech, ICASSP, and IEEE SLT, and co-authored patents. I completed my Ph.D. in Computer Science and Cognitive Science at Indiana University, where I worked on neural waveform coding inspired by human learning.

I play sports, indoors and outdoors, and spend time in nature. These are just as key for me, if not more so, to approaching the meaning of life.

News

🇺🇸 My U.S. residency has been "Officially `Tenured`".

Sep 12, 2025: 3 papers accepted at EMNLP 2025 on Efficient and Robust LLM Pre-Training (with UCSB collaboration).

Aug 08, 2025: ACL’25 pruning paper featured on Amazon Science Blog: Prune Gently, Taste Often.

May 20, 2025: Intern project on LLM pruning accepted to ACL Findings.

Selected Publications

Neural Speech and Audio Codec:
[Psychoacoustics@SPL][NSC@TASLP][CMRL@Interspeech][CQ@ICASSP]

Deep Compression and EdgeAI:
[GQ@S8BQAT@SLT][CSP@ICASSP][S8BQAT@Interspeech]

LLM Efficiency:
[Wanda++@ACL][Adazeta@EMNLP][QuZO@EMNLP]

Something Fun

I like singing with or without an audience (with a live band or in the shower).

From the Phantom of the Opera:

From "the Chinese Drama":

My vocal characteristics are well preserved by my neural audio codec :) More demos: neural-audio-coding.html