Our neural efficiency innovations have been productized in Amazon's voice-controlled assistants, including Echo, Echo Dot, and Echo Show. Overall, our methods reduce the memory footprint and user-perceived latency while simultaneously improving recognition accuracy, and they now serve millions of customers.
Some of these innovations have been published in conference proceedings and patented as well.
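Much of the footprint and latency reduction comes from running on-device models at reduced numerical precision, as in the sub-8-bit quantization work below (C-006, C-007). The snippet is only a minimal sketch of the general idea behind quantization-aware training with a straight-through estimator; the bit width, shapes, per-tensor scaling, and function names are illustrative assumptions, not the exact recipe from those papers.

```python
# Illustrative sketch only: straight-through-estimator style "fake quantization"
# of a weight tensor onto a sub-8-bit grid. Bit width, tensor shapes, and the
# per-tensor scale are assumptions for illustration, not the papers' method.
import torch

def fake_quantize(w: torch.Tensor, num_bits: int = 5) -> torch.Tensor:
    """Round weights to a symmetric sub-8-bit grid in the forward pass while
    letting gradients flow through unchanged (straight-through estimator)."""
    qmax = 2 ** (num_bits - 1) - 1                # e.g. qmax = 15 for 5 bits
    scale = w.abs().max().clamp(min=1e-8) / qmax  # per-tensor scale (assumed)
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    # Forward uses the quantized weights; backward sees the identity mapping.
    return w + (w_q - w).detach()

# During training, each layer's weights would pass through fake_quantize so the
# network learns to tolerate the reduced precision it will see on-device.
w = torch.randn(256, 256, requires_grad=True)
loss = fake_quantize(w, num_bits=5).pow(2).mean()
loss.backward()   # gradients still reach w despite the rounding in the forward pass
```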
C-010 Yifan Yang, Kai Zhen, Ershad Banijamali, Athanasios Mouchtaris, Zheng Zhang, "AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning," in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), Miami, USA, November 12-16, 2024.
C-009 Rupak Vignesh Swaminathan, Grant Strimel, Ariya Rastrow, Harish Mallidi, Kai Zhen, Hieu Nguyen, Nathan Susanj, Athanasios Mouchtaris, "Max-Margin Transducer Loss: Improving Sequence-Discriminative Training Using a Large-Margin Learning Strategy," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seoul, Korea, 14-19 April, 2024.
C-008 Martin Radfar, Paulina Lyskawa, Brandon Trujillo, Yi Xie, Kai Zhen, Jahn Heymann, Denis Filimonov, Grant Strimel, Nathan Susanj, Athanasios Mouchtaris, "Conmer: Streaming Conformer with No Self-Attention for Interactive Voice Assistants," in Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Dublin, Ireland, August 21-24, 2023.
C-007 Kai Zhen, Martin Radfar, Hieu Nguyen, Grant Strimel, Nathan Susanj, Athanasios Mouchtaris, "Sub-8-Bit Quantization for On-Device Speech Recognition: A Regularization-Free Approach," in Proceedings of the 2022 IEEE Spoken Language Technology Workshop (SLT 2022), Doha, Qatar, January 9-12, 2023.
[pdf]
C-006 Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta, Nathan Susanj, Athanasios Mouchtaris, Tariq Afzal, and Ariya Rastrow, "Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition," in Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Incheon, Korea, September 18-22, 2022.
[pdf]
C-005 Kai Zhen, Hieu Duy Nguyen, Feng-Ju (Claire) Chang, Athanasios Mouchtaris, and Ariya Rastrow, "Sparsification via Compressed Sensing for Automatic Speech Recognition," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toronto, ON, Canada, June 6-12, 2021.
[pdf]
Of course, the data-driven paradigm has built a better ladder, but a better ladder may not always "get you to the moon." More often, the better solution comes from marrying the modern computational framework with conventional domain-specific knowledge. To that end, we proposed ways to incorporate residual coding, linear predictive coding, and psychoacoustics into an end-to-end neural waveform codec.
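As a concrete illustration of that marriage, the psychoacoustically calibrated loss in J-001 weights the reconstruction error by how audible it is to a listener. Below is a minimal sketch of such a perceptually weighted spectral loss; the inverse-masking-threshold weighting, shapes, and function names are illustrative assumptions rather than the published codec's exact psychoacoustic model.

```python
# Illustrative sketch only: a perceptually weighted spectral reconstruction loss,
# where per-frequency error is scaled by (an assumed) inverse masking threshold
# so that errors the ear would mask anyway contribute less to the loss.
import torch

def perceptual_spectral_loss(x_hat: torch.Tensor,
                             x: torch.Tensor,
                             mask_threshold: torch.Tensor) -> torch.Tensor:
    """x_hat, x: (batch, samples) waveform frames; mask_threshold: (batch, bins)."""
    X = torch.fft.rfft(x, dim=-1).abs()          # reference magnitude spectrum
    X_hat = torch.fft.rfft(x_hat, dim=-1).abs()  # decoded magnitude spectrum
    weight = 1.0 / (mask_threshold + 1e-8)       # more audible -> larger weight
    return (weight * (X - X_hat) ** 2).mean()

# Toy usage: 512-sample frames give 257 rFFT bins; the masking thresholds here
# are placeholders, not the output of a real psychoacoustic model.
x = torch.randn(4, 512)
x_hat = x + 0.01 * torch.randn_like(x)
mask = torch.ones(4, 257)
loss = perceptual_spectral_loss(x_hat, x, mask)
```

In the same spirit, the residual coding work (C-003, P-004) lets a conventional LPC front end carry the predictable structure of the waveform, so the neural codec only has to model the prediction residual.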
Some of the related publications are:
J-002 Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim, "Scalable and Efficient Neural Speech Coding: A Hybrid Design," IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP), vol. 30, pp. 12-25, 2021.
[pdf]
J-001 Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, and Minje Kim, "Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding," IEEE Signal Processing Letters (SPL), vol. 27, pp. 2159-2163, 2020, doi: 10.1109/LSP.2020.3039765 (also presented at ICASSP 2022).
[demo]
[pdf]
[code]
C-004 Haici Yang, Kai Zhen, Seungkwon Beack, Minje Kim, "Source-Aware Neural Speech Coding for Noisy Speech Compression," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toronto, ON, Canada, June 6-12, 2021.
[pdf]
C-003 Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, and Minje Kim, "Efficient and Scalable Neural Residual Waveform Coding with Collaborative Quantization," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 4-8, 2020.
[demo]
[pdf]
[code]
C-001 Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, and Minje Kim, "Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding," in Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Graz, Austria, September 15-19, 2019.
[demo]
[pdf]
P-005 Mi Suk Lee, Seung Kwon Beack, Jongmo Sung, Tae Jin Lee, Jin Soo Choi, Minje Kim, Kai Zhen, "Method and Apparatus for Processing Audio Signal," U.S. Patent Application US20210233547A1.
P-004 Minje Kim, Kai Zhen, Mi Suk Lee, Seung Kwon Beack, Jongmo Sung, Tae Jin Lee, Jin Soo Choi, "Residual Coding Method of Linear Prediction Coding Coefficient Based on Collaborative Quantization, and Computing Device for Performing the Method," U.S. Patent Application No. 17/098,090.
P-002 Minje Kim, Kai Zhen, Seungkwon Beack, et al., "Audio Signal Encoding Method and Audio Signal Decoding Method, and Encoder and Decoder Performing the Same," U.S. Patent Application US20200135220A1.
Find a complete list of my publications on my Google Scholar profile.