Biography
I graduated from Yali Middle School in 2014.
I received my B.Sc. degree from the Department of Computer Science and Technology, Nanjing University in 2018.
I received my M.Sc. degree from the LAMDA group, Department of Computer Science and Technology, Nanjing University in 2021, under the supervision of Professor Wu-Jun Li.
Currently, I am a chief engineer at HUAWEI.
I am interested in machine learning and data mining. Currently, I am mainly focusing on:
- Large Multimodal Models
I am also interested in:
- Text Recognition
- Speaker Recognition
- Speech Recognition
- Text-to-Speech
- Face Recognition
Publications
Technical Report, 2024
Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI), Pennsylvania, USA, 2025
Under Review, 2024
Under Review, 2024
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, 2021
Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Shanghai, China, 2020
Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Graz, Austria, 2019
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019
Experiences
Work in the speaker recognition group:
- Deep hashing-based large-scale speaker retrieval
- Noise-robust speaker verification (paper accepted by ICASSP 2021)
Awards & Honors
Projects
R1-Vision
- Let's first take a look at the image
TextHawk
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
D-TDNN
- PyTorch implementation of densely connected time delay neural networks
KaldiFeat
- A lightweight Python library for computing Kaldi-style acoustic features based on NumPy