Biography

I graduated from Yali Middle School in 2014.

I received my B.Sc. degree from the Department of Computer Science and Technology, Nanjing University in 2018.

I received my M.Sc. degree from the LAMDA group, Department of Computer Science and Technology, Nanjing University in 2021, under the supervision of Professor Wu-Jun Li.

Currently, I am a chief engineer at HUAWEI.

I am interested in machine learning and data mining. My current focus is:

  • Large Multimodal Models

I am also interested in:

  • Text Recognition
  • Speaker Recognition
  • Speech Recognition
  • Text-to-Speech
  • Face Recognition

Publications

  • TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens
    Ya-Qi Yu, Minghui Liao, Jiwen Zhang, Jihao Wu
    Technical Report, 2024
  • Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective
    Xinmiao Yu, Xiaocheng Feng, Yun Li, Minghui Liao, Ya-Qi Yu, Xiachong Feng, Weihong Zhong, Ruihan Chen, Mengkang Hu, Jihao Wu, Dandan Tu, Duyu Tang, Bing Qin
    Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI), Pennsylvania, USA, 2025
  • UI-Hawk: Unleashing the Screen Stream Understanding for GUI Agents
    Jiwen Zhang, Ya-Qi Yu, Minghui Liao, Wentao Li, Jihao Wu, Zhongyu Wei
    Under Review, 2024
  • TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
    Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng and Wei Zeng
    Under Review, 2024
  • CAM: Context-Aware Masking for Robust Speaker Verification
    Ya-Qi Yu, Siqi Zheng, Hongbin Suo, Yun Lei and Wu-Jun Li
    Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, 2021
  • Densely Connected Time Delay Neural Network for Speaker Verification
    Ya-Qi Yu and Wu-Jun Li
    Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Shanghai, China, 2020
  • Deep Hashing for Speaker Identification and Retrieval
    Lei Fan, Qing-Yuan Jiang, Ya-Qi Yu and Wu-Jun Li
    Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Graz, Austria, 2019
  • Ensemble Additive Margin Softmax for Speaker Verification
    Ya-Qi Yu, Lei Fan and Wu-Jun Li
    Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019

Experiences

Chief Engineer

October 2024 - Present
HUAWEI, Shanghai, China

Senior Engineer

July 2021 - October 2024
HUAWEI, Shanghai, China

Algorithm Engineer Intern

June 2020 - August 2020
Speech Lab, Alibaba DAMO Academy, Alibaba Group, Hangzhou, China

Worked in the speaker recognition group on:

  • Deep hashing-based large-scale speaker retrieval
  • Noise-robust speaker verification (paper accepted by ICASSP 2021)

Awards & Honors

Graduation with Distinction
April 2021
Nanjing University, Nanjing, China

Excellent Graduate Student
December 2020
Nanjing University, Nanjing, China

Excellence Scholarship
November 2020
Nanjing University, Nanjing, China

HUAWEI Scholarship
December 2019
Nanjing University, Nanjing, China

Speaker Verification Competition (2/345, ¥50,000)
October 2018
Tongdun, Hangzhou, China

Projects

  • R1-Vision - Let's first take a look at the image
  • TextHawk - Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
  • D-TDNN - PyTorch implementation of densely connected time delay neural networks
  • KaldiFeat - A lightweight Python library for computing Kaldi-style acoustic features based on NumPy

Skills

  • PyTorch
  • Kaldi
  • Python
  • C & C++