This ASR actually supports 52 languages

Author(s): Gowtham Boyina

Originally published on Towards AI.

The forced alignment model is the interesting part

Over the years, I have tested dozens of speech recognition models. Most claim multilingual support but quietly fall apart when you feed them real Chinese dialects, accented English, or anything other than clean broadcast audio. The ones that work well are usually proprietary APIs whose costs scale uncomfortably.

Image source: Qwen-ASR GitHub repository

Alibaba's Qwen team has released Qwen3-ASR, an open-source speech recognition system that supports 52 languages and dialects. The key models are Qwen3-ASR-1.7B, which the team reports as state-of-the-art on multilingual tasks, and Qwen3-ForcedAligner-0.6B, a non-autoregressive model for accurate speech-text alignment. Together they offer better support for Chinese dialects and multilingual user-generated content, along with more accurate timestamps for applications that need precise audio-text synchronization.
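To make the "precise audio-text synchronization" use case concrete, here is a minimal sketch of what you might do with a forced aligner's output: turn word-level timestamps into SRT subtitle cues. It does not call Qwen3-ForcedAligner or any Qwen API. The `AlignedWord` type, the `words_to_srt` helper, and the sample timings are all hypothetical stand-ins for whatever an aligner actually returns.

```python
from dataclasses import dataclass


@dataclass
class AlignedWord:
    """Hypothetical aligner output: one word with start/end times in seconds."""
    text: str
    start: float
    end: float


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=4):
    """Group aligned words into cues of at most max_words and render SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = len(cues) + 1
        start = format_timestamp(chunk[0].start)  # cue starts with its first word
        end = format_timestamp(chunk[-1].end)     # and ends with its last word
        text = " ".join(w.text for w in chunk)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Made-up alignment data for illustration only (not real model output).
words = [
    AlignedWord("forced", 0.00, 0.42),
    AlignedWord("alignment", 0.42, 1.05),
    AlignedWord("maps", 1.05, 1.38),
    AlignedWord("text", 1.38, 1.70),
    AlignedWord("to", 1.70, 1.82),
    AlignedWord("audio", 1.82, 2.30),
]
print(words_to_srt(words))
```

The quality of the subtitles depends entirely on the accuracy of the word boundaries, which is why timestamp precision matters for this kind of application.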

