CNTXT AI launches Munsit: the most accurate Arabic speech recognition system ever built

In a defining moment for Arabic-language artificial intelligence, CNTXT AI has unveiled Munsit, a next-generation Arabic speech recognition model that is not only the most accurate ever created for Arabic, but also decisively outperforms global giants such as OpenAI, Meta, Microsoft and ElevenLabs on standard benchmarks. Developed in the United Arab Emirates and tailored to Arabic from the ground up, Munsit is a powerful step forward in what CNTXT calls “sovereign AI”: technology built in the region, but with global competitiveness.

The scientific foundations of this achievement are laid out in the team's newly published paper, Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning, which introduces a scalable, cost-efficient training method that addresses the long-standing scarcity of labeled Arabic speech data. This method, weakly supervised learning, enabled the team to build a system that sets a new bar for transcription quality in both Modern Standard Arabic (MSA) and more than 25 regional dialects.

Overcoming the data drought in Arabic ASR

Arabic, despite being one of the most widely spoken languages in the world and an official language of the United Nations, has long been considered a low-resource language in the field of speech recognition. This is due both to its morphological complexity and to the lack of large, diverse, labeled speech datasets. Unlike English, which benefits from countless hours of manually transcribed audio, Arabic's dialectal richness and fragmented digital presence have posed significant challenges for building robust automatic speech recognition (ASR) systems.

Instead of waiting for the slow and expensive process of manual transcription, CNTXT AI took a radically more scalable path: weak supervision. Their approach began with a massive corpus of over 30,000 hours of unlabeled Arabic audio collected from diverse sources. Using a custom data processing pipeline, this raw audio was cleaned, segmented and automatically labeled to produce a high-quality 15,000-hour training set, one of the largest and most representative Arabic speech corpora ever assembled.

This process did not rely on human annotation. Instead, CNTXT developed a multi-stage system for generating, evaluating and filtering hypotheses from multiple ASR models. These transcripts were compared using Levenshtein distance to select the most consistent hypotheses, and then passed through a language model to assess their grammatical plausibility. Segments that did not meet defined quality thresholds were discarded, ensuring that the training data remained reliable even without human verification. The team refined this pipeline over multiple iterations, each time improving label accuracy by retraining the ASR system itself and feeding it back into the labeling process.
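The consensus-and-filter step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CNTXT's pipeline: the function names, the `lm_score` callable (a stand-in for whatever language model is used, assumed to return a plausibility score where higher is better) and both thresholds are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(h1, h2):
    """Word-level edit distance scaled to [0, 1] by the longer hypothesis."""
    w1, w2 = h1.split(), h2.split()
    longest = max(len(w1), len(w2))
    return levenshtein(w1, w2) / longest if longest else 0.0


def select_weak_label(hypotheses, lm_score, max_disagreement=0.2, min_lm_score=0.5):
    """Return the consensus hypothesis, or None if the segment should be discarded.

    hypotheses: transcripts of one audio segment from different ASR models.
    lm_score: hypothetical callable mapping text to a plausibility score.
    Both thresholds are illustrative values, not figures from the paper.
    """
    if len(hypotheses) < 2:
        return None  # no consensus can be measured with a single hypothesis
    # Average distance from each hypothesis to all the others.
    avg = []
    for i, h in enumerate(hypotheses):
        others = [normalized_distance(h, o) for j, o in enumerate(hypotheses) if j != i]
        avg.append(sum(others) / len(others))
    best = min(range(len(hypotheses)), key=avg.__getitem__)
    if avg[best] > max_disagreement:
        return None  # the models disagree too much about this segment
    if lm_score(hypotheses[best]) < min_lm_score:
        return None  # the language model finds the text implausible
    return hypotheses[best]
```

For example, `select_weak_label(["the cat sat", "the cat sat", "the cat sit"], lambda text: 1.0)` keeps `"the cat sat"`, while three hypotheses with no words in common are rejected. In an iterative setup like the one described, the retrained model would simply become one more source of hypotheses in the next round.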

Munsit: Conformer architecture

At the heart of Munsit is the Conformer model, a hybrid neural network architecture that combines the local sensitivity of convolutional layers with the global sequence modeling capabilities of transformers. This design makes the Conformer particularly adept at handling the nuances of spoken language, where both long-range dependencies (such as sentence structure) and fine-grained phonetic details are crucial.

CNTXT AI implemented a large variant of the Conformer and trained it from scratch, using 80-channel mel-spectrograms as input. The model consists of 18 layers and has about 121 million parameters. Training was carried out on a high-performance cluster using eight NVIDIA A100 GPUs with bfloat16 precision, enabling efficient handling of massive batch sizes and high-dimensional feature spaces. To handle tokenization of Arabic's morphologically rich structure, the team used a SentencePiece tokenizer trained specifically on their custom corpus, resulting in a vocabulary of 1,024 subword units.

Unlike conventional supervised ASR training, which usually requires each audio clip to be paired with a carefully transcribed label, CNTXT's method operated entirely on weak labels. These labels, although noisier than human-verified ones, were optimized through a feedback loop that prioritized consensus, grammatical coherence and lexical plausibility. The model was trained with the Connectionist Temporal Classification (CTC) loss function, which is well suited to unaligned sequence modeling. This is critical for speech recognition tasks, in which the timing of spoken words is variable and unpredictable.
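To make the "unaligned" property of CTC concrete, here is a minimal pure-Python sketch of the loss for a single utterance. It is a textbook-style illustration rather than the team's training code: it works with raw probabilities for readability, whereas practical implementations work in log space, and the symbol ids and the choice of `0` as the blank are assumptions. CTC sums the probability of every frame-level path that collapses to the target (merge repeated symbols, then drop blanks), so no word-level timing is needed.

```python
import math


def ctc_collapse(path, blank=0):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out


def ctc_neg_log_likelihood(probs, target, blank=0):
    """CTC loss for one utterance, computed with the forward algorithm.

    probs: list of T frames, each a list of per-symbol probabilities.
    target: list of label ids, without blanks.
    """
    # Interleave blanks around the labels: [blank, y1, blank, y2, ..., blank].
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all partial paths ending in state s.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]  # stay in the same state
            if s >= 1:
                total += alpha[s - 1]  # advance by one state
            # Skipping the blank in between is allowed only for different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf
```

With two frames, a uniform distribution over `{blank, a}` and the target `[a]`, three paths collapse to the target (`a a`, `blank a`, `a blank`), each with probability 0.25, so the loss is `-log(0.75)`.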

Dominating the benchmarks

The results speak for themselves. Munsit was tested against leading open-source and commercial ASR models on six Arabic benchmark datasets: SADA, Common Voice 18.0, MASC (clean and noisy), MGB-2 and Casablanca. Together, these datasets cover dozens of dialects and accents across the Arab world, from Saudi Arabia to Morocco.

Across all benchmarks, Munsit-1 achieved an average word error rate (WER) of 26.68 and a character error rate (CER) of 10.05. By comparison, the best-performing version of OpenAI's Whisper recorded an average WER of 36.86 and a CER of 17.21. Meta's SeamlessM4T, another state-of-the-art multilingual model, scored even higher error rates. Munsit outperformed every other system on both clean and noisy data, and showed particularly strong robustness in noisy conditions, which is a key factor for real-world applications such as call centers and public services.
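For readers unfamiliar with these metrics, the sketch below shows how WER and CER are conventionally computed: the edit distance between the reference and the system output, divided by the reference length and expressed as a percentage. It is a generic illustration, not the evaluation code behind the figures above. The function names are hypothetical, spaces are assumed to count as characters for CER, and any text normalization applied before scoring is not covered here.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, insertions and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on a mat"
    print(round(wer(ref, hyp), 2))  # one wrong word out of six: 16.67
```

Because both metrics are error rates, lower is better, and a WER above 100 is possible when the output contains many inserted words.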

The gap was equally clear against proprietary systems. Munsit outperformed Microsoft Azure's Arabic ASR models, ElevenLabs Scribe, and even OpenAI's GPT-4o transcription feature. These results are not marginal gains: they represent an average relative improvement of 23.19% in WER and 24.78% in CER over the strongest open baseline, establishing Munsit as the clear leader in Arabic speech recognition.

Platform for the future of Arabic voice AI

While Munsit-1 is already transforming the possibilities for transcription, subtitling and customer service in Arabic-speaking markets, CNTXT AI sees this launch as only the beginning. The company envisions a full suite of Arabic voice technologies, including text-to-speech, voice assistants and real-time translation systems, all built on sovereign, regionally relevant AI.

“Munsit is more than a breakthrough in speech recognition,” said Mohammad Abu Sheikh, CEO of CNTXT AI. “It is a declaration that Arabic belongs at the forefront of global artificial intelligence. We have proved that world-class artificial intelligence does not have to be imported: it can be built in Arabic, for Arabic.”

With the rise of region-specific models such as Munsit, the AI industry is entering a new era, one in which linguistic and cultural relevance is not sacrificed in the pursuit of technical excellence. In fact, with Munsit, CNTXT AI has shown that they are one and the same.
