AI Case Study

Microsoft launches deep learning based local language translation service in India supporting 22 languages

In India, language is a barrier to internet access. To address this, Microsoft has developed a deep neural network platform that supports 22 languages and performs speech-to-text, text cleansing, translation, and text-to-speech in real time.

Industry

Technology

Internet Services Consumer

Project Overview

"The key challenge with developing digital translation capabilities is the availability of data. Traditional statistical machine translation models rely on vast datasets to accurately translate a given query or sentence into another language.

These datasets include sentences in a particular language and accurate corresponding translations of that sentence to another language. Translation tools rely on millions of sets of such unique parallel pairs of sentences.

Although India is the second most populous country in the world with six languages that are globally dominant in terms of the number of native speakers, most Indian languages are underrepresented in online exchanges.

Of the 447 different languages spoken in India, none makes it to the list of the top 50 digital languages. In other words, there is a lack of unique parallel pairs of data for Indian languages. This lack of training data, in terms of both quantity and quality, poses a major challenge for digital translation.

Adding to the complexity are the subtle differences in enunciation, accent, diction, and slang across various regions in India. For example, two native Hindi speakers from different regions of a Hindi-speaking belt may have divergent ways of constructing a sentence or describing the same thing.

This combination of complexity and lack of data has stymied the development of accurate translation tools for Indian users. However, using the recent advances in deep learning and artificial neural networks, we have developed a translation model for Indian languages that is more accurate while relying on fewer datasets.

DNN model more accurate than statistical models

Applying the DNN model for translation leads to output that is more accurate than that of traditional statistical machine translation models. The accuracy of the output is measured with the external BLEU (Bilingual Evaluation Understudy) score alongside an internal test. Microsoft Translator was switched from statistical machine translation to deep neural networks in 2016.
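To make the BLEU metric concrete, the following is a minimal sketch of a sentence-level BLEU score with uniform weights over 1- to 4-grams, a single reference, and no smoothing. It is an illustration of the general metric only, not the evaluation setup described here; the function names `ngrams` and `sentence_bleu` are chosen for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference (illustrative only)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference scores 1.0, a candidate sharing no words scores 0.0, and partial overlaps fall in between. Corpus-level evaluation usually aggregates n-gram counts over many sentences rather than averaging sentence scores.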

The neural network-powered algorithm is trained on a curated database of translated sentences. The database is curated by scrubbing the data to eliminate errors and by standardizing the text encoding (converting to Unicode).
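As a rough illustration of this kind of cleanup, the sketch below normalizes each sentence to Unicode NFC form, drops non-whitespace control characters, collapses whitespace, and removes empty or duplicate sentence pairs. The specific rules and the names `clean_sentence` and `clean_parallel_corpus` are assumptions made for this example; the actual curation steps are not described in detail.

```python
import unicodedata


def clean_sentence(text):
    """Normalize one sentence: NFC form, no control characters, single spaces."""
    text = unicodedata.normalize("NFC", text)
    # Keep whitespace, drop other control characters (category Cc).
    # Format characters such as ZWJ/ZWNJ are kept because Indic scripts use them.
    text = "".join(ch for ch in text if ch.isspace() or unicodedata.category(ch) != "Cc")
    return " ".join(text.split())


def clean_parallel_corpus(pairs):
    """Clean (source, target) pairs and drop empty or duplicate pairs."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = clean_sentence(src), clean_sentence(tgt)
        if not src or not tgt:
            continue  # one side is empty after cleaning
        if (src, tgt) in seen:
            continue  # exact duplicate of an earlier pair
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned
```

Normalizing first matters because the same visible text can be encoded in more than one way, which would otherwise make identical sentences look different to the model.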

Since Indian languages are morphologically rich, the model deploys a morphological analyzer to enable root and affix segmentation. Moreover, owing to the paucity of data, the model is trained to dynamically decide how much training is required to prevent overfitting.
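One common way to decide dynamically how long to train is early stopping on a held-out validation set: training halts once the validation loss has stopped improving for a set number of epochs. Whether the model described here uses exactly this mechanism is an assumption; the sketch below, with the hypothetical name `early_stopping_epoch`, only illustrates the idea.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, best_loss) under a simple early-stopping rule.

    val_losses: validation loss after each epoch, in order.
    patience:   how many non-improving epochs to tolerate before stopping.
    min_delta:  minimum decrease that counts as an improvement.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training; later epochs are never run
    return best_epoch, best_loss
```

With losses `[1.0, 0.8, 0.7, 0.72, 0.75, 0.6]` and `patience=2`, training stops after the fifth epoch and the checkpoint from epoch index 2 is kept. A larger patience trades extra training time for a lower risk of stopping too early.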

We also generate data synthetically through back translations, which gives the model more to work with and learn from. Synthetic data not only augments the resources for training the neural network but also facilitates an iterative bootstrapping process of machine learning, allowing the model to gain fluency and accuracy with limited data.
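The back-translation idea can be sketched as follows: real sentences in the target language are translated back into the source language by an existing reverse model, and each machine-made source sentence is paired with its real target sentence. Everything named here (`back_translate`, `toy_reverse_model`, `TOY_LEXICON`) is hypothetical, and the word-for-word lookup merely stands in for a trained reverse translation model.

```python
def back_translate(monolingual_target, reverse_translate):
    """Build synthetic (source, target) pairs from target-language text.

    monolingual_target: real sentences in the target language.
    reverse_translate:  callable mapping a target sentence to a source sentence.
    """
    synthetic_pairs = []
    for target_sentence in monolingual_target:
        synthetic_source = reverse_translate(target_sentence)
        if synthetic_source:  # skip sentences the reverse model could not handle
            synthetic_pairs.append((synthetic_source, target_sentence))
    return synthetic_pairs


# Toy stand-in for a trained Hindi-to-English model (assumption for illustration).
TOY_LEXICON = {"नमस्ते": "hello", "दुनिया": "world"}


def toy_reverse_model(sentence):
    """Word-for-word lookup; unknown words are passed through unchanged."""
    return " ".join(TOY_LEXICON.get(word, word) for word in sentence.split())
```

The target side of every synthetic pair is genuine human text, so the forward model still learns to produce fluent output even when the synthetic source side is imperfect. Retraining the reverse model on the improved data and repeating the process gives the iterative bootstrapping described above.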

This new neural network architecture is based on a single Recurrent Neural Network (RNN) layer, using either Gated Recurrent Units (GRU) or Long Short-Term Memory (LSTM), at the bottom with a substantial number of fully-connected (FC) layers on top that allow CPU-based decoders, which do not require specialized hardware, to be built into mobile software platforms.

The RNN+FC architecture is faster, more accurate, and cheaper to deploy. Training the model is also far more data-efficient.
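The shape of such an RNN+FC stack can be sketched in plain Python: one GRU cell carries state across time steps, and a stack of fully-connected layers is applied to its output at each step. The layer sizes, ReLU activations, random weights, omitted biases, and the class names `GRUCell` and `RnnFcDecoder` are all assumptions for illustration; this is not the production architecture.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Single GRU cell (biases omitted for brevity)."""

    def __init__(self, input_size, hidden_size, rng):
        n = input_size + hidden_size  # each gate reads [x; h]
        self.Wz = rand_matrix(hidden_size, n, rng)  # update gate
        self.Wr = rand_matrix(hidden_size, n, rng)  # reset gate
        self.Wh = rand_matrix(hidden_size, n, rng)  # candidate state

    def step(self, x, h):
        xh = x + h
        z = [sigmoid(v) for v in matvec(self.Wz, xh)]
        r = [sigmoid(v) for v in matvec(self.Wr, xh)]
        x_rh = x + [ri * hi for ri, hi in zip(r, h)]
        h_new = [math.tanh(v) for v in matvec(self.Wh, x_rh)]
        return [(1 - zi) * hi + zi * hn for zi, hi, hn in zip(z, h, h_new)]


class RnnFcDecoder:
    """One recurrent layer at the bottom, several FC layers on top."""

    def __init__(self, input_size, hidden_size, fc_layers, vocab_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.rnn = GRUCell(input_size, hidden_size, rng)
        self.fc = [rand_matrix(hidden_size, hidden_size, rng) for _ in range(fc_layers)]
        self.out = rand_matrix(vocab_size, hidden_size, rng)

    def forward(self, inputs):
        h = [0.0] * self.hidden_size
        logits = []
        for x in inputs:
            h = self.rnn.step(x, h)  # the only step that depends on the previous time step
            y = h
            for weights in self.fc:  # feed-forward layers with ReLU
                y = [max(0.0, v) for v in matvec(weights, y)]
            logits.append(matvec(self.out, y))
        return logits
```

Only the GRU step is inherently sequential. The fully-connected layers are plain matrix products, which is one plausible reason a design like this suits CPU-based decoding.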

How it works

The algorithm breaks down the process of translating a sentence into four distinct steps:

Step 1. Speech Recognition (Speech to Text)

Recognizing spoken words and converting them into text is the most crucial step in the process. The quality of the initial input determines the quality of the eventual output. Microsoft speech translation technologies use an advanced LSTM neural network architecture.

Step 2. TrueText

The second step involves applying TrueText techniques to eliminate quirks in the data. This step scrubs the data of natural pauses, incomprehensible words and repetitions so that the text format is more readable by the neural network.
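The kind of cleanup this step performs can be illustrated with a deliberately simple sketch that drops filler words and immediate word repetitions from a transcript. This is not TrueText itself; the filler list and the function name `tidy_transcript` are assumptions made for this example.

```python
# Illustrative English filler words; a real system would be language-specific.
FILLERS = {"um", "uh", "er", "hmm"}


def tidy_transcript(text):
    """Remove filler words and immediate repetitions from a raw transcript."""
    kept = []
    for token in text.split():
        word = token.lower().strip(",.")
        if word in FILLERS:
            continue  # hesitation sound, carries no meaning
        if kept and kept[-1].lower().strip(",.") == word:
            continue  # same word spoken twice in a row
        kept.append(token)
    return " ".join(kept)
```

For instance, `tidy_transcript("um I I want to uh go go home")` returns `"I want to go home"`. A rule this simple would also remove intentional repetitions, which is why production systems rely on learned models rather than fixed rules.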

Step 3. Translation

Converting the text from the source language to the target language is the third step in the process. Applying the DNN model to convert text into another language is more accurate than traditional statistical machine translation methods.

Step 4. Output (Text to Speech)

The last step involves synthesizing the text-based translation into speech."
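The four steps above can be read as a simple pipeline in which each stage consumes the output of the previous one. The sketch below wires four interchangeable callables together; the class name `SpeechTranslationPipeline` and its field names are hypothetical, and the stages themselves are left to whatever models are available.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslationPipeline:
    """Chains the four stages; each stage is supplied by the caller."""

    recognize: Callable[[bytes], str]   # Step 1: speech to text
    normalize: Callable[[str], str]     # Step 2: transcript cleanup
    translate: Callable[[str], str]     # Step 3: source to target language
    synthesize: Callable[[str], bytes]  # Step 4: text to speech

    def run(self, audio: bytes) -> bytes:
        raw_text = self.recognize(audio)
        clean_text = self.normalize(raw_text)
        translated_text = self.translate(clean_text)
        return self.synthesize(translated_text)
```

Because each stage only sees the previous stage's output, an error in speech recognition propagates through every later step, which matches the observation that the quality of the initial input determines the quality of the eventual output.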

Reported Results

The platform supports 22 official languages and receives more than 40 million hits every year.

Technology

"Neural network architecture is based on a single Recurrent Neural Network (RNN) layer, using either Gated Recurrent Units (GRU) or Long Short-Term Memory (LSTM), at the bottom with a substantial number of fully-connected (FC) layers on top that allow CPU-based decoders, which do not require specialized hardware, to be built into mobile software platforms."

Function

R&D

Product Development

Background

"While English is the most common language on the internet, only about 12 percent of India’s population is familiar with English. India’s increasing digital literacy needs to be supported by a multi-lingual digital world."

Benefits

Data

Trained on a curated database of translated sentences.