
Benchmarking of Machine Translation Systems in the Government Sector

Ms. Pushpa M.

Project Manager (Information Security)
Department of Stamps and Registration, Government of Karnataka
Kandaya Bhavan, Bangalore 560009


Abstract

Artificial Intelligence (AI) systems are advancing and proliferating across the world, and AI is increasingly integrated into everyday environments, with a significantly amplified impact across sectors. AI applications include Machine Translation (MT), Automatic Speech Recognition (ASR), and Text-to-Speech (TTS). AI-powered machine translation has had a significant impact on accessibility across platforms, yet challenges remain around accuracy, cultural nuances, and the data used to train the algorithms; it makes translation more readily available without completely replacing the need for human expertise in complex situations. There is a strong demand for large amounts of training data, both structured and unstructured, and especially domain-specific training data.

As AI evolves, benchmarking AI models is essential: to understand the potential and constraints of the various models, we need to know how well they work. Various LLM-based models are used to improve machine translation accuracy; however, evaluation techniques are needed to compare one system against another. This paper examines benchmarking techniques such as BLEU, which compares the n-grams of the candidate translation with those of the reference translation and counts the number of matches to determine similarity, and WER (Word Error Rate), a metric that measures the number of word-level edits (insertions, deletions, and substitutions; the related TER metric also counts shifts) required to transform a machine-translated sentence into an exact match with a human reference translation, expressed as a percentage of the total words in the reference sentence.
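As an illustration of how these two metrics can be computed, below is a minimal Python sketch using the open-source sacrebleu and jiwer libraries; the reference and candidate sentences are hypothetical placeholders rather than examples from any actual government corpus.

# A minimal sketch of the BLEU and WER computations described above,
# using the open-source sacrebleu and jiwer libraries. The sentences
# below are hypothetical placeholders, not actual departmental data.

import sacrebleu          # pip install sacrebleu
from jiwer import wer     # pip install jiwer

# Hypothetical human reference and machine-translated candidate.
references = ["The registration fee must be paid at the sub-registrar office."]
candidates = ["The registration fees must be paid at sub-registrar office."]

# BLEU: modified n-gram precision against the reference, with a brevity penalty.
bleu = sacrebleu.corpus_bleu(candidates, [references])
print(f"BLEU : {bleu.score:.2f}")      # scored 0-100, higher is better

# WER: word-level edits (insertions, deletions, substitutions) divided by
# the number of words in the reference translation.
print(f"WER  : {wer(references[0], candidates[0]):.2%}")   # lower is better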

Models are trained on particular sets of data, so it is important to train them on domain-specific data; not all domains are covered by general-purpose models. It is also important to test a model on a specific domain and retrain it on that data when the translation is not as expected and the word error rate is high. Most models fall short when translating the terminology used in government departments, since each department has its own glossaries and differs in how context is referred to. Most models perform well on general translation, but on government administrative terminology they lack accuracy, adequacy, fluency, and consistency. Government organizations are using the available models for translation, so it is important to centre evaluation on administrative terminologies.
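To make terminology-centred evaluation concrete, the following is a minimal Python sketch of a glossary-coverage check; the glossary entries, placeholder target-language terms, source sentence, and MT output are all hypothetical, and in practice the approved equivalents would come from each department's own glossary.

# A minimal sketch of a glossary-coverage check for domain-specific
# evaluation. All entries are hypothetical placeholders; a real check
# would use the department's approved glossary of administrative terms.

glossary = {
    "stamp duty": "KN_TERM_STAMP_DUTY",            # placeholder for the approved Kannada term
    "sub-registrar": "KN_TERM_SUB_REGISTRAR",
    "encumbrance certificate": "KN_TERM_EC",
}

def glossary_coverage(source: str, translation: str, glossary: dict) -> float:
    """Fraction of glossary terms in the source whose prescribed
    target-language equivalent appears in the machine translation."""
    relevant = [term for term in glossary if term in source.lower()]
    if not relevant:
        return 1.0                     # no glossary terms to check
    hits = sum(1 for term in relevant if glossary[term] in translation)
    return hits / len(relevant)

# Hypothetical source sentence and MT output (placeholders stand in for Kannada text).
source = "Pay the stamp duty and collect the encumbrance certificate from the sub-registrar."
mt_output = "... KN_TERM_STAMP_DUTY ... KN_TERM_SUB_REGISTRAR ..."

print(f"Glossary coverage: {glossary_coverage(source, mt_output, glossary):.0%}")
# Low coverage flags the terminology gaps discussed above.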

The paper also focuses on the predominant Machine Translation systems used in the Government of Karnataka.