
Panel Discussion

Evaluation and Benchmarking of AI Applications in Indian Languages

Panel Discussion: Methods of Objective and Third-Party Benchmarking of Public Use AI Tools in Indian Languages

Panelists:

Theme Introduction:

A wide range of AI tools is now available for various types of linguistic work. Such tools are readily deployed in the production pipelines of companies, in business processes and government offices, and are also used extensively by the general public. Little thought, however, is given to the accuracy of these applications. The accuracy figures and evaluation scores in circulation are typically those claimed by the respective developers in the research papers they publish, and similar platforms often go public without any independent scrutiny. As these applications influence the promotion of the languages concerned, and at times adversely affect it, there is a need for their constant evaluation on a public platform.

The proposed panel discussion aims to identify common parameters against which such applications can be tested for a given language. The applications we currently seek to evaluate on a continuing basis are Machine Translation (MT) engines, Automatic Speech Recognition (ASR) systems, Text-to-Speech (TTS) systems, and Generative AI models. Evaluating MT and ASR is relatively straightforward, since several established measures already exist (illustrated in the sketch below); evaluating Generative AI requires greater discussion as to which skill sets should be tested for the languages covered.
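
To make the contrast concrete, the short sketch below shows how the kind of established MT and ASR measures referred to above (BLEU and chrF for translation, word error rate for speech recognition) can be computed with the open-source sacrebleu and jiwer libraries. The hypothesis and reference strings are invented placeholders, not outputs of any system under discussion.

    # Illustrative sketch: standard MT and ASR evaluation measures.
    # The example strings are placeholders, not real system outputs.
    import sacrebleu
    import jiwer

    # Machine Translation: compare system output against reference translations.
    hypotheses = ["the cat sat on the mat"]
    references = [["the cat is sitting on the mat"]]  # one list per reference set
    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    chrf = sacrebleu.corpus_chrf(hypotheses, references)
    print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")

    # Automatic Speech Recognition: word error rate of a transcript
    # against a human-verified reference.
    reference_transcript = "please book a train ticket to chennai"
    asr_output = "please book train ticket chennai"
    wer = jiwer.wer(reference_transcript, asr_output)
    print(f"WER: {wer:.2%}")

Comparable third-party scoring would, of course, require shared test sets in the Indian languages concerned rather than the toy strings used here.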

Generative AI LLM Skill Sets Include:

The panel discussion will focus on how to evaluate these skills across languages and domains. The end goal is to prepare the criteria to be implemented in a publicly available web-based evaluation tool at eval.ldcil.org.