
Panel Discussion

Evaluation and Benchmarking of AI Applications in Indian Languages

Panel Discussion: Methods of Objective and Third-Party Benchmarking of Public Use AI Tools in Indian Languages

Panelists:

Theme Introduction:

A wide range of AI tools is now available for various types of linguistic work. Such tools are readily deployed in the production pipelines of companies, in business processes and government offices, and are also used extensively by the general public. Little thought, however, is given to the accuracy of these applications. The accuracy figures and evaluation scores in circulation are typically those claimed by the respective developers in the research papers they publish, and similar platforms often go public without any independent scrutiny. As these applications influence the promotion of the languages concerned, and at times adversely affect it, there is a need for their constant evaluation on a public platform.

The proposed panel discussion aims to identify common parameters against which such applications can be tested for a given language. The applications we currently seek to evaluate on a continuing basis are Machine Translation (MT) engines, Automatic Speech Recognition (ASR) systems, Text-to-Speech (TTS) systems, and Generative AI models. Evaluating MT and ASR is relatively straightforward, since several established measures already exist (illustrated in the sketch below); evaluating Generative AI requires greater discussion as to which skill sets should be tested for the languages covered.
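
To make the contrast concrete, the short sketch below shows how the kind of established MT and ASR measures referred to above (BLEU and chrF for translation, word error rate for speech recognition) can be computed with the open-source sacrebleu and jiwer libraries. The hypothesis and reference strings are invented placeholders, not outputs of any system under discussion.

    # Illustrative sketch: standard MT and ASR evaluation measures.
    # The example strings are placeholders, not real system outputs.
    import sacrebleu
    import jiwer

    # Machine Translation: compare system output against reference translations.
    hypotheses = ["the cat sat on the mat"]
    references = [["the cat is sitting on the mat"]]  # one list per reference set
    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    chrf = sacrebleu.corpus_chrf(hypotheses, references)
    print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")

    # Automatic Speech Recognition: word error rate of a transcript
    # against a human-verified reference.
    reference_transcript = "please book a train ticket to chennai"
    asr_output = "please book train ticket chennai"
    wer = jiwer.wer(reference_transcript, asr_output)
    print(f"WER: {wer:.2%}")

Comparable third-party scoring would, of course, require shared test sets in the Indian languages concerned rather than the toy strings used here.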

Generative AI LLM Skill Sets Include:

The panel discussion will focus on how to evaluate these skills across languages and domains. The end goal is to prepare the criteria to be implemented in a publicly available web-based evaluation tool at eval.ldcil.org.