Monolingual Text Corpus Creation | Official Website of Linguistic Data Consortium for Indian Languages

Monolingual Text Corpus Creation

The Text Data Collection project focuses on gathering diverse text datasets to enrich natural language processing (NLP) and machine learning applications. It aims to collect a wide range of text data from various sources, covering different dialects, styles and contexts. The collected data will be used to improve language models and language-aided applications, to support academic research, and to advance other language-related technologies.

Team members

Kannada: Dr. Vijayalaxmi F Patil
Sanskrit: Chetan Baji
Kashmiri: Dr. Zargar Adil Ahmad, Bi Bi Mariyam
Marathi: Bhageshree K Khandale
Angika, Chhattisgarhi, Rajasthani: Ankita Tiwari
Telugu: Dr. Modugu Kasimbabu
Urdu (in Devanagari script): Dr. Mansoor Khan, Dr. Shahnawaz Alam, Bi Bi Mariyam