Released Datasets | Official Website of Linguistic Data Consortium for Indian Languages

Released Datasets of LDC-IL and their Prices

LDC-IL has so far released more than 58 datasets. The released datasets are listed below along with their prices for commercial users.

Sl. no.  Name of dataset                           Price
46       Bodo Raw Speech Corpus                    480896
47       Hindi Raw Speech Corpus                   328965
48       Kannada Raw Speech Corpus                 488146
49       Konkani Raw Speech Corpus                 425842
50       Maithili Raw Speech Corpus                214099
51       Marathi Raw Speech Corpus                 242736
52       Nepali Raw Speech Corpus                  236937
53       Punjabi Raw Speech Corpus                 274999
54       Telugu Raw Speech Corpus                  61806
55       Urdu Raw Speech Corpus                    269969
56       A Gold Standard Punjabi Raw Text Corpus   70907
57       Malayalam Raw Speech Corpus               445915
58       Manipuri Raw Speech Corpus                425435

These datasets are distributed for both commercial and non-commercial use.

Please note that for bona fide non-commercial and academic use, the datasets are free of charge. The requester must be a bona fide student, faculty member, or employee of a government-funded research institute, or be a government entity.

Additional discounts are available for startups, MSMEs, and entities from the SAARC countries. For more details about the discounts and the procedure to procure the datasets, please log in to the Data Distribution portal and see the FAQ page.