Central Institute of Indian Languages [CIIL] MISSION STATEMENT: To provide annotated, quality language data (both text and speech) and tools in Indian languages to individuals, institutions and industry for research and development, created in-house, through outsourcing and acquisition.
Size of Speech Corpora (as of July 2014)

SPEECH CORPORA (Raw Data)

| Sl. No. | Language | Duration (HH:MM:SS) |
|---|---|---|
| 1 | Assamese | 105:51:38 |
| 2 | Bengali | 138:18:47 |
| 3 | Bodo | 201:10:48 |
| 4 | Dogri | 111:32:11 |
| 5 | Gujarati | 156:23:04 |
| 6 | Hindi | 269:09:50 |
| 7 | Indian English Bengali | 34:12:57 |
| 8 | Indian English Gujarati (MP3 format) | 21:40:00 |
| 9 | Indian English Kannada | 37:01:33 |
| 10 | Kannada | 198:51:03 |
| 11 | Kashmiri | 44:59:07 |
| 12 | Konkani | 195:14:47 |
| 13 | Maithili | 95:59:54 |
| 14 | Malayalam | 265:24:18 |
| 15 | Manipuri | 187:35:13 |
| 16 | Marathi | 168:13:50 |
| 17 | Nepali | 145:04:46 |
| 18 | Oriya | 165:30:05 |
| 19 | Punjabi | 187:53:28 |
| 20 | Tamil | 213:37:27 |
| 21 | Telugu | 50:51:36 |
| 22 | Urdu | 124:19:58 |



SPEECH CORPORA (Segmented Data)

| Sl. No. | Language | Dialect | Hours | Minutes | Seconds | Speakers |
|---|---|---|---|---|---|---|
| 1 | Assamese | Upper Assam, Lower Assam | 80 | 8 | 4 | 306 |
| 2 | Bengali | SCB (Kolkata) & Barendri (North Bengal) | 125 | 19 | 53 | 697 |
| 3 | Bodo | Standard and Non-Standard | 198 | 10 | 48 | 416 |
| 4 | Dogri | Standard | 17 | 10 | 58 | 61 |
| 5 | English Kannada | Avadhi, Bhojpuri, Magahi and Standard | 20 | 04 | 18 | 52 |
| 6 | English Bengali | Standard and South Gujarati | 34 | 12 | 57 | 53 |
| 7 | Gujarati | Indian | 77 | 38 | 53 | 302 |
| 8 | Gujarati (Mono) | Indian | 16 | 52 | 24 | 62 |
| 9 | Hindi | Standard, Bhojpuri & Magahi | 174 | 31 | 50 | 586 |
| 10 | Kannada | North-East (Hyderabad Karnataka), North-West (Mumbai Karnataka) and Canara | 157 | 50 | 25 | 642 |
| 11 | Kashmiri | Standard | 29 | 26 | 13 | 149 |
| 12 | Konkani | Standard | 193 | 14 | 47 | 454 |
| 13 | Maithili | Standard | 88 | 00 | 43 | 301 |
| 14 | Malayalam | Standard | 103 | 16 | 12 | 307 |
| 15 | Manipuri | Standard and Kakching | 175 | 26 | 09 | 668 |
| 16 | Marathi | Standard | 89 | 18 | 43 | 306 |
| 17 | Nepali | Darjeeling and Assamese | 114 | 39 | 29 | 351 |
| 18 | Oriya | Standard | 165 | 30 | 05 | 474 |
| 19 | Punjabi | Standard | 187 | 53 | 28 | 468 |
| 20 | Tamil | Standard | 211 | 37 | 27 | 446 |
| 21 | Telugu | Standard | 13 | 03 | 43 | 56 |
| 22 | Urdu | Standard | 110 | 46 | 15 | 342 |



SPEECH CORPORA (Annotated Data)

| Sl. No. | Language | Annotated Duration (HH:MM:SS) |
|---|---|---|
| 1 | Assamese | 28:18:56 |
| 2 | Bengali | 39:12:31 |
| 3 | Bodo | 30:45:56 |
| 4 | Gujarati | 02:39:39 |
| 5 | Hindi | 80:01:48 |
| 6 | Kannada | 69:05:56 |
| 7 | Kashmiri | 06:28:25 |
| 8 | Konkani | 37:00:00 |
| 9 | Maithili | 30:10:40 |
| 10 | Malayalam | 92:40:43 |
| 11 | Manipuri | 109:48:27 |
| 12 | Nepali | 39:33:34 |
| 13 | Oriya | 62:33:15 |
| 14 | Punjabi | 47:07:13 |
| 15 | Tamil | 72:09:09 |
| 16 | Urdu | 36:33:55 |



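Pronunciation Dictionaries (Studio Recording)

| Sl. No. | Language | Hours | Minutes | Seconds |
|---|---|---|---|---|
| 1 | Assamese | 36 | 31 | 28 |
| 2 | Bengali | 21 | 55 | 46 |
| 3 | Bodo | 50 | 38 | 55 |
| 4 | Gujarati | 49 | 0 | 0 |
| 5 | Hindi | 45 | 51 | 32 |
| 6 | Kannada | 58 | 20 | 43 |
| 7 | Konkani | 32 | 29 | 53 |
| 8 | Malayalam | 33 | 3 | 5 |
| 9 | Manipuri | 49 | 41 | 18 |
| 10 | Nepali | 23 | 23 | 35 |
| 11 | Oriya | 40 | 0 | 33 |
| 12 | Punjabi | 33 | 30 | 6 |
| 13 | Tamil | 48 | 12 | 10 |
| 14 | Urdu | 34 | 2 | 39 |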


Developed & Maintained by:
LDC-IL, CIIL
Copyright © LDC-IL,
Central Institute of Indian Languages
Department of Higher Education
Ministry of Human Resource Development
Government of India
Manasagangothri, Hunsur Road, Mysore-570006, Karnataka, India.
Tel: (0821) 2515820 (Director)
Reception/PABX: (0821) 2345000
Fax: (0821) 2515032 (Off)