LDC-IL | Official Website of Linguistic Data Consortium for Indian Languages


Text Data Creation

A corpus is a large collection of language samples, whether text, speech, or visual signs, that accurately represents various aspects of a language. An electronic text corpus consists of language texts in digital format, selected according to external criteria to represent, as accurately as possible, a language or language variety. It serves as a valuable data source for linguistic research.

Corpora are essential resources in language technology. Computers enhance linguistic studies by enabling efficient searching, selecting, sorting, and formatting, thus reducing human bias and increasing the reliability of results. In Corpus Linguistics, corpora form the foundation for numerous research tasks. They are also crucial for various applications like grammar checkers and spell checkers used in word processors.

Indian languages present significant challenges for the developer community in Natural Language Processing and Artificial Intelligence. There has been a long-standing demand for extensive linguistic data to develop applications and products. However, it is crucial that this data is collected, organized, and stored in ways that meet the diverse needs of technology developers.

Text Data Collection Guideline

LDC-IL collected its text corpus from different sources, mainly books, magazines, and newspapers. A different sampling approach is followed when extracting text from each of these three sources.

Sampling Approach for Books

Books are selected so that different domains are represented. After a book is identified, the next step is to extract typically 10 pages of text from it. LDC-IL follows a sampling method to collect the pages from a book. For example, if the book has 200+ pages, every 20th page is collected.

Other general principles normally followed in the sampling tasks across languages are as follows:

  • Contents containing obnoxious or vulgar texts should be avoided.
  • Prefer books published after 1990.
  • Text extracts containing poems and formulae should be avoided.
  • Pages containing diagrams, tables or figures should be avoided.
  • Books containing fewer than 50 pages are not sampled.
  • If the text contains content in a language other than the intended one, that content should be avoided if it is longer than one sentence.
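
The page-sampling rule for books can be sketched as follows. This is only an illustration: the function name `sample_pages` and the generalisation of the step to "total pages divided by 10" are assumptions that extend the 200-page example above, and are not part of the LDC-IL guideline. Pages rejected under the other principles (diagrams, poems, other-language content) would still have to be filtered by hand.

```python
def sample_pages(total_pages, target=10, min_pages=50):
    """Return the page numbers to extract from a book.

    Assumption: the step is total_pages // target, which reproduces the
    "every 20th page of a 200-page book" example from the guideline.
    """
    if total_pages < min_pages:
        # Books with fewer than 50 pages are not part of the sampling.
        return []
    step = total_pages // target
    return list(range(step, total_pages + 1, step))[:target]


print(sample_pages(200))  # every 20th page: 20, 40, ..., 200
print(sample_pages(40))   # too short, so nothing is sampled
```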

Sampling Approach for Magazines

In the case of magazine texts, which are typically brief and span various domains, the entire magazine should be included in the corpus, excluding advertisements, image captions, tables, and similar non-text elements. A magazine corpus generally encompasses a variety of text types, such as cookery, health, cinema, stories, contemporary articles, and more.

Sampling Approach for Newspaper

The newspaper corpus consists of contemporary text. It may contain political news, editorials, sports news, and so on. The newspaper corpus has separate subcategories that cover all newspaper domains. Classifieds and very small news snippets were avoided.

Proof Reading

Once the text is in digital form, it is proofread to eliminate any typographical errors. For proofreading, the following steps should be taken:

  • Removing any poetic text or poetic structures that appear within the running text.
  • Eliminating incomplete sentences, particularly those at the end of paragraphs.
  • Ensuring the correct use of the visarga symbol and the colon ‘:’ symbol, verifying that each is used appropriately.
  • Removing unnecessary spaces near punctuation marks.
  • Ensuring that the digitised text is true to the hard copy.
  • Ensuring that the Title, Author, and Headline fields are written in Roman script using the LDC-IL transliteration scheme.

Data Encoding

The collected data should be encoded in a machine-readable format for further analysis. When storing the data, it is important to adhere to certain standards to ensure ease of storage and long-term retrieval. The LDC-IL text corpus uses Unicode encoding and is stored in XML format. Large-scale language resources depend on metadata to ensure authenticity, which makes metadata a compulsory component of any corpus.

LDC-IL XML Markup Standard: https://dev.ldcil.org/standardsTextXML.aspx
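
As a rough illustration of Unicode text stored in XML together with its metadata, the sketch below builds one small record with Python's standard library. The element names (`document`, `metadata`, `title`, `author`, `source`, `body`) and the sample values are hypothetical placeholders, not the LDC-IL markup; the actual tags are defined in the LDC-IL XML Markup Standard.

```python
import xml.etree.ElementTree as ET


def build_record(title, author, source, text):
    """Serialise one text sample and its metadata as UTF-8 encoded XML."""
    doc = ET.Element("document")                 # hypothetical element names
    meta = ET.SubElement(doc, "metadata")
    ET.SubElement(meta, "title").text = title    # Roman transliteration
    ET.SubElement(meta, "author").text = author  # Roman transliteration
    ET.SubElement(meta, "source").text = source  # book, magazine or newspaper
    ET.SubElement(doc, "body").text = text       # text in its original script
    return ET.tostring(doc, encoding="utf-8", xml_declaration=True)


xml_bytes = build_record("sample title", "sample author", "book", "മലയാളം")
parsed = ET.fromstring(xml_bytes)
print(parsed.find("body").text)  # the Unicode text survives the round trip
```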

Speech Data Creation

The LDC-IL speech corpus was carefully compiled to meet the diverse needs of the research and development community for numerous types of speech-based linguistic analysis. The availability of speech technology in Indian languages has been minimal because collecting speech data in Indian languages is highly challenging. LDC-IL started creating speech data to provide support for Indian languages in various speech-aided applications.

The data was collected for a wide range of tasks such as Automatic Speech Recognition (ASR), Speech-to-Text (STT) systems, linguistic analysis, and speech therapy. To achieve this, various types of content were recorded. The datasets were designed to cover all the phonemes and allophones of the language in all possible environments. To reflect real-world usage, continuous speech was recorded in natural environments. This careful planning and expert input have resulted in a versatile and comprehensive speech corpus that supports a broad range of linguistic and technological applications.

Speech Data Collection Guideline

Speech Data Collection
  • Data should be collected from native speakers.
  • Data should be collected from all possible dialects.
  • Collect metadata such as speaker ID, age, gender, language, dialect, recording environment, and date.
  • Obtain signed consent forms from each participant.
  • Data should not be recorded in excessively noisy environments.
  • Seat the participant comfortably to reduce stress and obtain natural speech.
  • Place the microphone at an optimal distance and angle to capture clear audio.
  • Record audio in WAV format.
  • Ensure that the recording equipment is functioning correctly and that data is stored securely.
  • Carry spare batteries, memory cards etc.
  • Regularly back up recordings to prevent data loss.
Organizing the data
  • After the field work is completed, the data has to be stored on a server as soon as possible for safekeeping.
  • Keep a backup of the saved data.
  • After the data is stored, it is segmented and mapped with its corresponding text and metadata.
Data Verification
  • Metadata needs to be verified.
  • The mapping between audio and text needs to be checked.
  • Data duplication needs to be checked.
  • Naming conventions need to be verified.
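
Some of the verification steps above lend themselves to simple automated checks. The sketch below is an assumption-laden illustration, not LDC-IL tooling: the metadata keys mirror the fields listed under Speech Data Collection, while the file-name pattern `NAME_PATTERN` is a hypothetical convention invented for the example.

```python
import hashlib
import re
from collections import defaultdict

# Metadata fields named in the collection guideline.
REQUIRED_FIELDS = ("speaker_id", "age", "gender", "language",
                   "dialect", "recording_environment", "date")

# Hypothetical naming convention: <language code>_S<speaker>_<utterance>.wav
NAME_PATTERN = re.compile(r"^[a-z]{3}_S\d{4}_\d{3}\.wav$")


def missing_metadata(record):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def find_duplicates(files):
    """Group file names whose audio bytes are identical (same SHA-256)."""
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def badly_named(names):
    """Return the file names that do not follow the assumed convention."""
    return [n for n in names if not NAME_PATTERN.match(n)]


record = {"speaker_id": "S0001", "age": 34, "gender": "F",
          "language": "Malayalam", "dialect": "", "date": "2024-01-05"}
print(missing_metadata(record))  # fields still to be filled in
```

Byte-level hashing only catches exact copies; re-recordings of the same prompt would need a different check.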

Speech Data Types

  • Contemporary Text (News)
  • Creative Text
  • Sentence
  • Date
  • Command and Control Words
  • Place Name
  • Person Name
  • Most Frequent Word-Part
  • Most Frequent Word-Full Set
  • Phonetically Balanced-Full Set
  • Form and Function Word-Full Set

Speech Annotation

A dataset containing language-specific information is called an annotated corpus. It can be used to train machine learning algorithms to capture the computational attributes of language structure. Annotation makes it easier for a system to identify patterns.

An annotated speech corpus provides a wide range of linguistic information, which is particularly useful for analyzing the phonetic aspects of a language. Speech data is annotated at various levels, including phone, phoneme, syllable, word, and sentence. Annotating data from the morphosyntactic to the pragmatic level requires careful attention to the structure of speech and its attributes. Annotated data can be used for speech recognition, speech synthesis, and many other language-related technologies.

LDC-IL Guideline for Speech Annotation-1.0

Guidelines for Phonetically Normalized Speech Annotation

The phonetically normalized speech annotation layer should be transcribed according to the speaker's pronunciation in the audio. Wrong pronunciation, that is, deviation from the text, should be transcribed as spoken.

Following are the instances which should be marked in the annotation.

  1. Use of ‘#’

    The ‘#’ symbol is used to mark excessively noisy or unnecessary parts of the audio. These portions generally also contain human speech, but it is not intelligible. The audio is post-processed to remove such portions from the audio files that are released publicly. ‘#’ should be used within the sentence.

  2. Use of ‘0’

    ‘0’ is used to mark silences at the beginning and end of the audio files, as well as long silences or non-speech sounds in the intermediate portions of the audio files. It is used only for portions that do not contain any kind of human speech. These portions are marked so that they can be removed in the post-processing step.

  3. Silences are removed

    Any silence longer than 50 ms should be marked.

  4. Cut-off speech and intended speech should be marked.

    Example: [mini]*ster shows that the speaker intended to say “minister” but spoke “mini” in an unclear fashion and “ster” clearly.

  5. Annotation of speech disfluency

    Restarts/false starts should be marked. For example, if the speaker intends to say “Keralam” but says “Ke Keralam”, this should be marked as Ke-Keralam.

  6. Numbers

    All number sequences should be spelled out. Years should be transcribed in spoken format.

  7. Mispronunciations

    If a speaker mispronounces a word and the mispronunciation is not an actual word, transcription should be done as the word is spoken.

  8. Utterances longer than 30 seconds should be further split into multiple parts. The split is made at the point of a long silence of around 500 ms.
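
The rule for splitting long utterances can be sketched as a small greedy procedure. Everything here is an assumption made for illustration: the function name `split_points`, the input format (a list of silence intervals in seconds, as might come from a silence detector), and the choice to cut in the middle of the latest qualifying silence. The guideline itself only states the 30-second limit and the silence of around 500 ms.

```python
def split_points(duration, silences, max_len=30.0, min_silence=0.5):
    """Choose cut times so that no part is longer than max_len seconds.

    duration  -- length of the utterance in seconds
    silences  -- (start, end) pairs of detected silences, in seconds
    Cuts are placed at the midpoint of silences of at least min_silence.
    """
    candidates = sorted((start + end) / 2
                        for start, end in silences
                        if end - start >= min_silence)
    cuts = []
    part_start = 0.0
    while duration - part_start > max_len:
        reachable = [c for c in candidates
                     if part_start < c <= part_start + max_len]
        if not reachable:
            raise ValueError("no long silence within the next 30 seconds")
        part_start = reachable[-1]  # latest cut that keeps the part short enough
        cuts.append(part_start)
    return cuts


silences = [(10.0, 10.25), (24.0, 25.0), (40.0, 40.5), (52.0, 53.0)]
print(split_points(70.0, silences))  # cut times in seconds
```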

Guidelines for Orthographically Normalized Speech Annotation

The orthographically normalized textual layer is prepared over the phonetically normalized text using the guidelines given below:

  1. If a small portion of a word, such as a grammatical element or a letter, is missed, it is corrected as per the correct writing pattern. For example, if the speaker says ‘avan viitti poyi’ ‘He went home’ in an informal way, it is annotated as ‘avan viittil poyi’, the proper standard Malayalam sentence. There is no valid word ‘viitti’ in the language, so it is annotated as a proper word according to the context.
  2. Any deviation in the phonetically annotated text is corrected according to the standard written form. For example, if the audio is ‘Maiyam Engineering’, it must be corrected to ‘Marine Engineering’.
  3. Restarts/false starts are removed. For example, the speaker intends to say ‘keralam’ but says ‘kekeralam’. If ‘ke’ is a valid word or morpheme, it is kept; otherwise ‘ke’ is not marked.
  4. Cut-off speech is written in its standard form, and the [ ]* symbols are removed.
  5. An incomplete sentence is standardized to the extent of the available audio.
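
Two of the steps above, removing the [ ]* cut-off symbols and dropping restarts, are mechanical enough to sketch in code. The sketch assumes Roman-transliterated input in the notation of the phonetic layer ([mini]*ster, Ke-Keralam); the function name and the `valid_morphemes` lexicon are hypothetical, and the remaining corrections (missing letters, standard spelling) still need a human annotator.

```python
import re

CUT_OFF = re.compile(r"\[([^\]]*)\]\*")  # [mini]*ster
RESTART = re.compile(r"\b(\w+)-(\w+)")   # Ke-Keralam


def orthographic_layer(phonetic_text, valid_morphemes=frozenset()):
    """Derive orthographic text from phonetically normalized text."""
    # Cut-off speech: keep the intended word, drop the [ ]* symbols.
    text = CUT_OFF.sub(r"\1", phonetic_text)

    def drop_restart(match):
        fragment, word = match.group(1), match.group(2)
        if not word.lower().startswith(fragment.lower()):
            return match.group(0)        # ordinary hyphenated word, keep it
        if fragment.lower() in valid_morphemes:
            return f"{fragment} {word}"  # a valid morpheme is kept
        return word                      # otherwise the restart is removed

    return RESTART.sub(drop_restart, text)


print(orthographic_layer("the [mini]*ster visited Ke-Keralam"))
```

Note that `\w` does not match the combining vowel signs of Indic scripts, so text in native script would need a different tokenisation.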

LDC-IL Guideline for Speech Annotation-2.0

  1. Use of ‘#’

    The ‘#’ symbol is used to mark excessively noisy or unnecessary parts of the audio. These portions generally also contain human speech, but it is not intelligible. The audio is post-processed to remove such portions from the audio files that are released publicly. ‘#’ should be used within the sentence.

  2. Use of ‘0’

    ‘0’ is used to mark silences at the beginning and end of the audio files, as well as long silences or non-speech sounds in the intermediate portions of the audio files. It is used only for portions that do not contain any kind of human speech. These portions are marked so that they can be removed in the post-processing step.

  3. Removal of Silence

    A period of relative quietness occurs in the audio files where the speaker stops to think or hesitates before saying a word. The expression “relative quietness” is used here because there is no actual silence in the speech signal, owing to line and environmental noise. Any silence longer than 50 ms is to be marked using ‘#’ or ‘0’, based on the context.

  4. Cut-off speech

    If a small portion of a word, such as a grammatical element or a letter, is missed, it should be corrected as per the correct writing pattern.

    For example, if the speaker says ‘avan viitti poyi’ ‘He went home’ in an informal way, it is annotated as ‘avan viittil poyi’, the proper standard Malayalam sentence. There is no valid word ‘viitti’ in the language, so it should be annotated as a proper word according to the context.

  5. Dis-fluency

    For any deviation in the utterance from the standard form of pronunciation, the transcription should be kept in the standard written form.

    For example, if the audio is “Maiyen Engineering”, it has to be corrected to “Marine Engineering”.

  6. Mispronunciations

    If a speaker mispronounces a word and the mispronunciation is an actual word in the language, the transcription is to be done as the word is spoken.

    For example, if the audio is ‘avaɭ’ ‘she’ instead of ‘avan’ ‘he’, transcribe according to the audio.

  7. Unnecessary repetition

    Eliminate unnecessary repetition of words by marking it as ‘0’ or ‘#’, as the case may be.

  8. Restart/False start

    Eliminate restarts and false starts by marking them as ‘0’ or ‘#’, as the case may be.

    For example, if a sentence such as ‘kaɽɳaːʈakajuʈe t̪alast̪ʰaːnamaːɳ beŋgaɭuːru’ ‘Bangalore is the capital of Karnataka’ contains restarts or false starts, it should be annotated in the following way. For example:

    In Audio: ma mandya is a district of gar karnataka

    Annotation: | 0 | mandya is the district of | # | karnataka

  9. Utterances should be no longer than 30 seconds. The annotator should find a silence and split the sentence appropriately or, if the utterance is too long, break the sentence at phrase endings by marking a boundary.
  10. An incomplete sentence should be annotated to the extent of the available audio only.
  11. Numbers: All number sequences are spelled out.
  12. No punctuation should be used in the annotations.
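
A small checker can illustrate how an annotation line in this style might be read and validated. It assumes that ‘|’ separates segments, as in the Mandya example above, and that ‘0’ and ‘#’ stand alone as markers; the function names are hypothetical and only the rules on numbers and punctuation are checked.

```python
import unicodedata

MARKERS = {"0", "#"}  # silence / non-speech, and noisy or unwanted portions


def speech_segments(annotation):
    """Split an annotation on '|' and drop the '0' and '#' marker segments."""
    parts = [part.strip() for part in annotation.split("|")]
    return [part for part in parts if part and part not in MARKERS]


def guideline_violations(annotation):
    """List digits and punctuation found in the spoken segments."""
    problems = []
    for segment in speech_segments(annotation):
        for token in segment.split():
            if token in MARKERS:
                continue
            if any(ch.isdigit() for ch in token):
                problems.append(f"number not spelled out: {token}")
            if any(unicodedata.category(ch).startswith("P") for ch in token):
                problems.append(f"punctuation used: {token}")
    return problems


example = "| 0 | mandya is the district of | # | karnataka"
print(speech_segments(example))
print(guideline_violations("call 100 now!"))
```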

Parts of Speech Tagging

Parts of speech (PoS) tagging is the process of marking a word in a text as corresponding to a particular part of speech, based on its definition and its occurrence in a given context. Its purpose is to support the creation of appropriate language technology. Since each PoS tag is attached to a single word, preprocessing steps such as splitting and tokenization are performed beforehand to clean up the raw, typeset corpus. This meets the need for standardization among Indian languages, which exhibit a very rich system of morphology in which words appear long, with complex morpho-phonemic and morpho-syntactic changes at the junctures.

PoS Tagging Guidelines

In order to develop TagSets for individual languages, LDC-IL has followed the linguistic procedures laid down below:

  • Defining the traditional parts of speech along with the examples
  • Understanding the concept of Form and Function (Pronouns, Demonstratives, Numerals, etc.)
  • Recognizing the fuzzy boundaries between the grammatical classes, i.e., a lexical item may function as one category in one context and as a different category in another (Gerunds vs. Infinitive/Participle, etc.).
  • Working out the syntactic relation between the modifier-modified (Adj-Noun; Participle-Noun).
  • Realizing the morpho-syntactic features a particular lexical item carries in a given syntactic configuration. (Person-Number-Gender/Case; Tense-Aspect-Mood/Mod).

LDC-IL POS Tag Set

The priority is to cover all the Scheduled Languages and then take up non-scheduled languages. LDC-IL planned to create PoS-tagged data for the 22 Scheduled Languages: Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu. PoS tag sets have been prepared for 15 of them, listed below:

  1. Assamese
  2. Bengali
  3. Bodo
  4. Dogri
  5. Gujarati
  6. Hindi
  7. Kannada
  8. Kashmiri
  9. Malayalam
  10. Manipuri
  11. Nepali
  12. Odia
  13. Punjabi
  14. Tamil
  15. Urdu

Chunking

Chunking is the process of annotating tagged tokens with structures in a non-hierarchical and non-recursive way. Segmentation and labeling are among the most common operations in language processing, and chunking is a typical segmentation process that aims to group the tagged tokens into meaningful structures. Chunkers generally do not try to analyze entire sentences, but only to build “chunks” of words. As a result, the rule system for chunking is relatively simple, robust, and efficient.
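
To make the idea of flat, non-recursive chunks concrete, here is a toy sketch that groups adjacent tagged tokens. The tag names (`DET`, `ADJ`, `N`, `AUX`, `V`), the chunk labels and the English example are hypothetical and chosen only for readability; they are not the LDC-IL tag set or chunk inventory, and a real chunker would need rules to keep two adjacent noun phrases apart.

```python
# Hypothetical tag-to-chunk mapping, for illustration only.
CHUNK_OF_TAG = {"DET": "NP", "ADJ": "NP", "N": "NP", "AUX": "VP", "V": "VP"}


def chunk(tagged_tokens):
    """Group adjacent tagged tokens into flat chunks (no chunk inside a chunk)."""
    chunks = []
    for word, tag in tagged_tokens:
        label = CHUNK_OF_TAG.get(tag, "O")  # "O" = outside any chunk
        if chunks and label != "O" and chunks[-1][0] == label:
            chunks[-1][1].append(word)      # extend the current chunk
        else:
            chunks.append((label, [word]))  # start a new chunk
    return chunks


sentence = [("the", "DET"), ("small", "ADJ"), ("boy", "N"),
            ("is", "AUX"), ("reading", "V"),
            ("a", "DET"), ("book", "N")]
for label, words in chunk(sentence):
    print(label, " ".join(words))
```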

Chunking Guidelines

The scheme has adopted a set of linguistic norms that should be followed by the Resource Persons working on chunking. The chunking of a linguistic expression is based purely on specific categorical labels, and hence the following linguistic guides are introduced for the ease of annotators.

  • Identifying different chunk levels along with the typical examples.
  • Keeping in mind that minimal non-recursive phrases (nominal or verbal) should be captured.
  • Understanding the idea that chunking operates on the minimal non-recursive phrases and within such minimal construction, there is no nested structure.
  • Making sure that nested non-recursive clusters are identified with their heads (Possessive Constructions, Spatial Relational Nouns, Nested Modifiers inside noun phrases).
  • Having knowledge and hands-on experience of linguistic phenomena that operate on the data of the language concerned, such as scrambling of lexical items, dislocated elements, and the spelling out of boundary elements realized as case markers or tense, mood, aspect, etc., between two expressions.

Parallel Corpora

A parallel corpus contains translations of the same document in two or more languages, aligned at least at the sentence level. Parallel corpora are generally data sets of translated sentence pairs. They can be used to train and test machine translation models. Parallel corpora play a vital role in translation studies and contrastive linguistics. Moreover, they significantly aid the exploration of inter-linguistic phenomena and serve as a valuable resource for language teaching.

LDC-IL is creating parallel corpora in 270 Indian languages. The data is being created through outsourcing all over India, including remote parts of the country, via an online platform called Trankit.

Guideline for Translator

  • The translation should sound natural in the target language.
  • Translate according to the cultural nuances of the target language and try to convey the intended meaning of the source language.
  • The style of the target text should match that of the source text.
  • Keep the terminology consistent.
  • Maintain readability.
  • Keep the sentence structure coherent so that the language flows naturally.
  • Find an equivalent in the target language for idiomatic expressions.
  • Follow the grammar and syntax rules of the target language.
  • Use the correct technical terminology in the target language, especially in scientific, medical, legal, mathematical, and technological domains.
  • Be careful about ethical issues.
  • When the source text is intentionally ambiguous, the translator must decide whether to maintain that ambiguity or clarify it in the translation. [If the same ambiguity can be reproduced in the target language, keep it; otherwise, explain it.]

Guidelines for the Reviewers

  • A sentence that contains an error should be corrected and commented on. The edits you make on the portal can be viewed by all concerned individuals (including the user, project manager and the reviewer). Reviewers should be careful not to penalize the translator without a substantial reason, and the reason for each edit must be specified in the accompanying comment.
  • Machine translation is not acceptable if it violates the target language structure and flow. However, translators should not be penalized just because a translation is similar to what any MT system gives.
  • The translation should sound natural, following the flow of the target language.
  • Translate according to the cultural nuances of the target language and try to convey the intended meaning of the source language.
  • The style of the target text should match that of the source text, wherever possible. Adaptation is not intended.
  • Maintain readability, i.e. the target text should sound natural in the language.
  • Keep the sentence structure coherent so that the language flows naturally.
  • Find an equivalent in the target language for idiomatic expressions, wherever possible.
  • Follow the grammar and syntax rules of the target language.
  • Use the correct technical terminology in the target language, especially in scientific, medical, legal, mathematical, and technological domains.
  • When the source text is intentionally ambiguous, the translator must decide whether to maintain that ambiguity or clarify it in the translation. [If the same ambiguity can be reproduced in the target language, keep it; otherwise, explain it.]
  • In case of any doubt, please use the "Raise Issue" button to mark the issue to the translator and the project manager.

Trankit Review Procedure for Parallel Corpus

As reviewers work through a task, they should note their review comments, where required, for each segment. This helps in automatically preparing a score card that can also be shared with the translator.

Preparing the Score Card as you review

The scorecard is the means of providing feedback primarily to the translator, but ultimately to any stakeholder. It provides insight into the quality of the translation, the errors that were found (if any) and a general assessment.

During the review, the reviewer is expected to record any errors found in the scorecard, including the location of the error (i.e. the segment or the task), source and target text, corrected target text, type of error and its severity.

The final error score is calculated from the number of errors, weighted by severity, relative to the volume of text reviewed.

How to generate the Review Report

You do not need to do anything separately to generate the Review Report. As you review the segments of a given task, you can add comments and give scores for each segment, wherever necessary.

If a segment has an error, you can edit the text, add a comment and press the score button to give the segment an error score. The E button opens a pop-up window where you can select the error type and its severity. You can also use this button to give kudos to the translator for an extraordinary translation of the given segment.

Do not forget to fill in both error type and severity for all errors. The exceptions are Kudos, Preferential Change and Repeat: these do not have severities, so the severity field is greyed out when one of these three error types is selected.

Error Definitions and Severities
  1. Naturalness

    When the translation is not fluent or well written in the eyes of a native speaker, but not necessarily because of wrong meaning, grammar, punctuation or spelling. Definition in the Oxford English Dictionary: The ease with which a text may be scanned or read; the quality in a book, etc., of being easy to understand and enjoyable to read.

    EXAMPLE:

    Literal, word-for-word translations that sound awkward and inauthentic (no effort to paraphrase).

    Unnatural sentence structure, word order or word combinations that native speakers would not have used. No effort to paraphrase and adjust to the norms of your language.

    Literal translations that make it difficult to identify the main topic.

  2. Compliance

    When the translator fails to follow requirements specified in the style guide and/or project instructions.

    EXAMPLE:

    A segment is left untranslated (missing translation) when it should not be, e.g. there is no instruction telling the translators to leave anything untranslated.

    Identical sentences are translated inconsistently when they should be consistent (perhaps because a big file was translated by several translators).

    Grammar and punctuation errors that are described in the style guide (if available).

  3. Grammar

    Grammatical errors, wrong word order, gender, singular/plural, or any other violation of language rules need to be checked.

    The translation does not adhere to the target language-specific rules with regard to grammar or syntax.

  4. Meaning

    The translation does not convey the meaning or nuance intended in the original source.

    Omissions or additions that improve the language flow or make the translation more natural should not be counted as meaning errors.

    EXAMPLE:

    "Error: Duplicate site or pages" translated as though it means "Error: please duplicate these sites or pages" when what is meant is "an error occurred where some sites were duplicated".

  5. Punctuation/Spelling

    Punctuation: General punctuation and capitalization errors, e.g. wrong use of commas, colons, dashes, etc. Spelling: Typos, use of outdated/uncommon spelling (or character), etc.

    EXAMPLE:

    Leaving the punctuation mark inside the quotation marks in [Click "OK," then exit.] when it should have been moved outside the quotation marks in your language.

    Using double quotation marks (") when you should use single quotation marks (') in your language.

  6. Terminology

    Terms do not follow the translation in the glossary/TD or the specification given in any other form of communication, including the style guide and feedback from previous projects.

  7. Preferential Change

    If there is a mistake in the source, or a reasonable misunderstanding of the source due to lack of information, do not treat it as the translator's fault even if the translation is inappropriate.

    When the translation needs to be changed but is not wrong in any way. This category has no weight in the final score.

    EXAMPLE:

    "Home" can be either the "homepage" or a "home" label for phone number. When there is no description or screenshot and the previous/next string is "office", translating that into "home" phone number would make sense, even though the right translation might be "homepage".

  8. Repeat

    The Repeat code is used to mark repetitive instances of an error that was already penalized. It has no weight in the final score.

    If the reviewer has already penalized the translator for an error once, the translator cannot be penalized again for the same error.

    EXAMPLE:

    The same grammatical, spelling, or standard punctuation error repeated throughout a project.

  9. Kudos

    The reviewer can assign Kudos to mark excellent translations, but Kudos does not influence the scoring formulas in any way. This is because particularly successful translations do not mitigate the quality impact of any concrete error.

Error category notes
  1. An error should be objectively classified into one of the error categories according to the guidelines above. If an error falls into more than one error category, it is categorized under the most severe one.
  2. Changing the writing style according to personal preference plays no role in a translation review, unless specifically required by a project (which is unlikely here). Only indisputable language errors, that is, errors that can be verified against a grammar/usage book, approved TMs and glossaries, language-specific style guides (if available and provided) and translation guidelines for the relevant language, are classified as errors and included in the error count. There are cases in which a change would greatly improve the intelligibility of the text although the text does not contain an objective language error. The reviewer uses the Compliance category in such cases.

Severity Levels
  • A CRITICAL ERROR is any error:

    That causes an application to crash (for example, when a string contains a placeholder that should not have been translated; check whether your project tasks contain placeholders).

    That modifies or misinterprets the source text in a way that disorients the end-user and would cause the reader to seek uncalled-for clarifications.

    That results in a potentially offensive statement (morally, politically, regarding religion or culture) or can cause embarrassment to the client/project owner.

    That is in a highly visible part of documentation or software (e.g. cover page, menu command, site web page, site opening page, etc.).

    For which the localizer/translator has repeatedly ignored previous reviewer feedback without any particular reason for not implementing it. The reviewer should be informed beforehand of any correction from previous reviews that was not implemented.

    For which the localizer/translator has failed to implement query answers regarding key information.

  • A MAJOR ERROR is any error:

    That appears in an important or visible location (header, TOC, chapter title, help topic title, web page section title, etc.) regardless of error category.

    That results in a significant change in meaning, so that the user is very likely to be misled (severe Accuracy or context-dependent Terminology errors).

    In grammar or syntax that is a gross violation of generally accepted language conventions.

    For which the localizer/translator has ignored previous language QA feedback without any particular reason for not implementing it. The reviewer/customer should be informed beforehand of any correction from previous QA that was not implemented.

    For which the localizer/translator has failed to implement query answers that do not influence key product features or information.

  • A MINOR ERROR is any error:

    That results in a slight change in the meaning.

    That would not confuse or mislead a user but could be noticed.

    In the Formal Category that does not result in misinterpretation of the source or in the wrong functionality of the product (e.g. page layout, spacing, tables and numbered/bulleted lists, fonts, font style, etc.).

    In the use of punctuation or capitalization that does not result in a loss of meaning.

    In style that does not result in misinterpretation of the source.

    In grammar or syntax that is a minor violation of generally accepted language conventions.

    In the Style Guide/Product Guidelines category that does not result in misinterpretation of the source or in the wrong functionality of the product.
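
The idea of a severity-weighted error score relative to volume can be sketched as below. The exact Trankit formula is not given here, so the severity weights, the use of word count as the measure of volume, and all names in the code are hypothetical. The only rule taken from the definitions above is that Kudos, Preferential Change and Repeat carry no weight.

```python
from dataclasses import dataclass
from typing import Optional

# Error types that carry no weight, as stated in the error definitions.
UNWEIGHTED_TYPES = {"Kudos", "Preferential Change", "Repeat"}

# Hypothetical weights; the actual values used on the platform are not known.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass
class ReviewError:
    segment: int
    error_type: str
    severity: Optional[str] = None  # empty for the unweighted types


def weighted_points(errors):
    """Sum the severity weights of all errors that count towards the score."""
    total = 0
    for err in errors:
        if err.error_type in UNWEIGHTED_TYPES:
            continue
        if err.severity not in SEVERITY_WEIGHTS:
            raise ValueError(f"segment {err.segment}: severity is required")
        total += SEVERITY_WEIGHTS[err.severity]
    return total


def error_score(errors, word_count):
    """Weighted error points per word of reviewed text (assumed volume unit)."""
    return weighted_points(errors) / word_count


errors = [
    ReviewError(3, "Grammar", "major"),
    ReviewError(7, "Meaning", "critical"),
    ReviewError(9, "Repeat"),
    ReviewError(12, "Kudos"),
    ReviewError(15, "Punctuation/Spelling", "minor"),
]
print(weighted_points(errors), error_score(errors, word_count=400))
```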

Giving Overall Feedback

Please provide your overall impression, point out repetitive issues, what the localisation team should focus on to improve the quality, what the good points are and what the main problem is.

To provide constructive and actionable feedback to the translator, please answer these questions:

  • How do you rate the overall linguistic experience?
  • What did the translator do well?
  • What are the most significant or recurrent errors found?
  • Can you guess the root cause of the most significant errors?
  • What is the impact of the errors for the end-user?
  • Would you recommend any special training?

Transliteration

Unicode has become the standard for encoding text from different languages and scripts. However, legacy systems may have different approach and it struggle data processing and communication. The Linguistic Data Consortium for Indian Languages (LDC-IL) has addressed this by developing a transliteration schema that converts Indian scripts into Roman script using the ASCII range.

By transliterating Indian scripts into ASCII characters, LDC-IL ensure that these systems can process, store, and transmit text data without encountering errors or data corruption. Transliteration allows text data in Indian scripts to be easily incorporated into ASCII-based Roman script to ensure the text documents can be exchanged without loss of information or integrity. This transliteration schema plays a crucial role in accessibility and usability for users who are not familiar with Indian scripts and maintaining the continuity and functionality of various digital systems.

The LDC-IL Transliterator Application converts nine Indic scripts to Roman and vice versa: Assamese/Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Odia, Tamil, and Telugu.
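Each of the nine scripts occupies its own Unicode block, so the script of a character can be read off its decimal code point. The sketch below is one plausible way to do that; the names `SCRIPT_BLOCKS` and `script_of` are hypothetical, and nothing here describes how the LDC-IL application itself detects scripts. The ranges are the standard Unicode block boundaries, written in decimal to match the scheme table.

```python
# Unicode block ranges (decimal, inclusive) for the nine supported scripts.
# SCRIPT_BLOCKS and script_of are hypothetical names for illustration.
SCRIPT_BLOCKS = {
    "Devanagari": (2304, 2431),
    "Assamese/Bengali": (2432, 2559),
    "Gurmukhi": (2560, 2687),
    "Gujarati": (2688, 2815),
    "Odia": (2816, 2943),
    "Tamil": (2944, 3071),
    "Telugu": (3072, 3199),
    "Kannada": (3200, 3327),
    "Malayalam": (3328, 3455),
}

def script_of(char):
    """Return the script name for a single character, or None if not covered."""
    code_point = ord(char)
    for name, (low, high) in SCRIPT_BLOCKS.items():
        if low <= code_point <= high:
            return name
    return None

print(script_of(chr(2325)))  # Devanagari
print(script_of(chr(2453)))  # Assamese/Bengali
print(script_of("a"))        # None
```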

LDC-IL Transliteration Scheme

Character | Unicode Value in Decimal | Roman Transliteration Schema | Character Description
2433 m' ASSAMESE/BENGALI SIGN CANDRABINDU
2434 M ASSAMESE/BENGALI SIGN ANUSVARA
2435 H ASSAMESE/BENGALI SIGN VISARGA
2437 a ASSAMESE/BENGALI LETTER A
2438 A ASSAMESE/BENGALI LETTER AA
2439 i ASSAMESE/BENGALI LETTER I
2440 I ASSAMESE/BENGALI LETTER II
2441 u ASSAMESE/BENGALI LETTER U
2442 U ASSAMESE/BENGALI LETTER UU
2443 x ASSAMESE/BENGALI LETTER VOCALIC R
2444 q ASSAMESE/BENGALI LETTER VOCALIC L
2447 E ASSAMESE/BENGALI LETTER E
2448 ai ASSAMESE/BENGALI LETTER AI
2451 O ASSAMESE/BENGALI LETTER O
2452 au ASSAMESE/BENGALI LETTER AU
2453 ka ASSAMESE/BENGALI LETTER KA
2454 kha ASSAMESE/BENGALI LETTER KHA
2455 ga ASSAMESE/BENGALI LETTER GA
2456 gha ASSAMESE/BENGALI LETTER GHA
2457 ng'a ASSAMESE/BENGALI LETTER NGA
2458 ca ASSAMESE/BENGALI LETTER CA
2459 cha ASSAMESE/BENGALI LETTER CHA
2460 ja ASSAMESE/BENGALI LETTER JA
2461 jha ASSAMESE/BENGALI LETTER JHA
2462 nj'a ASSAMESE/BENGALI LETTER NYA
2463 Ta ASSAMESE/BENGALI LETTER TTA
2464 Tha ASSAMESE/BENGALI LETTER TTHA
2465 Da ASSAMESE/BENGALI LETTER DDA
2466 Dha ASSAMESE/BENGALI LETTER DDHA
2467 Na ASSAMESE/BENGALI LETTER NNA
2468 ta ASSAMESE/BENGALI LETTER TA
2469 tha ASSAMESE/BENGALI LETTER THA
2470 da ASSAMESE/BENGALI LETTER DA
2471 dha ASSAMESE/BENGALI LETTER DHA
2472 na ASSAMESE/BENGALI LETTER NA
2474 pa ASSAMESE/BENGALI LETTER PA
2475 pha ASSAMESE/BENGALI LETTER PHA
2476 ba ASSAMESE/BENGALI LETTER BA
2477 bha ASSAMESE/BENGALI LETTER BHA
2478 ma ASSAMESE/BENGALI LETTER MA
2479 ya ASSAMESE/BENGALI LETTER YA
2480 ra BENGALI LETTER RA
2482 la ASSAMESE/BENGALI LETTER LA
2486 sha ASSAMESE/BENGALI LETTER SHA
2487 Sa ASSAMESE/BENGALI LETTER SSA
2488 sa ASSAMESE/BENGALI LETTER SA
2489 ha ASSAMESE/BENGALI LETTER HA
2492 ' ASSAMESE/BENGALI SIGN NUKTA
2493 ASSAMESE/BENGALI SIGN AVAGRAHA
2494 A ASSAMESE/BENGALI VOWEL SIGN AA
ি 2495 i ASSAMESE/BENGALI VOWEL SIGN I
2496 I ASSAMESE/BENGALI VOWEL SIGN II
2497 u ASSAMESE/BENGALI VOWEL SIGN U
2498 U ASSAMESE/BENGALI VOWEL SIGN UU
2499 x ASSAMESE/BENGALI VOWEL SIGN VOCALIC R
2500 X ASSAMESE/BENGALI VOWEL SIGN VOCALIC RR
2503 E ASSAMESE/BENGALI VOWEL SIGN E
2504 ai ASSAMESE/BENGALI VOWEL SIGN AI
2507 O ASSAMESE/BENGALI VOWEL SIGN O
2508 au ASSAMESE/BENGALI VOWEL SIGN AU
2510 t ASSAMESE/BENGALI LETTER KHANDA TA
2524 D'a ASSAMESE/BENGALI LETTER RRA
2525 Dh'a ASSAMESE/BENGALI LETTER RHA
2527 Ya ASSAMESE/BENGALI LETTER YYA
2528 X ASSAMESE/BENGALI LETTER VOCALIC RR
2529 Q ASSAMESE/BENGALI LETTER VOCALIC LL
2530 q ASSAMESE/BENGALI VOWEL SIGN VOCALIC L
2531 Q ASSAMESE/BENGALI VOWEL SIGN VOCALIC LL
0 2534 0 ASSAMESE/BENGALI DIGIT ZERO
1 2535 1 ASSAMESE/BENGALI DIGIT ONE
2 2536 2 ASSAMESE/BENGALI DIGIT TWO
3 2537 3 ASSAMESE/BENGALI DIGIT THREE
4 2538 4 ASSAMESE/BENGALI DIGIT FOUR
5 2539 5 ASSAMESE/BENGALI DIGIT FIVE
6 2540 6 ASSAMESE/BENGALI DIGIT SIX
7 2541 7 ASSAMESE/BENGALI DIGIT SEVEN
8 2542 8 ASSAMESE/BENGALI DIGIT EIGHT
9 2543 9 ASSAMESE/BENGALI DIGIT NINE
2544 ra ASSAMESE LETTER RA
2545 wa ASSAMESE LETTER WA
2305 m' DEVANAGARI SIGN CANDRABINDU
2306 M DEVANAGARI SIGN ANUSVARA
2307 H DEVANAGARI SIGN VISARGA
2309 a DEVANAGARI LETTER A
2310 A DEVANAGARI LETTER AA
2311 i DEVANAGARI LETTER I
2312 I DEVANAGARI LETTER II
2313 u DEVANAGARI LETTER U
2314 U DEVANAGARI LETTER UU
2315 x DEVANAGARI LETTER VOCALIC R
2316 q DEVANAGARI LETTER VOCALIC L
2319 E DEVANAGARI LETTER E
2320 ai DEVANAGARI LETTER AI
2321 ao DEVANAGARI LETTER CANDRA O
2323 O DEVANAGARI LETTER O
2324 au DEVANAGARI LETTER AU
2325 ka DEVANAGARI LETTER KA
2326 kha DEVANAGARI LETTER KHA
2327 ga DEVANAGARI LETTER GA
2328 gha DEVANAGARI LETTER GHA
2329 ng'a DEVANAGARI LETTER NGA
2330 ca DEVANAGARI LETTER CA
2331 cha DEVANAGARI LETTER CHA
2332 ja DEVANAGARI LETTER JA
2333 jha DEVANAGARI LETTER JHA
2334 nj'a DEVANAGARI LETTER NYA
2335 Ta DEVANAGARI LETTER TTA
2336 Tha DEVANAGARI LETTER TTHA
2337 Da DEVANAGARI LETTER DDA
2338 Dha DEVANAGARI LETTER DDHA
2339 Na DEVANAGARI LETTER NNA
2340 ta DEVANAGARI LETTER TA
2341 tha DEVANAGARI LETTER THA
2342 da DEVANAGARI LETTER DA
2343 dha DEVANAGARI LETTER DHA
2344 na DEVANAGARI LETTER NA
2346 pa DEVANAGARI LETTER PA
2347 pha DEVANAGARI LETTER PHA
2348 ba DEVANAGARI LETTER BA
2349 bha DEVANAGARI LETTER BHA
2350 ma DEVANAGARI LETTER MA
2351 ya DEVANAGARI LETTER YA
2352 ra DEVANAGARI LETTER RA
2353 Ra DEVANAGARI LETTER RRA
2354 la DEVANAGARI LETTER LA
2355 La DEVANAGARI LETTER LLA
2356 Za DEVANAGARI LETTER LLLA
2357 va DEVANAGARI LETTER VA
2358 sha DEVANAGARI LETTER SHA
2359 Sa DEVANAGARI LETTER SSA
2360 sa DEVANAGARI LETTER SA
2361 ha DEVANAGARI LETTER HA
2364 ' DEVANAGARI SIGN NUKTA
2366 A DEVANAGARI VOWEL SIGN AA
ि 2367 i DEVANAGARI VOWEL SIGN I
2368 I DEVANAGARI VOWEL SIGN II
2369 u DEVANAGARI VOWEL SIGN U
2370 U DEVANAGARI VOWEL SIGN UU
2371 x DEVANAGARI VOWEL SIGN VOCALIC R
2372 X DEVANAGARI VOWEL SIGN VOCALIC RR
2375 E DEVANAGARI VOWEL SIGN E
2376 ai DEVANAGARI VOWEL SIGN AI
2377 ao DEVANAGARI VOWEL SIGN CANDRA O
2379 O DEVANAGARI VOWEL SIGN O
2380 au DEVANAGARI VOWEL SIGN AU
2384 @M DEVANAGARI OM
2392 k'a DEVANAGARI LETTER QA
2393 kh'a DEVANAGARI LETTER KHHA
2394 g'a DEVANAGARI LETTER GHHA
2395 j'a DEVANAGARI LETTER ZA
2396 D'a DEVANAGARI LETTER DDDHA
2397 Dh'a DEVANAGARI LETTER RHA
2398 ph'a DEVANAGARI LETTER FA
2399 Ya DEVANAGARI LETTER YYA
2400 X DEVANAGARI LETTER VOCALIC RR
2401 Q DEVANAGARI LETTER VOCALIC LL
2402 q DEVANAGARI VOWEL SIGN VOCALIC L
2403 Q DEVANAGARI VOWEL SIGN VOCALIC LL
2404 . DEVANAGARI DANDA
2405 .. DEVANAGARI DOUBLE DANDA
0 2406 0 DEVANAGARI DIGIT ZERO
1 2407 1 DEVANAGARI DIGIT ONE
2 2408 2 DEVANAGARI DIGIT TWO
3 2409 3 DEVANAGARI DIGIT THREE
4 2410 4 DEVANAGARI DIGIT FOUR
5 2411 5 DEVANAGARI DIGIT FIVE
6 2412 6 DEVANAGARI DIGIT SIX
7 2413 7 DEVANAGARI DIGIT SEVEN
8 2414 8 DEVANAGARI DIGIT EIGHT
9 2415 9 DEVANAGARI DIGIT NINE
2689 m' GUJARATI SIGN CANDRABINDU
2690 M GUJARATI SIGN ANUSVARA
2691 H GUJARATI SIGN VISARGA
2693 a GUJARATI LETTER A
2694 A GUJARATI LETTER AA
2695 i GUJARATI LETTER I
2696 I GUJARATI LETTER II
2697 u GUJARATI LETTER U
2698 U GUJARATI LETTER UU
2699 x GUJARATI LETTER VOCALIC R
2700 q GUJARATI LETTER VOCALIC L
2701 ae GUJARATI VOWEL CANDRA E
2703 E GUJARATI LETTER E
2704 ai GUJARATI LETTER AI
2705 ao GUJARATI VOWEL CANDRA O
2707 O GUJARATI LETTER O
2708 au GUJARATI LETTER AU
2709 ka GUJARATI LETTER KA
2710 kha GUJARATI LETTER KHA
2711 ga GUJARATI LETTER GA
2712 gha GUJARATI LETTER GHA
2713 ng'a GUJARATI LETTER NGA
2714 ca GUJARATI LETTER CA
2715 cha GUJARATI LETTER CHA
2716 ja GUJARATI LETTER JA
2717 jha GUJARATI LETTER JHA
2718 nj'a GUJARATI LETTER NYA
2719 Ta GUJARATI LETTER TTA
2720 Tha GUJARATI LETTER TTHA
2721 Da GUJARATI LETTER DDA
2722 Dha GUJARATI LETTER DDHA
2723 Na GUJARATI LETTER NNA
2724 ta GUJARATI LETTER TA
2725 tha GUJARATI LETTER THA
2726 da GUJARATI LETTER DA
2727 dha GUJARATI LETTER DHA
2728 na GUJARATI LETTER NA
2730 pa GUJARATI LETTER PA
2731 pha GUJARATI LETTER PHA
2732 ba GUJARATI LETTER BA
2733 bha GUJARATI LETTER BHA
2734 ma GUJARATI LETTER MA
2735 ya GUJARATI LETTER YA
2736 ra GUJARATI LETTER RA
2738 la GUJARATI LETTER LA
2739 La GUJARATI LETTER LLA
2741 va GUJARATI LETTER VA
2742 sha GUJARATI LETTER SHA
2743 Sa GUJARATI LETTER SSA
2744 sa GUJARATI LETTER SA
2745 ha GUJARATI LETTER HA
2748 ' GUJARATI SIGN NUKTA
2750 A GUJARATI VOWEL SIGN AA
િ 2751 i GUJARATI VOWEL SIGN I
2752 I GUJARATI VOWEL SIGN II
2753 u GUJARATI VOWEL SIGN U
2754 U GUJARATI VOWEL SIGN UU
2755 x GUJARATI VOWEL SIGN VOCALIC R
2756 X GUJARATI VOWEL SIGN VOCALIC RR
2757 ae GUJARATI VOWEL SIGN CANDRA E
2759 E GUJARATI VOWEL SIGN E
2760 ai GUJARATI VOWEL SIGN AI
2761 ao GUJARATI VOWEL SIGN CANDRA O
2763 O GUJARATI VOWEL SIGN O
2764 au GUJARATI VOWEL SIGN AU
2784 X GUJARATI LETTER VOCALIC RR
2785 Q GUJARATI LETTER VOCALIC LL
2786 q GUJARATI VOWEL SIGN VOCALIC L
2787 Q GUJARATI VOWEL SIGN VOCALIC LL
0 2790 0 GUJARATI DIGIT ZERO
1 2791 1 GUJARATI DIGIT ONE
2 2792 2 GUJARATI DIGIT TWO
3 2793 3 GUJARATI DIGIT THREE
4 2794 4 GUJARATI DIGIT FOUR
5 2795 5 GUJARATI DIGIT FIVE
6 2796 6 GUJARATI DIGIT SIX
7 2797 7 GUJARATI DIGIT SEVEN
8 2798 8 GUJARATI DIGIT EIGHT
9 2799 9 GUJARATI DIGIT NINE
2561 M' GURMUKHI SIGN ADAK BINDI
2562 M GURMUKHI SIGN BINDI
2563 H GURMUKHI SIGN VISARGA
2565 a GURMUKHI LETTER A
2566 A GURMUKHI LETTER AA
2567 i GURMUKHI LETTER I
2568 I GURMUKHI LETTER II
2569 u GURMUKHI LETTER U
2570 U GURMUKHI LETTER UU
2575 E GURMUKHI LETTER EE
2576 ai GURMUKHI LETTER AI
2579 O GURMUKHI LETTER OO
2580 au GURMUKHI LETTER AU
2581 ka GURMUKHI LETTER KA
2582 kha GURMUKHI LETTER KHA
2583 ga GURMUKHI LETTER GA
2584 gha GURMUKHI LETTER GHA
2585 ng'a GURMUKHI LETTER NGA
2586 ca GURMUKHI LETTER CA
2587 cha GURMUKHI LETTER CHA
2588 ja GURMUKHI LETTER JA
2589 jha GURMUKHI LETTER JHA
2590 nj'a GURMUKHI LETTER NYA
2591 Ta GURMUKHI LETTER TTA
2592 Tha GURMUKHI LETTER TTHA
2593 Da GURMUKHI LETTER DDA
2594 Dha GURMUKHI LETTER DDHA
2595 Na GURMUKHI LETTER NNA
2596 ta GURMUKHI LETTER TA
2597 tha GURMUKHI LETTER THA
2598 da GURMUKHI LETTER DA
2599 dha GURMUKHI LETTER DHA
2600 na GURMUKHI LETTER NA
2602 pa GURMUKHI LETTER PA
2603 pha GURMUKHI LETTER PHA
2604 ba GURMUKHI LETTER BA
2605 bha GURMUKHI LETTER BHA
2606 ma GURMUKHI LETTER MA
2607 ya GURMUKHI LETTER YA
2608 ra GURMUKHI LETTER RA
2610 la GURMUKHI LETTER LA
2611 La GURMUKHI LETTER LLA
2613 va GURMUKHI LETTER VA
2614 sha GURMUKHI LETTER SHA
2616 sa GURMUKHI LETTER SA
2617 ha GURMUKHI LETTER HA
2620 ' GURMUKHI SIGN NUKTA
2622 A GURMUKHI VOWEL SIGN AA
ਿ 2623 i GURMUKHI VOWEL SIGN I
2624 I GURMUKHI VOWEL SIGN II
2625 u GURMUKHI VOWEL SIGN U
2626 U GURMUKHI VOWEL SIGN UU
2631 E GURMUKHI VOWEL SIGN EE
2632 ai GURMUKHI VOWEL SIGN AI
2635 O GURMUKHI VOWEL SIGN OO
2636 au GURMUKHI VOWEL SIGN AU
2649 Kh'a GURMUKHI LETTER KHHA
2650 g'a GURMUKHI LETTER GHHA
2651 j'a GURMUKHI LETTER ZA
2652 Ra GURMUKHI LETTER RRA
2654 ph'a GURMUKHI LETTER FA
0 2662 0 GURMUKHI DIGIT ZERO
1 2663 1 GURMUKHI DIGIT ONE
2 2664 2 GURMUKHI DIGIT TWO
3 2665 3 GURMUKHI DIGIT THREE
4 2666 4 GURMUKHI DIGIT FOUR
5 2667 5 GURMUKHI DIGIT FIVE
6 2668 6 GURMUKHI DIGIT SIX
7 2669 7 GURMUKHI DIGIT SEVEN
8 2670 8 GURMUKHI DIGIT EIGHT
9 2671 9 GURMUKHI DIGIT NINE
2672 m' GURMUKHI TIPPI
3201 m' KANNADA SIGN CANDRABINDU
3202 M KANNADA SIGN ANUSVARA
3203 H KANNADA SIGN VISARGA
3205 a KANNADA LETTER A
3206 A KANNADA LETTER AA
3207 i KANNADA LETTER I
3208 I KANNADA LETTER II
3209 u KANNADA LETTER U
3210 U KANNADA LETTER UU
3211 x KANNADA LETTER VOCALIC R
3212 q KANNADA LETTER VOCALIC L
3214 e KANNADA LETTER E
3215 E KANNADA LETTER EE
3216 ai KANNADA LETTER AI
3218 o KANNADA LETTER O
3219 O KANNADA LETTER OO
3220 au KANNADA LETTER AU
3221 ka KANNADA LETTER KA
3222 kha KANNADA LETTER KHA
3223 ga KANNADA LETTER GA
3224 gha KANNADA LETTER GHA
3225 ng'a KANNADA LETTER NGA
3226 ca KANNADA LETTER CA
3227 cha KANNADA LETTER CHA
3228 ja KANNADA LETTER JA
3229 jha KANNADA LETTER JHA
3230 nj'a KANNADA LETTER NYA
3231 Ta KANNADA LETTER TTA
3232 Tha KANNADA LETTER TTHA
3233 Da KANNADA LETTER DDA
3234 Dha KANNADA LETTER DDHA
3235 Na KANNADA LETTER NNA
3236 ta KANNADA LETTER TA
3237 tha KANNADA LETTER THA
3238 da KANNADA LETTER DA
3239 dha KANNADA LETTER DHA
3240 na KANNADA LETTER NA
3242 pa KANNADA LETTER PA
3243 pha KANNADA LETTER PHA
3244 ba KANNADA LETTER BA
3245 bha KANNADA LETTER BHA
3246 ma KANNADA LETTER MA
3247 ya KANNADA LETTER YA
3248 ra KANNADA LETTER RA
3249 Ra KANNADA LETTER RRA
3250 la KANNADA LETTER LA
3251 La KANNADA LETTER LLA
3253 va KANNADA LETTER VA
3254 sha KANNADA LETTER SHA
3255 Sa KANNADA LETTER SSA
3256 sa KANNADA LETTER SA
3257 ha KANNADA LETTER HA
3260 ' KANNADA SIGN NUKTA
3262 A KANNADA VOWEL SIGN AA
ಿ 3263 i KANNADA VOWEL SIGN I
3264 I KANNADA VOWEL SIGN II
3265 u KANNADA VOWEL SIGN U
3266 U KANNADA VOWEL SIGN UU
3267 x KANNADA VOWEL SIGN VOCALIC R
3268 X KANNADA VOWEL SIGN VOCALIC RR
3270 e KANNADA VOWEL SIGN E
3271 E KANNADA VOWEL SIGN EE
3272 ai KANNADA VOWEL SIGN AI
3274 o KANNADA VOWEL SIGN O
3275 O KANNADA VOWEL SIGN OO
3276 au KANNADA VOWEL SIGN AU
3294 Za KANNADA LETTER FA
3296 X KANNADA LETTER VOCALIC RR
3297 Q KANNADA LETTER VOCALIC LL
3298 q KANNADA VOWEL SIGN VOCALIC L
3299 Q KANNADA VOWEL SIGN VOCALIC LL
0 3302 0 KANNADA DIGIT ZERO
1 3303 1 KANNADA DIGIT ONE
2 3304 2 KANNADA DIGIT TWO
3 3305 3 KANNADA DIGIT THREE
4 3306 4 KANNADA DIGIT FOUR
5 3307 5 KANNADA DIGIT FIVE
6 3308 6 KANNADA DIGIT SIX
7 3309 7 KANNADA DIGIT SEVEN
8 3310 8 KANNADA DIGIT EIGHT
9 3311 9 KANNADA DIGIT NINE
3330 M MALAYALAM SIGN ANUSVARA
3331 H MALAYALAM SIGN VISARGA
3333 a MALAYALAM LETTER A
3334 A MALAYALAM LETTER AA
3335 i MALAYALAM LETTER I
3336 I MALAYALAM LETTER II
3337 u MALAYALAM LETTER U
3338 U MALAYALAM LETTER UU
3339 x MALAYALAM LETTER VOCALIC R
3340 q MALAYALAM LETTER VOCALIC L
3342 e MALAYALAM LETTER E
3343 E MALAYALAM LETTER EE
3344 ai MALAYALAM LETTER AI
3346 o MALAYALAM LETTER O
3347 O MALAYALAM LETTER OO
3348 au MALAYALAM LETTER AU
3349 ka MALAYALAM LETTER KA
3350 kha MALAYALAM LETTER KHA
3351 ga MALAYALAM LETTER GA
3352 gha MALAYALAM LETTER GHA
3353 ng'a MALAYALAM LETTER NGA
3354 ca MALAYALAM LETTER CA
3355 cha MALAYALAM LETTER CHA
3356 ja MALAYALAM LETTER JA
3357 jha MALAYALAM LETTER JHA
3358 nj'a MALAYALAM LETTER NYA
3359 Ta MALAYALAM LETTER TTA
3360 Tha MALAYALAM LETTER TTHA
3361 Da MALAYALAM LETTER DDA
3362 Dha MALAYALAM LETTER DDHA
3363 Na MALAYALAM LETTER NNA
3364 ta MALAYALAM LETTER TA
3365 tha MALAYALAM LETTER THA
3366 da MALAYALAM LETTER DA
3367 dha MALAYALAM LETTER DHA
3368 na MALAYALAM LETTER NA
3370 pa MALAYALAM LETTER PA
3371 pha MALAYALAM LETTER PHA
3372 ba MALAYALAM LETTER BA
3373 bha MALAYALAM LETTER BHA
3374 ma MALAYALAM LETTER MA
3375 ya MALAYALAM LETTER YA
3376 ra MALAYALAM LETTER RA
3377 Ra MALAYALAM LETTER RRA
3378 la MALAYALAM LETTER LA
3379 La MALAYALAM LETTER LLA
3380 Za MALAYALAM LETTER LLLA
3381 va MALAYALAM LETTER VA
3382 sha MALAYALAM LETTER SHA
3383 Sa MALAYALAM LETTER SSA
3384 sa MALAYALAM LETTER SA
3385 ha MALAYALAM LETTER HA
3390 A MALAYALAM VOWEL SIGN AA
ി 3391 i MALAYALAM VOWEL SIGN I
3392 I MALAYALAM VOWEL SIGN II
3393 u MALAYALAM VOWEL SIGN U
3394 U MALAYALAM VOWEL SIGN UU
3395 x MALAYALAM VOWEL SIGN VOCALIC R
3396 X MALAYALAM VOWEL SIGN VOCALIC RR
3398 e MALAYALAM VOWEL SIGN E
3399 E MALAYALAM VOWEL SIGN EE
3400 ai MALAYALAM VOWEL SIGN AI
3402 o MALAYALAM VOWEL SIGN O
3403 O MALAYALAM VOWEL SIGN OO
3415 au MALAYALAM AU LENGTH MARK
3424 X MALAYALAM LETTER VOCALIC RR
3425 Q MALAYALAM LETTER VOCALIC LL
3426 q MALAYALAM VOWEL SIGN VOCALIC L
3427 Q MALAYALAM VOWEL SIGN VOCALIC LL
0 3430 0 MALAYALAM DIGIT ZERO
1 3431 1 MALAYALAM DIGIT ONE
2 3432 2 MALAYALAM DIGIT TWO
3 3433 3 MALAYALAM DIGIT THREE
4 3434 4 MALAYALAM DIGIT FOUR
5 3435 5 MALAYALAM DIGIT FIVE
6 3436 6 MALAYALAM DIGIT SIX
7 3437 7 MALAYALAM DIGIT SEVEN
8 3438 8 MALAYALAM DIGIT EIGHT
9 3439 9 MALAYALAM DIGIT NINE
3450 N' MALAYALAM LETTER CHILLU NN
3451 n' MALAYALAM LETTER CHILLU N
3452 Ra' MALAYALAM LETTER CHILLU RR
3453 la' MALAYALAM LETTER CHILLU L
3454 La' MALAYALAM LETTER CHILLU LL
ൿ 3455 k' MALAYALAM LETTER CHILLU K
2817 m' ODIA SIGN CANDRABINDU
2818 M ODIA SIGN ANUSVARA
2819 H ODIA SIGN VISARGA
2821 a ODIA LETTER A
2822 A ODIA LETTER AA
2823 i ODIA LETTER I
2824 I ODIA LETTER II
2825 u ODIA LETTER U
2826 U ODIA LETTER UU
2827 x ODIA LETTER VOCALIC R
2828 q ODIA LETTER VOCALIC L
2831 E ODIA LETTER E
2832 ai ODIA LETTER AI
2835 O ODIA LETTER O
2836 au ODIA LETTER AU
2837 ka ODIA LETTER KA
2838 kha ODIA LETTER KHA
2839 ga ODIA LETTER GA
2840 gha ODIA LETTER GHA
2841 ng'a ODIA LETTER NGA
2842 ca ODIA LETTER CA
2843 cha ODIA LETTER CHA
2844 ja ODIA LETTER JA
2845 jha ODIA LETTER JHA
2846 nj'a ODIA LETTER NYA
2847 Ta ODIA LETTER TTA
2848 Tha ODIA LETTER TTHA
2849 Da ODIA LETTER DDA
2850 Dha ODIA LETTER DDHA
2851 Na ODIA LETTER NNA
2852 ta ODIA LETTER TA
2853 tha ODIA LETTER THA
2854 da ODIA LETTER DA
2855 dha ODIA LETTER DHA
2856 na ODIA LETTER NA
2858 pa ODIA LETTER PA
2859 pha ODIA LETTER PHA
2860 ba ODIA LETTER BA
2861 bha ODIA LETTER BHA
2862 ma ODIA LETTER MA
2863 ya ODIA LETTER YA
2864 ra ODIA LETTER RA
2866 la ODIA LETTER LA
2867 La ODIA LETTER LLA
2869 ODIA LETTER VA
2870 sha ODIA LETTER SHA
2871 Sa ODIA LETTER SSA
2872 sa ODIA LETTER SA
2873 ha ODIA LETTER HA
2876 ' ODIA SIGN NUKTA
2878 A ODIA VOWEL SIGN AA
ି 2879 i ODIA VOWEL SIGN I
2880 I ODIA VOWEL SIGN II
2881 u ODIA VOWEL SIGN U
2882 U ODIA VOWEL SIGN UU
2883 x ODIA VOWEL SIGN VOCALIC R
2884 X ODIA VOWEL SIGN VOCALIC RR
2887 E ODIA VOWEL SIGN E
2888 ai ODIA VOWEL SIGN AI
2891 O ODIA VOWEL SIGN O
2892 au ODIA VOWEL SIGN AU
2911 Ya ODIA LETTER YYA
2912 X ODIA LETTER VOCALIC RR
2913 Q ODIA LETTER VOCALIC LL
0 2918 0 ODIA DIGIT ZERO
1 2919 1 ODIA DIGIT ONE
2 2920 2 ODIA DIGIT TWO
3 2921 3 ODIA DIGIT THREE
4 2922 4 ODIA DIGIT FOUR
5 2923 5 ODIA DIGIT FIVE
6 2924 6 ODIA DIGIT SIX
7 2925 7 ODIA DIGIT SEVEN
8 2926 8 ODIA DIGIT EIGHT
9 2927 9 ODIA DIGIT NINE
2929 wa ODIA LETTER WA
2946 M TAMIL SIGN ANUSVARA
2947 H TAMIL SIGN VISARGA
2949 a TAMIL LETTER A
2950 A TAMIL LETTER AA
2951 i TAMIL LETTER I
2952 I TAMIL LETTER II
2953 u TAMIL LETTER U
2954 U TAMIL LETTER UU
2958 e TAMIL LETTER E
2959 E TAMIL LETTER EE
2960 ai TAMIL LETTER AI
2962 o TAMIL LETTER O
2963 O TAMIL LETTER OO
2964 au TAMIL LETTER AU
2965 ka TAMIL LETTER KA
2969 ng'a TAMIL LETTER NGA
2970 ca TAMIL LETTER CA
2972 ja TAMIL LETTER JA
2974 nj'a TAMIL LETTER NYA
2975 Ta TAMIL LETTER TTA
2979 Na TAMIL LETTER NNA
2980 ta TAMIL LETTER TA
2984 na TAMIL LETTER NA
2985 n'a TAMIL LETTER NNNA
2986 pa TAMIL LETTER PA
2990 ma TAMIL LETTER MA
2991 ya TAMIL LETTER YA
2992 ra TAMIL LETTER RA
2993 Ra TAMIL LETTER RRA
2994 la TAMIL LETTER LA
2995 La TAMIL LETTER LLA
2996 Za TAMIL LETTER LLLA
2997 va TAMIL LETTER VA
2998 TAMIL LETTER SHA
2999 sha TAMIL LETTER SSA
3000 sa TAMIL LETTER SA
3001 ha TAMIL LETTER HA
3006 A TAMIL VOWEL SIGN AA
ி 3007 i TAMIL VOWEL SIGN I
3008 I TAMIL VOWEL SIGN II
3009 u TAMIL VOWEL SIGN U
3010 U TAMIL VOWEL SIGN UU
3014 e TAMIL VOWEL SIGN E
3015 E TAMIL VOWEL SIGN EE
3016 ai TAMIL VOWEL SIGN AI
3018 o TAMIL VOWEL SIGN O
3019 O TAMIL VOWEL SIGN OO
3020 au TAMIL VOWEL SIGN AU
3031 La TAMIL AU LENGTH MARK
0 3046 0 TAMIL DIGIT ZERO
1 3047 1 TAMIL DIGIT ONE
2 3048 2 TAMIL DIGIT TWO
3 3049 3 TAMIL DIGIT THREE
4 3050 4 TAMIL DIGIT FOUR
5 3051 5 TAMIL DIGIT FIVE
6 3052 6 TAMIL DIGIT SIX
7 3053 7 TAMIL DIGIT SEVEN
8 3054 8 TAMIL DIGIT EIGHT
9 3055 9 TAMIL DIGIT NINE
3073 m' TELUGU SIGN CANDRABINDU
3074 M TELUGU SIGN ANUSVARA
3075 H TELUGU SIGN VISARGA
3077 a TELUGU LETTER A
3078 A TELUGU LETTER AA
3079 i TELUGU LETTER I
3080 I TELUGU LETTER II
3081 u TELUGU LETTER U
3082 U TELUGU LETTER UU
3083 x TELUGU LETTER VOCALIC R
3084 q TELUGU LETTER VOCALIC L
3086 e TELUGU LETTER E
3087 E TELUGU LETTER EE
3088 ai TELUGU LETTER AI
3090 o TELUGU LETTER O
3091 O TELUGU LETTER OO
3092 au TELUGU LETTER AU
3093 ka TELUGU LETTER KA
3094 kha TELUGU LETTER KHA
3095 ga TELUGU LETTER GA
3096 gha TELUGU LETTER GHA
3097 ng'a TELUGU LETTER NGA
3098 ca TELUGU LETTER CA
3099 cha TELUGU LETTER CHA
3100 ja TELUGU LETTER JA
3101 jha TELUGU LETTER JHA
3102 nj'a TELUGU LETTER NYA
3103 Ta TELUGU LETTER TTA
3104 Tha TELUGU LETTER TTHA
3105 Da TELUGU LETTER DDA
3106 Dha TELUGU LETTER DDHA
3107 Na TELUGU LETTER NNA
3108 ta TELUGU LETTER TA
3109 tha TELUGU LETTER THA
3110 da TELUGU LETTER DA
3111 dha TELUGU LETTER DHA
3112 na TELUGU LETTER NA
3114 pa TELUGU LETTER PA
3115 pha TELUGU LETTER PHA
3116 ba TELUGU LETTER BA
3117 bha TELUGU LETTER BHA
3118 ma TELUGU LETTER MA
3119 ya TELUGU LETTER YA
3120 ra TELUGU LETTER RA
3121 Ra TELUGU LETTER RRA
3122 la TELUGU LETTER LA
3123 La TELUGU LETTER LLA
3124 Za TELUGU LETTER LLLA
3125 va TELUGU LETTER VA
3126 sha TELUGU LETTER SHA
3127 Sa TELUGU LETTER SSA
3128 sa TELUGU LETTER SA
3129 ha TELUGU LETTER HA
3134 A TELUGU VOWEL SIGN AA
ి 3135 i TELUGU VOWEL SIGN I
3136 I TELUGU VOWEL SIGN II
3137 u TELUGU VOWEL SIGN U
3138 U TELUGU VOWEL SIGN UU
3139 x TELUGU VOWEL SIGN VOCALIC R
3140 X TELUGU VOWEL SIGN VOCALIC RR
3142 e TELUGU VOWEL SIGN E
3143 E TELUGU VOWEL SIGN EE
3144 ai TELUGU VOWEL SIGN AI
3146 o TELUGU VOWEL SIGN O
3147 O TELUGU VOWEL SIGN OO
3148 u TELUGU VOWEL SIGN AU
3168 X TELUGU LETTER VOCALIC RR
3169 Q TELUGU LETTER VOCALIC LL
3170 q TELUGU VOWEL SIGN VOCALIC L
3171 Q TELUGU VOWEL SIGN VOCALIC LL
0 3174 0 TELUGU DIGIT ZERO
1 3175 1 TELUGU DIGIT ONE
2 3176 2 TELUGU DIGIT TWO
3 3177 3 TELUGU DIGIT THREE
4 3178 4 TELUGU DIGIT FOUR
5 3179 5 TELUGU DIGIT FIVE
6 3180 6 TELUGU DIGIT SIX
7 3181 7 TELUGU DIGIT SEVEN
8 3182 8 TELUGU DIGIT EIGHT
9 3183 9 TELUGU DIGIT NINE
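The table above can be read as a lookup from decimal code point to Roman string. A minimal sketch of that lookup follows, using a few Devanagari rows copied from the table. The names `SCHEME_SAMPLE` and `to_roman` are hypothetical. This section does not state how a consonant's inherent "a" combines with a following vowel sign, so the sketch does not attempt it: it replaces characters one at a time and leaves anything without an entry unchanged.

```python
# A handful of rows copied from the scheme table: decimal code point -> Roman.
# SCHEME_SAMPLE and to_roman are hypothetical names, not the LDC-IL API.
SCHEME_SAMPLE = {
    2325: "ka",  # DEVANAGARI LETTER KA
    2350: "ma",  # DEVANAGARI LETTER MA
    2354: "la",  # DEVANAGARI LETTER LA
    2404: ".",   # DEVANAGARI DANDA
    2407: "1",   # DEVANAGARI DIGIT ONE
}

def to_roman(text, scheme=SCHEME_SAMPLE):
    """Replace each character that has a table entry; keep the rest as-is."""
    return "".join(scheme.get(ord(ch), ch) for ch in text)

# KA + MA + LA, with no vowel signs involved.
print(to_roman("\u0915\u092e\u0932"))  # kamala
```

A full converter would load every row of the table and would also need rules for vowel signs and conjuncts, which are outside what this sketch assumes.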