XML Marking

LDC-IL XML Markup Standard

<?xml version="1.0" ?> Version of XML
<?xml-stylesheet type="text/css" href="home.css"?> Style sheet type used and name of style sheet
<Doc id="hin-w-media-HN0001" lang="hindi"> Document ID and name of the language used
<Header type="text"> Start of the header tag and the header type, i.e., text
<encodingDesc> Encoding description
<projectDesc>  CIIL Contemporary Language Corpora, Monolingual/ Parallel Text Corpora  </projectDesc> Project Description
<samplingDesc> Full/Balanced Simple written text only has been transcribed. Diagrams, pictures and tables have been omitted. Samples taken from page(s) 8 Sampling description and the page number(s) from which the samples were taken
</samplingDesc> End of Sampling description
</encodingDesc> End of Encoding description
<sourceDesc> Starting of Source Description
<biblStruct> Starting of bibliography structure
<source> Source description starts
<category>  </category> Category from which source text has been selected
<subcategory>  </subcategory> Subcategory of text
<text> Newspaper </text> Source of text
<title> Dainik Jagaran </title> Title of text
<vol> 26 </vol> Volume number
<issue> 58</issue> Issue number
</source> End of source description
<textDes> Starting of text description
<type> Article </type> Type of text (News/Editorial/Article)
<headline> Antarik Suraksha Par Vyarth Ka Shor </headline> Headline
<author>  Rajeev Shukla </author> Author name
<Translator>  </Translator> Translator name
<words> 1189 </words> Number of words in the text
</textDes> End of text description
<imprint> Printing description
<pubPlace> India-Lucknow </pubPlace> Publication place
<publisher> Jagaran Prakashan Pvt Limited </publisher> Publisher name
<pubDate> 2004 </pubDate> Publication date (Year)
</imprint> End of printing description
<idno type="CIIL code">  </idno> CIIL Accession code (if text taken from CIIL library)
<index> HN0001 </index> Index number
</biblStruct> End of bibliography structure
</sourceDesc> End of source description
<profileDesc> Profile description starts here
<creation> Creation description
<date> 7-Jun-2005 </date> Date of data input
<inputter> Vijayalakshmi </inputter> Data inputter name
<proof> Swasti Mishra </proof> Proof reader name
</creation> End of Creation description
<langUsage> Hindi </langUsage> Name of Language used
<wsdUsage> Writing System Details
<writingSystem id="ISO/IEC 10646">Universal Multiple-Octet Coded Character Set (UCS). </writingSystem> Writing system ID and name of the writing system; the Unicode character set is used
</wsdUsage> End of writing system details
<textClass> Start of text class which contains references to the text
<channel mode="w"> Print </channel> Mode of data
<domain type="public">  </domain> Domain type
</textClass> End of text class
</profileDesc> End of profile description
</Header> End of header
<text> Start of actual text
<body> Body of text starts here
<p></p> Paragraph; the text data is typed between <p> and </p>
</body> End of body of text
</text> End of text
</Doc> End of Document
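
As a rough illustration of how a file marked up this way might be read programmatically, the following is a minimal Python sketch using the standard library's xml.etree.ElementTree. The function name read_header, the trimmed sample document, and its placeholder paragraph are assumptions made for this example and are not part of the standard. Note that <text> occurs twice in the template (once under <source>, once as the container of <body>), so the sketch addresses each by its full path.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample that follows the template above.
# The paragraph content is a placeholder, not real corpus data.
SAMPLE = """<?xml version="1.0" ?>
<Doc id="hin-w-media-HN0001" lang="hindi">
<Header type="text">
<sourceDesc>
<biblStruct>
<source>
<text> Newspaper </text>
<title> Dainik Jagaran </title>
</source>
<textDes>
<type> Article </type>
<author> Rajeev Shukla </author>
<words> 1189 </words>
</textDes>
<index> HN0001 </index>
</biblStruct>
</sourceDesc>
</Header>
<text>
<body>
<p>placeholder paragraph</p>
</body>
</text>
</Doc>"""


def read_header(xml_text):
    """Return selected header fields and body paragraphs as a dict."""
    root = ET.fromstring(xml_text)  # root element is <Doc>

    def field(path):
        # Return the stripped text of the first match, or "" if absent/empty.
        node = root.find(path)
        return node.text.strip() if node is not None and node.text else ""

    bibl = "Header/sourceDesc/biblStruct"
    return {
        "id": root.get("id"),
        "lang": root.get("lang"),
        # <text> under <source> is the source type, not the document body.
        "source": field(bibl + "/source/text"),
        "title": field(bibl + "/source/title"),
        "type": field(bibl + "/textDes/type"),
        "author": field(bibl + "/textDes/author"),
        "words": int(field(bibl + "/textDes/words") or 0),
        "index": field(bibl + "/index"),
        # <text> directly under <Doc> holds the body paragraphs.
        "paragraphs": [p.text or "" for p in root.findall("text/body/p")],
    }


info = read_header(SAMPLE)
print(info["title"], info["words"])
```

Empty elements such as <category> or <Translator> come back as empty strings under this approach, which keeps optional fields from raising errors.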
Central Institute of Indian Languages
