Abstract:
In this paper, we describe the creation of a morphosyntactically annotated treebank for modern
written Tamil following the Universal Dependencies (UD) framework to support the implementation
and evaluation of Tamil dependency parsers. At present, this treebank consists of 534
sentences. In addition to outlining the annotation methodology, this paper discusses unique
constructions found in Tamil and explains the sub-relations and language-specific relations
that we introduced. This carefully
annotated treebank can also serve as a benchmark dataset for evaluating Tamil Natural Language
Processing (NLP) tools. The treebank will be extended further to cover more complex constructions
in Tamil, and annotations will be enriched by incorporating the Enhanced Universal
Dependencies scheme.