MIT Anna University
Chromepet, Chennai-44, India.

Computational Linguistics Research Group

The Computational Linguistics Research Group (CLRG) at AU-KBC Research Centre works on the scientific study of language from a computational perspective. We develop computational models of various linguistic phenomena, with the aim of building practical natural language processing systems.

Our research interests span a broad range of topics in Computational Linguistics and Natural Language Processing. Our work combines traditional and contemporary linguistic-knowledge-based approaches with statistical and machine learning methods.

The group focuses on converting unstructured data into structured data, and on machine translation. We work at both the intra-sentential and the inter-sentential (discourse) levels. At the discourse level, we carry out cognitive analysis of discourse, such as coherence analysis and anaphora and connective resolution. The language families we are interested in are Dravidian, Indo-European and Indo-Aryan.

Glimpses of our research work

Work on Tamil Computing

1. Nigazhaayvi - நிகழாய்வி

A Tamil mobile app that fetches events from the Web: "It brings Events into your hand".
It extracts the latest events happening across the globe and provides the user with:

 • the event
 • the people associated with the event
 • the place in which the event happens
 • the cause & effect of the event

'Nigazhaayvi' is available at the link here.
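The four kinds of information listed above can be pictured as one structured record per extracted event. The Python sketch below is purely illustrative: the `EventRecord` class, its field names and the `summary` method are hypothetical and are not Nigazhaayvi's actual data model, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventRecord:
    """Hypothetical structured record for one extracted event.

    Field names are illustrative assumptions, not the app's real schema.
    """
    event: str                                       # what happened
    people: List[str] = field(default_factory=list)  # people associated with the event
    place: Optional[str] = None                      # where the event happened
    cause: Optional[str] = None                      # why it happened, if found
    effect: Optional[str] = None                     # what followed from it, if found

    def summary(self) -> str:
        """Render the event, people and place as a single line of text."""
        who = ", ".join(self.people) if self.people else "unknown"
        where = self.place or "unknown"
        return f"{self.event} | people: {who} | place: {where}"


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    record = EventRecord(
        event="flood relief camp opened",
        people=["district collector"],
        place="Chennai",
        cause="heavy rainfall",
    )
    print(record.summary())
    # prints: flood relief camp opened | people: district collector | place: Chennai
```

Keeping cause and effect optional reflects that not every event mention states them explicitly.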

2. Machine Translation (MT) Systems (Tamil <==> Malayalam, Tamil <==> Hindi)

We have developed Indian-language-to-Indian-language machine translation systems, focusing on translation from Tamil into other Indian languages and vice versa.

 • The Tamil - Malayalam MT system is available at the link: TA-ML MT System
 • The Tamil - Hindi MT system is available at the link: TA-HI MT System

3. Corpus and Other Lexical Resources Released

 • We have released Tamil Part-of-Speech (POS) and Named Entity (NE) annotated corpora, free for research purposes, enabling researchers across the world to advance research in Tamil and other Indian languages. Recently, we released a large (500K-word) POS-annotated corpus (here).
 • Other corpora and lexical resources, such as "Tamil WordNet", have also been released. Details are available at the link (here).

4. 'Searchko' - A Tamil Web Portal

Searchko is a Tamil portal whose main component is a Tamil search engine. It also offers news aggregation and AdTrans, an automatic advertisement translation system. More about Searchko at

Work on Malayalam Computing

1. Malayalam NLP Stack

The Malayalam NLP Stack includes a Morphological Analyser, a POS Tagger, a Chunker and a Named Entity Recogniser. Demo: Malayalam Stack demo link
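Components like these are commonly chained as a pipeline, each stage adding an annotation layer to the output of the previous one. The Python sketch below shows only that chaining pattern and is not the actual stack: the stage order, the function names (`morph_analyse`, `pos_tag`, `chunk`, `recognise_entities`, `run_pipeline`), the toy tags and the tiny lexicon and gazetteer are all hypothetical, and an English stand-in sentence is used instead of Malayalam.

```python
from typing import Callable, Dict, List

Token = Dict[str, str]
Stage = Callable[[List[Token]], List[Token]]

# Toy resources, for illustration only (hypothetical, not the stack's data).
TOY_LEXICON = {"ravi": "PROPN", "visited": "VERB", "kochi": "PROPN", "yesterday": "ADV"}
TOY_GAZETTEER = {"ravi": "PERSON", "kochi": "LOCATION"}
NOUN_TAGS = {"NOUN", "PROPN"}


def morph_analyse(tokens: List[Token]) -> List[Token]:
    """Placeholder morphological analysis: records a lower-cased root only."""
    for t in tokens:
        t["root"] = t["word"].lower()
    return tokens


def pos_tag(tokens: List[Token]) -> List[Token]:
    """Toy POS tagging by lexicon lookup; unknown roots get 'UNK'."""
    for t in tokens:
        t["pos"] = TOY_LEXICON.get(t["root"], "UNK")
    return tokens


def chunk(tokens: List[Token]) -> List[Token]:
    """Group consecutive noun tokens into noun phrases with B-NP/I-NP/O labels."""
    prev_noun = False
    for t in tokens:
        is_noun = t["pos"] in NOUN_TAGS
        t["chunk"] = ("I-NP" if prev_noun else "B-NP") if is_noun else "O"
        prev_noun = is_noun
    return tokens


def recognise_entities(tokens: List[Token]) -> List[Token]:
    """Toy named entity recognition by gazetteer lookup on the root."""
    for t in tokens:
        t["ne"] = TOY_GAZETTEER.get(t["root"], "O")
    return tokens


def run_pipeline(sentence: str, stages: List[Stage]) -> List[Token]:
    """Whitespace-tokenise, then apply each stage in order."""
    tokens: List[Token] = [{"word": w} for w in sentence.split()]
    for stage in stages:
        tokens = stage(tokens)
    return tokens


PIPELINE: List[Stage] = [morph_analyse, pos_tag, chunk, recognise_entities]

if __name__ == "__main__":
    for tok in run_pipeline("Ravi visited Kochi yesterday", PIPELINE):
        print(tok)
```

Because every stage has the same list-in, list-out signature, a stage can be swapped for a real model without touching the others.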

Work on Social Media

1. Sentiment Analyser on Twitter Data

A web API that gives the sentiment polarity (Positive/Negative/Neutral) of tweets. Given a phrase, it fetches tweets containing that phrase from the Web and analyses each fetched tweet for polarity.
This Web API is available at the link <TweetSentiSys>
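The two-step workflow described above (collect tweets containing a query phrase, then label each one) can be sketched in Python as follows. This is a hypothetical illustration, not TweetSentiSys itself: the web fetch is replaced by an in-memory list of made-up tweets, and polarity comes from a tiny word-count lexicon, a deliberately simple stand-in for whatever classifier the real system uses. The names `fetch_tweets`, `polarity` and `analyse` are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Toy polarity lexicon (hypothetical; a real system would use a trained model
# or a much larger lexicon).
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "poor", "hate", "sad"}


def fetch_tweets(phrase: str, source: Iterable[str]) -> List[str]:
    """Stand-in for the web fetch: keep tweets containing the phrase (case-insensitive)."""
    p = phrase.lower()
    return [t for t in source if p in t.lower()]


def polarity(tweet: str) -> str:
    """Label a tweet Positive, Negative or Neutral by counting lexicon hits."""
    words = [w.strip(".,!?#@").lower() for w in tweet.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def analyse(phrase: str, source: Iterable[str]) -> List[Tuple[str, str]]:
    """Fetch matching tweets, then pair each one with its polarity label."""
    return [(t, polarity(t)) for t in fetch_tweets(phrase, source)]


if __name__ == "__main__":
    # Made-up tweets, for illustration only.
    sample = [
        "Love the new metro line, great service!",
        "Metro line was late again, bad day.",
        "Metro line opens at 6 am.",
        "Beach was crowded today.",
    ]
    results = analyse("metro line", sample)
    for tweet, label in results:
        print(label, "|", tweet)
    print(Counter(label for _, label in results))
```

Separating fetching from labelling mirrors the description above: the fetch step can be replaced by a real data source without changing how polarity is assigned.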