Introduction
Named Entity Recognition (NER) refers to the automatic identification of named entities in a given text document. Named entities such as person names, organization names, location names, and product names are identified and tagged. Identifying named entities is important in several higher-level language technology systems, such as information extraction, machine translation, and cross-lingual information access systems. Over the past decade, Indian language content on media such as websites, blogs, email, and chats has increased significantly. This content growth is driven by people from non-metros and small cities. This large volume of data needs to be processed automatically, especially because companies are interested in ascertaining the public view of their products and processes. Doing so requires natural language processing systems that identify entities and the associations or relations between them. Hence an automatic named entity recognizer is required. The objectives of this evaluation exercise are:
Challenges in Indian Language NER
NER Annotated Corpus
The FIRE NER 2013 evaluation exercise is over. Any researcher interested in obtaining the annotated corpus for research purposes may request it by sending an email to sobha@au-kbc.org.
Researchers requesting the corpus should include their full details in the email, such as a description of their research work, their affiliation, and the languages they work on.
English - Click Here
Hindi - Click Here
Tamil - Click Here
Malayalam - Click Here
Bengali - Click Here
The whole corpus is provided. Researchers may run an n-fold experiment by partitioning the corpus accordingly. The corpus is access-protected; participants will receive an access code after registering by email as described above.
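As a rough illustration of the n-fold setup mentioned above, the sketch below partitions a corpus into n train/test splits. It assumes the corpus has already been loaded as a list of sentences; the function name `n_fold_splits`, the `seed` parameter, and the toy data are hypothetical and do not reflect the actual corpus format.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def n_fold_splits(
    items: Sequence[T], n: int, seed: int = 0
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield n (train, test) pairs; each item is held out exactly once."""
    if not 2 <= n <= len(items):
        raise ValueError("n must be between 2 and the number of items")
    # Shuffle indices with a fixed seed so the partition is reproducible.
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n folds of near-equal size.
    folds = [indices[i::n] for i in range(n)]
    for held_out in range(n):
        test = [items[i] for i in folds[held_out]]
        train = [
            items[i]
            for k, fold in enumerate(folds)
            if k != held_out
            for i in fold
        ]
        yield train, test


# Toy stand-in for a list of annotated sentences (assumption, not corpus data).
sentences = [f"sentence-{i}" for i in range(10)]
for fold_id, (train, test) in enumerate(n_fold_splits(sentences, n=5)):
    print(fold_id, len(train), len(test))
```

Splitting at the sentence (or document) level rather than the token level keeps each named entity whole within a single fold.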
Evaluation & Results
The evaluation metrics used were Precision, Recall and F-measure. Eight teams registered, of which five submitted runs, for a total of nine submissions. The teams that submitted runs are:
- Systems Research Lab, Tata Research Development and Design Centre (TRDDC)
- Indian School of Mines, Dhanbad (ISM Dhanbad)
- Indian Statistical Institute, Kolkata (ISI Kolkata)
- CFILT Lab, Indian Institute of Technology Bombay (IIT-B)
- Malaviya National Institute of Technology (MNIT)
Language | Team SystemID | Precision | Recall | F-Measure |
---|---|---|---|---|
Bengali | ISI Kolkata Sys 1 | 23.69 | 28.02 | 25.68 |
Bengali | ISI Kolkata Sys 2 | 28.61 | 16.09 | 20.59 |
English | TRDDC Sys 1 | 64.79 | 67.23 | 65.99 |
English | TRDDC Sys 2 | 64.92 | 68.63 | 66.73 |
English | ISM Sys 1 | 14.89 | 32.02 | 20.33 |
English | ISM Sys 2 | 39.33 | 34.46 | 36.74 |
Hindi | TRDDC | 47.51 | 68.35 | 56.06 |
Hindi | IITB | 83.68 | 74.14 | 78.62 |
Hindi | MNIT | 1.72 | 4.82 | 2.53 |
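For readers who want to reproduce scores of this kind, the following is a minimal sketch of how Precision, Recall and F-measure can be computed for NER output. It assumes exact-match scoring over entity spans and the balanced F-measure (beta = 1); the page does not state the matching scheme the organizers used, so both are assumptions, as are the `Entity` representation, the function names, and the toy data. The F-measure values in the table agree with the balanced harmonic mean of the listed Precision and Recall to within rounding.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start_token, end_token, entity_type).
Entity = Tuple[int, int, str]


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def score(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) in percent under exact-match scoring."""
    # An entity counts as correct only if span and type both match (assumption).
    correct = len(gold & predicted)
    precision = 100.0 * correct / len(predicted) if predicted else 0.0
    recall = 100.0 * correct / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy example (not data from the evaluation).
gold = {(0, 1, "PERSON"), (4, 4, "LOCATION"), (7, 8, "ORGANIZATION")}
predicted = {
    (0, 1, "PERSON"),
    (4, 4, "ORGANIZATION"),  # right span, wrong type: counted as an error
    (7, 8, "ORGANIZATION"),
    (10, 10, "LOCATION"),    # spurious entity
}
print(score(gold, predicted))
```

Under exact matching, a prediction with the right span but the wrong type lowers both Precision and Recall, which is one reason scores can differ widely between systems with similar span detection.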
Organizing Committee
Sobha Lalitha Devi
CLR Group @ AU-KBC Research Centre, Chennai, India.
Contact: sobha@au-kbc.org