Program-ICON 2024

Conference Overview

More information will be updated soon

CONFERENCE DATE DEC 19TH 2024 TO DEC 22ND 2024

Venue: AU-KBC Research Centre, MIT Campus of Anna University, Chrompet, Chennai

Program Schedule
The Main Conference Program Schedule is here (Click Here).

WORKSHOPS

The following workshops have been accepted for ICON 2024.

December 19, 2024

Workshop

Abstract

Open-source Tools for NLP (Half Day)

Dr. Rajeev R R
ICFOSS, Government of Kerala, Thiruvananthapuram

Time: 10.00 AM - 1.00 PM
Venue: Ground Floor Class Room,
AU-KBC Research Centre,
MIT Campus of Anna University.

This half-day workshop will introduce participants to key open-source tools and frameworks used in the field of Natural Language Processing (NLP). The workshop will provide participants with hands-on experience in using these tools to perform fundamental NLP tasks such as text preprocessing, sentiment analysis, machine translation, and speech-to-text processing. The focus will be on practical implementation using freely available open-source tools and libraries, emphasizing ease of use and accessibility for a wide range of NLP applications.

Big Ellipsis (Full Day)

Dr. Rajesh Bhatt, University of Massachusetts, Amherst
Dr. Sobha L, AU-KBC Research Centre, Chennai
Dr. Anushree Mishra, EFLU, Hyderabad

Time: 10.00 AM - 5.00 PM
Venue: 1st Floor Conference Hall,
Charles Babbage Building, Dept of Computer Technology,
MIT Campus of Anna University.

We propose a full-day workshop on Big Ellipsis. Initial work on ellipsis focused heavily on the phenomenon of VP-ellipsis. While this has been a productive area of study, its relevance to South Asian languages has remained somewhat limited, largely due to the uncertainty regarding whether VP-ellipsis occurs in these languages (see Manetta 2011, 2019, 2021). However, it has long been recognized that ellipsis can also take place at the clausal level, a phenomenon we refer to as 'Big Ellipsis.'

Integrating Natural Language Processing and AI for Enhanced Healthcare Communication:
Addressing Language Barriers in Patient Care (Full Day)

Dr. Hannah Mary Thomas T, Christian Medical College Vellore
Dr. Vandan Mujadia, IIIT Hyderabad.

Time: 10.00 AM - 1.00 PM
Venue: Ground Floor Conference Hall,
Charles Babbage Building, Dept of Computer Technology,
MIT Campus of Anna University.

Workshop URL

In multilingual societies such as India, effective communication in healthcare is often obstructed by language barriers, especially in clinical interactions between patients and their healthcare providers and in documents that hospitals provide for patients' use (patient-facing documents such as consent forms for treatment or research participation, information sheets, and discharge summaries).

This workshop will explore the application of Natural Language Processing (NLP), Computational Linguistics (CL), and Artificial Intelligence (AI) to create effective language translation systems that can be tailored for healthcare use cases. We will identify use cases that medical professionals and patients find challenging and explore available systems that can help with translating medical information into patient-friendly language and various Indian languages, implementing multilingual speech recognition for clinical documentation, and automating content contextualization through AI. The primary goal is to unite researchers and professionals from linguistics, healthcare, and AI to collaboratively develop solutions that facilitate better communication in multilingual and fast-paced healthcare environments.

December 22, 2024

Workshop

Abstract

Teaching of Natural Language Processing in the Era of LLMs (Half Day)

Dr. Vasudeva Varma, IIIT Hyderabad
Dr. Dipti Misra Sharma, IIIT Hyderabad
Dr. Pushpak Bhattacharyya, IIT Bombay
Dr. Sivaji Bandyopadhyay, Jadavpur University, Kolkata
Dr. Sobha Lalitha Devi, AU-KBC, Chennai
Dr. Sudeshna Sarkar, IIT Kharagpur, Kolkata
Dr. Asif Ekbal, IIT Patna/Jodhpur

Time: 10.00 AM - 1.30 PM
Venue: 1st Floor Conference Hall,
Charles Babbage Building, Dept of Computer Technology,
MIT Campus of Anna University.

The field of Natural Language Processing (NLP) is undergoing a rapid transformation due to the rise of Large Language Models (LLMs) like ChatGPT and the widespread adoption of the Transformer architecture. As educators and researchers, we find ourselves at a crossroads: how should we rethink the way NLP is taught to better align with these developments? Should the focus shift away from traditional methods, and if so, which concepts remain valuable? In an academic context, especially in Indian universities with a rich linguistic diversity, how do we ensure that the modern NLP curriculum reflects not just global trends but also the unique challenges of Indian languages?

This workshop proposes to explore these questions and help educators think critically about what the future of NLP education should look like.

Workshop on Tamil Computing (Full Day)

Dr. Vijay Sundar Ram, AU-KBC Research Centre
Dr. Pattabhi RK Rao, AU-KBC Research Centre
Dr. Sobha Lalitha Devi, AU-KBC Research Centre

Program Schedule

Time: 10.00 AM - 4.00 PM
Venue: 3rd Floor Conference Hall,
Charles Babbage Building, Dept of Computer Technology,
MIT Campus of Anna University.

Tamil Computing 2024 (Tc’24) is a workshop on Tamil Computing to be held along with ICON 2024 at AU-KBC Research Centre, MIT Campus of Anna University, Chennai. This will be the first edition of the Tamil Computing Workshop at the ICON conference.
The Tamil Computing workshop, Tc’24, is intended for students, researchers, and users of language technology who want to learn about the existing applications and tools that can be utilized for day-to-day work. The workshop showcases and educates about Tamil Computing in diverse areas so that Tamil is usable in all routine and high-end applications, much like English. This will help in developing a Tamil-centric Information Technology.
There will be invited talks by eminent researchers in this field and demonstrations of applications and tools. A hands-on session will also be conducted on linguistic annotation of data.

NLP Tools Development for Gujarati (Full Day)

Dr. C.K. Bhensdadia, Dr. Brijesh Bhatt, Dr. Jatayu Baxi
Dharmsinh Desai University, Nadiad

This workshop is cancelled due to unforeseen circumstances.

The development of NLP technologies has driven advances in language-based applications. However, low-resource languages like Gujarati, despite having a significant number of speakers, have a scarcity of linguistic resources and NLP tools. This workshop aims to address this gap by promoting the development of linguistic resources, models, and tools specific to the Gujarati language. The primary objective of this workshop is to bring together researchers, developers, linguists, and industry stakeholders to discuss the current challenges and potential solutions for building useful NLP tools for Gujarati. We aim to encourage the submission of research papers and tool demonstrations in this area.

TUTORIALS

The following Tutorials have been accepted for ICON 2024.

December 19, 2024

Tutorial

Abstract

Automating Talent Acquisition - The process of Resume Parsing and Screening.
(9.30 AM - 5.30 PM)

Dr. Keyur Joshi and Dr. Vrunda Gadesha

Ahmedabad University

This tutorial is cancelled due to unforeseen circumstances.

This tutorial explores the evolving process of resume screening in large corporations, focusing on the shift from traditional methods to Industry 4.0 standards using advanced Natural Language Processing (NLP) techniques. Attendees will gain a deep understanding of how automated resume scanning works in large companies and the importance of key entities in resumes, including fact-based (e.g., education, job title) and competency-based entities (e.g., skills, leadership abilities). Through hands-on experiments, participants will experience different approaches to resume parsing, from conventional methods to those optimized for modern recruitment technologies. The tutorial will also provide practical guidance on how to make resumes compatible with AI-driven screening systems, ensuring they meet Industry 4.0 standards. By the end of the session, attendees will be equipped with actionable insights and skills to navigate and excel in the technology-driven landscape of talent acquisition.

Disfluency Identification and Annotation in Indian Context for NLP Development.
(9.30 AM - 1.00 PM)

Dr. Vandan Mujadia, Dr. Chayan Kochar, Dr. Nikhilesh Bhatnagar, Dr. Parameswari Krishnamurthy and Dr. Pruthwik Mishra*

IIIT-Hyderabad and SVNIT, Surat*

Time: 9.30 AM - 1.00 PM
Venue: 2nd Floor Conference Hall,
Charles Babbage Building, Dept of Computer Technology,
MIT Campus of Anna University.

Disfluency identification is a fundamental natural language processing (NLP) task. It improves the accuracy and fluency of spoken language processing applications such as automatic speech recognition (ASR), machine translation, dialog systems, and language understanding. Disfluencies are interruptions, hesitations, or corrections in spoken language that impact the overall performance and usability of such applications.
In this tutorial, we will talk about our work on the development of a disfluency-annotated corpus (Mujadia et al., 2024) in Indian English for the technical lecture domain. We will detail the annotation procedure and the guidelines. We will also present a technique for data augmentation that involves contextual embeddings and part-of-speech patterns. This synthetic data positively impacts the performance of the disfluency identification models. We will also shed light on the extension of this work (Kochar et al., 2024) to 6 Indian languages: Hindi, Bengali, Marathi, Telugu, Kannada and Tamil, highlighting the linguistic challenges and adaptations required to handle disfluencies in these diverse languages.

December 22, 2024

Tutorial

Abstract

Text Augmentation for Indian Languages.
(2.00 PM - 5.00 PM)

Asha Hegde, H L Shashirekha

Mangalore University, Karnataka

Time: 2.00 PM - 5.00 PM
Venue: Ground Floor Class Room,
AU-KBC Research Centre,
MIT Campus of Anna University.

Text Augmentation (TA) is a technique in Natural Language Processing (NLP) that involves artificially increasing the amount of text data. It also helps to enhance model performance by creating diverse variations of the existing data. TA is crucial for low-resource languages to address data scarcity issues. While extensive work has been done on data augmentation for high-resource languages like English, there has been significantly less focus on low-resource languages, more specifically Indian languages, despite the need to overcome their data limitations. To address this issue, we intend to organize a tutorial on "Text Augmentation for Dravidian Languages" that aims to address TA for Indian languages. The tutorial covers a talk on TA for Indian languages and a hands-on session (demo codes).

Transformative Impact of Generative AI on Healthcare: Industry Case Studies.
(10.00 AM - 1.00 PM)

Dr. Manjira Saha

Tata Consultancy Services

Time: 10.00 AM - 1.00 PM
Venue: Ground Floor Class Room,
AU-KBC Research Centre,
MIT Campus of Anna University.

This tutorial will explore the various ways Generative AI and LLMs are transforming healthcare, offering practical examples from industry use cases and discussing the ethical, regulatory, and technical considerations. Attendees will leave with a clearer understanding of how AI is not only enhancing current medical practices but also paving the way for future innovations.

Harnessing the Power of Large Language Models for Multilingual and Code-mixed NLP task.
(9.30 AM - 5.30 PM)

Karthika Vijayan and Arindam Chatterjee

Sahaj Software, Bangalore and Pune, India

This tutorial is cancelled due to unforeseen circumstances.

Multilingual communication is a natural and widespread phenomenon in linguistically diverse societies. Additionally, code-mixing, the blending of two or more languages in informal communication, is common in these settings. To address this linguistic complexity, Natural Language Processing (NLP) systems must be designed to handle both multilingual data and the code-mixed nature of real-world communication. In this tutorial, we will explore the design and development of robust multilingual NLP systems with a focus on leveraging pre-trained Large Language Models (LLMs) as their foundation. We will delve into techniques such as fine-tuning, knowledge distillation, and using pre-trained embeddings from LLMs to train classifiers and other downstream systems. Moreover, we will address key challenges, including the handling of low-resource languages and the generalization of NLP systems across multiple tasks. Through this tutorial, participants will gain insights into the limitations of out-of-the-box solutions and the critical importance of customizing models to effectively solve specific multilingual and code-mixed NLP tasks.

Diffusion Probabilistic Models for Natural Language Processing.
(9.30 AM - 1.00 PM)

Tejomay Kishor Padole, Suyash Awate, Prof. Pushpak Bhattacharyya and Amar Prakash Azad*

Indian Institute of Technology, Bombay and Fujitsu Research, Bangalore

Time: 9.30 AM - 1.00 PM
Venue: Ground Floor Conference Hall,
Charles Babbage Building, Dept of Computer Technology,
MIT Campus of Anna University.

In the current era of Natural Language Processing, Large Language Models (LLMs) have risen to be the state-of-the-art generative models. But due to their autoregressive nature (i.e. sequential next-word generation), they suffer from sampling drift, tending to accumulate errors during their sequential generation process. Diffusion Probabilistic Models (DPMs) are a new class of generative models that generate data non-autoregressively with iterative denoising, which allows more control over the generation. This tutorial aims to present the foundations of DPMs as well as the current state-of-the-art techniques based on DPMs for generating text. The attendees will gain insights into how the DPM framework works and understand the key differences between DPMs and autoregression. Along with the modeling techniques, we also aim to highlight existing challenges with applying DPMs to text and potential research directions in the area.

SHARED TASKS/TOOL/DEMOS

The following Shared Task has been accepted for ICON 2024.

More information will be updated soon

Shared Task

Decoding Fake Narratives in Spreading Hateful Stories (Faux-Hate)

The shared task session is scheduled for 21st Dec 2024, 2.30 PM - 3.30 PM. Please see the Program Schedule for more details.

Submission Link | Shared Task URL

Format: In-person