Wednesday, March 12, 2025

Join Nexdata MLC-SLM Workshop at Interspeech 2025

Nexdata is thrilled to announce that our MLC-SLM Workshop proposal has been officially approved, which makes the MLC-SLM Workshop an Interspeech 2025 Satellite Event! The workshop aims to bring together researchers, developers, and industry professionals to explore the latest advancements in multilingual conversational AI. Because conversational speech models are essential for bridging communication gaps across languages and cultures, this event will provide a unique opportunity to delve into innovative solutions and the future of AI-driven dialogue systems.

Whether you are a researcher, developer, or enthusiast, Nexdata invites you to participate actively in this collaborative workshop and share your insights, contributing to the development of cutting-edge multilingual models that will shape the future of global conversations. Join Nexdata at this pivotal event to network, learn, and push the boundaries of speech technology!

Workshop Motivation

Large Language Models (LLMs) have demonstrated remarkable capabilities in a wide range of downstream tasks, serving as powerful foundation models for language understanding and generation. There has also been growing interest in applying LLMs to speech and audio processing tasks such as Automatic Speech Recognition (ASR), Audio Captioning, and emerging areas like Spoken Dialogue Models.

However, real-world conversational speech data is critical for the development of robust LLM-based Spoken Dialogue Models, as it encapsulates the complexity of human communication, including natural pauses, interruptions, speaker overlaps, and diverse conversational styles. The limited availability of such data, especially in multilingual settings, poses a significant challenge to advancing the field.

The importance of real-world conversational speech extends beyond technological advancement—it is essential for building AI systems that can understand and respond naturally in multilingual, dynamic, and context-rich environments. This is especially crucial for next-generation human-AI interaction systems, where spoken dialogue serves as a primary mode of communication.

Thus, this workshop aims to bridge this gap by hosting a challenge on building multilingual conversational speech language models, together with the release of a real-world multilingual conversational speech dataset.

Task Setting

The event consists of two tasks, both of which require participants to explore the development of speech language models:

Task 1: Multilingual Conversational Speech Recognition

Participants will be provided with oracle segmentation for each conversation.

Objective: Develop a multilingual LLM-based ASR model.

This task focuses on optimizing transcription accuracy in a multilingual setting.
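The announcement does not name the official scoring metric, but transcription accuracy is commonly reported as word error rate (WER), or character error rate (CER) for languages written without spaces between words. The sketch below is a minimal illustration, assuming plain-text reference and hypothesis strings with no text normalization; the function names `edit_distance` and `error_rate` are hypothetical and not part of any challenge toolkit.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token lists."""
    # prev[j] holds the distance between the ref prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def error_rate(reference: str, hypothesis: str, unit: str = "word") -> float:
    """WER (unit="word") or CER (unit="char", whitespace ignored); assumes no normalization."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    elif unit == "char":
        ref = [c for c in reference if not c.isspace()]
        hyp = [c for c in hypothesis if not c.isspace()]
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words
print(error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Real scoring pipelines usually normalize case, punctuation, and numerals before alignment, so scores from this sketch will not necessarily match an official leaderboard.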

Task 2: Multilingual Conversational Speech Diarization and Recognition

No prior or oracle information will be provided during evaluation (e.g., no pre-segmented utterances or speaker labels).

Objective: Develop a system for both speaker diarization (identifying who is speaking when) and recognition (transcribing speech to text).

Both pipeline-based and end-to-end systems are encouraged, providing flexibility in system design and implementation.
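The diarization half of Task 2 ("who is speaking when") is often scored with diarization error rate (DER): missed speech, false-alarm speech, and speaker confusion, divided by total reference speech. The announcement does not specify how Task 2 is scored, so the following is only a simplified frame-level sketch, assuming at most one speaker per frame (no overlap), no forgiveness collar, and a brute-force search for the best one-to-one mapping between hypothesis and reference speaker labels. The function name `frame_der` is hypothetical.

```python
from itertools import permutations
from typing import List, Optional


def frame_der(reference: List[Optional[str]], hypothesis: List[Optional[str]]) -> float:
    """Frame-level DER; each frame holds one speaker label or None for silence."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    ref_speech = sum(r is not None for r in reference)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    ref_spk = sorted({r for r in reference if r is not None})
    hyp_spk = sorted({h for h in hypothesis if h is not None})
    # Padding with None lets extra hypothesis speakers map to no reference speaker
    candidates: List[Optional[str]] = ref_spk + [None] * len(hyp_spk)

    best = None
    for assignment in permutations(candidates, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, assignment))
        errors = 0
        for r, h in zip(reference, hypothesis):
            if r is None and h is None:
                continue
            if r is None:
                errors += 1  # false alarm: speech hypothesized during silence
            elif h is None:
                errors += 1  # missed speech
            elif mapping[h] != r:
                errors += 1  # speaker confusion under this mapping
        best = errors if best is None else min(best, errors)
    return best / ref_speech


ref = ["A", "A", "B", "B", None]
hyp = ["s1", "s1", "s2", "s1", "s2"]
print(frame_der(ref, hyp))  # 1 confusion + 1 false alarm over 4 speech frames
```

The brute-force mapping grows factorially with the number of speakers; standard tools (for example pyannote.metrics or NIST's md-eval script) work on time segments, handle overlapping speech, and use an optimal assignment algorithm instead. A pipeline-based Task 2 system would additionally need a speaker-attributed transcription metric, which this sketch does not cover.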

Other Topics

Participants are encouraged to submit research papers and system descriptions that showcase innovative findings, practical case studies, and forward-looking ideas. Topics of interest include, but are not limited to:

• Novel architectures and algorithms for training speech language models.

• Novel pipelines for processing raw audio data, which are useful for collecting diverse internet data for training speech language models.

• Algorithms designed to generate more natural and emotionally rich conversational speech for dialogue systems.

• Approaches to leverage multi-turn conversational history to improve recognition and diarization results.

• Innovative evaluation techniques or benchmarks for speech language models.

• New datasets (real and synthetic) for training speech and audio language models.

Important Dates

February 20, 2025: Registration opens

March 10, 2025: Training data release

March 17, 2025: Development set and baseline system release

May 15, 2025: Evaluation set release and leaderboard open

June 1, 2025: Leaderboard freeze and submission portal opens (CMT system)

June 20, 2025: Submission deadline

July 10, 2025: Notification of acceptance

August 22, 2025: Workshop date

Organizers

Lei Xie, Professor, Northwestern Polytechnical University (China)

Shinji Watanabe, Associate Professor, Carnegie Mellon University (USA)

Eng Siong Chng, Associate Professor, Nanyang Technological University (Singapore)

Junlan Feng, IEEE Fellow & Chief Scientist, China Mobile (China)

Khalid Choukri, Secretary General, European Language Resources Association (France)

Qiangze Feng, Co-founder & Data Scientist, Nexdata (USA)

Daliang Wang, Data Scientist, Nexdata (USA)

Pengcheng Guo, PhD Student, Northwestern Polytechnical University (China)

Bingshen Mu, PhD Student, Northwestern Polytechnical University (China)


Media Contact
Company Name: Nexdata
Address: 28 Birchgove Cr
City: Eastwood
State: NSW 2122
Country: Australia
Website: https://www.nexdata.ai/