Shlagha Chouhan | The Benchmarking Conference | LDC-IL

Comparative Analysis of Query-by-Example Algorithms in Speech Retrieval

Shlagha Chouhan


Authors: Shlagha Chouhan, Harsh Hemani, N. Sakthivel

Abstract

Retrieving speech with a spoken query is an essential task for searching large audio datasets. As the volume of speech data grows, efficient retrieval methods are required, comparable to those available for text-based information retrieval. Instead of text-based search terms, the user submits audio snippets as queries. However, the Spoken Term Detection (STD) task presents challenges such as achieving high accuracy with fast processing, detecting new or rarely used words, and handling low-resource languages. This work provides an analysis of existing STD techniques and explores an unsupervised, template-based Query-by-Example (QBE) approach for low-resource languages. The proposed method extracts template representations from queries and speech files and uses Dynamic Time Warping (DTW) to match them for retrieval. We evaluate the effectiveness of several STD techniques, comparing the performance of Cepstrum Matching, Gaussian Mixture Model (GMM), Artificial Neural Network (ANN-ASR-based), and Unsupervised Deep Neural Network (DNN-based) methods.
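
To make the template-matching idea concrete, the following is a minimal sketch of DTW-based QBE retrieval. It is not the authors' implementation. The abstract does not specify the feature type, the frame distance, or the DTW variant, so the sketch assumes each template is a list of numeric feature frames, uses Euclidean distance between frames, and applies subsequence DTW so that the query may match any region of a longer speech file. The names `frame_distance`, `subsequence_dtw`, and `rank_utterances`, and the normalization by query length, are illustrative assumptions.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames of equal length (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, utterance):
    """Lowest DTW cost of aligning `query` to any contiguous region of `utterance`.

    Both arguments are lists of feature frames. The cost is divided by the
    query length, a simple normalization chosen for this sketch.
    """
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")

    # First query frame may start at any utterance frame at no extra cost.
    prev = [frame_distance(query[0], utterance[j]) for j in range(m)]

    for i in range(1, n):
        curr = [math.inf] * m
        for j in range(m):
            best = prev[j]  # query advances, utterance frame repeats
            if j > 0:
                # diagonal step, or utterance advances while query frame repeats
                best = min(best, prev[j - 1], curr[j - 1])
            curr[j] = frame_distance(query[i], utterance[j]) + best
        prev = curr

    # The alignment may end at any utterance frame.
    return min(prev) / n


def rank_utterances(query, utterances):
    """Return (cost, name) pairs sorted from best to worst match.

    `utterances` maps a hypothetical file name to its list of feature frames.
    """
    scored = [(subsequence_dtw(query, frames), name) for name, frames in utterances.items()]
    return sorted(scored)


if __name__ == "__main__":
    # Toy one-dimensional "features"; real systems would use multi-dimensional frames.
    query = [[0.0], [1.0], [2.0]]
    utterances = {
        "a": [[5.0], [0.0], [1.0], [2.0], [5.0]],  # contains the query pattern
        "b": [[5.0], [5.0], [5.0]],                # does not
    }
    for cost, name in rank_utterances(query, utterances):
        print(name, round(cost, 3))
```

In this toy run, file `a` ranks first because the query pattern occurs inside it exactly. Under the assumptions above, the compared methods would differ mainly in how the frames are produced, while a matching stage of this kind could stay the same.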