IJFANS International Journal of Food and Nutritional Sciences

ISSN Print 2319-1775, Online 2320-7876

Speech Perception and Automated Speech Recognition Using an Audio-Visual Corpus


SHUBHAM GUPTA 1, DR. ALOK SEMWAL 2, DR. ABHILASH SINGH 3

Abstract

An audio-visual corpus has been constructed to simplify its use in speech perception and automatic speech recognition (ASR) studies. The corpus consists of high-quality audio and video recordings of 1000 sentences spoken by each of 34 talkers. The sentences are short, syntactically identical phrases such as "put green at B 4 immediately". Intelligibility tests using the audio signals show that the material is easily identifiable in quiet and at low levels of stationary noise. The annotated corpus is accessible over the internet for research use.

Understanding how people perceive and process speech in difficult listening conditions is a crucial area of science. Two approaches to modelling speech perception are considered. Traditionally, overall speech intelligibility has been predicted using "macroscopic" models that account for reverberation and masking. Models in this family include the speech transmission index of Steeneken and Houtgast (1980), the articulation index of French and Steinberg (1947), and the ANSI S3.5 speech intelligibility index (1997). A relatively recent idea is to build "microscopic" models of speech perception using ASR technology. These models differ from macroscopic models in that they can predict listeners' responses to individual tokens.
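The fixed sentence syntax can be illustrated with a small slot grammar. The sketch below is a minimal Python illustration assuming six word slots patterned on the example sentence; the slot fillers here are hypothetical stand-ins, not the corpus's actual word lists.

```python
import itertools

# Hypothetical slot fillers patterned on "put green at B 4 immediately";
# the real corpus defines its own fixed word list for each slot.
COMMANDS = ["put", "place", "set"]
COLORS = ["green", "red", "blue", "white"]
PREPOSITIONS = ["at", "by", "in"]
LETTERS = ["A", "B", "C", "D"]
DIGITS = [str(d) for d in range(10)]
ADVERBS = ["immediately", "now", "again", "soon"]

def generate_sentences():
    """Yield every sentence licensed by the six-slot grammar."""
    for words in itertools.product(COMMANDS, COLORS, PREPOSITIONS,
                                   LETTERS, DIGITS, ADVERBS):
        yield " ".join(words)

if __name__ == "__main__":
    sentences = list(generate_sentences())
    print(len(sentences))  # 3 * 4 * 3 * 4 * 10 * 4 = 5760 sentences
    print(sentences[0])    # "put green at A 0 immediately"
```

Because every sentence shares the same syntax, recordings differ only in the word chosen for each slot, which is what makes the material convenient for controlled perception and ASR experiments.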
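For reference, the macroscopic indices listed above share a common band-importance form. A simplified statement (the respective standards define the exact band weights and audibility computation) is

$$\mathrm{SII} \;=\; \sum_{i=1}^{n} I_i\,A_i, \qquad \sum_{i=1}^{n} I_i = 1,$$

where $I_i$ is the importance weight of frequency band $i$ and $A_i \in [0,1]$ is the audibility of that band, commonly approximated from the band signal-to-noise ratio as $A_i = \min\big(\max\big((\mathrm{SNR}_i + 15)/30,\,0\big),\,1\big)$. Such indices predict a single overall intelligibility score rather than responses to individual tokens.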
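The defining property of a microscopic model, prediction at the level of individual tokens, can be sketched as follows. The snippet below is an assumed illustration, not an implementation from the paper: it scores how often an ASR system's per-token responses match listener responses on the same noisy stimuli, with both response dictionaries as hypothetical placeholders.

```python
from collections import Counter

def token_level_agreement(tokens, asr_responses, listener_responses):
    """Fraction of tokens where the model's response matches the listeners'.

    tokens: identifiers of the noisy speech tokens presented.
    asr_responses / listener_responses: hypothetical dicts mapping each
    token id to the word reported by the ASR system / by the (majority)
    listener; a real study would obtain these from an ASR decoder and a
    perception experiment respectively.
    """
    counts = Counter()
    for tok in tokens:
        # A microscopic model is scored on matching individual responses
        # (including shared confusions), not just overall intelligibility.
        same = asr_responses[tok] == listener_responses[tok]
        counts["match" if same else "mismatch"] += 1
    return counts["match"] / max(sum(counts.values()), 1)

# Toy usage: model and listeners agree on 2 of 3 tokens,
# including one shared confusion ("bin" heard as "pin").
tokens = ["t1", "t2", "t3"]
asr = {"t1": "green", "t2": "pin", "t3": "four"}
humans = {"t1": "green", "t2": "pin", "t3": "for"}
print(token_level_agreement(tokens, asr, humans))  # ~0.667
```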
