Sparse and Low-rank Modeling for Automatic Speech Recognition

Author : Pranay Dighe
Publisher :
Total Pages : 133
Release :
OCLC : 1091605389
ISBN-13 :

Synopsis Sparse and Low-rank Modeling for Automatic Speech Recognition by : Pranay Dighe

Author keywords: automatic speech recognition; deep neural network; sparsity; dictionary learning; low-rank; principal component analysis; far-field speech; information theory.

The Application of Hidden Markov Models in Speech Recognition

Author : Mark Gales
Publisher : Now Publishers Inc
Total Pages : 125
Release :
ISBN-10 : 1601981201
ISBN-13 : 9781601981202

Synopsis The Application of Hidden Markov Models in Speech Recognition by : Mark Gales

The Application of Hidden Markov Models in Speech Recognition presents the core architecture of an HMM-based LVCSR system and then describes the refinements needed to achieve state-of-the-art performance.

Neighborhood Analysis Methods in Acoustic Modeling for Automatic Speech Recognition

Author : Natasha Singh-Miller
Publisher :
Total Pages : 134
Release :
OCLC : 711101743
ISBN-13 :

Synopsis Neighborhood Analysis Methods in Acoustic Modeling for Automatic Speech Recognition by : Natasha Singh-Miller

This thesis investigates the problem of using nearest-neighbor-based non-parametric methods for performing multi-class class-conditional probability estimation. The methods developed are applied to the problem of acoustic modeling for speech recognition. Neighborhood components analysis (NCA) (Goldberger et al. [2005]) serves as the departure point for this study. NCA is a non-parametric method that can be seen as providing two things: (1) low-dimensional linear projections of the feature space that allow nearest-neighbor algorithms to perform well, and (2) nearest-neighbor-based class-conditional probability estimates. First, NCA is used to perform dimensionality reduction on acoustic vectors, a commonly addressed problem in speech recognition. NCA is shown to perform competitively with another commonly employed dimensionality reduction technique in speech known as heteroscedastic linear discriminant analysis (HLDA) (Kumar [1997]). Second, a nearest-neighbor-based model related to NCA is created to provide a class-conditional estimate that is sensitive to the possible underlying relationship between the acoustic-phonetic labels. An embedding of the labels is learned that can be used to estimate the similarity or confusability between labels. This embedding is related to the concept of error-correcting output codes (ECOC), and the proposed model is therefore referred to as NCA-ECOC. The estimates provided by this method, along with nearest-neighbor information, are shown to improve speech recognition performance (2.5% relative reduction in word error rate). Third, a model for calculating class-conditional probability estimates is proposed that generalizes GMM, NCA, and kernel density approaches. This model, called locally-adaptive neighborhood components analysis (LA-NCA), learns different low-dimensional projections for different parts of the space. The model exploits the fact that in different parts of the space different directions may be important for discrimination between the classes. This model is computationally intensive and prone to over-fitting, so methods for sub-selecting the neighbors used for providing the class-conditional estimates are explored. The estimates provided by LA-NCA are shown to give significant gains in speech recognition performance (7-8% relative reduction in word error rate) as well as in phonetic classification.
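The two roles the synopsis attributes to NCA (a linear projection plus nearest-neighbor-based class estimates) can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses the softmax-over-squared-distances neighbor weighting from Goldberger et al., and the function name, the toy data, and the hand-picked projection matrix are assumptions made for illustration. In NCA proper the projection would be learned from labeled data rather than fixed by hand.

```python
import math


def nca_class_probabilities(query, points, labels, projection):
    """Soft nearest-neighbor class estimates in a linearly projected space."""

    def project(x):
        # Apply the linear map (rows of `projection`) to a feature vector.
        return [sum(a * v for a, v in zip(row, x)) for row in projection]

    q = project(query)
    weights = []
    for p in points:
        z = project(p)
        sq_dist = sum((qi - zi) ** 2 for qi, zi in zip(q, z))
        # Closer neighbors in the projected space get exponentially more weight.
        weights.append(math.exp(-sq_dist))

    total = sum(weights)
    probs = {}
    for w, label in zip(weights, labels):
        # A class's estimate is the normalized weight of its neighbors.
        probs[label] = probs.get(label, 0.0) + w / total
    return probs


# Toy data (hypothetical): the classes differ along feature 0 only,
# while feature 1 is a nuisance dimension with large spread.
points = [[0.0, 5.0], [0.2, -3.0], [2.0, 4.0], [2.2, -4.0]]
labels = ["a", "a", "b", "b"]
projection = [[1.0, 0.0]]  # assumed 1-D projection that keeps feature 0

probs = nca_class_probabilities([0.1, 0.0], points, labels, projection)
print(probs)
```

Because the assumed projection discards the nuisance feature, the query is assigned almost entirely to class "a"; with an identity projection the large spread in feature 1 would dominate the distances instead.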

Robust Acoustic Modeling and Front-end Design for Distant Speech Recognition

Author : Seyedmahdad Mirsamadi
Publisher :
Total Pages :
Release :
OCLC : 1029740539
ISBN-13 :

Synopsis Robust Acoustic Modeling and Front-end Design for Distant Speech Recognition by : Seyedmahdad Mirsamadi

In recent years, there has been a significant increase in the popularity of voice-enabled technologies which use human speech as the primary interface with machines. Recent advancements in acoustic modeling and feature design have increased the accuracy of Automatic Speech Recognition (ASR) to levels that enable voice interfaces to be used in many applications. However, much of the current performance depends on the use of close-talking microphones (i.e., scenarios in which the user speaks directly into a hand-held or body-worn microphone). There is still a rather large performance gap in distant-talking scenarios, in which speech is recorded by far-field microphones placed at a distance from the speaker. In such scenarios, the distorting effects of distance (such as room reverberation and environment noise) make the recognition task significantly more challenging. In this dissertation, we propose novel approaches for designing a distant-talking ASR front-end as well as training robust acoustic models to reduce the existing gap between far-field and close-talking ASR performance. Specifically, we i) propose a novel multi-channel front-end enhancement algorithm for improved ASR in reverberant rooms using distributed non-uniform microphone arrays with random unknown locations; ii) propose a novel neural network model training approach using adversarial training to improve the robustness of multi-condition acoustic models that are trained directly on far-field data; iii) study alternate neural network adaptation strategies for far-field adaptation to the acoustic properties of specific target environments. Experimental results on far-field benchmark tasks and datasets demonstrate the effectiveness of the proposed approaches for increasing far-field robustness in ASR. Based on experiments using reverberated TIMIT sentences, the proposed multi-channel front-end provides WER improvements of +21.5% and +37.7% in two-channel and four-channel scenarios over a single-channel scenario in which the channel with the best signal quality is selected. On the acoustic modeling side, based on experiments on the AMI corpus, the proposed multi-domain training approach provides a relative character error rate reduction of +3.3% with respect to a conventional multi-condition trained baseline, and +25.4% with respect to a clean-trained baseline.
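For readers unfamiliar with how a relative error rate reduction is computed, the conventional arithmetic is sketched below. The synopsis does not give the underlying error rates, so the numbers in the example are hypothetical and chosen only to show the calculation; the function name is likewise an assumption.

```python
def relative_reduction(baseline_error, new_error):
    """Relative error-rate reduction of a system over a baseline, in percent."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates (not taken from the dissertation).
baseline_wer = 40.0  # percent
improved_wer = 30.0  # percent

reduction = relative_reduction(baseline_wer, improved_wer)
print(reduction)  # prints 25.0
```

Here an absolute drop of 10 points from a 40% baseline is a 25% relative reduction, which is why relative figures should always be read together with the baseline they refer to.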

Human Behavior Analysis: Sensing and Understanding

Author : Zhiwen Yu
Publisher : Springer Nature
Total Pages : 277
Release :
ISBN-10 : 9811521093
ISBN-13 : 9789811521096

Synopsis Human Behavior Analysis: Sensing and Understanding by : Zhiwen Yu

Over the last decade, there has been a growing interest in human behavior analysis, motivated by societal needs such as security, natural interfaces, affective computing, and assisted living. However, the accurate and non-invasive detection and recognition of human behavior remain major challenges and the focus of many research efforts. Traditionally, in order to identify human behavior, it is first necessary to continuously collect the readings of physical sensing devices (e.g., camera, GPS, and RFID), which can be worn on human bodies, attached to objects, or deployed in the environment. Afterwards, using recognition algorithms or classification models, the behavior types can be identified so as to facilitate advanced applications. Although such traditional approaches deliver satisfactory performance and are still widely used, most of them are intrusive and require specific sensing devices, raising issues such as privacy and deployment costs. In this book, we will present our latest findings on non-invasive sensing and understanding of human behavior. Specifically, this book differs from existing literature in the following senses. Firstly, we focus on approaches that are based on non-invasive sensing technologies, including both sensor-based and device-free variants. Secondly, while most existing studies examine individual behaviors, we will systematically elaborate on how to understand human behaviors of various granularities, including not only individual-level but also group-level and community-level behaviors. Lastly, we will discuss the most important scientific problems and open issues involved in human behavior analysis.

Distant Speech Recognition

Author : Matthias Woelfel
Publisher : John Wiley & Sons
Total Pages : 600
Release :
ISBN-10 : 0470714077
ISBN-13 : 9780470714072

Synopsis Distant Speech Recognition by : Matthias Woelfel

A complete overview of distant automatic speech recognition.

The performance of conventional Automatic Speech Recognition (ASR) systems degrades dramatically as soon as the microphone is moved away from the mouth of the speaker. This is due to a broad variety of effects such as background noise, overlapping speech from other speakers, and reverberation. While traditional ASR systems underperform for speech captured with far-field sensors, there are a number of novel techniques within the recognition system, as well as techniques developed in other areas of signal processing, that can mitigate the deleterious effects of noise and reverberation and separate speech from overlapping speakers. Distant Speech Recognition presents a contemporary and comprehensive description of both theoretic abstraction and practical issues inherent in the distant ASR problem.

Key Features:
Covers the entire topic of distant ASR and offers practical solutions to overcome the problems related to it
Provides documentation and sample scripts to enable readers to construct state-of-the-art distant speech recognition systems
Gives relevant background information in acoustics and filter techniques
Explains the extraction and enhancement of classification-relevant speech features
Describes maximum likelihood as well as discriminative parameter estimation, and maximum likelihood normalization techniques
Discusses the use of multi-microphone configurations for speaker tracking and channel combination
Presents several applications of the methods and technologies described in this book
Accompanying website with open source software and tools to construct state-of-the-art distant speech recognition systems

This reference will be an invaluable resource for researchers, developers, engineers and other professionals, as well as advanced students in speech technology, signal processing, acoustics, statistics and artificial intelligence fields.

Gaussian Processes for Machine Learning

Author : Carl Edward Rasmussen
Publisher : MIT Press
Total Pages : 266
Release :
ISBN-10 : 026218253X
ISBN-13 : 9780262182539

Synopsis Gaussian Processes for Machine Learning by : Carl Edward Rasmussen

A comprehensive and self-contained introduction to Gaussian processes (GPs), which provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed from both a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.
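To make the regression setting mentioned in the synopsis concrete, here is a minimal sketch of the standard GP posterior mean and variance at a single test input, using a squared-exponential covariance function. It is not code from the book: the function names, the unit length scale, the noise value, and the toy data are all assumptions, and a plain Gaussian-elimination solver stands in for the Cholesky factorization usually preferred in practice.

```python
import math


def rbf(x1, x2, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at one test input."""
    # Training covariance with the noise variance added on the diagonal.
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    k_star = [rbf(x_test, a) for a in x_train]
    alpha = solve(K, y_train)  # (K + noise * I)^-1 y
    v = solve(K, k_star)       # (K + noise * I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_test, x_test) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, var


# Toy data (hypothetical): three noise-free samples of sin(x).
x_train = [-1.0, 0.0, 1.0]
y_train = [math.sin(x) for x in x_train]

mean_at_train, var_at_train = gp_posterior(x_train, y_train, 0.0)
mean_far, var_far = gp_posterior(x_train, y_train, 10.0)
print(mean_at_train, var_at_train)
print(mean_far, var_far)
```

At a training input the posterior variance collapses toward the assumed noise level, while far from the data the prediction reverts to the prior (mean near zero, variance near one), which is the probabilistic behavior the synopsis alludes to.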

Neural Approaches to Conversational AI: Question Answering, Task-Oriented Dialogues and Social Chatbots

Author : Jianfeng Gao
Publisher : Foundations and Trends® in I
Total Pages : 184
Release :
ISBN-10 : 1680835521
ISBN-13 : 9781680835526

Synopsis Neural Approaches to Conversational AI: Question Answering, Task-Oriented Dialogues and Social Chatbots by : Jianfeng Gao

This monograph is the first survey of neural approaches to conversational AI that targets Natural Language Processing and Information Retrieval audiences. It provides a comprehensive survey of the neural approaches to conversational AI that have been developed in the last few years, covering QA, task-oriented and social bots with a unified view of optimal decision making. The authors draw connections between modern neural approaches and traditional approaches, allowing readers to better understand why and how the research has evolved and to shed light on how they can move forward. They also present state-of-the-art approaches to training dialogue agents using both supervised and reinforcement learning. Finally, the authors sketch out the landscape of conversational systems developed in the research community and released in industry, demonstrating via case studies the progress that has been made and the challenges that are still being faced. Neural Approaches to Conversational AI is a valuable resource for students, researchers, and software developers. It provides a unified view, as well as a detailed presentation of the important ideas and insights needed to understand and create modern dialogue agents that will be instrumental to making world knowledge and services accessible to millions of users in ways that seem natural and intuitive.

Emotion-Oriented Systems

Author : Paolo Petta
Publisher : Springer Science & Business Media
Total Pages : 787
Release :
ISBN-10 : 3642151841
ISBN-13 : 9783642151842

Synopsis Emotion-Oriented Systems by : Paolo Petta

Emotion pervades human life in general, and human communication in particular, and this sets information technology a challenge. Traditionally, IT has focused on allowing people to accomplish practical tasks efficiently, setting emotion to one side. That was acceptable when technology was a small part of life, but as technology and life become increasingly interwoven we can no longer ask people to suspend their emotional nature and habits when they interact with technology. The European Commission funded a series of related research projects on emotion and computing, culminating in the HUMAINE project, which brought together leading academic researchers from the many related disciplines. This book grew out of that project, and its chapters are arranged according to its working areas: theories and models; signals to signs; data and databases; emotion in interaction; emotion in cognition and action; persuasion and communication; usability; and ethics and good practice. The fundamental aim of the book is to offer researchers an overview of the related areas, sufficient for them to do credible work on affective or emotion-oriented computing. The book serves as an academically sound introduction to the range of disciplines involved (technical, empirical and conceptual) and will be of value to researchers in the areas of artificial intelligence, psychology, cognition and user-machine interaction.