Automatic Speech Recognition (ASR) is the enabling technology for hands-free dictation and voice-triggered computer menus. It is becoming increasingly prevalent in environments such as private telephone exchanges and real-time information services. Speech Recognition introduces the principles of ASR systems, including the theory and implementation issues behind multi-speaker continuous speech recognition. Focusing on the algorithms employed in commercial and laboratory systems, the treatment enables the reader to devise practical solutions for ASR system problems. It addresses in detail the C++ programming techniques used to develop ASR applications, thus offering skills that will prove useful in any large C++-based software project. The book also highlights possible extensions of the well-established ASR technology, which is based on "Hidden Markov Models", to fields such as the modelling and prediction of econometric series. Features include:
* Accompanying website containing the complete C++ source code of a laboratory multi-speaker continuous-speech ASR system (covering initialisation, training, recognition, evaluation, etc.): www.wiley.com/go/becchetti_speech
* Detailed theoretical, mathematical and technical explanations of ASR
* A practical account of the functioning of ASR
A crucial source of information for researchers, developers and project managers involved with ASR systems, Speech Recognition is also structured for use by students of digital signal processing, speech recognition and C++ programming techniques.
Preface of the Book
"A technology is a real progress when it is available to anyone" --Henry Ford
Market projections regard speech recognition as one of the most promising technologies of the future. According to one of these projections, sales of speech technology industrial products will rise from $500 million in 1997 to $38 billion in 2003.
All commercial and research Automatic Speech Recognition (ASR) systems share a similar underlying technology. However, despite the wide diffusion of commercial ASR systems, the know-how behind this technology is known mainly to a few laboratories.
One of the main objectives of this book is to present both the theory and the implementation of ASR systems. Among all the solutions proposed by the scientific community, we concentrate on those actually adopted in working systems. When feasible, the choices made in commercial systems are also discussed.
In the spirit of the above quotation by Henry Ford, we have supplied the Recognition Experimental System "RES", a complete ASR system. The RES C++ code, fully contained on the enclosed CD, reveals many of the undocumented aspects that make ASR systems work. We also hope that RES will grow in its capabilities thanks to independent developers, as happened with the free operating system "Linux". We welcome external contributions, and we will support RES through our web page "http://www.fub.it/res".
ASR technology is based on the so-called Hidden Markov Models (HMMs). HMMs are among the most powerful models for describing complex non-stationary phenomena, ranging from speech to stock market behavior. Extensive research in the speech recognition community has produced efficient algorithms and techniques related to HMMs. This technology is highly consolidated and can therefore be extended to other applications, for instance the modeling and prediction of economic time series. This has led us to include an econometric appendix that describes the "stylized facts" of economic series and their unresolved estimation and prediction problems, and shows how HMMs could be applied to these issues.
As mentioned above, the book is supplied with the 30,000-line C++ source code of RES. This and other projects have consolidated our experience in developing C++ software. The book also conveys our programming and teaching experience, informed by the contributions of the scientific community on object-oriented programming.
Our programming technique is based on a "conservative" approach that improves the reliability of the software. The technique, often used in "mission-critical" software, greatly reduces C++ development and debugging time. Some chapters describe this technique as well as the solutions adopted for the software problems met in developing the RES system. Since these problems affect any medium-sized or large project, the discussion has general validity. RES allows us to show solutions applied to a real system rather than to abstract examples.
In essence, the topics considered in the book are:
* theory and methods that make ASR systems work
* C++ software implementation of ASR through RES
* the conservative programming technique in C++
* solutions to common issues encountered in medium-sized projects
* an appendix on applying HMMs to stock exchange prediction
The book covers topics that may interest readers with different skills. We have therefore structured it so that the reader can easily "browse" the topics, selecting those most appropriate to her/his needs and background.
The chapters of the book are organized following the "flow of speech data" inside the ASR system. ASR systems are easily decomposed into functional blocks that implement different tasks. Each block receives input data and outputs processed data. The first block receives the speech samples as input, while the last block returns the string of recognized words. Each chapter deals with one of these blocks, and the order of the chapters reflects the order in which the signal passes through the blocks. Figure 1.6 on page 19 shows the ASR blocks and the data they exchange in detail. The bold number in each block indicates the chapter in which that block is addressed.
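The idea of blocks that each take input data and return processed data can be sketched in C++ as a chain of typed functions, where the output type of one block is the input type of the next. This is a minimal illustration under stated assumptions, not the RES design: the type aliases, `Block`, `chain`, `makeFrames`, `logEnergy`, `classify` and `runDemo` are hypothetical names, and the last block is a toy speech/silence labeler standing in for the real HMM-based recognition stage.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical data types exchanged between blocks (illustrative only, not RES classes).
using Samples  = std::vector<std::int16_t>;         // input: raw speech samples
using Frames   = std::vector<std::vector<double>>;  // samples grouped into fixed-length frames
using Features = std::vector<double>;               // one feature (log energy) per frame
using Labels   = std::string;                       // output: one symbol per frame

// A block is any function that turns its input data into processed output data.
template <typename In, typename Out>
using Block = std::function<Out(const In&)>;

// Connect two blocks: the output of `first` becomes the input of `second`.
template <typename A, typename B, typename C>
Block<A, C> chain(Block<A, B> first, Block<B, C> second) {
    return [first, second](const A& in) { return second(first(in)); };
}

// Block 1: split the samples into non-overlapping frames (a trailing partial frame is dropped).
Frames makeFrames(const Samples& s, std::size_t frameLen) {
    assert(frameLen > 0);  // a zero frame length would never advance
    Frames frames;
    for (std::size_t start = 0; start + frameLen <= s.size(); start += frameLen) {
        frames.emplace_back(s.begin() + start, s.begin() + start + frameLen);
    }
    return frames;
}

// Block 2: compute the log energy of each frame.
Features logEnergy(const Frames& frames) {
    Features out;
    for (const auto& f : frames) {
        double e = 0.0;
        for (double x : f) e += x * x;
        out.push_back(std::log(e + 1.0));  // +1 avoids log(0) on silent frames
    }
    return out;
}

// Block 3 (toy stand-in for recognition): mark each frame as speech 'S' or silence '-'
// by thresholding its energy. A real ASR system would run HMM-based decoding here.
Labels classify(const Features& feats, double threshold) {
    Labels out;
    for (double e : feats) out += (e > threshold) ? 'S' : '-';
    return out;
}

// Build the three-block pipeline and run it on a short synthetic signal.
Labels runDemo() {
    Block<Samples, Frames> framer = [](const Samples& s) { return makeFrames(s, 4); };
    Block<Frames, Features> energy = logEnergy;
    Block<Features, Labels> labeler = [](const Features& f) { return classify(f, 5.0); };

    auto pipeline = chain(chain(framer, energy), labeler);
    Samples input = {0, 0, 0, 0, 100, -100, 100, -100, 1, 0, -1, 0};
    return pipeline(input);  // three frames: silence, loud, quiet
}
```

With the synthetic input above, `runDemo()` returns `"-S-"`: only the second frame exceeds the energy threshold. Because each block depends only on the types it consumes and produces, a block can be replaced (for example, a different feature extractor) without touching the others, which mirrors the chapter-per-block organization described above.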
Each chapter contains two distinct parts. The first covers theoretical issues related to the functionality of the ASR block that the chapter addresses. The second addresses the C++ implementation of that block and the general programming problems it raises.
Where useful, sections are marked with symbols that specify the topics covered and the required skill. A framed computer indicates that the section covers programming issues of general interest: these topics are not directly related to ASR, but arise in any software project.
Three further symbols classify the sections according to the reader's skill and interest:
* An "abc" symbol marks basic topics that the reader should assimilate before going on to the following sections. More experienced readers who are already familiar with the specific subjects may skip these sections.
* An exclamation point marks sections that should not be missed. These sections cover relevant topics that may be unfamiliar even to experienced readers and therefore deserve careful attention.
* A lens denotes advanced sections intended for more experienced or specifically interested readers; these may be skipped on a first reading.
The theoretical and implementation parts of each chapter can be read as two independent books: a reader interested in only one part may skip the other.
We hope that readers can assimilate the technological topics they are interested in as quickly and usefully as possible.