Language Modeling for Information Retrieval: 13 (The Information Retrieval Series) - Softcover

Book 6 of 42: The Information Retrieval
 
9789048162635: Language Modeling for Information Retrieval: 13 (The Information Retrieval Series)

Synopsis

A statistical language model, or more simply a language model, is a probabilistic mechanism for generating text. Such a definition is general enough to include an endless variety of schemes. However, a distinction should be made between generative models, which can in principle be used to synthesize artificial text, and discriminative techniques to classify text into predefined categories. The first statistical language modeler was Claude Shannon. In exploring the application of his newly founded theory of information to human language, Shannon considered language as a statistical source, and measured how well simple n-gram models predicted or, equivalently, compressed natural text. To do this, he estimated the entropy of English through experiments with human subjects, and also estimated the cross-entropy of the n-gram models on natural text. The ability of language models to be quantitatively evaluated in this way is one of their important virtues. Of course, estimating the true entropy of language is an elusive goal, aiming at many moving targets, since language is so varied and evolves so quickly. Yet fifty years after Shannon’s study, language models remain, by all measures, far from the Shannon entropy limit in terms of their predictive power. However, this has not kept them from being useful for a variety of text processing tasks, and moreover can be viewed as encouragement that there is still great room for improvement in statistical language modeling.

"Synopsis" may belong to another edition of this book.

Publisher's Review

This book contains the first collection of papers addressing recent developments in the design of information retrieval systems using language modeling techniques. Language modeling approaches are used in a variety of other language technologies, such as speech recognition and machine translation, and the book shows that applications such as Web search, cross-lingual search, filtering, and summarization can be described in the same formal framework. The book is intended primarily for researchers and advanced graduate students working in the language technologies areas of computer science or information science.

"About this title" may belong to another edition of this book.

Other popular editions of the same title

9781402012167: Language Modeling for Information Retrieval: 13 (The Information Retrieval Series, 13)

Featured Edition

ISBN 10: 1402012160 ISBN 13: 9781402012167
Publisher: Springer, 2003
Hardcover