The increasing availability of data in our current, information-overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent and application-oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications.
Applied Data Mining for Business and Industry, 2nd edition is aimed at advanced undergraduate and graduate students of data mining, applied statistics, database management, computer science and economics. The case studies will provide guidance to professionals working in industry on projects involving large volumes of data, such as customer relationship management, web design, risk management, marketing, economics and finance.
Paolo Giudici, Department of Economics and Quantitative Methods, University of Pavia. A lecturer in data mining, business statistics, data analysis and risk management, Professor Giudici is also the director of the data mining laboratory. He is the author of around 80 publications, the coordinator of two national research grants on data mining, and the local coordinator of a European integrated project on the topic. He was the sole author of the first edition of this book, which has been translated into both Italian and Chinese. He is also one of the editors of Wiley's Series in Computational Statistics.
Silvia Figini. Ms Figini has worked for two years at the Competence centre for data mining analysis and business intelligence at SAS Milan. She is currently completing a PhD in statistics and already has a collection of publications to her name.
From an operational point of view, data mining is an integrated process of data analysis. It consists of a series of activities that run from the definition of the objectives of the analysis, through the analysis of the data, to the interpretation and evaluation of the results. The phases of the process are as follows:
Definition of the objectives for analysis. It is not always easy to define statistically the phenomenon we want to analyse. In fact, while the company objectives that we are aiming for are usually clear, they can be difficult to formalise. A clear statement of the problem and the objectives to be achieved is of the utmost importance in setting up the analysis correctly. This is certainly one of the most difficult parts of the process since it determines the methods to be employed. Therefore the objectives must be clear and there must be no room for doubt or uncertainty.
Selection, organisation and pre-treatment of the data. Once the objectives of the analysis have been identified it is then necessary to collect or select the data needed for the analysis. First of all, it is necessary to identify the data sources. Usually data is taken from internal sources that are cheaper and more reliable. This data also has the advantage of being the result of the experiences and procedures of the company itself. The ideal data source is the company data warehouse, a 'store room' of historical data that is no longer subject to changes and from which it is easy to extract topic databases (data marts) of interest. If there is no data warehouse then the data marts must be created by overlapping the different sources of company data.
In general, the creation of data marts to be analysed provides the fundamental input for the subsequent data analysis. It leads to a representation of the data, usually in table form, known as a data matrix that is based on the analytical needs and the previously established aims.
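As a rough illustration of how several company sources might be combined into a data matrix, the sketch below merges two small tables on a shared customer key. The tables, field names and values are invented for this example and do not come from the book; it assumes that one row per customer is the unit of analysis.

```python
# Hypothetical source tables (names and values invented for illustration).
customers = {
    101: {"age": 34, "region": "north"},
    102: {"age": 51, "region": "south"},
    103: {"age": 27, "region": "north"},
}
purchases = [
    {"customer_id": 101, "amount": 20.0},
    {"customer_id": 101, "amount": 35.0},
    {"customer_id": 103, "amount": 12.5},
]


def build_data_matrix(customers, purchases):
    """Return one row per customer: demographic fields plus aggregated purchases."""
    totals = {}
    counts = {}
    for purchase in purchases:
        cid = purchase["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + purchase["amount"]
        counts[cid] = counts.get(cid, 0) + 1

    rows = []
    for cid in sorted(customers):
        info = customers[cid]
        rows.append({
            "customer_id": cid,
            "age": info["age"],
            "region": info["region"],
            # Customers with no purchases get zero counts rather than being dropped.
            "n_purchases": counts.get(cid, 0),
            "total_spent": totals.get(cid, 0.0),
        })
    return rows


data_matrix = build_data_matrix(customers, purchases)
for row in data_matrix:
    print(row)
```

Each row of `data_matrix` is a statistical unit and each key is a variable, which is the table form the text describes.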
Once a data matrix is available it is often necessary to carry out a process of preliminary cleaning of the data. In other words, a quality control exercise is carried out on the data available. This is a formal process used to find or select variables that cannot be used, that is, variables that exist but are not suitable for analysis. It is also an important check on the contents of the variables and the possible presence of missing or incorrect data. If any essential information is missing it will then be necessary to supply further data. (See Agresti (1990).)
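A quality control pass of this kind could be sketched as follows. The rule used here (a variable is treated as unusable when more than half of its values are missing) is an assumption made for the example, not a threshold given in the text, and the records are invented.

```python
def quality_report(rows, variables, max_missing_share=0.5):
    """For each variable, count missing values and mark whether it is usable.

    max_missing_share is an illustrative threshold, not a recommended value.
    """
    report = {}
    n = len(rows)
    for var in variables:
        missing = sum(1 for row in rows if row.get(var) is None)
        share = missing / n if n else 0.0
        report[var] = {
            "missing": missing,
            "share": share,
            "usable": share <= max_missing_share,
        }
    return report


# Invented records; None marks a missing value.
rows = [
    {"age": 34, "income": None, "fax": None},
    {"age": 51, "income": 42000, "fax": None},
    {"age": None, "income": 38000, "fax": None},
    {"age": 27, "income": 51000, "fax": "555-0100"},
]

report = quality_report(rows, ["age", "income", "fax"])
for name, stats in report.items():
    print(name, stats)
```

In this toy data the `fax` variable exists but is mostly empty, so it is marked as unsuitable, while `age` and `income` have isolated gaps that might be filled by supplying further data.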
Exploratory analysis of the data and their transformation. This phase involves a preliminary exploratory analysis of the data, very similar to online analytical processing (OLAP) techniques. It involves an initial evaluation of the importance of the collected data. This phase might lead to a transformation of the original variables in order to better understand the phenomenon or to decide which statistical methods to use. An exploratory analysis can highlight any anomalous data, that is, data that is different from the rest. This data will not necessarily be eliminated because it might contain information that is important in achieving the objectives of the analysis. We think that an exploratory analysis of the data is essential because it allows the analyst to select the most appropriate statistical methods for the next phase of the analysis. This choice must consider the quality of the available data. The exploratory analysis might also suggest the need for new data extraction, if the collected data is considered insufficient for the aims of the analysis.
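One common way to highlight anomalous observations during exploration is the interquartile-range rule, sketched below. The choice of this rule and of the multiplier 1.5 is an assumption for illustration; the text does not prescribe a particular method. In keeping with the paragraph above, observations are flagged rather than removed.

```python
import statistics


def flag_anomalies(values, k=1.5):
    """Pair each value with True if it lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(v, v < low or v > high) for v in values]


# Invented monthly spend figures for eight customers.
monthly_spend = [12, 14, 15, 15, 16, 17, 18, 95]

flags = flag_anomalies(monthly_spend)
anomalous = [v for v, is_anomalous in flags if is_anomalous]
print(anomalous)
```

The flagged value stays in the data set; whether it is an error or an unusually valuable customer is a question for the analyst.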
Specification of statistical methods. There are various statistical methods that can be used, and thus many algorithms available, so it is important to have a classification of the existing methods. The choice of which method to use in the analysis depends on the problem being studied or on the type of data available. The data mining process is guided by the application. For this reason, the classification of the statistical methods depends on the aim of the analysis. Therefore, we group the methods into two main classes corresponding to distinct phases of the data analysis.
• Descriptive methods. The main objective of this class of methods (also called symmetrical, unsupervised or indirect) is to describe groups of data in a succinct way. This can concern both the observations, which are classified into groups not known beforehand (cluster analysis, Kohonen maps) as well as the variables that are connected among themselves according to links unknown beforehand (association methods, log-linear models, graphical models). In descriptive methods there are no hypotheses of causality among the available variables.
• Predictive methods. In this class of methods (also called asymmetrical, supervised or direct) the aim is to describe one or more of the variables in relation to all the others. This is done by looking for rules of classification or prediction based on the data. These rules help predict or classify the future result of one or more response or target variables in relation to what happens to the explanatory or input variables. The main methods of this type are those developed in the field of machine learning such as neural networks (multilayer perceptrons) and decision trees, but also classic statistical models such as linear and logistic regression models.
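The contrast between the two classes can be sketched with two very small routines: a one-variable k-means clustering (descriptive, no target variable) and a least-squares line (predictive, one response explained by one input). Both are simplified stand-ins chosen for brevity, and all data are invented; they are not the implementations used in the book.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Tiny k-means on one variable (requires k >= 2).

    Centres start at evenly spaced positions in the sorted data.
    """
    ordered = sorted(values)
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # Keep the old centre if a group ends up empty.
        centres = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
    labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
    return centres, labels


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Descriptive: groups are not known beforehand.
centres, labels = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.3, 4.9])
print(labels)

# Predictive: a response variable is explained by an input variable.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept + slope * 5)
```

The clustering call receives only the observations, while the regression call needs a designated response, which is the asymmetry the two bullet points describe.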
Analysis of the data based on the chosen methods. Once the statistical methods have been specified they must be translated into appropriate algorithms for computing the results we need from the available data. Given the wide range of specialised and non-specialised software available for data mining, it is not necessary to develop ad hoc calculation algorithms for the most 'standard' applications. However, it is important that those managing the data mining process have a good understanding of the different available methods as well as of the different software solutions, so that they can adapt the process to the specific needs of the company and can correctly interpret the results of the analysis.
Evaluation and comparison of the methods used and choice of the final model for analysis. To produce a final decision it is necessary to choose the best 'model' from the various statistical methods available. The choice of model is based on the comparison of the results obtained. It may be that none of the methods used satisfactorily achieves the analysis aims. In this case it is necessary to specify a more appropriate method for the analysis. When evaluating the performance of a specific method, as well as diagnostic measures of a statistical type, other things must be considered such as the constraints on the business both in terms of time and resources, as well as the quality and the availability of data. In data mining it is not usually a good idea to use just one statistical method to analyse data. Each method has the potential to highlight aspects that may be ignored by other methods.
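A comparison of candidate models might look like the sketch below, which fits two simple models on training data and compares their mean squared error on held-out validation data. The two candidates, the error measure and the data are assumptions made for this example; as the paragraph notes, a real choice would also weigh business constraints and data quality.

```python
def mean_model(train_x, train_y):
    """Baseline candidate: always predict the training mean of the response."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


def line_model(train_x, train_y):
    """Second candidate: least-squares line y = intercept + slope * x."""
    n = len(train_x)
    mean_x, mean_y = sum(train_x) / n, sum(train_y) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented data, split into training and validation parts.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]
valid_x, valid_y = [7, 8], [14.2, 15.9]

candidates = {"mean": mean_model, "line": line_model}
scores = {
    name: mse(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)
print(scores, best)
```

Scoring on data not used for fitting gives a fairer comparison than scoring on the training data, since a more flexible model would otherwise tend to look better than it is.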
Interpretation of the chosen model and its use in the decision process. Data mining is not only data analysis, but also the integration of the results into the company decision process. Business knowledge, the extraction of rules and their use in the decision process allow us to move from the analytical phase to the production of a decision engine. Once the model has been chosen and tested with a data set, the classification rule can be generalised. For example, we will be able to distinguish which customers will be more profitable or to calibrate differentiated commercial policies for different target consumer groups, thereby increasing the profits of the company.
Having seen the benefits we can get from data mining, it is crucial to implement the process correctly in order to exploit it to its full potential. The inclusion of the data mining process in the company organisation must be done gradually, setting out realistic aims and looking at the results along the way. The final aim is for data mining to be fully integrated with the other activities that are used to support company decisions. This process of integration can be divided into four phases:
• Strategic phase. In this first phase we study the business procedures in order to identify where data mining could be most beneficial. The results at the end of this phase are the definition of the business objectives for a pilot data mining project and the definition of criteria to evaluate the project itself.
• Training phase. This phase allows us to evaluate the data mining activity more carefully. A pilot project is set up and the results are assessed using the objectives and the criteria established in the previous phase. A fundamental aspect of the implementation of a data mining procedure is the choice of the pilot project. It must be easy to use but also important enough to create interest.
• Creation phase. If the positive evaluation of the pilot project results in implementing a complete data mining system it will then be necessary to establish a detailed plan to reorganise the business procedure in order to include the data mining activity. More specifically, it will be necessary to reorganise the business database with the possible creation of a data warehouse; to develop the previous data mining prototype until we have an initial operational version and to allocate personnel and time to follow the project.
• Migration phase. At this stage all we need to do is to prepare the organisation appropriately so that the data mining process can be successfully integrated. This means teaching likely users the potential of the new system and increasing their trust in the benefits that the system will bring to the company. It also means constantly evaluating (and communicating) the results obtained from the data mining process.
(Continues...)
Excerpted from Applied Data Mining for Business and Industry by Paolo Giudici and Silvia Figini. Copyright © 2009 by John Wiley & Sons, Ltd. Excerpted by permission of John Wiley & Sons. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
The book:
• Introduces data mining methods and applications.
• Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods.
• Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining.
• Features detailed case studies based on applied projects within industry.
• Incorporates discussion of data mining software, with case studies analysed using R.
• Is accessible to anyone with a basic knowledge of statistics or data analysis.
• Includes an extensive bibliography and pointers to further reading within the text.
Since the publication of the first edition, the field has seen many dramatic changes. Eight new case studies and 70% new material bring this Second Edition of Applied Data Mining for Business and Industry completely up to date. All the methods described are either computational or of a statistical-modeling nature.