Principles and Theory for Data Mining and Machine Learning

By: Bertrand Clarke, Ernest Fokoue, Hao Helen Zhang

Springer-Verlag, 2009

ISBN: 9780387981352, 793 pages

Format: PDF, Read online

Copy protection: DRM

Compatible with: Windows PC, Mac OS X; suitable for all DRM-capable eReaders, Apple iPad, and Android tablets. Read online on: Windows PC, Mac OS X, Linux.

Price: 287.83 EUR



Table of Contents


Preface    1

Variability, Information, and Prediction    16
The Curse of Dimensionality    18
The Two Extremes    19
Perspectives on the Curse    20
Sparsity    21
Exploding Numbers of Models    23
Multicollinearity and Concurvity    24
The Effect of Noise    25
Coping with the Curse    26
Selecting Design Points    26
Local Dimension    27
Parsimony    32
Two Techniques    33
The Bootstrap    33
Cross-Validation    42
Optimization and Search    47
Univariate Search    47
Multivariate Search    48
General Searches    49
Constraint Satisfaction and Combinatorial Search    50
Notes    53
Hammersley Points    53
Edgeworth Expansions for the Mean    54
Bootstrap Asymptotics for the Studentized Mean    56
Exercises    58

Local Smoothers    68
Early Smoothers    70
Transition to Classical Smoothers    74
Global Versus Local Approximations    75
LOESS    79
Kernel Smoothers    82
Statistical Function Approximation    83
The Concept of Kernel Methods and the Discrete Case    88
Kernels and Stochastic Designs: Density Estimation    93
Stochastic Designs: Asymptotics for Kernel Smoothers    96
Convergence Theorems and Rates for Kernel Smoothers    101
Kernel and Bandwidth Selection    105
Linear Smoothers    110
Nearest Neighbors    111
Applications of Kernel Regression    115
A Simulated Example    115
Ethanol Data    117
Exercises    122

Spline Smoothing    132
Interpolating Splines    132
Natural Cubic Splines    138
Smoothing Splines for Regression    141
Model Selection for Spline Smoothing    144
Spline Smoothing Meets Kernel Smoothing    145
Asymptotic Bias, Variance, and MISE for Spline Smoothers    146
Ethanol Data Example -- Continued    148
Splines Redux: Hilbert Space Formulation    151
Reproducing Kernels    153
Constructing an RKHS    156
Direct Sum Construction for Splines    161
Explicit Forms    164
Nonparametrics in Data Mining and Machine Learning    167
Simulated Comparisons    169
What Happens with Dependent Noise Models?    172
Higher Dimensions and the Curse of Dimensionality    174
Notes    178
Sobolev Spaces: Definition    178
Exercises    179

New Wave Nonparametrics    186
Additive Models    187
The Backfitting Algorithm    188
Concurvity and Inference    192
Nonparametric Optimality    195
Generalized Additive Models    196
Projection Pursuit Regression    199
Neural Networks    204
Backpropagation and Inference    207
Barron's Result and the Curse    212
Approximation Properties    213
Barron's Theorem: Formal Statement    215
Recursive Partitioning Regression    217
Growing Trees    219
Pruning and Selection    222
Regression    223
Bayesian Additive Regression Trees: BART    225
MARS    225
Sliced Inverse Regression    230
ACE and AVAS    233
Notes    235
Proof of Barron's Theorem    235
Exercises    239

Supervised Learning: Partition Methods    246
Multiclass Learning    248
Discriminant Analysis    250
Distance-Based Discriminant Analysis    251
Bayes Rules    256
Probability-Based Discriminant Analysis    260
Tree-Based Classifiers    264
Splitting Rules    264
Logic Trees    268
Random Forests    269
Support Vector Machines    277
Margins and Distances    277
Binary Classification and Risk    280
Prediction Bounds for Function Classes    283
Constructing SVM Classifiers    286
SVM Classification for Nonlinearly Separable Populations    294
SVMs in the General Nonlinear Case    297
Some Kernels Used in SVM Classification    303
Kernel Choice, SVMs and Model Selection    304
Support Vector Regression    305
Multiclass Support Vector Machines    308
Neural Networks    309
Notes    311
Hoeffding's Inequality    311
VC Dimension    312
Exercises    315

Alternative Nonparametrics    322
Ensemble Methods    323
Bayes Model Averaging    325
Bagging    327
Stacking    331
Boosting    333
Other Averaging Methods    341
Oracle Inequalities    343
Bayes Nonparametrics    349
Dirichlet Process Priors    349
Polya Tree Priors    351
Gaussian Process Priors    353
The Relevance Vector Machine    359
RVM Regression: Formal Description    360
RVM Classification    364
Hidden Markov Models -- Sequential Classification    367
Notes    369
Proof of Yang's Oracle Inequality    369
Proof of Lecue's Oracle Inequality    372
Exercises    374

Computational Comparisons    379
Computational Results: Classification    380
Comparison on Fisher's Iris Data    380
Comparison on Ripley's Data    383
Computational Results: Regression    390
Vapnik's sinc Function    391
Friedman's Function    403
Conclusions    406
Systematic Simulation Study    411
No Free Lunch    414
Exercises    416

Unsupervised Learning: Clustering    419
Centroid-Based Clustering    422
K-Means Clustering    423
Variants    426
Hierarchical Clustering    427
Agglomerative Hierarchical Clustering    428
Divisive Hierarchical Clustering    436
Theory for Hierarchical Clustering    440
Partitional Clustering    444
Model-Based Clustering    446
Graph-Theoretic Clustering    461
Spectral Clustering    466
Bayesian Clustering    472
Probabilistic Clustering    472
Hypothesis Testing    475
Computed Examples    477
Ripley's Data    479
Iris Data    489
Cluster Validation    494
Notes    498
Derivatives of Functions of a Matrix    498
Kruskal's Algorithm: Proof    498
Prim's Algorithm: Proof    499
Exercises    499

Learning in High Dimensions    506
Principal Components    508
Main Theorem    509
Key Properties    511
Extensions    513
Factor Analysis    515
Finding Λ and ψ    517
Finding K    519
Estimating Factor Scores    520
Projection Pursuit    521
Independent Components Analysis    524
Main Definitions    524
Key Results    526
Computational Approach    528
Nonlinear PCs and ICA    529
Nonlinear PCs    530
Nonlinear ICA    531
Geometric Summarization    531
Measuring Distances to an Algebraic Shape    532
Principal Curves and Surfaces    533
Supervised Dimension Reduction: Partial Least Squares    536
Simple PLS    536
PLS Procedures    537
Properties of PLS    539
Supervised Dimension Reduction: Sufficient Dimensions in Regression    540
Visualization I: Basic Plots    544
Elementary Visualization    547
Projections    554
Time Dependence    556
Visualization II: Transformations    559
Chernoff Faces    559
Multidimensional Scaling    560
Self-Organizing Maps    566
Exercises    573

Variable Selection    582
Concepts from Linear Regression    583
Subset Selection    585
Variable Ranking    588
Overview    590
Traditional Criteria    591
Akaike Information Criterion (AIC)    593
Bayesian Information Criterion (BIC)    596
Choices of Information Criteria    598
Cross Validation    600
Shrinkage Methods    612
Shrinkage Methods for Linear Models    614
Grouping in Variable Selection    628
Least Angle Regression    630
Shrinkage Methods for Model Classes    633
Cautionary Notes    644
Bayes Variable Selection    645
Prior Specification    648
Posterior Calculation and Exploration    656
Evaluating Evidence    660
Connections Between Bayesian and Frequentist Methods    663
Computational Comparisons    666
The n > p Case    666
When p > n    678
Notes    680
Code for Generating Data in Section 10.5    680
Exercises    684

Multiple Testing    692
Analyzing the Hypothesis Testing Problem    694
A Paradigmatic Setting    694
Counts for Multiple Tests    697
Measures of Error in Multiple Testing    698
Aspects of Error Control    700
Controlling the Familywise Error Rate    703
One-Step Adjustments    703
Stepwise p-Value Adjustments    706
PCER and PFER    708
Null Domination    709
Two Procedures    710
Controlling the Type I Error Rate    715
Adjusted p-Values for PFER/PCER    719
Controlling the False Discovery Rate    720
FDR and Other Measures of Error    722
The Benjamini-Hochberg Procedure    723
A BH Theorem for a Dependent Setting    724
Variations on BH    726
Controlling the Positive False Discovery Rate    732
Bayesian Interpretations    732
Aspects of Implementation    736
Bayesian Multiple Testing    740
Fully Bayes: Hierarchical    741
Fully Bayes: Decision Theory    744
Notes    749
Proof of the Benjamini-Hochberg Theorem    749
Proof of the Benjamini-Yekutieli Theorem    752

References    756
Index    785