Principles and Theory for Data Mining and Machine Learning

By: Bertrand Clarke, Ernest Fokoue, Hao Helen Zhang

Springer-Verlag, 2009

ISBN: 9780387981352, 793 pages

Format: PDF, Read online

Copy protection: DRM

Compatible with: Windows PC, Mac OS X; suitable for all DRM-capable eReaders, Apple iPad, and Android tablets. Read online on: Windows PC, Mac OS X, Linux.

Price: 287.83 EUR



Table of Contents


Preface    1

Variability, Information, and Prediction    16
The Curse of Dimensionality    18
The Two Extremes    19
Perspectives on the Curse    20
Sparsity    21
Exploding Numbers of Models    23
Multicollinearity and Concurvity    24
The Effect of Noise    25
Coping with the Curse    26
Selecting Design Points    26
Local Dimension    27
Parsimony    32
Two Techniques    33
The Bootstrap    33
Cross-Validation    42
Optimization and Search    47
Univariate Search    47
Multivariate Search    48
General Searches    49
Constraint Satisfaction and Combinatorial Search    50
Notes    53
Hammersley Points    53
Edgeworth Expansions for the Mean    54
Bootstrap Asymptotics for the Studentized Mean    56
Exercises    58

Local Smoothers    68
Early Smoothers    70
Transition to Classical Smoothers    74
Global Versus Local Approximations    75
LOESS    79
Kernel Smoothers    82
Statistical Function Approximation    83
The Concept of Kernel Methods and the Discrete Case    88
Kernels and Stochastic Designs: Density Estimation    93
Stochastic Designs: Asymptotics for Kernel Smoothers    96
Convergence Theorems and Rates for Kernel Smoothers    101
Kernel and Bandwidth Selection    105
Linear Smoothers    110
Nearest Neighbors    111
Applications of Kernel Regression    115
A Simulated Example    115
Ethanol Data    117
Exercises    122

Spline Smoothing    132
Interpolating Splines    132
Natural Cubic Splines    138
Smoothing Splines for Regression    141
Model Selection for Spline Smoothing    144
Spline Smoothing Meets Kernel Smoothing    145
Asymptotic Bias, Variance, and MISE for Spline Smoothers    146
Ethanol Data Example -- Continued    148
Splines Redux: Hilbert Space Formulation    151
Reproducing Kernels    153
Constructing an RKHS    156
Direct Sum Construction for Splines    161
Explicit Forms    164
Nonparametrics in Data Mining and Machine Learning    167
Simulated Comparisons    169
What Happens with Dependent Noise Models?    172
Higher Dimensions and the Curse of Dimensionality    174
Notes    178
Sobolev Spaces: Definition    178
Exercises    179

New Wave Nonparametrics    186
Additive Models    187
The Backfitting Algorithm    188
Concurvity and Inference    192
Nonparametric Optimality    195
Generalized Additive Models    196
Projection Pursuit Regression    199
Neural Networks    204
Backpropagation and Inference    207
Barron's Result and the Curse    212
Approximation Properties    213
Barron's Theorem: Formal Statement    215
Recursive Partitioning Regression    217
Growing Trees    219
Pruning and Selection    222
Regression    223
Bayesian Additive Regression Trees: BART    225
MARS    225
Sliced Inverse Regression    230
ACE and AVAS    233
Notes    235
Proof of Barron's Theorem    235
Exercises    239

Supervised Learning: Partition Methods    246
Multiclass Learning    248
Discriminant Analysis    250
Distance-Based Discriminant Analysis    251
Bayes Rules    256
Probability-Based Discriminant Analysis    260
Tree-Based Classifiers    264
Splitting Rules    264
Logic Trees    268
Random Forests    269
Support Vector Machines    277
Margins and Distances    277
Binary Classification and Risk    280
Prediction Bounds for Function Classes    283
Constructing SVM Classifiers    286
SVM Classification for Nonlinearly Separable Populations    294
SVMs in the General Nonlinear Case    297
Some Kernels Used in SVM Classification    303
Kernel Choice, SVMs and Model Selection    304
Support Vector Regression    305
Multiclass Support Vector Machines    308
Neural Networks    309
Notes    311
Hoeffding's Inequality    311
VC Dimension    312
Exercises    315

Alternative Nonparametrics    322
Ensemble Methods    323
Bayes Model Averaging    325
Bagging    327
Stacking    331
Boosting    333
Other Averaging Methods    341
Oracle Inequalities    343
Bayes Nonparametrics    349
Dirichlet Process Priors    349
Polya Tree Priors    351
Gaussian Process Priors    353
The Relevance Vector Machine    359
RVM Regression: Formal Description    360
RVM Classification    364
Hidden Markov Models -- Sequential Classification    367
Notes    369
Proof of Yang's Oracle Inequality    369
Proof of Lecue's Oracle Inequality    372
Exercises    374

Computational Comparisons    379
Computational Results: Classification    380
Comparison on Fisher's Iris Data    380
Comparison on Ripley's Data    383
Computational Results: Regression    390
Vapnik's sinc Function    391
Friedman's Function    403
Conclusions    406
Systematic Simulation Study    411
No Free Lunch    414
Exercises    416

Unsupervised Learning: Clustering    419
Centroid-Based Clustering    422
K-Means Clustering    423
Variants    426
Hierarchical Clustering    427
Agglomerative Hierarchical Clustering    428
Divisive Hierarchical Clustering    436
Theory for Hierarchical Clustering    440
Partitional Clustering    444
Model-Based Clustering    446
Graph-Theoretic Clustering    461
Spectral Clustering    466
Bayesian Clustering    472
Probabilistic Clustering    472
Hypothesis Testing    475
Computed Examples    477
Ripley's Data    479
Iris Data    489
Cluster Validation    494
Notes    498
Derivatives of Functions of a Matrix    498
Kruskal's Algorithm: Proof    498
Prim's Algorithm: Proof    499
Exercises    499

Learning in High Dimensions    506
Principal Components    508
Main Theorem    509
Key Properties    511
Extensions    513
Factor Analysis    515
Finding Λ and ψ    517
Finding K    519
Estimating Factor Scores    520
Projection Pursuit    521
Independent Components Analysis    524
Main Definitions    524
Key Results    526
Computational Approach    528
Nonlinear PCs and ICA    529
Nonlinear PCs    530
Nonlinear ICA    531
Geometric Summarization    531
Measuring Distances to an Algebraic Shape    532
Principal Curves and Surfaces    533
Supervised Dimension Reduction: Partial Least Squares    536
Simple PLS    536
PLS Procedures    537
Properties of PLS    539
Supervised Dimension Reduction: Sufficient Dimensions in Regression    540
Visualization I: Basic Plots    544
Elementary Visualization    547
Projections    554
Time Dependence    556
Visualization II: Transformations    559
Chernoff Faces    559
Multidimensional Scaling    560
Self-Organizing Maps    566
Exercises    573

Variable Selection    582
Concepts from Linear Regression    583
Subset Selection    585
Variable Ranking    588
Overview    590
Traditional Criteria    591
Akaike Information Criterion (AIC)    593
Bayesian Information Criterion (BIC)    596
Choices of Information Criteria    598
Cross Validation    600
Shrinkage Methods    612
Shrinkage Methods for Linear Models    614
Grouping in Variable Selection    628
Least Angle Regression    630
Shrinkage Methods for Model Classes    633
Cautionary Notes    644
Bayes Variable Selection    645
Prior Specification    648
Posterior Calculation and Exploration    656
Evaluating Evidence    660
Connections Between Bayesian and Frequentist Methods    663
Computational Comparisons    666
The n > p Case    666
When p > n    678
Notes    680
Code for Generating Data in Section 10.5    680
Exercises    684

Multiple Testing    692
Analyzing the Hypothesis Testing Problem    694
A Paradigmatic Setting    694
Counts for Multiple Tests    697
Measures of Error in Multiple Testing    698
Aspects of Error Control    700
Controlling the Familywise Error Rate    703
One-Step Adjustments    703
Stepwise p-Value Adjustments    706
PCER and PFER    708
Null Domination    709
Two Procedures    710
Controlling the Type I Error Rate    715
Adjusted p-Values for PFER/PCER    719
Controlling the False Discovery Rate    720
FDR and Other Measures of Error    722
The Benjamini-Hochberg Procedure    723
A BH Theorem for a Dependent Setting    724
Variations on BH    726
Controlling the Positive False Discovery Rate    732
Bayesian Interpretations    732
Aspects of Implementation    736
Bayesian Multiple Testing    740
Fully Bayes: Hierarchical    741
Fully Bayes: Decision Theory    744
Notes    749
Proof of the Benjamini-Hochberg Theorem    749
Proof of the Benjamini-Yekutieli Theorem    752

References    756
Index    785