Preface
Variability, Information, and Prediction
The Curse of Dimensionality
The Two Extremes
Perspectives on the Curse
Sparsity
Exploding Numbers of Models
Multicollinearity and Concurvity
The Effect of Noise
Coping with the Curse
Selecting Design Points
Local Dimension
Parsimony
Two Techniques
The Bootstrap
Cross-Validation
Optimization and Search
Univariate Search
Multivariate Search
General Searches
Constraint Satisfaction and Combinatorial Search
Notes
Hammersley Points
Edgeworth Expansions for the Mean
Bootstrap Asymptotics for the Studentized Mean
Exercises
Local Smoothers
Early Smoothers
Transition to Classical Smoothers
Global Versus Local Approximations
LOESS
Kernel Smoothers
Statistical Function Approximation
The Concept of Kernel Methods and the Discrete Case
Kernels and Stochastic Designs: Density Estimation
Stochastic Designs: Asymptotics for Kernel Smoothers
Convergence Theorems and Rates for Kernel Smoothers
Kernel and Bandwidth Selection
Linear Smoothers
Nearest Neighbors
Applications of Kernel Regression
A Simulated Example
Ethanol Data
Exercises
Spline Smoothing
Interpolating Splines
Natural Cubic Splines
Smoothing Splines for Regression
Model Selection for Spline Smoothing
Spline Smoothing Meets Kernel Smoothing
Asymptotic Bias, Variance, and MISE for Spline Smoothers
Ethanol Data Example -- Continued
Splines Redux: Hilbert Space Formulation
Reproducing Kernels
Constructing an RKHS
Direct Sum Construction for Splines
Explicit Forms
Nonparametrics in Data Mining and Machine Learning
Simulated Comparisons
What Happens with Dependent Noise Models?
Higher Dimensions and the Curse of Dimensionality
Notes
Sobolev Spaces: Definition
Exercises
New Wave Nonparametrics
Additive Models
The Backfitting Algorithm
Concurvity and Inference
Nonparametric Optimality
Generalized Additive Models
Projection Pursuit Regression
Neural Networks
Backpropagation and Inference
Barron's Result and the Curse
Approximation Properties
Barron's Theorem: Formal Statement
Recursive Partitioning Regression
Growing Trees
Pruning and Selection
Regression
Bayesian Additive Regression Trees: BART
MARS
Sliced Inverse Regression
ACE and AVAS
Notes
Proof of Barron's Theorem
Exercises
Supervised Learning: Partition Methods
Multiclass Learning
Discriminant Analysis
Distance-Based Discriminant Analysis
Bayes Rules
Probability-Based Discriminant Analysis
Tree-Based Classifiers
Splitting Rules
Logic Trees
Random Forests
Support Vector Machines
Margins and Distances
Binary Classification and Risk
Prediction Bounds for Function Classes
Constructing SVM Classifiers
SVM Classification for Nonlinearly Separable Populations
SVMs in the General Nonlinear Case
Some Kernels Used in SVM Classification
Kernel Choice, SVMs and Model Selection
Support Vector Regression
Multiclass Support Vector Machines
Neural Networks
Notes
Hoeffding's Inequality
VC Dimension
Exercises
Alternative Nonparametrics
Ensemble Methods
Bayes Model Averaging
Bagging
Stacking
Boosting
Other Averaging Methods
Oracle Inequalities
Bayes Nonparametrics
Dirichlet Process Priors
Polya Tree Priors
Gaussian Process Priors
The Relevance Vector Machine
RVM Regression: Formal Description
RVM Classification
Hidden Markov Models -- Sequential Classification
Notes
Proof of Yang's Oracle Inequality
Proof of Lecue's Oracle Inequality
Exercises
Computational Comparisons
Computational Results: Classification
Comparison on Fisher's Iris Data
Comparison on Ripley's Data
Computational Results: Regression
Vapnik's sinc Function
Friedman's Function
Conclusions
Systematic Simulation Study
No Free Lunch
Exercises
Unsupervised Learning: Clustering
Centroid-Based Clustering
K-Means Clustering
Variants
Hierarchical Clustering
Agglomerative Hierarchical Clustering
Divisive Hierarchical Clustering
Theory for Hierarchical Clustering
Partitional Clustering
Model-Based Clustering
Graph-Theoretic Clustering
Spectral Clustering
Bayesian Clustering
Probabilistic Clustering
Hypothesis Testing
Computed Examples
Ripley's Data
Iris Data
Cluster Validation
Notes
Derivatives of Functions of a Matrix
Kruskal's Algorithm: Proof
Prim's Algorithm: Proof
Exercises
Learning in High Dimensions
Principal Components
Main Theorem
Key Properties
Extensions
Factor Analysis
Finding Λ and ψ
Finding K
Estimating Factor Scores
Projection Pursuit
Independent Components Analysis
Main Definitions
Key Results
Computational Approach
Nonlinear PCs and ICA
Nonlinear PCs
Nonlinear ICA
Geometric Summarization
Measuring Distances to an Algebraic Shape
Principal Curves and Surfaces
Supervised Dimension Reduction: Partial Least Squares
Simple PLS
PLS Procedures
Properties of PLS
Supervised Dimension Reduction: Sufficient Dimensions in Regression
Visualization I: Basic Plots
Elementary Visualization
Projections
Time Dependence
Visualization II: Transformations
Chernoff Faces
Multidimensional Scaling
Self-Organizing Maps
Exercises
Variable Selection
Concepts from Linear Regression
Subset Selection
Variable Ranking
Overview
Traditional Criteria
Akaike Information Criterion (AIC)
Bayesian Information Criterion (BIC)
Choices of Information Criteria
Cross Validation
Shrinkage Methods
Shrinkage Methods for Linear Models
Grouping in Variable Selection
Least Angle Regression
Shrinkage Methods for Model Classes
Cautionary Notes
Bayes Variable Selection
Prior Specification
Posterior Calculation and Exploration
Evaluating Evidence
Connections Between Bayesian and Frequentist Methods
Computational Comparisons
The n > p Case
When p > n
Notes
Code for Generating Data in Section 10.5
Exercises
Multiple Testing
Analyzing the Hypothesis Testing Problem
A Paradigmatic Setting
Counts for Multiple Tests
Measures of Error in Multiple Testing
Aspects of Error Control
Controlling the Familywise Error Rate
One-Step Adjustments
Stepwise p-Value Adjustments
PCER and PFER
Null Domination
Two Procedures
Controlling the Type I Error Rate
Adjusted p-Values for PFER/PCER
Controlling the False Discovery Rate
FDR and Other Measures of Error
The Benjamini-Hochberg Procedure
A BH Theorem for a Dependent Setting
Variations on BH
Controlling the Positive False Discovery Rate
Bayesian Interpretations
Aspects of Implementation
Bayesian Multiple Testing
Fully Bayes: Hierarchical
Fully Bayes: Decision Theory
Notes
Proof of the Benjamini-Hochberg Theorem
Proof of the Benjamini-Yekutieli Theorem
References
Index