Data Analysis, Machine Learning and Applications - Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs-Universität Freiburg, March 7-9, 2007

by: Christine Preisach, Hans Burkhardt, Lars Schmidt-Thieme, Reinhold Decker

Springer-Verlag, 2008

ISBN: 9783540782469, 719 Pages

Format: PDF, Read online

Copy protection: DRM

Suitable for: Windows PC, Mac OSX; all DRM-capable eReaders; Apple iPad, Android tablet PCs. Read online for: Windows PC, Mac OSX, Linux

Price: 149,79 EUR



Table of Contents


Preface

6

Contents

10

Part I Classification

19

Distance-based Kernels for Real-valued Data

20

1 Introduction

20

2 Kernels and similarities defined on real numbers

21

3 Semantics and applicability

22

4 Truncated Euclidean similarity

23

5 Canberra distance-based similarity

24

6 Kernels defined on real vectors

25

7 Conclusions

27

References

27

Fast Support Vector Machine Classification of Very Large Datasets

28

1 Introduction

28

2 Linear SVM trees

30

3 Non-linear extension

33

4 Experiments

33

5 Conclusion

34

References

35

Fusion of Multiple Statistical Classifiers

36

1 Introduction

36

2 Classifier fusion

37

3 Diversity of ensemble members

37

4 Combination rules

39

5 Open problems

41

6 Results of experiments

41

7 Conclusions

43

References

43

Calibrating Margin–based Classifier Scores into Polychotomous Probabilities

46

1 Introduction

46

2 Reduction to binary problems

47

3 Coupling probability estimates

47

4 Dirichlet calibration

48

5 Comparison

50

6 Conclusion

53

References

53

Classification with Invariant Distance Substitution Kernels

54

1 Introduction

54

2 Background

55

3 Adjustable invariance

57

4 Positive definiteness

58

5 Classification experiments

60

6 Conclusion

61

References

61

Applying the Kohonen Self-organizing Map Networks to Select Variables

62

1 Introduction

62

2 A proposition to reduce the number of variables

63

3 Applications and results

67

4 Conclusions

69

References

71

Computer Assisted Classification of Brain Tumors

72

1 Introduction

72

2 Algorithms

73

3 Results

75

4 Conclusions

76

References

76

Model Selection in Mixture Regression Analysis – A Monte Carlo Simulation Study

78

1 Introduction

78

2 Model selection in mixture models

79

3 Simulation design

80

4 Results summary

81

5 Key contributions and future research directions

83

References

84

Comparison of Local Classification Methods

86

1 Introduction

86

2 Local classification methods

87

3 Simulation study

89

4 Summary

93

References

93

Incorporating Domain Specific Information into Gaia Source Classification

94

1 Introduction

94

2 Classification and parametrization

95

3 Classification results

96

4 Summary

99

References

100

Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis

102

1 Introduction

102

2 Characteristics of the HINoV method and its modifications

103

3 Simulation models

104

4 Discussion on the simulation results

105

5 Conclusions

107

References

109

Part II Clustering

110

Families of Dendrograms

112

1 Introduction

112

2 A brief introduction to p-adic geometry

113

3 p-adic dendrograms

115

4 The space of dendrograms

116

5 Distributions on dendrograms

117

6 Hidden vertices

118

7 Conclusions

118

Acknowledgements

119

References

119

Mixture Models in Forward Search Methods for Outlier Detection

120

1 Introduction

120

2 The Forward Search

121

3 Forward Search and Normal Mixture Models: the graphical approach

122

4 Forward Search and Normal Mixture Models: the inferential approach

123

5 Concluding remarks and open issues

126

References

127

On Multiple Imputation Through Finite Gaussian Mixture Models

128

1 Introduction

128

2 Multiple imputation

129

3 Label switching

132

4 Simulation study and results

133

References

134

Mixture Model Based Group Inference in Fused Genotype and Phenotype Data

136

1 Introduction

136

2 Methods

137

3 Results

140

4 Discussion

142

5 Acknowledgements

142

References

142

The Noise Component in Model-based Cluster Analysis

144

1 Introduction

144

2 Two variations on the noise component

149

3 Some theory

150

4 The EM-algorithm

152

5 Simulations

152

6 Conclusion

154

References

154

An Artificial Life Approach for Semi-supervised Learning

156

1 Introduction

156

2 Artificial life

157

3 Semi-supervised artificial life

159

4 Semi-Supervised artificial life for cluster analysis

160

5 Experimental settings and results

160

6 Discussion

161

7 Summary

162

References

163

Hard and Soft Euclidean Consensus Partitions

164

1 Introduction

164

2 Theory

166

3 Applications

168

References

170

Rationale Models for Conceptual Modeling

172

1 Subjectivism in the modeling process

172

2 The design rationale approach

173

3 Classification of rationale fragments

175

4 Conclusion

178

References

179

Measures of Dispersion and Cluster-Trees for Categorical Data

180

1 Motivation

180

2 Measures of dispersion

181

3 Segmentation

185

References

186

Information Integration of Partially Labeled Data

188

1 Introduction

188

2 Related work

189

3 Four problem classes

189

4 Method

191

5 Evaluation

194

6 Conclusion

195

References

196

Part III Multidimensional Data Analysis

198

Data Mining of an On-line Survey - A Market Research Application

200

1 Introduction

200

2 Data and objectives

200

3 Methodology and results

201

4 Conclusions

207

References

208

Nonlinear Constrained Principal Component Analysis in the Quality Control Framework

210

1 Introduction

210

2 Constrained principal component analysis

211

3 Nonlinear Constrained Principal Component Analysis

212

4 Stability analysis

214

5 Results and interpretation

214

6 Concluding remarks

216

References

217

Non Parametric Control Chart by Multivariate Additive Partial Least Squares via Spline

218

1 Introduction

218

2 Multivariate control charts based on projection methods

219

3 Application: monitoring the painting process of hot-rolled aluminium foils

222

4 Conclusion

224

References

224

Simple Non Symmetrical Correspondence Analysis

226

1 Introduction

226

2 Non symmetrical correspondence analysis

227

3 Simple non symmetrical correspondence analysis

228

4 Father’s and son’s occupations data

230

5 Conclusions

232

References

234

Factorial Analysis of a Set of Contingency Tables

236

1 Introduction

236

2 Methodology

237

3 Application

240

4 Discussion

242

5 Software notes

243

References

243

Part IV Analysis of Complex Data

244

Graph Mining: Repository vs. Canonical Form

246

1 Introduction

246

2 Canonical form pruning

247

3 Repository of processed subgraphs

248

4 Comparison

250

5 Experiments

251

6 Summary

252

References

253

Classification and Retrieval of Ancient Watermarks

254

1 Introduction

254

2 Feature extraction

255

3 Results

257

4 Conclusion

261

References

261

Segmentation and Classification of Hyper-Spectral Skin Data

262

1 Introduction

262

2 Labelling

263

3 Classification

265

4 Results

266

5 Conclusion

268

References

269

FSMTree: An Efficient Algorithm for Mining Frequent Temporal Patterns

270

1 Introduction

270

2 Foundations and related work

271

3 Algorithms FSMSet and FSMTree

273

4 Performance evaluation and conclusions

276

References

277

A Matlab Toolbox for Music Information Retrieval

278

1 Motivation and approach

278

2 Feature extraction

279

3 Data analysis

282

4 Application to the study of music and emotion

283

References

284

A Probabilistic Relational Model for Characterizing Situations in Dynamic Multi-Agent Systems

286

1 Introduction

286

2 Framework for modeling and recognizing situations

287

3 Modeling situations

288

4 Recognizing situations

289

5 Evaluation

291

6 Conclusions and further work

292

References

293

Applying the Qn Estimator Online

294

1 Introduction

294

2 An update algorithm for the Qn and the HL estimator

295

3 Comparative study

299

4 Conclusions

300

References

301

A Comparative Study on Polyphonic Musical Time Series Using MCMC Methods

302

1 Introduction

302

2 Polyphonic model

302

3 Extended polyphonic model

304

4 Results

305

5 Conclusion

308

References

308

Collective Classification for Labeling of Places and Objects in 2D and 3D Range Data

310

1 Introduction

310

2 Related work

311

3 Collective classification

311

4 Feature extraction in 2D maps

313

5 Feature selection

313

6 Experiments

314

7 Conclusions

315

8 Acknowledgment

317

References

317

Lag or Error? - Detecting the Nature of Spatial Correlation

318

1 Introduction

318

2 Model and test statistics

319

3 Monte Carlo study

322

4 Results

322

References

324

Part V Exploratory Data Analysis and Tools for Data Analysis

326

Urban Data Mining Using Emergent SOM

328

1 Introduction

328

2 Inspection and transformation of data

329

3 Method

330

4 Results

332

5 Conclusion

333

References

335

The Konstanz Information Miner

336

1 Overview

336

2 Architecture

337

3 Repository

341

4 Extending

342

5 Conclusion

342

References

343

A Pattern Based Data Mining Approach

344

1 Current situation in data mining

344

2 Introduction to patterns

345

3 Some data mining patterns

347

4 Summary and outlook

349

References

351

A Framework for Statistical Entity Identification in

352

1 Introduction

352

2 Methodological framework

353

3 Implementation

354

4 Conclusion and future work

358

References

359

Combining Several SOM Approaches in Data Mining: Application to ADSL Customer Behaviours Analysis

360

1 Introduction

360

2 Network measurements and data description

361

3 Customer segmentation

363

4 Conclusion

370

References

371

On the Analysis of Irregular Stock Market Trading Behavior

372

1 Introduction

372

2 Irregular trading behavior in a market

373

3 Analysis of trading behavior with complex valued Eigensystem analysis

374

4 Analysis of the dataset

375

5 Conclusion

378

References

379

A Procedure to Estimate Relations in a Balanced Scorecard

380

1 Related work

380

2 Balanced scorecards

381

3 Model

383

4 Case study

384

5 Results

385

6 Conclusion and outlook

387

References

387

The Application of Taxonomies in the Context of Configurative Reference Modelling

390

1 Introduction

390

2 Configurative Reference Modelling and the application of taxonomies

391

3 Conclusion

395

4 Outlook

396

References

396

Two-Dimensional Centrality of a Social Network

398

1 Introduction

398

2 The procedure

399

3 The analysis and the result

399

4 Discussion

401

References

405

Benchmarking Open-Source Tree Learners in R/RWeka

406

1 Introduction

406

2 Design of the benchmark experiment

407

3 Results of the benchmark experiment

409

4 Discussion and further work

412

References

413

From Spelling Correction to Text Cleaning – Using Context Information

414

1 Introduction

414

2 Linguistics and context sensitivity

415

3 Framework for text preparation

416

4 Experimental results

419

5 Conclusion and future work

420

References

421

Root Cause Analysis for Quality Management

422

1 Introduction

422

2 Root Cause Analysis

424

3 Computational results

427

4 Conclusion

428

References

429

Finding New Technological Ideas and Inventions with Text Mining and Technique Philosophy

430

1 Introduction

430

2 A common structure for raw and context information

431

3 Relevant aspects for the text mining approach from technique philosophy

433

4 A text mining approach for new ideas and inventions

435

5 Evaluation and outlook

436

6 Acknowledge

436

References

437

Investigating Classifier Learning Behavior with Experiment Databases

438

1 Introduction

438

2 A database for classification experiments

439

3 The experiments

440

4 Using the database

441

5 Conclusions

445

References

445

Part VI Marketing and Management Science

446

Conjoint Analysis for Complex Services Using Clusterwise Hierarchical Bayes Procedures

448

1 Introduction

448

2 Preference measurement for services

449

3 Hierarchical Bayes procedures for conjoint analysis

449

4 Empirical investigation

450

5 Conclusion and outlook

453

References

454

Building an Association Rules Framework for Target Marketing

456

1 Introduction

456

2 A segment-specific view of cross-category associations

457

3 Methodology

458

4 Empirical application

460

5 Conclusion and future work

463

References

463

AHP versus ACA – An Empirical Comparison

464

1 Preference measurement for complex products

464

2 The Analytic Hierarchy Process – AHP

465

3 Design of the empirical study

467

4 Results

468

5 Conclusions and outlook

470

References

471

On the Properties of the Rank Based Multivariate Exponentially Weighted Moving Average Control Charts

472

1 Introduction

472

2 Data depth

472

3 The proposed control chart

473

4 Effect of the reference sample size on control charts performance

475

5 Conclusion

478

Acknowledgements

479

References

479

Are Critical Incidents Really Critical for a Customer Relationship? A MIMIC Approach

480

1 Introduction

480

2 Hypotheses

481

3 Method

483

4 Results

483

5 Discussion

485

References

486

Heterogeneity in the Satisfaction-Retention Relationship – A Finite-mixture Approach

488

1 Introduction

488

2 The Model

490

3 Discussion

494

References

494

An Early-Warning System to Support Activities in the Management of Customer Equity and How to Obtain the Most from Spatial Customer Equity Potentials

496

1 Introduction

496

2 Strategic customer control dimensions

497

3 Early-warning system

500

4 Empirical example

502

5 Conclusion

503

References

503

Classifying Contemporary Marketing Practices

506

1 Introduction

506

2 Knowledge on interactive marketing

507

3 A Finite Mixture approach for classifying marketing practices

508

4 Empirical application

510

5 Conclusions

513

References

513

Part VII Banking and Finance

514

Predicting Stock Returns with Bayesian Vector Autoregressive Models

516

1 Introduction

516

2 Literature review

517

3 Model

518

4 Empirical study

519

5 Conclusion and outlook

522

References

523

The Evaluation of Venture-Backed IPOs – Certification Model versus Adverse Selection Model, Which Does Fit Better?

524

1 Introduction

524

2 The theoretical background: the certification model and the adverse selection model

525

3 Data set and non-parametric hypothesis tests

526

4 Multivariate investigation tools: Partial Least squares regression model

527

5 Conclusion

530

Acknowledgments

530

References

530

Using Multiple SVM Models for Unbalanced Credit Scoring Data Sets

532

1 Introduction

532

2 SVM models for unbalanced data sets

533

3 Multiple SVM for unbalanced data sets in practice

534

4 Combination of SVM on random input subsets

536

5 Conclusions and outlook

538

References

539

Part VIII Business Intelligence

540

Comparison of Recommender System Algorithms Focusing on the New-item and User-bias Problem

542

1 Introduction

542

2 Related works

543

3 Observed approaches

544

4 Evaluation protocols

546

5 Evaluation and experimental results

546

6 Conclusion

548

References

549

Collaborative Tag Recommendations

550

1 Introduction

550

2 Related work

551

3 Recommender Systems

552

4 Tag Recommender Systems

553

5 Experimental setup and results

554

6 Conclusions

556

7 Acknowledgments

557

References

557

Applying Small Sample Test Statistics for Behavior-based Recommendations

558

1 Introduction

558

2 The ideal decision maker: The decision maker without preferences

559

3 Library meta catalogs: An exemplary application area

560

4 Mathematical notation

561

5 POSICI: Probability Of Single Item Co-Inspections

561

6 POMICI: Probability Of Multiple Items Co-Inspections

562

7 POSICI vs. POMICI

564

8 Conclusions and further research

564

References

565

Part IX Text Mining, Web Mining, and the Semantic Web

568

Classifying Number Expressions in German Corpora

570

1 Introduction

570

2 Classification of number expressions

571

3 Experimental evaluation

574

4 Conclusions and future work

576

References

577

Non-Profit Web Portals - Usage Based Benchmarking for Success Evaluation

578

1 Introduction

578

2 Related work

579

3 Method

580

4 Case study

583

5 Conclusions

584

References

585

Text Mining of Supreme Administrative Court Jurisdictions

586

1 Introduction

586

2 Administrative Supreme Court jurisdictions

587

3 Investigations

587

4 Conclusion

592

References

593

Supporting Web-based Address Extraction with Unsupervised Tagging

594

1 Introduction

594

2 Data preparation

596

3 Unsupervised tagging

596

4 Experiments and evaluation

597

5 Conclusion and further work

600

References

600

A Two-Stage Approach for Context-Dependent Hypernym Extraction

602

1 Introduction

602

2 Document clustering

603

3 Hypernym extraction

604

4 Evaluation

606

5 Conclusion and future work

609

References

609

Analysis of Dwell Times in Web Usage Mining

610

1 Introduction

610

2 Model specification and estimation

611

3 Real life example

614

4 Conclusion

615

References

617

New Issues in Near-duplicate Detection

618

1 Introduction

618

2 Fingerprint construction

620

3 Wikipedia as evaluation corpus

623

4 Summary

625

References

625

Comparing the University of South Florida Homograph Norms with Empirical Corpus Data

628

1 Introduction

628

2 Resources

629

3 Approach

630

4 Results and discussion

632

5 Conclusions and future work

634

Acknowledgments

635

References

635

Content-based Dimensionality Reduction for Recommender Systems

636

1 Introduction

636

2 Related work

637

3 The proposed approach

637

4 Performance study

641

5 Conclusions

643

References

643

Part X Linguistics

644

The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages

646

1 General situation

646

2 Special situation

647

3 The bias

649

4 Solution and operationalization

651

5 Discussion

651

6 Conclusions

652

References

652

Quantitative Text Analysis Using L-, F- and T-Segments

654

1 Introduction

654

2 Data

655

3 Distribution of segment types

656

4 Length distribution of L-segments

657

5 TTR studies

659

6 Conclusion

661

References

661

Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering

664

1 Introduction

664

2 Background and motivation

665

3 Bootstrapping clustering

667

4 Clustering with noise

667

5 Projecting to geography

668

6 Results

669

7 Discussion

669

Acknowledgments

670

References

670

Structural Differentiae of Text Types – A Quantitative Model

672

1 Introduction

672

2 Category selection

673

3 The evaluation procedure

674

4 Exploring the structural homogeneity of text types by means of the Iterative Categorisation Procedure (ICP)

675

5 Results

676

6 Discussion

676

7 Conclusion

678

References

679

Part XI Data Analysis in Humanities

680

Scenario Evaluation Using Two-mode Clustering Approaches in Higher Education

682

1 Introduction: Scenario analysis

682

2 Two-Mode clustering (for scenario evaluation)

683

3 Example: Scenario evaluation in higher education

685

4 Conclusions

688

References

688

Visualization and Clustering of Tagged Music Data

690

1 Introduction

690

2 Related work

691

3 Emergent Self Organizing Maps

691

4 Data

692

5 Experimental results

694

6 Conclusion and future work

696

References

696

Effects of Data Transformation on Cluster Analysis of Archaeometric Data

698

1 Introduction

698

2 Data transformation in archaeometry

699

3 Transformation into ranks

700

4 Distances and cluster analysis

701

5 Romano-British vessel glass classified

702

6 Roman bricks and tiles classified

703

7 Summary

704

References

704

Fuzzy PLS Path Modeling: A New Tool For Handling Sensory Data

706

1 Introduction

706

2 Fuzzy PLS path modeling

707

3 Application

710

4 Conclusion

712

References

713

Automatic Analysis of Dewey Decimal Classification Notations

714

1 Introduction

714

2 DDC notations

715

3 Automatic analysis of DDC notations

716

4 Results

719

5 Conclusion

720

References

721

A New Interval Data Distance Based on the Wasserstein Metric

722

1 Introduction

722

2 A brief survey of the existing distances

723

3 Our proposal: Wasserstein distance

724

4 Dynamic clustering algorithm using different criterion functions

726

5 Conclusion and perspectives

727

References

728

Keywords

730

Author Index

734