Data Analysis, Machine Learning and Applications - Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs-Universität Freiburg, March 7-9, 2007

Edited by: Christine Preisach, Hans Burkhardt, Lars Schmidt-Thieme, Reinhold Decker

Springer-Verlag, 2008

ISBN: 9783540782469, 719 pages

Format: PDF, Read online

Copy protection: DRM

Read online for: Windows PC, Mac OS X, Linux

Preface: 6
Contents: 10
Part I Classification: 19
Distance-based Kernels for Real-valued Data: 20
1 Introduction: 20
2 Kernels and similarities defined on real numbers: 21
3 Semantics and applicability: 22
4 Truncated Euclidean similarity: 23
5 Canberra distance-based similarity: 24
6 Kernels defined on real vectors: 25
7 Conclusions: 27
References: 27
Fast Support Vector Machine Classification of Very Large Datasets: 28
1 Introduction: 28
2 Linear SVM trees: 30
3 Non-linear extension: 33
4 Experiments: 33
5 Conclusion: 34
References: 35
Fusion of Multiple Statistical Classifiers: 36
1 Introduction: 36
2 Classifier fusion: 37
3 Diversity of ensemble members: 37
4 Combination rules: 39
5 Open problems: 41
6 Results of experiments: 41
7 Conclusions: 43
References: 43
Calibrating Margin-based Classifier Scores into Polychotomous Probabilities: 46
1 Introduction: 46
2 Reduction to binary problems: 47
3 Coupling probability estimates: 47
4 Dirichlet calibration: 48
5 Comparison: 50
6 Conclusion: 53
References: 53
Classification with Invariant Distance Substitution Kernels: 54
1 Introduction: 54
2 Background: 55
3 Adjustable invariance: 57
4 Positive definiteness: 58
5 Classification experiments: 60
6 Conclusion: 61
References: 61
Applying the Kohonen Self-organizing Map Networks to Select Variables: 62
1 Introduction: 62
2 A proposition to reduce the number of variables: 63
3 Applications and results: 67
4 Conclusions: 69
References: 71
Computer Assisted Classification of Brain Tumors: 72
1 Introduction: 72
2 Algorithms: 73
3 Results: 75
4 Conclusions: 76
References: 76
Model Selection in Mixture Regression Analysis – A Monte Carlo Simulation Study: 78
1 Introduction: 78
2 Model selection in mixture models: 79
3 Simulation design: 80
4 Results summary: 81
5 Key contributions and future research directions: 83
References: 84
Comparison of Local Classification Methods: 86
1 Introduction: 86
2 Local classification methods: 87
3 Simulation study: 89
4 Summary: 93
References: 93
Incorporating Domain Specific Information into Gaia Source Classification: 94
1 Introduction: 94
2 Classification and parametrization: 95
3 Classification results: 96
4 Summary: 99
References: 100
Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis: 102
1 Introduction: 102
2 Characteristics of the HINoV method and its modifications: 103
3 Simulation models: 104
4 Discussion on the simulation results: 105
5 Conclusions: 107
References: 109
Part II Clustering: 110
Families of Dendrograms: 112
1 Introduction: 112
2 A brief introduction to p-adic geometry: 113
3 p-adic dendrograms: 115
4 The space of dendrograms: 116
5 Distributions on dendrograms: 117
6 Hidden vertices: 118
7 Conclusions: 118
Acknowledgements: 119
References: 119
Mixture Models in Forward Search Methods for Outlier Detection: 120
1 Introduction: 120
2 The Forward Search: 121
3 Forward Search and Normal Mixture Models: the graphical approach: 122
4 Forward Search and Normal Mixture Models: the inferential approach: 123
5 Concluding remarks and open issues: 126
References: 127
On Multiple Imputation Through Finite Gaussian Mixture Models: 128
1 Introduction: 128
2 Multiple imputation: 129
3 Label switching: 132
4 Simulation study and results: 133
References: 134
Mixture Model Based Group Inference in Fused Genotype and Phenotype Data: 136
1 Introduction: 136
2 Methods: 137
3 Results: 140
4 Discussion: 142
5 Acknowledgements: 142
References: 142
The Noise Component in Model-based Cluster Analysis: 144
1 Introduction: 144
2 Two variations on the noise component: 149
3 Some theory: 150
4 The EM-algorithm: 152
5 Simulations: 152
6 Conclusion: 154
References: 154
An Artificial Life Approach for Semi-supervised Learning: 156
1 Introduction: 156
2 Artificial life: 157
3 Semi-supervised artificial life: 159
4 Semi-Supervised artificial life for cluster analysis: 160
5 Experimental settings and results: 160
6 Discussion: 161
7 Summary: 162
References: 163
Hard and Soft Euclidean Consensus Partitions: 164
1 Introduction: 164
2 Theory: 166
3 Applications: 168
References: 170
Rationale Models for Conceptual Modeling: 172
1 Subjectivism in the modeling process: 172
2 The design rationale approach: 173
3 Classification of rationale fragments: 175
4 Conclusion: 178
References: 179
Measures of Dispersion and Cluster-Trees for Categorical Data: 180
1 Motivation: 180
2 Measures of dispersion: 181
3 Segmentation: 185
References: 186
Information Integration of Partially Labeled Data: 188
1 Introduction: 188
2 Related work: 189
3 Four problem classes: 189
4 Method: 191
5 Evaluation: 194
6 Conclusion: 195
References: 196
Part III Multidimensional Data Analysis: 198
Data Mining of an On-line Survey - A Market Research Application: 200
1 Introduction: 200
2 Data and objectives: 200
3 Methodology and results: 201
4 Conclusions: 207
References: 208
Nonlinear Constrained Principal Component Analysis in the Quality Control Framework: 210
1 Introduction: 210
2 Constrained principal component analysis: 211
3 Nonlinear Constrained Principal Component Analysis: 212
4 Stability analysis: 214
5 Results and interpretation: 214
6 Concluding remarks: 216
References: 217
Non Parametric Control Chart by Multivariate Additive Partial Least Squares via Spline: 218
1 Introduction: 218
2 Multivariate control charts based on projection methods: 219
3 Application: monitoring the painting process of hot-rolled aluminium foils: 222
4 Conclusion: 224
References: 224
Simple Non Symmetrical Correspondence Analysis: 226
1 Introduction: 226
2 Non symmetrical correspondence analysis: 227
3 Simple non symmetrical correspondence analysis: 228
4 Father’s and son’s occupations data: 230
5 Conclusions: 232
References: 234
Factorial Analysis of a Set of Contingency Tables: 236
1 Introduction: 236
2 Methodology: 237
3 Application: 240
4 Discussion: 242
5 Software notes: 243
References: 243
Part IV Analysis of Complex Data: 244
Graph Mining: Repository vs. Canonical Form: 246
1 Introduction: 246
2 Canonical form pruning: 247
3 Repository of processed subgraphs: 248
4 Comparison: 250
5 Experiments: 251
6 Summary: 252
References: 253
Classification and Retrieval of Ancient Watermarks: 254
1 Introduction: 254
2 Feature extraction: 255
3 Results: 257
4 Conclusion: 261
References: 261
Segmentation and Classification of Hyper-Spectral Skin Data: 262
1 Introduction: 262
2 Labelling: 263
3 Classification: 265
4 Results: 266
5 Conclusion: 268
References: 269
FSMTree: An Efficient Algorithm for Mining Frequent Temporal Patterns: 270
1 Introduction: 270
2 Foundations and related work: 271
3 Algorithms FSMSet and FSMTree: 273
4 Performance evaluation and conclusions: 276
References: 277
A Matlab Toolbox for Music Information Retrieval: 278
1 Motivation and approach: 278
2 Feature extraction: 279
3 Data analysis: 282
4 Application to the study of music and emotion: 283
References: 284
A Probabilistic Relational Model for Characterizing Situations in Dynamic Multi-Agent Systems: 286
1 Introduction: 286
2 Framework for modeling and recognizing situations: 287
3 Modeling situations: 288
4 Recognizing situations: 289
5 Evaluation: 291
6 Conclusions and further work: 292
References: 293
Applying the Qn Estimator Online: 294
1 Introduction: 294
2 An update algorithm for the Qn and the HL estimator: 295
3 Comparative study: 299
4 Conclusions: 300
References: 301
A Comparative Study on Polyphonic Musical Time Series Using MCMC Methods: 302
1 Introduction: 302
2 Polyphonic model: 302
3 Extended polyphonic model: 304
4 Results: 305
5 Conclusion: 308
References: 308
Collective Classification for Labeling of Places and Objects in 2D and 3D Range Data: 310
1 Introduction: 310
2 Related work: 311
3 Collective classification: 311
4 Feature extraction in 2D maps: 313
5 Feature selection: 313
6 Experiments: 314
7 Conclusions: 315
8 Acknowledgment: 317
References: 317
Lag or Error? - Detecting the Nature of Spatial Correlation: 318
1 Introduction: 318
2 Model and test statistics: 319
3 Monte Carlo study: 322
4 Results: 322
References: 324
Part V Exploratory Data Analysis and Tools for Data Analysis: 326
Urban Data Mining Using Emergent SOM: 328
1 Introduction: 328
2 Inspection and transformation of data: 329
3 Method: 330
4 Results: 332
5 Conclusion: 333
References: 335
The Konstanz Information Miner: 336
1 Overview: 336
2 Architecture: 337
3 Repository: 341
4 Extending: 342
5 Conclusion: 342
References: 343
A Pattern Based Data Mining Approach: 344
1 Current situation in data mining: 344
2 Introduction to patterns: 345
3 Some data mining patterns: 347
4 Summary and outlook: 349
References: 351
A Framework for Statistical Entity Identification in: 352
1 Introduction: 352
2 Methodological framework: 353
3 Implementation: 354
4 Conclusion and future work: 358
References: 359
Combining Several SOM Approaches in Data Mining: Application to ADSL Customer Behaviours Analysis: 360
1 Introduction: 360
2 Network measurements and data description: 361
3 Customer segmentation: 363
4 Conclusion: 370
References: 371
On the Analysis of Irregular Stock Market Trading Behavior: 372
1 Introduction: 372
2 Irregular trading behavior in a market: 373
3 Analysis of trading behavior with complex valued Eigensystem analysis: 374
4 Analysis of the dataset: 375
5 Conclusion: 378
References: 379
A Procedure to Estimate Relations in a Balanced Scorecard: 380
1 Related work: 380
2 Balanced scorecards: 381
3 Model: 383
4 Case study: 384
5 Results: 385
6 Conclusion and outlook: 387
References: 387
The Application of Taxonomies in the Context of Configurative Reference Modelling: 390
1 Introduction: 390
2 Configurative Reference Modelling and the application of taxonomies: 391
3 Conclusion: 395
4 Outlook: 396
References: 396
Two-Dimensional Centrality of a Social Network: 398
1 Introduction: 398
2 The procedure: 399
3 The analysis and the result: 399
4 Discussion: 401
References: 405
Benchmarking Open-Source Tree Learners in R/RWeka: 406
1 Introduction: 406
2 Design of the benchmark experiment: 407
3 Results of the benchmark experiment: 409
4 Discussion and further work: 412
References: 413
From Spelling Correction to Text Cleaning – Using Context Information: 414
1 Introduction: 414
2 Linguistics and context sensitivity: 415
3 Framework for text preparation: 416
4 Experimental results: 419
5 Conclusion and future work: 420
References: 421
Root Cause Analysis for Quality Management: 422
1 Introduction: 422
2 Root Cause Analysis: 424
3 Computational results: 427
4 Conclusion: 428
References: 429
Finding New Technological Ideas and Inventions with Text Mining and Technique Philosophy: 430
1 Introduction: 430
2 A common structure for raw and context information: 431
3 Relevant aspects for the text mining approach from technique philosophy: 433
4 A text mining approach for new ideas and inventions: 435
5 Evaluation and outlook: 436
6 Acknowledge: 436
References: 437
Investigating Classifier Learning Behavior with Experiment Databases: 438
1 Introduction: 438
2 A database for classification experiments: 439
3 The experiments: 440
4 Using the database: 441
5 Conclusions: 445
References: 445
Part VI Marketing and Management Science: 446
Conjoint Analysis for Complex Services Using Clusterwise Hierarchical Bayes Procedures: 448
1 Introduction: 448
2 Preference measurement for services: 449
3 Hierarchical Bayes procedures for conjoint analysis: 449
4 Empirical investigation: 450
5 Conclusion and outlook: 453
References: 454
Building an Association Rules Framework for Target Marketing: 456
1 Introduction: 456
2 A segment-specific view of cross-category associations: 457
3 Methodology: 458
4 Empirical application: 460
5 Conclusion and future work: 463
References: 463
AHP versus ACA – An Empirical Comparison: 464
1 Preference measurement for complex products: 464
2 The Analytic Hierarchy Process – AHP: 465
3 Design of the empirical study: 467
4 Results: 468
5 Conclusions and outlook: 470
References: 471
On the Properties of the Rank Based Multivariate Exponentially Weighted Moving Average Control Charts: 472
1 Introduction: 472
2 Data depth: 472
3 The proposed control chart: 473
4 Effect of the reference sample size on control charts performance: 475
5 Conclusion: 478
Acknowledgements: 479
References: 479
Are Critical Incidents Really Critical for a Customer Relationship? A MIMIC Approach: 480
1 Introduction: 480
2 Hypotheses: 481
3 Method: 483
4 Results: 483
5 Discussion: 485
References: 486
Heterogeneity in the Satisfaction-Retention Relationship – A Finite-mixture Approach: 488
1 Introduction: 488
2 The Model: 490
3 Discussion: 494
References: 494
An Early-Warning System to Support Activities in the Management of Customer Equity and How to Obtain the Most from Spatial Customer Equity Potentials: 496
1 Introduction: 496
2 Strategic customer control dimensions: 497
3 Early-warning system: 500
4 Empirical example: 502
5 Conclusion: 503
References: 503
Classifying Contemporary Marketing Practices: 506
1 Introduction: 506
2 Knowledge on interactive marketing: 507
3 A Finite Mixture approach for classifying marketing practices: 508
4 Empirical application: 510
5 Conclusions: 513
References: 513
Part VII Banking and Finance: 514
Predicting Stock Returns with Bayesian Vector Autoregressive Models: 516
1 Introduction: 516
2 Literature review: 517
3 Model: 518
4 Empirical study: 519
5 Conclusion and outlook: 522
References: 523
The Evaluation of Venture-Backed IPOs – Certification Model versus Adverse Selection Model, Which Does Fit Better?: 524
1 Introduction: 524
2 The theoretical background: the certification model and the adverse selection model: 525
3 Data set and non-parametric hypothesis tests: 526
4 Multivariate investigation tools: Partial Least squares regression model: 527
5 Conclusion: 530
Acknowledgments: 530
References: 530
Using Multiple SVM Models for Unbalanced Credit Scoring Data Sets: 532
1 Introduction: 532
2 SVM models for unbalanced data sets: 533
3 Multiple SVM for unbalanced data sets in practice: 534
4 Combination of SVM on random input subsets: 536
5 Conclusions and outlook: 538
References: 539
Part VIII Business Intelligence: 540
Comparison of Recommender System Algorithms Focusing on the New-item and User-bias Problem: 542
1 Introduction: 542
2 Related works: 543
3 Observed approaches: 544
4 Evaluation protocols: 546
5 Evaluation and experimental results: 546
6 Conclusion: 548
References: 549
Collaborative Tag Recommendations: 550
1 Introduction: 550
2 Related work: 551
3 Recommender Systems: 552
4 Tag Recommender Systems: 553
5 Experimental setup and results: 554
6 Conclusions: 556
7 Acknowledgments: 557
References: 557
Applying Small Sample Test Statistics for Behavior-based Recommendations: 558
1 Introduction: 558
2 The ideal decision maker: The decision maker without preferences: 559
3 Library meta catalogs: An exemplary application area: 560
4 Mathematical notation: 561
5 POSICI: Probability Of Single Item Co-Inspections: 561
6 POMICI: Probability Of Multiple Items Co-Inspections: 562
7 POSICI vs. POMICI: 564
8 Conclusions and further research: 564
References: 565
Part IX Text Mining, Web Mining, and the Semantic Web: 568
Classifying Number Expressions in German Corpora: 570
1 Introduction: 570
2 Classification of number expressions: 571
3 Experimental evaluation: 574
4 Conclusions and future work: 576
References: 577
Non-Profit Web Portals - Usage Based Benchmarking for Success Evaluation: 578
1 Introduction: 578
2 Related work: 579
3 Method: 580
4 Case study: 583
5 Conclusions: 584
References: 585
Text Mining of Supreme Administrative Court Jurisdictions: 586
1 Introduction: 586
2 Administrative Supreme Court jurisdictions: 587
3 Investigations: 587
4 Conclusion: 592
References: 593
Supporting Web-based Address Extraction with Unsupervised Tagging: 594
1 Introduction: 594
2 Data preparation: 596
3 Unsupervised tagging: 596
4 Experiments and evaluation: 597
5 Conclusion and further work: 600
References: 600
A Two-Stage Approach for Context-Dependent Hypernym Extraction: 602
1 Introduction: 602
2 Document clustering: 603
3 Hypernym extraction: 604
4 Evaluation: 606
5 Conclusion and future work: 609
References: 609
Analysis of Dwell Times in Web Usage Mining: 610
1 Introduction: 610
2 Model specification and estimation: 611
3 Real life example: 614
4 Conclusion: 615
References: 617
New Issues in Near-duplicate Detection: 618
1 Introduction: 618
2 Fingerprint construction: 620
3 Wikipedia as evaluation corpus: 623
4 Summary: 625
References: 625
Comparing the University of South Florida Homograph Norms with Empirical Corpus Data: 628
1 Introduction: 628
2 Resources: 629
3 Approach: 630
4 Results and discussion: 632
5 Conclusions and future work: 634
Acknowledgments: 635
References: 635
Content-based Dimensionality Reduction for Recommender Systems: 636
1 Introduction: 636
2 Related work: 637
3 The proposed approach: 637
4 Performance study: 641
5 Conclusions: 643
References: 643
Part X Linguistics: 644
The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages: 646
1 General situation: 646
2 Special situation: 647
3 The bias: 649
4 Solution and operationalization: 651
5 Discussion: 651
6 Conclusions: 652
References: 652
Quantitative Text Analysis Using L-, F- and T-Segments: 654
1 Introduction: 654
2 Data: 655
3 Distribution of segment types: 656
4 Length distribution of L-segments: 657
5 TTR studies: 659
6 Conclusion: 661
References: 661
Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering: 664
1 Introduction: 664
2 Background and motivation: 665
3 Bootstrapping clustering: 667
4 Clustering with noise: 667
5 Projecting to geography: 668
6 Results: 669
7 Discussion: 669
Acknowledgments: 670
References: 670
Structural Differentiae of Text Types – A Quantitative Model: 672
1 Introduction: 672
2 Category selection: 673
3 The evaluation procedure: 674
4 Exploring the structural homogeneity of text types by means of the Iterative Categorisation Procedure (ICP): 675
5 Results: 676
6 Discussion: 676
7 Conclusion: 678
References: 679
Part XI Data Analysis in Humanities: 680
Scenario Evaluation Using Two-mode Clustering Approaches in Higher Education: 682
1 Introduction: Scenario analysis: 682
2 Two-Mode clustering (for scenario evaluation): 683
3 Example: Scenario evaluation in higher education: 685
4 Conclusions: 688
References: 688
Visualization and Clustering of Tagged Music Data: 690
1 Introduction: 690
2 Related work: 691
3 Emergent Self Organizing Maps: 691
4 Data: 692
5 Experimental results: 694
6 Conclusion and future work: 696
References: 696
Effects of Data Transformation on Cluster Analysis of Archaeometric Data: 698
1 Introduction: 698
2 Data transformation in archaeometry: 699
3 Transformation into ranks: 700
4 Distances and cluster analysis: 701
5 Romano-British vessel glass classified: 702
6 Roman bricks and tiles classified: 703
7 Summary: 704
References: 704
Fuzzy PLS Path Modeling: A New Tool For Handling Sensory Data: 706
1 Introduction: 706
2 Fuzzy PLS path modeling: 707
3 Application: 710
4 Conclusion: 712
References: 713
Automatic Analysis of Dewey Decimal Classification Notations: 714
1 Introduction: 714
2 DDC notations: 715
3 Automatic analysis of DDC notations: 716
4 Results: 719
5 Conclusion: 720
References: 721
A New Interval Data Distance Based on the Wasserstein Metric: 722
1 Introduction: 722
2 A brief survey of the existing distances: 723
3 Our proposal: Wasserstein distance: 724
4 Dynamic clustering algorithm using different criterion functions: 726
5 Conclusion and perspectives: 727
References: 728
Keywords: 730
Author Index: 734
