Data Analysis, Machine Learning and Applications - Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs-Universität Freiburg, March 7-9, 2007
Preface
6
Contents
10
Part I Classification
19
Distance-based Kernels for Real-valued Data
20
1 Introduction
20
2 Kernels and similarities defined on real numbers
21
3 Semantics and applicability
22
4 Truncated Euclidean similarity
23
5 Canberra distance-based similarity
24
6 Kernels defined on real vectors
25
7 Conclusions
27
References
27
Fast Support Vector Machine Classification of Very Large Datasets
28
1 Introduction
28
2 Linear SVM trees
30
3 Non-linear extension
33
4 Experiments
33
5 Conclusion
34
References
35
Fusion of Multiple Statistical Classifiers
36
1 Introduction
36
2 Classifier fusion
37
3 Diversity of ensemble members
37
4 Combination rules
39
5 Open problems
41
6 Results of experiments
41
7 Conclusions
43
References
43
Calibrating Margin–based Classifier Scores into Polychotomous Probabilities
46
1 Introduction
46
2 Reduction to binary problems
47
3 Coupling probability estimates
47
4 Dirichlet calibration
48
5 Comparison
50
6 Conclusion
53
References
53
Classification with Invariant Distance Substitution Kernels
54
1 Introduction
54
2 Background
55
3 Adjustable invariance
57
4 Positive definiteness
58
5 Classification experiments
60
6 Conclusion
61
References
61
Applying the Kohonen Self-organizing Map Networks to Select Variables
62
1 Introduction
62
2 A proposition to reduce the number of variables
63
3 Applications and results
67
4 Conclusions
69
References
71
Computer Assisted Classification of Brain Tumors
72
1 Introduction
72
2 Algorithms
73
3 Results
75
4 Conclusions
76
References
76
Model Selection in Mixture Regression Analysis – A Monte Carlo Simulation Study
78
1 Introduction
78
2 Model selection in mixture models
79
3 Simulation design
80
4 Results summary
81
5 Key contributions and future research directions
83
References
84
Comparison of Local Classification Methods
86
1 Introduction
86
2 Local classification methods
87
3 Simulation study
89
4 Summary
93
References
93
Incorporating Domain Specific Information into Gaia Source Classification
94
1 Introduction
94
2 Classification and parametrization
95
3 Classification results
96
4 Summary
99
References
100
Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis
102
1 Introduction
102
2 Characteristics of the HINoV method and its modifications
103
3 Simulation models
104
4 Discussion on the simulation results
105
5 Conclusions
107
References
109
Part II Clustering
110
Families of Dendrograms
112
1 Introduction
112
2 A brief introduction to p-adic geometry
113
3 p-adic dendrograms
115
4 The space of dendrograms
116
5 Distributions on dendrograms
117
6 Hidden vertices
118
7 Conclusions
118
Acknowledgements
119
References
119
Mixture Models in Forward Search Methods for Outlier Detection
120
1 Introduction
120
2 The Forward Search
121
3 Forward Search and Normal Mixture Models: the graphical approach
122
4 Forward Search and Normal Mixture Models: the inferential approach
123
5 Concluding remarks and open issues
126
References
127
On Multiple Imputation Through Finite Gaussian Mixture Models
128
1 Introduction
128
2 Multiple imputation
129
3 Label switching
132
4 Simulation study and results
133
References
134
Mixture Model Based Group Inference in Fused Genotype and Phenotype Data
136
1 Introduction
136
2 Methods
137
3 Results
140
4 Discussion
142
5 Acknowledgements
142
References
142
The Noise Component in Model-based Cluster Analysis
144
1 Introduction
144
2 Two variations on the noise component
149
3 Some theory
150
4 The EM-algorithm
152
5 Simulations
152
6 Conclusion
154
References
154
An Artificial Life Approach for Semi-supervised Learning
156
1 Introduction
156
2 Artificial life
157
3 Semi-supervised artificial life
159
4 Semi-Supervised artificial life for cluster analysis
160
5 Experimental settings and results
160
6 Discussion
161
7 Summary
162
References
163
Hard and Soft Euclidean Consensus Partitions
164
1 Introduction
164
2 Theory
166
3 Applications
168
References
170
Rationale Models for Conceptual Modeling
172
1 Subjectivism in the modeling process
172
2 The design rationale approach
173
3 Classification of rationale fragments
175
4 Conclusion
178
References
179
Measures of Dispersion and Cluster-Trees for Categorical Data
180
1 Motivation
180
2 Measures of dispersion
181
3 Segmentation
185
References
186
Information Integration of Partially Labeled Data
188
1 Introduction
188
2 Related work
189
3 Four problem classes
189
4 Method
191
5 Evaluation
194
6 Conclusion
195
References
196
Part III Multidimensional Data Analysis
198
Data Mining of an On-line Survey - A Market Research Application
200
1 Introduction
200
2 Data and objectives
200
3 Methodology and results
201
4 Conclusions
207
References
208
Nonlinear Constrained Principal Component Analysis in the Quality Control Framework
210
1 Introduction
210
2 Constrained principal component analysis
211
3 Nonlinear Constrained Principal Component Analysis
212
4 Stability analysis
214
5 Results and interpretation
214
6 Concluding remarks
216
References
217
Non Parametric Control Chart by Multivariate Additive Partial Least Squares via Spline
218
1 Introduction
218
2 Multivariate control charts based on projection methods
219
3 Application: monitoring the painting process of hot-rolled aluminium foils
222
4 Conclusion
224
References
224
Simple Non Symmetrical Correspondence Analysis
226
1 Introduction
226
2 Non symmetrical correspondence analysis
227
3 Simple non symmetrical correspondence analysis
228
4 Father’s and son’s occupations data
230
5 Conclusions
232
References
234
Factorial Analysis of a Set of Contingency Tables
236
1 Introduction
236
2 Methodology
237
3 Application
240
4 Discussion
242
5 Software notes
243
References
243
Part IV Analysis of Complex Data
244
Graph Mining: Repository vs. Canonical Form
246
1 Introduction
246
2 Canonical form pruning
247
3 Repository of processed subgraphs
248
4 Comparison
250
5 Experiments
251
6 Summary
252
References
253
Classification and Retrieval of Ancient Watermarks
254
1 Introduction
254
2 Feature extraction
255
3 Results
257
4 Conclusion
261
References
261
Segmentation and Classification of Hyper-Spectral Skin Data
262
1 Introduction
262
2 Labelling
263
3 Classification
265
4 Results
266
5 Conclusion
268
References
269
FSMTree: An Efficient Algorithm for Mining Frequent Temporal Patterns
270
1 Introduction
270
2 Foundations and related work
271
3 Algorithms FSMSet and FSMTree
273
4 Performance evaluation and conclusions
276
References
277
A Matlab Toolbox for Music Information Retrieval
278
1 Motivation and approach
278
2 Feature extraction
279
3 Data analysis
282
4 Application to the study of music and emotion
283
References
284
A Probabilistic Relational Model for Characterizing Situations in Dynamic Multi-Agent Systems
286
1 Introduction
286
2 Framework for modeling and recognizing situations
287
3 Modeling situations
288
4 Recognizing situations
289
5 Evaluation
291
6 Conclusions and further work
292
References
293
Applying the Qn Estimator Online
294
1 Introduction
294
2 An update algorithm for the Qn and the HL estimator
295
3 Comparative study
299
4 Conclusions
300
References
301
A Comparative Study on Polyphonic Musical Time Series Using MCMC Methods
302
1 Introduction
302
2 Polyphonic model
302
3 Extended polyphonic model
304
4 Results
305
5 Conclusion
308
References
308
Collective Classification for Labeling of Places and Objects in 2D and 3D Range Data
310
1 Introduction
310
2 Related work
311
3 Collective classification
311
4 Feature extraction in 2D maps
313
5 Feature selection
313
6 Experiments
314
7 Conclusions
315
8 Acknowledgment
317
References
317
Lag or Error? - Detecting the Nature of Spatial Correlation
318
1 Introduction
318
2 Model and test statistics
319
3 Monte Carlo study
322
4 Results
322
References
324
Part V Exploratory Data Analysis and Tools for Data Analysis
326
Urban Data Mining Using Emergent SOM
328
1 Introduction
328
2 Inspection and transformation of data
329
3 Method
330
4 Results
332
5 Conclusion
333
References
335
The Konstanz Information Miner
336
1 Overview
336
2 Architecture
337
3 Repository
341
4 Extending
342
5 Conclusion
342
References
343
A Pattern Based Data Mining Approach
344
1 Current situation in data mining
344
2 Introduction to patterns
345
3 Some data mining patterns
347
4 Summary and outlook
349
References
351
A Framework for Statistical Entity Identification in
352
1 Introduction
352
2 Methodological framework
353
3 Implementation
354
4 Conclusion and future work
358
References
359
Combining Several SOM Approaches in Data Mining: Application to ADSL Customer Behaviours Analysis
360
1 Introduction
360
2 Network measurements and data description
361
3 Customer segmentation
363
4 Conclusion
370
References
371
On the Analysis of Irregular Stock Market Trading Behavior
372
1 Introduction
372
2 Irregular trading behavior in a market
373
3 Analysis of trading behavior with complex valued Eigensystem analysis
374
4 Analysis of the dataset
375
5 Conclusion
378
References
379
A Procedure to Estimate Relations in a Balanced Scorecard
380
1 Related work
380
2 Balanced scorecards
381
3 Model
383
4 Case study
384
5 Results
385
6 Conclusion and outlook
387
References
387
The Application of Taxonomies in the Context of Configurative Reference Modelling
390
1 Introduction
390
2 Configurative Reference Modelling and the application of taxonomies
391
3 Conclusion
395
4 Outlook
396
References
396
Two-Dimensional Centrality of a Social Network
398
1 Introduction
398
2 The procedure
399
3 The analysis and the result
399
4 Discussion
401
References
405
Benchmarking Open-Source Tree Learners in R/RWeka
406
1 Introduction
406
2 Design of the benchmark experiment
407
3 Results of the benchmark experiment
409
4 Discussion and further work
412
References
413
From Spelling Correction to Text Cleaning – Using Context Information
414
1 Introduction
414
2 Linguistics and context sensitivity
415
3 Framework for text preparation
416
4 Experimental results
419
5 Conclusion and future work
420
References
421
Root Cause Analysis for Quality Management
422
1 Introduction
422
2 Root Cause Analysis
424
3 Computational results
427
4 Conclusion
428
References
429
Finding New Technological Ideas and Inventions with Text Mining and Technique Philosophy
430
1 Introduction
430
2 A common structure for raw and context information
431
3 Relevant aspects for the text mining approach from technique philosophy
433
4 A text mining approach for new ideas and inventions
435
5 Evaluation and outlook
436
6 Acknowledge
436
References
437
Investigating Classifier Learning Behavior with Experiment Databases
438
1 Introduction
438
2 A database for classification experiments
439
3 The experiments
440
4 Using the database
441
5 Conclusions
445
References
445
Part VI Marketing and Management Science
446
Conjoint Analysis for Complex Services Using Clusterwise Hierarchical Bayes Procedures
448
1 Introduction
448
2 Preference measurement for services
449
3 Hierarchical Bayes procedures for conjoint analysis
449
4 Empirical investigation
450
5 Conclusion and outlook
453
References
454
Building an Association Rules Framework for Target Marketing
456
1 Introduction
456
2 A segment-specific view of cross-category associations
457
3 Methodology
458
4 Empirical application
460
5 Conclusion and future work
463
References
463
AHP versus ACA – An Empirical Comparison
464
1 Preference measurement for complex products
464
2 The Analytic Hierarchy Process – AHP
465
3 Design of the empirical study
467
4 Results
468
5 Conclusions and outlook
470
References
471
On the Properties of the Rank Based Multivariate Exponentially Weighted Moving Average Control Charts
472
1 Introduction
472
2 Data depth
472
3 The proposed control chart
473
4 Effect of the reference sample size on control charts performance
475
5 Conclusion
478
Acknowledgements
479
References
479
Are Critical Incidents Really Critical for a Customer Relationship? A MIMIC Approach
480
1 Introduction
480
2 Hypotheses
481
3 Method
483
4 Results
483
5 Discussion
485
References
486
Heterogeneity in the Satisfaction-Retention Relationship – A Finite-mixture Approach
488
1 Introduction
488
2 The Model
490
3 Discussion
494
References
494
An Early-Warning System to Support Activities in the Management of Customer Equity and How to Obtain the Most from Spatial Customer Equity Potentials
496
1 Introduction
496
2 Strategic customer control dimensions
497
3 Early-warning system
500
4 Empirical example
502
5 Conclusion
503
References
503
Classifying Contemporary Marketing Practices
506
1 Introduction
506
2 Knowledge on interactive marketing
507
3 A Finite Mixture approach for classifying marketing practices
508
4 Empirical application
510
5 Conclusions
513
References
513
Part VII Banking and Finance
514
Predicting Stock Returns with Bayesian Vector Autoregressive Models
516
1 Introduction
516
2 Literature review
517
3 Model
518
4 Empirical study
519
5 Conclusion and outlook
522
References
523
The Evaluation of Venture-Backed IPOs – Certification Model versus Adverse Selection Model, Which Does Fit Better?
524
1 Introduction
524
2 The theoretical background: the certification model and the adverse selection model
525
3 Data set and non-parametric hypothesis tests
526
4 Multivariate investigation tools: Partial Least squares regression model
527
5 Conclusion
530
Acknowledgments
530
References
530
Using Multiple SVM Models for Unbalanced Credit Scoring Data Sets
532
1 Introduction
532
2 SVM models for unbalanced data sets
533
3 Multiple SVM for unbalanced data sets in practice
534
4 Combination of SVM on random input subsets
536
5 Conclusions and outlook
538
References
539
Part VIII Business Intelligence
540
Comparison of Recommender System Algorithms Focusing on the New-item and User-bias Problem
542
1 Introduction
542
2 Related works
543
3 Observed approaches
544
4 Evaluation protocols
546
5 Evaluation and experimental results
546
6 Conclusion
548
References
549
Collaborative Tag Recommendations
550
1 Introduction
550
2 Related work
551
3 Recommender Systems
552
4 Tag Recommender Systems
553
5 Experimental setup and results
554
6 Conclusions
556
7 Acknowledgments
557
References
557
Applying Small Sample Test Statistics for Behavior-based Recommendations
558
1 Introduction
558
2 The ideal decision maker: The decision maker without preferences
559
3 Library meta catalogs: An exemplary application area
560
4 Mathematical notation
561
5 POSICI: Probability Of Single Item Co-Inspections
561
6 POMICI: Probability Of Multiple Items Co-Inspections
562
7 POSICI vs. POMICI
564
8 Conclusions and further research
564
References
565
Part IX Text Mining, Web Mining, and the Semantic Web
568
Classifying Number Expressions in German Corpora
570
1 Introduction
570
2 Classification of number expressions
571
3 Experimental evaluation
574
4 Conclusions and future work
576
References
577
Non-Profit Web Portals - Usage Based Benchmarking for Success Evaluation
578
1 Introduction
578
2 Related work
579
3 Method
580
4 Case study
583
5 Conclusions
584
References
585
Text Mining of Supreme Administrative Court Jurisdictions
586
1 Introduction
586
2 Administrative Supreme Court jurisdictions
587
3 Investigations
587
4 Conclusion
592
References
593
Supporting Web-based Address Extraction with Unsupervised Tagging
594
1 Introduction
594
2 Data preparation
596
3 Unsupervised tagging
596
4 Experiments and evaluation
597
5 Conclusion and further work
600
References
600
A Two-Stage Approach for Context-Dependent Hypernym Extraction
602
1 Introduction
602
2 Document clustering
603
3 Hypernym extraction
604
4 Evaluation
606
5 Conclusion and future work
609
References
609
Analysis of Dwell Times in Web Usage Mining
610
1 Introduction
610
2 Model specification and estimation
611
3 Real life example
614
4 Conclusion
615
References
617
New Issues in Near-duplicate Detection
618
1 Introduction
618
2 Fingerprint construction
620
3 Wikipedia as evaluation corpus
623
4 Summary
625
References
625
Comparing the University of South Florida Homograph Norms with Empirical Corpus Data
628
1 Introduction
628
2 Resources
629
3 Approach
630
4 Results and discussion
632
5 Conclusions and future work
634
Acknowledgments
635
References
635
Content-based Dimensionality Reduction for Recommender Systems
636
1 Introduction
636
2 Related work
637
3 The proposed approach
637
4 Performance study
641
5 Conclusions
643
References
643
Part X Linguistics
644
The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages
646
1 General situation
646
2 Special situation
647
3 The bias
649
4 Solution and operationalization
651
5 Discussion
651
6 Conclusions
652
References
652
Quantitative Text Analysis Using L-, F- and T-Segments
654
1 Introduction
654
2 Data
655
3 Distribution of segment types
656
4 Length distribution of L-segments
657
5 TTR studies
659
6 Conclusion
661
References
661
Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering
664
1 Introduction
664
2 Background and motivation
665
3 Bootstrapping clustering
667
4 Clustering with noise
667
5 Projecting to geography
668
6 Results
669
7 Discussion
669
Acknowledgments
670
References
670
Structural Differentiae of Text Types – A Quantitative Model
672
1 Introduction
672
2 Category selection
673
3 The evaluation procedure
674
4 Exploring the structural homogeneity of text types by means of the Iterative Categorisation Procedure (ICP)
675
5 Results
676
6 Discussion
676
7 Conclusion
678
References
679
Part XI Data Analysis in Humanities
680
Scenario Evaluation Using Two-mode Clustering Approaches in Higher Education
682
1 Introduction: Scenario analysis
682
2 Two-Mode clustering (for scenario evaluation)
683
3 Example: Scenario evaluation in higher education
685
4 Conclusions
688
References
688
Visualization and Clustering of Tagged Music Data
690
1 Introduction
690
2 Related work
691
3 Emergent Self Organizing Maps
691
4 Data
692
5 Experimental results
694
6 Conclusion and future work
696
References
696
Effects of Data Transformation on Cluster Analysis of Archaeometric Data
698
1 Introduction
698
2 Data transformation in archaeometry
699
3 Transformation into ranks
700
4 Distances and cluster analysis
701
5 Romano-British vessel glass classified
702
6 Roman bricks and tiles classified
703
7 Summary
704
References
704
Fuzzy PLS Path Modeling: A New Tool For Handling Sensory Data
706
1 Introduction
706
2 Fuzzy PLS path modeling
707
3 Application
710
4 Conclusion
712
References
713
Automatic Analysis of Dewey Decimal Classification Notations
714
1 Introduction
714
2 DDC notations
715
3 Automatic analysis of DDC notations
716
4 Results
719
5 Conclusion
720
References
721
A New Interval Data Distance Based on the Wasserstein Metric
722
1 Introduction
722
2 A brief survey of the existing distances
723
3 Our proposal: Wasserstein distance
724
4 Dynamic clustering algorithm using different criterion functions
726
5 Conclusion and perspectives
727
References
728
Keywords
730
Author Index
734