Handbook on Analyzing Human Genetic Data - Computational Approaches and Software
Preface and Introduction
5
Contents
10
Population Genetics
14
1 Introduction
14
2 Within-Population Analyses
15
2.1 Genotype and Allele Frequencies
15
2.2 Maximum Likelihood Estimation
16
2.3 Inbreeding Coefficient
17
2.4 Testing for Hardy–Weinberg Equilibrium
19
2.5 Linkage Disequilibrium
21
2.6 Composite Linkage Disequilibrium
22
2.7 Testing for Linkage Equilibrium
23
2.8 Application to Data
24
3 Between-Population Analyses
29
3.1 F-statistics
29
3.2 Application to Data
33
4 Discussion
35
5 Web Resources
35
References
36
Haplotype Structure
37
1 Population Haplotype Structure
37
1.1 Haplotype Block Structure in Human Populations
37
1.2 Wright–Fisher Model
38
1.3 Coalescent Theory
40
2 Public Genotype/Haplotype Databases
42
2.1 International HapMap Project
43
Download Genotype Data from HapMap
44
2.2 The HapMap ENCODE Resequencing and Genotyping Project
45
2.2.1 Download ENCODE Genotype Data
46
2.3 Haplotype Simulation
46
3 Haploview
48
3.1 What is Haploview?
48
3.2 How to Download and Install Haploview
48
3.3 How to Run Haploview
49
3.3.1 How to Use HapMap Data in Haploview
49
3.3.2 How to Use Non-HapMap Genotype Data in Haploview
50
4 Haplotype Inference Methods
53
4.1 Clark's Algorithm
54
4.1.1 Software Usage
55
4.2 PHASE
58
4.2.1 PHASE Algorithm
59
4.2.2 Software Usage
60
4.3 HAPLOTYPER
65
4.3.1 Bayesian Model
65
4.3.2 Partition–Ligation (PL)
66
4.3.3 Software Usage
68
4.4 CHB
70
4.4.1 CHB Model
70
4.4.2 MCMC Sampling and Convergence
72
4.4.3 Software Usage
73
4.5 Comparison of Phasing Results
76
5 Estimation of Recombination Rate
76
5.1 LDhat
77
5.1.1 Composite Likelihood Estimation of
77
5.1.2 Likelihood Permutation Test
78
5.1.3 Software Usage
79
5.2 HOTSPOTTER
83
5.2.1 PAC Model
83
5.2.2 Computing the Conditional Distribution
85
5.2.3 Software Usage
86
Summary
88
Web Resources
89
References
89
Linkage Analysis of Qualitative Traits
92
1 Introduction
92
2 Model-Based Linkage Analysis
93
2.1 Phase-Known Pedigrees
93
2.2 Phase-Unknown Pedigrees
96
2.3 Linkage Analysis in General Case
98
2.4 Elston–Stewart Algorithm
99
3 Model-Free Linkage Analysis
101
3.1 Fundamental Principle of Model-Free Linkage Analysis
101
3.2 Measure of Genetic Similarity
102
3.3 Model-Free Linkage Analysis for Affected Sib Pairs
103
3.4 Multipoint Analysis for Affected Sib Pairs
106
3.5 Model-Free Linkage Analysis for General Pedigrees
108
3.5.1 Inheritance Vector
108
3.5.2 NPL Score When the Inheritance Vector Is Known
109
3.5.3 NPL Score When the Inheritance Vector Is Uncertain
110
3.6 Lander–Green Algorithm
111
4 Practical Examples
113
5 Identifying SNPs Responsible for a Linkage Signal
117
5.1 Assumptions and Definitions
117
5.2 Conditional Probability of Marker Data Given ASP
118
5.3 Relationship Between Disease Locus and Candidate SNP
119
5.4 Hypothesis Testing
120
5.5 Extension to Sibship Data and Nuclear Families
122
5.6 Summary
124
6 Comparison of Model-Based and Model-Free Linkage Analysis Methods
124
6.1 Software Packages for Linkage Analysis
126
Web Resources
126
References
127
Linkage Analysis of Quantitative Traits
130
1 Introduction and Description of Data
130
2 Methods
132
2.1 Classical Model-Based Linkage Analysis
134
2.2 Model-Free Haseman–Elston Regression Approach
138
2.3 Variance-Components Approaches
139
2.4 Model-Free Variance Regression
145
2.5 Multivariate Models
147
2.6 Joint Linkage and Association Analysis
149
3 Discussion
149
4 Web Resources
151
References
152
Markov Chain Monte Carlo Linkage Analysis Methods
157
1 Introduction
157
2 Test Data
159
2.1 Data from the Framingham study
159
2.2 Simulated data
160
3 MCMC Methods and Packages
161
4 Comparison of Methods
162
4.1 Analysis Strategies
162
4.1.1 Estimation of Segregation Models for TH
163
4.1.2 Linkage Analysis Based on Loki
164
4.1.3 Linkage Analysis Based on MORGAN
164
4.1.4 Linkage Analysis Based on SimWalk2
164
4.2 Comparison of the Three Linkage Analysis Software
165
4.2.1 Framingham Data
165
4.2.2 Simulated Data
169
5 Conclusions, Recommendations, and Other Considerations
173
6 Web Resources
177
References
177
Population-Based Association Studies
180
1 Introduction
180
2 The Data
181
2.1 Association of a Genetic Marker and a Disease
182
2.2 Testing for Association When No Population Stratification Is Present
184
2.3 False Positive Can Be Aroused When Population Stratification Is Present
186
3 Genome-Control Approach
186
4 Structured Association Approach
187
5 Methods Based on Principal Components (PC)
189
5.1 Mixture Model
190
5.2 Semi-Parametric Approach
192
5.3 Linear Model Approach
194
6 Discussion
195
Web Resources
196
References
197
Family-Based Association Studies
200
1 Introduction
201
2 Basic Notations
202
3 Qualitative Traits, Trios, Bi-Allelic Markers
203
3.1 Qualitative Traits, Trios, Multi-Allelic Markers
204
4 Family with Multiple Siblings
207
5 Families with Missing Parental Genotypes
210
6 Quantitative Phenotypes
217
7 Joint Analysis of Multiple Markers
221
8 Other Association Methods Using Family-Based Designs
228
8.1 General Pedigrees
228
8.2 Gene–Gene (GG) Interaction and Gene–Environment (GE) Interaction
230
9 Software Packages and Power Consideration
231
10 Discussion
236
References
240
Haplotype Association Analysis
250
1 Introduction
250
1.1 The FUSION Study
252
1.2 General Notation
253
2 Haplotype Analysis of Unrelated Samples
253
2.1 Cross-Sectional Studies
253
2.1.1 Analyses Using Phased Haplotypes
253
2.1.2 Analyses Using Unphased Haplotypes
255
2.1.3 Stability Issues in Haplotype Analysis
256
2.1.4 Modeling Interaction Effects
257
2.1.5 Haplotype Clustering
257
2.1.6 Software Packages
259
2.1.7 Software Application to FUSION Data
261
2.2 Cohort Studies
263
2.2.1 Software Packages
264
2.3 Case–Control Studies
264
2.3.1 Related Study Designs
267
2.3.2 Haplotype Similarity Analyses
268
2.3.3 Software Packages
270
2.3.4 Software Application to FUSION Data
271
3 Haplotype Analysis of Family-based Samples
273
3.1 Haplotype Approach of Horvath et al.
274
3.2 Haplotype Approach of Allen and Satten
276
3.3 Software Packages
279
4 Summary
280
Electronic-Database Information
281
References
282
Multiple Comparisons/Testing Issues
286
1 Introduction
286
2 Bonferroni Correction
287
3 False Discovery Rate
288
4 Randomization Testing
289
5 Single Experiment-Wise Test Statistic
291
6 Example Dataset: Parkinson Disease
292
7 Discussion
294
Web Resources
295
References
295
Estimating the Absolute Risk of Disease Associated with Identified Mutations
297
1 Introduction
297
2 Population-Based Cohort Studies
300
3 Case–Control Designs
302
4 Case–Control Family Study Design
303
5 Kin–Cohort Design
308
6 Discussion
312
References
312
Processing Large-Scale, High-Dimension Genetic and Gene Expression Data
314
1 Introduction
314
2 Data Management, Access and Workflow
316
3 Analysis Issues with High-Dimensional Data
318
3.1 Power
318
3.2 Data Trends and Unaccounted for Heterogeneity
320
3.3 Outliers and Transformations
320
4 Implementing a Standard First-Pass Analysis Pipeline
321
4.1 The Model – Common vs. Individual
321
4.2 Estimating Heritability
322
4.3 Ethnicity and Substructure
323
4.4 Multiplicity
323
5 High-Performance Computing
324
6 Further Recommendations for Efficiency Gains in GOGE Studies
326
7 Constructing Gene Networks to Enhance GWAS and GOGE Results
327
7.1 Constructing Weighted and Unweighted Co-Expression Networks
328
7.2 Using Genetics in Constructing Co-Expression Networks
329
7.3 Identifying Modules of Highly Interconnected Genes in Co-Expression Networks
329
8 Looking Toward the Future: Probabilistic Causal Networks
331
9 Summary
332
Web Resources
333
References
334
Index
338