Table of Contents
6
Editors, Reviewers, and Authors
12
Introduction
20
Section 1: Scientific Simulation
22
Chapter 1. GPU-Accelerated Computation and Interactive Display of Molecular Orbitals
26
1.1.Introduction, Problem Statement, and Context
26
1.2.Core Method
27
1.3.Algorithms, Implementations, and Evaluations
29
1.4.Final Evaluation
37
1.5.Future Directions
39
References
39
Chapter 2. Large-Scale Chemical Informatics on GPUs
40
2.1.Introduction, Problem Statement, and Context
40
2.2.Core Methods
43
2.3.Gaussian Shape Overlay: Parallelization and Arithmetic Optimization
43
2.4.LINGO: Algorithmic Transformation and Memory Optimization
48
2.5.Final Evaluation
51
2.6.Future Directions
54
Acknowledgments
54
References
55
Chapter 3. Dynamical Quadrature Grids: Applications in Density Functional Calculations
56
3.1.Introduction
56
3.2.Core Method
57
3.3.Implementation
58
3.4.Performance Improvement
60
3.5.Future Work
62
References
63
Chapter 4. Fast Molecular Electrostatics Algorithms on GPUs
64
4.1.Introduction, Problem Statement, and Context
64
4.2.Core Method
66
4.3.Algorithms, Implementations, and Evaluations
66
4.4.Final Evaluation
75
4.5.Future Directions
79
References
79
Chapter 5. Quantum Chemistry: Propagation of Electronic Structure on a GPU
80
5.1.Problem Statement
80
5.2.Core Technology and Algorithm
82
5.3.The Key Insight on the Implementation—the Choice of Building Blocks
86
5.4.Final Evaluation and Benefits
90
5.5.Conclusions and Future Directions
93
Acknowledgments
93
References
94
Chapter 6. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm
96
6.1.Introduction, Problem Statement, and Context
96
6.2.Core Methods
97
6.3.Algorithms and Implementations
99
6.4.Evaluation and Validation of Results, Total Benefits, and Limitations
109
6.5.Future Directions
113
Acknowledgments
113
References
113
Chapter 7. Leveraging the Untapped Computation Power of GPUs: Fast Spectral Synthesis Using Texture Interpolation
114
7.1.Background and Problem Statement
114
7.2.Flux Calculation and Aggregation
116
7.3.The GRASSY Platform
118
7.4.Initial Testing
121
7.5.Impact and Future Directions
122
Acknowledgments
122
References
123
Chapter 8. Black Hole Simulations with CUDA
124
8.1.Introduction
124
8.2.The Post-Newtonian Approximation
125
8.3.Numerical Algorithm
126
8.4.GPU Implementation
127
8.5.Performance Results
128
8.6.GPU Supercomputing Clusters
128
8.7.Statistical Results for Black Hole Inspirals
130
8.8.Conclusion
130
Acknowledgments
131
References
131
Chapter 9. Treecode and Fast Multipole Method for N-Body Simulation with CUDA
134
9.1.Introduction
134
9.2.Fast N-Body Simulation
135
9.3.CUDA Implementation of the Fast N-Body Algorithms
137
9.4.Improvements of Performance
141
9.5.Detailed Description of the GPU Kernels
143
9.6.Overview of Advanced Techniques
150
9.7.Conclusions
152
References
152
Chapter 10. Wavelet-Based Density Functional Theory Calculation on Massively Parallel Hybrid Architectures
154
10.1.Introduction, Problem Statement, and Context
154
10.2.Core Method
156
10.3.Algorithms, Implementations, and Evaluations
159
10.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations
165
10.5.Conclusions and Future Directions
168
References
172
Section 2: Life Sciences
174
Chapter 11. Accurate Scanning of Sequence Databases with the Smith-Waterman Algorithm
176
11.1.Introduction, Problem Statement, and Context
176
11.2.Core Method
177
11.3.CUDA Implementation of the SW Algorithm for Identification of Homologous Proteins
177
11.4.Discussion
190
11.5.Final Evaluation
191
References
191
Chapter 12. Massive Parallel Computing to Accelerate Genome-Matching
194
12.1.Introduction, Problem Statement, and Context
194
12.2.Core Methods
195
12.3.Algorithms, Implementations, and Evaluations
197
12.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations
204
12.5.Future Directions
204
References
205
Chapter 13. GPU-Supercomputer Acceleration of Pattern Matching
206
13.1.Introduction, Problem Statement, and Context
206
13.2.Core Method
207
13.3.Algorithms, Implementations, and Evaluations
208
13.4.Final Evaluation
214
13.5.Future Direction
217
Acknowledgments
217
Appendix
217
References
219
Chapter 14. GPU Accelerated RNA Folding Algorithm
220
14.1.Problem Statement
220
14.2.Core Method
221
14.3.Algorithms, Implementations, and Evaluations
222
14.4.Final Evaluation
228
14.5.Future Directions
230
References
230
Chapter 15. Temporal Data Mining for Neuroscience
232
15.1.Introduction
232
15.2.Core Methodology
233
15.3.GPU Parallelization: Algorithms and Implementations
235
15.4.Experimental Results
243
15.5.Discussion
247
References
248
Section 3: Statistical Modeling
250
Chapter 16. Parallelization Techniques for Random Number Generators
252
16.1.Introduction
252
16.2. L'Ecuyer's Multiple Recursive Generator MRG32k3a
253
16.3.Sobol Generator
256
16.4.Mersenne Twister MT19937
258
16.5.Performance Benchmarks
263
Acknowledgments
265
References
266
Chapter 17. Monte Carlo Photon Transport on the GPU
268
17.1.Physics of Photon Transport
268
17.2.Photon Transport on the GPU
270
17.3.The Complete System
277
17.4.Results and Evaluation
279
17.5.Future Directions
280
References
282
Chapter 18. High-Performance Iterated Function Systems
284
18.1.Problem Statement and Mathematical Background
284
18.2.Core Technology
287
18.3.Implementation
287
18.4.Final Evaluation
291
18.5.Conclusion
293
References
293
Section 4: Emerging Data-Intensive Applications
296
Chapter 19. Large-Scale Machine Learning
298
19.1.Introduction
298
19.2.Core Technology
299
19.3.GPU Algorithm and Implementation
301
19.4.Improvements of Performance
308
19.5.Conclusions and Future Work
311
Acknowledgments
312
References
312
Chapter 20. Multiclass Support Vector Machine
314
20.1.Introduction, Problem Statement, and Context
314
20.2.Core Method
315
20.3.Algorithms, Implementations, and Evaluations
317
20.4.Final Evaluation
327
20.5.Future Direction
331
References
331
Chapter 21. Template-Driven Agent-Based Modeling and Simulation with CUDA
334
21.1.Introduction, Problem Statement, and Context
334
21.2.Final Evaluation and Validation of Results
341
21.3.Conclusions, Benefits and Limitations, and Future Work
344
References
345
Chapter 22. GPU-Accelerated Ant Colony Optimization
346
22.1.Introduction, Problem Statement, and Context
346
22.2.Core Method
347
22.3.Algorithms, Implementations, and Evaluations
348
22.4.Final Evaluation
358
22.5.Future Direction
360
Acknowledgments
361
References
361
Section 5: Electronic Design Automation
362
Chapter 23. High-Performance Gate-Level Simulation with GP-GPUs
364
23.1.Introduction
364
23.2.Simulator Overview
366
23.3.Compilation and Simulation
368
23.4.Experimental Results
376
23.5.Future Directions
383
Related Work
384
References
384
Chapter 24. GPU-Based Parallel Computing for Fast Circuit Optimization
386
24.1.Introduction, Problem Statement, and Context
386
24.2.Core Method
388
24.3.Algorithms, Implementations, and Evaluations
390
24.4.Final Evaluation
394
24.5.Future Direction
397
References
399
Section 6: Ray Tracing and Rendering
400
Chapter 25. Lattice Boltzmann Lighting Models
402
25.1.Introduction, Problem Statement, and Context
402
25.2.Core Methods
403
25.3.Algorithms, Implementation, and Evaluation
404
25.4.Final Evaluation
414
25.5.Future Directions
416
25.6.Derivation of the Diffusion Equation
416
Acknowledgments
419
References
419
Chapter 26. Path Regeneration for Random Walks
422
26.1.Introduction
422
26.2.Path Tracing as Case Study
423
26.3.Random Walks in Path Tracing
423
26.4.Implementation Details
427
26.5.Results
429
26.6.Discussion
432
Acknowledgments
432
References
433
Chapter 27. From Sparse Mocap to Highly Detailed Facial Animation
434
27.1.System Overview
434
27.2.Background
435
27.3.Core Technology and Algorithms
435
27.4.Future Directions
446
Acknowledgments
447
References
447
Chapter 28. A Programmable Graphics Pipeline in CUDA for Order-Independent Transparency
448
28.1.Introduction, Problem Statement, and Context
448
28.2.Core Method
449
28.3.Algorithms, Implementations, and Evaluations
449
28.4.Final Evaluation
454
28.5.Future Direction
456
References
456
Section 7: Computer Vision
458
Chapter 29. Fast Graph Cuts for Computer Vision
460
29.1.Introduction, Problem Statement, and Context
460
29.2.Core Method
460
29.3.Algorithms, Implementations, and Evaluations
461
29.4.Final Evaluation and Validation of Results
468
29.5.Multilabel Graph Cuts
469
References
471
Chapter 30. Visual Saliency Model on Multi-GPU
472
30.1.Introduction
472
30.2.Visual Saliency Model
473
30.3.GPU Implementation
475
30.4.Results
487
30.5.Conclusion
492
References
492
Chapter 31. Real-Time Stereo on GPGPU Using Progressive Multiresolution Adaptive Windows
494
31.1.Introduction, Problem Statement, and Context
494
31.2.Core Method
496
References
515
Chapter 32. Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU
518
32.1.Introduction
518
32.2.Methods
520
32.3.Implementation
526
32.4.Results and Discussion
528
32.5.Conclusion and Future Work
534
References
535
Chapter 33. Haar Classifiers for Object Detection with CUDA
538
33.1.Introduction
538
33.2.Viola-Jones Object Detection Retrospective
538
33.3.Object Detection Pipeline with NVIDIA CUDA
547
33.4.Benchmarking and Implementation Details
562
33.5.Future Direction
564
33.6.Conclusion
564
References
564
Section 8: Video and Image Processing
566
Chapter 34. Experiences on Image and Video Processing with CUDA and OpenCL
568
34.1.Introduction, Problem Statement, and Background
568
34.2.Core Technology or Algorithm
569
34.3.Key Insights from Implementation and Evaluation
572
34.4.Final Evaluation
586
References
588
Chapter 35. Connected Component Labeling in CUDA
590
35.1.Introduction
590
35.2.Core Algorithm
591
35.3.CUDA Algorithm and Implementation
593
35.4.Final Evaluation and Results
598
References
602
Chapter 36. Image De-Mosaicing
604
36.1.Introduction, Problem Statement, and Context
604
36.2.Core Method
606
36.3.Algorithms, Implementations, and Evaluations
606
36.4.Final Evaluation
618
References
619
Section 9: Signal and Audio Processing
620
Chapter 37. Efficient Automatic Speech Recognition on the GPU
622
37.1.Introduction, Problem Statement, and Context
622
37.2.Core Methods
624
37.3.Algorithms, Implementations, and Evaluations
625
37.4.Conclusion and Future Directions
636
References
638
Chapter 38. Parallel LDPC Decoding
640
38.1.Introduction, Problem Statement, and Context
640
38.2.Core Technology
641
38.3.Algorithms, Implementations, and Evaluations
643
38.4.Final Evaluation
647
38.5.Future Directions
648
References
648
Chapter 39. Large-Scale Fast Fourier Transform
650
39.1.Introduction
650
39.2.Memory Hierarchy of GPU Clusters
652
39.3.Large-Scale Fast Fourier Transform
654
39.4.Algebraic Manipulation of Array Dimensions
656
39.5.Performance Results
660
39.6.Conclusion and Future Work
660
References
663
Section 10: Medical Imaging
664
Chapter 40. GPU Acceleration of Iterative Digital Breast Tomosynthesis
668
40.1.Introduction
668
40.2.Digital Breast Tomosynthesis
670
40.3.Accelerating Iterative DBT using GPUs
671
40.4.Conclusions
677
Acknowledgments
677
References
678
Chapter 41. Parallelization of Katsevich CT Image Reconstruction Algorithm on Generic Multi-Core Processors and GPGPU
680
41.1.Introduction, Problem, and Context
680
41.2.Core Methods
680
41.3.Algorithms, Implementations, and Evaluations
682
41.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations
693
41.5.Related Work
696
41.6.Future Directions
697
41.7.Summary
697
References
697
Chapter 42. 3-D Tomographic Image Reconstruction from Randomly Ordered Lines with CUDA
700
42.1.Introduction
700
42.2.Core Methods
703
42.3.Implementation
705
42.4.Evaluation and Validation of Results, Total Benefits, and Limitations
707
42.5.Future Directions
711
References
712
Chapter 43. Using GPUs to Learn Effective Parameter Settings for GPU-Accelerated Iterative CT Reconstruction Algorithms
714
43.1.Introduction, Problem Statement, and Context
714
43.2.Core Method(s)
715
43.3.Algorithms, Implementations, and Evaluations
716
43.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations
721
43.5.Future Directions
727
References
728
Chapter 44. Using GPUs to Accelerate Advanced MRI Reconstruction with Field Inhomogeneity Compensation
730
44.1.Introduction
730
44.2.Core Method: Advanced Image Reconstruction Toolbox for MRI
731
44.3.MRI Reconstruction Algorithms and Implementation on GPUs
734
44.4.Final Results and Evaluation
740
44.5.Conclusion and Future Directions
741
References
742
Chapter 45. ℓ1 Minimization in ℓ1-SPIRiT Compressed Sensing MRI Reconstruction
744
45.1.Introduction, Problem Statement, and Context
744
45.2.Core Methods (High Level Description)
747
45.3.Algorithms, Implementations, and Evaluations (Detailed Description)
748
45.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations
754
45.5.Discussion and Conclusion
756
References
756
Chapter 46. Medical Image Processing Using GPU-Accelerated ITK Image Filters
758
46.1.Introduction
758
46.2.Core Methods
758
46.3.Implementation
761
46.4.Results
767
46.5.Future Directions
769
46.6. Acknowledgments
769
References
770
Chapter 47. Deformable Volumetric Registration Using B-Splines
772
47.1.Introduction
772
47.2.An Overview of B-Spline Registration
773
47.3.Implementation Details
777
47.4.Results
788
47.5.Conclusions
790
References
790
Chapter 48. Multiscale Unbiased Diffeomorphic Atlas Construction on Multi-GPUs
792
48.1.Introduction, Problem Statement, and Context
792
48.2.Core Methods
795
48.3.Algorithms, Implementations, and Evaluations
796
48.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations
807
48.5.Future Directions
810
Acknowledgments
811
References
812
Chapter 49. GPU-Accelerated Brain Connectivity Reconstruction and Visualization in Large-Scale Electron Micrographs
814
49.1.Introduction
814
49.2.Core Methods
814
49.3.Implementation
818
49.4.Results
830
49.5.Future Directions
832
Acknowledgments
833
References
833
Chapter 50. Fast Simulation of Radiographic Images Using a Monte Carlo X-Ray Transport Algorithm Implemented in CUDA
834
50.1.Introduction, Problem Statement, and Context
834
50.2.Core Methods
835
50.3.Algorithms, Implementations, and Evaluations
836
50.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations
843
50.5.Future Directions
848
References
849
Index
852