GPU Computing Gems: Emerald Edition

by: Wen-Mei W. Hwu

Elsevier Reference Works, 2011

ISBN: 9780123849892 , 889 Pages

Format: PDF, ePUB, Read online

Copy protection: DRM

Suitable for: Windows PC, Mac OS X; all DRM-capable eReaders; Apple iPad and Android tablet PCs; Apple iPod touch, iPhone, and Android smartphones. Read online for: Windows PC, Mac OS X, Linux

Price: 57,95 EUR




Table of Contents

6

Editors, Reviewers, and Authors

12

Introduction

20

Section 1: Scientific Simulation

22

Chapter 1. GPU-Accelerated Computation and Interactive Display of Molecular Orbitals

26

1.1.Introduction, Problem Statement, and Context

26

1.2.Core Method

27

1.3.Algorithms, Implementations, and Evaluations

29

1.4.Final Evaluation

37

1.5.Future Directions

39

References

39

Chapter 2. Large-Scale Chemical Informatics on GPUs

40

2.1.Introduction, Problem Statement, and Context

40

2.2.Core Methods

43

2.3.Gaussian Shape Overlay: Parallelization and Arithmetic Optimization

43

2.4.LINGO: Algorithmic Transformation and Memory Optimization

48

2.5.Final Evaluation

51

2.6.Future Directions

54

Acknowledgments

54

References

55

Chapter 3. Dynamical Quadrature Grids: Applications in Density Functional Calculations

56

3.1.Introduction

56

3.2.Core Method

57

3.3.Implementation

58

3.4.Performance Improvement

60

3.5.Future Work

62

References

63

Chapter 4. Fast Molecular Electrostatics Algorithms on GPUs

64

4.1.Introduction, Problem Statement, and Context

64

4.2.Core Method

66

4.3.Algorithms, Implementations, and Evaluations

66

4.4.Final Evaluation

75

4.5.Future Directions

79

References

79

Chapter 5. Quantum Chemistry: Propagation of Electronic Structure on a GPU

80

5.1.Problem Statement

80

5.2.Core Technology and Algorithm

82

5.3.The Key Insight on the Implementation—the Choice of Building Blocks

86

5.4.Final Evaluation and Benefits

90

5.5.Conclusions and Future Directions

93

Acknowledgments

93

References

94

Chapter 6. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm

96

6.1.Introduction, Problem Statement, and Context

96

6.2.Core Methods

97

6.3.Algorithms and Implementations

99

6.4.Evaluation and Validation of Results, Total Benefits, and Limitations

109

6.5.Future Directions

113

Acknowledgments

113

References

113

Chapter 7. Leveraging the Untapped Computation Power of GPUs: Fast Spectral Synthesis Using Texture Interpolation

114

7.1.Background and Problem Statement

114

7.2.Flux Calculation and Aggregation

116

7.3.The GRASSY Platform

118

7.4.Initial Testing

121

7.5.Impact and Future Directions

122

Acknowledgments

122

References

123

Chapter 8. Black Hole Simulations with CUDA

124

8.1.Introduction

124

8.2.The Post-Newtonian Approximation

125

8.3.Numerical Algorithm

126

8.4.GPU Implementation

127

8.5.Performance Results

128

8.6.GPU Supercomputing Clusters

128

8.7.Statistical Results for Black Hole Inspirals

130

8.8.Conclusion

130

Acknowledgments

131

References

131

Chapter 9. Treecode and Fast Multipole Method for N-Body Simulation with CUDA

134

9.1.Introduction

134

9.2.Fast N-Body Simulation

135

9.3.CUDA Implementation of the Fast N-Body Algorithms

137

9.4.Improvements of Performance

141

9.5.Detailed Description of the GPU Kernels

143

9.6.Overview of Advanced Techniques

150

9.7.Conclusions

152

References

152

Chapter 10. Wavelet-Based Density Functional Theory Calculation on Massively Parallel Hybrid Architectures

154

10.1.Introduction, Problem Statement, and Context

154

10.2.Core Method

156

10.3.Algorithms, Implementations, and Evaluations

159

10.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations

165

10.5.Conclusions and Future Directions

168

References

172

Section 2: Life Sciences

174

Chapter 11. Accurate Scanning of Sequence Databases with the Smith-Waterman Algorithm

176

11.1.Introduction, Problem Statement, and Context

176

11.2.Core Method

177

11.3.CUDA Implementation of the SW Algorithm for Identification of Homologous Proteins

177

11.4.Discussion

190

11.5.Final Evaluation

191

References

191

Chapter 12. Massive Parallel Computing to Accelerate Genome-Matching

194

12.1.Introduction, Problem Statement, and Context

194

12.2.Core Methods

195

12.3.Algorithms, Implementations, and Evaluations

197

12.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations

204

12.5.Future Directions

204

References

205

Chapter 13. GPU-Supercomputer Acceleration of Pattern Matching

206

13.1.Introduction, Problem Statement, and Context

206

13.2.Core Method

207

13.3.Algorithms, Implementations, and Evaluations

208

13.4.Final Evaluation

214

13.5.Future Direction

217

Acknowledgments

217

Appendix

217

References

219

Chapter 14. GPU Accelerated RNA Folding Algorithm

220

14.1.Problem Statement

220

14.2.Core Method

221

14.3.Algorithms, Implementations, and Evaluations

222

14.4.Final Evaluation

228

14.5.Future Directions

230

References

230

Chapter 15. Temporal Data Mining for Neuroscience

232

15.1.Introduction

232

15.2.Core Methodology

233

15.3.GPU Parallelization: Algorithms and Implementations

235

15.4.Experimental Results

243

15.5.Discussion

247

References

248

Section 3: Statistical Modeling

250

Chapter 16. Parallelization Techniques for Random Number Generators

252

16.1.Introduction

252

16.2.L'Ecuyer's Multiple Recursive Generator MRG32k3a

253

16.3.Sobol Generator

256

16.4.Mersenne Twister MT19937

258

16.5.Performance Benchmarks

263

Acknowledgments

265

References

266

Chapter 17. Monte Carlo Photon Transport on the GPU

268

17.1.Physics of Photon Transport

268

17.2.Photon Transport on the GPU

270

17.3.The Complete System

277

17.4.Results and Evaluation

279

17.5.Future Directions

280

References

282

Chapter 18. High-Performance Iterated Function Systems

284

18.1.Problem Statement and Mathematical Background

284

18.2.Core Technology

287

18.3.Implementation

287

18.4.Final Evaluation

291

18.5.Conclusion

293

References

293

Section 4: Emerging Data-Intensive Applications

296

Chapter 19. Large-Scale Machine Learning

298

19.1.Introduction

298

19.2.Core Technology

299

19.3.GPU Algorithm and Implementation

301

19.4.Improvements of Performance

308

19.5.Conclusions and Future Work

311

Acknowledgments

312

References

312

Chapter 20. Multiclass Support Vector Machine

314

20.1.Introduction, Problem Statement, and Context

314

20.2.Core Method

315

20.3.Algorithms, Implementations, and Evaluations

317

20.4.Final Evaluation

327

20.5.Future Direction

331

References

331

Chapter 21. Template-Driven Agent-Based Modeling and Simulation with CUDA

334

21.1.Introduction, Problem Statement, and Context

334

21.2.Final Evaluation and Validation of Results

341

21.3.Conclusions, Benefits and Limitations, and Future Work

344

References

345

Chapter 22. GPU-Accelerated Ant Colony Optimization

346

22.1.Introduction, Problem Statement, and Context

346

22.2.Core Method

347

22.3.Algorithms, Implementations, and Evaluations

348

22.4.Final Evaluation

358

22.5.Future Direction

360

Acknowledgments

361

References

361

Section 5: Electronic Design Automation

362

Chapter 23. High-Performance Gate-Level Simulation with GP-GPUs

364

23.1.Introduction

364

23.2.Simulator Overview

366

23.3.Compilation and Simulation

368

23.4.Experimental Results

376

23.5.Future Directions

383

Related Work

384

References

384

Chapter 24. GPU-Based Parallel Computing for Fast Circuit Optimization

386

24.1.Introduction, Problem Statement, and Context

386

24.2.Core Method

388

24.3.Algorithms, Implementations, and Evaluations

390

24.4.Final Evaluation

394

24.5.Future Direction

397

References

399

Section 6: Ray Tracing and Rendering

400

Chapter 25. Lattice Boltzmann Lighting Models

402

25.1.Introduction, Problem Statement, and Context

402

25.2.Core Methods

403

25.3.Algorithms, Implementation, and Evaluation

404

25.4.Final Evaluation

414

25.5.Future Directions

416

25.6.Derivation of the Diffusion Equation

416

Acknowledgments

419

References

419

Chapter 26. Path Regeneration for Random Walks

422

26.1.Introduction

422

26.2.Path Tracing as Case Study

423

26.3.Random Walks in Path Tracing

423

26.4.Implementation Details

427

26.5.Results

429

26.6.Discussion

432

Acknowledgments

432

References

433

Chapter 27. From Sparse Mocap to Highly Detailed Facial Animation

434

27.1.System Overview

434

27.2.Background

435

27.3.Core Technology and Algorithms

435

27.4.Future Directions

446

Acknowledgments

447

References

447

Chapter 28. A Programmable Graphics Pipeline in CUDA for Order-Independent Transparency

448

28.1.Introduction, Problem Statement, and Context

448

28.2.Core Method

449

28.3.Algorithms, Implementations, and Evaluations

449

28.4.Final Evaluation

454

28.5.Future Direction

456

References

456

Section 7: Computer Vision

458

Chapter 29. Fast Graph Cuts for Computer Vision

460

29.1.Introduction, Problem Statement, and Context

460

29.2.Core Method

460

29.3.Algorithms, Implementations, and Evaluations

461

29.4.Final Evaluation and Validation of Results

468

29.5.Multilabel Graph Cuts

469

References

471

Chapter 30. Visual Saliency Model on Multi-GPU

472

30.1.Introduction

472

30.2.Visual Saliency Model

473

30.3.GPU Implementation

475

30.4.Results

487

30.5.Conclusion

492

References

492

Chapter 31. Real-Time Stereo on GPGPU Using Progressive Multiresolution Adaptive Windows

494

31.1.Introduction, Problem Statement, and Context

494

31.2.Core Method

496

References

515

Chapter 32. Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU

518

32.1.Introduction

518

32.2.Methods

520

32.3.Implementation

526

32.4.Results and Discussion

528

32.5.Conclusion and Future Work

534

References

535

Chapter 33. Haar Classifiers for Object Detection with CUDA

538

33.1.Introduction

538

33.2.Viola-Jones Object Detection Retrospective

538

33.3.Object Detection Pipeline with NVIDIA CUDA

547

33.4.Benchmarking and Implementation Details

562

33.5.Future Direction

564

33.6.Conclusion

564

References

564

Section 8: Video and Image Processing

566

Chapter 34. Experiences on Image and Video Processing with CUDA and OpenCL

568

34.1.Introduction, Problem Statement, and Background

568

34.2.Core Technology or Algorithm

569

34.3.Key Insights from Implementation and Evaluation

572

34.4.Final Evaluation

586

References

588

Chapter 35. Connected Component Labeling in CUDA

590

35.1.Introduction

590

35.2.Core Algorithm

591

35.3.CUDA Algorithm and Implementation

593

35.4.Final Evaluation and Results

598

References

602

Chapter 36. Image De-Mosaicing

604

36.1.Introduction, Problem Statement, and Context

604

36.2.Core Method

606

36.3.Algorithms, Implementations, and Evaluations

606

36.4.Final Evaluation

618

References

619

Section 9: Signal and Audio Processing

620

Chapter 37. Efficient Automatic Speech Recognition on the GPU

622

37.1.Introduction, Problem Statement, and Context

622

37.2.Core Methods

624

37.3.Algorithms, Implementations, and Evaluations

625

37.4.Conclusion and Future Directions

636

References

638

Chapter 38. Parallel LDPC Decoding

640

38.1.Introduction, Problem Statement, and Context

640

38.2.Core Technology

641

38.3.Algorithms, Implementations, and Evaluations

643

38.4.Final Evaluation

647

38.5.Future Directions

648

References

648

Chapter 39. Large-Scale Fast Fourier Transform

650

39.1.Introduction

650

39.2.Memory Hierarchy of GPU Clusters

652

39.3.Large-Scale Fast Fourier Transform

654

39.4.Algebraic Manipulation of Array Dimensions

656

39.5.Performance Results

660

39.6.Conclusion and Future Work

660

References

663

Section 10: Medical Imaging

664

Chapter 40. GPU Acceleration of Iterative Digital Breast Tomosynthesis

668

40.1.Introduction

668

40.2.Digital Breast Tomosynthesis

670

40.3.Accelerating Iterative DBT using GPUs

671

40.4.Conclusions

677

Acknowledgments

677

References

678

Chapter 41. Parallelization of Katsevich CT Image Reconstruction Algorithm on Generic Multi-Core Processors and GPGPU

680

41.1.Introduction, Problem, and Context

680

41.2.Core Methods

680

41.3.Algorithms, Implementations, and Evaluations

682

41.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations

693

41.5.Related Work

696

41.6.Future Directions

697

41.7.Summary

697

References

697

Chapter 42. 3-D Tomographic Image Reconstruction from Randomly Ordered Lines with CUDA

700

42.1.Introduction

700

42.2.Core Methods

703

42.3.Implementation

705

42.4.Evaluation and Validation of Results, Total Benefits, and Limitations

707

42.5.Future Directions

711

References

712

Chapter 43. Using GPUs to Learn Effective Parameter Settings for GPU-Accelerated Iterative CT Reconstruction Algorithms

714

43.1.Introduction, Problem Statement, and Context

714

43.2.Core Method(s)

715

43.3.Algorithms, Implementations, and Evaluations

716

43.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations

721

43.5.Future Directions

727

References

728

Chapter 44. Using GPUs to Accelerate Advanced MRI Reconstruction with Field Inhomogeneity Compensation

730

44.1.Introduction

730

44.2.Core Method: Advanced Image Reconstruction Toolbox for MRI

731

44.3.MRI Reconstruction Algorithms and Implementation on GPUs

734

44.4.Final Results and Evaluation

740

44.5.Conclusion and Future Directions

741

References

742

Chapter 45. ℓ1 Minimization in ℓ1-SPIRiT Compressed Sensing MRI Reconstruction

744

45.1.Introduction, Problem Statement, and Context

744

45.2.Core Methods (High Level Description)

747

45.3.Algorithms, Implementations, and Evaluations (Detailed Description)

748

45.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations

754

45.5.Discussion and Conclusion

756

References

756

Chapter 46. Medical Image Processing Using GPU-Accelerated ITK Image Filters

758

46.1.Introduction

758

46.2.Core Methods

758

46.3.Implementation

761

46.4.Results

767

46.5.Future Directions

769

46.6.Acknowledgments

769

References

770

Chapter 47. Deformable Volumetric Registration Using B-Splines

772

47.1.Introduction

772

47.2.An Overview of B-Spline Registration

773

47.3.Implementation Details

777

47.4.Results

788

47.5.Conclusions

790

References

790

Chapter 48. Multiscale Unbiased Diffeomorphic Atlas Construction on Multi-GPUs

792

48.1.Introduction, Problem Statement, and Context

792

48.2.Core Methods

795

48.3.Algorithms, Implementations, and Evaluations

796

48.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations

807

48.5.Future Directions

810

Acknowledgments

811

References

812

Chapter 49. GPU-Accelerated Brain Connectivity Reconstruction and Visualization in Large-Scale Electron Micrographs

814

49.1.Introduction

814

49.2.Core Methods

814

49.3.Implementation

818

49.4.Results

830

49.5.Future Directions

832

Acknowledgments

833

References

833

Chapter 50. Fast Simulation of Radiographic Images Using a Monte Carlo X-Ray Transport Algorithm Implemented in CUDA

834

50.1.Introduction, Problem Statement, and Context

834

50.2.Core Methods

835

50.3.Algorithms, Implementations, and Evaluations

836

50.4.Final Evaluation and Validation of Results, Total Benefits, and Limitations

843

50.5.Future Directions

848

References

849

Index

852