GPU and Matrix Computation (Part I)



Mr. Yukai Hung (洪郁凱)
Department of Mathematics
National Taiwan University

Lecture 1. (2010/5/14)
(a) Parallel concept and hardware architectures
(b) CUDA programming model overview

Slides: 01_pdf, 01_pptx
Audio: 01a_aac, 01b_aac, 01c_aac
Video: 01a_wmv, 01b_wmv, 01c_wmv
Code Example: ce_vec_addition.cu

Lecture 2. (2010/5/28)
Hardware Hierarchy and Optimization
Slides: 02_pdf, 02_pptx
Slides with Audio: 02a_mov, 02b_mov, 02c_mov, 02d_mov

Lecture 3. (2010/6/4)
Memory Hierarchy and Optimization
Slides: 03_pdf, 03_pptx
Slides with Audio: 03a_mov, 03b_mov, 03c_mov

Lecture 4. (2010/6/11)
CUDA Advanced Memory Usage and Optimization
Slides: 04_pdf, 04_pptx
Slides with Audio: 04a_wmv, 04b_mov, 04c_mov
Code Example: ce_04.tar

Lecture 5. (2010/6/18)
CUDA Asynchronous Memory Usage and Execution
Slides: 05_pdf, 05_pptx
Slides with Audio: 05a_mov, 05b_mov
Code Example: ce_05.tar
Reference: intel_write_combining_memory.pdf

Lecture 6. (2010/6/25)
CUDA Linear Algebra Library and Next Generation Architecture
Slides: 06_pdf, 06_pptx
Slides with Audio: 06a_mov, 06b_mov
Reference: CUDPP, CULA, MAGMA, THRUST, CUSP, OpenNL, GATLAS,
Efficient Sparse Matrix-Vector Multiplication on CUDA (2008),
Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors,
Sparse Matrix-Vector Multiplication Toolkit for GPUs

Lecture 7. (2010/7/28)
POSIX Threads (Pthreads)
Slides: 07_pdf, 07_pptx
Video: 07a_wmv, 07b_wmv, 07c_wmv (online)
Code Example: ce_07.tar


GPU and Matrix Computation (Part II)



Mr. Vasily Volkov
Department of Computer Science
University of California, Berkeley


Lecture 1. (2010/5/21)
(a) Understanding performance bottlenecks in numerical kernels on GPU
(b) Programming inverse memory hierarchy: case of stencils on GPU
Slides: 01_pdf, 02_pdf


GPU and Matrix Computation (Part III)



Mr. Wei-Jen Chang (張為仁)
Institute of Photonics and Optoelectronics
National Taiwan University

Lecture 1. (2010/5/7)
Using PETSc and SLEPc to solve large sparse linear systems and eigenvalue problems on parallel computers


References



NVIDIA Documents
Related Course Webpages