"This book is required reading for anyone working with accelerator-based computing systems." --From the Foreword by Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory CUDA is a computing architecture designed to facilitate the development of parallel programs. In conjunction with a comprehensive software platform, the CUDA Architecture enables programmers to draw on the immense power of graphics processing units (GPUs) when building high-performance applications. GPUs, of course, have long been available for demanding graphics and game applications. CUDA now brings this valuable resource to programmers working on applications in other domains, including science, engineering, and finance. No knowledge of graphics programming is required--just the ability to program in a modestly extended version of C. CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. The authors introduce each area of CUDA development through working examples. After a concise introduction to the CUDA platform and architecture, as well as a quick-start guide to CUDA C, the book details the techniques and trade-offs associated with each key CUDA feature. You'll discover when to use each CUDA C extension and how to write CUDA software that delivers truly outstanding performance. Major topics covered include *Parallel programming*Thread cooperation*Constant memory and events*Texture memory*Graphics interoperability*Atomics*Streams*CUDA C on multiple GPUs*Advanced atomics*Additional CUDA resources All the CUDA software tools you'll need are freely available for download from NVIDIA.
Jason Sanders is a senior software engineer in NVIDIA's CUDA Platform group. His work at NVIDIA has included helping to develop early CUDA system software and contributing to the OpenCL 1.0 specification, an industry standard for heterogeneous computing. Jason received his master's degree in computer science from the University of California, Berkeley, where he published research on GPU computing, and he holds a bachelor's degree in electrical engineering from Princeton University. Prior to joining NVIDIA, he worked at ATI Technologies, Apple, and Novell.
Edward Kandrot is a senior software engineer in NVIDIA's CUDA Algorithms group. He has more than 20 years of industry experience focused on optimizing code and improving performance, including work on Photoshop and Mozilla. Kandrot has worked for Adobe and Microsoft, and he has consulted for many companies, including Apple and Autodesk.
The CUDA C compiler treats variables in shared memory differently from ordinary variables. For every thread block launched on the GPU, the CUDA C compiler creates a copy of the variable. Every thread in that block shares the memory, but a thread cannot see or modify the copy belonging to any other block. This provides an excellent means by which the threads within a block can communicate and collaborate on computations.
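As a rough illustration of this idea (not an excerpt from the book), the sketch below gives each block its own __shared__ array and uses it to compute one partial sum per block. The kernel name partialSum, the array name cache, and the fixed block size N are assumptions made for the example; the __shared__ qualifier, __syncthreads(), and the launch syntax are standard CUDA C.

#include <stdio.h>
#include <cuda_runtime.h>

#define N 256   // elements per block; also the block size in this sketch

// Each block reduces its chunk of `in` into one partial sum. The __shared__
// array `cache` exists once per block: threads within the block share it,
// but no thread can see or modify another block's copy.
__global__ void partialSum(const float *in, float *out) {
    __shared__ float cache[N];

    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    cache[threadIdx.x] = in[tid];
    __syncthreads();                       // all writes visible before reading

    // Tree-style reduction within the block (blockDim.x is a power of two here).
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = cache[0];        // one result per block
}

int main(void) {
    const int blocks = 4;
    float h_in[blocks * N], h_out[blocks];
    for (int i = 0; i < blocks * N; i++) h_in[i] = 1.0f;

    float *d_in, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    partialSum<<<blocks, N>>>(d_in, d_out);

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int b = 0; b < blocks; b++)
        printf("block %d sum = %f\n", b, h_out[b]);   // expect 256.0 each

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

The __syncthreads() calls are what make the cooperation safe: every thread in the block must reach the barrier before any of them reads values written by the others.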
Compared with reading the same data from global memory, reading it from constant memory can conserve memory bandwidth for two reasons:
1. A single read from constant memory can be broadcast to other nearby threads, effectively saving up to 15 reads.
2. Constant memory is cached, so consecutive reads of the same address will not incur any additional memory traffic.
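The sketch below (again not taken from the book) shows the access pattern that benefits from this: a small, hypothetical coefficient table lives in __constant__ memory and every thread reads the same entries, so each read can be broadcast and subsequent reads hit the constant cache. The names coeffs and applyCoeffs are assumptions; __constant__ and cudaMemcpyToSymbol are standard CUDA runtime features.

#include <stdio.h>
#include <cuda_runtime.h>

#define COEFF_COUNT 16

// Hypothetical filter coefficients placed in constant memory. All threads
// read the same addresses, so a single read is broadcast to nearby threads
// and repeated reads of the same address are served from the constant cache.
__constant__ float coeffs[COEFF_COUNT];

__global__ void applyCoeffs(const float *in, float *out, int n) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid >= n) return;

    float sum = 0.0f;
    for (int i = 0; i < COEFF_COUNT; i++)
        sum += coeffs[i] * in[tid];        // same coeffs[i] across all threads
    out[tid] = sum;
}

int main(void) {
    const int n = 1024;
    float h_coeffs[COEFF_COUNT];
    for (int i = 0; i < COEFF_COUNT; i++) h_coeffs[i] = 1.0f / COEFF_COUNT;

    // Copy host data into the __constant__ symbol (read-only on the device).
    cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));

    float h_in[n], h_out[n];
    for (int i = 0; i < n; i++) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    applyCoeffs<<<(n + 255) / 256, 256>>>(d_in, d_out, n);

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("out[10] = %f (coefficients sum to 1, so expect %f)\n", h_out[10], h_in[10]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}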