CUDA Notes

by adie
2020-05-08 16:02:58

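1. nvcc compiler driver (run from the VS2017 x64 Native Tools Command Prompt; nvcc invokes the cl compiler for host code)

-gencode arch=compute_75,code=sm_75 generates binary code for the specified architecture

-cubin generates a cubin file

-ptx generates a PTX file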

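2. nvprof profiling and timing tool

--metrics achieved_occupancy achieved occupancy (average active warps per active cycle relative to the maximum an SM supports)

--metrics gld_throughput global memory load throughput

--metrics gld_efficiency global memory load efficiency (requested load throughput relative to required load throughput)
--metrics gld_transactions number of global memory load transactions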

3. Allocating device memory: cudaMalloc/cudaFree

Copying: cudaMemcpy
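
A minimal sketch of the allocate / copy / launch / copy back / free cycle. The kernel add_one and the function device_memory_demo are hypothetical names made up for illustration, and error handling is kept to a minimum.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1.0f to every element.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Returns true when every element comes back from the device as 1.0f.
bool device_memory_demo() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n, 0.0f);

    float* dev = nullptr;
    if (cudaMalloc((void**)&dev, bytes) != cudaSuccess) return false;

    // Host -> device, run the kernel, device -> host.
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);
    add_one<<<(n + 255) / 256, 256>>>(dev, n);
    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);

    if (err != cudaSuccess) return false;
    for (float v : host) {
        if (v != 1.0f) return false;
    }
    return true;
}
```

Call device_memory_demo() from main and build with nvcc, for example with the -gencode option from section 1 set to match the target GPU.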

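4. Allocating page-locked (pinned) host memory: cudaHostAlloc/cudaFreeHost/cudaHostRegister

cudaHostAllocPortable/cudaHostRegisterPortable the memory is usable as pinned memory from multiple devices
cudaHostAllocWriteCombined the memory bypasses the CPU cache, which can speed up transfers, but reading it on the host is very slow
cudaHostAllocMapped -> cudaHostGetDevicePointer mapped memory that the device can address directly (zero-copy memory)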
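
A sketch of the zero-copy path: the kernel writes straight into mapped host memory through the pointer returned by cudaHostGetDevicePointer, with no cudaMemcpy. The names scale_in_place and zero_copy_demo are hypothetical, and the sketch assumes a device that supports mapped host memory.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: multiplies every element by factor.
__global__ void scale_in_place(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns true when the host sees the values written by the kernel.
bool zero_copy_demo() {
    const int n = 1 << 16;
    float* host = nullptr;

    // Assumption: the device can map host memory. Older setups may need
    // cudaSetDeviceFlags(cudaDeviceMapHost) before the context is created.
    if (cudaHostAlloc((void**)&host, n * sizeof(float),
                      cudaHostAllocMapped) != cudaSuccess) {
        return false;
    }
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    // Device-side alias of the same physical host buffer.
    float* dev_view = nullptr;
    cudaError_t err = cudaHostGetDevicePointer((void**)&dev_view, host, 0);
    if (err == cudaSuccess) {
        scale_in_place<<<(n + 255) / 256, 256>>>(dev_view, n, 2.0f);
        // The host must wait for the kernel before reading the buffer.
        err = cudaDeviceSynchronize();
    }

    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (host[i] == 2.0f);

    cudaFreeHost(host);
    return ok;
}
```

Every device access here crosses the bus, so this pattern tends to suit data that is touched once rather than reused many times.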

5. Streams

cudaStreamCreate
cudaMemcpyAsync
kernel<<<grid, block, 0, stream>>>(...)
cudaStreamAddCallback
cudaStreamSynchronize/cudaStreamWaitEvent/cudaStreamQuery
cudaStreamDestroy
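
A sketch of how these calls typically fit together: asynchronous copies and a kernel launch are queued on one stream, and the host blocks only at cudaStreamSynchronize. The names stream_increment and stream_demo are hypothetical. The host buffer is pinned here because cudaMemcpyAsync is only truly asynchronous with page-locked host memory.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1.0f to every element.
__global__ void stream_increment(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Returns true when the round trip through the stream gives input + 1.
bool stream_demo() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float* host = nullptr;
    float* dev = nullptr;
    cudaStream_t stream = nullptr;

    bool ok = cudaHostAlloc((void**)&host, bytes, cudaHostAllocDefault) == cudaSuccess
           && cudaMalloc((void**)&dev, bytes) == cudaSuccess
           && cudaStreamCreate(&stream) == cudaSuccess;

    if (ok) {
        for (int i = 0; i < n; ++i) host[i] = float(i % 100);

        // All three operations are queued in order on the same stream.
        cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream);
        stream_increment<<<(n + 255) / 256, 256, 0, stream>>>(dev, n);
        cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream);

        // Block the host until everything queued on the stream is done.
        ok = cudaStreamSynchronize(stream) == cudaSuccess;
        for (int i = 0; ok && i < n; ++i) {
            ok = (host[i] == float(i % 100) + 1.0f);
        }
    }

    if (stream) cudaStreamDestroy(stream);
    if (dev) cudaFree(dev);
    if (host) cudaFreeHost(host);
    return ok;
}
```

With several streams, the same pattern lets copies for one chunk overlap with kernel execution for another, provided the hardware supports concurrent copy and execute.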

6. Graphs

cudaGraphCreate
cudaGraphAddKernelNode
cudaGraphAddDependencies
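
A sketch of building a two-node graph by hand with the calls above, then instantiating and launching it. The kernels graph_add and graph_double and the function graph_demo are hypothetical names. The signatures are assumed to be those of CUDA 10 through 12 toolkits; in particular, the five-argument cudaGraphInstantiate and the four-argument cudaGraphAddDependencies may differ in newer releases.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernels for illustration.
__global__ void graph_add(float* data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += value;
}

__global__ void graph_double(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Runs graph_add then graph_double on zeros; expects (0 + 3) * 2 = 6.
bool graph_demo() {
    int n = 1024;
    float value = 3.0f;
    float* dev = nullptr;
    if (cudaMalloc((void**)&dev, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemset(dev, 0, n * sizeof(float));

    cudaGraph_t graph;
    cudaGraphCreate(&graph, 0);

    // Kernel arguments are passed as an array of pointers to each argument.
    void* add_args[] = { &dev, &n, &value };
    cudaKernelNodeParams add_params = {};
    add_params.func = (void*)graph_add;
    add_params.gridDim = dim3((n + 255) / 256);
    add_params.blockDim = dim3(256);
    add_params.sharedMemBytes = 0;
    add_params.kernelParams = add_args;
    add_params.extra = nullptr;

    void* dbl_args[] = { &dev, &n };
    cudaKernelNodeParams dbl_params = add_params;
    dbl_params.func = (void*)graph_double;
    dbl_params.kernelParams = dbl_args;

    // Add both nodes without dependencies, then add the edge add -> double.
    cudaGraphNode_t add_node, dbl_node;
    cudaGraphAddKernelNode(&add_node, graph, nullptr, 0, &add_params);
    cudaGraphAddKernelNode(&dbl_node, graph, nullptr, 0, &dbl_params);
    cudaGraphAddDependencies(graph, &add_node, &dbl_node, 1);

    // Assumption: CUDA 10-12 style signature (error node, log buffer, size).
    cudaGraphExec_t exec;
    bool ok = false;
    if (cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0) == cudaSuccess) {
        cudaGraphLaunch(exec, 0);  // launch on the default stream
        float first = 0.0f;
        // The blocking copy waits for the graph launch to finish.
        cudaError_t err = cudaMemcpy(&first, dev, sizeof(float),
                                     cudaMemcpyDeviceToHost);
        ok = (err == cudaSuccess) && (first == 6.0f);
        cudaGraphExecDestroy(exec);
    }

    cudaGraphDestroy(graph);
    cudaFree(dev);
    return ok;
}
```

Once instantiated, the executable graph can be launched repeatedly with cudaGraphLaunch, which is where the approach usually pays off compared with issuing the same kernels one by one.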
