老黄好忙。不过这些软件才是老黄的非对称优势。
https://developer.nvidia.com/rdp/cudnn-download
cuda 9.0 RC刚刚发布。
https://www.chiphell.com/thread-1761446-1-1.html
就一个头文件,看了下,多是paper算法实现层面的,ctc,rnn。和线代库(cublas)一样也是加了一个传统math和tensor op的枚举。
捕获.JPG (20.45 KB, 下载次数: 0)
要么和v100相关的内容还没添加进去(毕竟只是rc),要么tensor core的操作在用户层面是透明的。
======================更新=========================
果然是的,这个枚举和tensor core有关,文档中有描述
2.7.1. Tensor Core Operations
NotesSome notes on Tensor Core Operations use in cuDNN v7 on sm_70:
Tensor Core operations are supported on the Volta GPU family, those operationsperform parallel floating point accumulation of multiple floating point products.Setting the math mode to CUDNN_TENSOR_OP_MATH indicates that thelibrary will use Tensor Core operations as mention previously. The default isCUDNN_DEFAULT_MATH, this default indicates that the Tensor Core operationswill be avoided by the library.
设置为CUDNN_TENSOR_OP_MATH,将会启用tensor core进行运算,设置为CUDNN_DEFAULT_MATH,运算时会自动略过tensor core。
The default mode is a serialized operation, the TensorCore operations are parallelized operation, thus the two might result in slight differentnumerical results due to the different sequencing of operations. Note: The library fallsback to the default math mode when Tensor Core operations are not supported or notpermitted.The result of multiplying two matrices using Tensor Core Operations is very close, butnot always identical, to the product achieved using some sequence of legacy scalarfloating point operations.
So cuDNN requires explicit user opt-in before enabling theuse of Tensor Core Operations. However, experiments training common Deep Learningmodels show negligible difference between using Tensor Core Operations and legacyfloating point paths as measured by both final network accuracy and iteration count toconvergence. Consequently, the library treats both modes of operation as functionallyindistinguishable, and allows for the legacy paths to serve as legitimate fallbacks forcases in which the use of Tensor Core Operations is unsuitable.
用户层面需要显式的指定运算类型(是用tensor core进行运算还是使用传统的sp进行运算),在最终精度上两者没有什么区别(区别在于收敛性能)。当tensor core调用失败后,系统会自动使用传统sp来进行运算,无需用户再进行干涉。
新的方法,给tensorDescription设置运算类型。
捕获2.JPG (36.36 KB, 下载次数: 1)
评论
支持inner product/linear/matrix multiplication layer吗?
评论
都支持的,这些都算分子级的运算。
评论
能不能用OpenCL开发个类似的东西?
评论
当然能做,而且光做cudnn其实一点都不难,其实它就是每年各种新深度学习算法的cuda based实现。难的是做cuda库。 电路 电子 维修 求创维42c08RD电路图 评论 电视的图纸很少见 评论 电视的图纸很少见 评论 创维的图纸你要说 版号,不然无能为力 评论 板号5800-p42ALM-0050 168P-P42CLM-01 电路 电子 维修 我现在把定影部分拆出来了。想换下滚,因为卡纸。但是我发现灯管挡住了。拆不了。不会拆。论坛里的高手拆解过吗? 评论 认真看,认真瞧。果然有收
·日本留学生活 求个大阪合租
·日本留学生活 自家房招租求
·日本留学生活 东京地区出9成新lv钱包
·日本育儿教育 孩子从国内过来如何学习日语
·日本育儿教育 明年四月横滨招月嫂
·日本育儿教育 请问咋让娃突破识字关?感谢分享中文共读和学习经验的妈妈
·中文新闻 东区明星迈克尔·格列柯,53 岁,将在第一次出生两年后第二次
·中文新闻 《爱情岛》明星卡米拉·瑟洛和杰米·朱维特在透露即将迎来第三