Graphics cards: VEGA is better than you think



http://www.moepc.net/?post=2672

VEGA 10 (GCN 5.0) Architecture is at present being judged by the Frontier Edition (Workstation / PRO) Drivers, and while it does have (Consumer / RX) Drivers included with the ability to switch between the two... currently neither of the VEGA 10 Drivers actually supports the VEGA 10 Features beyond HBCC.

Yes, the Workstation Drivers do support FP16 / FP32 / FP64, as opposed to the Consumer Drivers that support only FP32 (Native) and FP16 via Atomics.
Atomics allows a Feature to be used that is Supported but you're still Restricted by Driver Implementation as opposed to Direct GPU Optimisation.

FP16 Atomics does not provide the same leverage for Optimisation as a Native FP16 Pipeline.
Essentially we're talking about a difference, relative to an FP32 Pipeline, of +20% Performance (FP16 via Atomics) Vs. +60% Performance (Native FP16).

Now it should still be noted that, we're not seeing a +100% Performance; because...
The Asynchronous Compute Engines (ACE) are still limited to 4 Pipelines and only support Packed Math Formats, which requires a slightly larger and more complex ACE than an FP32 version... thus you're not strictly getting 8x FP16 or 4x FP32 as in Legitimate Threads, but instead the Packing and Unpacking of the Results is occurring via the CPU (Drivers), so you have added Latency and what can be best described as "Software Threading"
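
To make "Packed Math" a little more concrete: the idea is that two 16-bit floats share one 32-bit word. The snippet below is only a data-layout sketch in Python, assuming little-endian packing; the helper names `pack_half2` and `unpack_half2` are made up for illustration, and nothing here reflects how Vega hardware or AMD's drivers actually pack or schedule anything.

```python
import struct

def pack_half2(a, b):
    """Pack two FP16 values into one 32-bit unsigned integer (little-endian)."""
    # "<2e" writes two IEEE 754 half-precision floats (2 bytes each).
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]

def unpack_half2(word):
    """Recover the two FP16 values from a packed 32-bit word."""
    return struct.unpack("<2e", struct.pack("<I", word))

# 1.5 and -2.25 are exactly representable in FP16, so the round trip is lossless.
word = pack_half2(1.5, -2.25)
print(hex(word))
print(unpack_half2(word))
```

Values that are not exactly representable in FP16 would come back rounded, which is the precision trade-off behind any FP16 path.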

So yeah, you're looking at ~40% Performance compared to a pure Hardware Solution; still, this is within the same region of performance improvement that NVIDIA achieve through Giga-Threading, which is almost literally Hyper-Threading for CUDA.

And as such you will see marginal benefits (up to 30%) in Non-Predictive Branches (i.e. Games) and 60% in Predictive Branches (i.e. Deep Learning, Rendering, Mining, etc.)

As this is entirely Software Handled, assuming support for Packed Math within the ACE... this is why we're seeing the RX VEGA Frontier Edition is essentially on-par with GCN 3.0 IPC if it were capable of being Overclocked to the same Clock Speeds. So, eh... this provides Decent Performance but keep in mind, essentially what we're seeing is what VEGA is capable of on FIJI (GCN 3.0) Drivers.

In short... what is happening is the Drivers are acting as a Limiter; in essence you have a Bugatti Veyron in "Road" Mode, where it just ends up a more pleasant drive overall... but that's a W16 under-the-hood. It can do better than the 150MPH that it's currently limiting you to.
The question here ends up being, "Well just how much of a difference will Drivers make?" ... Conservatively speaking, the RX VEGA Consumer Drivers are almost certainly going to provide 20 - 35% Performance Uplift over what the Frontier Edition has showcased on FIJI Drivers.

Yet most of that optimisation will come from FP16 Support, Tile-Based Rendering, Geometry Discard Pipeline, etc. while HBCC will continue to ensure that the GPU isn't starved for Data maintaining very respectable Minimums that are almost certainly making NVIDIA start to feel quite nervous.

Still, this isn't the "Party Trick" of the VEGA Architecture.
Something that most never really noticed was AMD's claim when they revealed Features of Vega.

Primarily that it supports 2X Thread Throughput. This might seem minor, but what I'm not sure people quite grasped (NVIDIA did, because they got the GTX 1080 Ti and Titan Xp out to market ASAP following the official announcement of said features) is that this is actually perhaps THE most remarkable aspect of the Architecture.
So... what does this mean?

In essence the ACE on GCN 1.0 to 4.0 has 4 Pipelines, each is 128-Bit Wide. This means it processes 64-Bit on the Rising Edge, and 64-Bit on the Falling Edge of a Clock Cycle.
Now each CU (64 Stream Processors) is actually 16 SIMD (Single Instruction, Multiple Data / Arithmetic Logic Units); each SIMD Supports a Single 128-Bit Vector (4x 32-Bit Components, i.e. [X, Y, Z, W]), and because you can process each individual Component ... this is why it's denoted as 64 "Stream" Processors, because 4x16 = 64.

As I note, the ACE has 4 Pipelines that Process, 4x128-Bit Threads Per Clock.
The Minimum Operation Time is 4 Clocks ... as such 4x4 = 16x 128-Bit Asynchronous Operations Per Clock (or 64x 32-Bit Operations Per Clock)

GCN 5.0 still has the same 4 Pipelines, but each is now 256-Bit Wide. This means it processes 128-Bit on the Rising Edge, and 128-Bit on the Falling Edge.
Each CU is also now 16 SIMD that support a Single 256-Bit Vector or Double 128-Bit Vector or Quad 64-Bit Vector (4x 64-Bit, 8x 32-Bit, 16x 16-Bit).
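
For anyone who wants to check the counting, here is a small Python sketch that simply reproduces the arithmetic as stated in this post (16 SIMD per CU, one 128-bit or 256-bit vector per SIMD). The figures are the post's own and are not checked against AMD's architecture documentation; `components` and `SIMDS_PER_CU` are illustrative names.

```python
def components(vector_bits, component_bits):
    """How many components of a given width fit in one vector register."""
    if vector_bits % component_bits:
        raise ValueError("component width must divide the vector width")
    return vector_bits // component_bits

SIMDS_PER_CU = 16  # figure quoted in the post, an assumption here

# Earlier GCN as described in the post: one 128-bit vector of 32-bit components per SIMD.
old_lanes = SIMDS_PER_CU * components(128, 32)
print(f"128-bit vectors, 32-bit components: {old_lanes} per CU")

# Vega as described in the post: one 256-bit vector, split by component width.
for width in (64, 32, 16):
    per_simd = components(256, width)
    print(f"{width}-bit: {per_simd} per SIMD, {SIMDS_PER_CU * per_simd} per CU")
```

The first line gives the 4x16 = 64 "Stream" Processors figure; the loop gives the 4x / 8x / 16x split quoted for the 256-bit case.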

It does remain the same SIMD; merely the Functionality is expanded to support Multiple Width Registers, in a very similar approach to AMD64 SIMD on their CPUs. Believe it or not, AMD SIMD (SSE) is FASTER than Intel's because of their approach. This is why Intel kept introducing new Slightly Incompatible versions of SSE / AVX / etc.
They're literally doing it to screw over AMD Hardware being better by using their Market Dominance to force a Standard that deliberately slows down AMD Performance, hence why Bulldozer Architecture appeared to be somewhat less capable in a myriad of common scenarios.

Anyway, what this means is Vega remains 100% Compatible and can be run as if it were a current Generation GCN Architecture.
So all of the Stability, Performance Improvements, etc. should translate pretty well and it will act in essence like a 64CU Polaris / Fiji at 1600MHz; and well, that's what we see in the Frontier Edition Benchmarks.

Now a downside of this is, well, it's still strictly speaking using the "Entire" GPU to do this... so the power utilisation numbers appear curiously High for the performance it's providing; but remember it is being used as if under 100% Load, while in reality its Utilisation is actually 50%.
Here's where it begins to make sense why, when they originally began showing RX VEGA at Trade Conventions, they were using it in a Crossfire Combination; it is a Subtle (to anyone paying attention, again like NVIDIA) hint at the Ballpark of what a SINGLE RX VEGA will be capable of under a Native Driver when fully Optimised.

And well... its performance is frankly staggering, as it was running Battlefield 1, Battlefront, Doom and Sniper Elite 4 at UHD 5K at 95 FPS+
For those somewhat less versed in the processing Power Required here:

The Titan Xp, is capable of UHD 4K on those games at about 120 FPS, if you were to increase it to UHD 5K it would drop to 52 FPS; and at this point it's perhaps dawning on those reading this why NVIDIA have somewhat entered "Full Alert Mode" ... because Volta was aimed at ~20% Performance Improvement, and this was being achieved primarily via just making a larger GPU with more CUDA Cores.
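
As a rough sanity check on the 4K-to-5K comparison, the Python sketch below estimates the 5K frame rate from the 4K one. It assumes "UHD 4K" means 3840x2160 and "5K" means 5120x2880 (the post does not state the resolutions), and it assumes frame time grows linearly with pixel count, which real games only loosely follow.

```python
# Assumed resolutions; the post does not say which ones it means.
UHD_4K = (3840, 2160)
UHD_5K = (5120, 2880)

def pixels(res):
    """Total pixel count of a (width, height) resolution."""
    w, h = res
    return w * h

def scaled_fps(fps, src, dst):
    """Estimate FPS at dst, assuming frame time scales linearly with pixel count."""
    return fps * pixels(src) / pixels(dst)

ratio = pixels(UHD_5K) / pixels(UHD_4K)
estimate = scaled_fps(120, UHD_4K, UHD_5K)
print(f"5K has {ratio:.2f}x the pixels of 4K")
print(f"120 FPS at 4K -> about {estimate:.1f} FPS at 5K under linear scaling")
```

Under these assumptions the estimate is 67.5 FPS, which is higher than the 52 FPS quoted for the Titan Xp, so that figure implies worse-than-linear scaling or a different baseline.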

RX VEGA has the potential to dwarf this in its current state.
Still this also begins to bring up the question... "If AMD have that much performance just going to waste... Why aren't they using it to Crush NVIDIA? Give them a Taste of their own Medicine!"

Simple... they don't need to, and it's actually not advantageous for them to do so.
While doing this might give them the Top-Dog Spot for the next 12-18 months... NVIDIA aren't idiots, and they'll find a way to become competitive; either Legitimately, or via utilising their current Market Share.

And people will somewhat accept them doing this to "Be Competitive", but if AMD aren't being overly aggressive and letting NVIDIA remain in their Dominant Position; while offering value and slowly removing NVIDIA from the Mainstream / Entry Level... well then not only do they know that they can with each successive "Re-Brand" Lower Costs, Improve the Architecture and offer a Meaningful Performance Uplift for their Consumers while remaining Competitive with anything NVIDIA produce.

They can also (which they do appear to be doing) with Workstation GPUs appear to be offering better performance and value in those scenarios... again better than what NVIDIA can offer, and in said Arena NVIDIA don't have the same tools (i.e. Developer Support / GameWorks / etc.) to really do anything about this beyond throwing their toys out of the pram.
As I note here, NVIDIA can't exactly respond without essentially appearing to be petty / vindictive and potentially breaking Anti-Trust (Monopoly) Laws to really strike out against AMD essentially Sandbagging them.

With perhaps the worst part for NVIDIA here being, they can see plain as bloody day what AMD are doing, but can't do anything about it. Knowing that regardless of what they do, AMD can within a matter of weeks put together a next-generation launch (rebrand); push out new drivers that tweak performance and simply match it while undercutting the price by £20-50.
Even at the same price, it will make NVIDIA look like it's losing its edge.

THAT is what Vega and Polaris have both been about for AMD, the same is true with Ryzen, Threadripper and Epyc.
AMD aren't looking at a short term "Win" for a Generation... they're clearly seeking to destroy their competitors' stranglehold on the Industry as a whole.

< • >

Oh and if you don't believe me on how seriously NVIDIA are taking this... the Titan Xp Driver update that unlocked its Professional (Workstation) Functionality essentially brings it in line with the P100 in terms of Performance.

The Titan Xp is $1,200, the Quadro P100 is $4,500 ... with a driver update they've essentially made that P100 obsolete, and basically given up on $3,000 of pure profit that each Point-of-Sale of said Workstation Card gave them. You don't do that if you've not had an "Oh Shit!" Moment about what the Competition is offering.

via:https://www.reddit.com/r/Amd/com ... tter_than_you_think
via: MoePC.net, address: http://www.moepc.net/?post=2672


Comment
1080 faster than 980sli

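Comment
The higher you hype it, the harder it falls when the time comes.

Comment
Could someone translate this? All I can read is ABC.

Comment
Ordinary users will have a hard time buying one; the miners have already pre-ordered the entire stock!!

Comment
No machine translation is one thing, but black text on a red background? It's blinding.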

Comment
And well... its performance is frankly staggering as it was running Battlefield 1, Battlefront, Doom and Sniper Elite 4 at UHD 5K at 95 FPS+

The Titan Xp, is capable of UHD 4K on those games at about 120 FPS, if you were to increase it to UHD 5K it would drop to 52 FPS

In short, at 5K resolution it can thrash the Titan Xp.

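Comment
Thrashing the Titan Xp's FP32 with 2x the FLOPS from FP16.

Comment
All this hype day after day; just launch it already.

Comment
Which month does this launch?

Comment
Since touching its earlier low in mid-July, Bitcoin has rebounded all the way past its previous high and is back above 22,000.

Comment
AMD says: Does mining count as demand? It does. Is it one of the directions for graphics card applications? It is. Once a graphics card mines, is it a productivity tool? It is. Once it mines, does it still count as slacking off and caring only about entertainment? It doesn't. Has mining broadened what graphics cards are used for? It has. So price it high; Vega won't lack buyers. Someone down below says: "San Ge, the power draw is a bit high." San Ge says: "That's why I will personally take charge of next-generation Navi. You all know how it will be designed."

Comment
AMD = A Mysterious Device

That would explain why the new card never manages to reach the market.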

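Comment
This reddit reply is probably a transcription of the YouTube video he linked: a man talking at the screen for 10 minutes.
It mainly covers two new features. One is FP16, and then it explains why FP16 does not give a 100% gain over FP32, roughly listing the limiting factors.
I don't understand any of this to begin with, and the article keeps playing logic games and swapping concepts, so it left me thoroughly confused.
The replies under the reddit post are fun to read too; lots of dreamers.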


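Comment
No worries; look, a certain Mr. Huang is already getting scared. Squeeze prices a bit more and AMD will turn things around.

Comment
Once Old Huang gets scared he'll announce the next-generation cards early, with a 60% performance increase.

Comment
Oh ho? Better than you think?

Comment

The professional card drivers do not support FP16 / FP32 / FP64.

Comment
At mining, I'm the strongest.

Comment
Vega and the 1080 Ti get almost exactly the same frame rate. At 24K resolution.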

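Comment
Key point: twice Fiji's threading throughput per clock, which in theory reaches the throughput of Fury X CF@1.6G; how much performance this alone would bring is unclear.

But if games can use FP16, that is twice the throughput plus twice the compute, which in theory means twice the performance. Counting the overhead of thread packing done on the CPU (driver), that comes to roughly 60% of Fury X CF@1.6G performance.

Can someone work out what a Fury X CF at 1.6G would perform like?
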
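Comment

The video itself doesn't even say that; on the point you raise, what he stresses is "why it isn't there", not "in theory it is there".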

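Comment
AMD says the arrival of Vega has changed the DIY market and changed people's attitude towards graphics cards:
A: Honey, I want to buy a graphics card.
B: What for?
A: To play games.
B: You good-for-nothing, all you do is play.
After Vega arrived:
A: Honey, I want to buy 3 Vegas.
B: What for?
A: To mine and make money. Look at these returns.
B: Why three?
A: Because one is for communicating with the other two, otherwise the money can't be withdrawn.
B: Fine, buy them.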

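Comment

I haven't watched the video, but the reddit post mentions a 40% CPU overhead (software threads versus hardware threads).

It is precisely because it exists in theory that you need to explain why it isn't there. If it couldn't be reached even in theory, what would there be to explain?

Also, I'm only summarising what he wrote; that doesn't mean I agree with him. Personally I think this is just an English-language wishful-thinking post, no higher in calibre than Tieba.

Comment
The gist of the article: Vega's native compute units are FP32, and the current driver occupies a whole FP32 unit even when computing FP16. The new driver will split one FP32 unit in two to compute FP16. But because this happens at the software level, efficiency is lost, so it can't reach 100%, only about 40% to 60%. Then, after accounting for a bunch of other pipeline losses and so on, you get a 20-35% real-world gaming performance gain over the current driver. That's the gist of the article, not my opinion (I know nothing).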

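Comment

Ever since DX9 (9c, I think), games have been locked into running in FP32 mode; without special tuning and optimisation of the game, running in FP16 is not possible....

Comment

Hm, that I don't know.

Comment
Didn't Sony and Microsoft say a while ago that they want to make full use of half-precision data going forward to optimise GPU efficiency?

Comment
Forget half precision; it needs game-side optimisation.

Comment

Both sides are progressing from 1 frame to 2 frames, "you've been buffed, charge in". I have to hand it to this jinx.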

Comment
RX VEGA has the potential to dwarf this in its current state.
Still this also begins to bring up the question... "If AMD have that much performance just going to waste... Why aren't they using it to Crush NVIDIA? Give them a Taste of their own Medicine!"

Simple... they don't need to, and it's actually not advantageous for them to do so.
While doing this might give them the Top-Dog Spot for the next 12-18 months... NVIDIA aren't idiots, and they'll find a way to become competitive; either Legitimately, or via utilising their current Market Share.



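Comment
Translation, please.

Comment
Vega 56 scores around 8500; draw your own conclusions.

Comment
It got highlighted in red; I sense a conspiracy.

Comment



Comment

I'm convinced.

Comment


Foreigners really know how to fantasise.

Comment
What does this mean, Vega's performance is being hidden?

Comment
AMD hyping Vega is the same kind of thing as Intel hyping the 7350K.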

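Comment

Not so.

Look at the shill articles for the 7350K, the ones from Paocun (炮村): the few hand-picked programs where it beats everything else, WoW, WinRAR and so on, are pretty widely used.

So far I haven't seen Vega beat Old Huang's whole lineup in any widely used software.

To put it bluntly, people buying a 7350K are probably a good deal smarter than people buying Vega....


Comment


Comment

Apart from price, the 7350K really has no black marks; if its price were halved to 500, wouldn't it be the new go-to living-room CPU?

As for Vega's strengths..... go on, name me one?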


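Comment

Lowers heating costs in winter

Helps the country absorb surplus electricity so it can cut overcapacity smoothly, which helps roll out national economic policy

Advances very-high-power switching power supply technology and high-power component technology, feeding back into heavy industries such as high-speed rail and electric motors

Lets the CHH computer section pad out a few more threads, bringing it one step closer to going public; Lunzi (轮子) is delighted

Proves that, apart from Aamir Khan and the chefs who make flying flatbread, no Indian is particularly reliable

Aren't that many strengths enough?? Between an NVIDIA card and Vega, don't you know perfectly well which is better?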

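Comment

So tell me, should I buy a Vega to take home and treat my grandmother's rheumatism?

Comment

Depends on what your family thinks. I reckon Vega could replace an infrared therapy device like the Zhou Lin spectrum therapy unit without much trouble; after all, it puts out no less heat.

Comment

车车车车, tell me: if I want to repair a phone, could I buy a Vega and use it to heat the adhesive inside the phone?
Would it make the phone explode?

Comment
Going by the big jump in mining performance alone, Vega will have no trouble selling; whether gamers can even get hold of one is doubtful.

Comment

By that logic, from now on you could use a Vega for bending water-cooling tubes..... heat guns and hair dryers can all say bye-bye.

Comment

By your logic, if Vega sold for 999 RMB wouldn't it be a legendary card? But the 7350K doesn't sell for 500, and Vega isn't 999 either.

Comment

Time Spy graphics score of 8500?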


Comment

FSU