OK, so now that we have the background and Nic's analysis of the hardware, let's move him on to his specialist area: Developer Relations. What does Kaveri mean to developers? What does AMD's approach to HSA bring?
“Data structures such as binary trees are common in the programming world. Parsing such trees is an operation employed by many algorithms and while the CPU can perform this task the GPU is much more efficient due to the parallel operations involved. Prior to HSA, applications would have to copy the (flattened) tree onto GPU-accessible memory to benefit from such accelerated computing, while in the case of HSA the GPU inside Kaveri can directly access this memory through hUMA (heterogeneous Unified Memory Architecture)”, said Nic.
“Then we have cases where HSA allows both the CPU and GPU parts of the APU simultaneous access to a tree, with both of them doing modifications”, he explained. “Platform Atomics allow these actions to be synchronised – again resulting in higher performance through parallel operations”.
Nic then told us, “The same is true for both Large Data Sets and CPU Callbacks. In each of these cases, AMD's approach to HSA with Kaveri allows for much better parallel performance”.
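To make the idea of two processors modifying one tree more concrete, here is a minimal CPU-only analogy in Python. It is a sketch under stated assumptions, not AMD's implementation: two threads stand in for the CPU and GPU, and an ordinary lock stands in for the synchronisation that Platform Atomics would provide in hardware (a lock is much coarser than an atomic operation). The names `SharedTree` and `run_concurrent_inserts` are hypothetical and chosen only for illustration.

```python
import threading
from typing import Iterable, List, Optional


class SharedTree:
    """A binary search tree that several workers may modify at once.

    Each node is a small list: [key, left_child, right_child].
    """

    def __init__(self) -> None:
        self._root: Optional[list] = None
        # Assumption: a lock stands in for hardware platform atomics.
        self._lock = threading.Lock()

    def insert(self, key: int) -> None:
        """Insert a key; duplicates are ignored."""
        with self._lock:
            if self._root is None:
                self._root = [key, None, None]
                return
            node = self._root
            while True:
                if key == node[0]:
                    return
                slot = 1 if key < node[0] else 2
                if node[slot] is None:
                    node[slot] = [key, None, None]
                    return
                node = node[slot]

    def in_order(self) -> List[int]:
        """Return all keys in ascending order (iterative traversal)."""
        with self._lock:
            result, stack, node = [], [], self._root
            while stack or node is not None:
                while node is not None:
                    stack.append(node)
                    node = node[1]
                node = stack.pop()
                result.append(node[0])
                node = node[2]
            return result


def run_concurrent_inserts(tree: SharedTree, key_sets: List[Iterable[int]]) -> None:
    """Start one worker thread per key set and wait for all of them."""

    def worker(keys: Iterable[int]) -> None:
        for key in keys:
            tree.insert(key)

    threads = [threading.Thread(target=worker, args=(keys,)) for keys in key_sets]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Both workers operate on the same tree in the same memory, and the synchronisation step is what keeps their modifications from corrupting each other. No second copy of the tree is ever made.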
The actual operation is faster, but is it easier to code?
“When you're dealing with Data Pointers, for example”, said Nic. “The fact that the GPU and CPU can access memory without copy overhead means the code’s complexity is considerably reduced. You can find many instances where the HSA version of a section of OpenCL 2.0 code will be significantly smaller than the legacy code. It's smaller, more efficient and, overall, more intuitive to write for developers”.
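The difference in code size that Nic describes can be illustrated with a small Python sketch. It is not OpenCL and not AMD's code, only an illustration under stated assumptions: `search_pointers` follows node references directly, the way a GPU sharing the CPU's memory could, while `flatten` and `search_flat` show the extra conversion into index-based arrays that a copy to a separate device buffer would typically require. All function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A pointer-based binary search tree node, as host code would build it."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert a key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def search_pointers(root: Optional[Node], key: int) -> bool:
    """Shared-memory style: follow the node references directly, no copy."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def flatten(root: Optional[Node]) -> Tuple[List[int], List[int], List[int]]:
    """Legacy style: turn the tree into three parallel arrays (pre-order).

    Child links become array indices, with -1 marking a missing child, so the
    arrays could be copied into a separate buffer without any pointers.
    """
    keys: List[int] = []
    left: List[int] = []
    right: List[int] = []

    def visit(node: Optional[Node]) -> int:
        if node is None:
            return -1
        index = len(keys)
        keys.append(node.key)
        left.append(-1)
        right.append(-1)
        left[index] = visit(node.left)
        right[index] = visit(node.right)
        return index

    visit(root)
    return keys, left, right


def search_flat(keys: List[int], left: List[int], right: List[int], key: int) -> bool:
    """Search the flattened arrays by following indices instead of pointers."""
    index = 0 if keys else -1
    while index != -1:
        if key == keys[index]:
            return True
        index = left[index] if key < keys[index] else right[index]
    return False
```

Everything in `flatten` and `search_flat` is work that exists only because the data has to be reshaped for copying. With directly shared memory, `search_pointers` is all that is needed.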
When Nic went through his actual presentation, he had plenty of examples that left us with the impression ‘New Good, Old Bad’.
In another slide, the data presented by AMD made a compelling case. The company compared the tree-searching capability of a single-core CPU, a quad-core CPU, a legacy APU and the latest HSA-friendly part. According to Nic, it's this kind of improvement that Kaveri brings to the market, and it will make more and more developers sit up and listen.
What about giving developers the right tools for the job, the products they need in order to draw out the advantages presented by APUs like Kaveri?
“That's where AMD's unified SDK comes into play”, said Nic. “Version 2.9 of the App SDK has now been combined with v1.0 of the Media SDK. The latter provides access to the acceleration blocks VCE (Video Codec Engine) and UVD (Unified Video Decoder) present in Kaveri, while the former is essentially our OpenCL SDK. Those SDKs are unified for simplicity and ease of access. For example, it is common for a programmer to want to apply some kind of OpenCL kernel filter (de-noise, scaling etc.) during an encoding/decoding process, so having all the software required to program this task under a unified SDK makes sense”.
Providing the SDK is one thing, but are people using it?
“Definitely!”, exclaimed Nic. “With previous versions, we were looking at 10,000 and 20,000 downloads in the first 60 days after launch. With the latest version, we saw around 55,000 downloads in just 30 days. This just goes to show the interest we’re seeing from developers towards heterogeneous computing. Programmers are more and more aware of the importance of using the GPU to accelerate at least part of their code and now that HSA features are available it is easier to do so than ever before”.
“Also, with AMD CodeXL v1.3, we have packaged a comprehensive set of HSA developer tools”, said Nic, “including a host of new features like support for Java 9 with native APU acceleration, integrated static kernel analysis, remote debugging/profiling and support for all of the latest APU and GPU products”.
Intel could start on HSA using its own GPUs, which are poor but still something. It could surely create a much better memory controller and implement a scaled-down eDRAM write-back cache to speed up DDR3 accesses, much like the Xbox One did. In fact, Iris Pro already implements this, but with 128MB, which is too big and eats 20 watts of power constantly. Using 32MB of eDRAM would be enough to make a serious difference to the HSA performance they will aim at.