According to an Intel slide that we saw at the end of 2013, the growth in the chip giant's processing capability has been nothing short of astounding. It's not easy to extrapolate forward to the next generation, but it's clear that you will be able to game properly (i.e. without having to set everything to ‘Low/No’) on Intel hardware in the very near future.
But how does this increase in GFlops manifest itself inside the processors themselves, and where do the performance bumps come from?
The initial HD2500 and HD4000 products were 6-core and 16-core parts respectively, although, as Huddy was quick to point out, “Not all cores are equal and they all operate in a very different way”.
So you can't line up 40 of these ‘Intel Execution Units’ against 40 CUDA cores and say that you're comparing like with like.
With Haswell, the entry level (referred to as GT1 or Intel HD) is still a 6-core part, but by the time you reach the upper echelons of the GT3e part (which Intel calls HD5200 or Iris Pro), you have 40 cores linked to a special 128MB ‘level 4 cache’ called Crystalwell, which has a 50GB/sec bi-directional connection to the main processor.
It’s a big chunk of silicon, so we asked Huddy why it's there. “If you’re running at HD resolutions of 1920 x 1080 with 4 bytes a pixel and the additional amount needed for Z, then a quick calculation shows that the entire render can fit into the 128MB that Crystalwell offers, easily,” he said. He added, “That gives you a huge advantage, because you can complete a huge rendering operation in one go”.
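
Huddy's "quick calculation" can be sketched in a few lines of Python. The resolution, the 4 bytes per pixel of colour and the 128MB cache size come from his quote; the 4 bytes per pixel for the Z (depth) buffer is our own assumption, since he did not give a figure, and the function name is purely illustrative. On those assumptions a single frame needs roughly 16MB, a small fraction of what Crystalwell offers.

```python
# Back-of-the-envelope check of the render target size at 1080p.
WIDTH, HEIGHT = 1920, 1080           # HD resolution, from the quote
COLOR_BYTES_PER_PIXEL = 4            # "4 bytes a pixel", from the quote
DEPTH_BYTES_PER_PIXEL = 4            # ASSUMPTION: a 32-bit Z buffer (not stated in the article)
CACHE_BYTES = 128 * 1024 * 1024      # Crystalwell's 128MB, taken here as 128 MiB


def render_target_bytes(width, height, color_bpp, depth_bpp):
    """Bytes needed for one colour buffer plus one depth buffer."""
    return width * height * (color_bpp + depth_bpp)


total = render_target_bytes(WIDTH, HEIGHT, COLOR_BYTES_PER_PIXEL, DEPTH_BYTES_PER_PIXEL)
fits = total <= CACHE_BYTES

print(f"Colour + Z at {WIDTH} x {HEIGHT}: {total / 2**20:.1f} MiB")
print(f"Share of the 128MB cache used: {total / CACHE_BYTES:.1%}")
print(f"Fits in Crystalwell: {fits}")
```

Even if you allow for extra buffers (for example, double buffering or several render targets), the same arithmetic leaves a lot of headroom, which is consistent with Huddy's "easily".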