Apart from HMDs, the other main focus was changes to the way graphics are programmed. The last major release of DirectX, DirectX11, was 5 years ago. Since then a lot has changed around DirectX11; most notably, multicore CPUs are far more common. The changes from DirectX11 to DirectX12 seem to be focused on enabling programmers to get more out of the CPU rather than the GPU, as well as reducing the number and size of the calls that need to be sent to the graphics card.

Below is my summary of the big changes. Bear in mind, though, that DirectX12 isn’t scheduled to be released until around Christmas 2015 and I’ve not played with it, so I have no first-hand experience. Take everything with a pinch of salt. First though, here is a recap of graphics hardware.

So much of the process of moving model vertices in 3D space, squashing triangles on to the screen and shading pixels is completely independent of other vertices and pixels. That means that the results don’t depend on each other, and so they can be computed in any order. This type of problem is sometimes referred to as an “embarrassingly parallel” problem. The vast majority of the computation can be done at the same time. That’s what your graphics card is for. It’s a hugely powerful parallel computer, capable of running the same program on different bits of data at the same time. It’s kind of like having a fleet of Ferraris sitting in your PC waiting to execute your code.
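
To make the "any order" point concrete, here is a minimal Python sketch. It is not graphics code, and the function names, the scale and the offset are all made up for illustration. Each vertex is transformed using only its own coordinates, so a serial loop and a thread pool give the same answer. Python threads won't make this faster; the point is only that no vertex needs a result from any other.

```python
from concurrent.futures import ThreadPoolExecutor

def transform_vertex(vertex, scale=2.0, offset=(1.0, 2.0, 3.0)):
    """Scale then translate one vertex. Uses no data from any other vertex."""
    x, y, z = vertex
    ox, oy, oz = offset
    return (x * scale + ox, y * scale + oy, z * scale + oz)

def transform_serial(vertices):
    """Process the vertices one after another."""
    return [transform_vertex(v) for v in vertices]

def transform_parallel(vertices, workers=4):
    """Hand the vertices to a pool of workers.

    The work may run in any order; pool.map() still returns the results
    in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform_vertex, vertices))

# A single triangle (hypothetical data).
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
```

A graphics card does the same kind of thing, just with thousands of vertices or pixels in flight at once rather than a handful of Python threads.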

This supercomputer is controlled using instructions from the CPU running the application, and the changing of state each frame is what takes up most of the time. These instructions are sent over the data bus (visualised using an actual bus). Each frame these buses carry instructions and data to the fleet of Ferraris, telling them what they should do. This bus is the bottleneck, so the less we have to use it the better. Another constraint of this model is that much of the state has to be sent from a main render thread. In DX11 something called Deferred Contexts tried to overcome part of this problem, allowing any thread to send data to the graphics card, but because of the relationships between data sent from different threads there still needed to be a lot of communication and synchronisation with the main thread. This synchronisation means that a lot of the time threads and CPU cores sit idle whilst they wait on results from other cores. Additionally, a lot of data associated with the Deferred Contexts had to be sent on the bus. These are the problems that DirectX 12 is trying to overcome.

In DirectX11 draw calls include a lot of data and state changes, so you have to take the bus!

The strategy is to remove dependencies between calls sent from different cores, and to reduce the amount of data that has to be sent on a frame by frame basis. It’s a little bit like trying to replace the data bus with a smaller, more agile data Ducati. Here’s how (I think) it works:

New data structures that are stored on the graphics card are better aligned with the hardware, removing the need to build up parts of the pipeline during the draw call and enabling more draw calls per frame. These objects, called Pipeline State Objects (PSOs), can still be swapped in and out at run time, but they are created and saved on the graphics card ahead of time.
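
Here is a toy Python model of that idea, offered with the same pinch of salt as the rest of this post. It is not the DirectX API: `PipelineState`, `ToyDevice` and the "message" counts are all invented for illustration. The model simply assumes that every piece of state set at draw time costs one trip on the bus, whereas a state object built ahead of time costs its trips once and is then selected by a handle.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class PipelineState:
    """Hypothetical stand-in for a Pipeline State Object: immutable once built."""
    vertex_shader: str
    pixel_shader: str
    blend_mode: str
    depth_test: bool

class ToyDevice:
    """Counts pretend 'messages' sent to the graphics card."""
    def __init__(self):
        self.messages = 0
        self._psos = []

    def create_pso(self, state):
        # Paid once, ahead of time: one message per piece of state.
        self.messages += len(fields(state))
        self._psos.append(state)
        return len(self._psos) - 1  # handle used at draw time

    def draw_loose(self, state, mesh):
        # Every piece of state is re-sent for every draw, plus the draw itself.
        self.messages += len(fields(state)) + 1

    def draw_with_pso(self, handle, mesh):
        self._psos[handle]  # raises IndexError for an unknown handle
        self.messages += 2  # select the prebuilt state, then draw

def count_messages(draws, use_pso):
    """Total messages for `draws` draw calls that all use the same state."""
    device = ToyDevice()
    state = PipelineState("vs_main", "ps_main", "alpha", True)
    if use_pso:
        handle = device.create_pso(state)
        for i in range(draws):
            device.draw_with_pso(handle, f"mesh{i}")
    else:
        for i in range(draws):
            device.draw_loose(state, f"mesh{i}")
    return device.messages
```

In this made-up accounting, 100 draws cost 500 messages when the state is sent loose and 204 with a prebuilt object. The real numbers will be nothing like this, but the shape of the saving is the point.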

Command lists are similar to what DX11 tried to achieve with Deferred Contexts. Commands can be compiled and sent to the graphics card from any thread, but because of the changes that came with PSOs, these command lists are no longer so large, and because they can share PSOs with other draw calls they are less dependent on one another. They just store the information about which PSO they should use and send the calls off to the graphics card.
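
A rough Python sketch of how I picture this, using ordinary threads. Again, this is a toy and not the real API: `CommandList`, `record` and `build_frame` are hypothetical names. Each thread records into its own list, so the threads never need to talk to each other while recording, and the lists are only stitched together at submission time.

```python
import threading

class CommandList:
    """Hypothetical recorder. Each thread owns one, so recording needs no locks."""
    def __init__(self, name):
        self.name = name
        self.commands = []

    def set_pipeline_state(self, pso_name):
        # Only a reference to the prebuilt state is stored, not the state itself.
        self.commands.append(("set_pso", pso_name))

    def draw(self, mesh):
        self.commands.append(("draw", mesh))

def record(cmd_list, pso_name, meshes):
    """Work done by one thread: pick a PSO, then record a draw per mesh."""
    cmd_list.set_pipeline_state(pso_name)
    for mesh in meshes:
        cmd_list.draw(mesh)

def build_frame(work):
    """work is a list of (pso_name, meshes) pairs, one pair per thread."""
    lists = [CommandList(f"list{i}") for i in range(len(work))]
    threads = [
        threading.Thread(target=record, args=(cmd_list, pso_name, meshes))
        for cmd_list, (pso_name, meshes) in zip(lists, work)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Submission happens once, in a fixed order, after all recording is done.
    submitted = []
    for cmd_list in lists:
        submitted.extend(cmd_list.commands)
    return submitted
```

For example, `build_frame([("opaque", ["rock", "tree"]), ("glass", ["window"])])` records the two lists on separate threads and returns the opaque commands followed by the glass ones, whichever thread happened to finish first.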

Bundles offer similar functionality, but allow some state to be inherited from other calls, and some state to be changed. This means that the instructions for a bundle are computed once and replayed with different variables. Whilst the intention appears to be that command lists are constructed every frame and then discarded, bundles seem to be a way of computing commands and saving them between frames to render with different data (both in the same frame, and in different frames).
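
My mental model of a bundle, as a toy Python class. The `Bundle` name, its methods and the `inherited_state` dictionary are invented for illustration and are not the DirectX interface. The commands are recorded once and frozen; each replay then pairs those same commands with whatever state the caller passes in.

```python
class Bundle:
    """Toy bundle: record a few commands once, then replay them many times."""
    def __init__(self):
        self._commands = []
        self._closed = False

    def draw(self, mesh):
        if self._closed:
            raise RuntimeError("cannot record into a closed bundle")
        self._commands.append(("draw", mesh))

    def close(self):
        # Freeze the recording so every replay sees the same commands.
        self._commands = tuple(self._commands)
        self._closed = True

    def replay(self, inherited_state):
        """Return the recorded commands paired with state supplied by the caller."""
        if not self._closed:
            raise RuntimeError("close the bundle before replaying it")
        return [(op, mesh, inherited_state) for op, mesh in self._commands]

# Record a wheel once...
wheel = Bundle()
wheel.draw("tyre")
wheel.draw("hub")
wheel.close()

# ...then replay it at different (hypothetical) positions, this frame or the next.
front = wheel.replay({"position": (0, 0)})
rear = wheel.replay({"position": (3, 0)})
```

Here `front` and `rear` contain the same two draws, each tagged with its own position, and the recording work was only done once.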

DirectX 12 allows more state to be stored in graphics memory, meaning smaller, faster draw calls.

Finally, Descriptor Heaps give the programmer the power to build their own heap and table of resources in graphics memory. This means that state concerning the resources currently in use no longer has to be set by the CPU. Instead, the GPU can request resources from a list held in graphics memory without the need for a call from the CPU to bind that resource.
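
As a last toy sketch (hypothetical names again, and nothing like the real interface), a descriptor heap can be pictured as a fixed-size table that is filled in up front. After that, a draw only needs to carry a slot number, and the lookup is imagined as happening on the graphics card's side rather than as a bind call from the CPU.

```python
class DescriptorHeap:
    """Toy table of resource descriptors, imagined as living in graphics memory."""
    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._used = 0

    def add(self, resource):
        """Store a resource description and return its slot index."""
        if self._used >= len(self._slots):
            raise IndexError("descriptor heap is full")
        index = self._used
        self._slots[index] = resource
        self._used += 1
        return index

    def lookup(self, index):
        """What the 'GPU' does when a draw refers to a slot."""
        if not 0 <= index < self._used:
            raise IndexError("no descriptor in that slot")
        return self._slots[index]

def draw(heap, mesh, texture_index):
    # The draw carries only an index; no per-draw bind call from the CPU.
    return f"{mesh} drawn with {heap.lookup(texture_index)}"

# Fill the table once, ahead of time (file names are made up).
heap = DescriptorHeap(capacity=2)
brick = heap.add("brick.png")
grass = heap.add("grass.png")
```

After this setup, `draw(heap, "wall", brick)` and `draw(heap, "lawn", grass)` each pass a single integer to say which texture they want.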

All of these improvements mean that draw calls are smaller, and can be executed more quickly, which means that there can be more draw calls in any frame. It also means that there is less need for synchronisation between CPU threads, which means less time wasted waiting and frees the CPU to spend more time doing useful processing.

Some of the best news is that unlike the change to DirectX11, which required many people to buy new hardware, DirectX12 will work on many existing graphics chips, including the chips in the Xbox One! Exciting times ahead!