The AMD FX processor, code-named “Bulldozer,” is built from modules that each contain two cores. Each AMD FX die has four modules, which gives us a total of 8 cores per processor. That may seem obvious given that we are looking at the AMD FX-8150 today, but we have to remember that AMD is releasing a number of parts ranging from 8 cores down to 4 cores, all based on the same die. Will we see motherboards that can unlock the disabled modules? AMD says it is not going to allow it, though motherboard manufacturers may yet find a way. Only time will tell.
Each of the Bulldozer modules contains 2MB of L2 cache, and each AMD FX processor contains 8MB of L3 cache. On the 8-core processors such as the FX-8150 or FX-8120, that makes the ratio of total L2 to L3 cache 1:1 (four modules at 2MB each against 8MB of L3). During the development phase, AMD determined that this was the most efficient ratio of L2 to L3 cache.
The L1 and L2 BTB (Branch Target Buffer), Prediction Queue, Ucode ROM, ICache, Fetch Queue, and the 4 x86 Decoders are present on each of the Bulldozer modules. These particular components of the module are shared between the two cores.
Each of the cores within the Bulldozer module has its own Integer Scheduler, Instruction Retire, L1 Data Cache and L1 DTLB. These components, among others, make up the Integer unit.
The Bulldozer core microarchitecture also shares the Floating Point Unit. The FPU consists of Dual 128-bit FMAC pipes, Dual 128-bit packed integer pipes, PRF-based register renaming and a Unified scheduler for both threads.
Each of the Bulldozer modules has 2MB of 16-way unified L2 Cache. Depending on the number of active modules on your particular AMD FX processor, you can have anywhere from 4MB L2 to 8MB L2 Cache.
AMD included some information on the Prediction-Directed Instruction Prefetch; you can check out the details on that above.
AMD FX processors will support the latest instruction set extensions: SSE 4.1 and 4.2; AVX, with 256-bit YMM registers, non-destructive source operand capability, an AES subset and an FMAC subset (AMD's 4-operand form); XSAVE state space management; and XOP instructions. The FX series of processors will also include Light Weight Profiling, a low-overhead user-level profiling mechanism that uses the XSAVE state space and stores records for configured events.
The new instruction sets will most likely help AMD in some of the benchmark tests that we will run today, and further down the road when Windows 8 comes out. For example, Microsoft Visual Studio 2010 SP1 supports the new instructions in the AMD FX processors, such as XOP, FMA4, AVX and SSE 4.x.
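For readers who want to check which of these extensions their own processor reports, here is a minimal Python sketch. It assumes a Linux system, where the kernel lists CPU feature flags in /proc/cpuinfo. The flag strings are the kernel's names for these features, while the function names and the mapping table are made up for this example.

```python
# Linux /proc/cpuinfo flag names for the extensions discussed above.
# The table and function names are illustrative; only the flag strings
# are the kernel's own.
EXTENSION_FLAGS = {
    "SSE 4.1": "sse4_1",
    "SSE 4.2": "sse4_2",
    "AVX": "avx",
    "AES": "aes",
    "FMA4": "fma4",
    "XOP": "xop",
    "XSAVE": "xsave",
    "LWP": "lwp",
}


def parse_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(cpuinfo_text):
    """Map each extension name to True/False based on the flags line."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in EXTENSION_FLAGS.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    for name, present in supported_extensions(text).items():
        print(f"{name:8s} {'yes' if present else 'no'}")
```

On a system without /proc/cpuinfo the script simply reports every extension as absent, so treat a wall of "no" answers on other operating systems as "unknown" rather than "unsupported".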
Over the next couple of years AMD has some big plans. Each year AMD is projecting a 10%-15% gain in performance per Watt over the previous year. This will be achieved by increasing the IPC (instructions per clock) and by reducing power consumption.
The AMD Bulldozer processor core is the same across the “Zambezi” desktop platform, as well as the “Interlagos” and “Valencia” server platforms.
The new processors feature 128KB of L1 Data Cache (16KB per core), 256KB of L1 Instruction Cache (64KB per module), and 8MB of L2 Cache (2MB per module). When asked about the latencies of the various caches, AMD opted for no comment. The Northbridge onboard the AMD FX “Bulldozer” controls the 8MB of L3 Cache, two 72-bit wide DDR3 memory channels and four 16-bit HyperTransport links.
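To put those yearly figures in perspective, the short Python sketch below compounds them over a few years. It assumes each year's gain is measured against the previous year's part, so the gains multiply; the function name is made up for this example, and the numbers are AMD's projections, not measurements.

```python
def projected_gain(annual_gain, years):
    """Cumulative performance-per-watt multiplier after compounding."""
    return (1.0 + annual_gain) ** years


# AMD's projected range is 10% to 15% per year.
for years in (1, 2, 3):
    low = projected_gain(0.10, years)
    high = projected_gain(0.15, years)
    print(f"after {years} year(s): {low:.2f}x to {high:.2f}x")
```

If AMD hits its targets, three generations would land at roughly 1.33x to 1.52x the performance per Watt of the starting point.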
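The cache totals follow directly from the per-core and per-module figures, and the same arithmetic applies to the parts with fewer active modules. Here is a small Python sketch of that calculation. The per-unit sizes come from the figures quoted in this article; the function and constant names are made up for this example, and the L3 size is assumed to stay at 8MB regardless of module count.

```python
CORES_PER_MODULE = 2
L1D_KB_PER_CORE = 16      # L1 data cache is private to each core
L1I_KB_PER_MODULE = 64    # L1 instruction cache is shared per module
L2_MB_PER_MODULE = 2      # unified L2 is shared per module
L3_MB = 8                 # assumed constant across the FX line


def cache_totals(modules):
    """Return core count and cache totals for a given number of active modules."""
    cores = modules * CORES_PER_MODULE
    return {
        "cores": cores,
        "l1d_kb": cores * L1D_KB_PER_CORE,
        "l1i_kb": modules * L1I_KB_PER_MODULE,
        "l2_mb": modules * L2_MB_PER_MODULE,
        "l3_mb": L3_MB,
    }


# Two, three and four active modules give the 4-, 6- and 8-core parts.
for modules in (2, 3, 4):
    print(modules, "modules:", cache_totals(modules))
```

With four modules this reproduces the 128KB, 256KB and 8MB totals above; with two modules the L2 drops to 4MB.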
Above we can see the simplified block diagram of the Northbridge on the AMD FX processors.
One of the key points of the new AMD “Bulldozer” is the power efficiency. AMD has minimized the silicon by sharing components between the cores. Throughout the design of the FX processors there is extensive flip-flop clock-gating, as well as circuits that are power-gated dynamically. There are also a number of power saving features that are controlled by firmware or software.
The AMD Turbo Core Technology has a couple of different levels. First we have the processor at the base frequency. When it’s needed, we have a turbo boost for all eight cores. The AMD FX processors also have the ability to disable half of the cores and hit the maximum processor frequency.
In the more recent AMD platforms (AM2, AM2+, and AM3) we were able to take the latest generation processor and drop it into the previous generation motherboard. For example, we were able to take an AM3 processor and drop it into an AM2+ motherboard. For “Bulldozer,” however, we aren’t able to do that. To utilize an AMD FX processor we will need to use a Socket AM3+ motherboard.
The “Valencia” platform will be a drop-in replacement for existing 4000 series processors in a 1-2 socket server, with a BIOS update of course. The “Valencia” processors will have up to eight cores and will be available for one or two socket servers.
The “Interlagos” platform is a drop-in replacement for the AMD Opteron 6100 processor. The “Interlagos” server processors will be available for servers that range from one to four sockets, and each processor will have as many as 16 processor cores!
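The three levels can be pictured with a toy model like the Python sketch below. This is not AMD's actual algorithm: the function name, the state labels, and the `has_headroom` parameter (standing in for whatever power and thermal checks the processor really performs) are all assumptions made for illustration.

```python
def turbo_state(active_cores, total_cores=8, has_headroom=True):
    """Pick one of the three Turbo Core levels described above.

    Toy model only: has_headroom is a hypothetical stand-in for the
    processor's real power and thermal conditions.
    """
    if not has_headroom:
        return "base"
    # With half of the cores (or fewer) active, the rest can be
    # disabled so the active ones reach the maximum frequency.
    if active_cores <= total_cores // 2:
        return "max turbo"
    return "all-core turbo"


for cores in (8, 6, 4, 1):
    print(cores, "active cores ->", turbo_state(cores))
```

A lightly threaded workload that keeps four or fewer cores busy falls into the "max turbo" case here, while a fully threaded one gets the smaller all-core boost.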