The Core i7 Series Arrives

Intel Core i7 Processor - LGA 1366

Intel has finally lifted the embargo on the yet-to-be-launched Intel Core i7 processors and the Intel X58 Express chipset. Intel strongly believes that this new platform will be the must-have workhorse for digital media and gaming enthusiasts for many months to come. With so much to cover on this new platform, we decided to focus solely on processor performance in this article and take a deeper dive into other features in the weeks to come. This should work out nicely, as the processors won't be available to purchase until later this month and many companies are only now getting us production-grade triple-channel memory kits and video card drivers for this new platform.

Intel Core i7 Processor - LGA 1366

The Intel Core i7 processor (known internally as Nehalem) brings some very big architectural changes, as you can tell from the picture above. The new Core i7 processor has 1366 pins, and as a result the processor, socket and heat sink mounting brackets are all larger than those of the LGA 775 processors that have been out for a couple of years. The die size of Core i7 processors is 263 mm² and the transistor count is 731 million.

Intel Core i7 Nehalem Die Diagram

Taking a look at the die of the Core i7 processor, we see a first for Intel processors: the integrated memory controller. This on-die, triple-channel DDR3 memory controller lets consumers run three memory modules together for optimal performance. By moving to an integrated memory controller and triple-channel memory, the platform has over 25GB/s of throughput between the processor and the DDR3 memory modules!
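The 25GB/s figure can be sanity-checked with a little arithmetic. The sketch below is ours, not Intel's: it assumes the standard 64-bit (8-byte) DDR3 channel width and a DDR3-1066 data rate, which the text above does not state, and the function name is purely illustrative.

```python
def peak_memory_bandwidth_gbs(channels: int, mega_transfers_per_s: float,
                              bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


if __name__ == "__main__":
    # Assumed data rates in mega-transfers per second (MT/s).
    for label, rate in (("DDR3-1066", 1066.67), ("DDR3-1600", 1600.0)):
        dual = peak_memory_bandwidth_gbs(2, rate)
        triple = peak_memory_bandwidth_gbs(3, rate)
        print(f"{label}: dual channel {dual:.1f} GB/s, "
              f"triple channel {triple:.1f} GB/s")
```

These are theoretical ceilings; measured throughput in a benchmark will land below them.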

If you follow processor architecture, you will notice a brand new cache structure in the Core i7 diagram shown above. All Intel Core i7 processors feature L1, L2, and shared L3 caches, whereas Intel Core 2 Duo and Quad processors had just L1 and L2 caches. The breakdown is as follows: there is a 64K L1 cache (32K instruction, 32K data) per core, 1MB of total L2 cache, and an impressive 8MB of L3 cache that is shared across all the cores. That means all Intel Core i7 processors have over 9MB of cache memory right there on the 45nm processor!
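For anyone who wants to check the "over 9MB" claim, here is a quick tally of the cache sizes quoted above. It assumes four cores (the quad-core parts covered in this article); the constant and function names are our own.

```python
KB = 1024
MB = 1024 * KB

CORES = 4                  # quad-core Core i7
L1_PER_CORE = 64 * KB      # 32K instruction + 32K data
L2_TOTAL = 1 * MB          # total across all cores
L3_SHARED = 8 * MB         # shared by every core


def total_cache_bytes() -> int:
    """Sum of all L1, L2 and L3 cache on the die."""
    return CORES * L1_PER_CORE + L2_TOTAL + L3_SHARED


if __name__ == "__main__":
    print(f"L2 per core: {L2_TOTAL // CORES // KB} KB")
    print(f"Total on-die cache: {total_cache_bytes() / MB:.2f} MB")
```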

Can it get any better than this?

Intel Core i7 965 Performance Features

Of course it can! The new Core i7 processor has a huge list of improvements that have been made to it.

Intel always told us that Hyper-Threading was not dead, and they were right: the technology has resurfaced and is enabled on all of the Core i7 processors. With Hyper-Threading enabled on a quad-core Core i7 processor, the operating system sees eight virtual cores that can be used. Intel has told Legit Reviews that when Hyper-Threading originally came out the idea was solid, but the Pentium 4 might not have been the best processor to bring it to market. The Core i7 series should highlight all the strong points of Hyper-Threading, which Intel is now calling Hyper-Threading "done right". If you want a deeper look at the Intel Core i7 architecture, take a look at this presentation that was given at the Spring 2008 IDF and this one that was given at the Fall IDF.
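The "eight virtual cores" figure is simply physical cores multiplied by hardware threads per core. A minimal sketch, with a helper name of our own choosing, that also prints what the machine running the script reports:

```python
import os


def logical_processors(physical_cores: int, threads_per_core: int = 2) -> int:
    """Logical processors the OS schedules on (two threads per core with Hyper-Threading)."""
    return physical_cores * threads_per_core


if __name__ == "__main__":
    print("Quad-core with Hyper-Threading:", logical_processors(4))
    print("Quad-core without Hyper-Threading:", logical_processors(4, 1))
    # os.cpu_count() returns the logical CPU count of this machine, or None if unknown.
    print("This machine reports:", os.cpu_count())
```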

Intel will be releasing three Core i7 processors, all with a TDP of 130W and an on-die shared L3 cache of 8MB. None of the current Core i7 processors is intended for multi-processor motherboards, so each has only one Quick Path Interconnect (QPI) link.

Now that we know what the general processor improvements are let's take a closer look at the chipset changes.

The Intel X58 Express Chipset

In order to understand this new platform it is best to look at the motherboard chipsets that are going to be used.

The Intel X58 Express Block Diagram

The Intel X58 Express chipset was designed just for the Intel Core i7 series of processors, which require a new socket. Since the DDR3 memory controller is located inside the processor itself, hundreds of new pins had to be added, resulting in a larger CPU package. Intel designed the X58 Express chipset from the ground up for Core i7, but re-used the ICH10/ICH10R southbridge that has been out for several months now.

The Intel X58 Express Block Chipset

Together the Intel X58 Express chipset and the ICH10 southbridge make up what is certain to be a very solid platform for high-performance systems. The Intel ICH10/ICH10R southbridge was launched with the Intel P45 Express chipset and has already proven itself a winner with some of the best Solid State Drive performance numbers of any chipset on the market. The X58 Express supports up to 36 lanes of PCI Express 2.0 connectivity, and since many boards using this chipset will have both NVIDIA SLI and ATI CrossFire enabled, Triple-SLI and Quad CrossFireX will be easy to implement. This is possible because NVIDIA is allowing motherboard makers to use a special sBIOS if they pay a licensing fee for SLI Technology. So, finally, multi-GPU technology from both graphics card companies can be used on the same board. If that isn't enough, Intel has done away with the Front Side Bus and now uses the Quick Path Interconnect to handle the flow of data between the processor and the chipset. The memory now has over 25.5 GB/s of throughput since it has a direct connection to the processor.
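To put the 36 lanes in perspective, here is a rough sketch of what they add up to. It relies on the PCI Express 2.0 signalling rate of 5 GT/s per lane with 8b/10b encoding, which works out to 500 MB/s per lane in each direction. The slot layouts are hypothetical examples of how a board maker might divide the lanes, not the configuration of any particular motherboard.

```python
PCIE2_TRANSFER_RATE = 5.0e9    # transfers per second per lane (PCI Express 2.0)
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b encoding: 8 data bits per 10 bits on the wire
X58_LANES = 36                 # lane count quoted for the X58 Express


def lane_bandwidth_mb_s() -> float:
    """Usable bandwidth of one lane, per direction, in MB/s."""
    return PCIE2_TRANSFER_RATE * ENCODING_EFFICIENCY / 8 / 1e6


def slot_bandwidth_gb_s(lanes: int) -> float:
    """Usable bandwidth of a slot with the given lane count, per direction, in GB/s."""
    return lanes * lane_bandwidth_mb_s() / 1000


if __name__ == "__main__":
    # Hypothetical lane splits; real boards decide their own layout.
    layouts = {"two cards": (16, 16, 4), "three cards": (16, 8, 8, 4)}
    for name, slots in layouts.items():
        assert sum(slots) <= X58_LANES
        widths = ", ".join(f"x{n} = {slot_bandwidth_gb_s(n):.1f} GB/s" for n in slots)
        print(f"{name}: {widths}")
    print(f"All {X58_LANES} lanes: {slot_bandwidth_gb_s(X58_LANES):.1f} GB/s per direction")
```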

The Test System

Before we look at the numbers, here is a brief glance at the test system that was used.

The Intel Core i7 Test System

All testing was done on a fresh install of Windows Vista Ultimate 64-bit. All benchmarks were completed on the desktop with no other software programs running. The memory modules were run in dual-channel mode with a 120mm fan placed on top of them to keep them cool, except on the Core i7 system, which was run in triple-channel mode. The EVGA GeForce 8800 GTS 512MB used NVIDIA ForceWare 169.28 video card drivers. The LGA 775 test system used the ASUS P5E3 motherboard with BIOS version 1201 and the LGA 1366 test system used the ASUS P6T Deluxe motherboard with BIOS v8004. The AMD Phenom testing was done on the MSI K9A2 Platinum motherboard with BIOS v1.5b5 installed along with ATI system driver version 8.50.

Memory Settings:

Here is the Intel LGA 1366 Test platform:

Intel Test Platform

| Component | Brand/Model |
| --- | --- |
| Processor | See Above |
| Motherboard | ASUS P6T Deluxe |
| Memory | 6GB Corsair DDR3 1600MHz |
| Video Card | EVGA GeForce 8800 GTS 512 |
| Hard Drive | Western Digital RaptorX 150GB |
| Cooling | Thermaltake BigWater 760i |
| Power Supply | Corsair HX1000W |
| Operating System | Windows Vista Ultimate 64-Bit |

Here is the Intel LGA 775 Test platform:

Intel Test Platform

| Component | Brand/Model |
| --- | --- |
| Processor | See Above |
| Motherboard | ASUS P5E3 Deluxe |
| Memory | 4GB Corsair DDR3 1800C7 |
| Video Card | EVGA GeForce 8800 GTS 512 |
| Hard Drive | Western Digital RaptorX 150GB |
| Cooling | Corsair Nautilus 500 |
| Power Supply | PC Power and Cooling 1KW |
| Operating System | Windows Vista Ultimate 64-Bit |

Here is the Intel Skulltrail Test platform:

Skulltrail Test Platform

| Component | Brand/Model |
| --- | --- |
| Processor | 2x Intel Core 2 QX9775 |
| Motherboard | Intel D5400XS |
| Memory | 4GB Micron 800MHz FB-DIMM |
| Video Card | EVGA GeForce 8800 GTS 512 |
| Hard Drive | Western Digital RaptorX 150GB |
| Cooling | Zalman AT Fan/Heatsink |
| Power Supply | PC Power and Cooling 1KW |
| Operating System | Windows Vista Ultimate 64-Bit |

The AMD Phenom X4 9950 Processor Test System

Here is the AMD Phenom Test platform:

AMD Test Platform

| Component | Brand/Model |
| --- | --- |
| Processor | All AM2 and AM2+ CPUs |
| Motherboard | MSI K9A2 Platinum |
| Memory | 4GB OCZ Flex PC2-6400 |
| Video Card | EVGA GeForce 8800 GTS 512 |
| Hard Drive | Western Digital RaptorX 150GB |
| Cooling | Zalman AT Fan/Heatsink |
| Power Supply | PC Power and Cooling 1KW |
| Operating System | Windows Vista Ultimate 64-Bit |

Sandra 2009 Memory Bandwidth

SiSoftware Sandra 2009:

Sisoftware Sandra 2009

The SiSoftware Sandra 2009 benchmark utility came out recently and we have started to include it in our benchmarking. With Sandra 2009 you can now easily compare the performance of the tested device with its speed and its (published) power (TDP)! Sandra XII SP2 also has SSE4 (Intel) and SSE4A (AMD) benchmark code-paths, which is great for those of you testing next-generation AMD and Intel chips.

Sandra XII SP1 Benchmark Scores

Results: Sandra 2009 showed that the Intel Core i7 processors blow away the competition thanks to the new memory design. The Core i7 platform used three 2GB memory modules in triple-channel mode at 1600MHz with 8-8-8-24 1T timings, which is what we think will become the standard kit for this platform. Corsair has already announced 1866MHz CL9 kits for this platform and Kingston Technology has announced 2GHz 3GB kits, so enthusiasts will easily break the 30GB/s mark with high-performance memory kits.

Photodex ProShow Gold 3.2

ProShow Gold allows the user to combine photos, videos and music to create spectacular slide shows. The software provides the capability to share memories with friends and family on DVD, PC and the Web. ProShow Gold brings still photos to life by adding motion effects like pan, zoom, and rotate. The user can also add captions to a photo or video and choose from over 280 transition effects.

Photodex Proshow Gold 3.2 Benchmark Settings

The workload we are using takes 29 high resolution jpeg photos and converts them to an mpeg2, widescreen DVD quality, 3min 9sec slideshow video file. The input photos are in 3872x2592 resolution and total about 170MB in size.

Photodex Proshow Gold 3.2 Benchmarking

ProShow Gold 3.2 lets you share your slide shows in virtually any format and on any device. You can upload your shows directly to YouTube or choose from over 20 devices to output to directly, including the iPod, BlackBerry, Zune and more. Not bad for software that costs under $70 and is optimized for eight cores! Our benchmark testing wasn't at 100% load the entire time, but averaged around 95% during the testing period.

Photodex Proshow Gold 3.2 Benchmark Results

Benchmark Results: Photodex Proshow software showed that the Intel Core i7 quad-core processors do well with Hyper-Threading, but it wasn't enough to pass up the true 8-core QX9775 platform. The 3.2GHz Intel Core i7-965 was 11 seconds faster than the Intel Core 2 Quad QX9770, which is very impressive as they offer the same clock frequency.

Sony Vegas 8.0b

The Vegas Pro collection combines Vegas Pro 8, DVD Architect Pro 4.5, and Dolby Digital AC-3 encoding software to offer an integrated environment for all phases of professional video, audio, DVD, and broadcast production. These tools let you edit and process DV, AVCHD, HDV, SD/HD-SDI, and all XDCAM formats in real time, fine-tune audio with precision, and author surround sound, dual-layer DVDs. Vegas Pro software also supports 24p, HD and HDV editing, which is what we are going to look at in this benchmark.

Sony Vegas Benchmarking

The Sony Vegas 8.0b workload that we are using takes a series of short movie and audio files and creates a single video that incorporates special effects and transitions. It uses a MainConcept HDV encoding profile to render the 24p widescreen video clip at a resolution of 1440x1080x32.

Sony Vegas Benchmark Results

Benchmark Results: Running our custom Sony Vegas 8.0b benchmark shows just how important the CPU is when it comes to creating a single video clip from multiple clips. The Intel Core i7-965 was the fastest processor that we have ever benchmarked on Sony Vegas! Even the 2.66GHz Core i7-920 beat out the 3.2GHz Core 2 Quad QX9770, which is awesome as the i7-920 is under $300 and the QX9770 is $999.

Microsoft Excel 2007

Microsoft Office Excel 2007 is a powerful and widely used tool with which you can create and format spreadsheets, and analyze and share information to make more informed decisions. It allows you to import, organize and explore massive data sets within spreadsheets and then communicate your analysis with professional-looking charts. Excel 2007 also provides tools to “see” important trends and find exceptions in your data. Legit Reviews has two benchmarking tests that we do on Microsoft Office Excel 2007.

Microsoft Excel 2007 Testing

The first workload executes approximately 28,000 sets of calculations using the most common calculations and functions found in Excel. These include common arithmetic operations like addition, subtraction, division, rounding and square root. It also includes common statistical analysis functions such as Max, Min, Median and Average. The calculations are performed after a spreadsheet with a large dataset is updated with new values and must re-calculate many data points. The input file is the 6.2 MB spreadsheet seen above.

Microsoft Excel 2007 Benchmark Results

Benchmark Results: Lots of people use Microsoft Office at work and home, so this is an important test for many of our readers. Many people don't run 28,000 sets of calculations at once, but if you do the CPU will determine how fast the task is completed. 

The Black-Scholes model is used in our second Excel test to calculate a theoretical call and put price using the five key determinants of an option's price: stock price, strike price, volatility, time to expiration, and short-term (risk free) interest rate.

Microsoft Excel 2007 Testing

This workload calculates the European Put and Call option valuation for Black-Scholes option pricing using Monte Carlo simulation. It simulates the calculations performed when a spreadsheet with input parameters is updated and must recalculate the option valuation. In this scenario we execute approximately 300,000 iterations of Monte Carlo simulation. In addition, the workload uses Excel lookup functions to compare the put price from the model with the historical market price for 50,000 rows to understand the convergence. The input file is a 70.1 MB spreadsheet and with 10 times the calculations of the first test, this one should take a bit longer to complete.
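For readers curious what a Monte Carlo Black-Scholes valuation looks like outside of Excel, here is a minimal Python sketch. It is not the spreadsheet used in our benchmark: the input values are hypothetical, the function names are ours, and it omits the lookup step against historical prices. It simulates terminal stock prices from the five inputs listed above and cross-checks the estimate against the closed-form Black-Scholes formula.

```python
import math
import random


def monte_carlo_option_prices(spot, strike, volatility, years, rate,
                              iterations=300_000, seed=42):
    """Estimate European call and put prices by simulating terminal stock prices."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * volatility ** 2) * years
    diffusion = volatility * math.sqrt(years)
    call_sum = put_sum = 0.0
    for _ in range(iterations):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        call_sum += max(terminal - strike, 0.0)
        put_sum += max(strike - terminal, 0.0)
    discount = math.exp(-rate * years)
    return discount * call_sum / iterations, discount * put_sum / iterations


def black_scholes_prices(spot, strike, volatility, years, rate):
    """Closed-form Black-Scholes call and put prices, used as a cross-check."""
    def norm_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    sigma_root_t = volatility * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / sigma_root_t
    d2 = d1 - sigma_root_t
    discounted_strike = strike * math.exp(-rate * years)
    call = spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    put = discounted_strike * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put


if __name__ == "__main__":
    # Hypothetical inputs: stock price, strike price, volatility,
    # time to expiration in years, and risk-free interest rate.
    params = dict(spot=100.0, strike=105.0, volatility=0.2, years=1.0, rate=0.05)
    mc_call, mc_put = monte_carlo_option_prices(**params)
    bs_call, bs_put = black_scholes_prices(**params)
    print(f"Monte Carlo call {mc_call:.3f} vs closed form {bs_call:.3f}")
    print(f"Monte Carlo put  {mc_put:.3f} vs closed form {bs_put:.3f}")
```

The loop is independent per iteration, which is why this kind of workload spreads well across many cores.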

Microsoft Excel 2007 Benchmark Results

Benchmark Results: With 300,000 iterations of Monte Carlo simulation taking place in this benchmark it takes all the processors a bit longer to finish as it puts a good load on the system.  The Intel Skulltrail system is in a league of its own as it completes the task in less than ten seconds, but the Core i7 processors are right behind.  

Cinebench R9.5

MAXON CINEBENCH 9.5:

CINEBENCH is the free benchmarking tool for Windows and Mac OS based on the powerful 3D software CINEMA 4D. Consequently, the results of tests conducted using CINEBENCH 9.5 carry significant weight when analyzing a computer's performance in everyday use. A system's CPU and the OpenGL capabilities of its graphics card in particular are put through their paces (even multiprocessor systems with up to 16 dedicated CPUs or processor cores). During the testing procedure, all the relevant data is gathered so that the performance of different computers can subsequently be compared, regardless of operating system. Again, higher Frames/Second and lower rendering time in seconds equal better performance.

Cinebench 9.5 Benchmarking

Cinebench 9.5 was able to put a 100% load across all the cores, which makes this a great benchmark to look at multi-core platforms.

Cinebench 9.5 Benchmark Results

Benchmark Results: Cinebench 9.5 was tested in both 64-bit and 32-bit, which resulted in some minor performance differences as seen above. The Intel Core i7 family of processors showed some nice performance gains over the current generation quad-core processors!

Cinebench R10

MAXON CINEBENCH R10:

CINEBENCH is the free benchmarking tool for Windows and Mac OS based on the powerful 3D software CINEMA 4D. Consequently, the results of tests conducted using CINEBENCH 10 carry significant weight when analyzing a computer's performance in everyday use. A system's CPU and the OpenGL capabilities of its graphics card in particular are put through their paces (even multiprocessor systems with up to 16 dedicated CPUs or processor cores). The test procedure consists of two main components: The first test sequence is dedicated to the computer's main processor. A 3D scene file is used to render a photorealistic image. The scene makes use of various CPU-intensive features such as reflection, ambient occlusion, area lights and procedural shaders. In the first run, the benchmark only uses one CPU (or CPU core) to ascertain a reference value. On machines that have multiple CPUs or CPU cores, and also on those that simulate multiple CPUs (via Hyper-Threading or similar technologies), MAXON CINEBENCH will run a second test using all available CPU power. Again, higher Frames/Second and lower rendering time in seconds equal better performance.

Cinebench 10

Cinebench R10 was able to put a 100% load across all the cores on all of the processors, which makes this a great benchmark to look at multi-core platforms.

Cinebench R10 Results

Results: Running Cinebench R10 in 64-bit mode showed a significant improvement in performance on all of the processors and the results were in-line with what we expected from running Cinebench R9.5!  The Intel Core i7 965 was 27% quicker than the Intel Core 2 Quad QX9770 and both are the same clock frequency!

POV-Ray 3.7 Beta 25

Processor Performance on Pov-Ray 3.7 Beta 25:

The Persistence of Vision Ray-Tracer was developed from DKBTrace 2.12 (written by David K. Buck and Aaron A. Collins) by a bunch of people (called the POV-Team) in their spare time. It is a high-quality, totally free tool for creating stunning three-dimensional graphics. It is available in official versions for Windows, Mac OS/Mac OS X and i86 Linux. The POV-Ray package includes detailed instructions on using the ray-tracer and creating scenes. Many stunning scenes are included with POV-Ray so you can start creating images immediately when you get the package. These scenes can be modified so you do not have to start from scratch. In addition to the pre-defined scenes, a large library of pre-defined shapes and materials is provided. You can include these shapes and materials in your own scenes by just including the library file name at the top of your scene file, and by using the shape or material name in your scene. Since this is free software, feel free to download this version and try it out on your own.

The most significant change from the end-user point of view between versions 3.6 and 3.7 is the addition of SMP (symmetric multiprocessing) support, which, in a nutshell, allows the renderer to run on as many CPUs as you have installed in your computer. This will be particularly useful for those users who intend to purchase a dual-core CPU or who already have a two (or more) processor machine. On a two-CPU system the rendering speed in some scenes almost doubles. For our benchmarking we used version 3.7 beta 25, which is the most recent version available. The benchmark used all available cores to complete the render.

Pov-Ray 3.7 Beta 25

Once rendering of the object we selected was completed, we took the score from the dialog box, which indicates the average PPS for the benchmark. A higher PPS indicates faster system performance.

Pov-Ray 3.7 Beta 25

Benchmark Results: Looking at POV-Ray 3.7 Beta 25, the Intel Core i7-965 was over 30% faster than the QX9770 and 56% faster than the quickest processor AMD offers.
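Percentages like these depend on whether the metric is higher-is-better (PPS, frames per second) or lower-is-better (render time). The helper below sketches one common convention for each case; the function names are ours and the numbers in the example are round placeholders for illustration, not scores from our charts.

```python
def percent_faster_by_score(score_a: float, score_b: float) -> float:
    """How much faster A is than B for higher-is-better metrics such as PPS."""
    return (score_a / score_b - 1.0) * 100.0


def percent_faster_by_time(seconds_a: float, seconds_b: float) -> float:
    """How much faster A is than B for lower-is-better metrics such as render time.

    Times are converted to throughput (work per second) before comparing.
    """
    return (seconds_b / seconds_a - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder values only, chosen to make the arithmetic easy to follow.
    print(f"{percent_faster_by_score(1300.0, 1000.0):.0f}% faster by score")
    print(f"{percent_faster_by_time(80.0, 100.0):.0f}% faster by time")
```

Note that finishing in 80 seconds instead of 100 is 25% faster by throughput even though the time itself dropped by only 20%, so it pays to check which convention a chart uses.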

POV-Ray Real-Time Raytracing

Legit Reviews was e-mailed by one of the developers over at POV-Ray to see if LR could include real-time raytracing in our performance analysis, and we were more than happy to include the data in our testing. 

E-Mail From POV-Ray -- I thought I might ping you about an experimental feature we've added to the POV-Ray SMP beta: real-time raytracing. It's mostly useful to folks who have multi-core systems and in fact is something that I've wanted to do for years but the hardware just wasn't there (at least not in the consumer price range). It works best on a kentsfield or later, but a core 2 duo should be sufficient if you don't mind sub-10fps frame rates.

If you want to try it out, please feel free to grab it from:  http://www.povray.org/beta/rtr/

POV-Ray real-time raytracing

This experimental software from POV-Ray was a welcome addition to our testing and was able to spread the workload across all the cores, even in our eight-core test system, as seen above.

POV Ray RTR Benchmark Chart

Results: POV-Ray Real-Time Raytracing is a great benchmark that we love to use on Legit Reviews and it does a great job of showing how performance scales with CPU cores. The Core i7 series really struts its stuff with Real-Time Raytracing, as all three processors rendered the scene at over 20FPS.

Futuremark 3DMark06

Futuremark 3DMark 2006

3DMark06

Futuremark's 3DMark06 has a built-in CPU test, a multi-threaded DirectX gaming metric that's useful for comparing relative performance between similarly equipped systems. This test consists of two different 3D scenes that are processed with a software renderer that is dependent on the host CPU's performance. Calculations that are normally reserved for your 3D accelerator are instead sent to the CPU for processing and rendering. The frame rate generated in each test is used to determine the final score.

Futuremark CPU Benchmark Results

Futuremark CPU Benchmark Results

Benchmark Results: The 3DMark 2006 CPU test showed that the Intel Core i7 920, 940 and 965 are hands down the fastest Intel quad-core processors we have ever seen!  The pair of 9775 quad-core processors were still the overall leaders, but they cost twice as much and require an expensive dual-socket motherboard.

Overclocking Results

Overclocking results vary greatly depending on the hardware being used and who is doing the overclocking. Always remember that no two pieces of hardware will perform the same, so our results will differ from what you might be able to get.

Intel Core i7 965 Processor Overclocking

Using the ASUS P6T Deluxe motherboard with BIOS v8004, we pushed the limits of our early revision processor to see what it could do. At stock settings the Intel Core i7 965 processor runs with a 133MHz base clock that is multiplied by the CPU multiplier to get the CPU speed and by the QPI multiplier to get the QPI speed. The Intel Core i7 965 uses a 24x multiplier to reach its final core clock of 3.20GHz. As you can see above, the ASUS P6T Deluxe motherboard runs the base clock at 133.6MHz, so the overall clock frequency is roughly 7MHz higher than the processor's rating.
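The relationship between base clock and core clock is a plain multiplication, sketched below with a helper name of our own. We assume the nominal 133MHz base clock is exactly 133.33MHz (400/3); with the rounded 133.6MHz reading from the screenshot, the product comes out a few MHz above the rated 3.2GHz.

```python
def core_clock_mhz(base_clock_mhz: float, multiplier: int) -> float:
    """Core clock is the base clock multiplied by the CPU multiplier."""
    return base_clock_mhz * multiplier


if __name__ == "__main__":
    rated = core_clock_mhz(400 / 3, 24)    # assumed nominal 133.33MHz base clock
    observed = core_clock_mhz(133.6, 24)   # base clock reported on the P6T Deluxe
    print(f"Rated:    {rated:.0f} MHz")
    print(f"Observed: {observed:.1f} MHz")
    print(f"Extra:    {observed - rated:.1f} MHz")
```

The same function shows why overclocking works from either direction: raising the multiplier or raising the base clock both increase the product.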

Intel Core i7 965 Processor Overclocking

By not touching anything in the BIOS other than the CPU voltage (Auto to 1.35V) we were able to reach 4GHz right off the bat! This is not bad at all and is nearly an 800MHz overclock for a few seconds' worth of work.

Intel Core i7 965 Processor Overclocking

With a little extra voltage to the processor and a boost of the base clock from 133MHz to 145MHz we were able to hit 4.2GHz. The system wasn't fully stable at 4.2GHz, but we feel certain that with a little more effort 4.2GHz should be easily had on most enthusiast motherboards. The Intel Core i7 series can overclock by over 1GHz, which is a great sign for a brand new architecture!

Final Thoughts and Conclusions

Power Consumption

Since power consumption is a big deal these days, we ran some simple power consumption tests on our test beds. The systems ran with the same power supply, case fan, video card and hard drive model. To measure idle usage, we ran the system at idle for one hour on the desktop with no screen saver and took the measurement. For load measurements, POV-Ray 3.7 was run on all cores to make sure each and every processor was at 100% load. All of the systems used identical hardware minus the motherboard and processor. It should be noted that the Core i7 processors used a Thermaltake BigWater 760i water cooler and the rest of the systems used a Corsair Nautilus 500 water cooler.

Power Consumption Results

Results: When it came to idle power consumption the Intel Core i7 series used more power than we expected, but for having such a large cache they didn't do badly by any means. The entire system with a water cooler was still under 300 Watts, which is impressive for being the fastest quad-core processor in the world.

Intel Core i7 Retail Heatsink

Final Thoughts

This is just a quick look at the Intel Core i7 processor family performance on a number of respected benchmarks. Expect more deep dives in the weeks to come as we have numerous boards, cooling solutions and memory kits that we are still trying out on this new platform. 

The performance numbers speak for themselves as the Intel Core i7 965 Extreme Edition proved itself to be more than 35% faster than the equally clocked Core 2 Extreme QX9770 processor in a number of benchmarks.  This is an impressive number and one that may be higher than many expected.  When overclocked the Core i7 965 was wickedly fast and ripped through performance tests faster than anything we have ever seen.  Nehalem offers obvious clock-for-clock performance improvements and that is something the community must see before making a platform change. Pricing for the three new Intel Core i7 processors is fairly aggressive and the Core i7 965 Extreme Edition comes in at $999, which is the price that the Intel Core 2 Extreme QX9770 used to be. 

Intel has once again launched a great part that widens the performance gap between itself and AMD. With the Intel Core i7 pulling so far ahead of the AMD Phenom series of processors, it almost makes you wonder if AMD will ever be able to catch up.

Legit Reviews Editor's Choice

Legit Bottom Line: The performance benchmarks confirm that the Intel Core i7 series of processors is the real deal and the new platform is solid.