Geeks With Blogs
Mark Pearl


Designing for Performance

The basic building blocks of today's computers are virtually the same as those of the early IAS computer. The focus has been on optimizing and increasing speed while keeping the same underlying architecture.


Processors increase speed using several techniques, including…

  • Branch prediction – the processor looks ahead in the instruction code fetched from memory and predicts which branches or groups of instructions are likely to be processed next.
  • Data flow analysis – the processor analyses which instructions depend on each other's results or data, and creates an optimized schedule of instructions that prevents unnecessary delay.
  • Speculative execution – using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program, saving the results in temporary locations.
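The branch prediction idea above can be sketched with a two-bit saturating counter, a common textbook predictor scheme. This is a minimal illustration, not any specific processor's implementation; the branch-outcome sequence is invented for the example.

```python
# Minimal sketch of a two-bit saturating-counter branch predictor.
# States 0-1 predict "not taken"; states 2-3 predict "taken".

class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start weakly predicting "taken"

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

# A loop branch that is taken eight times, then falls through once
# (an assumed, illustrative sequence):
outcomes = [True] * 8 + [False]
predictor = TwoBitPredictor()
correct = 0
for taken in outcomes:
    if predictor.predict() == taken:
        correct += 1
    predictor.update(taken)

print(f"{correct}/{len(outcomes)} predictions correct")  # 8/9
```

The two-bit counter tolerates a single mispredicted loop exit without flipping its prediction, which is why it does well on loop-heavy code.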

Performance Balance

While processor power has increased over the years, other critical components have not kept up, so designers must look for ways of balancing performance. One of the main bottlenecks has been the interface between the processor and main memory.

There are a number of ways that a system architect can address this problem including the following…

  • Increase the number of bits retrieved at one time by making DRAMs "wider" rather than "deeper" and by using wide bus data paths.
  • Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip.
  • Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory, including one or more caches on the processor chip as well as an off-chip cache close to the processor chip.
  • Increase the interconnect bandwidth between processors and memory by using higher-speed buses and by using a hierarchy of buses to buffer and structure data flow.
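The third option above (reducing how often the processor goes to main memory) can be quantified with the standard average memory access time (AMAT) formula. The cycle counts and hit rate below are assumed, illustrative figures, not measurements of any specific system.

```python
# Back-of-the-envelope sketch of how a cache eases the processor/memory
# bottleneck, using average memory access time (AMAT).
# All numbers are assumed for illustration.

cache_hit_time = 2    # cycles to read from an on-chip cache (assumed)
memory_time = 100     # cycles to read from main memory (assumed)
hit_rate = 0.95       # fraction of accesses served by the cache (assumed)

amat_no_cache = memory_time
amat_with_cache = cache_hit_time + (1 - hit_rate) * memory_time

print(f"Without cache: {amat_no_cache} cycles per access")
print(f"With cache:    {amat_with_cache:.1f} cycles per access")
```

Even a modest hit rate collapses the average cost per access, because only misses pay the full main-memory latency.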

Another area of design focus is the handling of I/O devices. The main challenge is moving data between the processor and peripherals. Strategies include buffering and caching techniques. Multiple-processor configurations can also help satisfy I/O demands.

There are generally two constantly evolving factors…

  1. The rate at which performance is changing in the various technology areas differs greatly from one type of element to another
  2. New applications and new peripheral devices constantly change the nature of the demand on the system in terms of typical instruction profile and the data access patterns

Improvements in Chip Organization and Architecture

There are three approaches to achieving increased processor speed…

  1. Increase the hardware speed of the processor (i.e. shrink the size of the logic gates on the processor chip, which reduces gate delay and speeds up the individual operations executed on the chip)
  2. Increase the size and speed of the caches that are interposed between the processor and main memory.
  3. Make changes to the processor organization and architecture that increase the effective speed of instruction execution (normally via parallelism)

As clock speeds and logic density increase, a number of obstacles become more significant including…

  • Power – power density increases with logic density and clock speed. One challenge of this is the difficulty of dissipating the heat generated on high-density, high-speed chips.
  • RC delay – the speed at which electrons can flow on a chip between transistors is limited by the resistance and capacitance of the metal wires connecting them: delay increases as the RC product increases. As components on the chip shrink, the wires become thinner (increasing resistance) and closer together (increasing capacitance).
  • Memory latency – memory speeds lag processor speeds, as previously discussed.
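The RC delay point can be illustrated with a toy calculation: propagation delay on a wire grows with the product of its resistance and capacitance. The scaling factors below are arbitrary normalized units chosen for illustration, not real process parameters.

```python
# Illustrative sketch of RC delay: wire delay is proportional to the
# product of resistance and capacitance. Units are normalized; the
# scaling factors are assumed for illustration.

def rc_delay(resistance, capacitance):
    return resistance * capacitance

baseline = rc_delay(1.0, 1.0)

# Shrinking a wire's cross-section raises its resistance; packing wires
# closer together raises the capacitance between them (assumed factors):
scaled = rc_delay(2.0, 1.5)

print(f"Relative delay after scaling: {scaled / baseline:.1f}x")  # 3.0x
```

This is why simply shrinking features stops helping at some point: the logic gates get faster, but the wires connecting them get relatively slower.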

With these obstacles becoming harder to overcome, chip designers are now placing multiple processors on the same chip, with a large shared cache. Multicore processors offer the potential to increase performance without increasing the clock rate. Thus the current strategy is to use two simpler processors rather than one more complex processor.
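How much the extra cores actually help depends on how much of the workload can run in parallel, which Amdahl's law captures. The 90% parallel fraction below is an assumed workload property, not a measured value.

```python
# Sketch of Amdahl's law: the overall speedup from running the
# parallelizable fraction of a program on n cores.
# The parallel fraction used below is an assumed example value.

def speedup(parallel_fraction, cores):
    serial = 1 - parallel_fraction          # part that stays sequential
    return 1 / (serial + parallel_fraction / cores)

# A workload that is 90% parallelizable (assumed):
for n in (1, 2, 4):
    print(f"{n} core(s): {speedup(0.9, n):.2f}x")
```

The serial fraction caps the benefit: even with unlimited cores, a 90%-parallel workload can never run more than 10x faster, which is why two simpler cores only pay off when software exposes enough parallelism.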

The Evolution of the Intel x86 Architecture

There are two main architectures that we will examine – Intel x86 and ARM processors

  • CISC design – The Intel x86 is an excellent example of CISC (Complex Instruction Set Computer) design.
  • RISC design – The ARM architecture is used in a wide variety of devices and embedded systems and is a good example of RISC (Reduced Instruction Set Computer) design.

Some of the main differences between the Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4, the Core (Duo) and the Core 2 are…

  • Pentium – introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel
  • Pentium Pro – made aggressive use of register renaming, branch prediction, data flow analysis and speculative execution
  • Pentium II – incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently
  • Pentium III – incorporated additional floating-point instructions to support 3D graphics software
  • Pentium 4 – included additional floating-point and other enhancements for multimedia
  • Core – the first Intel x86 microprocessor with a dual core, i.e. two processors on a single chip
  • Core 2 – extends the architecture to 64 bits
Posted on Saturday, February 4, 2012 11:59 AM UNISA COS 2621 Computer Organization
