Saturday, December 23, 2006

AMD MultiCore, Intel MultiCore V

AMD Multicore and Intel Multicore
----------------------------



Parallel computing
Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster. The idea is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination.
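
As a concrete illustration, the hypothetical sketch below (assuming a POSIX threads environment, compiled with -pthread) divides the summation of an array into two smaller tasks that run simultaneously, with the coordination step combining the partial results at the end.

    /* Hypothetical sketch: sum an array by splitting it between two threads. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    static double data[N];

    struct chunk { int lo, hi; double partial; };

    static void *sum_chunk(void *arg)
    {
        struct chunk *c = arg;
        c->partial = 0.0;
        for (int i = c->lo; i < c->hi; i++)
            c->partial += data[i];
        return NULL;
    }

    int main(void)
    {
        for (int i = 0; i < N; i++) data[i] = 1.0;

        /* The problem is divided into two smaller tasks... */
        struct chunk a = { 0, N / 2, 0.0 }, b = { N / 2, N, 0.0 };
        pthread_t ta, tb;
        pthread_create(&ta, NULL, sum_chunk, &a);
        pthread_create(&tb, NULL, sum_chunk, &b);

        /* ...and the coordination step waits for both and combines the results. */
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        printf("total = %f\n", a.partial + b.partial);
        return 0;
    }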

Parallel computing systems
A parallel computing system is a computer with more than one processor for parallel processing. In the past, each processor of a multiprocessing system always came in its own processor packaging, but recently introduced multicore processors contain multiple logical processors in a single package.

There are many different kinds of parallel computers. They are distinguished by the kind of interconnection between processors (known as "processing elements" or PEs) and memory.

Flynn's taxonomy, one of the most accepted taxonomies of parallel architectures, classifies parallel (and serial) computers according to

* whether all processors execute the same instruction at the same time (single instruction/multiple data -- SIMD), or
* whether each processor executes different instructions (multiple instruction/multiple data -- MIMD); a small sketch of the contrast follows this list.
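
As an informal illustration (a hypothetical sketch, not tied to any particular machine), a SIMD-style computation applies one operation across many data elements in lockstep, while a MIMD-style computation lets each processor run its own instruction stream:

    /* Informal illustration of the SIMD/MIMD distinction (hypothetical example). */

    /* SIMD flavour: the same instruction (an add) is applied across many data
     * elements; vector units or SIMD machines execute such loops in lockstep. */
    void simd_style(float *a, const float *b, const float *c, int n)
    {
        for (int i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }

    /* MIMD flavour: independent processors each execute a different instruction
     * stream on their own data, e.g. two unrelated functions run concurrently. */
    void task_a(void) { /* ... parse input ... */ }
    void task_b(void) { /* ... update a simulation ... */ }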

One major way to classify parallel computers is based on their memory architectures. Shared memory parallel computers have multiple processors accessing all available memory as a global address space. They can be further divided into two main classes based on memory access times: uniform memory access (UMA), in which access times to all parts of memory are equal, or non-uniform memory access (NUMA), in which they are not. Distributed memory parallel computers also have multiple processors, but each processor can only access its own local memory; no global memory address space exists across them.

Parallel computing systems can also be categorized by the number of processors in them. Systems with thousands of processors are known as massively parallel. Beyond that, systems are often described as "large scale" or "small scale" depending on their size; a PC-based parallel system, for example, would generally be considered small scale.

Parallel processor machines are also divided into symmetric and asymmetric multiprocessors, depending on whether all the processors are the same or not (for instance, if only one is capable of running the operating system code and the others are less privileged).

A variety of architectures have been developed for parallel processing. For example, a ring architecture links its processors in a ring; other architectures include hypercubes, fat trees, systolic arrays, and so on.

Theory and practice
Parallel computers can be modelled as Parallel Random Access Machines (PRAMs). The PRAM model ignores the cost of interconnection between the constituent computing units, but is nevertheless very useful in providing upper bounds on the parallel solvability of many problems. In reality the interconnection plays a significant role.

The processors may communicate and cooperate in solving a problem or they may run independently, often under the control of another processor which distributes work to and collects results from them (a "processor farm").
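
A processor farm can be sketched with threads standing in for processors: the farmer hands out work items and collects the results. The example below is a hypothetical POSIX threads sketch in which a mutex-protected counter serves as the work queue; a real farm would typically distribute work over an interconnect.

    /* Hypothetical "processor farm" sketch: a shared counter hands out work items
     * to worker threads; the main thread (the farmer) collects the results. */
    #include <pthread.h>
    #include <stdio.h>

    #define ITEMS   100
    #define WORKERS 4

    static int results[ITEMS];
    static int next_item = 0;                      /* next unclaimed work item */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            int i = next_item++;                   /* claim one work item */
            pthread_mutex_unlock(&lock);
            if (i >= ITEMS)
                return NULL;
            results[i] = i * i;                    /* stand-in for real work */
        }
    }

    int main(void)
    {
        pthread_t t[WORKERS];
        for (int w = 0; w < WORKERS; w++)          /* farmer starts the workers */
            pthread_create(&t[w], NULL, worker, NULL);
        for (int w = 0; w < WORKERS; w++)          /* ...and collects the results */
            pthread_join(t[w], NULL);

        long sum = 0;
        for (int i = 0; i < ITEMS; i++)
            sum += results[i];
        printf("sum of results = %ld\n", sum);
        return 0;
    }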

Processors in a parallel computer may communicate with each other in a number of ways, including shared (either multiported or multiplexed) memory, a crossbar, a shared bus, or an interconnect network in any of a myriad of topologies, including star, ring, tree, hypercube, fat hypercube (a hypercube with more than one processor at a node), n-dimensional mesh, etc. Parallel computers based on interconnect networks need to employ some kind of routing to enable the passing of messages between nodes that are not directly connected. The communication medium is likely to be hierarchical in large multiprocessor machines. Similarly, memory may be private to a processor, shared between a number of processors, or globally shared. A systolic array is an example of a multiprocessor with fixed-function nodes, local-only memory, and no message routing.

Approaches to parallel computers include:

* Multiprocessing
* Computer cluster
* Parallel supercomputers
* Distributed computing
* NUMA vs. SMP vs. massively parallel computer systems
* Grid computing

Performance vs. cost

While a system of n parallel processors is less efficient than one n-times-faster processor, the parallel system is often cheaper to build. Parallel computation is used for tasks which require very large amounts of computation, take a lot of time, and can be divided into n independent subtasks. In recent years, most high performance computing systems, also known as supercomputers, have parallel architectures.

Terminology in parallel computing
Some frequently used terms in parallel computing are:

Task
a logically high-level, discrete, independent section of computational work; a task is typically executed by a processor as a program.
Synchronization
the coordination of simultaneous tasks to ensure correctness and avoid unexpected race conditions (a short sketch follows this list).
Speedup
also called parallel speedup: the wall-clock time of the best serial execution divided by the wall-clock time of the parallel execution.
Parallel overhead
the extra work of the parallel version compared to its sequential counterpart, mostly the extra CPU time and memory required for synchronization, data communications, and parallel environment creation and cancellation.
Scalability
a parallel system's ability to gain a proportionate increase in parallel speedup as more processors are added.
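
Synchronization in particular is easy to get wrong. The sketch below (a hypothetical POSIX threads example) shows two tasks updating a shared counter; the mutex provides the coordination, and removing it introduces a race condition that usually produces a wrong final count.

    /* Hypothetical sketch of synchronization: a shared counter updated by two
     * threads. The mutex makes the increments correct; removing it introduces
     * a race condition and the final count is typically less than expected. */
    #include <pthread.h>
    #include <stdio.h>

    #define INCREMENTS 1000000

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *bump(void *arg)
    {
        (void)arg;
        for (int i = 0; i < INCREMENTS; i++) {
            pthread_mutex_lock(&lock);    /* synchronization point */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld (expected %d)\n", counter, 2 * INCREMENTS);
        return 0;
    }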

Algorithms
Parallel algorithms can be constructed by redesigning serial algorithms to make effective use of parallel hardware. However, not all algorithms can be parallelized. This is summed up in a famous saying:

One woman can have a baby in nine months, but nine women can't have a baby in one month.

In practice, linear speedup (i.e., speedup proportional to the number of processors) is very difficult to achieve. This is because many algorithms are essentially sequential in nature (Amdahl's law states this more formally).
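
Amdahl's law can be written as S(n) = 1 / ((1 - p) + p/n), where p is the fraction of the work that can be parallelized and n is the number of processors. The small sketch below (a hypothetical example with p = 0.9) shows how the achievable speedup flattens out as processors are added.

    /* Hypothetical sketch: predicted speedup under Amdahl's law,
     * S(n) = 1 / ((1 - p) + p / n), for a 90% parallelizable program. */
    #include <stdio.h>

    static double amdahl_speedup(double p, int n)
    {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void)
    {
        double p = 0.9;                      /* parallelizable fraction (assumed) */
        for (int n = 1; n <= 1024; n *= 2)
            printf("n = %4d  speedup = %.2f\n", n, amdahl_speedup(p, n));
        /* The speedup never exceeds 1 / (1 - p) = 10, no matter how many
         * processors are added. */
        return 0;
    }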

Certain workloads can benefit from pipeline parallelism when extra processors are added. This uses a factory assembly line approach to divide the work. If the work can be divided into n stages where a discrete deliverable is passed from stage to stage, then up to n processors can be used. However, the slowest stage will hold up the other stages so it is rare to be able to fully use n processors.
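
The effect of the slowest stage can be sketched numerically (a hypothetical example): with one processor per stage, steady-state throughput is set by the longest stage time, so the achievable speedup is the total work per item divided by that longest stage.

    /* Hypothetical sketch: pipeline parallelism with one processor per stage.
     * In steady state the pipeline emits one item per "slowest stage" interval,
     * so speedup over serial execution = (sum of stage times) / (max stage time). */
    #include <stdio.h>

    int main(void)
    {
        double stage_time[] = { 2.0, 5.0, 3.0, 2.0 };   /* assumed stage costs */
        int n = sizeof stage_time / sizeof stage_time[0];

        double total = 0.0, slowest = 0.0;
        for (int i = 0; i < n; i++) {
            total += stage_time[i];
            if (stage_time[i] > slowest)
                slowest = stage_time[i];
        }
        printf("stages = %d, ideal speedup = %d, actual bound = %.2f\n",
               n, n, total / slowest);                  /* here: 12 / 5 = 2.4, not 4 */
        return 0;
    }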

Parallel programming
Parallel programming is the design, implementation, and tuning of parallel computer programs which take advantage of parallel computing systems. It also refers to the application of parallel programming methods to existing serial programs (parallelization).

Parallel programming focuses on partitioning the overall problem into separate tasks, allocating tasks to processors and synchronizing the tasks to get meaningful results. Parallel programming can only be applied to problems that are inherently parallelizable, mostly without data dependence. A problem can be partitioned based on domain decomposition or functional decomposition, or a combination.
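
As a hypothetical sketch of domain decomposition, the snippet below splits the rows of a grid into contiguous blocks, one per processor; functional decomposition would instead assign different operations on the whole grid (say, filtering versus statistics) to different processors.

    /* Hypothetical sketch of domain decomposition: the rows of a grid are split
     * into contiguous blocks, one block per processor. (Functional decomposition
     * would instead give each processor a different operation on the whole grid.) */
    #include <stdio.h>

    int main(void)
    {
        int rows = 1000, procs = 4;                       /* assumed problem and machine size */
        for (int p = 0; p < procs; p++) {
            int lo = (int)((long)rows * p / procs);       /* first row owned by p */
            int hi = (int)((long)rows * (p + 1) / procs); /* one past the last row */
            printf("processor %d works on rows %d..%d\n", p, lo, hi - 1);
        }
        return 0;
    }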

There are two major approaches to parallel programming.

* implicit parallelism -- the system (typically an automatically parallelizing compiler) partitions the problem and allocates tasks to processors automatically -- or
* explicit parallelism -- the programmer annotates the program to show how it is to be partitioned (a small sketch follows this list).
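
A common form of explicit parallelism is an annotation such as an OpenMP pragma, sketched below under the assumption of an OpenMP-capable compiler (e.g. compiled with -fopenmp): the programmer marks the loop as parallel and states how the partial sums combine, and the compiler and runtime handle thread creation and work division.

    /* Hypothetical sketch of explicit parallelism via annotation: the pragma tells
     * an OpenMP-capable compiler to split the loop iterations across threads and
     * to combine the per-thread sums at the end. */
    #include <stdio.h>

    int main(void)
    {
        const int n = 1000000;
        double sum = 0.0;

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += 1.0 / (i + 1.0);

        printf("harmonic sum = %f\n", sum);
        return 0;
    }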

Many factors and techniques impact the performance of parallel programming:

* Load balancing attempts to keep all processors busy by moving tasks from heavily loaded processors to less loaded ones (a dynamic-scheduling sketch follows below).
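
One common load-balancing technique is dynamic scheduling: instead of giving each processor one fixed block of work up front, idle processors pull the next small chunk from a shared pool. The hypothetical OpenMP sketch below uses schedule(dynamic) so that threads that finish cheap iterations early pick up more of the expensive ones (again assuming an OpenMP-capable compiler).

    /* Hypothetical sketch of load balancing via dynamic scheduling: iterations
     * have very uneven cost, so threads grab small chunks from a shared pool
     * instead of receiving one fixed block each. */
    #include <stdio.h>

    /* Stand-in for work whose cost varies a lot from item to item. */
    static double uneven_work(int i)
    {
        double x = 0.0;
        for (int k = 0; k < (i % 100) * 1000; k++)
            x += 1.0;
        return x;
    }

    int main(void)
    {
        const int n = 10000;
        double total = 0.0;

        #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
        for (int i = 0; i < n; i++)
            total += uneven_work(i);

        printf("total = %f\n", total);
        return 0;
    }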

Some people consider parallel programming to be synonymous with concurrent programming. Others draw a distinction between parallel programming, which uses well-defined and structured patterns of communications between processes and focuses on parallel execution of processes to enhance throughput, and concurrent programming, which typically involves defining new patterns of communication between processes that may have been made concurrent for reasons other than performance. In either case, communication between processes is performed either via shared memory or with message passing, either of which may be implemented in terms of the other.
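
The two styles can be sketched side by side: the shared-memory examples above coordinate through a common address space and locks, whereas message passing exchanges explicit messages. The hypothetical sketch below (POSIX environment assumed) uses a pipe between a parent and a child process as a stand-in for a message-passing channel.

    /* Hypothetical sketch of message passing: a parent process sends a work item
     * to a child over a pipe and reads the result back, with no shared memory
     * between the two. (POSIX environment assumed.) */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int to_child[2], to_parent[2];
        pipe(to_child);
        pipe(to_parent);

        if (fork() == 0) {                          /* child: the worker */
            int x;
            read(to_child[0], &x, sizeof x);        /* receive the work item */
            int y = x * x;                          /* stand-in for real work */
            write(to_parent[1], &y, sizeof y);      /* send back the result */
            _exit(0);
        }

        int work = 12, result = 0;                  /* parent: distributes and collects */
        write(to_child[1], &work, sizeof work);
        read(to_parent[0], &result, sizeof result);
        wait(NULL);
        printf("result received by message passing: %d\n", result);
        return 0;
    }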

Programs which work correctly in a single CPU system may not do so in a parallel environment. This is because multiple copies of the same program may interfere with each other, for instance by accessing the same memory location at the same time. Therefore, careful programming (synchronization) is required in a parallel system.

Parallel programming models

Main article: Parallel programming model

A parallel programming model is a set of software technologies to express parallel algorithms and match applications with the underlying parallel systems. It encompasses applications, languages, compilers, libraries, communication systems, and parallel I/O. Developers have to choose a proper parallel programming model, or a mixture of models, to develop their parallel applications on a particular platform.

Parallel models are implemented in several ways: as libraries invoked from traditional sequential languages, as language extensions, or as complete new execution models. They are also roughly categorized by the two kinds of systems they target -- shared memory systems and distributed memory systems -- though the lines between the two are increasingly blurred.

Topics in parallel computing


Generic:

* Automatic parallelization
* Parallel algorithm
* Cellular automaton
* Grand Challenge problems

Computer science topics:

* Lazy evaluation vs strict evaluation
* Complexity class NC
* Communicating sequential processes
* Dataflow architecture
* Parallel graph reduction

Practical problems:

* Parallel computer interconnects
* Parallel computer I/O
* Reliability problems in large systems

Programming languages/models:

* OpenMP
* Message Passing Interface/MPICH
* Occam
* Linda
* Cilk

Specific:

* Atari Transputer Workstation
* BBN Butterfly computers
* Beowulf cluster
* Blue Gene
* Deep Blue
* Fifth generation computer systems project
* ILLIAC III
* ILLIAC IV
* Parallel Element Processing Ensemble
* Meiko Computing Surface
* NCUBE
* Teramac
* Transputer

Parallel computing to increase fault tolerance:

* Master-checker

Companies (largely historical):

* Thinking Machines
* Convex Computer Corporation
* Meiko
* Control Data Corporation
* Myrias Research Corporation

======================
Source: http://en.wikipedia.org/wiki/Parallel_computing
======================
