Saturday, 3 May 2014

Parallel Processing

Basics

Computation that uses a multiprocessor computer, or several independent computers interconnected in some way, working together on a common task is called parallel computing. It is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently, i.e. in parallel. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
Main memory in a parallel computer is either shared memory or distributed memory. In a shared-memory system, all processors access a single common address space. Distributed memory refers to the fact that the memory is logically distributed, but often implies that it is physically distributed as well. Accesses to local memory are typically faster than accesses to non-local memory.
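The principle of dividing a large problem into smaller ones that are solved concurrently can be sketched in a few lines of Python. This is a minimal illustration of data parallelism on a made-up workload (a sum of squares); the function names are hypothetical. It uses threads from concurrent.futures to stay self-contained. In CPython, CPU-bound threads do not run truly in parallel because of the global interpreter lock, so ProcessPoolExecutor would be the usual swap when real speedup is the goal.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(n, parts):
    """Split range(n) into at most `parts` contiguous (start, stop) sub-ranges."""
    step = max(1, -(-n // parts))  # ceiling division; at least 1 to avoid a zero step
    return [(i, min(i + step, n)) for i in range(0, n, step)]


def partial_sum_of_squares(bounds):
    """Solve one smaller sub-problem: the sum of i*i over one sub-range."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Divide the problem, solve the pieces concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum_of_squares, chunk_ranges(n, workers))
        return sum(partials)


if __name__ == "__main__":
    n = 1_000_000
    # The combined parallel result must equal the plain sequential computation.
    assert parallel_sum_of_squares(n) == sum(i * i for i in range(n))
    print("parallel and sequential results agree")
```

The split/solve/combine structure stays the same whichever executor is used; only the way the sub-problems are scheduled changes.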

Which Parallelism is supported and why?

Instruction-level parallelism (ILP) is supported in some systems. Its main characteristics are:
1. Multiple instructions from the same instruction stream can be executed concurrently.
2. It is generated and managed by hardware (superscalar) or by the compiler (VLIW).
3. It is limited in practice by data and control dependences.

Thread-level parallelism (TLP) is supported in some systems. Its main characteristics are:
1. Multiple threads or instruction sequences from the same application can be executed concurrently.
2. It is generated by the compiler/user and managed by the compiler and hardware.
3. It is limited in practice by communication/synchronization overheads and by algorithm characteristics.
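The way synchronization overhead limits thread-level parallelism can be sketched with Python's threading module. This is an illustrative example with a hypothetical function name, not a benchmark: several threads of one program add to a shared total, either taking a lock on every update or accumulating privately and locking once at the end. In CPython the global interpreter lock keeps these threads from executing Python bytecode truly in parallel, so the timings only hint at the cost of frequent synchronization.

```python
import threading
import time


def count_with_threads(num_threads, increments_per_thread, lock_every_step):
    """Run several threads of one program that all add to a shared total."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        if lock_every_step:
            # Synchronize on every single update: correct, but heavy overhead.
            for _ in range(increments_per_thread):
                with lock:
                    total += 1
        else:
            # Work on private data, then synchronize once at the end.
            local = 0
            for _ in range(increments_per_thread):
                local += 1
            with lock:
                total += local

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    for mode in (True, False):
        start = time.perf_counter()
        result = count_with_threads(4, 200_000, lock_every_step=mode)
        elapsed = time.perf_counter() - start
        print(f"lock_every_step={mode}: total={result}, {elapsed:.3f} s")
```

Both modes produce the same total; the difference is how often the threads must stop and coordinate.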

Granularity
In parallel computing, granularity is a qualitative measure of the ratio of computation to communication.
Fine-grain Parallelism:
1. Relatively small amounts of computational work are done between communication events.
2. Low computation-to-communication ratio
3. Facilitates load balancing
4. Implies high communication overhead and less opportunity for performance enhancement
If granularity is too fine, it is possible that the overhead required for communication and synchronization between tasks takes longer than the computation itself.

Coarse-grain Parallelism:
1. Relatively large amounts of computational work are done between communication/synchronization events
2. High computation-to-communication ratio
3. Implies more opportunity for performance increase
4. Harder to load balance efficiently

Which is Best?
The most efficient granularity depends on the algorithm and the hardware environment in which it runs. In most cases the overhead associated with communication and synchronization is high relative to execution speed, so it is advantageous to have coarse granularity. On the other hand, fine-grain parallelism can help reduce overheads due to load imbalance.
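The trade-off between fine and coarse grain can be sketched by controlling how much work is handed out per task. In this minimal Python example (hypothetical function names, threads standing in for processors), each task counts as one communication event, so the grain size directly sets the computation-to-communication ratio.

```python
from concurrent.futures import ThreadPoolExecutor


def square_chunk(chunk):
    """The computation performed between two communication events (one task each)."""
    return [x * x for x in chunk]


def run_with_granularity(data, grain, workers=4):
    """Hand out `grain` items per task; return (results, number_of_tasks_sent)."""
    chunks = [data[i:i + grain] for i in range(0, len(data), grain)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, so they can be concatenated.
        results = [y for part in pool.map(square_chunk, chunks) for y in part]
    return results, len(chunks)


if __name__ == "__main__":
    data = list(range(100))
    for grain in (1, 10, 50):
        _, tasks = run_with_granularity(data, grain)
        print(f"grain={grain:>2}: {tasks} tasks, {len(data) / tasks:.0f} items of work per task")
```

With grain=1, 100 items need 100 tasks (fine grain); with grain=50 they need only 2 (coarse grain), which cuts dispatch overhead but leaves fewer pieces to spread across 4 workers. Python's multiprocessing.Pool.map and ProcessPoolExecutor.map expose the same knob through their chunksize parameter.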

Interprocess Communication (IPC)
In computing, inter-process communication (IPC) is a set of methods for the exchange of data among multiple threads in one or more processes. Processes may be running on one or more computers connected by a network.
IPC enables one application to control another application, and several applications to share the same data without interfering with one another. IPC is required in all multiprocessing systems, but it is not generally supported by single-process operating systems such as DOS. OS/2 and MS-Windows support an IPC mechanism called DDE (Dynamic Data Exchange).

Using DDE for IPC
DDE is a protocol that enables applications to exchange data in a variety of formats. Applications can use DDE for one-time data exchanges or for ongoing exchanges in which the applications update one another as new data becomes available.

Some common IPC methods are:

1. Socket - A data stream sent over a network interface, either to a different process on the same computer or to another computer. Sockets are provided by most operating systems, including Windows and POSIX/UNIX systems.

2. Message Queue - A data stream similar to a pipe, but one that stores and retrieves information in packets (messages). Message queues are provided by Windows and all POSIX systems.

3. Semaphore - A simple structure that synchronizes threads or processes acting on shared resources. Semaphores are provided by Windows and all POSIX systems.
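The three methods above can be made concrete with a small Python sketch. To keep the demo self-contained, threads stand in for the communicating processes, and queue.Queue stands in for an OS message queue (the Python standard library does not wrap POSIX message queues); the function names are hypothetical. socket.socketpair() does create a genuinely connected pair of OS sockets, and multiprocessing.Queue and multiprocessing.Semaphore would be the cross-process counterparts of the queue and semaphore used here.

```python
import queue
import socket
import threading
import time


def socket_demo(message):
    """Socket: one endpoint streams bytes, the other reads until end of stream."""
    sender_end, receiver_end = socket.socketpair()

    def sender():
        with sender_end:  # closing this end signals end-of-stream to the receiver
            sender_end.sendall(message)

    t = threading.Thread(target=sender)
    t.start()
    chunks = []
    with receiver_end:
        while data := receiver_end.recv(4096):
            chunks.append(data)
    t.join()
    return b"".join(chunks)


def message_queue_demo(messages):
    """Message queue: discrete messages are stored, then retrieved in FIFO order."""
    q = queue.Queue()

    def producer():
        for m in messages:
            q.put(m)
        q.put(None)  # sentinel meaning "no more messages"

    t = threading.Thread(target=producer)
    t.start()
    received = []
    while (m := q.get()) is not None:
        received.append(m)
    t.join()
    return received


def semaphore_demo(num_workers, slots):
    """Semaphore: at most `slots` workers use the shared resource at once.
    Returns the largest number of workers observed inside at the same time."""
    sem = threading.Semaphore(slots)
    counter_lock = threading.Lock()
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with sem:
            with counter_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # pretend to use the shared resource
            with counter_lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

The socket carries an unstructured byte stream, the queue preserves message boundaries, and the semaphore carries no data at all; it only limits how many parties may act on a shared resource at once.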

UNIX FOR MULTIPROCESSOR SYSTEMS

The UNIX operating system for a multiprocessor system has some additional features compared to the normal UNIX operating system. OS functions, including processor scheduling, virtual memory management and I/O device management, are implemented with a large amount of system software. Normally the size of the whole OS is greater than the size of the main memory; the portion of the OS that resides in main memory is called the kernel. For a multiprocessor, the OS kernel is developed on one of three models. These UNIX kernels are implemented with locks, semaphores and monitors.
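A monitor packages shared data together with the lock and condition variables that guard it. The sketch below is a user-space Python illustration of the concept, not kernel code, and the BoundedBuffer class and run_producer_consumer helper are hypothetical names. Callers can reach the buffer only through methods that hold the monitor lock, and they wait on a condition variable whenever the buffer is full or empty.

```python
import threading
from collections import deque


class BoundedBuffer:
    """A monitor: shared state touched only while holding one lock, plus
    condition variables for waiting until that state changes."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)

    def put(self, item):
        with self._not_full:  # acquires the monitor lock
            while len(self._items) >= self._capacity:
                self._not_full.wait()  # releases the lock while waiting
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item


def run_producer_consumer(n, capacity=3):
    """Pass n items through the monitor from the main thread to a consumer thread."""
    buf = BoundedBuffer(capacity)
    received = []

    def consumer():
        for _ in range(n):
            received.append(buf.get())

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(n):
        buf.put(i)  # blocks whenever the buffer is full
    t.join()
    return received
```

Both condition variables share the one monitor lock, so checking the buffer and changing it always happen as a single protected step.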

1) Master-slave kernel: In this model, only one of the processors is designated as the master.
The master is responsible for the following activities:
i) running the kernel code
ii) handling the system calls
iii) handling the interrupts.
The rest of the processors in the system run only the user code and are called slaves.
2) Floating-executive model: The master-slave kernel model is too restrictive in the sense that only one of the processors, viz. the designated master, can run the kernel. This restriction may be relaxed by having more than one processor capable of running the kernel, with the additional capability that the master may float among the various processors capable of running the kernel.
3) Multi-threaded UNIX kernel: Threads are lightweight processes requiring minimal state information, comprising the processor state and the contents of the relevant registers. A thread, being a (lightweight) process, is capable of executing alone. In a multiprocessor system, more than one processor may execute simultaneously, with each processor possibly executing more than one thread.
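The master-slave model can be illustrated with a toy simulation in which threads stand in for processors. This is a hypothetical sketch of the idea, not how a real UNIX kernel is built: slave threads run "user code" and forward every "system call" to a single master thread through a queue, so all kernel work is serialized on the master.

```python
import queue
import threading


def master_slave_simulation(num_slaves, calls_per_slave):
    """Toy model of a master-slave kernel: slaves run 'user code'; every
    'system call' is forwarded to the single master, which alone runs 'kernel code'.
    Returns a log of (thread_name, slave_id, call_no) for each serviced call."""
    requests = queue.Queue()
    kernel_log = []  # written only by the master thread

    def master():
        # Service requests one at a time until the shutdown sentinel arrives.
        while (req := requests.get()) is not None:
            slave_id, call_no, reply = req
            kernel_log.append((threading.current_thread().name, slave_id, call_no))
            reply.put("ok")

    def slave(slave_id):
        reply = queue.Queue(maxsize=1)
        for call_no in range(calls_per_slave):
            sum(range(1000))                          # "user code" runs on the slave
            requests.put((slave_id, call_no, reply))  # trap to the master
            reply.get()                               # block until the master is done

    m = threading.Thread(target=master, name="master")
    m.start()
    slaves = [threading.Thread(target=slave, args=(i,), name=f"slave-{i}")
              for i in range(num_slaves)]
    for s in slaves:
        s.start()
    for s in slaves:
        s.join()
    requests.put(None)  # shut the master down
    m.join()
    return kernel_log


if __name__ == "__main__":
    log = master_slave_simulation(num_slaves=3, calls_per_slave=2)
    print(f"{len(log)} system calls, handled by: {sorted({name for name, _, _ in log})}")
```

The single request queue is exactly the bottleneck the floating-executive model tries to relax: every slave waits on the one master, no matter how many processors are idle.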


Compilers

The best compiler to use in terms of code execution speed is gcc. The Intel compilers are also available: ifort (Fortran), icc (C) and icpc (C++).
On the old Itaniums (all except lobster6) these are called with efc/ecc.
Documentation sources, in order of increasing complexity:
1. "[ifort/icc/icpc] -help" will show notes about using the compiler.
2. man [ifort/icc/icpc] will display informative manual pages on compiler flags/usage.
3. Fortran v8.1 and C/C++ v8.1 documentation


GCC (GNU Compiler Collection)

GCC currently contains front ends for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages (libstdc++, libgcj, ...).
In addition, G95, a free Fortran 95 compiler, is installed on porcupine and the cluster at /usr/bin/g95. It has a resume-from-dump feature that can be initiated by a signal, as well as extensive configuration through environment variables.

BY-
Shubham Gupta
336/CO/11