Wednesday 29 January 2014

Fwd: Optimizing Compilers



---------- Forwarded message ----------
From: Siddharth Saluja <siddharthsaluja53@gmail.com>
Date: Thu, Jan 30, 2014 at 2:19 AM
Subject: Optimizing Compilers
To: apmahs.nsit.coeunited@blogger.com


                          Optimizing Compilers

 

Effective optimizing compilers need to gather information about the structure and the flow of control through programs.

 

·        Which instructions are always executed before a given instruction.

·        Which instructions are always executed after a given instruction.

·        Where the loops in a program are, since 90% of a computation is normally spent in 10% of the code: the inner loops.

 

Features of Optimization Techniques:

·        The optimizer is the most complex component of a modern compiler and must always be sound, i.e., semantics-preserving.

·        It needs to pay attention to exceptional cases as well.

·        Use a conservative approach: risk missing an optimization rather than changing the semantics.

·        Reduce runtime resource requirements (most of the time).

·        Usually the target is runtime, but there are memory optimizations as well.

·        Runtime optimizations focus on frequently executed code.

·        Be cost-effective, i.e., the benefits of an optimization must be worth the effort of implementing it.

 

Various Levels :

 

High-level optimizations

• Operate at a level close to that of source-code

• Often language-dependent

 

 

Intermediate code optimizations

• Most optimizations fall here

• Typically, language-independent

 

Low-level optimizations

• Usually specific to each architecture

 

High Level Optimizations :

• Inlining

• Replace function call with the function body

• Partial evaluation

• Statically evaluate those components of a program that can be evaluated

• Tail recursion elimination

• Loop reordering

• Array alignment, padding, layout

 

Intermediate Level Optimizations :

• Common subexpression elimination

• Constant propagation

• Jump-threading

• Loop-invariant code motion

• Dead-code elimination

• Strength reduction

 

Low-level Optimizations

• Register allocation

• Instruction Scheduling for pipelined machines.

• Loop unrolling

• Instruction reordering

• Delay slot filling

• Utilizing features of specialized components, e.g., floating-point units.

• Branch Prediction

 

Constant propagation :

 

·        Identify expressions that can be evaluated at compile time, and replace them with their values.
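As a small illustration (a hypothetical C fragment, not from the original notes), a compiler that propagates the constant value of x can fold the whole computation away:

/* Before: x is known to be a constant, so the expression folds away. */
int before(void) {
    int x = 14;
    int y = 7 - x / 2;          /* propagate x = 14, fold: y = 0 */
    return y * (28 / x + 2);    /* fold: 0 * (2 + 2) */
}

/* After constant propagation and constant folding: */
int after(void) {
    return 0;
}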

 

Strength reduction :

·        Replace expensive operations with equivalent cheaper (more efficient) ones.

y = 2;               y = 2;
z = x^y;      →      z = x*x;

 

·        The underlying architecture may determine which operations are cheaper and which ones are more expensive.
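A common case is replacing a multiplication inside a loop by a running addition (an induction variable). A minimal, hypothetical C sketch:

/* Before: i * 4 is recomputed with a multiply on every iteration. */
void scale_before(int *a, int n) {
    for (int i = 0; i < n; i++)
        a[i] = i * 4;
}

/* After strength reduction: the multiply becomes a cheaper addition. */
void scale_after(int *a, int n) {
    int k = 0;
    for (int i = 0; i < n; i++) {
        a[i] = k;
        k += 4;
    }
}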

 

Loop-Invariant Code Motion :

• Move code whose effect is independent of the loop's iteration outside the loop.
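For example (hypothetical C, assuming x and y are not modified inside the loop), the invariant product can be hoisted out:

/* Before: x * y does not depend on i, yet it is evaluated every iteration. */
void fill_before(int *a, int n, int x, int y) {
    for (int i = 0; i < n; i++)
        a[i] = x * y + i;
}

/* After loop-invariant code motion: the invariant expression is hoisted. */
void fill_after(int *a, int n, int x, int y) {
    int t = x * y;
    for (int i = 0; i < n; i++)
        a[i] = t + i;
}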

 

Common Subexpression Elimination:

 

·        The expression has been computed previously.

§  The values of all the variables in the expression have not changed since.

§  Detection is based on available-expressions analysis.
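A minimal sketch in C (hypothetical, for illustration): b + c is still available at its second use, so it is computed only once:

/* Before: (b + c) is computed twice and neither b nor c changes in between. */
int cse_before(int b, int c, int d) {
    int x = (b + c) * d;
    int y = (b + c) - d;
    return x + y;
}

/* After common subexpression elimination: the available expression is reused. */
int cse_after(int b, int c, int d) {
    int t = b + c;
    int x = t * d;
    int y = t - d;
    return x + y;
}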

 

Dead Code Elimination :

 

§  Dead variable: a variable whose value is no longer used

§  Live variable: opposite of dead variable

§  Dead code: a statement that assigns to a dead variable
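A hypothetical C fragment showing a dead assignment being removed:

/* 'unused' is a dead variable: its value is never read later,
 * so the assignment to it is dead code. */
int dce_before(int a, int b) {
    int unused = a * b;   /* dead: no later use of 'unused' */
    int live = a + b;     /* live: returned below */
    return live;
}

/* After dead-code elimination: */
int dce_after(int a, int b) {
    return a + b;
}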

 

BY -
Siddharth Saluja , Varun Jain


Tuesday 28 January 2014

FLYNN'S CLASSIFICATION (cont)

Hierarchy of generic computer system:

APPLICATION

OPERATING SYSTEM

SYSTEM PROGRAM / COMPILER

COMMUNICATION / NETWORKING

ADDRESS SPACE

INSTRUCTION SET ARCHITECTURE

HARDWARE

 

While transforming from single instruction single data (SISD) to single instruction multiple data (SIMD), multiple instruction single data (MISD) or multiple instruction multiple data (MIMD),

almost every layer of the computer system hierarchy is affected.

SISD, SIMD, MISD and MIMD are all variations of the von Neumann concept.

 Chipset:

A chipset is a set of electronic components in an integrated circuit that manages the data flow between the processor, memory and peripherals. Because it controls communications between the processor and external devices, the chipset plays a crucial role in determining system performance.

There are two chips in the core logic chipset :

→ Northbridge - implements the faster capabilities of the motherboard in a chipset computer architecture.

→ Southbridge - typically implements the slower capabilities of the motherboard in a chipset computer architecture.

 

Multiple Instruction Single Data :

 

It is a type of parallel computing architecture in which many processing units (PUs) perform different operations on the same data.

 

In the above block diagram, we have multiple instruction streams IS1, IS2 and IS3 given to

individual control units, which are in turn connected to processing units. There is a latch between successive processing units which stores the data produced by the previous stage.

 

Let's take an example from image processing.

Suppose we have 3 instructions: decreasing the intensity of the image (IS1), encoding a message in the image (IS2), and then decreasing the size of the image (IS3).

All the instructions are given to the CUs in parallel. The data of an image is taken by PU1 from main memory and IS1 is applied. The processed data is then stored in the latch provided after PU1. While that data is being saved, the data of the next image starts transferring to PU1. The data in the first latch is processed by PU2. The process goes on until all the data has been processed.

The above example shows that a kind of pipelining is being used in the data processing.
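A minimal C sketch of this idea (hypothetical code, with each "image" reduced to a single integer and the three instruction streams reduced to toy functions), showing how the latches let three images be in flight at once:

#include <stdio.h>

#define NUM_IMAGES 4

/* Toy stand-ins for the three instruction streams. */
static int dim_intensity(int img) { return img - 1; }   /* IS1 on PU1 */
static int embed_message(int img) { return img * 10; }  /* IS2 on PU2 */
static int shrink_image(int img)  { return img / 2; }   /* IS3 on PU3 */

int main(void) {
    int images[NUM_IMAGES] = {100, 200, 300, 400};
    int latch1 = -1, latch2 = -1, done;     /* -1 means "latch is empty" */
    int cycles = NUM_IMAGES + 2;            /* extra cycles to drain the pipe */

    for (int t = 0; t < cycles; t++) {
        /* The three PUs work "in parallel"; evaluating from the last stage
         * backwards reads each latch before it is overwritten this cycle. */
        done   = (latch2 != -1) ? shrink_image(latch2)  : -1;         /* PU3 */
        latch2 = (latch1 != -1) ? embed_message(latch1) : -1;         /* PU2 */
        latch1 = (t < NUM_IMAGES) ? dim_intensity(images[t]) : -1;    /* PU1 */

        if (done != -1)
            printf("cycle %d: one image fully processed -> %d\n", t, done);
    }
    return 0;
}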

 

A systolic array is an example of a MISD structure.

Systolic arrays : a systolic array is a pipelined network arrangement of processing units called cells. It is a specialized form of parallel computing where cells (i.e. processors) compute data and store it independently of each other. Each cell shares the information with its neighbours immediately after processing.

 

Multiple Instruction Multiple Data :

MIMD is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data. MIMD machines fall into either the shared-memory or the distributed-memory category.

 

Shared memory models :

The processors are all connected to globally available memory. The OS usually maintains coherence.

Examples of shared memory multiprocessors are :

1.       NUMA (Non uniform memory access) :  Under NUMA a processor can access its own local memory faster than non-local memory. The benefits of NUMA are limited to particular workloads, notably on servers where the data are often associated strongly with certain tasks or users.

2.       UMA (Uniform Memory Access) :  All the processors share the physical memory uniformly. In a UMA architecture, access time to a memory location is independent of which processor makes the request or which memory chip contains the transferred data.

 

Distributed memory models :

In distributed-memory MIMD machines, each processor has its own individual memory. A processor has no direct knowledge of the other processors' memory. For data to be shared, it must be passed from one processor to another as a message.

Example – NORMA (No Remote Memory Access) :
Accesses to remote memory modules are only possible indirectly, by messages over the interconnection network.

MIMD machines may combine both of the above memory models: independent UMA clusters may communicate via a global interconnection, similar to a NORMA.

 

Questions to  ponder upon :

 

1)      Is this the maximum parallelism we can achieve?

2)      Can we get rid of the von Neumann concept?

3)      How do we decompose a program?

4)      What is the relationship between architecture and application?

 

By Shefali Singh and Upasana Mehta


Friday 24 January 2014

PIPELINING

PIPELINING:

Pipelining is an implementation technique where multiple instructions are overlapped  in execution. The computer pipeline is divided in stages. Each stage completes a part  of an instruction in parallel. The stages are connected one to the next to form a pipe - instructions enter at one end, progress through the stages, and exit at the other end.

 

·         F: Fetch

·         D: Decode

·         C: Calculating the address of operand

·         E: Execute

·         W: Write back

 

Types of hazards:

1. Structural hazards

2. Data hazards

3. Control hazards

 

Structural Hazards

Structural hazards occur when a certain resource (memory, functional unit) is requested by more than

one instruction at the same time.

Clock cycle →   1  2  3  4  5  6  7  8  9  10 11 12

Instr. i        F  D  C  F  E  W
Instr. i+1         F  D  C  F  E  W
Instr. i+2            F  D  C  F  E  W
Instr. i+3               -  F  D  C  F  E  W        (stalled in cycle 4)
Instr. i+4                     F  D  C  F  E  W

 

The first F is for fetching the instruction and the second F for fetching the operand.

In class the stages after decode were described as three generic execute operations; here the first two are given specific tasks (address calculation and operand fetch) to make the example easier to follow.

Penalty: 1 cycle

 

Data Hazards

We have two instructions, I1 and I2. In a pipeline the execution of I2 can start only when I1 has been executed.

If in a certain stage of the pipeline, I2 needs the result produced by I1, but if this result has not yet been generated,then we have a data hazard and thus we can say that I2 is dependent on I1 i.e Data Dependency.

I1: MUL R2,R3      ; R2 ← R2 * R3

I2: ADD R1,R2      ; R1 ← R1 + R2

 

Before executing its operand-fetch stage, the ADD instruction is stalled until the MUL instruction has written its result into R2.

Penalty: 2 cycles

 

Control Hazards

 

This type of hazard is caused by uncertainty about the execution path: branch taken or not taken. It arises when we branch to a new location in the program, invalidating everything we have loaded into the pipeline.

Pipeline stall until branch target known.

Clock cycle →   1  2  3  4  5  6  7  8  9  10 11 12

BR TARGET       F  D  C  F  E  W
target             -  -  -  F  D  C  F  E  W        (cycles 2, 3 and 4 are stalled)

The instruction at the branch target is not executed until the target address is known and the target instruction has been fetched.

 

Example of In-Order and Reordered Execution

           Z ← X + Y                          C ← A * B

Instr. 1>  R1 ← Mem(X)                        Instr. 5>  R5 ← Mem(A)

Instr. 2>  R2 ← Mem(Y)                        Instr. 6>  R6 ← Mem(B)

Instr. 3>  R3 ← R1 + R2                       Instr. 7>  R7 ← R5 * R6

Instr. 4>  Mem(Z) ← R3                        Instr. 8>  Mem(C) ← R7

Clock cycle →   1  2  3  4  5  6  7  8  9  10 11 12 13 14

Instr. 1        F  D  I  C  F  E  W
Instr. 2           F  D  I  C  F  E  W
Instr. 3              F  D  -  -  I  C  F  E  W                 (stalled until R1 and R2 are written)
Instr. 4                 F  -  -  -  D  I  C  -  F  E  W        (stalled until R3 is written)

 

Now, if the instructions are issued in order, i.e. 1 2 3 4 5 6 7 8, there is stalling.

But if they are reordered as

1 2 5 6 3 4 7 8, the stalling can be reduced, because independent instructions from the second computation fill the stall cycles.

 

 

Static pipelining:
> There is no reordering of instructions.
> If one clock cycle is left empty by one instruction, it will also be empty in the following instructions.

Dynamic pipelining:
> There is reordering of instructions.
> Empty cycles are not necessary: due to reordering, another instruction can be executed in them.

 

Mechanism for Instruction Pipelining:

 

Here the use of caching, collision avoidance, multiple functional units, register tagging and internal forwarding is explained; these mechanisms smooth the pipeline flow and remove bottlenecks.

 

Prefetch Buffer:

In one memory-access time, one block of instructions is loaded into the prefetch buffer. The block access time can be reduced by using a cache or interleaved memory modules.

Types of Prefetch Buffers:

 

1. Sequential Buffers

2. Target Buffers

3. Loop Buffers

 

Sequential and Target Buffers:

Sequential instructions are loaded into a pair of sequential buffers for in-sequence pipelining. Instructions from a branch target are loaded into a

pair of target buffers for out-of-sequence pipelining. Both buffers operate in FIFO fashion. A conditional branch (e.g. an if condition) causes both the

sequential and the target buffers to fill with instructions. Once the branch condition is resolved, instructions are taken from the corresponding buffer and

the instructions in the other buffer are discarded. Within each pair, one buffer can be used to load instructions from memory while the other feeds instructions to the pipeline.

 

Loop Buffer:

These buffers store the sequential instructions contained in small loops. Loop buffers are maintained by the fetch stage of the pipeline. Prefetched instructions in the loop body are

executed repeatedly until all iterations complete. The loop buffer works in two ways:

1. It holds a pointer to the instruction just ahead of the current instruction, which saves instruction-fetch time from memory.

2. It recognizes when the target of a branch falls within the loop boundary.

 

Multiple Functional units:

Sometimes a certain pipeline stage becomes a bottleneck; this corresponds to the row with the maximum number of checkmarks in the reservation table. The problem is solved by using multiple copies

of that stage, which leads to the use of multiple functional units.

To resolve data or resource dependences among successive instructions entering the pipeline, a reservation station is used with each functional unit.

Operations wait in the station until their dependences are resolved. This removes the bottleneck from the pipeline.

 

Internal Data Forwarding:

Why do it: the throughput of a pipelined processor can be improved with internal data forwarding between functional units. Moreover, some memory operations can be replaced by register-transfer operations.

 

Types:

 

1. Store-load forwarding

2. Load-load forwarding

3. Store-store forwarding

 

Store-load forwarding:

Here a value just stored from register R1 into memory location M is needed again, so the load operation (LD R2,M) from memory to register can be replaced by a move instruction (MOVE R2,R1) from register R1 to register R2, since register transfers are faster than memory accesses.

 

 

By Siddharth Saluja and Varun Jain

 

 

 

 

 

 

Wednesday 22 January 2014

PIPELINING

PIPELINING     

 

A pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion; in that case, some amount of buffer storage is often inserted between elements.

Example of pipelining :

Consider the assembly of a car: assume that certain steps in the assembly line are to install the engine, install the hood, and install the wheels (in that order, with arbitrary interstitial steps). A car on the assembly line can have only one of the three steps done at once. After the car has its engine installed, it moves on to having its hood installed, leaving the engine installation facilities available for the next car. The first car then moves on to wheel installation, the second car to hood installation, and a third car begins to have its engine installed. If engine installation takes 20 minutes, hood installation takes 5 minutes, and wheel installation takes 10 minutes, then finishing all three cars when only one car can be assembled at once would take 105 minutes. On the other hand, using the assembly line, the total time to complete all three is 75 minutes. At this point, additional cars will come off the assembly line at 20 minute increments.

There are two types of pipelining :-

1.   Linear pipelining

  In this, processing stages are cascaded linearly, so that each stage operates on a data item only after the previous stages have processed it, in order to perform a fixed function over a stream of data.

They are static pipelines because they are used to perform fixed functions.

Linear pipelines are applied to instruction execution, arithmetic computation and memory-access operations.

 

It is further divided into two categories :

 

→ Synchronous linear pipeline

In this, every stage transfers data to the next stage at the same point in time (i.e. on the same clock pulse), using latches and flip-flops. The pipeline stages are combinational logic circuits, so it is desirable to have approximately equal delay in every stage.

→ Asynchronous linear pipeline

In this, it is not necessary for every stage to transfer data at the same point in time. Data flow between adjacent stages is controlled by a handshaking protocol.

Used in MPI (message passing interface).

 

Reservation table  :

The reservation table displays the time-space flow of data through the pipeline for a function. Different functions follow different paths through the reservation table.

The number of columns in a reservation table specifies the evaluation time of a given function.

Determination of clock cycle 'Ƭ' of a pipeline :

 

         Ƭ = max(Ƭi) + d = Ƭm + d        (maximum taken over the stages i = 1 … k)

  or  Ƭ ≈ Ƭm   if   Ƭm >> d

Where, Ƭm = maximum stage delay

Ƭi = time delay of stage Si

d = time delay of a latch

        Pipeline frequency, f = 1 ⁄ Ƭ

Suppose we are not using pipelining; we have 'n' tasks and 'k' stages.

Then, required clock cycles = nk

Total time required, T₁ = nk * Ƭ

By using pipelining,

Required clock cycles = k + (n-1)

Total time required, Tk = [k + (n-1)] * Ƭ

Speed-up factor, Sk = T₁ / Tk = nk / [k + (n-1)]

Efficiency, Ek = Sk / k = nk / (k[k + (n-1)]) = n / [k + (n-1)]

Throughput, Hk = number of tasks / total time = n / ([k + (n-1)] * Ƭ)
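A small worked example of these formulas (the numbers are illustrative, not from the notes): k = 5 stages, n = 100 tasks, Ƭ = 10 ns.

#include <stdio.h>

int main(void) {
    double k = 5.0, n = 100.0, tau = 10.0;      /* illustrative values, tau in ns  */

    double t1 = n * k * tau;                    /* non-pipelined: T1 = nk*T        */
    double tk = (k + (n - 1.0)) * tau;          /* pipelined: Tk = [k+(n-1)]*T     */
    double sk = t1 / tk;                        /* speed-up   Sk = nk/[k+(n-1)]    */
    double ek = sk / k;                         /* efficiency Ek = Sk/k            */
    double hk = n / tk;                         /* throughput Hk, tasks per ns     */

    printf("T1 = %.0f ns, Tk = %.0f ns\n", t1, tk);        /* 5000 ns, 1040 ns     */
    printf("Sk = %.2f, Ek = %.2f, Hk = %.4f tasks/ns\n",
           sk, ek, hk);                                    /* 4.81, 0.96, 0.0962   */
    return 0;
}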

The larger the number 'k' of subdivided pipelined stages,the higher the potential speedup performance.

'k' cannot increase indefinitely due to practical constraints on cost, control complexity and circuitry.

  

   PCR (performance-to-cost ratio) = f / (C + kh)

                                     (where C = cost of all logic stages, h = cost of each latch, k = number of stages)

 Using the PCR, the optimal number of stages can be found:

      Optimal number of stages, k₀ = sqrt( t*C / (d*h) )

                                     (where t = total flow-through delay, d = latch delay)
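Plugging in some illustrative values (again not from the notes), say t = 32 ns, d = 1 ns, C = 10 and h = 5:

#include <math.h>
#include <stdio.h>

int main(void) {
    double t = 32.0, d = 1.0;       /* total flow-through delay and latch delay, ns */
    double C = 10.0, h = 5.0;       /* total logic cost and cost per latch          */

    double k_opt = sqrt((t * C) / (d * h));                /* k0 = sqrt(t*C/(d*h)) */
    printf("optimal number of stages = %.0f\n", k_opt);    /* prints 8             */
    return 0;
}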

                              

 

 

 

 

 

2.Non linear/Dynamic Pipelining

  A non-linear pipeline (also called a dynamic pipeline) can be configured to perform various functions at different times. In a dynamic pipeline there are also feed-forward or feedback connections.

 

 

  Reservation Table/State Time diagram

  Reservation tables in non-linear pipelining are interesting because they do not follow a linear pattern. They display the time-space flow of data for the evaluation of one or more functions. The checkmarks in each row correspond to the time instants at which that particular stage will be used.

There may be multiple checkmarks in a row, indicating repeated usage of the same stage in different cycles.

                                        

STATE-TIME DIAGRAM

 

Let the sequence of stages in a task be S1->S2->S3->S2->S3->S1->S3->S1, then its state-time diagram would be:

 

 



Clock cycle →    1       2       3       4       5       6       7       8

S1             ------                                  ------          ------

S2                     ------          ------

S3                             ------          ------          ------

------ INDICATES THE ACTIVE STAGE IN A GIVEN CLOCK CYCLE

 

 

 

If two or more processes attempt to use the same pipeline stage at the same time, a collision (resource conflict) occurs.

To resolve collisions, scheduling of initiations is needed.

 

Latency

Latency is defined as the number of clock cycles between two initiations of the pipeline, i.e. the number of clock cycles after which the next task is started.

Latencies that cause collisions are called forbidden latencies.

Forbidden latencies are detected by checking the distance between any two checkmarks in the same row of the reservation table.

 

For example, in the above table the forbidden latencies are 2, 4, 5 and 7.

 

Let m be the maximum forbidden latency,

      p be a permissible latency (a value at which no collision occurs),

and n be the total number of clock pulses (columns) in the reservation table.

Then

                m <= n - 1

and        1 <= p <= m - 1.

 

Collision vector

A collision vector is an m-bit binary vector C = (Cm … C2 C1). Ci = 1 if latency i causes a collision and Ci = 0 if latency i is permissible.

 

For example, in the above reservation table the forbidden latencies are 2, 4, 5 and 7.

So the collision vector is C = 1011010.
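A small C sketch (illustrative code, not from the notes) that derives the forbidden latencies and the collision vector from the reservation table used above (S1 busy in cycles 1, 6, 8; S2 in 2, 4; S3 in 3, 5, 7):

#include <stdio.h>

#define STAGES 3
#define COLS   8

int main(void) {
    /* reservation[s][j] = 1 if stage s is used in clock cycle j+1 */
    int reservation[STAGES][COLS] = {
        {1,0,0,0,0,1,0,1},   /* S1: cycles 1, 6, 8 */
        {0,1,0,1,0,0,0,0},   /* S2: cycles 2, 4    */
        {0,0,1,0,1,0,1,0},   /* S3: cycles 3, 5, 7 */
    };
    int forbidden[COLS] = {0};
    int m = 0;                               /* maximum forbidden latency */

    /* A latency is forbidden if two checkmarks in the same row are that far apart. */
    for (int s = 0; s < STAGES; s++)
        for (int i = 0; i < COLS; i++)
            for (int j = i + 1; j < COLS; j++)
                if (reservation[s][i] && reservation[s][j]) {
                    forbidden[j - i] = 1;
                    if (j - i > m) m = j - i;
                }

    printf("forbidden latencies:");
    for (int d = 1; d <= m; d++)
        if (forbidden[d]) printf(" %d", d);          /* prints 2 4 5 7 */

    printf("\ncollision vector: ");
    for (int d = m; d >= 1; d--)                     /* C = (Cm ... C1) */
        putchar(forbidden[d] ? '1' : '0');           /* prints 1011010  */
    putchar('\n');
    return 0;
}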

 

Latency Sequence

A latency sequence is a sequence of permissible (non-forbidden) latencies between successive task initiations.

 

Latency cycle

A latency cycle is a latency sequence that repeats the same subsequence indefinitely.

For example, one of the latency cycles in the above example is 1, 8, 1, 8, 1, 8, … (both 1 and 8 are permissible latencies).

This implies that successive initiations of new tasks are separated by 1 and 8 cycles alternately.

 

Average latency

The average latency is obtained by dividing the sum of all latencies in the cycle by the number of latencies in it.

The average latency of the cycle 1, 8, 1, 8, 1, 8, … would be (1+8)/2 = 4.5.

 

Constant cycle

Constant cycle is a latency cycle which contains only one latency value.

 

State Diagrams

State diagrams specify the permissible state transitions among successive initiations.

Starting from the initial state (the collision vector itself), the next state is obtained by ORing the collision vector with the current state shifted right by the chosen permissible latency.
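In code form (a sketch using the collision vector 1011010 from above; the chosen latency 3 is one of the permissible values):

#include <stdio.h>

int main(void) {
    unsigned int C = 0x5A;          /* collision vector 1011010, bit 0 = latency 1    */
    unsigned int state = C;         /* initial state after the first initiation       */
    int p = 3;                      /* a permissible latency (bit for latency 3 is 0) */

    unsigned int next = (state >> p) | C;        /* shift right by p, then OR with C  */

    printf("next state = ");
    for (int i = 6; i >= 0; i--)                 /* print as a 7-bit vector (C7 ... C1) */
        printf("%u", (next >> i) & 1u);          /* prints 1011011                     */
    printf("\n");
    return 0;
}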

 

 

 

NOTE :

 

Supercomputers in India

 

According to the latest news, India's PARAM is listed among the world's most power-efficient supercomputers.

 

The Centre for Development of Advanced Computing (C-DAC) said its supercomputer, PARAM Yuva II, was ranked first in India in the Green500 list of supercomputers in the world. PARAM Yuva II has been ranked number 9 in the Asia Pacific region and stands at 44th place in the world among the most power-efficient systems, as per the list announced on November 20 at the Supercomputing Conference (SC'2013) in Denver, Colorado, USA.

 

C-DAC is the second organisation in the world to have carried out the Level-3 measurement of power versus performance for the Green500 list, which is the most rigorous level of measurement performed for such a ranking.

 

Information about Supercomputers In India

 

Top Supercomputers-India is an effort to list the most powerful supercomputers in India. It is supported by the Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore.

The project is meant to create and promote healthy competition among the supercomputing initiatives in India and can lead to significant supercomputing advancement in the nation.
It lists the top supercomputers in India and is updated regularly.
Below is the link:

http://topsupercomputers-india.iisc.ernet.in/

 

 



By Shefali Singh and Upasana Mehta

Wednesday 15 January 2014

INTERESTING FACTS ABOUT SUPERCOMPUTERS


**FASTEST SUPERCOMPUTER OF THE WORLD**:

TIANHE-2,CHINESE SUPERCOMPUTER


A Chinese university has built the world's fastest supercomputer, almost doubling the speed of the U.S. machine that previously claimed the top spot and underlining China's rise as a science and technology powerhouse.

 

FEATURES

 

-> Developed by the National University of Defense Technology in central China's Changsha city. It is capable of sustained computing at 33.86 petaflops.

   That is the equivalent of 33,860 trillion calculations per second.

 

-> The Tianhe-2, which means Milky Way-2, knocks the U.S. Energy Department's Titan machine off the No. 1 spot; Titan achieved 17.59 petaflops.

 

->SOLVES COMPLEX PROBLEMS


 Examples:

     a. modeling weather systems

     b. simulating nuclear explosions

     c. designing jetliners

     d. government security applications      (wow........)

 

->INTEL IS EVERGREEN

Uses Intel for the main computing part.

"Most of the features of the system were developed in China, and they are only using Intel for the main compute part," TOP500 editor Jack Dongarra, who toured the Tianhe-2 facility in May, said in a news release. "That is, the interconnect, operating system, front-end processors and software are mainly Chinese."

 

->CHINA- A SUPERCOMPUTING POWER

This computer has made China a recognized supercomputing power, leaving everybody behind.

 

->HUGE EFFORT

 It was developed by a team of 1,300 scientists and engineers ("such a huge effort").

 

 

IMP FACT

 

 In computing, FLOPS (for FLoating-point Operations Per Second) is a measure of computer performance, useful in fields of scientific calculations that make heavy use of floating-point calculations. For such cases it is a more accurate measure than the generic instructions per second.

 

 HARDWARE

 

1.    PROCESSORS

With 16,000 compute nodes, each comprising two Intel Ivy Bridge Xeon processors and three Xeon Phi chips, it represents the world's largest

   installation of Ivy Bridge and Xeon Phi chips, counting a total of 3,120,000 cores.

 

2.    MEMORY

Each of the 16,000 nodes possesses 88 gigabytes of memory.

The total CPU plus coprocessor memory is 1,375 TiB.

 

3.    POWER

 The system itself draws 17.6 megawatts of power; including external cooling,

   it draws an aggregate of 24 megawatts.

 

4.    SPACE

 The computer complex occupies 720 square meters of space.

 

ARE THE TOOLS WHICH MEASURE THE PERFORMANCE OF SUPERCOMPUTERS FULLY RELIABLE?

 

A team, led by a professor from Germany's University of Mannheim, compiles the Top500 list twice a year, and the latest list of the five fastest supercomputers remained unchanged compared to the list released in June.

 

Per the Linpack benchmark test, Intel-powered Tianhe-2 is able to operate at 33.86 petaflop/sec, which is equivalent to 33,863 trillion calculations per second. Its closest competitors were Cray Inc's Titan with 17.59 petaflop/sec and IBM's Sequoia with 17.17 petaflop/sec.

 

The only change near the top was Switzerland's new Piz Daint supercomputer, which made it to the sixth spot with 6.27 petaflop/sec.

 

The Linpack benchmark test measures how quickly computers can crack a special type of linear equation to determine their speed. However, the benchmark does not take into consideration factors like the speed with which data can be transferred from one area of the system to another. This factor can influence the real-world performance of the device.

 

"A very simple benchmark, like the Linpack, cannot reflect the reality of how many real applications perform on today's complex computer systems," said Erich Strohmaier. "More representative benchmarks have to be much more complex in their coding, their execution and how many aspects of their performance need to be recorded and published. This makes understanding their behaviour more difficult."

IBM created five out of the 10 fastest supercomputers, and the head of the computational sciences department at IBM's Zurich research lab, Dr Alessandro Curioni, said that the manner in which the list is calculated needs to be updated. He would voice the same concern at a conference in Denver, Colorado, held this week.

 

"The Top500 has been a very useful tool in the past decades to try to have a single number that could be used to measure the performance and the evolution of high-performance computing," notes Dr Curioni. "[But] today we need a more practical measurement that reflects the real use of these supercomputers based on their most important applications."

 

Tianhe-2 has been developed by China's National University of Defence Technology (NUDT) and has been set up in National Super Computer Center in Guangzhou.

DO WATCH THIS VIDEO AND ENTER THE WORLD OF SUPERCOMPUTERS :

 http://www.youtube.com/watch?v=9ebPoYbCz-U

 

ARE PURPOSE BUILT SUPERCOMPUTERS ENERGY SUSTAINABLE?

Read This

http://www.vertatique.com/are-purpose-built-supercomputers-more-sustainable

 

BY SUNCHIT DUDEJA AND UJJWAL RELAN