The Need for Parallel Programming

This is the technical report I wrote for my technical writing class in 2009 at Louisiana Tech. I was going through my old blogs and found it, so I thought I would migrate it here.


In this report I examine how multi-core computers are emerging as a new wave of technology and how parallel applications can make better use of multi-core resources. My research indicates that traditional ways of programming and building software fail to fully utilize the enormous resources of today’s multi-core computers.

The Multi-core Architecture:

Multi-core processors are formed by combining two or more independent cores on a single integrated circuit, called a die. They came into being because improving performance by increasing the clock speed had reached its physical limits. The cost of producing a single, more powerful core also proved to be very high. As Marc Tremblay, chief architect of Sun Microsystems’ Scalable Systems Group, points out, “We could build a slightly faster chip, but it would cost twice the die area while gaining only a 20 percent speed increase” (Geer 11).

Current transistor technology makes further increases in clock speed impractical. Geer highlights the problem: “as a transistor gets smaller, the gate, which switches the electricity on and off, gets thinner and less able to block the flow of electrons. Thus, small transistors tend to use electricity all the time, even when they aren’t switching. This wastes power”. Consequently, manufacturers have started building chips with multiple cores, each of which is more energy efficient.


The following diagram is based on Intel tests and shows that multi-core chips perform better than single-core ones. The benchmark results of the multi-core chips rise exponentially, whereas those of the single-core chips follow a more linear curve.

(Adapted from “Chip Makers Turn to Multicore Processors”)

Utilizing the Multi-core architectures

The traditional way of programming will not efficiently take advantage of multi-core systems.

To fully exploit these multi-core machines, organizations need to redesign applications so that the processor can run them as multiple threads of execution. Programmers need to find the best spots in their code to parallelize, divide the work into approximately equal parts that can run simultaneously, and determine precisely when the threads must communicate. Differences in speed between the cores on a die must also be taken into account. As Jones points out, “While that next-generation chip will have more CPUs, each individual CPU will be no faster than the previous year’s model. If we want our programs to run faster, we must learn to write parallel programs” (Beautiful Concurrency). Therefore, software developers must modify the traditional way of writing programs to make way for concurrency.
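As a rough illustration of dividing work into approximately equal parts, here is a minimal sketch in C using OpenMP. The function name sum_to is made up for this example and does not come from any of the cited sources. The schedule(static) clause asks the runtime to hand each thread a roughly equal block of loop iterations, and the reduction clause combines the per-thread partial sums once the threads finish.

```c
#include <assert.h>

/* Hypothetical example function: sum the integers 1..n.
   Every iteration is independent of the others, so the loop can be
   split across threads. schedule(static) gives each thread a roughly
   equal block of iterations; reduction(+:total) gives each thread a
   private partial sum and adds the partial sums together at the end. */
long long sum_to(long long n)
{
    assert(n >= 0);
    long long total = 0;

    #pragma omp parallel for schedule(static) reduction(+:total)
    for (long long i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Calling sum_to(100) returns 5050 whether the loop ran on one thread or several. Only the elapsed time should differ, and for a loop this small the cost of starting threads will likely outweigh any gain.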

The Traditional Approach

The performance of traditional software depends on three main areas of the system:

  • Clock Speed

  • Optimization of the Execution Flow

  • Cache

Clock Speed:

Clock speed is the rate at which a processor’s clock cycles, measured in hertz, and it has long served as shorthand for the speed of a computer. However, the practice of raising clock speeds to improve overall performance is slowly becoming an antiquated concept. Clock speeds have hit a wall in recent years, and the highest clock speed one may encounter in a computer nowadays is about 3.73 GHz. Increasing the clock speed beyond 3.73 GHz has serious consequences such as “increased heat production, high power consumption and current leakage problems” (Sutter).


The following graph shows how Intel has increased the clock speed of its processors over time. Around the year 2003, the rise in clock speeds bends sharply.

(Adapted from “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software”)


As Geer points out, “Chip performance increased 60 percent per year in the 1990s but slowed to 40 percent per year from 2000 to 2004, when performance increased by only 20 percent”. At the moment 3.73 GHz is the highest clock speed in a computer. It is not feasible to go higher because of the physical issues already mentioned. Consequently, processor manufacturers like Intel and AMD have moved away from the traditional path of increasing clock speeds and adopted a much better alternative: multi-core architectures.

Optimization of the Execution Flow:

Execution flow relates to the amount of work that is done in each CPU cycle. Overall performance therefore depends on both the clock speed and the work completed per cycle. To get more out of each CPU cycle, many optimizations are applied, “including pipelining, branch prediction, executing multiple instructions in the same clock cycle, and even reordering the instruction stream for out-of-order execution. These techniques are all designed to make the instructions flow better and/or execute faster, and to squeeze the most work out of each clock cycle by reducing latency and maximizing the work accomplished per clock cycle” (Sutter).

To combat the saturation of clock speeds, today’s chip designers are aggressively pursuing optimizations that get more work done in each CPU cycle. Unfortunately, these optimizations have the potential to change the overall semantics of one’s source code. Moreover, with the dawn of multi-core processors, chip designers have started to engineer these optimizations in favor of parallel computing and concurrent programming. Thus, the traditional approach to software development is on the verge of a revolution. Today’s systems stand to benefit greatly if they adopt a concurrent programming paradigm from the very start.


Cache:

Cache memory is memory designed to provide faster access to frequently used data. That data normally lives elsewhere, where it is expensive to read, so copies are kept in the cache, where the cost of reading is comparatively low. Traditional software enjoyed faster performance whenever the cache was enlarged. Of the three areas, increasing the cache is the only performance-enhancing technique that will continue in the near future.

An important thing to note is that all of the aforementioned performance-enhancing techniques are agnostic of concurrency. As computers all around the world adopt the multi-core architecture, concurrency will definitely become the norm of computing.


To ensure top-notch performance on these new processors, many applications will have to be written or rewritten following the rules of parallel programming. Developing parallel applications can be arduous, and it demands fresh programming skills from developers. Organizations everywhere are struggling to meet the new demands of the multi-core software transition. Concurrent programming, especially OpenMP (Open Multi-Processing) for multi-core processors, will prove to be a solution to these new and upcoming challenges (Leiserson IV).

Concurrent Programming

Concurrent programming utilizes two or more processors by making them cooperate in performing a task. It is not a new way of programming. However, the majority of programmers today hesitate to use it because it changes the way they develop software. It is also harder to understand. As Hasselbring puts it, “Concurrent programming is conceptually harder to undertake and to understand than sequential programming, because a programmer has to manage the coexistence and coordination of multiple concurrent activities”. The very structure of one’s code often has to change when concurrent programming is involved. Nevertheless, it has clear advantages in parallel computing, and now that demand for multi-core computers is growing rapidly in the market, concurrent programming is definitely on the road to popularity. Software that adopts the strategies of concurrent programming will be able to take advantage of the benefits of today’s multi-core computers. As Sutter says, “Applications will increasingly need to be concurrent if they want to fully exploit continuing exponential CPU throughput gains”.
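The coordination Hasselbring describes can be sketched with a shared counter. The following C example is only an illustration under assumed names (count_even is hypothetical): several threads scan different parts of an array and all update the same variable. Without the atomic directive, two threads could read and write the counter at the same moment and lose an update.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example function: count the even values in data[0..n-1].
   The counter is shared by every thread in the team. */
long count_even(const int *data, long n)
{
    assert(data != NULL || n == 0);
    long count = 0;

    #pragma omp parallel for shared(count)
    for (long i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            /* Make the increment indivisible so that concurrent
               updates from different threads are never lost. */
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

A reduction clause would usually be the more efficient way to write this particular loop. The atomic update is used here only because it makes the coordination between threads explicit.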

Most of today’s software has long relied on sequential programming, which uses a single thread of execution. Sequential programming was beneficial when computers had single-core architectures, but now that multi-core computers have arrived it proves to be inefficient. Concurrent programming fully exploits today’s multi-core processors by increasing the software’s throughput, that is, the number of tasks completed in a given time. Adopting a parallel programming interface such as OpenMP will prove beneficial, as users will experience smoother and faster execution of their software without having to sacrifice a huge portion of their valuable memory.

Open Specifications for Multi Processing (OpenMP)

OpenMP is an open application programming interface that supports multi-platform shared-memory multiprocessing through multithreaded programming, using compiler directives for Fortran and C/C++. Lawrence Livermore National Laboratory defines OpenMP as “an Application Program Interface (API), jointly defined by a group of major computer hardware and software vendors. OpenMP provides a portable, scalable model for developers of shared memory parallel applications” (Barney). Compilers, the programs that transform source code written in a programming language into a language the machine understands, are available for OpenMP from several companies such as Hewlett-Packard, Portland Group, Microsoft, Intel, IBM and Sun Microsystems. It is also supported by the GNU gcc compiler. The programmer identifies the parallel sections of the code by inserting pragmas (compiler directives) into the code (Leiserson and Mirman).
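To give a feel for what marking parallel sections with pragmas looks like, here is a small C sketch. The function name min_max is invented for illustration. It places two independent tasks in separate OpenMP sections, so an OpenMP-aware compiler may run them on different threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example function: find the smallest and largest value
   in data[0..n-1]. The two scans do not depend on each other, so each
   is placed in its own section. lo is written only by the first
   section and hi only by the second, so no locking is needed. */
void min_max(const int *data, long n, int *min_out, int *max_out)
{
    assert(data != NULL && n > 0);
    assert(min_out != NULL && max_out != NULL);
    int lo = data[0];
    int hi = data[0];

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (long i = 1; i < n; i++) {
                if (data[i] < lo) lo = data[i];
            }
        }
        #pragma omp section
        {
            for (long i = 1; i < n; i++) {
                if (data[i] > hi) hi = data[i];
            }
        }
    }

    *min_out = lo;
    *max_out = hi;
}
```

The pragmas only describe which blocks may run concurrently. The runtime decides how many threads actually take part.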

How OpenMP will help Developers

Applications nowadays need to take advantage of the multiple cores on the processor chips in modern computers. If they do not, they will not be able to run any faster.

This is where OpenMP takes center stage. OpenMP helps developers design multithreaded applications quickly. Multithreaded applications are those that can exploit multi-core computers to their full capabilities. The threads in a multithreaded application run within their parent process (a running program) and share its resources and global variables. OpenMP has many advantages, such as:

  • It lets programmers decide which parts of the code they want to run concurrently.

  • It creates and manages the threads for the programmer.

  • Its compilers are available from a large number of companies, such as Intel and IBM, to name a few.

  • It “provides a set of pragmas, runtime routines, and environment variables that programmers can use to specify shared-memory parallelism in Fortran, C, and C++ programs” (Copty).

  • The same code can still be compiled and run as serial code.

  • OpenMP is easier to program than other parallel programming models such as MPI (Message Passing Interface).
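The runtime routines and environment variables mentioned in the list can be sketched as follows. The function name record_owners is hypothetical. omp_get_thread_num and omp_get_max_threads are standard OpenMP runtime routines, and the _OPENMP macro is defined by compilers that have OpenMP enabled, so the guards let the same file build without OpenMP support.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>   /* declares the OpenMP runtime routines */
#endif

/* Hypothetical example function: store in owner[i] the id of the
   thread that executed iteration i, and return the maximum number of
   threads the runtime would use for a parallel region. */
int record_owners(int *owner, int n)
{
    assert(owner != NULL && n >= 0);

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();
#else
        owner[i] = 0;   /* serial build: one thread does everything */
#endif
    }

#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Thread creation and the assignment of iterations to threads are left entirely to OpenMP. Setting the OMP_NUM_THREADS environment variable before running the program (for example to 4) is the usual way to change the thread count without recompiling.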

How does OpenMP work?

Before OpenMP can be used in one’s source code, the Fortran/C/C++ compiler must support it, which usually means enabling OpenMP through a compiler option. Once that is done, OpenMP helps an ordinary program achieve parallelism through the addition of a few directives that mark the boundaries of the parallel sections of the program. These compiler directives, called pragmas (written #pragma in C and C++), dictate how the program will be parallelized. They are designed so that even if the code is built with a compiler that knows nothing about OpenMP, it will still compile and run, just without any parallelism.
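A minimal sketch of this fallback behavior, with made-up function names (scale and built_with_openmp): the loop below carries an OpenMP pragma, yet the file contains nothing that requires OpenMP. With GCC, for instance, passing -fopenmp enables the directive, and leaving the flag out produces an ordinary serial loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example function: multiply every element of v by factor.
   A compiler without OpenMP enabled ignores the pragma and runs the
   loop on a single thread; the result is the same either way. */
void scale(double *v, int n, double factor)
{
    assert(v != NULL || n == 0);

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        v[i] *= factor;
    }
}

/* Returns 1 if this file was compiled with OpenMP enabled, else 0.
   _OPENMP is defined only by compilers with OpenMP switched on. */
int built_with_openmp(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}
```

Some compilers warn about the unrecognized pragma in a serial build, but the program still compiles and produces the same values.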

Thus, by giving developers an easy start, OpenMP makes learning parallel computing easier for programmers who are new to the paradigms of concurrency.


Software development is on the brink of a concurrency revolution. More and more CPU manufacturers have started switching from traditional single-core processors to multi-core processors to improve the performance of their systems. As pointed out in this report, traditional software applications need to go through structural changes if they are to fully reap the advantages of the multi-core architectures of the new era. New programs must be written following the guidelines of concurrency-compatible interfaces, and older programs must be rewritten to improve their performance. One such interface, which gives programmers an easy start in learning how to write parallel programs, is OpenMP. If more people adopt OpenMP or other parallel programming approaches, they could benefit from improved software performance on the multi-core processors that are already very popular in the market. This could substantially increase the sales of our software and pave the way for a much larger profit margin.



Barney, Blaise. “OpenMP.” January 2009. Lawrence Livermore National Laboratory. 18 Feb 2009.

Copty, Nawal. “Introducing OpenMP: A Portable, Parallel Programming API for Shared Memory Multiprocessors.” January 2007. Sun Microsystems. 18 Feb 2009.


Geer, David. “Chip Makers Turn to Multicore Processors.” May 2005. 18 Feb 2009.

Hasselbring, Wilhelm. “Programming Languages and Systems for Prototyping Concurrent Applications.” 2004. 18 Feb 2009 <http://se.informatik.uni->.

Jones, Simon Peyton. “Beautiful Concurrency.” May 2007. Microsoft Research. 18 Feb 2009.

Leiserson, Charles E., and Ilya B. Mirman. “How to Survive the Multicore Software Revolution.” May 2008. Cilk. 18 Feb 2009.

Sutter, Herb. “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software.” http:// March 2005. 18 Feb 2009